Quantecon Python Intro

Contents

I Introduction

1 About These Lectures
1.1 About
1.2 Level
1.3 Credits

II Economic Data

2 Economic Growth Evidence
2.1 Overview
2.2 Setting up
2.3 GDP plots
2.4 The industrialized world
2.5 Constructing a plot similar to Tooze's
2.6 Regional analysis

3 Business Cycles
3.1 Overview
3.2 Data acquisition
3.3 GDP growth rate
3.4 Unemployment
3.5 Synchronization
3.6 Leading indicators and correlated factors

5.6 Exercises

13.1 Introduction
13.2 Structure of the model
13.3 Representing key equations with linear algebra
13.4 Harvesting returns from our matrix formulation
13.5 Forecast errors
13.6 Technical condition for stability

VI Nonlinear Dynamics

20 The Solow-Swan Growth Model
20.1 The model
20.2 A graphical perspective
20.3 Growth in continuous time
20.4 Exercises

26.3 Ergodicity
26.4 Exercises

33 Networks
33.1 Outline
33.2 Economic and financial networks
33.3 An introduction to graph theory
33.4 Weighted graphs
33.5 Adjacency matrices
33.6 Properties
33.7 Network centrality
33.8 Further reading
33.9 Exercises

XI Estimation

36 Simple Linear Regression Model
36.1 How does error change with respect to α and β
36.2 Calculating optimal values

39 References
Bibliography
Index
Part I: Introduction
Chapter 1: About These Lectures
1.1 About
This lecture series introduces quantitative economics using elementary mathematics and statistics plus computer code
written in Python.
The lectures emphasize simulation and visualization through code as a way to convey ideas, rather than focusing on
mathematical details.
Although the presentation is quite novel, the ideas are rather foundational.
We emphasize the deep and fundamental importance of economic theory, as well as the value of analyzing data and
understanding stylized facts.
The lectures can be used for university courses, self-study, reading groups or workshops.
Researchers and policy professionals might also find some parts of the series valuable for their work.
We hope the lectures will be of interest to students of economics who want to learn both economics and computing, as
well as students from fields such as computer science and engineering who are curious about economics.
1.2 Level
1.3 Credits
In building this lecture series, we had invaluable assistance from research assistants at QuantEcon, as well as our Quan-
tEcon colleagues. Without their help this series would not have been possible.
In particular, we sincerely thank and give credit to
• Aakash Gupta
• Shu Hu
• Jiacheng Li
• Smit Lunagariya
• Matthew McKay
• Maanasee Sharma
• Humphrey Yang
We also thank Noritaka Kudoh for encouraging us to start this project and providing thoughtful suggestions.
Part II: Economic Data
Chapter 2: Economic Growth Evidence
2.1 Overview
In this lecture we use Python, Pandas, and Matplotlib to download, organize, and visualize historical data on GDP growth.
In addition to learning how to deploy these tools more generally, we’ll use them to describe facts about economic growth
experiences across many countries over several centuries.
Such “growth facts” are interesting for a variety of reasons.
Explaining growth facts is a principal purpose of both “development economics” and “economic history”.
And growth facts are important inputs into historians’ studies of geopolitical forces and dynamics.
Thus, Adam Tooze’s account of the geopolitical precedents and antecedents of World War I begins by describing how
Gross National Products of European Great Powers had evolved during the 70 years preceding 1914 (see chapter 1 of
[Too14]).
Using the very same data that Tooze used to construct his figure, here is our version of his chapter 1 figure.
(This is just a copy of our figure Fig. 2.6. We describe how we constructed it later in this lecture.)
Chapter 1 of [Too14] used his graph to show how US GDP started the 19th century way behind the GDP of the British
Empire.
By the end of the nineteenth century, US GDP had caught up with that of the British Empire, and during the first half of the 20th century, US GDP surpassed it.
For Adam Tooze, that fact was a key geopolitical underpinning for the “American century”.
Looking at this graph and how it set the geopolitical stage for “the American (20th) century” naturally tempts one to want
a counterpart to his graph for 2014 or later.
(An impatient reader seeking a hint at the answer might now want to jump ahead and look at figure Fig. 2.7.)
As we’ll see, reasoning by analogy, this graph perhaps set the stage for an “XXX (21st) century”, where you are free to
fill in your guess for country XXX.
As we gather data to construct those two graphs, we’ll also study growth experiences for a number of countries for time
horizons extending as far back as possible.
These graphs will portray how the “Industrial Revolution” began in Britain in the late 18th century, then migrated to one
country after another.
In a nutshell, this lecture records growth trajectories of various countries over long time periods.
While some countries have experienced long-term rapid growth that has lasted a hundred years, others have not.
Since populations differ across countries and vary within a country over time, it will be interesting to describe both total
GDP and GDP per capita as it evolves within a country.
First let’s import the packages needed to explore what the data says about long run growth
import pandas as pd
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
from collections import namedtuple
from matplotlib.lines import Line2D
2.2 Setting up
A project initiated by Angus Maddison has collected many historical time series related to economic growth, some dating
back to the first century.
The data can be downloaded from the Maddison Historical Statistics webpage by clicking on the “Latest Maddison Project
Release”.
For convenience, here is a copy of the 2020 data in Excel format.
Let’s read it into a pandas dataframe:
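A sketch of the read step (the local file name below is an assumption; it refers to the copy of the 2020 release linked above):

data = pd.read_excel("datasets/mpd2020.xlsx", sheet_name='Full data')
data.head()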
We can see that this dataset contains GDP per capita (gdppc) and population (pop) for many countries and years.
Let’s look at how many and which countries are available in this dataset
len(data.country.unique())
169
We can now explore some of the 169 countries that are available.
Let’s loop over each country to understand which years are available for each country
cntry_years = []
for cntry in data.country.unique():
    cy_data = data[data.country == cntry]['year']
    ymin, ymax = cy_data.min(), cy_data.max()
    cntry_years.append((cntry, ymin, ymax))

cntry_years = pd.DataFrame(cntry_years,
                           columns=['country', 'Min Year', 'Max Year']).set_index('country')
cntry_years
Let’s now reshape the original data into some convenient variables to enable quicker access to countries time series data.
We can build a useful mapping between country codes and country names in this dataset
code_to_name = data[['countrycode', 'country']].drop_duplicates().reset_index(drop=True).set_index(['countrycode'])
data
gdppc = data.set_index(['countrycode','year'])['gdppc']
gdppc = gdppc.unstack('countrycode')
gdppc
We create a color mapping between country codes and colors for consistency
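One way to build such a mapping (the colormap choice is an assumption):

country_names = data['countrycode'].unique()
colors = cm.rainbow(np.linspace(0, 1, len(country_names)))   # uses matplotlib.cm imported above
color_mapping = dict(zip(country_names, colors))             # country code -> color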
Looking at the United Kingdom we can first confirm we are using the correct country code
fig, ax = plt.subplots(dpi=300)
cntry = 'GBR'
_ = gdppc[cntry].plot(
    ax=fig.gca(),
    ylabel='International $\'s',
    xlabel='Year',
    linestyle='-',
    color=color_mapping['GBR'])
Note: International Dollars are a hypothetical unit of currency that has the same purchasing power parity that the U.S.
Dollar has in the United States at any given time. They are also known as Geary–Khamis dollars (GK Dollars).
We can see that the data is discontinuous over long stretches in the first 250 years of this millennium, so we could choose to interpolate to get a continuous line plot.
Here we use dashed lines to indicate interpolated trends
fig, ax = plt.subplots(dpi=300)
cntry = 'GBR'
ax.plot(gdppc[cntry].interpolate(),
        linestyle='--',
        lw=2,
        color=color_mapping[cntry])
ax.plot(gdppc[cntry],
        linestyle='-',
        lw=2,
        color=color_mapping[cntry])
ax.set_ylabel('International $\'s')
ax.set_xlabel('Year')
plt.show()
We can now put this into a function to generate plots for a list of countries. (Only a fragment of the function body survived extraction; the signature below is reconstructed from the calls made later in this lecture.)

def draw_interp_plots(series, ylabel, xlabel, color_mapping,
                      code_to_name, lw, logscale, ax):
    for i, c in enumerate(series.columns):
        # Get the interpolated data
        df_interpolated = series[c].interpolate(limit_area='inside')
        interpolated_data = df_interpolated[series[c].isnull()]
        # Dashed lines mark interpolated stretches, solid lines the data
        ax.plot(interpolated_data, '--', lw=lw, color=color_mapping[c])
        ax.plot(series[c], '-', lw=lw, color=color_mapping[c],
                label=code_to_name.loc[c]['country'])
    if logscale:
        ax.set_yscale('log')
    ax.set_ylabel(ylabel), ax.set_xlabel(xlabel)
    return ax
As you can see from this chart, economic growth started in earnest in the 18th century and continued for the next two
hundred years.
How does this compare with other countries’ growth trajectories?
Let’s look at the United States (USA), United Kingdom (GBR), and China (CHN)
The preceding graph of per capita GDP strikingly reveals how the spread of the industrial revolution has over time
gradually lifted the living standards of substantial groups of people
• most of the growth happened in the past 150 years after the industrial revolution.
• per capita GDP in the US and UK rose and diverged from that of China from 1820 to 1940.
• the gap has closed rapidly after 1950 and especially after the late 1970s.
• these outcomes reflect complicated combinations of technological and economic-policy factors that students of
economic growth try to understand and quantify.
It is fascinating to see China’s GDP per capita levels from 1500 through to the 1970s.
Notice the long period of declining GDP per capita levels from the 1700s until the early 20th century.
Thus, the graph indicates
• a long economic downturn and stagnation after the Closed-door Policy by the Qing government.
• China’s very different experience than the UK’s after the onset of the industrial revolution in the UK.
• how the Self-Strengthening Movement seemed mostly to help China to grow.
• how stunning the growth achievements of modern Chinese economic policies by the PRC have been, culminating with its late 1970s reform and liberalization.
We can also look at the United States (USA) and United Kingdom (GBR) in more detail
In the following graph, please watch for
• impact of trade policy (Navigation Act).
• productivity changes brought by the industrial revolution.
• how the US gradually approaches and then surpasses the UK, setting the stage for the “American Century”.
• the often unanticipated consequences of wars.
• interruptions and scars left by business cycle recessions and depressions.
Now we’ll construct some graphs of interest to geopolitical historians like Adam Tooze.
We’ll focus on total Gross Domestic Product (GDP) (as a proxy for “national geopolitical-military power”) rather than focusing on GDP per capita (as a proxy for living standards).
We first visualize the trend of China, the Former Soviet Union, Japan, the UK and the US.
The most notable trend is the rise of the US, surpassing the UK in the 1860s and China in the 1880s.
The growth continued until the large dip in the 1930s when the Great Depression hit.
Meanwhile, Russia experienced significant setbacks during World War I and recovered significantly after the February
Revolution.
fig, ax = plt.subplots(dpi=300)
ax = fig.gca()
cntry = ['CHN', 'SUN', 'JPN', 'GBR', 'USA']
start_year, end_year = (1820, 1945)
ax = draw_interp_plots(gdp[cntry].loc[start_year:end_year],
                       'International $\'s', 'Year',
                       color_mapping, code_to_name, 2, False, ax)
2.5 Constructing a plot similar to Tooze’s

In this section we describe how we have constructed a version of the striking figure from chapter 1 of [Too14] that we discussed at the start of this lecture.
Let’s first define a collection of countries that consist of the British Empire (BEM) so we can replicate that series in
Tooze’s chart.
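A sketch of the aggregation (the membership list is an assumption):

BEM = ['GBR', 'IND', 'AUS', 'NZL', 'CAN', 'ZAF']   # assumed members of the British Empire series
gdp['BEM'] = gdp[BEM].sum(axis=1)                  # aggregate member GDPs into one series
color_mapping['BEM'] = color_mapping['GBR']        # reuse the UK's color for the aggregate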
Let’s take a look at the aggregation that represents the British Empire.
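One way to produce the display shown below:

gdp['BEM'].plot()   # quick look at the aggregated series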
<Axes: xlabel='year'>
code_to_name
country
countrycode
AFG Afghanistan
AGO Angola
ALB Albania
ARE United Arab Emirates
ARG Argentina
... ...
YEM Yemen
YUG Former Yugoslavia
ZAF South Africa
ZMB Zambia
ZWE Zimbabwe
Now let’s assemble our series and get ready to plot them.
fig, ax = plt.subplots(dpi=300)
ax = fig.gca()
cntry = ['DEU', 'USA', 'SUN', 'BEM', 'FRA', 'JPN']
start_year, end_year = (1821, 1945)
ax = draw_interp_plots(gdp[cntry].loc[start_year:end_year],
                       'Real GDP in 2011 $\'s', 'Year',
                       color_mapping, code_to_name, 2, False, ax)

plt.savefig("./_static/lecture_specific/long_run_growth/tooze_ch1_graph.png",
            dpi=300, bbox_inches='tight')
plt.show()
At the start of this lecture, we noted how US GDP came from “nowhere” at the start of the 19th century to rival and then
overtake the GDP of the British Empire by the end of the 19th century, setting the geopolitical stage for the “American
(twentieth) century”.
Let’s move forward in time and start roughly where Tooze’s graph stopped after World War II.
In the spirit of Tooze’s chapter 1 analysis, doing this will provide some information about geopolitical realities today.
The following graph displays how quickly China has grown, especially since the late 1970s.
fig, ax = plt.subplots(dpi=300)
ax = fig.gca()
cntry = ['CHN', 'SUN', 'JPN', 'GBR', 'USA']
start_year, end_year = (1950, 2020)
ax = draw_interp_plots(gdp[cntry].loc[start_year:end_year],
                       'International $\'s', 'Year',
                       color_mapping, code_to_name, 2, False, ax)
It is tempting to compare this graph with figure Fig. 2.6 that showed the US overtaking the UK near the start of the
“American Century”, a version of the graph featured in chapter 1 of [Too14].
2.6 Regional analysis

We often want to study historical experiences of countries outside the club of “World Powers”.
Fortunately, the Maddison Historical Statistics dataset also includes regional aggregations
data.columns = data.columns.droplevel(level=2)
We can save the raw data in a more convenient format to build a single table of regional GDP per capita
regionalgdppc = data['gdppc_2011'].copy()
regionalgdppc.index = pd.to_datetime(regionalgdppc.index, format='%Y')
Let’s interpolate based on time to fill in any gaps in the dataset for the purpose of plotting
regionalgdppc.interpolate(method='time', inplace=True)
worldgdppc = regionalgdppc['World GDP pc']   # world aggregate (column name is an assumption)

fig = plt.figure(dpi=300)
ax = fig.gca()
ax = worldgdppc.plot(
    ax=ax,
    xlabel='Year',
    ylabel='2011 US$',
)
Looking more closely, let’s compare the time series for Western Offshoots and Sub-Saharan Africa and
more broadly at a number of different regions around the world.
Again we see the divergence of the West from the rest of the world after the industrial revolution and the convergence of
the world after the 1950s
fig = plt.figure(dpi=300)
ax = fig.gca()
line_styles = ['-', '--', ':', '-.', '.', 'o', '-', '--', '-']
ax = regionalgdppc.plot(ax=ax, style=line_styles)
ax.set_yscale('log')
plt.legend(loc='lower center',
           ncol=3, bbox_to_anchor=[0.5, -0.4])
plt.show()
Chapter 3: Business Cycles
3.1 Overview
We will use the World Bank’s data API wbgapi and pandas_datareader to retrieve data.
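Both libraries can be imported with their conventional aliases (a sketch; the original import cell is not shown in this extraction):

import wbgapi as wb
import pandas_datareader.data as web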
3.2 Data acquisition

We can use wb.series.info with the argument q to query available data from the World Bank.
For example, let’s retrieve the GDP growth data ID to query GDP growth data.
wb.series.info(q='GDP growth')
id value
----------------- ---------------------
NY.GDP.MKTP.KD.ZG GDP growth (annual %)
1 elements
gdp_growth = wb.data.DataFrame('NY.GDP.MKTP.KD.ZG',
                               ['USA', 'ARG', 'GBR', 'GRC', 'JPN'],
                               labels=True)
gdp_growth
YR2021 YR2022
economy
JPN 2.142487 1.028625
GRC 8.434426 5.913708
GBR 7.597471 4.101621
ARG 10.398249 5.243044
USA 5.945485 2.061593
[5 rows x 64 columns]
We can look at the series’ metadata to learn more about the series.
wb.series.metadata.get('NY.GDP.MKTP.KD.ZG')
We write a function to generate plots for individual countries taking into account the recessions.
Let’s start with the United States.
fig, ax = plt.subplots()
GDP growth is positive on average and trending slightly downward over time.
We also see fluctuations in GDP growth over time, some of which are quite large.
Let’s look at a few more countries to get a basis for comparison.
The United Kingdom (UK) has a similar pattern to the US, with a slow decline in the growth rate and significant fluctuations.
Notice the very large dip during the Covid-19 pandemic.
fig, ax = plt.subplots()
Now let’s consider Japan, which experienced rapid growth in the 1960s and 1970s, followed by slowed expansion in the
past two decades.
Major dips in the growth rate coincided with the Oil Crisis of the 1970s, the Global Financial Crisis (GFC) and the
Covid-19 pandemic.
fig, ax = plt.subplots()
fig, ax = plt.subplots()
country = 'Greece'
plot_series(gdp_growth, country,
            ylabel, 0.1, ax,
            g_params, b_params, t_params)
plt.show()
Greece experienced a very large drop in GDP growth around 2010-2011, during the peak of the Greek debt crisis.
Next let’s consider Argentina.
fig, ax = plt.subplots()
Notice that Argentina has experienced far more volatile cycles than the economies examined above.
At the same time, Argentina’s growth rate did not fall during the two developed economy recessions in the 1970s and
1990s.
3.4 Unemployment
• cycles are, in general, asymmetric: sharp rises in unemployment are followed by slow recoveries.
It also shows us how unique labor market conditions were in the US during the post-pandemic recovery.
The labor market recovered at an unprecedented rate after the shock in 2020-2021.
3.5 Synchronization
In our previous discussion, we found that developed economies have had relatively synchronized periods of recession.
At the same time, this synchronization did not appear in Argentina until the 2000s.
Let’s examine this trend further.
With slight modifications, we can use our previous function to draw a plot that includes multiple countries.
Here we compare the GDP growth rate of developed economies and developing economies.
We use the United Kingdom, United States, Germany, and Japan as examples of developed economies.
The comparison of GDP growth rates above suggests that business cycles are becoming more synchronized in 21st-century
recessions.
However, emerging and less developed economies often experience more volatile changes throughout the economic cycles.
Despite the synchronization in GDP growth, the experience of individual countries during the recession often differs.
We use the unemployment rate and the recovery of labor market conditions as another example.
Here we compare the unemployment rate of the United States, the United Kingdom, Japan, and France.
We see that France, with its strong labor unions, typically experiences relatively slow labor market recoveries after negative
shocks.
We also notice that Japan has a history of very low and stable unemployment rates.
3.6 Leading indicators and correlated factors

Examining leading indicators and correlated factors helps policymakers to understand the causes and results of business cycles.
We will discuss potential leading indicators and correlated factors from three perspectives: consumption, production, and
credit level.
3.6.1 Consumption
Consumption depends on consumers’ confidence towards their income and the overall performance of the economy in the
future.
One widely cited indicator for consumer confidence is the consumer sentiment index published by the University of
Michigan.
Here we plot the University of Michigan Consumer Sentiment Index and year-on-year core consumer price index (CPI)
change from 1978-2022 in the US.
We see that
• consumer sentiment often remains high during expansions and drops before recessions.
• there is a clear negative correlation between consumer sentiment and the CPI.
When the price of consumer commodities rises, consumer confidence diminishes.
This trend is more significant during stagflation.
3.6.2 Production
3.6.3 Credit level

Credit contractions often occur during recessions, as lenders become more cautious and borrowers become more hesitant to take on additional debt.
This is due to factors such as a decrease in overall economic activity and gloomy expectations for the future.
One example is domestic credit to the private sector by banks in the UK.
The following graph shows the domestic credit to the private sector as a percentage of GDP by banks from 1970 to 2022
in the UK.
Note that the credit rises during economic expansions and stagnates or even contracts after recessions.
Chapter 4: Income and Wealth Inequality
4.1 Overview
In this section we
• provide motivation for the techniques deployed in the lecture and
• import code libraries needed for our work.
Many historians argue that inequality played a key role in the fall of the Roman Republic.
After defeating Carthage and invading Spain, money flowed into Rome and greatly enriched those in power.
Meanwhile, ordinary citizens were taken from their farms to fight for long periods, diminishing their wealth.
The resulting growth in inequality caused political turmoil that shook the foundations of the republic.
Eventually, the Roman Republic gave way to a series of dictatorships, starting with Octavian (Augustus) in 27 BCE.
This history is fascinating in its own right, and we can see some parallels with certain countries in the modern world.
Many recent political debates revolve around inequality.
Many economic policies, from taxation to the welfare state, are aimed at addressing inequality.
4.1.2 Measurement
One problem with these debates is that inequality is often poorly defined.
Moreover, debates on inequality are often tied to political beliefs.
This is dangerous for economists because allowing political beliefs to shape our findings reduces objectivity.
To bring a truly scientific perspective to the topic of inequality we must start with careful definitions.
In this lecture we discuss standard measures of inequality used in economic research.
For each of these measures, we will look at both simulated and real data.
We will install the following libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
import random as rd
from interpolation import interp
4.2.1 Definition

Suppose we have a sample of wealth data $w_1, \ldots, w_n$, sorted from smallest to largest. The Lorenz curve is generated from the data points

$$x_i = \frac{i}{n}, \qquad y_i = \frac{\sum_{j \le i} w_j}{\sum_{j \le n} w_j}, \qquad i = 1, \ldots, n.$$
Now the Lorenz curve 𝐿 is formed from these data points using interpolation.
(If we use a line plot in Matplotlib, the interpolation will be done for us.)
The meaning of the statement 𝑦 = 𝐿(𝑥) is that the lowest (100 × 𝑥)% of people have (100 × 𝑦)% of all wealth.
• if 𝑥 = 0.5 and 𝑦 = 0.1, then the bottom 50% of the population owns 10% of the wealth.
In the discussion above we focused on wealth but the same ideas apply to income, consumption, etc.
n = 2000
sample = np.exp(np.random.randn(n))   # lognormal sample

# Compute and plot the Lorenz curve (plotting lines reconstructed; the
# lorenz_curve routine comes from the quantecon package imported above)
f_vals, l_vals = qe._inequality.lorenz_curve(sample)

fig, ax = plt.subplots()
ax.plot(f_vals, l_vals, label='lognormal sample')
ax.plot(f_vals, f_vals, label='equality')
ax.legend(fontsize=12)
ax.set_ylim((0, 1))
ax.set_xlim((0, 1))
plt.show()
Next let’s look at the real data, focusing on income and wealth in the US in 2016.
The following code block imports a subset of the dataset SCF_plus, which is derived from the Survey of Consumer
Finances (SCF).
url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv'
df = pd.read_csv(url)
df = df.dropna()
df_income_wealth = df
df_income_wealth.head()
The following code block uses data stored in dataframe df_income_wealth to generate the Lorenz curves.
(The code is somewhat complex because we need to adjust the data according to population weights supplied by the SCF.)
Now we plot Lorenz curves for net wealth, total income and labor income in the US in 2016.
fig, ax = plt.subplots()
ax.legend(fontsize=12)
plt.show()
4.3.1 Definition
As before, suppose that the sample 𝑤1 , … , 𝑤𝑛 has been sorted from smallest to largest.
The Gini coefficient is defined for the sample above as
$$G := \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} |w_j - w_i|}{2 n \sum_{i=1}^{n} w_i}. \tag{4.1}$$
fig, ax = plt.subplots()
ax.legend(fontsize=12)
ax.set_ylim((0, 1))
ax.set_xlim((0, 1))
plt.show()
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000

ginis = []
for σ in σ_vals:
    μ = -σ**2 / 2
    y = np.exp(μ + σ * np.random.randn(n))
    ginis.append(qe.gini_coefficient(y))
def plot_inequality_measures(x, y, legend, xlabel, ylabel):
    # Signature reconstructed from the calls made below
    fig, ax = plt.subplots()
    ax.plot(x, y, marker='o', label=legend)
    ax.set_xlabel(xlabel, fontsize=12)
    ax.set_ylabel(ylabel, fontsize=12)
    ax.legend(fontsize=12)
    plt.show()
plot_inequality_measures(σ_vals,
                         ginis,
                         'simulated',
                         '$\sigma$',
                         'gini coefficients')
The plots show that inequality rises with 𝜎, according to the Gini coefficient.
Now let’s look at Gini coefficients for US data derived from the SCF.
The following code creates a list called Ginis.

It stores Gini coefficients generated from the dataframe df_income_wealth using the method gini_coefficient from the QuantEcon library.
Let’s plot the Gini coefficients for net wealth, labor income and total income.
xlabel = "year"
ylabel = "gini coefficient"
fig, ax = plt.subplots()
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
plt.show()
xlabel = "year"
ylabel = "gini coefficient"
fig, ax = plt.subplots()
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
ax.legend(fontsize=12)
plt.show()
We see that, by this measure, inequality in wealth and income has risen substantially since 1980.
The wealth time series exhibits a strong U-shape.
4.4.1 Definition
As before, suppose that the sample 𝑤1 , … , 𝑤𝑛 has been sorted from smallest to largest.
Given the Lorenz curve 𝑦 = 𝐿(𝑥) defined above, the top 100 × 𝑝% share is defined as
$$T(p) = 1 - L(1 - p) \approx \frac{\sum_{j \ge i} w_j}{\sum_{j \le n} w_j}, \qquad i = \lfloor n(1 - p) \rfloor \tag{4.2}$$
Here ⌊⋅⌋ is the floor function, which rounds any number down to the integer less than or equal to that number.
The following code uses the data from dataframe df_income_wealth to generate another dataframe
df_topshares.
df_topshares stores the top 10 percent shares for total income, labor income and net wealth from 1950 to 2016 in the US.
Then let’s plot the top shares.
xlabel = "year"
ylabel = "top $10\%$ share"
fig, ax = plt.subplots()
ax.plot(years, df_topshares["topshare_l_income"],
marker='o', label="labor income")
ax.plot(years, df_topshares["topshare_n_wealth"],
marker='o', label="net wealth")
ax.plot(years, df_topshares["topshare_t_income"],
marker='o', label="total income")
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
ax.legend(fontsize=12)
plt.show()
4.5 Exercises
Exercise 4.5.1
Using simulation, compute the top 10 percent shares for the collection of lognormal distributions associated with the
random variables 𝑤𝜎 = exp(𝜇 + 𝜎𝑍), where 𝑍 ∼ 𝑁 (0, 1) and 𝜎 varies over a finite grid between 0.2 and 4.
As 𝜎 increases, so does the variance of 𝑤𝜎 .
To focus on volatility, adjust $\mu$ at each step to maintain the equality $\mu = -\sigma^2/2$.
For each 𝜎, generate 2,000 independent draws of 𝑤𝜎 and calculate the Lorenz curve and Gini coefficient.
Confirm that higher variance generates more dispersion in the sample, and hence greater inequality.
def calculate_top_share(s, p=0.1):
    # Top-p share of a sample (the wrapper and default p=0.1 are inferred
    # from the call below; only the body survived extraction)
    s = np.sort(s)
    n = len(s)
    index = int(n * (1 - p))
    return s[index:].sum() / s.sum()
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
topshares = []
ginis = []
f_vals = []
l_vals = []
for σ in σ_vals:
    μ = -σ ** 2 / 2
    y = np.exp(μ + σ * np.random.randn(n))
    f_val, l_val = qe._inequality.lorenz_curve(y)
    f_vals.append(f_val)
    l_vals.append(l_val)
    ginis.append(qe._inequality.gini_coefficient(y))
    topshares.append(calculate_top_share(y))
plot_inequality_measures(σ_vals,
                         topshares,
                         "simulated data",
                         "$\sigma$",
                         "top $10\%$ share")

plot_inequality_measures(σ_vals,
                         ginis,
                         # remaining arguments reconstructed by analogy with the call above
                         "simulated data",
                         "$\sigma$",
                         "gini coefficient")
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="equality")
for i in range(len(f_vals)):
    ax.plot(f_vals[i], l_vals[i], label=f"$\sigma$ = {σ_vals[i]}")
plt.legend()
plt.show()
Exercise 4.5.2
According to the definition of the top shares (4.2) we can also calculate the top percentile shares using the Lorenz curve.
Compute the top shares of US net wealth using the corresponding Lorenz curves data: f_vals_nw, l_vals_nw and
linear interpolation.
Plot the top shares generated from the Lorenz curve and the top shares approximated from the data together.
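One way to implement the Lorenz-curve-to-top-share mapping of (4.2), using the interp function imported earlier (the name lorenz2top matches the call below; the default p = 0.1 is an assumption):

def lorenz2top(f_val, l_val, p=0.1):
    # Interpolate the Lorenz curve and evaluate T(p) = 1 - L(1 - p)
    t = lambda x: interp(f_val, l_val, x)
    return 1 - t(1 - p)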
top_shares_nw = []
for f_val, l_val in zip(f_vals_nw, l_vals_nw):
    top_shares_nw.append(lorenz2top(f_val, l_val))
xlabel = "year"
ylabel = "top $10\%$ share"
fig, ax = plt.subplots()
ax.set_xlabel(xlabel, fontsize=12)
ax.set_ylabel(ylabel, fontsize=12)
Part III: Essential Tools
Chapter 5: Linear Equations and Matrix Algebra
5.1 Overview
import numpy as np
import matplotlib.pyplot as plt
Equilibrium holds when supply equals demand ($q_0^s = q_0^d$ and $q_1^s = q_1^d$).
This yields the linear system
$$\begin{aligned} 100 - 10 p_0 - 5 p_1 &= 10 p_0 + 5 p_1 \\ 50 - p_0 - 10 p_1 &= 5 p_0 + 10 p_1 \end{aligned} \tag{5.3}$$

We can solve this with pencil and paper to get

$$p_0 = 4.41 \quad \text{and} \quad p_1 = 1.18.$$

Inserting these results into either (5.1) or (5.2) yields the equilibrium quantities

$$q_0 = 50 \quad \text{and} \quad q_1 = 33.82.$$
Pencil and paper methods are easy in the two good case.
But what if there are many goods?
For such problems we need matrix algebra.
Before solving problems with matrix algebra, let’s first recall the basics of vectors and matrices, in both theory and
computation.
5.3 Vectors
A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as 𝑥 = (𝑥1 , … , 𝑥𝑛 ) or
𝑥 = [𝑥1 , … , 𝑥𝑛 ].
We can write these sequences either horizontally or vertically.
But when we use matrix operations, our default assumption is that vectors are column vectors.
The set of all 𝑛-vectors is denoted by ℝ𝑛 .
For example,
$$\begin{bmatrix} 4 \\ -2 \end{bmatrix} + \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 + 3 \\ -2 + 3 \end{bmatrix} = \begin{bmatrix} 7 \\ 1 \end{bmatrix}.$$
In general,
$$x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} := \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}.$$
Scalar multiplication multiplies each element of a vector by a scalar. For example,

$$-2 \begin{bmatrix} 3 \\ -7 \end{bmatrix} = \begin{bmatrix} -2 \times 3 \\ -2 \times (-7) \end{bmatrix} = \begin{bmatrix} -6 \\ 14 \end{bmatrix}.$$

More generally, for a scalar $\gamma$,

$$\gamma x := \begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}.$$
In Python, a vector can be represented as a list or tuple, such as x = [2, 4, 6] or x = (2, 4, 6).
However, it is more common to represent vectors with NumPy arrays.
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax.
x = np.array([2, 4, 6])   # definition assumed; it was not shown in this extraction
4 * x                     # Scalar multiply
If $x$ and $y$ are $n$-vectors, their inner product is

$$x^\top y = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n := \sum_{i=1}^{n} x_i y_i.$$
The norm of a vector $x$ represents its “length” (i.e., its distance from the zero vector) and is defined as

$$\|x\| := \sqrt{x^\top x} := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}.$$
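The code cell here was lost; a sketch consistent with the four outputs shown below (the particular vectors are assumptions):

x = np.ones(3)                   # vector of ones
y = np.array((2, 4, 6))

print(np.sum(x * y))             # inner product, method one
print(x @ y)                     # inner product, method two
print(np.sqrt(np.sum(x**2)))     # norm of x, method one
print(np.linalg.norm(x))         # norm of x, method two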
12.0
12.0
1.7320508075688772
1.7320508075688772
Just as was the case for vectors, we can add, subtract and scalar multiply matrices.
Scalar multiplication and addition are generalizations of the vector case:
Here is an example of scalar multiplication
$$3 \begin{bmatrix} 2 & -13 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 6 & -39 \\ 0 & 15 \end{bmatrix},$$

and here is an example of matrix addition:

$$\begin{bmatrix} 1 & 5 \\ 7 & 3 \end{bmatrix} + \begin{bmatrix} 12 & -1 \\ 0 & 9 \end{bmatrix} = \begin{bmatrix} 13 & 4 \\ 7 & 12 \end{bmatrix}.$$
In general, multiplying a matrix by a scalar multiplies every element by that scalar, and adding two matrices adds them element by element.

In the latter case, the matrices must have the same shape in order for the definition to make sense.
In the $2 \times 2$ case, matrix multiplication is defined by

$$AB = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} := \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{bmatrix}$$
There are many tutorials to help you further visualize this operation, such as
• this one, or
• the discussion on the Wikipedia page.
Note: Unlike number products, 𝐴𝐵 and 𝐵𝐴 are not generally the same thing.
One important special case is the identity matrix, which has ones on the principal diagonal and zero elsewhere:
$$I = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}$$
It is a useful exercise to check the following:
• if 𝐴 is 𝑛 × 𝑘 and 𝐼 is the 𝑘 × 𝑘 identity matrix, then 𝐴𝐼 = 𝐴, and
• if 𝐼 is the 𝑛 × 𝑛 identity matrix, then 𝐼𝐴 = 𝐴.
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix oper-
ations.
You can create them manually from tuples of tuples (or lists of lists) as follows
A = ((1, 2),
(3, 4))
type(A)
tuple
A = np.array(A)
type(A)
numpy.ndarray
A.shape
(2, 2)
The shape attribute is a tuple giving the number of rows and columns — see here for more discussion.
To get the transpose of A, use A.transpose() or, more simply, A.T.
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.) — see here.
Since operations are performed elementwise by default, scalar multiplication and addition have very natural syntax.
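(The matrix B used below was not defined in this extraction; as an assumption, take a matrix of ones.)

B = np.ones((2, 2))   # assumed definition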
A + B
We can now revisit the two good model and solve (5.3) numerically via matrix algebra.
This involves some extra steps but the method is widely applicable — as we will see when we include more goods.
First we rewrite (5.1) as
$$q^d = D p + h \quad \text{where} \quad q^d = \begin{bmatrix} q_0^d \\ q_1^d \end{bmatrix}, \quad D = \begin{bmatrix} -10 & -5 \\ -1 & -10 \end{bmatrix} \quad \text{and} \quad h = \begin{bmatrix} 100 \\ 50 \end{bmatrix}. \tag{5.5}$$

Similarly, we rewrite (5.2) as

$$q^s = C p \quad \text{where} \quad q^s = \begin{bmatrix} q_0^s \\ q_1^s \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 10 & 5 \\ 5 & 10 \end{bmatrix}. \tag{5.6}$$

Equating supply and demand ($q^s = q^d$) gives

$$C p = D p + h,$$

which we can rearrange as

$$(C - D) p = h.$$
If all of the terms were numbers, we could solve for prices as 𝑝 = ℎ/(𝐶 − 𝐷).
Matrix algebra allows us to do something similar: we can solve for equilibrium prices using the inverse of 𝐶 − 𝐷:
$$p = (C - D)^{-1} h. \tag{5.7}$$
More generally, if the supply curve has its own intercept vector $e$, so that $q^s = C p + e$, then the same steps give

$$(D - C) p = e - h \tag{5.8}$$

and hence

$$p = (D - C)^{-1} (e - h).$$
More generally, we can write a system of $n$ equations in $n$ unknowns as

$$Ax = b \quad \text{where} \quad A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}. \tag{5.10}$$

For example, (5.8) has this form with

$$A = D - C, \quad b = e - h \quad \text{and} \quad x = p.$$
When considering problems such as (5.10), we need to ask at least some of the following questions
• Does a solution actually exist?
• If a solution exists, how should we compute it?
Recall again the system of equations (5.9), which we write here again as

$$Ax = b. \tag{5.11}$$
The problem we face is to find a vector 𝑥 ∈ ℝ𝑛 that solves (5.11), taking 𝑏 and 𝐴 as given.
We may not always find a unique vector 𝑥 that solves (5.11).
We illustrate two such cases below.
5.5.1 No solution
fig, ax = plt.subplots()
x = np.linspace(-10, 10)
plt.plot(x, (3 - x) / 3, label='$x + 3y = 3$')
plt.plot(x, (-8 - 2*x) / 6, label='$2x + 6y = -8$')
plt.legend()
plt.show()
Clearly, these are parallel lines and hence we will never find a point 𝑥 ∈ ℝ2 such that these lines intersect.
Thus, this system has no possible solution.
We can rewrite this system in matrix form as
$$Ax = b \quad \text{where} \quad A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 3 \\ -8 \end{bmatrix}. \tag{5.12}$$
It can be noted that the second row of matrix $A$, $(2, 6)$, is just a scalar multiple of the first row, $(1, 3)$.
The rows of matrix 𝐴 in this case are called linearly dependent.
Note: Advanced readers can find a detailed explanation of linear dependence and independence here.
But these details are not needed in what follows.
Now consider,
$$\begin{aligned} x - 2y &= -4 \\ -2x + 4y &= 8. \end{aligned}$$
Any vector 𝑣 = (𝑥, 𝑦) such that 𝑥 = 2𝑦 − 4 will solve the above system.
Since we can find infinitely many such vectors, this system has infinitely many solutions.

This is because the rows of the corresponding matrix

$$A = \begin{bmatrix} 1 & -2 \\ -2 & 4 \end{bmatrix} \tag{5.13}$$
are linearly dependent — can you see why?
We now impose conditions on 𝐴 in (5.11) that rule out these problems.
To every square matrix we can assign a unique number called the determinant.
For $2 \times 2$ matrices, the determinant is given by

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
A square matrix 𝐴 is nonsingular if and only if the rows and columns of 𝐴 are linearly independent.
A more detailed explanation of matrix inverse can be found here.
You can check yourself that the matrices in (5.12) and (5.13), which have linearly dependent rows, are singular.
This gives us a useful one-number summary of whether or not a square matrix can be inverted.
In particular, a square matrix $A$ has a nonzero determinant if and only if it possesses an inverse matrix $A^{-1}$, with the property that $A A^{-1} = A^{-1} A = I$.
As a consequence, if we pre-multiply both sides of $Ax = b$ by $A^{-1}$, we get

$$x = A^{-1} b. \tag{5.14}$$

In particular, for our two-good market, the equilibrium price vector is

$$p = (C - D)^{-1} h.$$
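Let's compute this numerically. A sketch of the setup, with array values taken directly from (5.5) and (5.6):

from numpy.linalg import det, inv, solve

D = np.array([[-10, -5],
              [-1, -10]])   # demand slopes, from (5.5)
h = np.array([[100],
              [50]])        # demand intercepts, from (5.5)
C = [[10, 5],               # supply slopes, from (5.6)
     [5, 10]]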
C = np.array(C)
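The determinant check that produced the output below:

det(C - D)   # nonzero, so C - D is nonsingular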
340.0000000000001
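Next we compute the equilibrium price vector via the inverse, as in (5.7):

p = inv(C - D) @ h   # equilibrium prices
p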
array([[4.41176471],
[1.17647059]])
q = C @ p # equilibrium quantities
q
array([[50. ],
[33.82352941]])
Notice that we get the same solutions as the pencil and paper case.
We can also solve for 𝑝 using solve(A, h) as follows.
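p = solve(C - D, h)   # solve the linear system directly
p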
array([[4.41176471],
[1.17647059]])
q = C @ p # equilibrium quantities
q
array([[50. ],
[33.82352941]])
Observe how we can solve for 𝑥 = 𝐴−1 𝑦 by either via inv(A) @ y, or using solve(A, y).
The latter method uses a different algorithm that is numerically more stable and hence should be the default option.
5.6 Exercises
Exercise 5.6.1
Let’s consider a market with 3 commodities - good 0, good 1 and good 2.
The demand for each good depends on the price of the other two goods and is given by:
(Here demand decreases when own price increases but increases when prices of other goods increase.)
The supply of each good is given by:
Equilibrium holds when supply equals demand, i.e., $q_0^d = q_0^s$, $q_1^d = q_1^s$ and $q_2^d = q_2^s$.
1. Set up the market as a system of linear equations.
2. Use matrix algebra to solve for equilibrium prices. Do this using both the numpy.linalg.solve and inv(A)
methods. Compare the solutions.
import numpy as np
from numpy.linalg import det
9999.99999999999
# Using inverse
from numpy.linalg import inv
A_inv = inv(A)
p = A_inv @ b
p
array([[4.9625],
[7.0625],
[7.675 ]])
# Using numpy.linalg.solve
from numpy.linalg import solve
p = solve(A, b)
p
array([[4.9625],
[7.0625],
[7.675 ]])
Exercise 5.6.2
Earlier in the lecture we discussed cases where the system of equations given by 𝐴𝑥 = 𝑏 has no solution.
In this case 𝐴𝑥 = 𝑏 is called an inconsistent system of equations.
When faced with an inconsistent system we try to find the best “approximate” solution.
There are various methods to do this, one such method is the method of least squares.
Suppose we have an inconsistent system
$$Ax = b \tag{5.15}$$
That is, we look for an $\hat{x}$ satisfying

$$\|A\hat{x} - b\| \le \|Ax - b\| \quad \text{for all } x.$$

It can be shown that, for the system of equations $Ax = b$, the least squares solution $\hat{x}$ is

$$\hat{x} = (A^\top A)^{-1} A^\top b. \tag{5.16}$$
Now consider the general equation of a linear demand curve of a good given by:
𝑝 = 𝑚 − 𝑛𝑞
Requiring the demand curve $p = m - nq$ to pass through all these points leads to the following three equations:

$$\begin{aligned} 1 &= m - 9n \\ 3 &= m - 7n \\ 8 &= m - 3n \end{aligned}$$
Thus we obtain a system of equations $Ax = b$ where $A = \begin{bmatrix} 1 & -9 \\ 1 & -7 \\ 1 & -3 \end{bmatrix}$, $x = \begin{bmatrix} m \\ n \end{bmatrix}$ and $b = \begin{bmatrix} 1 \\ 3 \\ 8 \end{bmatrix}$.
It can be verified that this system has no solutions.
(The problem is that we have three equations and only two unknowns.)
We will thus try to find the best approximate solution for 𝑥.
1. Use (5.16) and matrix algebra to find the least squares solution 𝑥.̂
2. Find the least squares solution using numpy.linalg.lstsq and compare the results.
import numpy as np
from numpy.linalg import inv

A = np.array([[1, -9], [1, -7], [1, -3]])   # from the system above
b = np.array([[1], [3], [8]])
A_T = A.T
x = inv(A_T @ A) @ A_T @ b
x
array([[11.46428571],
[ 1.17857143]])
# Using numpy.linalg.lstsq
x, res, _, _ = np.linalg.lstsq(A, b, rcond=None)
x̂ = [[11.46428571]
[ 1.17857143]]
‖Ax̂ - b‖² = 0.07142857142857081
Here is a visualization of how the least squares method approximates the equation of a line connecting a set of points.
We can also describe this as “fitting” a line between a set of points.
fig, ax = plt.subplots()
p = np.array((1, 3, 8))
q = np.array((9, 7, 3))
a, b = x
Chapter 6: Eigenvalues and Eigenvectors
6.1 Overview
𝐴𝑥 = 𝑦
If we fix 𝐴 and consider different choices of 𝑥, we can understand 𝐴 as a map transforming 𝑥 to 𝐴𝑥.
Because 𝐴 is 𝑛 × 𝑚, it transforms 𝑚-vectors to 𝑛-vectors.
We can write this formally as 𝐴 ∶ ℝ𝑚 → ℝ𝑛 .
You might argue that if 𝐴 is a function then we should write 𝐴(𝑥) = 𝑦 rather than 𝐴𝑥 = 𝑦 but the second notation is
more conventional.
For example, the matrix product

$$\begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$$

shows that the matrix

$$A = \begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix}$$

transforms the vector $x = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ to the vector $y = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$.
Let’s visualize this using Python:
A = np.array([[2, 1],
[-1, 1]])
fig, ax = plt.subplots()
# Set the axes through the origin
plt.show()
6.3.1 Scaling

A matrix of the form

$$\begin{bmatrix} \alpha & 0 \\ 0 & \beta \end{bmatrix}$$

scales vectors across the x-axis by a factor $\alpha$ and along the y-axis by a factor $\beta$.

Here we illustrate a simple example where $\alpha = \beta = 3$.
6.3.2 Shearing

A “shear” matrix of the form

$$\begin{bmatrix} 1 & \lambda \\ 0 & 1 \end{bmatrix}$$

stretches vectors along the x-axis by an amount proportional to the y-coordinate of a point.
6.3.3 Rotation

A matrix of the form

$$\begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$$

rotates vectors counterclockwise by an angle $\theta$.
6.3.4 Permutation

A permutation matrix rearranges the coordinates of a vector; for example, $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ interchanges the two coordinates.
Since matrices act as functions that transform one vector to another, we can apply the concept of function composition to
matrices as well.
We can observe that applying the transformation 𝐴𝐵 on the vector 𝑥 is the same as first applying 𝐵 on 𝑥 and then applying
𝐴 on the vector 𝐵𝑥.
Thus the matrix product 𝐴𝐵 is the composition of the matrix transformations 𝐴 and 𝐵
This means first apply transformation 𝐵 and then transformation 𝐴.
When we matrix multiply an 𝑛 × 𝑚 matrix 𝐴 with an 𝑚 × 𝑘 matrix 𝐵 the obtained matrix product is an 𝑛 × 𝑘 matrix
𝐴𝐵.
Thus, if 𝐴 and 𝐵 are transformations such that 𝐴 ∶ ℝ𝑚 → ℝ𝑛 and 𝐵 ∶ ℝ𝑘 → ℝ𝑚 , then 𝐴𝐵 transforms ℝ𝑘 to ℝ𝑛 .
Viewing matrix multiplication as composition of maps helps us understand why, under matrix multiplication, 𝐴𝐵 is
generally not equal to 𝐵𝐴.
(After all, when we compose functions, the order usually matters.)
6.4.2 Examples
Let $A$ be the $90°$ clockwise rotation matrix given by $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ and let $B$ be a shear matrix along the x-axis given by $\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$.
We will visualize how a grid of points changes when we apply the transformation 𝐴𝐵 and then compare it with the
transformation 𝐵𝐴.
grid_composition_transform(A, B)   # transformation AB

grid_composition_transform(B, A)   # transformation BA
It is evident that the transformation 𝐴𝐵 is not the same as the transformation 𝐵𝐴.
In economics (and especially in dynamic modeling), we are often interested in analyzing behavior where we repeatedly
apply a fixed matrix.
For example, given a vector 𝑣 and a matrix 𝐴, we are interested in studying the sequence
$$v, \quad Av, \quad AAv = A^2 v, \quad \ldots$$
Let’s first see examples of a sequence of iterates (𝐴𝑘 𝑣)𝑘≥0 under different maps 𝐴.
from numpy.linalg import matrix_power

def plot_series(A, v, n):
    # (The 'def' line and parts of the body were lost in extraction; the
    # signature matches the calls plot_series(A, v, n) made below.)
    B = np.array([[1, -1],
                  [1, 0]])

    fig, ax = plt.subplots()

    # A reference ellipse: the image of a circle under B
    # (the circle's definition is an assumption; it was cut off)
    θ = np.linspace(0, 2 * np.pi, 150)
    xy = np.vstack((np.cos(θ), np.sin(θ)))
    ellipse = B @ xy
    ax.plot(ellipse[0, :], ellipse[1, :], color='black',
            linestyle='dotted', linewidth=0.5)

    # Plot the iterates A^i v, labeling the first few
    colors = plt.cm.rainbow(np.linspace(0, 1, n))   # color scheme is an assumption
    for i in range(n):
        iteration = matrix_power(A, i) @ v
        v1 = iteration[0]
        v2 = iteration[1]
        ax.scatter(v1, v2, color=colors[i])
        if i == 0:
            ax.text(v1+0.25, v2, f'$v$')
        elif i == 1:
            ax.text(v1+0.25, v2, f'$Av$')
        elif 1 < i < 4:
            ax.text(v1+0.25, v2, f'$A^{i}v$')
    plt.show()
from math import sqrt   # import needed for sqrt below

A = np.array([[sqrt(3) + 1, -2],
              [1, sqrt(3) - 1]])
A = (1/(2*sqrt(2))) * A
v = (-3, -3)
n = 12
plot_series(A, v, n)
Here with each iteration the vectors get shorter, i.e., move closer to the origin.
In this case, repeatedly multiplying a vector by 𝐴 makes the vector “spiral in”.
B = np.array([[sqrt(3) + 1, -2],
              [1, sqrt(3) - 1]])
B = (1/2) * B
v = (2.5, 0)
n = 12
plot_series(B, v, n)
Here with each iteration vectors do not tend to get longer or shorter.
In this case, repeatedly multiplying a vector by 𝐴 simply “rotates it around an ellipse”.
B = np.array([[sqrt(3) + 1, -2],
              [1, sqrt(3) - 1]])
B = (1/sqrt(2)) * B
v = (-1, -0.25)
n = 6
plot_series(B, v, n)
Here with each iteration vectors tend to get longer, i.e., farther from the origin.
In this case, repeatedly multiplying a vector by 𝐴 makes the vector “spiral out”.
We thus observe that the sequence (𝐴𝑘 𝑣)𝑘≥0 behaves differently depending on the map 𝐴 itself.
We now discuss the property of A that determines this behavior.
6.6 Eigenvalues
6.6.1 Definitions
Let $A$ be an $n \times n$ square matrix. If $\lambda$ is a scalar and $v$ is a non-zero $n$-vector such that

$$A v = \lambda v,$$

then we say that $\lambda$ is an eigenvalue of $A$ and $v$ is the corresponding eigenvector.
A = [[1, 2],
     [2, 1]]   # second row assumed; the original block was cut off here
plt.show()
6.6.4 Facts
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows:
1. the determinant of 𝐴 equals the product of the eigenvalues
2. the trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues
3. if 𝐴 is symmetric, then all of its eigenvalues are real
4. if 𝐴 is invertible and 𝜆1 , … , 𝜆𝑛 are its eigenvalues, then the eigenvalues of 𝐴−1 are 1/𝜆1 , … , 1/𝜆𝑛 .
A corollary of the last statement is that a matrix is invertible if and only if all its eigenvalues are nonzero.
6.6.5 Computation
Using NumPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
from numpy.linalg import eig

A = ((1, 2),
     (2, 1))
A = np.array(A)
evals, evecs = eig(A)
evals    # eigenvalues

evecs    # eigenvectors
In this section we present a famous result about series of matrices that has many applications in economics.
For a one-dimensional linear equation $x = ax + b$ with $|a| < 1$, where $x$ is unknown, we can thus conclude that the solution $x^*$ is given by

$$x^* = \frac{b}{1 - a} = \sum_{k=0}^{\infty} a^k b$$
Analogously, the Neumann series lemma tells us that when the spectral radius of $A$ is strictly less than one, $I - A$ is invertible and the solution of the matrix equation $x = Ax + b$ is

$$x^* = (I - A)^{-1} b \tag{6.2}$$
We can see the Neumann Series Lemma in action in the following example.
A = np.array([[0.4, 0.1],
              [0.7, 0.2]])
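The cell that produced the output below computed the spectral radius of A; one way to do it:

evals = np.linalg.eigvals(A)      # eigenvalues of A
r = max(abs(λ) for λ in evals)    # spectral radius
print(r)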
0.582842712474619
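To compare the truncated power series with the matrix inverse, a sketch consistent with the equality check below:

A_sum = np.zeros((2, 2))    # running sum of I + A + A² + ...
A_power = np.identity(2)
for i in range(50):         # truncate the infinite sum at k = 50
    A_sum += A_power
    A_power = A_power @ A

B_inverse = np.linalg.inv(np.identity(2) - A)   # (I - A)⁻¹, as in (6.2)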
Let’s check equality between the sum and the inverse methods.
np.allclose(A_sum, B_inverse)
True
Although we truncate the infinite sum at 𝑘 = 50, both methods give us the same result which illustrates the result of the
Neumann Series Lemma.
6.8 Exercises
Exercise 6.8.1
Power iteration is a method for finding the greatest absolute eigenvalue of a diagonalizable matrix.
The method starts with a random vector 𝑏0 and repeatedly applies the matrix 𝐴 to it
$$b_{k+1} = \frac{A b_k}{\| A b_k \|}$$
# Define a matrix A
A = np.array([[1, 0, 3],
              [0, 2, 0],
              [3, 0, 1]])
num_iters = 20
errors = []
res = []
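A sketch of the iteration loop itself (the starting vector and the error definition are assumptions):

b = np.ones(A.shape[1])                        # starting vector
dominant = max(np.abs(np.linalg.eigvals(A)))   # benchmark eigenvalue for the error
for i in range(num_iters):
    b = A @ b
    b = b / np.linalg.norm(b)                  # normalize, as in the formula above
    lam = b @ A @ b                            # Rayleigh quotient estimate
    res.append(b)
    errors.append(np.abs(lam - dominant))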
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.tick_params(axis='both', which='major', labelsize=7)
plt.show()
Exercise 6.8.2
We have discussed the trajectory of the vector 𝑣 after being transformed by 𝐴.
Consider the matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$ and the vector $v = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$.
Try to compute the trajectory of 𝑣 after being transformed by 𝐴 for 𝑛 = 4 iterations and plot the result.
A = np.array([[1, 2],
              [1, 1]])
v = (0.4, -0.4)
n = 11

# Compute eigenvalues and eigenvectors (this step was missing above)
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f'eigenvalues:\n {eigenvalues}')
print(f'eigenvectors:\n {eigenvectors}')
plot_series(A, v, n)
eigenvalues:
[ 2.41421356 -0.41421356]
eigenvectors:
[[ 0.81649658 -0.81649658]
[ 0.57735027 0.57735027]]
The result seems to converge to the eigenvector of 𝐴 with the largest eigenvalue.
Let’s use a vector field to visualize the transformation brought by A.
(This is a more advanced topic in linear algebra, please step ahead if you are comfortable with the math.)
# Draw eigenvectors
origin = np.zeros((2, len(eigenvectors)))
parameters = {'color': ['b', 'g'], 'angles': 'xy',
              'scale_units': 'xy', 'scale': 0.1, 'width': 0.01}
plt.quiver(*origin, eigenvectors[0],
           eigenvectors[1], **parameters)
plt.quiver(*origin, -eigenvectors[0],
           -eigenvectors[1], **parameters)
plt.xlabel("x")
plt.ylabel("y")
plt.grid()
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
Note that the vector field converges to the eigenvector of 𝐴 with the largest eigenvalue and diverges from the eigenvector
of 𝐴 with the smallest eigenvalue.
In fact, the eigenvectors are also the directions in which the matrix 𝐴 stretches or shrinks the space.
Specifically, the eigenvector with the largest eigenvalue is the direction in which the matrix 𝐴 stretches the space the most.
We will see more intriguing examples in the following exercise.
Exercise 6.8.3
Previously, we demonstrated the trajectory of the vector 𝑣 after being transformed by 𝐴 for three different matrices.
Use the visualization in the previous exercise to explain the trajectory of the vector 𝑣 after being transformed by 𝐴 for
the three different matrices.
B = np.array([[sqrt(3) + 1, -2],
              [1, sqrt(3) - 1]])
B = (1/2) * B

C = np.array([[sqrt(3) + 1, -2],
              [1, sqrt(3) - 1]])
C = (1/sqrt(2)) * C

examples = [A, B, C]
eigenvalues_real = eigenvalues.real
eigenvectors_real = eigenvectors.real
# Draw eigenvectors
parameters = {'color': ['b', 'g'], 'angles': 'xy',
              'scale_units': 'xy', 'scale': 1,
              'width': 0.01, 'alpha': 0.5}
origin = np.zeros((2, len(eigenvectors)))
ax[i].quiver(*origin, eigenvectors_real[0],
             eigenvectors_real[1], **parameters)
ax[i].quiver(*origin,
             -eigenvectors_real[0],
             -eigenvectors_real[1],
             **parameters)

ax[i].set_xlabel("x-axis")
ax[i].set_ylabel("y-axis")
ax[i].grid()
ax[i].set_aspect('equal', adjustable='box')
plt.show()
Example 1:
eigenvalues:
[0.61237244+0.35355339j 0.61237244-0.35355339j]
eigenvectors:
[[0.81649658+0.j 0.81649658-0.j ]
[0.40824829-0.40824829j 0.40824829+0.40824829j]]
Example 2:
eigenvalues:
[0.8660254+0.5j 0.8660254-0.5j]
eigenvectors:
[[0.81649658+0.j 0.81649658-0.j ]
[0.40824829-0.40824829j 0.40824829+0.40824829j]]
Example 3:
eigenvalues:
[1.22474487+0.70710678j 1.22474487-0.70710678j]
eigenvectors:
[[0.81649658+0.j 0.81649658-0.j ]
[0.40824829-0.40824829j 0.40824829+0.40824829j]]
The vector fields explain why we observed the trajectories of the vector 𝑣 multiplied by 𝐴 iteratively before.
The pattern demonstrated here is because we have complex eigenvalues and eigenvectors.
We can plot the complex plane for one of the matrices using Arrow3D class retrieved from stackoverflow.
from matplotlib.patches import FancyArrowPatch   # imports needed by the class below
from mpl_toolkits.mplot3d import proj3d

class Arrow3D(FancyArrowPatch):
    def __init__(self, xs, ys, zs, *args, **kwargs):
        super().__init__((0, 0), (0, 0), *args, **kwargs)
        self._verts3d = xs, ys, zs

    def do_3d_projection(self):
        xs3d, ys3d, zs3d = self._verts3d
        xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d,
                                           self.axes.M)
        self.set_positions((0.1*xs[0], 0.1*ys[0]),
                           (0.1*xs[1], 0.1*ys[1]))
        return np.min(zs)
# Create 3D figure
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
vlength = np.linalg.norm(eigenvectors)
ax.quiver(x, y, u_imag, u_real-x, v_real-y, v_imag-u_imag,
          colors='b', alpha=0.3, length=.2,
          arrow_length_ratio=0.01)
arrow_prop_dict = dict(mutation_scale=5,
                       arrowstyle='-|>', shrinkA=0, shrinkB=0)
# Plot 3D eigenvectors
for c, i in zip(['b', 'g'], [0, 1]):
    a = Arrow3D([0, eigenvectors[0][i].real],
                [0, eigenvectors[1][i].real],
                [0, eigenvectors[1][i].imag],   # third coordinate assumed; original cut off
                color=c, **arrow_prop_dict)
    ax.add_artist(a)

plt.draw()
plt.show()
Chapter 7: Introduction to Supply and Demand
7.1 Overview
This lecture is about some models of equilibrium prices and quantities, one of the main topics of elementary microeco-
nomics.
Throughout the lecture, we focus on models with one good and one price.
In a subsequent lecture we will investigate settings with many goods.
Key infrastructure concepts that we’ll encounter in this lecture are
• inverse demand curves
• inverse supply curves
• consumer surplus
• producer surplus
• social welfare as the sum of consumer and producer surpluses
• relationship between equilibrium quantity and social welfare optimum
Throughout the lectures, we’ll assume that inverse demand and supply curves are affine functions of quantity.
(“Affine” means “linear plus a constant” and here is a nice discussion about it.)
We’ll also assume affine inverse supply and demand functions when we study models with multiple consumption goods in
our subsequent lecture.
We do this in order to simplify the exposition and enable us to use just a few tools from linear algebra, namely, matrix
multiplication and matrix inversion.
In our exposition we will use the following imports.
import numpy as np
import matplotlib.pyplot as plt
We study a market for a single good in which buyers and sellers exchange a quantity 𝑞 for a price 𝑝.
Quantity 𝑞 and price 𝑝 are both scalars.
We assume that inverse demand and supply curves for the good are:
𝑝 = 𝑑0 − 𝑑1 𝑞, 𝑑0 , 𝑑1 > 0
𝑝 = 𝑠0 + 𝑠1 𝑞, 𝑠0 , 𝑠1 > 0
We call them inverse demand and supply curves because price is on the left side of the equation rather than on the right
side as it would be in a direct demand or supply function.
Here is a class that stores parameters for our single good market, as well as implementing the inverse demand and supply
curves.
class Market:
    def __init__(self,
                 d_0=1.0,  # demand intercept
                 d_1=0.6,  # demand slope
                 s_0=0.1,  # supply intercept
                 s_1=0.4): # supply slope
        self.d_0, self.d_1, self.s_0, self.s_1 = d_0, d_1, s_0, s_1

    # Methods below reconstructed; Exercise 7.4.1 confirms their names
    def inverse_demand(self, q):
        return self.d_0 - self.d_1 * q

    def inverse_supply(self, q):
        return self.s_0 + self.s_1 * q

market = Market()
In the above graph, an equilibrium price-quantity pair occurs at the intersection of the supply and demand curves.
Let a quantity 𝑞 be given and let 𝑝 ∶= 𝑑0 − 𝑑1 𝑞 be the corresponding price on the inverse demand curve.
We define consumer surplus 𝑆𝑐 (𝑞) as the area under an inverse demand curve minus 𝑝𝑞:
$$S_c(q) := \int_0^q (d_0 - d_1 x) \, dx - p q \tag{7.1}$$
Let a quantity 𝑞 be given and let 𝑝 ∶= 𝑠0 + 𝑠1 𝑞 be the corresponding price on the inverse supply curve.
We define producer surplus as 𝑝𝑞 minus the area under an inverse supply curve
$$S_p(q) := p q - \int_0^q (s_0 + s_1 x) \, dx \tag{7.2}$$
Sometimes economists measure social welfare by a welfare criterion that equals consumer surplus plus producer surplus,
assuming that consumers and producers pay the same price:
$$W(q) = \int_0^q (d_0 - d_1 x) \, dx - \int_0^q (s_0 + s_1 x) \, dx$$
Let’s now give a social planner the task of maximizing social welfare.
To compute a quantity that maximizes the welfare criterion, we differentiate 𝑊 with respect to 𝑞 and then set the derivative
to zero.
$$\frac{d W(q)}{d q} = d_0 - s_0 - (d_1 + s_1) q = 0$$
Solving for 𝑞 yields
$$q = \frac{d_0 - s_0}{s_1 + d_1} \tag{7.3}$$
Let’s remember the quantity 𝑞 given by equation (7.3) that a social planner would choose to maximize consumer surplus
plus producer surplus.
We’ll compare it to the quantity that emerges in a competitive equilibrium that equates supply to demand.
Instead of equating quantities supplied and demanded, we can accomplish the same thing by equating demand price to
supply price:
$$p = d_0 - d_1 q = s_0 + s_1 q$$
If we solve the equation defined by the second equality in the above line for 𝑞, we obtain
$$q = \frac{d_0 - s_0}{s_1 + d_1} \tag{7.4}$$
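We can check numerically that the planner's quantity (7.3) equals the competitive equilibrium quantity (7.4); a quick sketch using the default parameters:

q = (market.d_0 - market.s_0) / (market.s_1 + market.d_1)   # (7.3), identical to (7.4)
p = market.inverse_demand(q)                                # the associated price
print(q, p)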
7.3 Generalizations
In a later lecture, we’ll derive generalizations of the above demand and supply curves from other objects.
Our generalizations will extend the preceding analysis of a market for a single good to the analysis of 𝑛 simultaneous
markets in 𝑛 goods.
In addition
• we’ll derive demand curves from a consumer problem that maximizes a utility function subject to a budget
constraint.
• we’ll derive supply curves from the problem of a producer who is a price taker and maximizes profits minus total costs that are described by a cost function.
7.4 Exercises
Suppose now that the inverse demand and supply curves are modified to take the form
$$p = i_d(q) := d_0 - d_1 q^{0.6}$$

$$p = i_s(q) := s_0 + s_1 q^{1.8}$$
All parameters are positive, as before.
Exercise 7.4.1
Define a new Market class that holds the same parameter values as before by changing the inverse_demand and
inverse_supply methods to match these new definitions.
Using the class, plot the inverse demand and supply curves $i_d$ and $i_s$.
class Market:

    def __init__(self, d_0=1.0, d_1=0.6, s_0=0.1, s_1=0.4):
        # same parameter values as before
        self.d_0, self.d_1, self.s_0, self.s_1 = d_0, d_1, s_0, s_1

    def inverse_demand(self, q):
        return self.d_0 - self.d_1 * q**0.6

    def inverse_supply(self, q):
        return self.s_0 + self.s_1 * q**1.8

market = Market()
Exercise 7.4.2
As before, consumer surplus at 𝑞 is the area under the demand curve minus price times quantity:
$$S_c(q) = \int_0^q i_d(x) \, dx - pq$$
Solve the integrals and write a function to compute this quantity numerically at a given $q$.

Plot welfare as a function of $q$.

Solving the integrals gives

$$W(q) = d_0 q - \frac{d_1 q^{1.6}}{1.6} - \left( s_0 q + \frac{s_1 q^{2.8}}{2.8} \right)$$
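The plot below relies on a function implementing this formula; a minimal sketch (the plotting range for q is an assumption):

def W(q, market):
    # welfare: area under inverse demand minus area under inverse supply
    m = market
    return m.d_0 * q - m.d_1 * q**1.6 / 1.6 - (m.s_0 * q + m.s_1 * q**2.8 / 2.8)

q_vals = np.linspace(0, 1.78, 200)   # assumed plotting range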
fig, ax = plt.subplots()
ax.plot(q_vals, W(q_vals, market), label='welfare')
ax.legend(frameon=False)
ax.set_xlabel('quantity')
plt.show()
Exercise 7.4.3
Due to nonlinearities, the new welfare function is not easy to maximize with pencil and paper.
Maximize it using scipy.optimize.minimize_scalar instead.
from scipy.optimize import minimize_scalar

def objective(q):
    return -W(q, market)   # negate welfare so that minimization maximizes it

result = minimize_scalar(objective, bounds=(0, 10), method='bounded')
maximizing_q = result.x
print(f"{maximizing_q: .5f}")
0.90564
Exercise 7.4.4
Now compute the equilibrium quantity by finding the price that equates supply and demand.
You can do this numerically by finding the root of the excess demand function
from scipy.optimize import brentq

def excess_demand(q):
    return market.inverse_demand(q) - market.inverse_supply(q)

equilibrium_q = brentq(excess_demand, 0.01, 10)   # bracket chosen so excess demand changes sign
print(f"{equilibrium_q: .5f}")
0.90564
Linear Dynamics
CHAPTER
EIGHT
PRESENT VALUES
8.1 Overview
This lecture describes the present value model that is a starting point of much asset pricing theory.
We’ll use the calculations described here in several subsequent lectures.
Our only tool is some elementary linear algebra operations, namely, matrix multiplication and matrix inversion.
Let’s dive in.
Let
• $\{d_t\}_{t=0}^T$ be a sequence of dividends or “payouts”
• $\{p_t\}_{t=0}^T$ be a sequence of prices of a claim on the continuation of the asset stream from date $t$ on, namely, $\{d_s\}_{s=t}^T$
• $\delta \in (0,1)$ be a one-period “discount factor”
• $p_{T+1}^*$ be a terminal price of the asset at time $T+1$

We assume that the dividend stream $\{d_t\}_{t=0}^T$ and the terminal price $p_{T+1}^*$ are both exogenous.
Assume the sequence of asset pricing equations
$$p_t = d_t + \delta p_{t+1}, \quad t = 0, 1, \ldots, T \tag{8.1}$$
import numpy as np
import matplotlib.pyplot as plt
Exercise 8.2.1
Carry out the matrix multiplication in (8.2) by hand and confirm that you recover the equations in (8.1).
$$A p = d + b \tag{8.4}$$

$$p = A^{-1}(d + b) \tag{8.5}$$

Below we'll work with a dividend stream that grows at 5 percent per period:

$$d_{t+1} = 1.05 \, d_t, \quad t = 0, 1, \ldots, T-1.$$
T = 6
current_d = 1.0
d = []
for t in range(T+1):
    d.append(current_d)
    current_d = current_d * 1.05
fig, ax = plt.subplots()
ax.plot(d, 'o', label='dividends')
ax.legend()
ax.set_xlabel('time')
plt.show()
δ = 0.99
p_star = 10.0

A = np.zeros((T+1, T+1))
for i in range(T+1):
    for j in range(T+1):
        if i == j:
            A[i, j] = 1
            if j < T:
                A[i, j+1] = -δ
Let’s inspect 𝐴
array([[ 1. , -0.99, 0. , 0. , 0. , 0. , 0. ],
[ 0. , 1. , -0.99, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , -0.99, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 1. , -0.99, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 1. , -0.99, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 1. , -0.99],
[ 0. , 0. , 0. , 0. , 0. , 0. , 1. ]])
b = np.zeros(T+1)
b[-1] = δ * p_star
p = np.linalg.solve(A, d + b)
fig, ax = plt.subplots()
ax.plot(p, 'o', label='asset price')
ax.legend()
ax.set_xlabel('time')
plt.show()
T = 100
current_d = 1.0
d = []
for t in range(T+1):
    d.append(current_d)
    current_d = current_d * 1.01 + 0.1 * np.sin(t)
fig, ax = plt.subplots()
ax.plot(d, 'o-', ms=4, alpha=0.8, label='dividends')
ax.legend()
ax.set_xlabel('time')
plt.show()
Exercise 8.2.2
Compute the corresponding asset price sequence when $p_{T+1}^* = 0$ and $\delta = 0.98$.
δ = 0.98
p_star = 0.0
A = np.zeros((T+1, T+1))
for i in range(T+1):
    for j in range(T+1):
        if i == j:
            A[i, j] = 1
            if j < T:
                A[i, j+1] = -δ
b = np.zeros(T+1)
b[-1] = δ * p_star
p = np.linalg.solve(A, d + b)
fig, ax = plt.subplots()
ax.plot(p, 'o-', ms=4, alpha=0.8, label='asset price')
ax.legend()
ax.set_xlabel('time')
plt.show()
The weighted averaging associated with the present value calculation largely eliminates the cycles.
$$A^{-1} = \begin{bmatrix} 1 & \delta & \delta^2 & \cdots & \delta^{T-1} & \delta^T \\ 0 & 1 & \delta & \cdots & \delta^{T-2} & \delta^{T-1} \\ \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \delta \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \tag{8.6}$$
Exercise 8.3.1
Check this by showing that $A A^{-1}$ is equal to the identity matrix.

(By the inverse matrix theorem, a matrix $B$ is the inverse of $A$ whenever $AB$ is the identity.)
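A quick numerical check along these lines, using the matrix A and the values of δ and T most recently defined above:

A_inv = np.zeros((T+1, T+1))
for i in range(T+1):
    for j in range(i, T+1):
        A_inv[i, j] = δ**(j-i)   # the entries displayed in (8.6)

print(np.allclose(A @ A_inv, np.eye(T+1)))   # True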
If we use expression (8.6) in (8.5) and perform the indicated matrix multiplication, we shall find that
$$p_t = \sum_{s=t}^T \delta^{s-t} d_s + \delta^{T+1-t} p_{T+1}^* \tag{8.7}$$
Pricing formula (8.7) asserts that two components sum to the asset price $p_t$:

• a fundamental component $\sum_{s=t}^T \delta^{s-t} d_s$ that equals the discounted present value of prospective dividends
• a bubble component $c \delta^{-t}$

where

$$c \equiv \delta^{T+1} p_{T+1}^*$$
For a few moments, let’s focus on the special case of an asset that will never pay dividends, in which case
$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ \vdots \\ d_{T-1} \\ d_T \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}$$
In this case system (8.1) of our 𝑇 + 1 asset pricing equations takes the form of the single matrix equation
$$\begin{bmatrix} 1 & -\delta & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -\delta & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & -\delta & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & -\delta \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} p_0 \\ p_1 \\ p_2 \\ \vdots \\ p_{T-1} \\ p_T \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \delta p_{T+1}^* \end{bmatrix} \tag{8.8}$$
Evidently, if $p_{T+1}^* = 0$, a price vector $p$ with all entries zero solves this equation, and only the fundamental component of our pricing formula (8.7) is present.
But let's activate the bubble component by setting the terminal price to

$$p_{T+1}^* = c \delta^{-(T+1)} \tag{8.9}$$

for some positive constant $c$. In that case, (8.7) implies that

$$p_t = c \delta^{-t} \tag{8.10}$$
Define the gross rate of return on holding the asset from period 𝑡 to period 𝑡 + 1 as
$$R_t = \frac{p_{t+1}}{p_t} \tag{8.11}$$
Substituting (8.10) into (8.11) confirms that an asset whose sole source of value is a bubble earns a gross rate of return of

$$R_t = \delta^{-1} > 1.$$
8.6 Exercises
Exercise 8.6.1
Give analytical expressions for the asset price $p_t$ under the following settings for $d$ and $p_{T+1}^*$:

1. $p_{T+1}^* = 0, \; d_t = g^t d_0$ (a modified version of the Gordon growth formula)
2. $p_{T+1}^* = g^{T+1} d_0, \; d_t = g^t d_0$ (the plain vanilla Gordon growth formula)
3. $p_{T+1}^* = 0, \; d_t = 0$ (price of a worthless stock)
4. $p_{T+1}^* = c \delta^{-(T+1)}, \; d_t = 0$ (price of a pure bubble stock)
NINE
CONSUMPTION SMOOTHING
9.1 Overview
Technically, this lecture is a sequel to this quantecon lecture present values, although it might not seem so at first.

It will take a while for a “present value” or asset price to appear explicitly in this lecture, but when it does it will be a key actor.

In this lecture, we’ll study a famous model of the “consumption function” that Milton Friedman [Fri56] and Robert Hall [Hal78] proposed to fit some empirical data patterns that the simple Keynesian model described in this quantecon lecture geometric series had missed.
The key insight of Friedman and Hall was that today’s consumption ought not to depend just on today’s non-financial
income: it should also depend on a person’s anticipations of her future non-financial incomes at various dates.
In this lecture, we’ll study what is sometimes called the “consumption-smoothing model” using only linear algebra, in
particular matrix multiplication and matrix inversion.
Formulas from the present values lecture are at the core of the consumption smoothing model because they are used to define a consumer’s “human wealth”.
As usual, we’ll start by importing some Python modules.
import numpy as np
import matplotlib.pyplot as plt
from collections import namedtuple
Our model describes the behavior of a consumer who lives from time $t = 0, 1, \ldots, T$, receives a stream $\{y_t\}_{t=0}^T$ of non-financial income and chooses a consumption stream $\{c_t\}_{t=0}^T$.
We usually think of the non-financial income stream as coming from the person’s salary from supplying labor.
The model takes that non-financial income stream as an input, regarding it as “exogenous” in the sense of not being
determined by the model.
The consumer faces a gross interest rate of 𝑅 > 1 that is constant over time, at which she is free to borrow or lend, up to
some limits that we’ll describe below.
To set up the model, let
• $T \geq 2$ be a positive integer that constitutes a time-horizon
• $y = \{y_t\}_{t=0}^T$ be an exogenous sequence of non-negative non-financial incomes $y_t$
• $a = \{a_t\}_{t=0}^{T+1}$ be a sequence of financial wealth
• $c = \{c_t\}_{t=0}^T$ be a sequence of non-negative consumption rates
• $R \geq 1$ be a fixed gross one-period rate of return on financial assets
The consumer faces a sequence of budget constraints

$$a_{t+1} = R(a_t + y_t - c_t), \quad t = 0, 1, \ldots, T \tag{9.1}$$

Notice that there are $T + 1$ such budget constraints, one for each $t = 0, 1, \ldots, T$.
Given a sequence 𝑦 of non-financial income, there is a big set of pairs (𝑎, 𝑐) of (financial wealth, consumption) sequences
that satisfy the sequence of budget constraints (9.1).
Our model has the following logical flow.
• start with an exogenous non-financial income sequence 𝑦, an initial financial wealth 𝑎0 , and a candidate consumption
path 𝑐.
• use the system of equations (9.1) for 𝑡 = 0, … , 𝑇 to compute a path 𝑎 of financial wealth
• verify that 𝑎𝑇 +1 satisfies the terminal wealth constraint 𝑎𝑇 +1 ≥ 0.
– If it does, declare that the candidate path is budget feasible.
– if the candidate consumption path is not budget feasible, propose a path with less consumption sometimes
and start over
Below, we’ll describe how to execute these steps using linear algebra – matrix inversion and multiplication.
The above procedure seems like a sensible way to find “budget-feasible” consumption paths 𝑐, i.e., paths that are consistent
with the exogenous non-financial income stream 𝑦, the initial financial asset level 𝑎0 , and the terminal asset level 𝑎𝑇 +1 .
In general, there will be many budget feasible consumption paths 𝑐.
Among all budget-feasible consumption paths, which one should the consumer want to choose?
We shall eventually evaluate alternative budget feasible consumption paths 𝑐 using the following welfare criterion
$$W = \sum_{t=0}^T \beta^t \left( g_1 c_t - \frac{g_2}{2} c_t^2 \right) \tag{9.2}$$
ConsumptionSmoothing = namedtuple("ConsumptionSmoothing",
["R", "g1", "g2", "β_seq", "T"])
A key object in the model is what Milton Friedman called “human” or “non-financial” wealth at time 0:
$$h_0 \equiv \sum_{t=0}^T R^{-t} y_t = \begin{bmatrix} 1 & R^{-1} & \cdots & R^{-T} \end{bmatrix} \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_T \end{bmatrix}$$
Human or non-financial wealth is evidently just the present value at time 0 of the consumer’s non-financial income stream
𝑦.
Notice that formally it very much resembles the asset price that we computed in this quantecon lecture present values.
Indeed, this is why Milton Friedman called it “human capital”.
By iterating on equation (9.1) and imposing the terminal condition

$$a_{T+1} = 0,$$

it is possible to convert a sequence of budget constraints into the single intertemporal constraint

$$\sum_{t=0}^T R^{-t} c_t = a_0 + h_0,$$
which says that the present value of the consumption stream equals the sum of financial and non-financial (or human) wealth.
Robert Hall [Hal78] showed that when 𝛽𝑅 = 1, a condition Milton Friedman had also assumed, it is “optimal” for a
consumer to smooth consumption by setting
$$c_t = c_0, \quad t = 0, 1, \ldots, T$$
(Later we’ll present a “variational argument” that shows that this constant path is indeed optimal when 𝛽𝑅 = 1.)
In this case, we can use the intertemporal budget constraint to write
$$c_t = c_0 = \left( \sum_{t=0}^T R^{-t} \right)^{-1} (a_0 + h_0), \quad t = 0, 1, \ldots, T. \tag{9.3}$$
As promised, we’ll provide step by step instructions on how to use linear algebra, readily implemented in Python, to
compute all the objects in play in the consumption-smoothing model.
In the calculations below, we’ll set default values of 𝑅 > 1, e.g., 𝑅 = 1.05, and 𝛽 = 𝑅−1 .
9.3.1 Step 1

In this step, we compute the consumer's human (non-financial) wealth $h_0 = \sum_{t=0}^T R^{-t} y_t$.

9.3.2 Step 2

In this step, we use formula (9.3) to compute the constant optimal consumption level $c_0$.

9.3.3 Step 3
In this step, we use the system of equations (9.1) for 𝑡 = 0, … , 𝑇 to compute a path 𝑎 of financial wealth.
To do this, we translate that system of difference equations into a single matrix equation as follows (we'll say more about the mechanics of using linear algebra to solve such difference equations in the last part of this lecture):
$$\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -R & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & -R & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -R & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & -R & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_T \\ a_{T+1} \end{bmatrix} = R \begin{bmatrix} y_0 + a_0 - c_0 \\ y_1 - c_0 \\ y_2 - c_0 \\ \vdots \\ y_{T-1} - c_0 \\ y_T - c_0 \end{bmatrix}$$
Multiply both sides by the inverse of the matrix on the left side to compute
$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_T \\ a_{T+1} \end{bmatrix}$$
It should turn out automatically that

$$a_{T+1} = 0.$$
We have built into our calculations that the consumer leaves life with exactly zero assets, just barely satisfying the terminal condition that $a_{T+1} \geq 0$.
Let’s verify this with our Python code.
First we implement this model in compute_optimal
def compute_optimal(model, a0, y_seq):
    R, T = model.R, model.T

    # non-financial wealth
    h0 = model.β_seq @ y_seq   # since β = 1/R

    # c0
    c0 = (1 - 1/R) / (1 - (1/R)**(T+1)) * (a0 + h0)
    c_seq = c0 * np.ones(T+1)

    # verify: solve system (9.1) in matrix form for the wealth path
    A = np.diag(-R*np.ones(T), k=-1) + np.eye(T+1)
    b = R * (y_seq - c_seq)
    b[0] = b[0] + R * a0

    a_seq = np.linalg.inv(A) @ b
    a_seq = np.concatenate([[a0], a_seq])

    return c_seq, a_seq
We use an example where the consumer inherits $a_0 < 0$ (which can be interpreted as student debt).

The non-financial income process $\{y_t\}_{t=0}^T$ is constant and positive up to $t = 45$ and then becomes zero afterward.
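Here is a sketch of a model constructor and of the income sequence; the default parameter values and the horizon T = 65 are assumptions chosen to match the description above:

def create_cs_model(R=1.05, g1=1, g2=1/2, T=65):
    β = 1/R
    β_seq = np.array([β**i for i in range(T+1)])
    return ConsumptionSmoothing(R=R, g1=g1, g2=g2, β_seq=β_seq, T=T)

# constant positive income up to t = 45, zero afterward (T = 65 assumed)
y_seq = np.concatenate([np.ones(46), np.zeros(20)])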
# Financial wealth
a0 = -2   # such as "student debt"

cs_model = create_cs_model()
c_seq, a_seq = compute_optimal(cs_model, a0, y_seq)

print('check a_T+1=0:',
      np.abs(a_seq[-1] - 0) <= 1e-8)
The visualization shows the path of non-financial income, consumption, and financial assets.
# Sequence length
T = cs_model.T

plt.plot(range(T+1), y_seq, label='non-financial income')
plt.plot(range(T+1), c_seq, label='consumption')
plt.plot(range(T+2), a_seq, label='financial wealth')
plt.plot(range(T+2), np.zeros(T+2), '--')

plt.legend()
plt.xlabel(r'$t$')
plt.ylabel(r'$c_t,y_t,a_t$')
plt.show()
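The welfare number reported below evaluates criterion (9.2) at the optimal consumption path; a sketch of such a function:

def welfare(model, c_seq):
    β_seq, g1, g2 = model.β_seq, model.g1, model.g2
    u_seq = g1 * c_seq - g2/2 * c_seq**2   # period utility in criterion (9.2)
    return β_seq @ u_seq

print('Welfare:', welfare(cs_model, c_seq))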
Welfare: 13.285050962183433
Earlier, we had promised to present an argument that supports our claim that a constant consumption path $c_t = c_0$ for all $t$ is optimal.
Let’s do that now.
Although simple and direct, the approach we’ll take is actually an example of what is called the “calculus of variations”.
Let’s dive in and see what the key idea is.
To explore what types of consumption paths are welfare-improving, we shall create an admissible consumption path variation sequence $\{v_t\}_{t=0}^T$ that satisfies
$$\sum_{t=0}^T R^{-t} v_t = 0$$
This equation says that the present value of admissible variations must be zero.
(So once again, we encounter our formula for the present value of an “asset”.)
Here we’ll compute a two-parameter class of admissible variations of the form
$$v_t = \xi_1 \phi^t - \xi_0$$
We say two and not three-parameter class because 𝜉0 will be a function of (𝜙, 𝜉1 ; 𝑅) that guarantees that the variation is
feasible.
Let’s compute that function.
We require

$$\sum_{t=0}^T R^{-t} \left[ \xi_1 \phi^t - \xi_0 \right] = 0$$

which implies that

$$\xi_0 = \xi_0(\phi, \xi_1; R) = \xi_1 \left( \frac{1 - R^{-1}}{1 - R^{-(T+1)}} \right) \left( \frac{1 - (\phi R^{-1})^{T+1}}{1 - \phi R^{-1}} \right)$$
def compute_variation(model, ξ1, ϕ, a0, y_seq, verbose=1):
    R, T, β_seq = model.R, model.T, model.β_seq

    ξ0 = ξ1 * ((1 - 1/R) / (1 - (1/R)**(T+1))) * ((1 - (ϕ/R)**(T+1)) / (1 - ϕ/R))
    v_seq = np.array([ξ1 * ϕ**t - ξ0 for t in range(T+1)])

    if verbose == 1:
        print('check feasible:', np.isclose(β_seq @ v_seq, 0))   # since β = 1/R

    c_opt, _ = compute_optimal(model, a0, y_seq)
    cvar_seq = c_opt + v_seq
    return cvar_seq
fig, ax = plt.subplots()
plt.plot(range(T+1), c_seq,
color='orange', label=r'Optimal $\vec{c}$ ')
plt.legend()
plt.xlabel(r'$t$')
plt.ylabel(r'$c_t$')
plt.show()
We can even use the Python np.gradient command to compute derivatives of welfare with respect to our two pa-
rameters.
We are teaching the key idea beneath the calculus of variations.
First, we define the welfare with respect to 𝜉1 and 𝜙
Then we can visualize the relationship between welfare and 𝜉1 and compute its derivatives
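A sketch of these two steps (the grid bounds and the fixed value of ϕ are assumptions):

def welfare_rel(ξ1, ϕ):
    # welfare of the perturbed consumption path indexed by (ξ1, ϕ)
    cvar_seq = compute_variation(cs_model, ξ1=ξ1, ϕ=ϕ,
                                 a0=a0, y_seq=y_seq, verbose=0)
    return welfare(cs_model, cvar_seq)

ξ1_arr = np.linspace(-0.5, 0.5, 20)
W_arr = np.array([welfare_rel(ξ1, 1.02) for ξ1 in ξ1_arr])

plt.plot(ξ1_arr, W_arr)
plt.ylabel('welfare')
plt.xlabel(r'$\xi_1$')
plt.show()

print(np.gradient(W_arr, ξ1_arr))   # numerical derivative of welfare w.r.t. ξ1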
The consumption-smoothing model of Milton Friedman [Fri56] and Robert Hall [Hal78] is a cornerstone of modern macroeconomics that has important ramifications for the size of the Keynesian “fiscal policy multiplier” described briefly in the quantecon lecture geometric series.
In particular, Milton Friedman and others showed that it lowered the fiscal policy multiplier relative to the one implied
by the simple Keynesian consumption function presented in geometric series.
Friedman and Hall’s work opened the door to a lively literature on the aggregate consumption function and implied fiscal
multipliers that remains very active today.
In the preceding sections we have used linear algebra to solve a consumption smoothing model.
The same tools from linear algebra – matrix multiplication and matrix inversion – can be used to study many other dynamic
models too.
We’ll conclude this lecture by giving a couple of examples.
In particular, we’ll describe a useful way of representing and “solving” linear difference equations.
To generate some 𝑦 vectors, we’ll just write down a linear difference equation with appropriate initial conditions and then
use linear algebra to solve it.
We’ll start with a first-order linear difference equation for $\{y_t\}_{t=0}^T$:

$$y_t = \lambda y_{t-1}, \quad t = 1, 2, \ldots, T$$
With $y_0$ given as an initial condition, this system can be cast as the single matrix equation

$$\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -\lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\lambda & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} \lambda y_0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Multiplying both sides by the inverse of the matrix on the left provides the solution

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ \lambda & 1 & 0 & \cdots & 0 & 0 \\ \lambda^2 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\ \lambda^{T-1} & \lambda^{T-2} & \lambda^{T-3} & \cdots & \lambda & 1 \end{bmatrix} \begin{bmatrix} \lambda y_0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{9.4}$$
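Here is the same solution in code, for assumed values of λ, y_0, and T; the closed form is $y_t = \lambda^t y_0$:

T, λ, y0 = 10, 0.9, 1.0

A = np.eye(T) - λ * np.eye(T, k=-1)   # the matrix on the left of the system above
b = np.zeros(T)
b[0] = λ * y0

y = np.linalg.solve(A, b)             # [y_1, ..., y_T]
print(np.allclose(y, [λ**(t+1) * y0 for t in range(T)]))   # True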
Exercise 9.5.1
In (9.4), we multiplied by the inverse of the matrix $A$. In this exercise, please confirm that

$$\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ \lambda & 1 & 0 & \cdots & 0 & 0 \\ \lambda^2 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\ \lambda^{T-1} & \lambda^{T-2} & \lambda^{T-3} & \cdots & \lambda & 1 \end{bmatrix}$$

is indeed the inverse of $A$.
Now consider the second-order linear difference equation

$$y_t = \lambda_1 y_{t-1} + \lambda_2 y_{t-2}, \quad t = 1, 2, \ldots, T$$

where $y_0$ and $y_{-1}$ are two given initial conditions determined outside the model.
As we did with the first-order difference equation, we can cast this set of 𝑇 equations as a single matrix equation
$$\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -\lambda_1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ -\lambda_2 & -\lambda_1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\lambda_2 & -\lambda_1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} \lambda_1 y_0 + \lambda_2 y_{-1} \\ \lambda_2 y_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Multiplying both sides by inverse of the matrix on the left again provides the solution.
Exercise 9.5.2
As an exercise, we ask you to represent and solve a third order linear difference equation. How many initial conditions
must you specify?
TEN
10.1 Overview
This lecture presents a model of the college-high-school wage gap in which the “time to build” a college graduate plays a
key role.
The model is “incomplete” in the sense that it is just one “condition” in the form of a single equation that would be part of a set of equations comprising all “equilibrium conditions” of a more fully articulated model.
The condition featured in our model determines a college-high-school wage ratio that equalizes the present values of a high school worker and a college-educated worker.
The idea behind this condition is that lifetime earnings have to adjust to make someone indifferent between going to
college and not going to college.
(The job of the “other equations” in a more complete model would be to fill in details about what adjusts to bring about
this outcome.)
It is just one instance of an “equalizing difference” theory of relative wage rates, a class of theories dating back at least to
Adam Smith’s Wealth of Nations [Smi10].
For most of this lecture, the only mathematical tools that we’ll use are from linear algebra, in particular, matrix multipli-
cation and matrix inversion.
However, at the very end of the lecture, we’ll use calculus just in case readers want to see how computing partial derivatives
could let us present some findings more concisely.
(And doing that will let us show off how good Python is at doing calculus!)
But if you don’t know calculus, our tools from linear algebra are certainly enough.
As usual, we’ll start by importing some Python modules.
import numpy as np
import matplotlib.pyplot as plt
The key idea is that the initial college wage premium has to adjust to make a representative worker indifferent between
going to college and not going to college.
Let
• $R > 1$ be the gross rate of return on a one-period bond
• $t = 0, 1, 2, \ldots, T$ denote the years that a person either works or attends college
• $0$ denote the first period after high school that a person can go to work
• $T$ denote the last period that a person works
• $w_t^h$ be the wage at time $t$ of a high school graduate
• $w_t^c$ be the wage at time $t$ of a college graduate
• $\gamma_h > 1$ be the (gross) rate of growth of wages of a high school graduate, so that $w_t^h = w_0^h \gamma_h^t$
• $\gamma_c > 1$ be the (gross) rate of growth of wages of a college graduate, so that $w_t^c = w_0^c \gamma_c^t$
• $D$ be the upfront monetary costs of going to college
If someone goes to work immediately after high school and works for the 𝑇 + 1 years 𝑡 = 0, 1, 2, … , 𝑇 , she earns present
value
$$h_0 = \sum_{t=0}^T R^{-t} w_t^h = w_0^h \left[ \frac{1 - (R^{-1}\gamma_h)^{T+1}}{1 - R^{-1}\gamma_h} \right] \equiv w_0^h A_h$$

where

$$A_h = \left[ \frac{1 - (R^{-1}\gamma_h)^{T+1}}{1 - R^{-1}\gamma_h} \right].$$
The present value ℎ0 is the “human wealth” at the beginning of time 0 of someone who chooses not to attend college but
instead to go to work immediately at the wage of a high school graduate.
If someone goes to college for the four years 𝑡 = 0, 1, 2, 3 during which she earns 0, but then goes to work immediately
after college and works for the 𝑇 − 3 years 𝑡 = 4, 5, … , 𝑇 , she earns present value
$$c_0 = \sum_{t=4}^T R^{-t} w_t^c = w_0^c (R^{-1}\gamma_c)^4 \left[ \frac{1 - (R^{-1}\gamma_c)^{T-3}}{1 - R^{-1}\gamma_c} \right] \equiv w_0^c A_c$$

where

$$A_c = (R^{-1}\gamma_c)^4 \left[ \frac{1 - (R^{-1}\gamma_c)^{T-3}}{1 - R^{-1}\gamma_c} \right].$$
The present value 𝑐0 is the “human wealth” at the beginning of time 0 of someone who chooses to attend college for four
years and then start to work at time 𝑡 = 4 at the wage of a college graduate.
Assume that college tuition plus four years of room and board paid for up front costs 𝐷.
So net of the monetary cost of college, the present value of attending college as of the first period after high school is

$$c_0 - D$$

We now formulate a pure equalizing difference model of the initial college-high school wage gap $\phi$, defined by

$$w_0^c = \phi w_0^h$$

An equalizing difference condition equates the present values of the two careers:

$$h_0 = c_0 - D$$

or

$$\phi = \frac{A_h}{A_c} + \frac{D}{w_0^h A_c}. \tag{10.2}$$
In the special case of free college, $D = 0$, so the only cost of going to college is the forgone earnings from not working as a high school worker.

In that case,

$$\phi = \frac{A_h}{A_c}.$$
Soon we’ll write Python code to compute the gap and plot it as a function of its determinants.
But first we’ll describe a possible alternative interpretation of our model.
We can add a parameter and reinterpret variables to get a model of entrepreneurs versus workers.

We now let $h$ be the present value of a “worker”.

We define the present value of an entrepreneur to be

$$c_0 = \pi \sum_{t=4}^T R^{-t} w_t^c$$

where $\pi \in (0,1)$ is the probability that the entrepreneur's “project” succeeds.
class equalizing_diff:
    """
    A class of the equalizing difference model
    """

    def __init__(self, R, T, γ_h, γ_c, w_h0, D=0, π=None):
        self.R, self.γ_h, self.γ_c = R, γ_h, γ_c
        self.w_h0, self.D, self.T, self.π = w_h0, D, T, π

    def compute_gap(self):
        R, γ_h, γ_c, w_h0, D = self.R, self.γ_h, self.γ_c, self.w_h0, self.D
        T, π = self.T, self.π

        A_h = (1 - (γ_h/R)**(T+1)) / (1 - γ_h/R)
        A_c = (γ_c/R)**4 * (1 - (γ_c/R)**(T-3)) / (1 - γ_c/R)

        # tweaked model
        if π is not None:
            A_c = π * A_c

        ϕ = A_h/A_c + D/(w_h0*A_c)
        return ϕ
We can build some functions to help do comparative statics using vectorization instead of loops.
For a given instance of the class, we want to compute 𝜙 when one parameter changes and others remain unchanged.
Let’s do an example.
# ϕ_R
def ϕ_R(mc, R_new):
    mc_new = equalizing_diff(R_new, mc.T, mc.γ_h, mc.γ_c, mc.w_h0, mc.D, mc.π)
    return mc_new.compute_gap()

ϕ_R = np.vectorize(ϕ_R)

# ϕ_γh
def ϕ_γh(mc, γh_new):
    mc_new = equalizing_diff(mc.R, mc.T, γh_new, mc.γ_c, mc.w_h0, mc.D, mc.π)
    return mc_new.compute_gap()

ϕ_γh = np.vectorize(ϕ_γh)

# ϕ_γc
def ϕ_γc(mc, γc_new):
    mc_new = equalizing_diff(mc.R, mc.T, mc.γ_h, γc_new, mc.w_h0, mc.D, mc.π)
    return mc_new.compute_gap()

ϕ_γc = np.vectorize(ϕ_γc)

# ϕ_π
def ϕ_π(mc, π_new):
    mc_new = equalizing_diff(mc.R, mc.T, mc.γ_h, mc.γ_c, mc.w_h0, mc.D, π_new)
    return mc_new.compute_gap()

ϕ_π = np.vectorize(ϕ_π)
# default parameter values (these match the sympy values used later)
R = 1.05
T = 40
γ_h, γ_c = 1.01, 1.01
w_h0 = 1
D = 10

# create an instance
ex1 = equalizing_diff(R=R, T=T, γ_h=γ_h, γ_c=γ_c, w_h0=w_h0, D=D)
gap1 = ex1.compute_gap()
print(gap1)
1.8041412724969135
# free college
ex2 = equalizing_diff(R, T, γ_h, γ_c, w_h0, D=0)
gap2 = ex2.compute_gap()
print(gap2)
1.2204649517903732
Let us construct some graphs that show us how the initial college-high-school wage ratio 𝜙 would change if one of its
determinants were to change.
Let’s start with the gross interest rate 𝑅.
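A sketch of the comparative statics plot for $R$ (the grid is an assumption):

R_arr = np.linspace(1, 1.2, 50)
plt.plot(R_arr, ϕ_R(ex1, R_arr))
plt.xlabel(r'$R$')
plt.ylabel(r'wage gap')
plt.show()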
Evidently, the initial wage ratio $\phi$ must rise to compensate a prospective college student for waiting to start receiving income – remember that while she is earning nothing in years $t = 0, 1, 2, 3$, the high school worker is earning a salary.

Now let's study what happens to the initial wage ratio $\phi$ if the rate of growth of college wages rises, holding constant other determinants of $\phi$.
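A sketch of the corresponding plot (the grid is an assumption):

γ_c_arr = np.linspace(1, 1.2, 50)
plt.plot(γ_c_arr, ϕ_γc(ex1, γ_c_arr))
plt.xlabel(r'$\gamma_c$')
plt.ylabel(r'wage gap')
plt.show()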
Notice how the initial wage gap falls when the rate of growth $\gamma_c$ of college wages rises.
It falls to “equalize” the present values of the two types of career, one as a high school worker, the other as a college
worker.
Can you guess what happens to the initial wage ratio 𝜙 when next we vary the rate of growth of high school wages, holding
all other determinants of 𝜙 constant?
The following graph shows what happens.
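A sketch of this plot (the grid is an assumption):

γ_h_arr = np.linspace(1, 1.1, 50)
plt.plot(γ_h_arr, ϕ_γh(ex1, γ_h_arr))
plt.xlabel(r'$\gamma_h$')
plt.ylabel(r'wage gap')
plt.show()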
# a model of entrepreneurs
ex3 = equalizing_diff(R, T, γ_h, γ_c, w_h0, π=0.2)
gap3 = ex3.compute_gap()
print(gap3)
6.102324758951866
Now let's study how the initial wage premium for successful entrepreneurs depends on the success probability.
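A sketch of this plot (the grid is an assumption):

π_arr = np.linspace(0.2, 1, 50)
plt.plot(π_arr, ϕ_π(ex3, π_arr))
plt.xlabel(r'$\pi$')
plt.ylabel(r'wage gap')
plt.show()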
So far, we have used only linear algebra and it has been a good enough tool for us to figure out how our model works.
However, someone who knows calculus might ask “Instead of plotting those graphs, why didn't you just take partial derivatives?”

We'll briefly do just that; yes, the questioner is correct: partial derivatives are indeed a good tool for discovering the “comparative statics” properties of our model.
A reader who doesn’t know calculus could read no further and feel confident that applying linear algebra has taught us the
main properties of the model.
But for a reader interested in how we can get Python to do all the hard work involved in computing partial derivatives,
we’ll say a few things about that now.
We’ll use the Python module ‘sympy’ to compute partial derivatives of 𝜙 with respect to the parameters that determine it.
Let’s import key functions from sympy.
Define symbols
Define function 𝐴ℎ
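A sketch of the import, the symbols, and the function A_h (the exact symbol names are assumptions):

from sympy import Symbol, Lambda, symbols

# define symbols
γ_h, γ_c, w_h0, D = symbols(r'\gamma_h, \gamma_c, w^h_0, D', positive=True)
R, T = Symbol('R', positive=True), Symbol('T', positive=True)

# define function A_h
A_h = Lambda((γ_h, R, T), (1 - (γ_h/R)**(T+1)) / (1 - γ_h/R))
A_h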
$$\left( (\gamma_h, R, T) \mapsto \frac{1 - \left(\frac{\gamma_h}{R}\right)^{T+1}}{1 - \frac{\gamma_h}{R}} \right)$$
Define function 𝐴𝑐
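Continuing the sketch:

A_c = Lambda((γ_c, R, T), (γ_c/R)**4 * (1 - (γ_c/R)**(T-3)) / (1 - γ_c/R))
A_c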
$$\left( (\gamma_c, R, T) \mapsto \frac{\gamma_c^4 \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)}{R^4 \left(1 - \frac{\gamma_c}{R}\right)} \right)$$
Now, define 𝜙
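Continuing the sketch, ϕ combines A_h and A_c as in (10.2):

ϕ = Lambda((D, γ_h, γ_c, R, T, w_h0),
           A_h(γ_h, R, T)/A_c(γ_c, R, T) + D/(w_h0*A_c(γ_c, R, T)))
ϕ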
$$\left( (D, \gamma_h, \gamma_c, R, T, w_0^h) \mapsto \frac{D R^4 \left(1 - \frac{\gamma_c}{R}\right)}{\gamma_c^4 w_0^h \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)} + \frac{R^4 \left(1 - \left(\frac{\gamma_h}{R}\right)^{T+1}\right) \left(1 - \frac{\gamma_c}{R}\right)}{\gamma_c^4 \left(1 - \frac{\gamma_h}{R}\right) \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)} \right)$$
R_value = 1.05
T_value = 40
γ_h_value, γ_c_value = 1.01, 1.01
w_h0_value = 1
D_value = 10
Now let's compute $\frac{\partial \phi}{\partial D}$ and then evaluate it at the default values
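A sketch of the computation:

ϕ_D = ϕ(D, γ_h, γ_c, R, T, w_h0).diff(D)
ϕ_D

# evaluate at the default values
print(ϕ_D.subs({γ_h: γ_h_value, γ_c: γ_c_value, R: R_value,
                T: T_value, w_h0: w_h0_value, D: D_value}))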
$$\frac{R^4 \left(1 - \frac{\gamma_c}{R}\right)}{\gamma_c^4 w_0^h \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)}$$
0.058367632070654
Thus, we find that raising $D$ increases the initial college wage premium $\phi$.
Compute $\frac{\partial \phi}{\partial T}$ and evaluate it at the default parameters
$$\frac{D R^4 \left(\frac{\gamma_c}{R}\right)^{T-3} \left(1 - \frac{\gamma_c}{R}\right) \log{\frac{\gamma_c}{R}}}{\gamma_c^4 w_0^h \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)^2} - \frac{R^4 \left(\frac{\gamma_h}{R}\right)^{T+1} \left(1 - \frac{\gamma_c}{R}\right) \log{\frac{\gamma_h}{R}}}{\gamma_c^4 \left(1 - \frac{\gamma_h}{R}\right) \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)} + \frac{R^4 \left(\frac{\gamma_c}{R}\right)^{T-3} \left(1 - \left(\frac{\gamma_h}{R}\right)^{T+1}\right) \left(1 - \frac{\gamma_c}{R}\right) \log{\frac{\gamma_c}{R}}}{\gamma_c^4 \left(1 - \frac{\gamma_h}{R}\right) \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)^2}$$
−0.00973478032996598
Compute $\frac{\partial \phi}{\partial \gamma_h}$ and evaluate it at the default parameter values

$$- \frac{R^4 \left(\frac{\gamma_h}{R}\right)^{T+1} \left(1 - \frac{\gamma_c}{R}\right) (T+1)}{\gamma_h \gamma_c^4 \left(1 - \frac{\gamma_h}{R}\right) \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)} + \frac{R^3 \left(1 - \left(\frac{\gamma_h}{R}\right)^{T+1}\right) \left(1 - \frac{\gamma_c}{R}\right)}{\gamma_c^4 \left(1 - \frac{\gamma_h}{R}\right)^2 \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)}$$
17.8590485545256
We find that raising $\gamma_h$ increases the initial college wage premium $\phi$, as we found with our graphical analysis earlier.
Compute $\frac{\partial \phi}{\partial \gamma_c}$ and evaluate it numerically at default parameter values
$$\frac{\partial \phi}{\partial \gamma_c} = \phi \left[ \frac{(T-3) \left(\frac{\gamma_c}{R}\right)^{T-3}}{\gamma_c \left(1 - \left(\frac{\gamma_c}{R}\right)^{T-3}\right)} - \frac{4}{\gamma_c} - \frac{1}{R \left(1 - \frac{\gamma_c}{R}\right)} \right]$$

(written here in an equivalent factored form)
−31.6486401973376
We find that raising $\gamma_c$ decreases the initial college wage premium $\phi$, as we found with our graphical analysis earlier.
Let's compute $\frac{\partial \phi}{\partial R}$ and evaluate it numerically at default parameter values
$$\frac{\partial \phi}{\partial R} = \frac{1}{A_c} \frac{\partial A_h}{\partial R} - \frac{A_h + D/w_0^h}{A_c^2} \frac{\partial A_c}{\partial R}$$

(written compactly in terms of $A_h$ and $A_c$ as defined above)
13.2642738659429
We find that raising the gross interest rate $R$ increases the initial college wage premium $\phi$, as we found with our graphical analysis earlier.
ELEVEN
This lecture offers some scraps of historical evidence about fluctuations in levels of aggregate price indexes.
The rate of growth of the price level is called inflation in the popular press and in discussions among central bankers and
treasury officials.
The price level is measured in units of domestic currency per units of a representative bundle of consumption goods.
Thus, in the US, the price level at 𝑡 is measured in dollars in month 𝑡 or year 𝑡 per unit of the consumption bundle.
Until the early 20th century, throughout much of the west, although price levels fluctuated from year to year, they didn’t
have much of a trend.
Thus, they tended to end a century close to the level at which they started it.
Things were different in the 20th century, as we shall see in this lecture.
This lecture will set the stage for some subsequent lectures about a particular theory that economists use to think about
determinants of the price level.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime
# import data
df_fig5 = pd.read_excel('datasets/longprices.xls', sheet_name='all',
                        header=2, index_col=0).iloc[1:]
df_fig5.index = df_fig5.index.astype(int)
df_fig5.head(5)
UK US France Castile
1600 72.455009 NaN NaN 48.559223
1601 84.609771 NaN NaN 41.760932
1602 74.349258 NaN NaN 40.158477
1603 70.718615 NaN NaN 40.595510
1604 63.773036 NaN NaN 44.480248
# create plot
cols = ['UK', 'US', 'France', 'Castile']

fig, ax = plt.subplots(dpi=200)
for col in cols:
    # plot each country's price level up to 1914
    ax.plot(df_fig5.loc[:1914].index, df_fig5.loc[:1914, col], label=col)

ax.spines[['right', 'top']].set_visible(False)
ax.legend()
ax.set_ylabel('Index 1913 = 100')
ax.set_xlim(xmin=1600)
plt.tight_layout()
fig.text(.5, .0001, "Price Levels", ha='center')
plt.show()
We say “most years” because there were temporary lapses from the gold or silver standard.
By staring at the graph carefully, you might be able to guess when these temporary lapses occurred, because they were
also times during which price levels rose markedly from what had been average values during more typical years.
• 1791-1797 in France (the French Revolution)
• 1776-1793 in the US (the US War for Independence from Great Britain)
• 1861-1865 in the US (the US Civil War)
During each of these episodes, the gold/silver standard was temporarily abandoned as a government printed paper money
to help it finance war expenditures.
Despite these temporary lapses, a striking thing about the figure is that price levels hovered around roughly constant
long-term levels for over three centuries.
Two other features of the figure attracted the attention of leading economists such as Irving Fisher of Yale University and John Maynard Keynes of Cambridge University in the early 20th century.

• There was considerable year-to-year instability of the price levels despite their being anchored to the same average level over the long term
• While using valuable gold and silver as coins was a time-tested way to anchor the price level by limiting the supply
of money, it cost real resources.
– that is, society paid a high “opportunity cost” for using gold and silver as coins; gold and silver could instead
be used as valuable jewelry and also as an industrial input.
Keynes and Fisher proposed what they suggested would be a socially more efficient way to achieve a price level that would
be at least as firmly anchored, and would also exhibit less year-to-year short-term fluctuations.
In particular, they argued that a well-managed central bank could achieve price level stability by
• issuing a limited supply of paper currency
• guaranteeing that it would not print money to finance government expenditures
Thus, the waste from using gold and silver as coins prompted John Maynard Keynes to call a commodity standard a
“barbarous relic.”
A paper fiat money system disposes of all reserves behind a currency.
But notice that in doing so, it also eliminates an automatic supply mechanism constraining the price level.
A low-inflation paper fiat money system replaces that automatic mechanism with an enlightened government that commits
itself to limiting the quantity of a pure token, no-cost currency.
Now let’s see what happened to the price level in our four countries when after 1914 one after another of them left the
gold/silver standard.
We’ll show a version of the complete graph that originally appeared on page 35 of [SV02].
The graph shows logarithms of price levels in our four “hard currency” countries from 1600 to 2000.

Although we didn't have to use logarithms in our earlier graphs, which stopped in 1914, we use logarithms now because we want to fit observations after 1914 into the same graph as the earlier observations.

All four of the countries eventually permanently left the gold standard by modifying their monetary and fiscal policies in several ways, starting with the outbreak of the Great War in 1914.
# create plot
cols = ['UK', 'US', 'France', 'Castile']

fig, ax = plt.subplots(dpi=200)
for col in cols:
    ax.plot(df_fig5.index, df_fig5[col], label=col)

ax.spines[['right', 'top']].set_visible(False)
ax.set_yscale('log')
ax.set_ylabel('Index 1913 = 100')
ax.set_xlim(xmin=1600)
ax.set_ylim([10, 1e6])
ax.legend()
plt.tight_layout()
fig.text(.5, .0001, "Logs of Price Levels", ha='center')
plt.show()
The graph shows that achieving price level stability with a well-managed paper money system proved to be more challenging than Irving Fisher and Keynes perhaps imagined.
Actually, earlier economists and statesmen knew about the possibility of fiat money systems long before Keynes and Fisher
advocated them in the early 20th century.
It was because earlier proponents of a commodity money system did not trust governments properly to manage a fiat
money system that they were willing to pay the resource costs associated with setting up and maintaining a commodity
money system.
In light of the high inflation episodes that many countries experienced in the twentieth century after they abandoned
commodity monies, it is difficult to criticize them for their preference to stay on the pre-1914 gold/silver standard.
The breadth and length of the inflationary experiences of the twentieth century, the century of paper money, are historically
unprecedented.
In the wake of World War I, which ended in November 1918, monetary and fiscal authorities struggled to achieve price
level stability without being on a gold or silver standard.
We present four graphs from “The Ends of Four Big Inflations” from chapter 3 of [Sar13].
The graphs depict logarithms of price levels during the early post World War I years for four countries:
• Figure 3.1, Retail prices Austria, 1921-1924 (page 42)
• Figure 3.2, Wholesale prices Hungary, 1921-1924 (page 43)
• Figure 3.3, Wholesale prices, Poland, 1921-1924 (page 44)
• Figure 3.4, Wholesale prices, Germany, 1919-1924 (page 45)
We have added logarithms of the exchange rates vis a vis the US dollar to each of the four graphs from chapter 3 of
[Sar13].
Data underlying our graphs appear in tables in an appendix to chapter 3 of [Sar13]. We have transcribed all of these data
into a spreadsheet chapter_3.xls that we read into Pandas.
# import data
xls = pd.ExcelFile('datasets/chapter_3.xlsx')

df_list = []
for i in range(4):
    # read each country's sheet into a DataFrame (loop body reconstructed)
    df_list.append(pd.read_excel(xls, sheet_name=i))
11.2.1 Austria
df_Aus.head(5)
Exchange Rate
1919-01-01 17.09
1919-02-01 20.72
1919-03-01 25.85
1919-04-01 26.03
1919-05-01 24.75
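The helper create_pe_plot used below plots the log price level and the log exchange rate on twin axes; a minimal sketch of such a function (styling choices are assumptions, and p_seq, e_seq, and lab are assumed to have been built from the spreadsheet columns):

def create_pe_plot(p_seq, e_seq, index, labs, ax):
    # log price level on the left axis
    ax.plot(index, np.log(p_seq), label=labs[0])
    ax.set_ylabel('log price level')
    ax.legend(loc='upper left')
    # log exchange rate on a twin right axis
    ax1 = ax.twinx()
    ax1.plot(index, np.log(e_seq), 'C1', label=labs[1])
    ax1.set_ylabel('log exchange rate')
    ax1.legend(loc='lower right')
    return ax1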
# create plot
fig, ax = plt.subplots(figsize=[10,7], dpi=200)
_ = create_pe_plot(p_seq, e_seq, df_Aus.index, lab, ax)
Staring at the above graphs conveys the following impressions to the authors of this lecture at quantecon.
• an episode of “hyperinflation” with rapidly rising log price level and very high monthly inflation rates
• a sudden stop of the hyperinflation as indicated by the abrupt flattening of the log price level and a marked permanent
drop in the three-month average of inflation
• a US dollar exchange rate that shadows the price level.
We’ll see similar patterns in the next three episodes that we’ll study now.
11.2.2 Hungary
df_Hung.head(5)
Gold coin and bullion Silver coin Foreign currency and exchange \
1921-01-01 NaN NaN NaN
1921-02-01 NaN NaN NaN
1921-03-01 NaN NaN NaN
1921-04-01 NaN NaN NaN
1921-05-01 NaN NaN NaN
# create plot
fig, ax = plt.subplots(figsize=[10,7], dpi=200)
11.2.3 Poland
Note: To construct the price level series from the data in the spreadsheet, we instructed Pandas to follow the same
procedures implemented in chapter 3 of [Sar13]. We spliced together three series - Wholesale price index, Wholesale
Price Index: On paper currency basis, and Wholesale Price Index: On zloty basis. We adjusted the sequence based on the
price level ratio at the last period of the available previous series and glued them to construct a single series. We dropped
the exchange rate after June 1924, when the zloty was adopted. We did this because we don’t have the price measured in
zloty. We used the old currency in June to compute the exchange rate adjustment.
df_Pol.head(5)
# non-nan part
ch_index_1 = p_seq1[~p_seq1.isna()].index[-1]
ch_index_2 = p_seq2[~p_seq2.isna()].index[-2]
adj_ratio12 = p_seq1[ch_index_1]/p_seq2[ch_index_1]
adj_ratio23 = p_seq2[ch_index_2]/p_seq3[ch_index_2]
# exchange rate
e_seq = 1/df_Pol['Cents per Polish mark (zloty after May 1924)']
e_seq[e_seq.index > '05-01-1924'] = np.nan
# create plot
fig, ax = plt.subplots(figsize=[10,7], dpi=200)
ax1 = create_pe_plot(p_seq, e_seq, df_Pol.index, lab, ax)
11.2.4 Germany
The sources of our data for Germany are the following tables from chapter 3 of [Sar13]:
• Table 3.18, wholesale price level exp 𝑝
• Table 3.19, exchange rate
df_Germ.head(5)
Price index (on basis of marks before July 1924, reichsmarks after) \
1919-01-01 262.0
1919-02-01 270.0
1919-03-01 274.0
1919-04-01 286.0
1919-05-01 297.0
[5 rows x 22 columns]
p_seq = df_Germ['Price index (on basis of marks before July 1924, reichsmarks after)'].copy()
# create plot
fig, ax = plt.subplots(figsize=[9,5], dpi=200)
ax1 = create_pe_plot(p_seq, e_seq, df_Germ.index, lab, ax)
p_seq = df_Germ['Price index (on basis of marks before July 1924, reichsmarks after)'].copy()
lab = ['Price Index (Marks or converted to Marks)',
       '1/Cents per Mark (or Reichsmark converted to Mark)']
# create plot
fig, ax = plt.subplots(figsize=[10,7], dpi=200)
ax1 = create_pe_plot(p_seq, e_seq, df_Germ.index, lab, ax)
A striking thing about our four graphs is how quickly the (log) price levels in Austria, Hungary, Poland, and Germany
leveled off after having been rising so quickly.
These “sudden stops” are also revealed by the permanent drops in three-month moving averages of inflation for the four
countries.
In addition, the US dollar exchange rates for each of the four countries shadowed their price levels.
• This pattern is an instance of a force modeled in the purchasing power parity theory of exchange rates.
Each of these big inflations seemed to have “stopped on a dime”.
Chapter 3 of [Sar13] attempts to offer an explanation for this remarkable pattern.

In a nutshell, here is the story.
After World War I, the United States was on the gold standard. The US government stood ready to convert a dollar into
a specified amount of gold on demand. To understate things, immediately after the war, Hungary, Austria, Poland, and
Germany were not on the gold standard.
In practice, their currencies were largely “fiat” or “unbacked”, meaning that they were not backed by credible government
promises to convert them into gold or silver coins on demand. The governments of these countries resorted to the printing
of new unbacked money to finance government deficits. (The notes were “backed” mainly by treasury bills that, in those
times, could not be expected to be paid off by levying taxes, but only by printing more notes or treasury bills.) This was
done on such a scale that it led to a depreciation of the currencies of spectacular proportions. In the end, the German
mark stabilized at 1 trillion ($10^{12}$) paper marks to the prewar gold mark, the Polish mark at 1.8 million paper marks to
the gold zloty, the Austrian crown at 14,400 paper crowns to the prewar Austro-Hungarian crown, and the Hungarian
krone at 14,500 paper crowns to the prewar Austro-Hungarian crown.
Chapter 3 of [Sar13] focuses on the deliberate changes in policy that Hungary, Austria, Poland, and Germany made to end
their hyperinflations. The hyperinflations were each ended by restoring or virtually restoring convertibility to the dollar
or equivalently to gold.
The story told in [Sar13] is grounded in a “fiscal theory of the price level” described in this lecture and further discussed
in this lecture.
Those lectures discuss theories about what holders of those rapidly depreciating currencies were thinking about them and
how that shaped responses of inflation to government policies.
TWELVE
12.1 Introduction
import numpy as np
from collections import namedtuple
import matplotlib.pyplot as plt
We’ll use linear algebra first to explain and then do some experiments with a “fiscal theory of the price level”.
A fiscal theory of the price level was described by Thomas Sargent and Neil Wallace in chapter 5 of [Sar13], which reprints a 1981 article titled “Unpleasant Monetarist Arithmetic”.
Sometimes people call it a “monetary theory of the price level” because fiscal effects on the price level occur through the effects of government fiscal policy decisions on the path of the money supply.
• a government's fiscal policies determine whether its expenditures exceed its tax collections
• if its expenditures exceed its tax collections, it can cover the difference by printing money
• that leads to effects on the price level as the price level path adjusts to equate the supply of money to the demand for money
The theory has been extended, criticized, and applied by John Cochrane in [Coc23].
In another lecture price level histories, we described some European hyperinflations that occurred in the wake of World
War I.
Elemental forces at work in the fiscal theory of the price level help to understand those episodes.
According to this theory, when the government persistently spends more than it collects in taxes and prints money to
finance the shortfall (the “shortfall” is called the “government deficit”), it puts upward pressure on the price level and
generates persistent inflation.
The “fiscal theory of the price level” asserts that
• to start a persistent inflation the government simply persistently runs a money-financed government deficit
• to stop a persistent inflation the government simply stops persistently running a money-financed government deficit
Our model is a “rational expectations” (or “perfect foresight”) version of a model that Philip Cagan [Cag56] used to study
the monetary dynamics of hyperinflations.
While Cagan didn’t use that “rational expectations” version of the model, Thomas Sargent [Sar82] did when he studied
the Ends of Four Big Inflations in Europe after World War I.
• this lecture fiscal theory of the price level with adaptive expectations describes a version of the model that does
not impose “rational expectations” but instead uses what Cagan and his teacher Milton Friedman called “adaptive
expectations”
– a reader of both lectures will notice that the algebra is easier and more streamlined in the present rational
expectations version of the model
– this can be traced to the following source: the adaptive expectations version of the model has more endogenous
variables and more free parameters
Some of our quantitative experiments with our rational expectations version of the model are designed to illustrate how
the fiscal theory explains the abrupt end of those big inflations.
In those experiments, we'll encounter an instance of a “velocity dividend” that has sometimes accompanied successful inflation stabilization programs.
To facilitate using linear matrix algebra as our main mathematical tool, we’ll use a finite horizon version of the model.
As in the present values and consumption smoothing lectures, the only linear algebra that we’ll be using are matrix multi-
plication and matrix inversion.
The demand for real balances is governed by

$$m_t^d - p_t = -\alpha \pi_t^*, \quad \alpha > 0 \tag{12.1}$$

This equation asserts that the demand for real balances is inversely related to the public's expected rate of inflation.
People somehow acquire perfect foresight by their having solved a forecasting problem.
This lets us set
$$\pi_t^* = \pi_t, \tag{12.2}$$

while equating demand for money to supply lets us set $m_t^d = m_t$ for all $t \geq 0$.

The preceding equations then imply

$$m_t - p_t = -\alpha(p_{t+1} - p_t), \quad \alpha > 0 \tag{12.3}$$
To fill in details about what it means for private agents to have perfect foresight, we subtract equation (12.3) at time 𝑡
from the same equation at 𝑡 + 1 to get
$$\mu_t - \pi_t = -\alpha \pi_{t+1} + \alpha \pi_t,$$
which we rewrite as a forward-looking first-order linear difference equation in 𝜋𝑠 with 𝜇𝑠 as a “forcing variable”:
$$\pi_t = \frac{\alpha}{1+\alpha} \pi_{t+1} + \frac{1}{1+\alpha} \mu_t, \quad t = 0, 1, \ldots, T$$

where $0 < \frac{\alpha}{1+\alpha} < 1$.
Setting $\delta = \frac{\alpha}{1+\alpha}$ lets us represent the preceding equation as

$$\pi_t = \delta \pi_{t+1} + (1-\delta) \mu_t, \quad t = 0, 1, \ldots, T$$
Write this system of 𝑇 + 1 equations as the single matrix equation
$$\begin{bmatrix} 1 & -\delta & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -\delta & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & -\delta & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & -\delta \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} \pi_0 \\ \pi_1 \\ \pi_2 \\ \vdots \\ \pi_{T-1} \\ \pi_T \end{bmatrix} = (1-\delta) \begin{bmatrix} \mu_0 \\ \mu_1 \\ \mu_2 \\ \vdots \\ \mu_{T-1} \\ \mu_T \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \delta \pi_{T+1}^* \end{bmatrix} \tag{12.4}$$
By multiplying both sides of equation (12.4) by the inverse of the matrix on the left side, we can calculate
$$\pi \equiv \begin{bmatrix} \pi_0 \\ \pi_1 \\ \pi_2 \\ \vdots \\ \pi_{T-1} \\ \pi_T \end{bmatrix}$$
It turns out that
$$\pi_t = (1-\delta) \sum_{s=t}^T \delta^{s-t} \mu_s + \delta^{T+1-t} \pi_{T+1}^* \tag{12.5}$$
Equation (12.7) shows that the log of the money supply at $t$ equals the log of the initial money supply $m_0$ plus the accumulation of rates of money growth between times $0$ and $t$.
To determine the continuation inflation rate 𝜋𝑇∗ +1 we shall proceed by applying the following infinite-horizon version of
equation (12.5) at time 𝑡 = 𝑇 + 1:
$$\pi_t = (1-\delta) \sum_{s=t}^\infty \delta^{s-t} \mu_s, \tag{12.8}$$
where we assume that beyond time $T$ the money growth rate follows

$$\mu_{t+1} = \gamma^* \mu_t, \quad t \geq T.$$
Plugging the preceding equation into equation (12.8) at 𝑡 = 𝑇 + 1 and rearranging we can deduce that
$$\pi_{T+1}^* = \frac{1-\delta}{1 - \delta \gamma^*} \gamma^* \mu_T \tag{12.9}$$
# parameters
T = 80
T1 = 60
α = 5
m0 = 1
μ0 = 0.5
μ_star = 0
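The solve function below expects a model object with fields m0, T, π_end, μ_seq, α, and δ; here is a minimal constructor consistent with that, where the terminal condition uses (12.9) with γ* = 1 because μ is constant beyond the sample (the field and function names are assumptions):

CaganREE = namedtuple("CaganREE",
                      ["m0", "T", "μ_seq", "α", "δ", "π_end"])

def create_cagan_model(m0, α, T, μ_seq):
    δ = α / (1 + α)
    π_end = μ_seq[-1]   # terminal condition from (12.9) with γ* = 1
    return CaganREE(m0, T, μ_seq, α, δ, π_end)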
Now we can solve the model to compute 𝜋𝑡 , 𝑚𝑡 and 𝑝𝑡 for 𝑡 = 1, … , 𝑇 + 1 using the matrix equation above
def solve(model):
    model_params = model.m0, model.T, model.π_end, model.μ_seq, model.α, model.δ
    m0, T, π_end, μ_seq, α, δ = model_params

    A1 = np.eye(T+1, T+1) - δ * np.eye(T+1, T+1, k=1)
    A2 = np.eye(T+1, T+1) - np.eye(T+1, T+1, k=-1)

    b1 = (1 - δ) * μ_seq + np.concatenate([np.zeros(T), [δ * π_end]])   # RHS of (12.4)
    b2 = μ_seq + np.concatenate([[m0], np.zeros(T)])                    # money accumulation

    π_seq = np.linalg.inv(A1) @ b1
    m_seq = np.linalg.inv(A2) @ b2

    π_seq = np.append(π_seq, π_end)   # append π_{T+1}
    m_seq = np.append(m0, m_seq)      # prepend m_0
    p_seq = m_seq + α * π_seq         # from (12.10)

    return π_seq, m_seq, p_seq
In the experiments below, we’ll use formula (12.9) as our terminal condition for expected inflation.
In devising these experiments, we’ll make assumptions about {𝜇𝑡 } that are consistent with formula (12.9).
We describe several such experiments.
In all of them,
$$\mu_t = \mu^*, \quad t \geq T_1$$
In this experiment, we'll study how, when $\alpha > 0$, a foreseen inflation stabilization has effects on inflation that precede it.
We’ll study a situation in which the rate of growth of the money supply is 𝜇0 from 𝑡 = 0 to 𝑡 = 𝑇1 and then permanently
falls to 𝜇∗ at 𝑡 = 𝑇1 .
Thus, let 𝑇1 ∈ (0, 𝑇 ).
So where 𝜇0 > 𝜇∗ , we assume that
$$\mu_{t+1} = \begin{cases} \mu_0, & t = 0, \ldots, T_1 - 1 \\ \mu^*, & t \geq T_1 \end{cases}$$
We’ll start by executing a version of our “experiment 1” in which the government implements a foreseen sudden permanent
reduction in the rate of money creation at time 𝑇1 .
The following code performs the experiment and plots outcomes.
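A sketch of the setup (the figure layout is an assumption; names are chosen to match the plotting code below and the comparison plots later):

μ_seq_1 = np.append(μ0*np.ones(T1+1), μ_star*np.ones(T-T1))

cm = create_cagan_model(m0=m0, α=α, T=T, μ_seq=μ_seq_1)
π_seq_1, m_seq_1, p_seq_1 = solve(cm)

# aliases used by the plotting code just below
μ_seq, π_seq, m_seq, p_seq = μ_seq_1, π_seq_1, m_seq_1, p_seq_1

T_seq = range(T+2)

fig, ax = plt.subplots(5, 1, figsize=(5, 12), dpi=200)
ax[2].plot(T_seq, m_seq - p_seq)
ax[2].set_ylabel(r'$m - p$')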
ax[0].plot(T_seq[:-1], μ_seq)
ax[0].set_ylabel(r'$\mu$')
ax[1].plot(T_seq, π_seq)
ax[1].set_ylabel(r'$\pi$')
ax[3].plot(T_seq, m_seq)
ax[3].set_ylabel(r'$m$')
ax[4].plot(T_seq, p_seq)
ax[4].set_ylabel(r'$p$')
for i in range(5):
ax[i].set_xlabel(r'$t$')
plt.tight_layout()
plt.show()
The plot of the money growth rate $\mu_t$ in the top panel portrays a sudden reduction from 0.5 to 0 at time $T_1 = 60$.
This brings about a gradual reduction of the inflation rate 𝜋𝑡 that precedes the money supply growth rate reduction at time
𝑇1 .
Notice how the inflation rate declines smoothly (i.e., continuously) to 0 at 𝑇1 – unlike the money growth rate, it does not
suddenly “jump” downward at 𝑇1 .
This is because the reduction in 𝜇 at 𝑇1 has been foreseen from the start.
While the log money supply portrayed in the bottom panel has a kink at 𝑇1 , the log price level does not – it is “smooth”
– once again a consequence of the fact that the reduction in 𝜇 has been foreseen.
To set the stage for our next experiment, we want to study the determinants of the price level a little more.
We can use equations (12.1) and (12.2) to discover that the log of the price level satisfies
$$p_t = m_t + \alpha \pi_t \tag{12.10}$$
In our next experiment, we’ll study a “surprise” permanent change in the money growth that beforehand was completely
unanticipated.
At time $T_1$ when the “surprise” money growth rate change occurs, to satisfy equation (12.10), the log of real balances jumps upward as $\pi_t$ jumps downward.
But in order for $m_t - p_t$ to jump at $T_1$, which variable jumps: $m_{T_1}$ or $p_{T_1}$?
If we insist that the money supply $m_{T_1}$ is locked at its value $m_{T_1}^1$ inherited from the past, then formula (12.10) implies that the price level jumps downward at time $T_1$, to coincide with the downward jump in $\pi_{T_1}$.
An alternative assumption about the money supply level is that, as part of the “inflation stabilization”, the government resets $m_{T_1}$ according to

$$m_{T_1}^2 = m_{T_1}^1 + \alpha(\mu_0 - \mu^*) \tag{12.12}$$
By letting money jump according to equation (12.12) the monetary authority prevents the price level from falling at the
moment that the unanticipated stabilization arrives.
In various research papers about stabilizations of high inflations, the jump in the money supply described by equation
(12.12) has been called “the velocity dividend” that a government reaps from implementing a regime change that sustains
a permanently lower inflation rate.
We have noted that with a constant expected forward sequence $\mu_s = \bar{\mu}$ for $s \geq t$, $\pi_t = \bar{\mu}$.
A consequence is that at $T_1$, either $m$ or $p$ must “jump”.
We’ll study both cases.
In the first case, $m$ is smooth at $T_1$:

$$m_{T_1} = m_{T_1 - 1} + \mu_0, \quad \pi_{T_1} = \mu^*, \quad p_{T_1} = m_{T_1} + \alpha \pi_{T_1}$$

and we simply glue together the sequences for $t \leq T_1$ and $t > T_1$.

In the second case, $m_{T_1}$ jumps according to (12.12). We then compute the remaining $T - T_1$ periods with $\mu_s = \mu^*$ for all $s \geq T_1$ and the initial condition $m_{T_1}$ from above.
We are now technically equipped to discuss our next experiment.
This experiment deviates a little bit from a pure version of our “perfect foresight” assumption by assuming that a sudden
permanent reduction in 𝜇𝑡 like that analyzed in experiment 1 is completely unanticipated.
Such a completely unanticipated shock is popularly known as an “MIT shock”.
The mental experiment involves switching at time 𝑇1 from an initial “continuation path” for {𝜇𝑡 , 𝜋𝑡 } to another path that
involves a permanently lower inflation rate.
Initial path: $\mu_t = \mu_0$ for all $t \geq 0$. So this path is for $\{\mu_t\}_{t=0}^\infty$; the associated path for $\pi_t$ has $\pi_t = \mu_0$.
# path 1: μ stays at μ0 forever
μ_seq_2_path1 = μ0 * np.ones(T+1)

cm1 = create_cagan_model(m0=m0, α=α, T=T, μ_seq=μ_seq_2_path1)
π_seq_2_path1, m_seq_2_path1, p_seq_2_path1 = solve(cm1)

# continuation path after the surprise drop at T1
μ_seq_2_cont = μ_star * np.ones(T-T1)

# regime 1: m is smooth at T1
mc2 = create_cagan_model(m0=m_seq_2_path1[T1+1],
                         α=α, T=T-1-T1, μ_seq=μ_seq_2_cont)
π_seq_2_cont, m_seq_2_cont1, p_seq_2_cont1 = solve(mc2)

# glue the pre-T1 and continuation pieces
μ_seq_2 = np.concatenate([μ_seq_2_path1[:T1+1], μ_seq_2_cont])
π_seq_2 = np.concatenate([π_seq_2_path1[:T1+1], π_seq_2_cont])
m_seq_2_regime1 = np.concatenate([m_seq_2_path1[:T1+1], m_seq_2_cont1])
p_seq_2_regime1 = np.concatenate([p_seq_2_path1[:T1+1], p_seq_2_cont1])

# regime 2: m jumps at T1 as in (12.12)
m_T1 = (m_seq_2_path1[T1] + μ0) + α * (μ0 - μ_star)

mc = create_cagan_model(m0=m_T1, α=α,
                        T=T-1-T1, μ_seq=μ_seq_2_cont)
π_seq_2_cont2, m_seq_2_cont2, p_seq_2_cont2 = solve(mc)

m_seq_2_regime2 = np.concatenate([m_seq_2_path1[:T1+1],
                                  m_seq_2_cont2])
p_seq_2_regime2 = np.concatenate([p_seq_2_path1[:T1+1],
                                  p_seq_2_cont2])
T_seq = range(T+2)

fig, ax = plt.subplots(2, 3, figsize=(12, 6), dpi=200)
ax = ax.flatten()
ax[5].axis('off')   # five panels are used

ax[0].plot(T_seq[:-1], μ_seq_2)
ax[0].set_ylabel(r'$\mu$')

ax[1].plot(T_seq, π_seq_2)
ax[1].set_ylabel(r'$\pi$')

ax[2].plot(T_seq, m_seq_2_regime1 - p_seq_2_regime1)
ax[2].set_ylabel(r'$m - p$')

ax[3].plot(T_seq, m_seq_2_regime1,
           label='Smooth $m_{T_1}$')
ax[3].plot(T_seq, m_seq_2_regime2,
           label='Jumpy $m_{T_1}$')
ax[3].set_ylabel(r'$m$')
ax[3].legend()

ax[4].plot(T_seq, p_seq_2_regime1, label='Smooth $p_{T_1}$')
ax[4].plot(T_seq, p_seq_2_regime2, label='Jumpy $p_{T_1}$')
ax[4].set_ylabel(r'$p$')
ax[4].legend()

for i in range(5):
    ax[i].set_xlabel(r'$t$')

plt.tight_layout()
plt.show()
We invite you to compare these graphs with corresponding ones for the foreseen stabilization analyzed in experiment 1
above.
Note how the inflation graph in the top middle panel is now identical to the money growth graph in the top left panel, and
how now the log of real balances portrayed in the top right panel jumps upward at time 𝑇1 .
The bottom panels plot 𝑚 and 𝑝 under two possible ways that 𝑚𝑇1 might adjust as required by the upward jump in 𝑚 − 𝑝
at 𝑇1 .
• the orange line lets 𝑚𝑇1 jump upward in order to make sure that the log price level 𝑝𝑇1 does not fall.
• the blue line lets 𝑝𝑇1 fall while stopping the money supply from jumping.
Here is a way to interpret what the government is doing when the orange line policy is in place.
The government prints money to finance expenditure with the “velocity dividend” that it reaps from the increased demand
for real balances brought about by the permanent decrease in the rate of growth of the money supply.
The next code generates a multi-panel graph that includes outcomes of both experiments 1 and 2.
That allows us to assess how important it is to understand whether the sudden permanent drop in $\mu_t$ at $t = T_1$ is fully anticipated, as in experiment 1, or completely unanticipated, as in experiment 2.
fig, ax = plt.subplots(2, 3, figsize=(12, 6), dpi=200)
ax = ax.flatten()
ax[5].axis('off')   # five panels are used

ax[0].plot(T_seq[:-1], μ_seq_2)
ax[0].set_ylabel(r'$\mu$')

ax[1].plot(T_seq, π_seq_2,
           label='Unforeseen')
ax[1].plot(T_seq, π_seq_1,
           label='Foreseen', color='tab:green')
ax[1].set_ylabel(r'$\pi$')
ax[1].legend()

ax[2].plot(T_seq,
           m_seq_2_regime1 - p_seq_2_regime1,
           label='Unforeseen')
ax[2].plot(T_seq, m_seq_1 - p_seq_1,
           label='Foreseen', color='tab:green')
ax[2].set_ylabel(r'$m - p$')
ax[2].legend()

ax[3].plot(T_seq, m_seq_2_regime1,
           label=r'Unforeseen (Smooth $m_{T_1}$)')
ax[3].plot(T_seq, m_seq_2_regime2,
           label=r'Unforeseen ($m_{T_1}$ jumps)')
ax[3].plot(T_seq, m_seq_1,
           label='Foreseen shock')
ax[3].set_ylabel(r'$m$')
ax[3].legend()

ax[4].plot(T_seq, p_seq_2_regime1,
           label=r'Unforeseen (Smooth $m_{T_1}$)')
ax[4].plot(T_seq, p_seq_2_regime2,
           label=r'Unforeseen ($m_{T_1}$ jumps)')
ax[4].plot(T_seq, p_seq_1,
           label='Foreseen')
ax[4].set_ylabel(r'$p$')
ax[4].legend()

for i in range(5):
    ax[i].set_xlabel(r'$t$')

plt.tight_layout()
plt.show()
It is instructive to compare the preceding graphs with graphs of log price levels and inflation rates for data from four big
inflations described in this lecture.
In particular, in the above graphs, notice how a gradual fall in inflation precedes the “sudden stop” when it has been
anticipated long beforehand, but how inflation instead falls abruptly when the permanent drop in money supply growth is
unanticipated.
It seems to the author team at quantecon that the drops in inflation near the ends of the four hyperinflations described in
this lecture more closely resemble outcomes from the experiment 2 “unforeseen stabilization”.
(It is fair to say that the preceding informal pattern recognition exercise should be supplemented with a more formal
structural statistical analysis.)
Experiment 3

Next we perform an experiment in which there is a perfectly foreseen gradual decrease in the rate of growth of the money supply:

$$\mu_t = \phi^t \mu_0 + (1 - \phi^t) \mu^*.$$

The following code does the calculations and plots the results.
# parameters
ϕ = 0.9
μ_seq = np.array([ϕ**t * μ0 + (1-ϕ**t)*μ_star for t in range(T)])
μ_seq = np.append(μ_seq, μ_star)
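A sketch of the remaining steps, after which the same plotting code as in experiment 1 applies:

cm = create_cagan_model(m0=m0, α=α, T=T, μ_seq=μ_seq)
π_seq, m_seq, p_seq = solve(cm)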
12.4 Sequel
This lecture fiscal theory of the price level with adaptive expectations describes an “adaptive expectations” version of
Cagan’s model.
The dynamics become more complicated and so does the algebra.
Nowadays, the “rational expectations” version of the model is more popular among central bankers and economists advising them.
THIRTEEN

A FISCAL THEORY OF PRICE LEVEL WITH ADAPTIVE EXPECTATIONS
13.1 Introduction
import numpy as np
from collections import namedtuple
import matplotlib.pyplot as plt
This lecture is a sequel or prequel to this lecture fiscal theory of the price level.
We’ll use linear algebra to do some experiments with an alternative “fiscal theory of the price level”.
Like the model in this lecture fiscal theory of the price level, the model asserts that when a government persistently spends
more than it collects in taxes and prints money to finance the shortfall, it puts upward pressure on the price level and
generates persistent inflation.
Instead of the “perfect foresight” or “rational expectations” version of the model in this lecture fiscal theory of the price
level, our model in the present lecture is an “adaptive expectations” version of a model that Philip Cagan [Cag56] used to
study the monetary dynamics of hyperinflations.
It combines these components:
• a demand function for real money balances that asserts that the logarithm of the quantity of real balances demanded
depends inversely on the public’s expected rate of inflation
• an adaptive expectations model that describes how the public’s anticipated rate of inflation responds to past values
of actual inflation
• an equilibrium condition that equates the demand for money to the supply
• an exogenous sequence of rates of growth of the money supply
Our model stays quite close to Cagan’s original specification.
As in the present values and consumption smoothing lectures, the only linear algebra operations that we’ll be using are
matrix multiplication and matrix inversion.
To facilitate using linear matrix algebra as our principal mathematical tool, we’ll use a finite horizon version of the model.
Let

• $m_t$ be the log of the supply of nominal money balances;
• $\mu_t = m_{t+1} - m_t$ be the net rate of growth of nominal balances;
• $p_t$ be the log of the price level;
• $\pi_t = p_{t+1} - p_t$ be the net rate of inflation between $t$ and $t+1$;
• $\pi_t^*$ be the public's expected rate of inflation between $t$ and $t+1$;
• $T$ be the horizon – i.e., the last period for which the model will determine $p_t$;
• $\pi_0^*$ be the public's initial expected rate of inflation between time 0 and time 1.
The demand for real balances $\exp(m_t^d - p_t)$ is governed by the following version of the Cagan demand function

$$m_t^d - p_t = -\alpha \pi_t^*, \quad \alpha > 0 \qquad (13.1)$$

This equation asserts that the demand for real balances is inversely related to the public's expected rate of inflation.
Equating the logarithm $m_t^d$ of the demand for money to the logarithm $m_t$ of the supply of money in equation (13.1) and solving for the logarithm $p_t$ of the price level gives

$$p_t = m_t + \alpha \pi_t^* \qquad (13.2)$$
Taking the difference between equation (13.2) at time $t+1$ and at time $t$ gives

$$\pi_t = \mu_t + \alpha \pi_{t+1}^* - \alpha \pi_t^* \qquad (13.3)$$

We assume that the expected rate of inflation $\pi_t^*$ is governed by the Friedman-Cagan adaptive expectations scheme

$$\pi_{t+1}^* = \lambda \pi_t^* + (1 - \lambda) \pi_t \qquad (13.4)$$
As exogenous inputs into the model, we take initial conditions $m_0$, $\pi_0^*$ and a money growth sequence $\mu = \{\mu_t\}_{t=0}^T$.

As endogenous outputs of our model we want to find sequences $\pi = \{\pi_t\}_{t=0}^T$, $p = \{p_t\}_{t=0}^T$ as functions of the exogenous inputs.
We’ll do some mental experiments by studying how the model outputs vary as we vary the model inputs.
We begin by writing equation (13.4), the adaptive expectations model for $\pi_t^*$ for $t = 0, \ldots, T$, as

$$
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-\lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & -\lambda & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -\lambda & 1
\end{bmatrix}
\begin{bmatrix} \pi_0^* \\ \pi_1^* \\ \pi_2^* \\ \vdots \\ \pi_{T+1}^* \end{bmatrix}
= (1 - \lambda)
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \pi_0 \\ \pi_1 \\ \vdots \\ \pi_T \end{bmatrix}
+ \begin{bmatrix} \pi_0^* \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\qquad (13.5)
$$
where the $(T+2) \times (T+2)$ matrix $A$, the $(T+2) \times (T+1)$ matrix $B$, and the vectors $\pi^*$, $\pi$, $\pi_0^*$ are defined implicitly by aligning these two equations.
Next we write the key equation (13.3) in matrix notation as

$$
\begin{bmatrix} \pi_0 \\ \pi_1 \\ \pi_2 \\ \vdots \\ \pi_T \end{bmatrix}
= \begin{bmatrix} \mu_0 \\ \mu_1 \\ \mu_2 \\ \vdots \\ \mu_T \end{bmatrix}
+ \begin{bmatrix}
-\alpha & \alpha & 0 & \cdots & 0 & 0 \\
0 & -\alpha & \alpha & \cdots & 0 & 0 \\
0 & 0 & -\alpha & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \alpha & 0 \\
0 & 0 & 0 & \cdots & -\alpha & \alpha
\end{bmatrix}
\begin{bmatrix} \pi_0^* \\ \pi_1^* \\ \pi_2^* \\ \vdots \\ \pi_{T+1}^* \end{bmatrix}
$$
Represent the previous equation system in terms of vectors and matrices as
𝜋 = 𝜇 + 𝐶𝜋∗ (13.6)
where the (𝑇 + 1) × (𝑇 + 2) matrix 𝐶 is defined implicitly to align this equation with the preceding equation system.
We now have all of the ingredients we need to solve for 𝜋 as a function of 𝜇, 𝜋0 , 𝜋0∗ .
Combine equations (13.5) and (13.6) to get

$$A \pi^* = (1 - \lambda) B \pi + \pi_0^* = (1 - \lambda) B \left[ \mu + C \pi^* \right] + \pi_0^*$$

which implies that

$$\left[ A - (1 - \lambda) B C \right] \pi^* = (1 - \lambda) B \mu + \pi_0^*$$

Multiplying both sides of the above equation by the inverse of the matrix on the left side gives

$$\pi^* = \left[ A - (1 - \lambda) B C \right]^{-1} \left[ (1 - \lambda) B \mu + \pi_0^* \right] \qquad (13.7)$$
Having solved equation (13.7) for 𝜋∗ , we can use equation (13.6) to solve for 𝜋:
𝜋 = 𝜇 + 𝐶𝜋∗
We have thus solved for two of the key endogenous time series determined by our model, namely, the sequence 𝜋∗ of
expected inflation rates and the sequence 𝜋 of actual inflation rates.
Knowing these, we can then quickly calculate the associated sequence 𝑝 of the logarithm of the price level from equation
(13.2).
Let’s fill in the details for this step.
Since we now know 𝜇 it is easy to compute 𝑚.
Thus, notice that we can represent the equations

$$m_{t+1} = m_t + \mu_t, \quad t = 0, 1, \ldots, T$$

in the matrix form

$$
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}
\begin{bmatrix} m_0 \\ m_1 \\ m_2 \\ \vdots \\ m_{T+1} \end{bmatrix}
= \begin{bmatrix} m_0 \\ \mu_0 \\ \mu_1 \\ \vdots \\ \mu_T \end{bmatrix}
\qquad (13.8)
$$
Multiplying both sides of equation (13.8) with the inverse of the matrix on the left will give
$$m_t = m_0 + \sum_{s=0}^{t-1} \mu_s, \quad t = 1, \ldots, T+1 \qquad (13.9)$$
Equation (13.9) shows that the log of the money supply at 𝑡 equals the log 𝑚0 of the initial money supply plus accumulation
of rates of money growth between times 0 and 𝑡.
We can then compute 𝑝𝑡 for each 𝑡 from equation (13.2).
We can write a compact formula for 𝑝 as
$$p = m + \alpha \hat\pi^*$$

where

$$
\hat\pi^* = \begin{bmatrix} \pi_0^* \\ \pi_1^* \\ \pi_2^* \\ \vdots \\ \pi_T^* \end{bmatrix},
$$
which is just 𝜋∗ with the last element dropped.
$$\hat\pi^* \neq \pi,$$

so that in general

$$\pi_t^* \neq \pi_t, \quad t = 0, 1, \ldots, T \qquad (13.10)$$
This outcome is typical in models in which an adaptive expectations hypothesis like equation (13.4) appears as a component.

In the lecture on a fiscal theory of the price level, we studied a version of the model that replaces hypothesis (13.4) with a "perfect foresight" or "rational expectations" hypothesis.
Cagan_Adaptive = namedtuple("Cagan_Adaptive",
["α", "m0", "Eπ0", "T", "λ"])
# parameters
T = 80
T1 = 60
α = 5
λ = 0.9 # 0.7
m0 = 1
μ0 = 0.5
μ_star = 0
We solve the model and plot variables of interest using the following functions.
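The full solver is not reproduced here, so below is a minimal sketch that assembles the matrices $A$, $B$, $C$ and applies formulas (13.7), (13.6), (13.9), and (13.2). The function name solve_cagan_adaptive and the choice Eπ0 = μ0 are illustrative assumptions, not necessarily the original code.

# A minimal solver sketch; the name and Eπ0 = μ0 are assumptions
model = Cagan_Adaptive(α=α, m0=m0, Eπ0=μ0, T=T, λ=λ)

def solve_cagan_adaptive(model, μ_seq):
    " Solve the model via the matrix formulation (13.5)-(13.7). "
    α, m0, Eπ0, T, λ = model

    # A: (T+2) x (T+2) with 1 on the diagonal and -λ on the subdiagonal
    A = np.eye(T+2) - λ * np.eye(T+2, k=-1)
    # B: (T+2) x (T+1) with a zero first row and an identity below
    B = np.vstack([np.zeros(T+1), np.eye(T+1)])
    # C: (T+1) x (T+2) with -α on the diagonal and α on the superdiagonal
    C = α * (np.eye(T+1, T+2, k=1) - np.eye(T+1, T+2))

    π0_vec = np.zeros(T+2)
    π0_vec[0] = Eπ0

    # Equation (13.7): expected inflation
    Eπ_seq = np.linalg.solve(A - (1-λ) * B @ C,
                             (1-λ) * B @ μ_seq + π0_vec)
    # Equation (13.6): actual inflation
    π_seq = μ_seq + C @ Eπ_seq
    # Equation (13.9): log money supply
    m_seq = np.concatenate([[m0], m0 + np.cumsum(μ_seq)])
    # Equation (13.2): log price level (last entry uses π*_{T+1})
    p_seq = m_seq + α * Eπ_seq
    return π_seq, Eπ_seq, m_seq, p_seq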
# Sketch of the plotting helper (figure creation and panel labels
# are assumptions; the full function is not shown)
fig, ax = plt.subplots(5, 1, figsize=(5, 12))
T_seq = range(model.T+2)
y_labs = [r'$\mu$', r'$\pi$', r'$m - p$', r'$m$', r'$p$']

for i in range(5):
    ax[i].set_xlabel(r'$t$')
    ax[i].set_ylabel(y_labs[i])

ax[1].legend()
plt.tight_layout()
plt.show()
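To derive a technical condition for stability, substitute equation (13.3) into equation (13.4) and solve for $\pi_{t+1}^*$:

$$\pi_{t+1}^* = \frac{\lambda - \alpha(1-\lambda)}{1 - \alpha(1-\lambda)} \pi_t^* + \frac{1-\lambda}{1 - \alpha(1-\lambda)} \mu_t$$

Stability of this first-order difference equation requires that the coefficient on $\pi_t^*$ satisfy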
$$\left| \frac{\lambda - \alpha(1-\lambda)}{1 - \alpha(1-\lambda)} \right| < 1 \qquad (13.11)$$
By assuring that the coefficient on $\pi_t^*$ is less than one in absolute value, condition (13.11) assures stability of the dynamics of $\{\pi_t^*\}$ described by the last line of our string of deductions.
The reader is free to study outcomes in examples that violate condition (13.11).
print((λ - α*(1-λ)) / (1 - α*(1-λ)))  # left side of condition (13.11)

0.8

print(λ - α*(1-λ))

0.40000000000000013
13.6.1 Experiment 1
We’ll study a situation in which the rate of growth of the money supply is 𝜇0 from 𝑡 = 0 to 𝑡 = 𝑇1 and then permanently
falls to 𝜇∗ at 𝑡 = 𝑇1 .
Thus, let $T_1 \in (0, T)$.

So where $\mu_0 > \mu^*$, we assume that

$$
\mu_{t+1} =
\begin{cases}
\mu_0, & t = 0, \ldots, T_1 - 1 \\
\mu^*, & t \geq T_1
\end{cases}
$$
Notice that we studied exactly this experiment in the rational expectations version of the model in the lecture on a fiscal theory of the price level.
So by comparing outcomes across the two lectures, we can learn about consequences of assuming adaptive expectations,
as we do here, instead of rational expectations as we assumed in that other lecture.
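A sketch of the computation for this experiment, using the hypothetical solve_cagan_adaptive function from above:

# Money growth: μ0 for t = 0, ..., T1-1, then μ* (T+1 entries in total)
μ_seq_1 = np.append(μ0 * np.ones(T1), μ_star * np.ones(T+1-T1))

π_seq_1, Eπ_seq_1, m_seq_1, p_seq_1 = solve_cagan_adaptive(model, μ_seq_1)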
We invite the reader to compare outcomes with those under rational expectations studied in the lecture on a fiscal theory of the price level.
Please note how the actual inflation rate 𝜋𝑡 “overshoots” its ultimate steady-state value at the time of the sudden reduction
in the rate of growth of the money supply at time 𝑇1 .
We invite you to explain to yourself the source of this overshooting and why it does not occur in the rational expectations
version of the model.
13.6.2 Experiment 2
Now we’ll do a different experiment, namely, a gradual stabilization in which the rate of growth of the money supply
smoothly decline from a high value to a persistently low value.
While price level inflation eventually falls, it falls more slowly than the driving force that ultimately causes it to fall, namely,
the falling rate of growth of the money supply.
The sluggish fall in inflation is explained by how anticipated inflation 𝜋𝑡∗ persistently exceeds actual inflation 𝜋𝑡 during the
transition from a high inflation to a low inflation situation.
# parameters
ϕ = 0.9
μ_seq_2 = np.array([ϕ**t * μ0 + (1-ϕ**t)*μ_star for t in range(T)])
μ_seq_2 = np.append(μ_seq_2, μ_star)
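Under the same assumptions as above, the experiment can then be run with:

π_seq_2, Eπ_seq_2, m_seq_2, p_seq_2 = solve_cagan_adaptive(model, μ_seq_2)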
CHAPTER
FOURTEEN
GEOMETRIC SERIES FOR ELEMENTARY ECONOMICS
Contents
14.1 Overview
This lecture describes important ideas in economics that use the mathematics of geometric series.
Among these are
• the Keynesian multiplier
• the money multiplier that prevails in fractional reserve banking systems
• interest rates and present values of streams of payouts from assets
(As we shall see below, the term multiplier comes down to meaning the sum of a convergent geometric series.)

These and other applications prove the truth of the wisecrack that

"in economics, a little knowledge of geometric series goes a long way."
Below we’ll use the following imports:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
import sympy as sym
from sympy import init_printing, latex
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
Two key formulas concern the infinite geometric series

$$1 + c + c^2 + c^3 + \cdots$$

which, provided $|c| < 1$, sums to

$$1 + c + c^2 + c^3 + \cdots = \frac{1}{1 - c} \qquad (14.1)$$

and the finite geometric series

$$1 + c + c^2 + c^3 + \cdots + c^T = \frac{1 - c^{T+1}}{1 - c}$$
Remark: The above formula works for any value of the scalar 𝑐. We don’t have to restrict 𝑐 to be in the set (−1, 1).
We now move on to describe some famous economic applications of geometric series.
In a fractional reserve banking system, banks hold only a fraction 𝑟 ∈ (0, 1) of cash behind each deposit receipt that
they issue
• In recent times
– cash consists of pieces of paper issued by the government and called dollars or pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to ask the bank for immediate
payment in cash
• When the UK and France and the US were on either a gold or silver standard (before 1914, for example)
– cash was a gold or silver coin
– a deposit receipt was a bank note that the bank promised to convert into gold or silver on demand; (sometimes
it was also a checking or savings account balance)
Economists and financiers often define the supply of money as an economy-wide sum of cash plus deposits.
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 < 1), banks create money by
issuing deposits backed by fractional reserves plus loans that they make to their customers.
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in a fractional reserve system.
The geometric series formula (14.1) is at the heart of the classic model of the money creation process – one that leads us
to the celebrated money multiplier.
$$L_i + R_i = D_i \qquad (14.2)$$
The left side of the above equation is the sum of the bank’s assets, namely, the loans 𝐿𝑖 it has outstanding plus its reserves
of cash 𝑅𝑖 .
The right side records bank 𝑖’s liabilities, namely, the deposits 𝐷𝑖 held by its depositors; these are IOU’s from the bank to
its depositors in the form of either checking accounts or savings accounts (or before 1914, bank notes issued by a bank
stating promises to redeem note for gold or silver on demand).
Each bank $i$ sets its reserves to satisfy the equation

$$R_i = r D_i \qquad (14.3)$$

and deposits in bank $i+1$ equal the loans made by bank $i$:

$$D_{i+1} = L_i \qquad (14.4)$$

Thus, we can think of the banks as being arranged along a line with loans from bank $i$ being immediately deposited in bank $i + 1$

• in this way, the debtors to bank $i$ become creditors of bank $i + 1$
Finally, we add an initial condition about an exogenous level of bank 0’s deposits
𝐷0 is given exogenously
We can think of 𝐷0 as being the amount of cash that a first depositor put into the first bank in the system, bank number
𝑖 = 0.
Now we do a little algebra.
Combining equations (14.2) and (14.3) tells us that
𝐿𝑖 = (1 − 𝑟)𝐷𝑖 (14.5)
This states that bank 𝑖 loans a fraction (1 − 𝑟) of its deposits and keeps a fraction 𝑟 as cash reserves.
Combining equation (14.5) with equation (14.4) tells us that

$$D_{i+1} = (1 - r) D_i, \quad \text{so that} \quad D_i = (1 - r)^i D_0 \qquad (14.6)$$

Equation (14.6) expresses $D_i$ as the $i$-th term in the product of $D_0$ and the geometric series
1, (1 − 𝑟), (1 − 𝑟)2 , ⋯
The money multiplier is a number that tells the multiplicative factor by which an exogenous injection of cash into bank 0 leads to an increase in the total deposits in the banking system.

Summing the geometric series of deposits gives

$$\sum_{i=0}^{\infty} D_i = \sum_{i=0}^{\infty} (1-r)^i D_0 = \frac{D_0}{r} \qquad (14.7)$$

Equation (14.7) asserts that the money multiplier is $\frac{1}{r}$.

• An initial deposit of cash of $D_0$ in bank 0 leads the banking system to create total deposits of $\frac{D_0}{r}$.
• The initial deposit $D_0$ is held as reserves, distributed throughout the banking system according to $D_0 = \sum_{i=0}^{\infty} R_i$.
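To make the multiplier concrete, here is a small simulation (an illustration added here, not from the original lecture) of the deposit-creation process; the running total of deposits approaches $D_0/r$:

# Deposit creation: loans from bank i become deposits in bank i+1
r = 0.1        # reserve ratio
D0 = 100.0     # initial cash deposit in bank 0
total_deposits, D = 0.0, D0
for i in range(200):           # banks 0, 1, 2, ...
    total_deposits += D        # bank i's deposits
    D = (1 - r) * D            # bank i's loans, deposited in bank i+1
print(total_deposits, D0 / r)  # both approximately 1000.0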
The famous economist John Maynard Keynes and his followers created a simple model intended to determine national
income 𝑦 in circumstances in which
• there are substantial unemployed resources, in particular excess supply of labor and capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g., prices and interest rates are
frozen)
• national income is entirely determined by aggregate demand
An elementary Keynesian model of national income determination consists of three equations that describe aggregate
demand for 𝑦 and its components.
The first equation is a national income identity asserting that consumption 𝑐 plus investment 𝑖 equals national income 𝑦:
𝑐+𝑖=𝑦
The second equation is a Keynesian consumption function asserting that people consume a fraction 𝑏 ∈ (0, 1) of their
income:
𝑐 = 𝑏𝑦
Combining the consumption function with the national income identity gives $y = by + i$, which implies

$$y = \frac{1}{1-b} i = \left( \sum_{t=0}^{\infty} b^t \right) i$$

The expression $\sum_{t=0}^{\infty} b^t$ motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next.
We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and changing our specification
of the consumption function to take time into account
• we add a one-period lag in how income affects consumption
We let 𝑐𝑡 be consumption at time 𝑡 and 𝑖𝑡 be investment at time 𝑡.
We modify our consumption function to assume the form
𝑐𝑡 = 𝑏𝑦𝑡−1
so that 𝑏 is the marginal propensity to consume (now) out of last period’s income.
We begin with an initial condition stating that

$$y_{-1} = 0$$

and we assume that investment is constant over time:

$$i_t = i \quad \text{for all } t \geq 0$$

It follows that

$$y_0 = i + c_0 = i + b y_{-1} = i$$
and
𝑦1 = 𝑐1 + 𝑖 = 𝑏𝑦0 + 𝑖 = (1 + 𝑏)𝑖
and
𝑦2 = 𝑐2 + 𝑖 = 𝑏𝑦1 + 𝑖 = (1 + 𝑏 + 𝑏2 )𝑖
More generally,

$$y_t = b y_{t-1} + i = (1 + b + b^2 + \cdots + b^t) i$$

or

$$y_t = \frac{1 - b^{t+1}}{1 - b} i$$

Evidently, as $t \to +\infty$,

$$y_t \to \frac{1}{1-b} i$$
Remark 1: The above formula is often applied to assert that an exogenous increase in investment of $\Delta i$ at time 0 ignites a dynamic process of increases in national income by successive amounts

$$\Delta i, \ (1 + b)\Delta i, \ (1 + b + b^2)\Delta i, \ \cdots$$

at times $0, 1, 2, \ldots$.
Remark 2: Let $g_t$ be an exogenous sequence of government expenditures.

If we generalize the model so that the national income identity becomes

$$c_t + i_t + g_t = y_t$$

then a version of the preceding argument shows that the government expenditures multiplier is also $\frac{1}{1-b}$, so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures.
We can apply our formula for geometric series to study how interest rates affect values of streams of dollar payments that
extend over time.
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time.
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate
• if the nominal interest rate is 5 percent, then 𝑟 = .05
A one-period gross nominal interest rate $R$ is defined as

$$R = 1 + r \in (1, 2)$$

Two important geometric sequences are

$$1, R, R^2, \cdots \qquad (14.8)$$

and

$$1, R^{-1}, R^{-2}, \cdots \qquad (14.9)$$
Sequence (14.8) tells us how dollar values of an investment accumulate through time.
Sequence (14.9) tells us how to discount future dollars to get their values in terms of today’s dollars.
14.5.1 Accumulation
Geometric sequence (14.8) tells us how one dollar invested and re-invested in a project with gross one period nominal
rate of return accumulates
• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest $r$ dollars after one period, so we have $r + 1 = R$ dollars at time 1
• at time 1 we reinvest 1 + 𝑟 = 𝑅 dollars and receive interest of 𝑟𝑅 dollars at time 2 plus the principal 𝑅 dollars, so
we receive 𝑟𝑅 + 𝑅 = (1 + 𝑟)𝑅 = 𝑅2 dollars at the end of period 2
• and so on
Evidently, if we invest $x$ dollars at time 0 and reinvest the proceeds, then the sequence

$$x, xR, xR^2, \cdots$$

tells us how our account balance accumulates at times $0, 1, 2, \ldots$.
14.5.2 Discounting
Geometric sequence (14.9) tells us how much future dollars are worth in terms of today’s dollars.
Remember that the units of 𝑅 are dollars at 𝑡 + 1 per dollar at 𝑡.
It follows that
• the units of 𝑅−1 are dollars at 𝑡 per dollar at 𝑡 + 1
• the units of 𝑅−2 are dollars at 𝑡 per dollar at 𝑡 + 2
• and so on; the units of 𝑅−𝑗 are dollars at 𝑡 per dollar at 𝑡 + 𝑗
So if someone has a claim on 𝑥 dollars at time 𝑡 + 𝑗, it is worth 𝑥𝑅−𝑗 dollars at time 𝑡 (e.g., today).
$$(1 + g)^{T+1} = 1 + (T+1) g + \frac{T(T+1)}{2!} g^2 + \frac{(T-1)T(T+1)}{3!} g^3 + \cdots \approx 1 + (T+1) g$$
Thus, we get the following approximation:
We could have also approximated by removing the second term 𝑟𝑔𝑥0 (𝑇 + 1) when 𝑇 is relatively small compared to
1/(𝑟𝑔) to get 𝑥0 (𝑇 + 1) as in the finite stream approximation.
We will plot the true finite stream present-value and the two approximations, under different values of 𝑇 , and 𝑔 and 𝑟 in
Python.
First we plot the true finite stream present-value after computing it below
# Infinite lease
def infinite_lease(g, r, x_0):
G = (1 + g)
R = (1 + r)
return x_0 / (1 - G * R**(-1))
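The finite-lease counterpart plotted below is not reproduced above; a sketch consistent with the finite geometric sum formula is:

# Finite lease: PV of payments x_0(1+g)^t for t = 0, ..., T (a sketch)
def finite_lease_pv_true(T, g, r, x_0):
    G = 1 + g
    R = 1 + r
    return (x_0 / (1 - G * R**(-1))) * (1 - (G * R**(-1))**(T + 1))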
Now that we have defined our functions, we can plot some outcomes.
First we study the quality of our approximations
T_max = 50
T = np.arange(0, T_max+1)
g = 0.02
r = 0.03
x_0 = 1
fig, ax = plt.subplots()
# funcs: the true finite-lease PV and its approximations discussed above;
# plot_function: a helper that plots each f with a label (not shown here)
for f in funcs:
    plot_function(ax, T, f, our_args)
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()
Fig. 14.2: Infinite and finite lease present value 𝑇 periods ahead
The graph above shows how as duration 𝑇 → +∞, the value of a lease of duration 𝑇 approaches the value of a perpetual
lease.
Now we consider two different views of what happens as 𝑟 and 𝑔 covary
# First view
# Changing r and g
fig, ax = plt.subplots()
ax.set_ylabel('Present Value, $p_0$')
ax.set_xlabel('$T$ periods ahead')
T_max = 10
T = np.arange(0, T_max+1)
# (the loop plotting finite_lease_pv_true for several (r, g)
#  pairs is not preserved here)
ax.legend()
plt.show()
This graph gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 = +∞ is to have finite value.
For fans of 3-d graphs the same point comes through in the following graph.
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!
# Second view
fig = plt.figure(figsize = [16, 5])
T = 3
ax = plt.subplot(projection='3d')
r = np.arange(0.01, 0.99, 0.005)
g = np.arange(0.011, 0.991, 0.005)
rr, gg = np.meshgrid(r, g)
z = finite_lease_pv_true(T, gg, rr, x_0)
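The surface-plotting commands are missing from this cell; a minimal completion (an assumption, not the original code) is:

# Plot the present value surface over (r, g)
ax.plot_surface(rr, gg, z, rstride=5, cstride=5, cmap=cm.viridis, alpha=0.7)
ax.set_xlabel('$r$')
ax.set_ylabel('$g$')
ax.set_zlabel('Present Value, $p_0$')
plt.show()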
We can use a little calculus to study how the present value 𝑝0 of a lease varies with 𝑟 and 𝑔.
We will use a library called SymPy.
SymPy enables us to do symbolic math calculations including computing derivatives of algebraic equations.
We will illustrate how it works by creating a symbolic expression that represents our present value formula for an infinite
lease.
After that, we’ll use SymPy to compute derivatives
$$\frac{x_0}{- \frac{g + 1}{r + 1} + 1}$$
print('dp0 / dg is:')
dp_dg = sym.diff(p0, g)
dp_dg
dp0 / dg is:
$$\frac{x_0}{\left(r + 1\right) \left(- \frac{g + 1}{r + 1} + 1\right)^{2}}$$
print('dp0 / dr is:')
dp_dr = sym.diff(p0, r)
dp_dr
dp0 / dr is:
$$- \frac{x_0 \left(g + 1\right)}{\left(r + 1\right)^{2} \left(- \frac{g + 1}{r + 1} + 1\right)^{2}}$$
We can see that $\frac{\partial p_0}{\partial r} < 0$ as long as $r > g$, $r > 0$, $g > 0$, and $x_0$ is positive, so $\frac{\partial p_0}{\partial r}$ will always be negative.

Similarly, $\frac{\partial p_0}{\partial g} > 0$ as long as $r > g$, $r > 0$, $g > 0$, and $x_0$ is positive, so $\frac{\partial p_0}{\partial g}$ will always be positive.
We will now go back to the case of the Keynesian multiplier and plot the time path of 𝑦𝑡 , given that consumption is a
constant fraction of national income, and investment is fixed.
# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100
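The function calculate_y used below is not shown above; a sketch consistent with the model $y_t = b y_{t-1} + i + g$ is:

# Sketch: path of y_t = b*y_{t-1} + i + g starting from y_init
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T+1)
    y[0] = i + b * y_init + g
    for t in range(1, T+1):
        y[t] = b * y[t-1] + i + g
    return y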
fig, ax = plt.subplots()
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()
In this model, income grows over time, until it gradually converges to the infinite geometric series sum of income.
We now examine what will happen if we vary the so-called marginal propensity to consume, i.e., the fraction of income
that is consumed
Increasing the marginal propensity to consume 𝑏 increases the path of output over time.
Now we will compare the effects on output of increases in investment and government spending.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))  # assumed two-panel layout
x = np.arange(0, T+1)
values = [0.3, 0.4]

for i in values:
    y = calculate_y(i, b, g_0, T, y_init)
    ax1.plot(x, y, label=f"i={i}")

for g in values:
    y = calculate_y(i_0, b, g, T, y_init)
    ax2.plot(x, y, label=f"g={g}")

ax1.legend()
ax2.legend()
plt.show()
Notice here that whether government spending increases from 0.3 to 0.4 or investment increases from 0.3 to 0.4, the shifts in the graphs are identical.
CHAPTER
FIFTEEN
DISTRIBUTIONS AND PROBABILITIES
Contents
15.1 Outline
In this lecture we give a quick introduction to data and probability distributions using Python.
In this section we recall the definitions of some well-known distributions and show how to manipulate them with SciPy.
We say that a random variable 𝑋 has distribution 𝑝 if 𝑋 takes value 𝑥𝑖 with probability 𝑝(𝑥𝑖 ).
That is,

$$\mathbb{P}\{X = x_i\} = p(x_i) \quad \text{for } i = 1, \ldots, n$$

Uniform distribution

One simple example is the uniform distribution, where $p(x_i) = 1/n$ for all $i$.
We can import the uniform distribution on 𝑆 = {1, … , 𝑛} from SciPy like so:
import numpy as np  # chapter imports (not preserved above)
import matplotlib.pyplot as plt
import scipy.stats
n = 10
u = scipy.stats.randint(1, n+1)
u.mean(), u.var()
(5.5, 8.25)
The formula for the mean is (𝑛 + 1)/2, and the formula for the variance is (𝑛2 − 1)/12.
Now let’s evaluate the PMF
u.pmf(1)
0.1
u.pmf(2)
0.1
fig, ax = plt.subplots()
S = np.arange(1, n+1)
ax.plot(S, u.pmf(S), linestyle='', marker='o', alpha=0.8, ms=4)
ax.vlines(S, 0, u.pmf(S), lw=0.2)
ax.set_xticks(S)
plt.show()
fig, ax = plt.subplots()
S = np.arange(1, n+1)
ax.step(S, u.cdf(S))
ax.vlines(S, 0, u.cdf(S), lw=0.2)
ax.set_xticks(S)
plt.show()
Exercise 15.2.1
Calculate the mean and variance for this parameterization (i.e., 𝑛 = 10) directly from the PMF, using the expressions
given above.
Check that your answers agree with u.mean() and u.var().
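One possible check, as a sketch rather than the book's own solution cell:

# Mean and variance computed directly from the PMF
S = np.arange(1, n+1)
mean = np.sum(S * u.pmf(S))             # (n+1)/2 = 5.5
var = np.sum((S - mean)**2 * u.pmf(S))  # (n**2 - 1)/12 = 8.25
print(mean, var)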
Binomial distribution

Another useful (and more interesting) distribution is the binomial distribution on $S = \{0, \ldots, n\}$, which has PMF

$$p(i) = \binom{n}{i} \theta^i (1 - \theta)^{n-i}$$

Here $\theta \in [0, 1]$ is a parameter.
n = 10
θ = 0.5
u = scipy.stats.binom(n, θ)
u.mean(), u.var()
(5.0, 2.5)
The formula for the mean is 𝑛𝜃 and the formula for the variance is 𝑛𝜃(1 − 𝜃).
Here’s the PDF
u.pmf(1)
0.009765625000000002
fig, ax = plt.subplots()
S = np.arange(1, n+1)
ax.plot(S, u.pmf(S), linestyle='', marker='o', alpha=0.8, ms=4)
ax.vlines(S, 0, u.pmf(S), lw=0.2)
ax.set_xticks(S)
plt.show()
fig, ax = plt.subplots()
S = np.arange(1, n+1)
ax.step(S, u.cdf(S))
ax.vlines(S, 0, u.cdf(S), lw=0.2)
ax.set_xticks(S)
plt.show()
Exercise 15.2.2
Using u.pmf, check that our definition of the CDF given above calculates the same function as u.cdf.
fig, ax = plt.subplots()
S = np.arange(1, n+1)
u_sum = np.cumsum(u.pmf(S))
ax.step(S, u_sum)
ax.vlines(S, 0, u_sum, lw=0.2)
ax.set_xticks(S)
plt.show()
We can see that the output graph is the same as the one above.
Poisson distribution

The Poisson distribution on $S = \{0, 1, \ldots\}$ with rate parameter $\lambda > 0$ has PMF

$$p(i) = \frac{\lambda^i}{i!} e^{-\lambda}$$
The interpretation of 𝑝(𝑖) is: the number of events in a fixed time interval, where the events occur at a constant rate 𝜆
and independently of each other.
The mean and variance are both equal to $\lambda$:
λ = 2
u = scipy.stats.poisson(λ)
u.mean(), u.var()
(2.0, 2.0)
λ = 2
u = scipy.stats.poisson(λ)
u.pmf(1)
0.2706705664732254
fig, ax = plt.subplots()
S = np.arange(1, n+1)
ax.plot(S, u.pmf(S), linestyle='', marker='o', alpha=0.8, ms=4)
ax.vlines(S, 0, u.pmf(S), lw=0.2)
ax.set_xticks(S)
plt.show()
Continuous distributions are represented by a density function, which is a function $p$ over $\mathbb{R}$ (the set of all real numbers) such that $p(x) \geq 0$ for all $x$ and

$$\int_{-\infty}^{\infty} p(x)\, dx = 1$$

We say that random variable $X$ has distribution $p$ if

$$\mathbb{P}\{a \leq X \leq b\} = \int_a^b p(x)\, dx$$

for all $a \leq b$.
The definition of the mean and variance of a random variable 𝑋 with distribution 𝑝 are the same as the discrete case,
after replacing the sum with an integral.
Normal distribution
Perhaps the most famous distribution is the normal distribution, which has density
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
μ, σ = 0.0, 1.0
u = scipy.stats.norm(μ, σ)
u.mean(), u.var()
(0.0, 1.0)
μ_vals = [-1, 0, 1]
σ_vals = [0.4, 1, 1.6]
fig, ax = plt.subplots()
x_grid = np.linspace(-4, 4, 200)
# (loop missing above; it mirrors the CDF cell below)
for μ, σ in zip(μ_vals, σ_vals):
    u = scipy.stats.norm(μ, σ)
    ax.plot(x_grid, u.pdf(x_grid),
            alpha=0.5, lw=2,
            label=f'$\mu={μ}, \sigma={σ}$')
plt.legend()
plt.show()
fig, ax = plt.subplots()
for μ, σ in zip(μ_vals, σ_vals):
u = scipy.stats.norm(μ, σ)
ax.plot(x_grid, u.cdf(x_grid),
alpha=0.5, lw=2,
label=f'$\mu={μ}, \sigma={σ}$')
ax.set_ylim(0, 1)
plt.legend()
plt.show()
Lognormal distribution

A lognormal random variable is the exponential of a normal random variable; with parameters $\mu$ and $\sigma$, its mean is $\exp(\mu + \sigma^2/2)$.
μ, σ = 0.0, 1.0
u = scipy.stats.lognorm(s=σ, scale=np.exp(μ))
u.mean(), u.var()
(1.6487212707001282, 4.670774270471604)
μ_vals = [-1, 0, 1]
σ_vals = [0.25, 0.5, 1]
x_grid = np.linspace(0, 3, 200)
fig, ax = plt.subplots()
# (loop missing above; presumably as follows)
for μ, σ in zip(μ_vals, σ_vals):
    u = scipy.stats.lognorm(σ, scale=np.exp(μ))
    ax.plot(x_grid, u.pdf(x_grid),
            alpha=0.5, lw=2,
            label=f'$\mu={μ}, \sigma={σ}$')
plt.legend()
plt.show()
fig, ax = plt.subplots()
μ = 1
for σ in σ_vals:
u = scipy.stats.lognorm(σ, scale=np.exp(μ))  # lognorm (the source shows norm, likely a slip)
ax.plot(x_grid, u.cdf(x_grid),
alpha=0.5, lw=2,
label=f'$\mu={μ}, \sigma={σ}$')
ax.set_ylim(0, 1)
ax.set_xlim(0, 3)
plt.legend()
plt.show()
Exponential distribution

The exponential distribution with rate parameter $\lambda > 0$ has density $p(x) = \lambda \exp(-\lambda x)$ for $x \geq 0$.
λ = 1.0
u = scipy.stats.expon(scale=1/λ)
u.mean(), u.var()
(1.0, 1.0)
fig, ax = plt.subplots()
λ_vals = [0.5, 1, 2]
x_grid = np.linspace(0, 6, 200)
for λ in λ_vals:
    u = scipy.stats.expon(scale=1/λ)
    ax.plot(x_grid, u.pdf(x_grid),
            alpha=0.5, lw=2,
            label=f'$\lambda={λ}$')
plt.legend()
plt.show()
fig, ax = plt.subplots()
for λ in λ_vals:
u = scipy.stats.expon(scale=1/λ)
ax.plot(x_grid, u.cdf(x_grid),
alpha=0.5, lw=2,
label=f'$\lambda={λ}$')
ax.set_ylim(0, 1)
plt.legend()
plt.show()
Beta distribution

The beta distribution on $[0, 1]$ with shape parameters $\alpha, \beta > 0$ has density

$$p(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$$
α, β = 3.0, 1.0
u = scipy.stats.beta(α, β)
u.mean(), u.var()
(0.75, 0.0375)
fig, ax = plt.subplots()
for α, β in zip(α_vals, β_vals):  # parameter pairs (values not shown above)
    u = scipy.stats.beta(α, β)
    ax.plot(x_grid, u.pdf(x_grid), alpha=0.5, lw=2,
            label=fr'$\alpha={α}, \beta={β}$')
plt.legend()
plt.show()
fig, ax = plt.subplots()
for α, β in zip(α_vals, β_vals):
u = scipy.stats.beta(α, β)
ax.plot(x_grid, u.cdf(x_grid),
alpha=0.5, lw=2,
label=fr'$\alpha={α}, \beta={β}$')
ax.set_ylim(0, 1)
plt.legend()
plt.show()
Gamma distribution

The gamma distribution with shape $\alpha > 0$ and rate $\beta > 0$ has density

$$p(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x)$$
α, β = 3.0, 2.0
u = scipy.stats.gamma(α, scale=1/β)
u.mean(), u.var()
(1.5, 0.75)
fig, ax = plt.subplots()
for α, β in zip(α_vals, β_vals):  # parameter pairs (values not shown above)
    u = scipy.stats.gamma(α, scale=1/β)
    ax.plot(x_grid, u.pdf(x_grid), alpha=0.5, lw=2,
            label=fr'$\alpha={α}, \beta={β}$')
plt.legend()
plt.show()
fig, ax = plt.subplots()
for α, β in zip(α_vals, β_vals):
u = scipy.stats.gamma(α, scale=1/β)
ax.plot(x_grid, u.cdf(x_grid),
alpha=0.5, lw=2,
label=fr'$\alpha={α}, \beta={β}$')
ax.set_ylim(0, 1)
plt.legend()
plt.show()
name income
0 Hiroshi 1200
1 Ako 1210
2 Emi 1400
3 Daiki 990
4 Chiyo 1530
5 Taka 1210
6 Katsuhiko 1240
7 Daisuke 1124
In this situation, we might refer to the set of their incomes as the “income distribution.”
The terminology is confusing because this set is not a probability distribution — it’s just a collection of numbers.
However, as we will see, there are connections between observed distributions (i.e., sets of numbers like the income
distribution above) and probability distributions.
Below we explore some observed distributions.
The sample mean of a sample $x_1, \ldots, x_n$ is

$$\bar x = \frac{1}{n} \sum_{i=1}^n x_i$$

and the sample variance is

$$\frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2$$
For the income distribution given above, we can calculate these numbers via
x = np.asarray(df['income'])
x.mean(), x.var()
(1257.4, 20412.839999999997)
Exercise 15.3.1
Check that the formulas given above produce the same numbers.
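A direct check, sketched:

x = np.asarray(df['income'])
n_obs = len(x)
mean = np.sum(x) / n_obs
var = np.sum((x - mean)**2) / n_obs
print(mean, var)  # should match x.mean() and x.var()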
15.3.2 Visualization
Let’s look at different ways that we can visualize one or more observed distributions.
We will cover
• histograms
• kernel density estimates and
• violin plots
Histograms
x = df['income']
fig, ax = plt.subplots()
ax.hist(x, bins=5, density=True, histtype='bar')
plt.show()
Date
2000-02-01 6.679568
2000-03-01 -2.722323
2000-04-01 -17.630592
2000-05-01 -12.457531
The first observation is the monthly return (percent change) over January 2000, which was
data[0]
6.6795679502808625
Let’s turn the return observations into an array and histogram it.
x_amazon = np.asarray(data)
fig, ax = plt.subplots()
ax.hist(x_amazon, bins=20)
plt.show()
Kernel density estimate (KDE) is a non-parametric way to estimate and visualize the PDF of a distribution.
KDE will generate a smooth curve that approximates the PDF.
fig, ax = plt.subplots()
sns.kdeplot(x_amazon, ax=ax)
plt.show()
fig, ax = plt.subplots()
sns.kdeplot(x_amazon, ax=ax, bw_adjust=0.1, alpha=0.5, label="bw=0.1")
sns.kdeplot(x_amazon, ax=ax, bw_adjust=0.5, alpha=0.5, label="bw=0.5")
sns.kdeplot(x_amazon, ax=ax, bw_adjust=1, alpha=0.5, label="bw=1")
plt.legend()
plt.show()
Violin plots
fig, ax = plt.subplots()
ax.violinplot(x_amazon)
plt.show()
Violin plots are particularly useful when we want to compare different distributions.
For example, let’s compare the monthly returns on Amazon shares with the monthly return on Apple shares.
fig, ax = plt.subplots()
ax.violinplot([x_amazon, x_apple])
plt.show()
Let’s discuss the connection between observed distributions and probability distributions.
Sometimes it’s helpful to imagine that an observed distribution is generated by a particular probability distribution.
For example, we might look at the returns from Amazon above and imagine that they were generated by a normal distri-
bution.
Even though this is not true, it might be a helpful way to think about the data.
Here we match a normal distribution to the Amazon monthly returns by setting the sample mean to the mean of the
normal distribution and the sample variance equal to the variance.
Then we plot the density and the histogram.
μ = x_amazon.mean()
σ_squared = x_amazon.var()
σ = np.sqrt(σ_squared)
u = scipy.stats.norm(μ, σ)
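The plotting cell is not shown; a sketch is:

x_grid = np.linspace(x_amazon.min(), x_amazon.max(), 200)
fig, ax = plt.subplots()
ax.plot(x_grid, u.pdf(x_grid))            # fitted normal density
ax.hist(x_amazon, density=True, bins=40)  # observed returns
plt.show()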
The match between the histogram and the density is not very bad but also not very good.
One reason is that the normal distribution is not really a good fit for this observed data — we will discuss this point again
when we talk about heavy tailed distributions.
Of course, if the data really is generated by the normal distribution, then the fit will be better.
Let’s see this in action
• first we generate random draws from the normal distribution
• then we histogram them and compare with the density.
μ, σ = 0, 1
u = scipy.stats.norm(μ, σ)
N = 2000 # Number of observations
x_draws = u.rvs(N)
x_grid = np.linspace(-4, 4, 200)
fig, ax = plt.subplots()
ax.plot(x_grid, u.pdf(x_grid))
ax.hist(x_draws, density=True, bins=40)
plt.show()
Note that if you keep increasing 𝑁 , which is the number of observations, the fit will get better and better.
This convergence is a version of the “law of large numbers”, which we will discuss later.
CHAPTER
SIXTEEN
LLN AND CLT
16.1 Overview
This lecture illustrates two of the most important results in probability and statistics:
1. the law of large numbers (LLN) and
2. the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics and quantitative economic
modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are based on do not hold.
This lecture will focus on the univariate case (the multivariate case is treated in a more advanced lecture).
We’ll need the following imports:
We begin with the law of large numbers, which tells us when sample averages will converge to their population means.
p = 0.8
X = st.bernoulli.rvs(p)
print(X)
In this setting, the LLN tells us if we flip the coin many times, the fraction of heads that we see will be close to the mean
𝑝.
Let’s check this:
n = 1_000_000
X_draws = st.bernoulli.rvs(p, size=n)
print(X_draws.mean()) # count the number of 1's and divide by n
0.800343
p = 0.3
X_draws = st.bernoulli.rvs(p, size=n)
print(X_draws.mean())
0.299651
Let’s connect this to the discussion above, where we said the sample average converges to the “population mean”.
Think of 𝑋1 , … , 𝑋𝑛 as independent flips of the coin.
The population mean is the mean in an infinite sample, which equals the expectation 𝔼𝑋.
The sample mean of the draws $X_1, \ldots, X_n$ is

$$\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$$

In this case, it is the fraction of draws that equal one (the number of heads divided by $n$).

Thus, the LLN tells us that for the Bernoulli trials above

$$\bar X_n \to \mathbb{E}X = p \quad (n \to \infty) \qquad (16.1)$$
(For the discrete case, we need to replace densities with probability mass functions and integrals with sums.)
Consider IID random variables $X_1, \ldots, X_n$ with common density $f$, and let $\mu$ denote the common mean of this sample. Thus, for each $i$,

$$\mu := \mathbb{E}X_i = \int_{-\infty}^{\infty} x f(x)\, dx$$

The sample mean is again

$$\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$$
Theorem 16.2.1
If 𝑋1 , … , 𝑋𝑛 are IID and 𝔼|𝑋| is finite, then
ℙ {𝑋̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (16.2)
Here
• IID means independent and identically distributed and
• $\mathbb{E}|X| = \int_{-\infty}^{\infty} |x| f(x)\, dx$
16.2.4 Illustration
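The helper used below is not included above; a minimal sketch is (the name and the choices of X_distribution, m, and n are assumptions):

def draw_means(X_distribution, n):
    " Draw n observations and return their sample mean. "
    X_samples = X_distribution.rvs(size=n)
    return np.mean(X_samples)

# Illustrative choices
X_distribution = st.norm(loc=5, scale=2)
n = 1_000     # sample size for each mean
m = 10_000    # number of sample means to draw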
sample_means = np.empty(m)
for j in range(m):
sample_means[j] = draw_means(X_distribution, n)
# Generate a histogram
fig, ax = plt.subplots()
ax.hist(sample_means, bins=30, alpha=0.5, density=True)
μ = X_distribution.mean() # Get the population mean
σ = X_distribution.std() # and the standard deviation
ax.axvline(x=μ, ls="--", c="k", label=fr"$\mu = {μ}$")
ax.set_xlim(μ - σ, μ + σ)
ax.set_xlabel(r'$\bar X_n$', size=12)
ax.set_ylabel('density', size=12)
ax.legend()
plt.show()
def means_violin_plot(distribution,
ns = [1_000, 10_000, 100_000],
m = 10_000):
data = []
for n in ns:
sample_means = [draw_means(distribution, n) for i in range(m)]
data.append(sample_means)
fig, ax = plt.subplots()
ax.violinplot(data)
μ = distribution.mean()
ax.axhline(y=μ, ls="--", c="k", label=fr"$\mu = {μ}$")
plt.subplots_adjust(bottom=0.15, wspace=0.05)
ax.set_ylabel('density', size=12)
ax.legend()
plt.show()
means_violin_plot(st.norm(loc=5, scale=2))
As 𝑛 gets large, more probability mass clusters around the population mean 𝜇.
Now let’s try with a Beta distribution.
means_violin_plot(st.beta(6, 6))
As indicated by the theorem, the LLN can break when 𝔼|𝑋| is not finite.
We can demonstrate this using the Cauchy distribution.
The Cauchy distribution has the following property:
If 𝑋1 , … , 𝑋𝑛 are IID and Cauchy, then so is 𝑋̄ 𝑛 .
This means that the distribution of 𝑋̄ 𝑛 does not eventually concentrate on a single number.
Hence the LLN does not hold.
The LLN fails to hold here because the assumption 𝔼|𝑋| = ∞ is violated by the Cauchy distribution.
The LLN can also fail to hold when the IID assumption is violated.
For example, suppose that $X_i = X_0$ for every $i$, where $X_0 \sim N(0, 1)$.

In this case,

$$\bar X_n = \frac{1}{n} \sum_{i=1}^n X_i = X_0 \sim N(0, 1)$$
Note: Although in this case the violation of IID breaks the LLN, there are situations where IID fails but the LLN still
holds.
We will show an example in the exercise.
Next, we turn to the central limit theorem (CLT), which tells us about the distribution of the deviation between sample
averages and population means.
The central limit theorem is one of the most remarkable results in all of mathematics.
In the IID setting, it tells us the following:
Theorem 16.4.1
If 𝑋1 , … , 𝑋𝑛 is IID with common mean 𝜇 and common variance 𝜎2 ∈ (0, ∞), then
$$\sqrt{n} \, (\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2) \quad \text{as } n \to \infty \qquad (16.3)$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation $\sigma$.
The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding
independent copies always leads to a Gaussian curve.
16.4.2 Simulation 1
Since the CLT seems almost magical, running simulations that verify its implications is one good way to build under-
standing.
To this end, we now perform the following simulation
1. Choose an arbitrary distribution 𝐹 for the underlying observations 𝑋𝑖 .
2. Generate independent draws of $Y_n := \sqrt{n} \, (\bar X_n - \mu)$.
3. Use these draws to compute some measure of their distribution — such as a histogram.
4. Compare the latter to 𝑁 (0, 𝜎2 ).
Here’s some code that does exactly this for the exponential distribution 𝐹 (𝑥) = 1 − 𝑒−𝜆𝑥 .
(Please experiment with other choices of 𝐹 , but remember that, to conform with the conditions of the CLT, the distribution
must have a finite second moment.)
# Set parameters
n = 250 # Choice of n
k = 1_000_000 # Number of draws of Y_n
distribution = st.expon(2) # Exponential distribution, shifted right by loc=2
μ, σ = distribution.mean(), distribution.std()
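The cell that generates the draws of $Y_n$ is not shown; a vectorized sketch (consistent with the remark below about the absence of for loops) is:

# Draw a (k, n) array, average across rows, and scale by sqrt(n)
data = distribution.rvs((k, n))
sample_means = data.mean(axis=1)
Y = np.sqrt(n) * (sample_means - μ)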
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * σ, 3 * σ
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.4, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, st.norm.pdf(xgrid, scale=σ),
'k-', lw=2, label='$N(0, \sigma^2)$')
ax.set_xlabel(r"$Y_n$", size=12)
ax.set_ylabel(r"$density$", size=12)
ax.legend()
plt.show()
(Notice the absence of for loops — every operation is vectorized, meaning that the major calculations are all shifted to
fast C code.)
The fit to the normal density is already tight and can be further improved by increasing n.
16.5 Exercises
Exercise 16.5.1
Repeat the simulation above1 with the Beta distribution.
You can choose any 𝛼 > 0 and 𝛽 > 0.
# Set parameters
n = 250 # Choice of n
k = 1_000_000 # Number of draws of Y_n
distribution = st.beta(2,2) # We chose Beta(2, 2) as an example
μ, σ = distribution.mean(), distribution.std()
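As before, the draws of $Y_n$ can be generated with (a sketch):

data = distribution.rvs((k, n))
Y = np.sqrt(n) * (data.mean(axis=1) - μ)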
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * σ, 3 * σ
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.4, density=True)
plt.show()
Exercise 16.5.2
At the start of this lecture we discussed Bernoulli random variables.
NumPy doesn’t provide a bernoulli function that we can sample from.
However, we can generate a draw of Bernoulli 𝑋 using NumPy via
U = np.random.rand()
X = 1 if U < p else 0
print(X)
Explain why this provides a random variable 𝑋 with the right distribution.
ℙ{0 ≤ 𝑈 < 𝑝} = 𝑝 − 0 = 𝑝
Exercise 16.5.3
We mentioned above that LLN can still hold sometimes when IID is violated.
Let’s investigate this claim further.
Consider the AR(1) process

$$X_{t+1} = \alpha + \beta X_t + \sigma \eta_{t+1}$$

where $\{\eta_t\}$ is IID and standard normal, so that the draws $X_t$ are not independent.
σ = 10
α = 0.8
β = 0.2
n = 100_000
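The simulation cell is not preserved; a sketch consistent with the parameters above is:

# Simulate X_{t+1} = α + β X_t + σ η_{t+1} and track the running sample mean
x = np.empty(n+1)
x[0] = 0.0
for t in range(n):
    x[t+1] = α + β * x[t] + σ * np.random.randn()
sample_mean = np.cumsum(x[1:]) / np.arange(1, n+1)

fig, ax = plt.subplots()
ax.plot(np.arange(1, n+1), sample_mean, alpha=0.6, label=r'$\bar X_n$')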
ax.set_xlabel(r"$n$", size=12)
ax.set_ylabel(r"$\bar X_n$", size=12)
yabs_max = max(ax.get_ylim(), key=abs)
ax.axhline(y=α/(1-β), ls="--", lw=3,
label=r"$\mu = \frac{\alpha}{1-\beta}$",
color = 'black')
plt.legend()
plt.show()
We see that $\bar X_n$ converges to $\mu$ even though the IID assumption is violated.
CHAPTER
SEVENTEEN
MONTE CARLO AND OPTION PRICING
17.1 Overview
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import randn
In this section we describe how Monte Carlo can be used to compute expectations.
For example, if $X \sim N(\mu, \sigma^2)$ and $S = \exp(X)$, then $S$ is lognormal and

$$\mathbb{E}S = \exp\left( \mu + \frac{\sigma^2}{2} \right)$$

Suppose instead that

$$S = (X_1 + X_2 + X_3)^p$$
where
• 𝑝 is a positive number, which is known to us (i.e., has been estimated),
• 𝑋𝑖 ∼ 𝐿𝑁 (𝜇𝑖 , 𝜎𝑖 ) for 𝑖 = 1, 2, 3,
• the values 𝜇𝑖 , 𝜎𝑖 are also known, and
• the random variables 𝑋1 , 𝑋2 and 𝑋3 are independent.
n = 1_000_000
p = 0.5
μ_1, μ_2, μ_3 = 0.2, 0.8, 0.4
σ_1, σ_2, σ_3 = 0.1, 0.05, 0.2
Here’s a routine using native Python loops to calculate the desired mean
$$\frac{1}{n} \sum_{i=1}^n S_i \approx \mathbb{E}S$$
%%time
S = 0.0
for i in range(n):
X_1 = np.exp(μ_1 + σ_1 * randn())
X_2 = np.exp(μ_2 + σ_2 * randn())
X_3 = np.exp(μ_3 + σ_3 * randn())
S += (X_1 + X_2 + X_3)**p
S / n
2.2297684653600913
def compute_mean(n=1_000_000):
S = 0.0
for i in range(n):
X_1 = np.exp(μ_1 + σ_1 * randn())
X_2 = np.exp(μ_2 + σ_2 * randn())
X_3 = np.exp(μ_3 + σ_3 * randn())
S += (X_1 + X_2 + X_3)**p
return (S / n)
compute_mean()
2.2296764329201237
def compute_mean_vectorized(n=1_000_000):
X_1 = np.exp(μ_1 + σ_1 * randn(n))
X_2 = np.exp(μ_2 + σ_2 * randn(n))
X_3 = np.exp(μ_3 + σ_3 * randn(n))
S = (X_1 + X_2 + X_3)**p
return S.mean()
%%time
compute_mean_vectorized()
2.229649736607342
%%time
compute_mean_vectorized(n=10_000_000)
CPU times: user 926 ms, sys: 104 ms, total: 1.03 s
Wall time: 1.03 s
2.2297604134260776
Next we are going to price a European call option under risk neutrality.
Let’s first discuss risk neutrality and then consider European options.
When we use risk-neutral pricing, we determine the price of a given asset according to its expected payoff:

17.3.3 Discounting

Discounting the expected payoff at rate $\beta$ per period over $n$ periods, the price of the asset in the running example is

$$P = \beta^n \mathbb{E}G = \beta^n \times 5 \times 10^5$$

and the price of the European call option is

$$P = \beta^n \mathbb{E} \max\{S_n - K, 0\}$$
Now all we need to do is specify the distribution of 𝑆𝑛 , so the expectation can be calculated.
Suppose we know that 𝑆𝑛 ∼ 𝐿𝑁 (𝜇, 𝜎) and 𝜇 and 𝜎 are known.
If $S_n^1, \ldots, S_n^M$ are independent draws from this lognormal distribution then, by the law of large numbers,

$$\mathbb{E} \max\{S_n - K, 0\} \approx \frac{1}{M} \sum_{m=1}^M \max\{S_n^m - K, 0\}$$
We suppose that
μ = 1.0
σ = 0.1
K = 1
n = 10
β = 0.95
M = 10_000_000
S = np.exp(μ + σ * np.random.randn(M))
return_draws = np.maximum(S - K, 0)
P = β**n * np.mean(return_draws)
print(f"The Monte Carlo option price is approximately {P:3f}")
In this exercise we investigate a more realistic model for the share price 𝑆𝑛 .
This comes from specifying the underlying dynamics of the share price.
First we specify the dynamics.
Then we’ll compute the price of the option using Monte Carlo.
$$\ln \frac{S_{t+1}}{S_t} = \mu + \sigma \xi_{t+1}$$
where
• 𝑆0 is normally distributed and
• {𝜉𝑡 } is IID and standard normal.
Under the stated assumptions, 𝑆𝑛 is lognormally distributed.
To see why, observe that, with $s_t := \ln S_t$, the price dynamics become

$$s_{t+1} = s_t + \mu + \sigma \xi_{t+1}$$

Since $s_0$ is normal and $\xi_1$ is normal and IID, we see that $s_1$ is normally distributed.
Continuing in this way shows that 𝑠𝑛 is normally distributed.
Hence 𝑆𝑛 = exp(𝑠𝑛 ) is lognormal.
The simple dynamic model we studied above is convenient, since we can work out the distribution of 𝑆𝑛 .
However, its predictions are counterfactual because, in the real world, volatility (measured by $\sigma$) is not stationary.

Instead, it changes over time, sometimes high (like during the GFC) and sometimes low.
In terms of our model above, this means that 𝜎 should not be constant.
In the more realistic model below, volatility varies over time:

$$\ln \frac{S_{t+1}}{S_t} = \mu + \sigma_t \xi_{t+1} \qquad (17.1)$$

where $\sigma_t = \exp(h_t)$ and $h_{t+1} = \rho h_t + \nu \eta_{t+1}$, with $\{\eta_t\}$ IID and standard normal. We use the following defaults:
μ = 0.0001
ρ = 0.1
ν = 0.001
S0 = 10
h0 = 0
(Here S0 is 𝑆0 and h0 is ℎ0 .)
For the option we use the following defaults.
K = 100
n = 10
β = 0.95
17.4.5 Visualizations

Let's plot a few simulated paths of the asset price.
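The path simulator is only fragmentarily preserved above; a sketch consistent with model (17.1) is (the function name is an assumption):

# Simulate one path of S_t under the stochastic volatility model (a sketch)
def simulate_asset_price_path(μ=μ, S0=S0, h0=h0, n=n, ρ=ρ, ν=ν):
    s = np.empty(n+1)
    s[0] = np.log(S0)
    h = h0
    for t in range(n):
        s[t+1] = s[t] + μ + np.exp(h) * randn()
        h = ρ * h + ν * randn()
    return np.exp(s)

fig, ax = plt.subplots()
for _ in range(4):               # a few sample paths
    ax.plot(simulate_asset_price_path())
ax.set_xlabel('$t$')
ax.set_ylabel('$S_t$')
fig.tight_layout()
plt.show()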
Now that our model is more complicated, we cannot easily determine the distribution of 𝑆𝑛 .
So to compute the price 𝑃 of the option, we use Monte Carlo.
We average over realizations $S_n^1, \ldots, S_n^M$ of $S_n$, appealing to the law of large numbers:

$$\mathbb{E} \max\{S_n - K, 0\} \approx \frac{1}{M} \sum_{m=1}^M \max\{S_n^m - K, 0\}$$
def compute_call_price(β=β,
μ=μ,
S0=S0,
h0=h0,
K=K,
n=n,
ρ=ρ,
ν=ν,
M=10_000):
current_sum = 0.0
# For each sample path
for m in range(M):
s = np.log(S0)
h = h0
# Simulate forward in time
for t in range(n):
s = s + μ + np.exp(h) * randn()
h = ρ * h + ν * randn()
        # And add the value max{S_n - K, 0} to current_sum
        current_sum += np.maximum(np.exp(s) - K, 0)

    # Return the discounted Monte Carlo estimate (final line not preserved above)
    return β**n * current_sum / M
%%time
compute_call_price()
869.8728500140572
17.5 Exercises
Exercise 17.5.1
We would like to increase 𝑀 in the code above to make the calculation more accurate.
But this is problematic because Python loops are slow.
Your task is to write a faster version of this code using NumPy.
def compute_call_price(β=β,
μ=μ,
S0=S0,
h0=h0,
K=K,
n=n,
ρ=ρ,
ν=ν,
M=10_000):
s = np.full(M, np.log(S0))
h = np.full(M, h0)
for t in range(n):
Z = np.random.randn(2, M)
s = s + μ + np.exp(h) * Z[0, :]
h = ρ * h + ν * Z[1, :]
    expectation = np.mean(np.maximum(np.exp(s) - K, 0))
    return β**n * expectation  # return statement not preserved above
%%time
compute_call_price()
1148.8594136849188
Notice that this version is faster than the one using a Python loop.
Now let’s try with larger 𝑀 to get a more accurate calculation.
%%time
compute_call_price(M=10_000_000)
897.2171827573278
Exercise 17.5.2
Consider that a European call option may be written on an underlying with spot price of $100 and a knockout barrier of
$120.
This option behaves in every way like a vanilla European call, except if the spot price ever moves above $120, the option
“knocks out” and the contract is null and void.
Note that the option does not reactivate if the spot price falls below $120 again.
Use the dynamics defined in (17.1) to price the European call option.
μ = 0.0001
ρ = 0.1
ν = 0.001
S0 = 10
h0 = 0
K = 100
n = 10
β = 0.95
bp = 120
def compute_call_price_with_barrier(β=β,
μ=μ,
S0=S0,
h0=h0,
K=K,
n=n,
ρ=ρ,
ν=ν,
bp=bp,
M=50_000):
current_sum = 0.0
# For each sample path
for m in range(M):
s = np.log(S0)
h = h0
payoff = 0
option_is_null = False
# Simulate forward in time
for t in range(n):
s = s + μ + np.exp(h) * randn()
h = ρ * h + ν * randn()
if np.exp(s) > bp:
payoff = 0
option_is_null = True
break
if not option_is_null:
payoff = np.maximum(np.exp(s) - K, 0)
        # And add the payoff to current_sum
        current_sum += payoff

    # Return the discounted average payoff (final line not preserved above)
    return β**n * current_sum / M
%time compute_call_price_with_barrier()
0.044558291514763024
Let’s look at the vectorized version which is faster than using Python loops.
def compute_call_price_with_barrier_vector(β=β,
μ=μ,
S0=S0,
h0=h0,
K=K,
n=n,
ρ=ρ,
ν=ν,
bp=bp,
M=50_000):
s = np.full(M, np.log(S0))
h = np.full(M, h0)
option_is_null = np.full(M, False)
for t in range(n):
Z = np.random.randn(2, M)
s = s + μ + np.exp(h) * Z[0, :]
h = ρ * h + ν * Z[1, :]
        # Mark all the options null where S_n > barrier price
        option_is_null = np.where(np.exp(s) > bp, True, option_is_null)

    # Zero payoff for knocked-out paths, vanilla payoff otherwise
    # (closing lines not preserved above)
    payoffs = np.where(option_is_null, 0, np.maximum(np.exp(s) - K, 0))
    return β**n * np.mean(payoffs)
%time compute_call_price_with_barrier_vector()
0.03892917960975804
EIGHTEEN
HEAVY-TAILED DISTRIBUTIONS
Contents
• Heavy-Tailed Distributions
– Overview
– Visual comparisons
– Heavy tails in economic cross-sections
– Failure of the LLN
– Why do heavy tails matter?
– Classifying tail properties
– Further reading
– Exercises
In addition to what’s in Anaconda, this lecture will need the following libraries:
18.1 Overview
Most commonly used probability distributions in classical statistics and the natural sciences have “light tails.”
To explain this concept, let’s look first at examples.
The classic example is the normal distribution, which has density
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \qquad (-\infty < x < \infty)$$
The two parameters 𝜇 and 𝜎 are the mean and standard deviation respectively.
As 𝑥 deviates from 𝜇, the value of 𝑓(𝑥) goes to zero extremely quickly.
We can see this when we plot the density and show a histogram of observations, as with the following code (which assumes
𝜇 = 0 and 𝜎 = 1).
fig, ax = plt.subplots()
X = norm.rvs(size=1_000_000)
ax.hist(X, bins=40, alpha=0.4, label='histogram', density=True)
x_grid = np.linspace(-4, 4, 400)
ax.plot(x_grid, norm.pdf(x_grid), label='density')
ax.legend()
plt.show()
Notice how
• the density’s tails converge quickly to zero in both directions and
• even with 1,000,000 draws, we get no very large or very small observations.
We can see the last point more clearly by executing
X.min(), X.max()
(-4.878995200091626, 5.777536110294029)
n = 2000
fig, ax = plt.subplots()
data = norm.rvs(size=n)
ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_ylim(-15, 15)
ax.set_xlabel('$i$')
ax.set_ylabel('$X_i$', rotation=0)
plt.show()
In probability theory and in the real world, many distributions are light-tailed.
For example, human height is light-tailed.
Yes, it’s true that we see some very tall people.
• For example, basketballer Sun Mingming is 2.32 meters tall
But have you ever heard of someone who is 20 meters tall? Or 200? Or 2000?
Have you ever wondered why not?
After all, there are 8 billion people in the world!
In essence, the reason we don’t see such draws is that the distribution of human height has very light tails.
In fact the distribution of human height obeys a bell-shaped curve similar to the normal distribution.
Now let's look at some asset returns data. (Figure: returns plotted over time; the download-and-plot cell is only partially preserved.)
This data looks different to the draws from the normal distribution we saw above.
Several of the observations are quite extreme.
We get a similar picture if we look at other assets, such as Bitcoin
(Figure: Bitcoin returns plotted over time; the plotting cell is not preserved.)
The histogram also looks different to the histogram of the normal distribution:
fig, ax = plt.subplots()
ax.hist(r, bins=60, alpha=0.4, label='bitcoin returns', density=True)
ax.set_xlabel('returns', fontsize=12)
plt.show()
If we look at higher frequency returns data (e.g., tick-by-tick), we often see even more extreme observations.
See, for example, [Man63] or [Rac03].
Heavy tails are common in economic data but does that mean they are important?
The answer to this question is affirmative!
When distributions are heavy-tailed, we need to think carefully about issues like
• diversification and risk
• forecasting
• taxation (across a heavy-tailed income distribution), etc.
We return to these points below.
Later we will provide a mathematical definition of the difference between light and heavy tails.
But for now let’s do some visual comparisons to help us build intuition on the difference between these two types of
distributions.
18.2.1 Simulations
n = 120
np.random.seed(11)
for ax in axes:
ax.set_ylim((-120, 120))
s_vals = 2, 12
ax = axes[2]
distribution = cauchy()
data = distribution.rvs(n)
ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title(f"draws from the Cauchy distribution", fontsize=11)
plt.subplots_adjust(hspace=0.25)
plt.show()
In the top subfigure, the standard deviation of the normal distribution is 2, and the draws are clustered around the mean.
In the middle subfigure, the standard deviation is increased to 12 and, as expected, the amount of dispersion rises.
The bottom subfigure, with the Cauchy draws, shows a different pattern: tight clustering around the mean for the great
majority of observations, combined with a few sudden large deviations from the mean.
This is typical of a heavy-tailed distribution.
n = 120
np.random.seed(11)
fig, ax = plt.subplots()
ax.set_ylim((0, 50))
data = np.random.exponential(size=n)
ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, data, lw=0.2)
plt.show()
If $X$ has the Pareto distribution, then there are positive constants $\bar x$ and $\alpha$ such that

$$
\mathbb{P}\{X > x\} =
\begin{cases}
(\bar x / x)^{\alpha} & \text{if } x \geq \bar x \\
1 & \text{if } x < \bar x
\end{cases}
\qquad (18.1)
$$
The parameter 𝛼 is called the tail index and 𝑥̄ is called the minimum.
The Pareto distribution is a heavy-tailed distribution.
One way that the Pareto distribution arises is as the exponential of an exponential random variable.

In particular, if $X$ is exponentially distributed with rate parameter $\alpha$, then

$$Y = \bar x \exp(X)$$

has the Pareto distribution (18.1) with minimum $\bar x$ and tail index $\alpha$.
n = 120
np.random.seed(11)
fig, ax = plt.subplots()
ax.set_ylim((0, 80))
exponential_data = np.random.exponential(size=n)
pareto_data = np.exp(exponential_data)
ax.plot(list(range(n)), pareto_data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, pareto_data, lw=0.2)
plt.show()
For nonnegative random variables, one way to visualize the difference between light and heavy tails is to look at the counter CDF (CCDF).

For a random variable $X$ with CDF $F$, the CCDF is the function

$$G(x) := 1 - F(x) = \mathbb{P}\{X > x\}$$

For the exponential distribution and the standard Pareto distribution ($\bar x = 1$), the CCDFs are, respectively,

$$G_E(x) = \exp(-\alpha x) \quad \text{and} \quad G_P(x) = x^{-\alpha}$$
Exercise 18.2.1
Show how the CCDF of the standard Pareto distribution can be derived from the CCDF of the exponential distribution.
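One derivation, sketched: if $X$ is exponential with rate $\alpha$ and $Y = \exp(X)$ (so $\bar x = 1$), then for $y \geq 1$,

$$\mathbb{P}\{Y > y\} = \mathbb{P}\{X > \ln y\} = G_E(\ln y) = \exp(-\alpha \ln y) = y^{-\alpha} = G_P(y)$$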
Here’s a log-log plot of the same functions, which makes visual comparison easier.
fig, ax = plt.subplots()
alpha = 1.0
ax.loglog(x, np.exp(- alpha * x), label='exponential', alpha=0.8)
ax.loglog(x, x**(- alpha), label='Pareto', alpha=0.8)
ax.legend()
plt.show()
In the log-log plot, the Pareto CCDF is linear, while the exponential one is concave.
This idea is often used to separate light- and heavy-tailed distributions in visualisations — we return to this point below.
Given a sample $x_1, \ldots, x_n$, the empirical CCDF is

$$\hat G(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{x_i > x\}$$

Thus, $\hat G(x)$ shows the fraction of the sample that exceeds $x$.
Here’s a figure containing some empirical CCDFs from simulated data.
data_1 = np.random.exponential(size=sample_size)
data_2 = np.exp(z)
data_3 = np.exp(np.random.exponential(size=sample_size))
ax.legend()
fig.subplots_adjust(hspace=0.4)
plt.show()
As with the CCDF, the empirical CCDF from the Pareto distributions is approximately linear in a log-log plot.
We will use this idea below when we look at real data.
One specific class of heavy-tailed distributions has been found repeatedly in economic and social phenomena: the class
of so-called power laws.
A random variable $X$ is said to have a power law if, for some $\alpha > 0$,

$$\mathbb{P}\{X > x\} \approx x^{-\alpha} \quad \text{when } x \text{ is large}$$

It is also common to say that a random variable $X$ with this property has a Pareto tail with tail index $\alpha$.
Notice that every Pareto distribution with tail index 𝛼 has a Pareto tail with tail index 𝛼.
We can think of power laws as a generalization of Pareto distributions.
They are distributions that resemble Pareto distributions in their upper right tail.
Another way to think of power laws is a set of distributions with a specific kind of (very) heavy tail.
Here is a plot of the firm size distribution for the largest 500 firms in 2020 taken from Forbes Global 2000.
Here are plots of the city size distribution for the US and Brazil in 2023 from world population review.
The size is measured by population.
18.3.3 Wealth
Here is a plot of the upper tail (top 500) of the wealth distribution.
The data is from the Forbes Billionaires list in 2020.
18.3.4 GDP
The plot is concave rather than linear, so the distribution has light tails.
One reason is that this is data on an aggregate variable, which involves some averaging in its definition.
Averaging tends to eliminate extreme outcomes.
One impact of heavy tails is that sample averages can be poor estimators of the underlying mean of the distribution.
To understand this point better, recall our earlier discussion of the Law of Large Numbers, which considered IID
𝑋1 , … , 𝑋𝑛 with common distribution 𝐹
If $\mathbb{E}|X_i|$ is finite, then the sample mean $\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$ satisfies

$$\mathbb{P}\left\{ \bar X_n \to \mu \text{ as } n \to \infty \right\} = 1 \qquad (18.3)$$
np.random.seed(1234)
N = 1_000
distribution = cauchy()

fig, ax = plt.subplots()
data = distribution.rvs(N)
# Running sample mean (computation not preserved above)
sample_mean = np.cumsum(data) / np.arange(1, N+1)

# Plot
ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar{X}_n$')
plt.show()
18.5.1 Diversification
One of the most important ideas in investing is using diversification to reduce risk.
This is a very old idea — consider, for example, the expression “don’t put all your eggs in one basket”.
To illustrate, consider an investor with one dollar of wealth and a choice over 𝑛 assets with payoffs 𝑋1 , … , 𝑋𝑛 .
Suppose that returns on distinct assets are independent and each return has mean 𝜇 and variance 𝜎2 .
If the investor puts all wealth in one asset, say, then the expected payoff of the portfolio is 𝜇 and the variance is 𝜎2 .
If instead the investor puts share $1/n$ of her wealth in each asset, then the portfolio payoff is

$$Y_n = \sum_{i=1}^n \frac{X_i}{n} = \frac{1}{n} \sum_{i=1}^n X_i.$$

The mean of $Y_n$ is again $\mu$, while its variance falls to $\sigma^2 / n$, so diversification reduces risk; this logic relies on finite variance, however, and breaks down when tails are heavy enough to make the variance infinite.
The heaviness of the tail in the wealth distribution matters for taxation and redistribution policies.
The same is true for the income distribution.
For example, the heaviness of the tail of the income distribution helps determine how much revenue a given tax policy
will raise.
Up until now we have discussed light and heavy tails without any mathematical definitions.
Let’s now rectify this.
We will focus our attention on the right hand tails of nonnegative random variables and their distributions.
The definitions for left hand tails are very similar and we omit them to simplify the exposition.
For example, if $X$ is exponentially distributed with rate $\lambda$, then its moment generating function is

$$m(t) = \frac{\lambda}{\lambda - t} \quad \text{when } t < \lambda$$

In particular, $m(t)$ is finite whenever $t < \lambda$, so $X$ is light-tailed.
One can show that if 𝑋 is light-tailed, then all of its moments are finite.
Conversely, if some moment is infinite, then 𝑋 is heavy-tailed.
The latter condition is not necessary, however.
For example, the lognormal distribution is heavy-tailed but every moment is finite.
For more on heavy tails in the wealth distribution, see e.g., [Vil96] and [BB18].
For more on heavy tails in the firm size distribution, see e.g., [Axt01], [Gab16].
For more on heavy tails in the city size distribution, see e.g., [RRGM11], [Gab16].
There are other important implications of heavy tails, aside from those discussed above.
For example, heavy tails in income and wealth affect productivity growth, business cycles, and political economy.
For further reading, see, for example, [AR02], [GSS03], [BEGS18] or [AKM+18].
18.8 Exercises
Exercise 18.8.1
Prove: If $X$ has a Pareto tail with tail index $\alpha$, then $\mathbb{E}[X^r] = \infty$ for all $r \geq \alpha$.

By the definition of a Pareto tail, there exist positive constants $b$ and $\bar x$ such that $\mathbb{P}\{X > x\} \geq b x^{-\alpha}$ whenever $x \geq \bar x$. But then

$$\mathbb{E}X^r = r \int_0^{\infty} x^{r-1} \mathbb{P}\{X > x\}\, dx \geq r \int_0^{\bar x} x^{r-1} \mathbb{P}\{X > x\}\, dx + r \int_{\bar x}^{\infty} x^{r-1} b x^{-\alpha}\, dx.$$

We know that $\int_{\bar x}^{\infty} x^{r - \alpha - 1}\, dx = \infty$ whenever $r - \alpha - 1 \geq -1$.

Since $r \geq \alpha$, we have $\mathbb{E}X^r = \infty$.
Exercise 18.8.2
Repeat exercise 1, but replace the three distributions (two normal, one Cauchy) with three Pareto distributions using
different choices of 𝛼.
For 𝛼, try 1.15, 1.5 and 1.75.
Use np.random.seed(11) to set the seed.
np.random.seed(11)
n = 120
alphas = [1.15, 1.50, 1.75]
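The plotting loop is not preserved above; a sketch using inverse-transform sampling for standard Pareto draws is:

fig, axes = plt.subplots(3, 1, figsize=(6, 8))
for α, ax in zip(alphas, axes):
    u = np.random.uniform(size=n)
    data = u**(-1/α)        # standard Pareto draws with minimum 1
    ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.vlines(list(range(n)), 0, data, lw=0.2)
    ax.set_title(fr'Pareto draws with $\alpha = {α}$', fontsize=11)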
plt.subplots_adjust(hspace=0.4)
plt.show()
Exercise 18.8.3
There is an ongoing argument about whether the firm size distribution should be modeled as a Pareto distribution or a
lognormal distribution (see, e.g., [FDGA+04], [KLS18] or [ST19]).
This sounds esoteric but has real implications for a variety of economic phenomena.
To illustrate this fact in a simple way, let us consider an economy with 100,000 firms, an interest rate of r = 0.05, and a corporate tax rate of 15% levied on annual profits over a ten-year horizon:
num_firms = 100_000
num_years = 10
tax_rate = 0.15
r = 0.05
β = 1 / (1 + r) # discount factor
x_bar = 1.0
α = 1.05
def pareto_rvs(n):
"Uses a standard method to generate Pareto draws."
u = np.random.uniform(size=n)
y = x_bar / (u**(1/α))
return y
μ = np.log(2) / α
σ_sq = 2 * (np.log(α/(α - 1)) - np.log(2)/α)
σ = np.sqrt(σ_sq)
Here’s a function to compute a single estimate of tax revenue for a particular choice of distribution dist.
def tax_rev(dist):
    tax_raised = 0
    for t in range(num_years):
        if dist == 'pareto':
            π = pareto_rvs(num_firms)
        else:
            π = np.exp(μ + σ * np.random.randn(num_firms))
        tax_raised += β**t * np.sum(π * tax_rate)
    return tax_raised
num_reps = 100
np.random.seed(1234)

tax_rev_lognorm = np.empty(num_reps)
tax_rev_pareto = np.empty(num_reps)

for i in range(num_reps):
    tax_rev_pareto[i] = tax_rev('pareto')
    tax_rev_lognorm[i] = tax_rev('lognorm')

data = tax_rev_pareto, tax_rev_lognorm   # samples for the violin plot

fig, ax = plt.subplots()
ax.violinplot(data)
plt.show()
tax_rev_pareto.mean(), tax_rev_pareto.std()
(1458729.0546623734, 406089.3613661567)
tax_rev_lognorm.mean(), tax_rev_lognorm.std()
(2556174.8615230713, 25586.444565139616)
Looking at the output of the code, our main conclusion is that the Pareto assumption leads to a lower mean and greater
dispersion.
Exercise 18.8.4
The characteristic function of the Cauchy distribution is

$$\phi(t) = \mathbb{E} e^{itX} = e^{-|t|}$$
Prove that the sample mean 𝑋̄ 𝑛 of 𝑛 independent draws 𝑋1 , … , 𝑋𝑛 from the Cauchy distribution has the same charac-
teristic function as 𝑋1 .
(This means that the sample mean never converges.)
Writing $\bar{X}_n$ for the sample mean and using independence,

$$\mathbb{E} e^{it \bar{X}_n} = \mathbb{E} \exp\left\{ i \frac{t}{n} \sum_{j=1}^n X_j \right\} = \mathbb{E} \prod_{j=1}^n \exp\left\{ i \frac{t}{n} X_j \right\} = \prod_{j=1}^n \mathbb{E} \exp\left\{ i \frac{t}{n} X_j \right\} = \left[ \phi(t/n) \right]^n$$

In view of the expression for $\phi$ above, this equals $[e^{-|t|/n}]^n = e^{-|t|} = \phi(t)$. Thus, the sample mean has the same characteristic function, and hence the same distribution, as $X_1$, for every $n$.
NINETEEN
RACIAL SEGREGATION
Contents
• Racial Segregation
– Outline
– The model
– Results
– Exercises
19.1 Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [Sch69].
His model studies the dynamics of racially mixed neighborhoods.
Like much of Schelling’s work, the model shows how local interactions can lead to surprising aggregate outcomes.
It studies a setting where agents (think of households) have relatively mild preference for neighbors of the same race.
For example, these agents might be comfortable with a mixed race neighborhood but uncomfortable when they feel
“surrounded” by people from a different race.
Schelling illustrated the following surprising result: in such a setting, mixed race neighborhoods are likely to be unstable,
tending to collapse over time.
In fact the model predicts strongly divided neighborhoods, with high levels of segregation.
In other words, extreme segregation outcomes arise even though people’s preferences are not particularly extreme.
These extreme outcomes happen because interactions between agents in the model (e.g., households in a city) drive self-reinforcing dynamics.
These ideas will become clearer as the lecture unfolds.
In recognition of his work on segregation and other research, Schelling was awarded the 2005 Nobel Prize in Economic
Sciences (joint with Robert Aumann).
Let’s start with some imports:
%matplotlib inline
import matplotlib.pyplot as plt
from random import uniform, seed
from math import sqrt
import numpy as np
19.2 The model

19.2.1 Set-Up
We will cover a variation of Schelling’s model that is different from the original but also easy to program and, at the same
time, captures his main idea.
Suppose we have two types of people: orange people and green people.
Assume there are 𝑛 of each type.
These agents all live on a single unit square.
Thus, the location (e.g., address) of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1.
• The set of all points (𝑥, 𝑦) satisfying 0 < 𝑥, 𝑦 < 1 is called the unit square
• Below we denote the unit square by 𝑆
19.2.2 Preferences
We will say that an agent is happy if 5 or more of her 10 nearest neighbors are of the same type.
An agent who is not happy is called unhappy.
For example,
• if an agent is orange and 5 of her 10 nearest neighbors are orange, then she is happy.
• if an agent is green and 8 of her 10 nearest neighbors are orange, then she is unhappy.
‘Nearest’ is in terms of Euclidean distance.
An important point to note is that agents are not averse to living in mixed areas.
They are perfectly happy if half of their neighbors are of the other color.
19.2.3 Behavior
Cycling through the set of all agents, each agent is given the chance to stay or move.
Each agent stays if they are happy and moves if they are unhappy.
The algorithm for moving is as follows:

1. Draw a random location in 𝑆.
2. If the agent is happy at the new location, move there.
3. Otherwise, go to step 1.
We cycle continuously through the agents, each time allowing an unhappy agent to move.
We continue to cycle until no one wishes to move.
19.3 Results

Let's now implement and run this simulation. In what follows, agents are modeled as objects with

* Data: type (orange or green) and location (a point in the unit square)
* Methods: determine whether the agent is happy given the locations of the other agents and, if unhappy, move to a new location

Here is a version of the Agent class along those lines:

class Agent:

    def __init__(self, type):
        self.type = type
        self.draw_location()

    def draw_location(self):
        self.location = uniform(0, 1), uniform(0, 1)

    def get_distance(self, other):
        "Euclidean distance between self and other agent."
        a = (self.location[0] - other.location[0])**2
        b = (self.location[1] - other.location[1])**2
        return sqrt(a + b)

    def happy(self,
              agents,                # List of other agents
              num_neighbors=10,      # Number of agents viewed as neighbors
              require_same_type=5):  # Matching neighbors required
        "True if enough of the nearest neighbors are of the same type."
        # Build a list of pairs (distance to other agent, agent)
        distances = []
        for agent in agents:
            if self != agent:
                distances.append((self.get_distance(agent), agent))
        distances.sort(key=lambda pair: pair[0])   # nearest first
        neighbors = [agent for d, agent in distances[:num_neighbors]]
        num_same_type = sum(self.type == agent.type for agent in neighbors)
        return num_same_type >= require_same_type

    def update(self, agents):
        "If unhappy, randomly choose new locations until happy."
        while not self.happy(agents):
            self.draw_location()
Here’s some code that takes a list of agents and produces a plot showing their locations on the unit square.
Orange agents are represented by orange dots and green ones are represented by green dots.
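Here is a minimal sketch of such a plotting function, consistent with how it is called below (the exact styling choices are assumptions):

def plot_distribution(agents, cycle_num):
    "Plot the distribution of agents after cycle_num rounds of the loop."
    x_vals_0, y_vals_0 = [], []
    x_vals_1, y_vals_1 = [], []
    for agent in agents:
        x, y = agent.location
        if agent.type == 0:
            x_vals_0.append(x)
            y_vals_0.append(y)
        else:
            x_vals_1.append(x)
            y_vals_1.append(y)
    fig, ax = plt.subplots()
    ax.plot(x_vals_0, y_vals_0, 'o', color='orange', markersize=8, alpha=0.8)
    ax.plot(x_vals_1, y_vals_1, 'o', color='green', markersize=8, alpha=0.8)
    ax.set_title(f'Cycle {cycle_num - 1}')
    plt.show()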
And here's some pseudocode for the main loop, where we cycle through the agents until no one wishes to move.

The pseudocode is

    plot the distribution
    while agents are still moving:
        for agent in agents:
            give agent the opportunity to move
    plot the distribution

The real code is below:

def run_simulation(num_of_type_0=600,
                   num_of_type_1=600,
                   max_iter=100_000,  # Maximum number of iterations
                   set_seed=1234):

    # Set the seed for reproducibility
    seed(set_seed)

    # Create agents of both types
    agents = [Agent(0) for _ in range(num_of_type_0)]
    agents.extend(Agent(1) for _ in range(num_of_type_1))

    # Initialize a counter and plot the initial distribution
    count = 1
    plot_distribution(agents, count)

    # Loop until no agent wishes to move
    while count <= max_iter:
        print(f'Entering loop {count}')
        count += 1
        no_one_moved = True
        for agent in agents:
            old_location = agent.location
            agent.update(agents)
            if agent.location != old_location:
                no_one_moved = False
        if no_one_moved:
            break

    # Plot the final distribution
    plot_distribution(agents, count)
run_simulation()
Entering loop 1
Entering loop 2
Entering loop 3
Entering loop 4
Entering loop 5
Entering loop 6
Entering loop 7
19.4 Exercises
Exercise 19.4.1
The object oriented style that we used for coding above is neat but harder to optimize than procedural code (i.e., code
based around functions rather than objects and methods).
Try writing a new version of the model that stores
• the locations of all agents as a 2D NumPy array of floats.
• the types of all agents as a flat NumPy array of integers.
Write functions that act on this data to update the model using logic similar to that described above.
However, implement the following two changes:
1. Agents are offered a move at random (i.e., selected randomly and given the opportunity to move).
2. After an agent has moved, flip their type with probability 0.01
The second change introduces extra randomness into the model.
(We can imagine that, every so often, an agent moves to a different city and, with small probability, is replaced by an
agent of the other type.)
Here is a sketch of one solution (the helper update_agent, which relocates agent i until happy using the array-based logic, is assumed to be defined as in the full solution):

from numpy.random import uniform, randint

n = 1000   # number of agents

def initialize_state():
    locations = uniform(size=(n, 2))
    types = randint(0, high=2, size=n)   # label zero or one
    return locations, types

def sim_random_select(max_iter=100_000, flip_prob=0.01):
    "Simulate by randomly selecting one agent at a time."
    locations, types = initialize_state()
    current_iter = 0
    while current_iter <= max_iter:
        # choose an agent at random and give them the chance to move
        i = randint(0, n)
        update_agent(i, locations, types)
        if flip_prob > 0:
            # flip agent i's type with probability flip_prob
            U = uniform()
            if U < flip_prob:
                current_type = types[i]
                types[i] = 0 if current_type == 1 else 1
        current_iter += 1
    return locations, types
When we run this we again find that mixed neighborhoods break down and segregation emerges.
Here’s a sample run.
Nonlinear Dynamics

TWENTY

THE SOLOW-SWAN GROWTH MODEL
In this lecture we review a famous model due to Robert Solow (1925–2014) and Trevor Swan (1918–1989).
The model is used to study growth over the long run.
Although the model is simple, it contains some interesting lessons.
We will use the following imports

import matplotlib.pyplot as plt
import numpy as np
$$k_{t+1} = s \frac{F(K_t, L)}{L} + (1 - \delta) k_t = s F(k_t, 1) + (1 - \delta) k_t$$

With $f(k) := F(k, 1)$, the final expression for capital dynamics is

$$k_{t+1} = g(k_t) \quad \text{where } g(k) := s f(k) + (1 - \delta) k \tag{20.1}$$
Our aim is to learn about the evolution of 𝑘𝑡 over time, given an exogenous initial capital stock 𝑘0 .
To understand the dynamics of the sequence (𝑘𝑡 )𝑡≥0 we use a 45 degree diagram.
To do so, we first need to specify the functional form for 𝑓 and assign values to the parameters.
We choose the Cobb–Douglas specification 𝑓(𝑘) = 𝐴𝑘𝛼 and set 𝐴 = 2.0, 𝛼 = 0.3, 𝑠 = 0.3 and 𝛿 = 0.4.
The function 𝑔 from (20.1) is then plotted, along with the 45 degree line.
Let's define the constants.

A, s, alpha, delta = 2, 0.3, 0.3, 0.4
xmin, xmax = 0, 4   # plotting bounds for k (values assumed)

def g(k):
    # law of motion (20.1) with Cobb-Douglas f(k) = A * k**alpha
    return A * s * k**alpha + (1 - delta) * k
def plot45(kstar=None):
    xgrid = np.linspace(xmin, xmax, 12000)

    fig, ax = plt.subplots()
    ax.set_xlim(xmin, xmax)

    ax.plot(xgrid, g(xgrid), 'b-', lw=2, alpha=0.6, label='$g$')
    ax.plot(xgrid, xgrid, 'k-', lw=1, alpha=0.7, label='$45^{\\circ}$')

    if kstar:
        fps = (kstar,)
        ax.plot(fps, fps, 'go', ms=10, alpha=0.6)   # mark the steady state

    ax.legend(loc='upper left', frameon=False)
    ax.set_xticks((0, 1, 2, 3))
    ax.set_yticks((0, 1, 2, 3))
    ax.set_xlabel('$k_t$', fontsize=12)
    ax.set_ylabel('$k_{t+1}$', fontsize=12)
    plt.show()

plot45()
Suppose, at some 𝑘𝑡 , the value 𝑔(𝑘𝑡 ) lies strictly above the 45 degree line.
Then we have 𝑘𝑡+1 = 𝑔(𝑘𝑡 ) > 𝑘𝑡 and capital per worker rises.
If 𝑔(𝑘𝑡 ) < 𝑘𝑡 then capital per worker falls.
If 𝑔(𝑘𝑡 ) = 𝑘𝑡 , then we are at a steady state and 𝑘𝑡 remains constant.
(A steady state of the model is a fixed point of the mapping 𝑔.)
From the shape of the function 𝑔 in the figure, we see that there is a unique steady state in (0, ∞).
From our graphical analysis, it appears that (𝑘𝑡 ) converges to 𝑘∗ , regardless of initial capital 𝑘0 .
This is a form of global stability.
The next figure shows three time paths for capital, from three distinct initial conditions, under the parameterization listed
above.
At this parameterization, 𝑘∗ ≈ 1.78.
Let's define the constants and three distinct initial conditions

ts_length = 20
xmin, xmax = 0, ts_length
ymin, ymax = 0, 3.5
x0 = np.array([.25, 1.25, 3.25])   # three initial conditions (values assumed)

def simulate_ts(x0_values, ts_length):

    k_star = (s * A / delta)**(1/(1-alpha))
    fig, ax = plt.subplots(figsize=[11, 5])
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)

    ts = np.zeros(ts_length)
    # simulate and plot the time series from each initial condition
    for x_init in x0_values:
        ts[0] = x_init
        for t in range(1, ts_length):
            ts[t] = g(ts[t-1])
        ax.plot(np.arange(ts_length), ts, '-o', ms=4, alpha=0.6,
                label=r'$k_0=%g$' % x_init)
    ax.plot(np.arange(ts_length), np.full(ts_length, k_star),
            alpha=0.6, color='red', label=r'$k^*$')

    ax.legend(fontsize=10)
    ax.set_xlabel(r'$t$', fontsize=14)
    ax.set_ylabel(r'$k_t$', fontsize=14)
    plt.show()

simulate_ts(x0, ts_length)
As expected, all three time paths in the figure converge to this value.
20.3 Growth in continuous time

In this section we investigate a continuous time version of the Solow–Swan growth model.
We will see how the smoothing provided by continuous time can simplify analysis.
Recall that the discrete time dynamics for capital are given by $k_{t+1} = s f(k_t) + (1 - \delta) k_t$.

A simple rearrangement gives the rate of change per unit of time:

$$k_{t+1} - k_t = s f(k_t) - \delta k_t$$

Taking the time step to zero gives the continuous time limit

$$k'_t = s f(k_t) - \delta k_t \quad \text{with } k'_t := \frac{d}{dt} k_t \tag{20.3}$$
Our aim is to learn about the evolution of 𝑘𝑡 over time, given initial stock 𝑘0 .
A steady state for (20.3) is a value 𝑘∗ at which capital is unchanging, meaning 𝑘𝑡′ = 0 or, equivalently, 𝑠𝑓(𝑘∗ ) = 𝛿𝑘∗ .
We assume 𝑓(𝑘) = 𝐴𝑘𝛼 , so 𝑘∗ solves 𝑠𝐴𝑘𝛼 = 𝛿𝑘.
The solution is the same as the discrete time case—see (20.2).
The dynamics are represented in the next figure, maintaining the parameterization we used above.
Writing 𝑘𝑡′ = 𝑔(𝑘𝑡 ) with 𝑔(𝑘) = 𝑠𝐴𝑘𝛼 − 𝛿𝑘, values of 𝑘 with 𝑔(𝑘) > 0 imply that 𝑘𝑡′ > 0, so capital is increasing.
When $g(k) < 0$, the opposite occurs. Once again, high marginal returns to savings at low levels of capital, combined with low returns at high levels of capital, yield global stability.
To see this in a figure, let's define the constants and the continuous time law of motion

def g_con(k):
    "Continuous time law of motion: k' = s A k**alpha - delta k."
    return s * A * k**alpha - delta * k

def plot_gcon(kstar=None):
    k_grid = np.linspace(0, 2.8, 10000)

    fig, ax = plt.subplots(figsize=(11, 5.5))
    ax.plot(k_grid, g_con(k_grid), label="$g(k)$")
    ax.plot(k_grid, 0 * k_grid, label="$k'=0$")

    if kstar:
        fps = (kstar,)
        ax.plot(fps, 0, 'go', ms=10, alpha=0.6)   # mark the steady state

    ax.legend(loc='lower left', fontsize=12)
    ax.set_xlabel("$k$", fontsize=10)
    ax.set_ylabel("$k'$", fontsize=10)
    ax.set_xticks((0, 1, 2, 3))
    ax.set_yticks((-0.3, 0, 0.3))
    plt.show()

plot_gcon(k_star)
This shows global stability heuristically for a fixed parameterization, but how would we show the same thing formally for
a continuum of plausible parameters?
In the discrete time case, a neat expression for 𝑘𝑡 is hard to obtain.
In continuous time the process is easier: we can obtain a relatively simple expression for 𝑘𝑡 that specifies the entire path.
The first step is to set 𝑥𝑡 ∶= 𝑘𝑡1−𝛼 , so that 𝑥′𝑡 = (1 − 𝛼)𝑘𝑡−𝛼 𝑘𝑡′ .
Substituting into $k'_t = s A k_t^\alpha - \delta k_t$ leads to the linear differential equation

$$x'_t = (1 - \alpha) s A - (1 - \alpha) \delta x_t \tag{20.4}$$

This equation has the solution

$$x_t = \left( k_0^{1-\alpha} - \frac{sA}{\delta} \right) e^{-\delta (1 - \alpha) t} + \frac{sA}{\delta}$$
(You can confirm that this function 𝑥𝑡 satisfies (20.4) by differentiating it with respect to 𝑡.)
Converting back to 𝑘𝑡 yields
$$k_t = \left[ \left( k_0^{1-\alpha} - \frac{sA}{\delta} \right) e^{-\delta (1 - \alpha) t} + \frac{sA}{\delta} \right]^{1/(1-\alpha)} \tag{20.5}$$

Since $\delta > 0$ and $\alpha \in (0, 1)$, the exponential term vanishes as $t \to \infty$, so $k_t$ converges to $(sA/\delta)^{1/(1-\alpha)} = k^*$ regardless of $k_0$, confirming global stability for all admissible parameters.
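If you want to check the parenthetical claim above symbolically, a short sympy computation will do it (the symbol names here are our own):

import sympy as sp

t, k0, s, A, δ, α = sp.symbols('t k_0 s A delta alpha', positive=True)

# the candidate solution x_t given above
x = (k0**(1 - α) - s*A/δ) * sp.exp(-δ*(1 - α)*t) + s*A/δ

# residual of the linear ODE (20.4); simplifies to zero
residual = sp.diff(x, t) - ((1 - α)*s*A - (1 - α)*δ*x)
print(sp.simplify(residual))   # 0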
20.4 Exercises
Exercise 20.4.1
Plot per capita consumption 𝑐 at the steady state, as a function of the savings rate 𝑠, where 0 ≤ 𝑠 ≤ 1.
Use the Cobb–Douglas specification 𝑓(𝑘) = 𝐴𝑘𝛼 .
Set 𝐴 = 2.0, 𝛼 = 0.3, and 𝛿 = 0.5
Also, find the approximate value of 𝑠 that maximizes 𝑐∗(𝑠) and show it in the plot.
A = 2.0
alpha = 0.3
delta = 0.5
Let's find the value of 𝑠 that maximizes 𝑐∗ using scipy.optimize.minimize_scalar. We will minimize −𝑐∗(𝑠), since minimize_scalar finds minima.
from scipy.optimize import minimize_scalar

def calc_c_star(s):
    k = ((s * A) / delta)**(1/(1 - alpha))
    return - (1 - s) * A * k**alpha

# maximize c*(s) over [0, 1] by minimizing -c*(s)
result = minimize_scalar(calc_c_star, bounds=(0, 1), method='bounded')
s_star_max = result.x
c_star_max = -result.fun

s_grid = np.linspace(0, 1, 1000)
c_star = -calc_c_star(s_grid)               # c*(s) on a grid of savings rates
x_s_max = np.array([s_star_max, s_star_max])
y_s_max = np.array([0, c_star_max])

fig, ax = plt.subplots()
fps = (c_star_max,)
ax.annotate(r'$s^*$',
            xy=(s_star_max, c_star_max),
            xycoords='data',
            xytext=(20, -50),
            textcoords='offset points',
            fontsize=12,
            arrowprops=dict(arrowstyle="->"))
ax.plot(s_grid, c_star, label=r'$c^*(s)$')
ax.plot(x_s_max, y_s_max, alpha=0.5, ls='dotted')
ax.set_xlabel(r'$s$')
ax.set_ylabel(r'$c^*(s)$')
ax.legend()
plt.show()
One can also solve this analytically by differentiating $c^*(s)$ and solving $\frac{d}{ds} c^*(s) = 0$ using sympy.
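A sketch of that computation, using the parameter values above, is consistent with the printed value below:

from sympy import solve, Symbol

s_symbol = Symbol('s', real=True)
k = ((s_symbol * A) / delta)**(1/(1 - alpha))
c = (1 - s_symbol) * A * k**alpha

# solve d c*(s) / ds = 0 for s
s_star = solve(c.diff(s_symbol), s_symbol)[0]
print(f"s_star = {s_star}")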
s_star = 0.300000000000000
Incidentally, the rate of savings which maximizes steady state level of per capita consumption is called the Golden Rule
savings rate.
Exercise 20.4.2
Stochastic Productivity
To bring the Solow–Swan model closer to data, we need to think about handling random fluctuations in aggregate quan-
tities.
Among other things, this will eliminate the unrealistic prediction that per-capita output 𝑦𝑡 = 𝐴𝑘𝑡𝛼 converges to a constant
𝑦∗ ∶= 𝐴(𝑘∗ )𝛼 .
We shift to discrete time for the following discussion.
One approach is to replace constant productivity with some stochastic sequence (𝐴𝑡 )𝑡≥1 .
Dynamics are now

$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t$$
Here is one simulation, with an IID lognormal productivity sequence (the parameters mu and sig are assumptions):

mu, sig = 1, 0.2   # lognormal shock parameters (values assumed)

def lgnorm():
    return np.exp(mu + sig * np.random.randn())

def ts_plot(x_values, ts_length):
    fig, ax = plt.subplots(figsize=[11, 5])
    ts = np.zeros(ts_length)
    for x_init in x_values:
        ts[0] = x_init
        for t in range(1, ts_length):
            # k_{t+1} = s A_{t+1} k_t**alpha + (1 - delta) k_t
            ts[t] = lgnorm() * s * ts[t-1]**alpha + (1 - delta) * ts[t-1]
        ax.plot(np.arange(ts_length), ts, '-o', ms=4, alpha=0.6,
                label=r'$k_0=%g$' % x_init)
    ax.legend(loc='best', fontsize=10)
    ax.set_xlabel(r'$t$', fontsize=12)
    ax.set_ylabel(r'$k_t$', fontsize=12)
    plt.show()

ts_plot(x0, 50)
TWENTYONE

DYNAMICS IN ONE DIMENSION
21.1 Overview
In this lecture we give a quick introduction to discrete time dynamics in one dimension.
• In one-dimensional models, the state of the system is described by a single variable.
• The variable is a number (that is, a point in ℝ).
While most quantitative models have two or more state variables, the one-dimensional setting is a good place to learn the
foundations of dynamics and understand key concepts.
Let’s start with some standard imports:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
21.2 Some definitions

This section sets out the objects of interest and the kinds of properties we study.
If $g$ is a function from $A$ to $B$ and $f$ is a function from $B$ to $C$, then the composition $f \circ g$ of $f$ and $g$ is defined by

$$(f \circ g)(x) = f(g(x))$$
For example, if
• 𝐴 = 𝐵 = 𝐶 = ℝ, the set of real numbers,
• $g(x) = x^2$ and $f(x) = \sqrt{x}$, then $(f \circ g)(x) = \sqrt{x^2} = |x|$.
If $f$ is a function from $A$ to itself, then $f^2$ is the composition of $f$ with itself.

For example, if $A = (0, \infty)$, the set of positive numbers, and $f(x) = \sqrt{x}$, then

$$f^2(x) = \sqrt{\sqrt{x}} = x^{1/4}$$
A (discrete time) dynamic system is a set $S$ and a function $g$ that sends $S$ back into itself.
Examples of dynamic systems include
√
• 𝑆 = (0, 1) and 𝑔(𝑥) = 𝑥
• 𝑆 = (0, 1) and 𝑔(𝑥) = 𝑥2
• 𝑆 = ℤ (the integers) and 𝑔(𝑥) = 2𝑥
On the other hand, if $S = (-1, 1)$ and $g(x) = x + 1$, then $S$ and $g$ do not form a dynamic system, since, for example, $g(0.5) = 1.5$ lies outside $S$.

• 𝑔 does not always send points in 𝑆 back into 𝑆.
We care about dynamic systems because we can use them to study dynamics!
Given a dynamic system consisting of set $S$ and function $g$, we can create a sequence $\{x_t\}$ of points in $S$ by setting

$$x_{t+1} = g(x_t) \quad \text{with } x_0 \text{ given.} \tag{21.1}$$

This sequence $\{x_t\}$ is called the trajectory of $x_0$ under $g$.

Recalling that $g^n$ is the $n$-fold composition of $g$ with itself, we can write the trajectory more simply as

$$x_t = g^t(x_0) \quad \text{for } t \geq 0.$$
In all of what follows, we are going to assume that 𝑆 is a subset of ℝ, the real numbers.
Equation (21.1) is sometimes called a first order difference equation
• first order means dependence on only one lag (i.e., earlier states such as 𝑥𝑡−1 do not enter into (21.1)).
One simple example of a dynamic system is when $S = \mathbb{R}$ and $g(x) = a x + b$, where $a, b$ are fixed constants.

This leads to the linear difference equation

$$x_{t+1} = a x_t + b \quad \text{with } x_0 \text{ given.} \tag{21.2}$$

The trajectory of $x_0$ is

$$x_0, \quad a x_0 + b, \quad a^2 x_0 + a b + b, \quad \ldots \tag{21.3}$$

Continuing in this way, and using our knowledge of geometric series, we find that, for any $t \geq 0$,

$$x_t = a^t x_0 + b \, \frac{1 - a^t}{1 - a} \tag{21.4}$$
We have an exact expression for 𝑥𝑡 for all 𝑡 and hence a full understanding of the dynamics.
Notice in particular that if $|a| < 1$, then, by (21.4), we have

$$x_t \to \frac{b}{1 - a} \quad \text{as } t \to \infty \tag{21.5}$$

regardless of $x_0$.
This is an example of what is called global stability, a topic we return to below.
In the linear example above, we obtained an exact analytical expression for 𝑥𝑡 in terms of arbitrary 𝑡 and 𝑥0 .
This made analysis of dynamics very easy.
When models are nonlinear, however, the situation can be quite different.
For example, recall how we previously studied the law of motion for the Solow growth model, a simplified version of which is

$$k_{t+1} = s z k_t^\alpha + (1 - \delta) k_t$$
Here 𝑘 is capital stock and 𝑠, 𝑧, 𝛼, 𝛿 are positive parameters with 0 < 𝛼, 𝛿 < 1.
If you try to iterate like we did in (21.3), you will find that the algebra gets messy quickly.
Analyzing the dynamics of this model requires a different method (see below).
21.3 Stability
A steady state of the dynamic system is a point $x^*$ in $S$ with $x^* = g(x^*)$ — that is, a fixed point of $g$.

A steady state $x^*$ of the dynamic system is called globally stable if, for all $x_0 \in S$,
𝑥𝑡 = 𝑔𝑡 (𝑥0 ) → 𝑥∗ as 𝑡 → ∞
For example, in the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏 with 𝑎 ≠ 1, the steady state 𝑥∗
• is globally stable if |𝑎| < 1 and
• fails to be globally stable otherwise.
This follows directly from (21.4).
A steady state $x^*$ of the dynamic system is called locally stable if there exists an $\epsilon > 0$ such that

$$|x_0 - x^*| < \epsilon \implies x_t = g^t(x_0) \to x^* \text{ as } t \to \infty$$

Obviously every globally stable steady state is also locally stable, but the converse does not hold in general.
21.4 Graphical analysis

Let's create a 45 degree diagram for the Solow model with a fixed set of parameters.
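The figures in this section are produced by a helper function plot45(g, xmin, xmax, x0, num_arrows), consistent with the calls below; here is a minimal sketch of such a function (the styling is an assumption):

def plot45(g, xmin, xmax, x0, num_arrows=6, var='x'):
    "Plot the map g and the 45 degree line, with a cobweb starting from x0."
    xgrid = np.linspace(xmin, xmax, 200)

    fig, ax = plt.subplots()
    ax.plot(xgrid, [g(x) for x in xgrid], lw=2, alpha=0.6, label='$g$')
    ax.plot(xgrid, xgrid, 'k-', lw=1, alpha=0.7, label='$45^{\\circ}$')

    x = x0
    for i in range(num_arrows):
        # step up to the graph of g, then across to the 45 degree line
        ax.plot([x, x], [x, g(x)], 'g', lw=1, alpha=0.8)
        ax.plot([x, g(x)], [g(x), g(x)], 'g', lw=1, alpha=0.8)
        x = g(x)

    ax.set(xlim=(xmin, xmax), ylim=(xmin, xmax))
    ax.set_xlabel(f'${var}_t$')
    ax.set_ylabel(f'${var}_{{t+1}}$')
    ax.legend(frameon=False)
    plt.show()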
A, s, alpha, delta = 2, 0.3, 0.3, 0.4
xmin, xmax = 0, 4   # plotting bounds (assumed)

def g(k):
    return A * s * k**alpha + (1 - delta) * k

plot45(g, xmin, xmax, 0, num_arrows=0)

With $A$ playing the role of $z$, the model has a unique positive steady state

$$k^* = \left( \frac{s z}{\delta} \right)^{1/(1-\alpha)}$$
21.4.1 Trajectories
By the preceding discussion, in regions where 𝑔 lies above the 45 degree line, we know that the trajectory is increasing.
The next figure traces out a trajectory in such a region so we can see this more clearly.
The initial condition is 𝑘0 = 0.25.
k0 = 0.25

plot45(g, xmin, xmax, k0, num_arrows=5)
We can plot the time series of capital corresponding to the figure above as follows:
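Time series plots like the one referred to here can be produced with a small helper along the following lines (a sketch; the signature ts_plot(g, xmin, xmax, x0, ts_length) is an assumption):

def ts_plot(g, xmin, xmax, x0, ts_length=6, var='x'):
    "Plot the time series generated by iterating g from x0."
    x = np.empty(ts_length)
    x[0] = x0
    for t in range(ts_length - 1):
        x[t+1] = g(x[t])

    fig, ax = plt.subplots()
    ax.plot(range(ts_length), x, 'bo-', alpha=0.6, lw=2)
    ax.set_xlabel('$t$')
    ax.set_ylabel(f'${var}_t$')
    plt.show()

ts_plot(g, xmin, xmax, k0)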
When capital stock is higher than the unique positive steady state, we see that it declines:
k0 = 2.95

ts_plot(g, xmin, xmax, k0)
The Solow model is nonlinear but still generates very regular dynamics.

One model that generates irregular dynamics is the quadratic map

$$g(x) = 4 x (1 - x), \qquad x \in [0, 1]$$
xmin, xmax = 0, 1
g = lambda x: 4 * x * (1 - x)
x0 = 0.3
plot45(g, xmin, xmax, x0, num_arrows=0)
21.5 Exercises
Exercise 21.5.1
Consider again the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏 with 𝑎 ≠ 1.
The unique steady state is 𝑏/(1 − 𝑎).
The steady state is globally stable if |𝑎| < 1.
Try to illustrate this graphically by looking at a range of initial conditions.
What differences do you notice in the cases 𝑎 ∈ (−1, 0) and 𝑎 ∈ (0, 1)?
Use 𝑎 = 0.5 and then 𝑎 = −0.5 and study the trajectories
Set 𝑏 = 1 throughout.
a, b = 0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b
x0 = -0.5
plot45(g, xmin, xmax, x0, num_arrows=5)
Here is the corresponding time series, which converges towards the steady state.

ts_plot(g, xmin, xmax, x0, ts_length=10)

Now let's try $a = -0.5$ and observe the differences.
a, b = -0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b
x0 = -0.5
plot45(g, xmin, xmax, x0, num_arrows=5)
Here is the corresponding time series, which converges towards the steady state.

ts_plot(g, xmin, xmax, x0, ts_length=10)
Once again, we have convergence to the steady state but the nature of convergence differs.
In particular, the time series jumps from above the steady state to below it and back again.
In the current context, the series is said to exhibit damped oscillations.
TWENTYTWO

THE COBWEB MODEL
The cobweb model is a model of prices and quantities in a given market, and how they evolve over time.
22.1 Overview
The cobweb model dates back to the 1930s and, while simple, it remains significant because it shows the fundamental
importance of expectations.
To give some idea of how the model operates, and why expectations matter, imagine the following scenario.
There is a market for soy beans, say, where prices and traded quantities depend on the choices of buyers and sellers.
The buyers are represented by a demand curve — they buy more at low prices and less at high prices.
The sellers have a supply curve — they wish to sell more at high prices and less at low prices.
However, the sellers (who are farmers) need time to grow their crops.
Suppose now that the price is currently high.
Seeing this high price, and perhaps expecting that the high price will remain for some time, the farmers plant many fields
with soy beans.
Next period the resulting high supply floods the market, causing the price to drop.
Seeing this low price, the farmers now shift out of soy beans, restricting supply and causing the price to climb again.
You can imagine how these dynamics could cause cycles in prices and quantities that persist over time.
The cobweb model puts these ideas into equations so we can try to quantify them, and to study conditions under which
cycles persist (or disappear).
In this lecture, we investigate and simulate the basic model under different assumptions regarding the way that producers form expectations.
Our discussion and simulations draw on high quality lectures by Cars Hommes.
We will use the following imports.
import numpy as np
import matplotlib.pyplot as plt
22.2 History
hog_prices = [55, 57, 80, 70, 60, 65, 72, 65, 51, 49, 45, 80, 85,
78, 80, 68, 52, 65, 83, 78, 60, 62, 80, 87, 81, 70,
69, 65, 62, 85, 87, 65, 63, 75, 80, 62]
years = np.arange(1924, 1960)
fig, ax = plt.subplots()
ax.plot(years, hog_prices, '-o', ms=4, label='hog price')
ax.set_xlabel('year')
ax.set_ylabel('dollars')
ax.legend()
ax.grid()
plt.show()
Let’s return to our discussion of a hypothetical soy bean market, where price is determined by supply and demand.
We suppose that demand for soy beans is given by
𝐷(𝑝𝑡 ) = 𝑎 − 𝑏𝑝𝑡
where 𝑎, 𝑏 are nonnegative constants and 𝑝𝑡 is the spot (i.e, current market) price at time 𝑡.
(𝐷(𝑝𝑡 ) is the quantity demanded in some fixed unit, such as thousands of tons.)
Because the crop of soy beans for time $t$ is planted at $t - 1$, supply of soy beans at time $t$ depends on expected prices at time $t$, which we denote $p^e_{t-1}$.
We suppose that supply is nonlinear in expected prices, and takes the form
$$S(p^e_{t-1}) = \tanh(\lambda (p^e_{t-1} - c)) + d$$
class Market:

    def __init__(self,
                 a=8,      # demand parameter
                 b=1,      # demand parameter
                 c=6,      # supply parameter
                 d=1,      # supply parameter
                 λ=2.0):   # supply parameter
        self.a, self.b, self.c, self.d = a, b, c, d
        self.λ = λ
Equating demand, $D(p_t) = a - b p_t$, with supply and solving for the spot price gives

$$p_t = -\frac{1}{b} \left[ S(p^e_{t-1}) - a \right]$$
Finally, to complete the model, we need to describe how price expectations are formed.
We will assume that expected prices at time 𝑡 depend on past prices.
In particular, we suppose that
$$p^e_{t-1} = f(p_{t-1}, p_{t-2}) \tag{22.1}$$
To go further in our analysis we need to specify the function 𝑓; that is, how expectations are formed.
Let’s start with naive expectations, which refers to the case where producers expect the next period spot price to be
whatever the price is in the current period.
In other words,
$$p^e_{t-1} = p_{t-1} \tag{22.2}$$

Substituting this into the pricing equation above gives the price dynamics

$$p_t = -\frac{1}{b} \left[ S(p_{t-1}) - a \right] =: g(p_{t-1}) \tag{22.3}$$
Let’s try to understand how prices will evolve using a 45 degree diagram, which is a tool for studying one-dimensional
dynamics.
The function plot45 defined below helps us draw the 45 degree diagram.
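A minimal sketch of such a function, consistent with the call below, is as follows (styling details are assumptions):

def plot45(model, pmin, pmax, p0, num_arrows=0):
    "45 degree diagram for p_{t+1} = g(p_t) in the cobweb model."
    a, b, c, d, λ = model.a, model.b, model.c, model.d, model.λ
    g = lambda p: -(1/b) * (np.tanh(λ * (p - c)) + d - a)

    pgrid = np.linspace(pmin, pmax, 200)
    fig, ax = plt.subplots()
    ax.plot(pgrid, g(pgrid), lw=2, alpha=0.6, label='$g$')
    ax.plot(pgrid, pgrid, 'k-', lw=1, alpha=0.7, label='$45^{\\circ}$')

    p = p0
    for i in range(num_arrows):
        # up to the graph of g, then across to the 45 degree line
        ax.plot([p, p], [p, g(p)], 'g', lw=1, alpha=0.8)
        ax.plot([p, g(p)], [g(p), g(p)], 'g', lw=1, alpha=0.8)
        p = g(p)

    ax.set_xlabel('$p_t$')
    ax.set_ylabel('$p_{t+1}$')
    ax.legend(frameon=False)
    plt.show()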
Now we can set up a market and plot the 45 degree diagram.
m = Market()
plot45(m, 0, 9, 2, num_arrows=3)
The plot shows the function 𝑔 defined in (22.3) and the 45 degree line.
Think of 𝑝𝑡 as a value on the horizontal axis.
Since 𝑝𝑡+1 = 𝑔(𝑝𝑡 ), we use the graph of 𝑔 to see 𝑝𝑡+1 on the vertical axis.
Clearly,
• If 𝑔 lies above the 45 degree line at 𝑝𝑡 , then we have 𝑝𝑡+1 > 𝑝𝑡 .
• If 𝑔 lies below the 45 degree line at 𝑝𝑡 , then we have 𝑝𝑡+1 < 𝑝𝑡 .
• If 𝑔 hits the 45 degree line at 𝑝𝑡 , then we have 𝑝𝑡+1 = 𝑝𝑡 , so 𝑝𝑡 is a steady state.
Consider the sequence of prices starting at 𝑝0 , as shown in the figure.
We find 𝑝1 on the vertical axis and then shift it to the horizontal axis using the 45 degree line (where values on the two
axes are equal).
Then from 𝑝1 we obtain 𝑝2 and continue.
We can see the start of a cycle.
To confirm this, let’s plot a time series.
"""
(continues on next page)
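Here is a minimal sketch of a function ts_plot_price that simulates and plots the price sequence under naive expectations (details assumed):

def ts_plot_price(model, p0, ts_length=10):
    "Plot a price time series starting from p0, under naive expectations."
    a, b, c, d, λ = model.a, model.b, model.c, model.d, model.λ
    p = np.empty(ts_length)
    p[0] = p0
    for t in range(1, ts_length):
        # p_t = g(p_{t-1}), as in (22.3)
        p[t] = -(1/b) * (np.tanh(λ * (p[t-1] - c)) + d - a)

    fig, ax = plt.subplots()
    ax.plot(np.arange(ts_length), p, 'bo-', alpha=0.6, lw=2, label='$p_t$')
    ax.set_xlabel('$t$')
    ax.set_ylabel('price')
    ax.legend()
    plt.show()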
ts_plot_price(m, 4, ts_length=15)
Naive expectations are quite simple and also important in driving the cycle that we found.
What if expectations are formed in a different way?
Next we consider adaptive expectations.
This refers to the case where producers form expectations for the next period price as a weighted average of their last
guess and the current spot price.
That is,
$$p^e_{t-1} = \alpha p_{t-1} + (1 - \alpha) p^e_{t-2} \qquad (0 \leq \alpha \leq 1) \tag{22.4}$$
The function below plots price dynamics under adaptive expectations for different values of 𝛼.

def find_next_price_adaptive(model, curr_price_exp):
    "Computes p_t given the current price expectation."
    a, b, c, d, λ = model.a, model.b, model.c, model.d, model.λ
    return -(1/b) * (np.tanh(λ * (curr_price_exp - c)) + d - a)

def ts_price_plot_adaptive(model, p0, ts_length=10, α=[1.0, 0.9, 0.75]):
    fig, axs = plt.subplots(1, len(α), figsize=(12, 5))
    for i_plot, a in enumerate(α):
        pe_last = p0
        p_values = np.empty(ts_length)
        p_values[0] = p0
        for t in range(1, ts_length):
            p_values[t] = find_next_price_adaptive(model, pe_last)
            # update the expectation as a weighted average, as in (22.4)
            pe_last = a * p_values[t] + (1 - a) * pe_last
        axs[i_plot].plot(np.arange(ts_length), p_values)
        axs[i_plot].set_title(r'$\alpha={}$'.format(a))
        axs[i_plot].set_xlabel('t')
        axs[i_plot].set_ylabel('price')
    plt.show()
ts_price_plot_adaptive(m, 5, ts_length=30)
22.6 Exercises
Exercise 22.6.1
Using the default Market class and naive expectations, plot a time series simulation of supply (rather than the price).
Show, in particular, that supply also cycles.
def ts_plot_supply(model, p0, ts_length=10):
    "Simulate and plot the supply sequence, given the initial price p0."
    pe_last = p0
    s_values = np.empty(ts_length)
    for i in range(ts_length):
        # store the quantity supplied, then update the price
        s_values[i] = np.tanh(model.λ * (pe_last - model.c)) + model.d
        pe_last = -(1/model.b) * (s_values[i] - model.a)

    fig, ax = plt.subplots()
    ax.plot(np.arange(ts_length),
            s_values,
            'bo-',
            alpha=0.6,
            lw=2,
            label=r'supply')
    ax.legend(loc='best', fontsize=10)
    ax.set_xticks(np.arange(ts_length))
    ax.set_xlabel("time")
    ax.set_ylabel("quantity")
    plt.show()
m = Market()
ts_plot_supply(m, 5, 15)
Exercise 22.6.2
Backward looking average expectations
Backward looking average expectations refers to the case where producers form expectations for the next period price as
a linear combination of their last guess and the second last guess.
That is,
$$p^e_{t-1} = \alpha p_{t-1} + (1 - \alpha) p_{t-2} \tag{22.6}$$
Simulate and plot the price dynamics for 𝛼 ∈ {0.1, 0.3, 0.5, 0.8} where 𝑝0 = 1 and 𝑝1 = 2.5.
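A minimal sketch of a plotting routine for this case might look as follows (the signature matches the call below; details are assumptions):

def ts_plot_price_blae(model, p0, p1, alphas, ts_length=15):
    "Plot price dynamics under backward looking average expectations."
    fig, axs = plt.subplots(len(alphas), 1, figsize=(8, 16))
    for ax, a in zip(axs.flatten(), alphas):
        p = np.empty(ts_length)
        p[0], p[1] = p0, p1
        for t in range(2, ts_length):
            pe = a * p[t-1] + (1 - a) * p[t-2]   # expectation, as in (22.6)
            p[t] = -(1/model.b) * (np.tanh(model.λ * (pe - model.c))
                                   + model.d - model.a)
        ax.plot(np.arange(ts_length), p, 'o-', alpha=0.6,
                label=fr'$\alpha={a}$')
        ax.legend(loc='best', fontsize=10)
        ax.set_xlabel(r'$t$', fontsize=12)
        ax.set_ylabel(r'$p_t$', fontsize=12)
    plt.show()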
m = Market()
ts_plot_price_blae(m,
                    p0=1,
                    p1=2.5,
alphas=[0.1, 0.3, 0.5, 0.8],
ts_length=20)
TWENTYTHREE

THE OVERLAPPING GENERATIONS MODEL
In this lecture we study the famous overlapping generations (OLG) model, which is used by policy makers and researchers
to examine
• fiscal policy
• monetary policy
• long run growth
and many other topics.
The first rigorous version of the OLG model was developed by Paul Samuelson [Sam58].
Our aim is to gain a good understanding of a simple version of the OLG model.
23.1 Overview
The dynamics of the OLG model are quite similar to those of the Solow-Swan growth model.
At the same time, the OLG model adds an important new feature: the choice of how much to save is endogenous.
To see why this is important, suppose, for example, that we are interested in predicting the effect of a new tax on long-run
growth.
We could add a tax to the Solow-Swan model and look at the change in the steady state.
But this ignores the fact that households will change their savings and consumption behavior when they face the new tax
rate.
Such changes can substantially alter the predictions of the model.
Hence, if we care about accurate predictions, we should model the decision problems of the agents.
In particular, households in the model should decide how much to save and how much to consume, given the environment
that they face (technology, taxes, prices, etc.)
The OLG model takes up this challenge.
We will present a simple version of the OLG model that clarifies the decision problem of households and studies the
implications for long run growth.
Let’s start with some imports.
import numpy as np
from scipy import optimize
from collections import namedtuple
import matplotlib.pyplot as plt
23.2 Environment
Suppose that utility for individuals born at time $t$ takes the form

$$U_t = u(c_t) + \beta u(c_{t+1}) \tag{23.1}$$
Here
• 𝑢 ∶ ℝ+ → ℝ is called the “flow” utility function
• 𝛽 ∈ (0, 1) is the discount factor
• 𝑐𝑡 is time 𝑡 consumption of the individual born at time 𝑡
• 𝑐𝑡+1 is time 𝑡 + 1 consumption of the same individual
We assume that 𝑢 is strictly increasing.
Savings behavior is determined by the optimization problem

$$\max_{c_t, c_{t+1}} \; \{ u(c_t) + \beta u(c_{t+1}) \}$$

subject to

$$c_t + s_t \leq w_t \quad \text{and} \quad c_{t+1} \leq R_{t+1} s_t$$
Here
• 𝑠𝑡 is savings by an individual born at time 𝑡
• 𝑤𝑡 is the wage rate at time 𝑡
• 𝑅𝑡+1 is the interest rate on savings invested at time 𝑡, paid at time 𝑡 + 1
Since 𝑢 is strictly increasing, both of these constraints will hold as equalities at the maximum.
Using this fact and substituting 𝑠𝑡 from the first constraint into the second we get 𝑐𝑡+1 = 𝑅𝑡+1 (𝑤𝑡 − 𝑐𝑡 ).
The first-order condition for a maximum can be obtained by plugging 𝑐𝑡+1 into the objective function, taking the derivative
with respect to 𝑐𝑡 , and setting it to zero.
This leads to the Euler equation of the OLG model, which is

$$u'(c_t) = \beta R_{t+1} u'(R_{t+1}(w_t - c_t))$$

From the first constraint we get $c_t = w_t - s_t$, so the Euler equation can also be expressed in terms of savings:

$$u'(w_t - s_t) = \beta R_{t+1} u'(R_{t+1} s_t) \tag{23.3.4}$$
Suppose that, for each 𝑤𝑡 and 𝑅𝑡+1 , there is exactly one 𝑠𝑡 that solves (23.3.4).
Then savings can be written as a fixed function of 𝑤𝑡 and 𝑅𝑡+1 .
We write this as

$$s_t = s(w_t, R_{t+1}) \tag{23.3.5}$$
The precise form of the function 𝑠 will depend on the choice of flow utility function 𝑢.
Together, 𝑤𝑡 and 𝑅𝑡+1 represent the prices in the economy (price of labor and rental rate of capital).
Thus, (23.3.5) states the quantity of savings given prices.
In the special case 𝑢(𝑐) = log 𝑐, the Euler equation simplifies to 𝑠𝑡 = 𝛽(𝑤𝑡 − 𝑠𝑡 ).
Solving for saving, we get
$$s_t = s(w_t, R_{t+1}) = \frac{\beta}{1 + \beta} w_t \tag{23.6}$$
In this special case, savings does not depend on the interest rate.
Since the population size is normalized to 1, 𝑠𝑡 is also total savings in the economy at time 𝑡.
In our closed economy, there is no foreign investment, so net savings equals total investment, which can be understood as
supply of capital to firms.
In the next section we investigate demand for capital.
Equating supply and demand will allow us to determine equilibrium in the OLG economy.
First we describe the firm problem and then we write down an equation describing demand for capital given prices.
For each integer $t \geq 0$, output $y_t$ in period $t$ is given by the Cobb-Douglas production function

$$y_t = k_t^\alpha \ell_t^{1-\alpha}$$
Here 𝑘𝑡 is capital, ℓ𝑡 is labor, and 𝛼 is a parameter (sometimes called the “output elasticity of capital”).
The profit maximization problem of the firm is

$$\max_{k_t, \ell_t} \left\{ k_t^\alpha \ell_t^{1-\alpha} - R_t k_t - w_t \ell_t \right\}$$

The first-order conditions are obtained by taking the derivative of the objective function with respect to capital and labor respectively and setting them to zero:

$$(1 - \alpha) (k_t / \ell_t)^\alpha = w_t \quad \text{and} \quad \alpha (k_t / \ell_t)^{\alpha - 1} = R_t$$

23.4.2 Demand

Setting $\ell_t = 1$ (recall that the population size is normalized to one), the first-order conditions become

$$w_t = (1 - \alpha) k_t^\alpha \tag{23.9}$$

and

$$R_t = \alpha k_t^{\alpha - 1} \tag{23.10}$$

Inverting (23.10), demand for capital at interest rate $R$ is

$$k^d(R) := \left( \frac{\alpha}{R} \right)^{1/(1-\alpha)} \tag{23.4.5}$$
The next figure plots the supply of capital, as in (23.3.6), as well as the demand for capital, as in (23.4.5), as functions of
the interest rate 𝑅𝑡+1 .
(For the special case of log utility, supply does not depend on the interest rate, so we have a constant function.)
R_vals = np.linspace(0.3, 1)
α, β = 0.5, 0.9
w = 2.0

fig, ax = plt.subplots()

# supply of capital: with log utility, savings is β w / (1 + β), a constant
ax.plot(R_vals, np.full(len(R_vals), (β / (1 + β)) * w),
        label="supply of capital")
# demand for capital, as in (23.4.5)
ax.plot(R_vals, (α / R_vals)**(1/(1 - α)),
        label="demand for capital")

ax.set_xlabel("$R_{t+1}$")
ax.set_ylabel("$k_{t+1}$")
ax.legend()
plt.show()
23.5 Equilibrium
In equilibrium, savings at time 𝑡 equals investment at time 𝑡, which equals capital supply at time 𝑡 + 1.
Equilibrium is computed by equating these quantities, setting
$$s(w_t, R_{t+1}) = k^d(R_{t+1}) = \left( \frac{\alpha}{R_{t+1}} \right)^{1/(1-\alpha)} \tag{23.12}$$
In principle, we can now solve for the equilibrium price 𝑅𝑡+1 given 𝑤𝑡 .
(In practice, we first need to specify the function 𝑢 and hence 𝑠.)
When we solve this equation, which concerns time 𝑡 + 1 outcomes, time 𝑡 quantities are already determined, so we can
treat 𝑤𝑡 as a constant.
From equilibrium 𝑅𝑡+1 and (23.4.5), we can obtain the equilibrium quantity 𝑘𝑡+1 .
In the case of log utility, we can use (23.5.1) and (23.3.6) to obtain
$$\frac{\beta}{1 + \beta} w_t = \left( \frac{\alpha}{R_{t+1}} \right)^{1/(1-\alpha)} \tag{23.13}$$
Solving for the equilibrium interest rate gives
$$R_{t+1} = \alpha \left( \frac{\beta}{1 + \beta} w_t \right)^{\alpha - 1} \tag{23.14}$$
In Python we can compute this via

def equilibrium_R_log_utility(α, β, w):
    R = α * (β * w / (1 + β))**(α - 1)
    return R
In the case of log utility, since capital supply does not depend on the interest rate, the equilibrium quantity is fixed by
supply.
That is,
$$k_{t+1} = s(w_t, R_{t+1}) = \frac{\beta}{1 + \beta} w_t \tag{23.15}$$
Let’s redo our plot above but now inserting the equilibrium quantity and price.
R_vals = np.linspace(0.3, 1)
α, β = 0.5, 0.9
w = 2.0

fig, ax = plt.subplots()

ax.plot(R_vals, np.full(len(R_vals), (β / (1 + β)) * w),
        label="supply of capital")
ax.plot(R_vals, (α / R_vals)**(1/(1 - α)),
        label="demand for capital")

R_e = equilibrium_R_log_utility(α, β, w)
k_e = (β / (1 + β)) * w

ax.plot(R_e, k_e, 'o', c='red', ms=6, alpha=0.6)   # mark the equilibrium

ax.annotate(r'equilibrium',
            xy=(R_e, k_e),
            xycoords='data',
            xytext=(0, 60),
            textcoords='offset points',
            fontsize=12,
            arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("$R_{t+1}$")
ax.set_ylabel("$k_{t+1}$")
ax.legend()
plt.show()
23.6 Dynamics
In the case of log utility, substituting the equilibrium wage (23.9) into (23.15) gives the law of motion for capital:

$$k_{t+1} = \frac{\beta}{1 + \beta} (1 - \alpha) k_t^\alpha \tag{23.16}$$
If we iterate on this equation, we get a sequence for capital stock.
Let’s plot the 45 degree diagram of these dynamics, which we write as
$$k_{t+1} = g(k_t) \quad \text{where } g(k) := \frac{\beta}{1 + \beta} (1 - \alpha) k^\alpha$$
def k_update(k, α, β):
    return β * (1 - α) * k**α / (1 + β)

α, β = 0.5, 0.9
kmin, kmax = 0, 0.1
x = 1000
k_grid = np.linspace(kmin, kmax, x)
k_grid_next = np.empty_like(k_grid)

for i in range(x):
    k_grid_next[i] = k_update(k_grid[i], α, β)

fig, ax = plt.subplots()
ax.plot(k_grid, k_grid_next, label="$g$")
ax.plot(k_grid, k_grid, "k--", label="45 degree line")
ax.legend()
plt.show()
The diagram shows that the model has a unique positive steady state, which we denote by 𝑘∗ .
We can solve for $k^*$ by setting $k^* = g(k^*)$, or

$$k^* = \frac{\beta (1 - \alpha) (k^*)^\alpha}{1 + \beta} \tag{23.17}$$

Solving this equation yields

$$k^* = \left( \frac{\beta (1 - \alpha)}{1 + \beta} \right)^{1/(1-\alpha)}$$

We can get the steady state interest rate from (23.4.4), which yields

$$R^* = \alpha (k^*)^{\alpha - 1} = \frac{\alpha}{1 - \alpha} \cdot \frac{1 + \beta}{\beta}$$
In Python we have

k_star = (β * (1 - α) / (1 + β))**(1/(1 - α))
R_star = α * k_star**(α - 1)
The 45 degree diagram above shows that time series of capital with positive initial conditions converge to this steady state.
Let’s plot some time series that visualize this.
ts_length = 25
k_series = np.empty(ts_length)
k_series[0] = 0.02
for t in range(ts_length - 1):
k_series[t+1] = k_update(k_series[t], α, β)
fig, ax = plt.subplots()
ax.plot(k_series, label="capital series")
ax.plot(range(ts_length), np.full(ts_length, k_star), 'k--', label="$k^*$")
ax.set_ylim(0, 0.1)
ax.set_ylabel("capital")
ax.set_xlabel("$t$")
ax.legend()
plt.show()
If you experiment with different positive initial conditions, you will see that the series always converges to 𝑘∗ .
Below we also plot the gross interest rate over time.
R_series = α * k_series**(α - 1)
fig, ax = plt.subplots()
ax.plot(R_series, label="gross interest rate")
ax.plot(range(ts_length), np.full(ts_length, R_star), 'k--', label="$R^*$")
ax.set_ylim(0, 4)
ax.set_xlabel("$t$")
ax.legend()
plt.show()
The interest rate reflects the marginal product of capital, which is high when capital stock is low.
23.7 CRRA preferences

Previously we looked at the case of log utility. Now let's study the more general CRRA case, in which flow utility is $u(c) = c^{1-\gamma}/(1-\gamma)$ with $\gamma > 0$, $\gamma \neq 1$.

We store the parameters in a namedtuple (the default parameter values here are assumptions):

Model = namedtuple('Model', ['α', 'β', 'γ'])

def create_olg_model(α=0.4, β=0.9, γ=0.5):  # default values assumed
    return Model(α=α, β=β, γ=γ)

Let's also redefine the capital demand function to work with this namedtuple.
23.7.1 Supply

Under CRRA utility, savings is given by

$$s(w, R) = w \left[ 1 + \beta^{-1/\gamma} R^{(\gamma - 1)/\gamma} \right]^{-1} \tag{23.7.3}$$

Notice how, unlike the log case, savings now depends on the interest rate.
R_vals = np.linspace(0.3, 1)
model = create_olg_model()
w = 2.0

fig, ax = plt.subplots()

α, β, γ = model.α, model.β, model.γ
# savings under CRRA utility, as in (23.7.3): a function of R_{t+1}
s_vals = w / (1 + β**(-1/γ) * R_vals**((γ - 1)/γ))
ax.plot(R_vals, s_vals, label="supply of capital")
ax.plot(R_vals, (α / R_vals)**(1/(1 - α)), label="demand for capital")

ax.set_xlabel("$R_{t+1}$")
ax.set_ylabel("$k_{t+1}$")
ax.legend()
plt.show()
23.7.2 Equilibrium
Equating aggregate demand for capital (see (23.4.5)) with our new aggregate supply function yields equilibrium capital.
Thus, we set

$$w_t \left[ 1 + \beta^{-1/\gamma} R_{t+1}^{(\gamma - 1)/\gamma} \right]^{-1} = \left( \frac{R_{t+1}}{\alpha} \right)^{1/(\alpha - 1)} \tag{23.21}$$
This expression is quite complex and we cannot solve for 𝑅𝑡+1 analytically.
Combining (23.4.4) and (23.7.3) yields

$$k_{t+1} = \left[ 1 + \beta^{-1/\gamma} \left( \alpha k_{t+1}^{\alpha - 1} \right)^{(\gamma - 1)/\gamma} \right]^{-1} (1 - \alpha) k_t^\alpha \tag{23.22}$$
Again, with this equation and 𝑘𝑡 as given, we cannot solve for 𝑘𝑡+1 by pencil and paper.
In the exercise below, you will be asked to solve these equations numerically.
23.8 Exercises
Exercise 23.8.1
Solve for the dynamics of equilibrium capital stock in the CRRA case numerically using (23.7.4).
Visualize the dynamics using a 45 degree diagram.
model = create_olg_model()

def k_update(k, model):
    "Solve (23.22) for k_{t+1}, given k_t = k, using Newton's method."
    α, β, γ = model.α, model.β, model.γ
    def residual(k_prime):
        return (k_prime * (1 + β**(-1/γ) * (α * k_prime**(α - 1))**((γ - 1)/γ))
                - (1 - α) * k**α)
    return optimize.newton(residual, 0.1)

kmin, kmax = 1e-3, 0.5   # grid bounds (assumed)
x = 1000
k_grid = np.linspace(kmin, kmax, x)
k_grid_next = np.empty_like(k_grid)

for i in range(x):
    k_grid_next[i] = k_update(k_grid[i], model)

fig, ax = plt.subplots()
ax.plot(k_grid, k_grid_next, label="$g$")
ax.plot(k_grid, k_grid, "k--", label="45 degree line")
ax.legend()
plt.show()
Exercise 23.8.2
The 45 degree diagram from the last exercise shows that there is a unique positive steady state.
The positive steady state can be obtained by setting 𝑘𝑡+1 = 𝑘𝑡 = 𝑘∗ in (23.7.4), which yields
$$k^* = \frac{(1 - \alpha) (k^*)^\alpha}{1 + \beta^{-1/\gamma} \left( \alpha (k^*)^{\alpha - 1} \right)^{(\gamma - 1)/\gamma}}$$
Unlike the log preference case, the CRRA utility steady state 𝑘∗ cannot be obtained analytically.
Instead, we solve for 𝑘∗ using Newton’s method.
Here it is in Python

def solve_k_star(model):
    α, β, γ = model.α, model.β, model.γ
    def residual(k):
        R = α * k**(α - 1)
        return k * (1 + β**(-1/γ) * R**((γ - 1)/γ)) - (1 - α) * k**α
    return optimize.newton(residual, 0.2)

k_star = solve_k_star(model)
print(f"k_star = {k_star}")
k_star = 0.25788950250843484
Exercise 23.8.3
Generate three time paths for capital, from three distinct initial conditions, under the parameterization listed above.
Use initial conditions for 𝑘0 of 0.001, 1.2, 2.6 and time series length 10.
ts_length = 10
k0 = np.array([0.001, 1.2, 2.6])

fig, ax = plt.subplots()
ts = np.zeros(ts_length)

# simulate and plot the time series from each initial condition
for k_init in k0:
    ts[0] = k_init
    for t in range(1, ts_length):
        ts[t] = k_update(ts[t-1], model)
    ax.plot(np.arange(ts_length), ts, '-o', ms=4, alpha=0.6,
            label=r'$k_0=%g$' % k_init)

ax.legend()
ax.set_xlabel(r'$t$', fontsize=14)
ax.set_ylabel(r'$k_t$', fontsize=14)
plt.show()
TWENTYFOUR
COMMODITY PRICES
24.1 Outline
For more than half of all countries around the globe, commodities account for the majority of total exports.
Examples of commodities include copper, diamonds, iron ore, lithium, cotton and coffee beans.
In this lecture we give an introduction to the theory of commodity prices.
The lecture is quite advanced relative to other lectures in this series.
We need to compute an equilibrium, and that equilibrium is described by a price function.
We will solve an equation where the price function is the unknown.
This is harder than solving an equation for an unknown number, or vector.
The lecture will discuss one way to solve a “functional equation” for an unknown function.
For this lecture we need the yfinance library.
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar, brentq
from scipy.stats import beta
24.2 Data
The figure below shows the price of cotton in USD since the start of 2016.
Note: These days, goods such as basic computer chips and integrated circuits are often treated as commodities in financial
markets, being highly standardized, and, for these kinds of commodities, the word “harvest” is not appropriate.
Nonetheless, we maintain it for simplicity.
24.5 Equilibrium
In this section we define the equilibrium and discuss how to compute it.
Speculators are assumed to be risk neutral, which means that they buy the commodity whenever expected profits are
positive.
As a consequence, if expected profits are positive, then the market is not in equilibrium.
Hence, to be in equilibrium, prices must satisfy the “no-arbitrage” condition

$$\alpha \, \mathbb{E}_t \, p_{t+1} - p_t \leq 0$$

(with equality whenever the speculators' stock $I_t$ is strictly positive).

Total supply at time $t$ is the state $X_t$, while

• demand = $D(p_t) + I_t$

Thus, the market equilibrium condition is

$$X_t = D(p_t) + I_t$$
We choose 𝑝 so that these prices and quantities satisfy the equilibrium conditions above.
More precisely, we seek a 𝑝 such that (24.5.1) and (24.5.2) hold for the corresponding system (24.5.4).
To this end, suppose that there exists a function 𝑝∗ on 𝑆 satisfying
$$p^*(x) = \max \left\{ \alpha \int_0^\infty p^*(\alpha I(x) + z) \phi(z) \, dz, \; P(x) \right\} \qquad (x \in S) \tag{24.5}$$
where

$$I(x) := x - D(p^*(x)) \tag{24.6}$$
It turns out that such a 𝑝∗ will suffice, in the sense that (24.5.1) and (24.5.2) hold for the corresponding system (24.5.4).
To see this, observe first that
$$\mathbb{E}_t \, p_{t+1} = \mathbb{E}_t \, p^*(X_{t+1}) = \mathbb{E}_t \, p^*(\alpha I(X_t) + Z_{t+1}) = \int_0^\infty p^*(\alpha I(X_t) + z) \phi(z) \, dz$$
We now know that an equilibrium can be obtained by finding a function 𝑝∗ that satisfies (24.5.5).
It can be shown that, under mild conditions there is exactly one function on 𝑆 satisfying (24.5.5).
Moreover, we can compute this function using successive approximation.
This means that we start with a guess of the function and then update it using (24.5.5).
This generates a sequence of functions 𝑝1 , 𝑝2 , …
We continue until this process converges, in the sense that 𝑝𝑘 and 𝑝𝑘+1 are very close together.
Then we take the final 𝑝𝑘 that we computed as our approximation of 𝑝∗ .
To implement our update step, it is helpful if we put (24.5.5) and (24.5.6) together.
This leads us to the update rule
$$p_{k+1}(x) = \max \left\{ \alpha \int_0^\infty p_k(\alpha (x - D(p_{k+1}(x))) + z) \phi(z) \, dz, \; P(x) \right\} \tag{24.7}$$
24.6 Code
α, a, c = 0.8, 1.0, 2.0      # model parameters (values assumed)
mc_draw_size = 250
gridsize = 150
grid_max = 35
grid = np.linspace(a, grid_max, gridsize)

beta_dist = beta(5, 5)
Z = a + beta_dist.rvs(mc_draw_size) * c    # Shock observations
D = P = lambda x: 1.0 / x
tol = 1e-4
def T(p_array):

    new_p = np.empty_like(p_array)

    # Interpolate to obtain p as a function
    p = interp1d(grid,
                 p_array,
                 fill_value=(p_array[0], p_array[-1]),
                 bounds_error=False)

    # Update: at each grid point x, solve for the price satisfying (24.7)
    for i, x in enumerate(grid):
        h = lambda q: q - max(α * np.mean(p(α * (x - D(q)) + Z)), P(x))
        new_p[i] = brentq(h, 1e-8, 100)

    return new_p
fig, ax = plt.subplots()

price = P(grid)
ax.plot(grid, price, alpha=0.5, lw=1, label="inverse demand curve")

error = tol + 1
while error > tol:
    new_price = T(price)
    error = max(np.abs(new_price - price))
    price = new_price

ax.plot(grid, price, 'k-', alpha=0.5, lw=2, label=r'$p^*$')
ax.legend()
ax.set_xlabel('$x$')

plt.show()
The figure above shows the inverse demand curve 𝑃 , which is also 𝑝0 , as well as our approximation of 𝑝∗ .
Once we have an approximation of 𝑝∗ , we can simulate a time series of prices.
# Turn the price array into a price function
p_star = interp1d(grid,
                  price,
                  fill_value=(price[0], price[-1]),
                  bounds_error=False)

def carry_over(x):
    return α * (x - D(p_star(x)))

def generate_cp_ts(init=1, n=50):
    X = np.empty(n)
    X[0] = init
    for t in range(n - 1):
        Z = a + c * beta_dist.rvs()
        X[t+1] = carry_over(X[t]) + Z
    return p_star(X)
fig, ax = plt.subplots()
ax.plot(generate_cp_ts(), label="price")
ax.set_xlabel("time")
ax.legend()
plt.show()
Stochastic Dynamics

TWENTYFIVE

MARKOV CHAINS: BASIC CONCEPTS
In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon
25.1 Overview
Markov chains are a standard way to model time series with some dependence between observations.
For example,
• inflation next year depends on inflation this year
• unemployment next month depends on unemployment this month
Markov chains are one of the workhorse models of economics and finance.
The theory of Markov chains is beautiful and provides many insights into probability and dynamics.
In this introductory lecture, we will
• review some of the key ideas from the theory of Markov chains and
• show how Markov chains appear in some economic applications.
Let's start with some standard imports:

import matplotlib.pyplot as plt
import quantecon as qe
import numpy as np
25.2 Definitions and examples

In this section we provide the basic definitions and some elementary examples.
Recall that a probability mass function over 𝑛 possible outcomes is a nonnegative 𝑛-vector 𝑝 that sums to one.
For example, 𝑝 = (0.2, 0.2, 0.6) is a probability mass function over 3 outcomes.
A stochastic matrix (or Markov matrix) is an 𝑛 × 𝑛 square matrix 𝑃 such that each row of 𝑃 is a probability mass
function over 𝑛 outcomes.
In other words,
1. each element of 𝑃 is nonnegative, and
2. each row of 𝑃 sums to one
If 𝑃 is a stochastic matrix, then so is the 𝑘-th power 𝑃 𝑘 for all 𝑘 ∈ ℕ.
You will be asked to check this in one of the exercises below.
Example 1

From US unemployment data, Hamilton [Ham05] estimated the following stochastic matrix

$$P = \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$$

where the states are “normal growth” (0), “mild recession” (1) and “severe recession” (2). For example, the matrix tells us that when the state is mild recession, the probability of moving to normal growth next month is

$$\mathbb{P}\{X_{t+1} = 0 \mid X_t = 1\} = 0.145$$

In general, the entry in row $i$ and column $j$ is

$$P(i, j) = \mathbb{P}\{X_{t+1} = j \mid X_t = i\}$$
Example 2
Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state 1).
Suppose that, over a one-month period,
1. the unemployed worker finds a job with probability 𝛼 ∈ (0, 1).
2. the employed worker loses her job and becomes unemployed with probability 𝛽 ∈ (0, 1).
Given the above information, we can write out the transition probabilities in matrix form as
1−𝛼 𝛼
𝑃 =[ ] (25.1)
𝛽 1−𝛽
For example, 𝛼 is the probability that an unemployed worker becomes employed within one month.
Example 3
Imam and Temple [IT23] categorize political institutions into three types: democracy (D), autocracy (A), and an inter-
mediate state called anocracy (N).
Each institution can have two potential development regimes: collapse (C) and growth (G). This results in six possible
states: DG, DC, NG, NC, AG and AC.
Imam and Temple [IT23] estimate the following transition probabilities:
Looking at the data, we see that democracies tend to have longer-lasting growth regimes compared to autocracies (as
indicated by the lower probability of transitioning from growth to growth in autocracies).
We can also find a higher probability of transitioning from collapse to growth in democratic regimes.
So far we’ve given examples of Markov chains but now let’s define them more carefully.
To begin, let 𝑆 be a finite set {𝑥1 , … , 𝑥𝑛 } with 𝑛 elements.
The set 𝑆 is called the state space and 𝑥1 , … , 𝑥𝑛 are the state values.
A distribution 𝜓 on 𝑆 is a probability mass function of length 𝑛, where 𝜓(𝑖) is the amount of probability allocated to
state 𝑥𝑖 .
A Markov chain {𝑋𝑡 } on 𝑆 is a sequence of random variables taking values in 𝑆 that have the Markov property.
This means that, for any date $t$ and any state $y \in S$,

$$\mathbb{P}\{X_{t+1} = y \mid X_t\} = \mathbb{P}\{X_{t+1} = y \mid X_t, X_{t-1}, \ldots\} \tag{25.3}$$
In other words, knowing the current state is enough to know probabilities for the future states.
In particular, the dynamics of a Markov chain are fully determined by the set of values

$$P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \qquad (x, y \in S)$$
By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
We can view $P$ as a stochastic matrix where

$$P_{ij} = P(x_i, x_j) \qquad 1 \leq i, j \leq n$$
Going the other way, if we take a stochastic matrix 𝑃 , we can generate a Markov chain {𝑋𝑡 } as follows:
• draw 𝑋0 from a distribution 𝜓0 on 𝑆
• for each 𝑡 = 0, 1, …, draw 𝑋𝑡+1 from 𝑃 (𝑋𝑡 , ⋅)
By construction, the resulting process satisfies (25.3).
25.3 Simulation
One natural way to answer questions about Markov chains is to simulate them.
Let’s start by doing this ourselves and then look at libraries that can help us.
In these exercises, we'll take the state space to be $S = \{0, \ldots, n - 1\}$.
(We start at 0 because Python arrays are indexed from 0.)
One simple way to generate draws from a distribution is with qe.random.draw, which takes a cumulative distribution:

ψ_0 = (0.3, 0.7)           # probabilities over {0, 1} (values assumed)
cdf = np.cumsum(ψ_0)       # convert into a cumulative distribution
qe.random.draw(cdf, 5)     # generate 5 independent draws from ψ_0

array([1, 0, 1, 0, 1])
We’ll write our code as a function that accepts the following three arguments
• A stochastic matrix P.
• An initial distribution ψ_0.
• A positive integer ts_length representing the length of the time series the function should return.
def mc_sample_path(P, ψ_0=None, ts_length=1_000):

    # set up
    P = np.asarray(P)
    X = np.empty(ts_length, dtype=int)

    # convert each row of P into a cdf
    P_dist = np.cumsum(P, axis=1)

    # draw the initial state, defaulting to 0
    if ψ_0 is not None:
        X_0 = qe.random.draw(np.cumsum(ψ_0))
    else:
        X_0 = 0

    # simulate
    X[0] = X_0
    for t in range(ts_length - 1):
        X[t+1] = qe.random.draw(P_dist[X[t], :])

    return X
P = [[0.4, 0.6],
     [0.2, 0.8]]

mc_sample_path(P, ψ_0=(1.0, 0.0), ts_length=10)

array([0, 1, 1, 1, 1, 1, 1, 1, 0, 1])
It can be shown that for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25.
(We will explain why later.)
Moreover, this is true regardless of the initial distribution from which 𝑋0 is drawn.
The following code illustrates this

X = mc_sample_path(P, ψ_0=(0.1, 0.9), ts_length=1_000_000)
np.mean(X == 0)

0.250019
You can try changing the initial distribution to confirm that the output is always close to 0.25 (for the P matrix above).
mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1_000_000)
np.mean(X == 0)
0.249481
CPU times: user 18.3 ms, sys: 4.12 ms, total: 22.4 ms
Wall time: 22 ms
mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))
mc.simulate(ts_length=4, init='unemployed')
If we want to see indices rather than state values as outputs, we can use
mc.simulate_indices(ts_length=4)
array([1, 0, 1, 1])
25.4 Distributions

Suppose that
1. {𝑋𝑡 } is a Markov chain with stochastic matrix 𝑃
2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡
What then is the distribution of 𝑋𝑡+1 , or, more generally, of 𝑋𝑡+𝑚 ?
To answer this, we let 𝜓𝑡 be the distribution of 𝑋𝑡 for 𝑡 = 0, 1, 2, ….
Our first aim is to find 𝜓𝑡+1 given 𝜓𝑡 and 𝑃 .
To begin, pick any 𝑦 ∈ 𝑆.
To get the probability of being at 𝑦 tomorrow (at 𝑡+1), we account for all ways this can happen and sum their probabilities.
This leads to
$$\mathbb{P}\{X_{t+1} = y\} = \sum_{x \in S} \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \cdot \mathbb{P}\{X_t = x\}$$

Rewriting in terms of $\psi_t$ and $P$ gives $\psi_{t+1}(y) = \sum_{x \in S} P(x, y) \psi_t(x)$. Treating distributions as row vectors, these $n$ equations can be summarized as

$$\psi_{t+1} = \psi_t P \tag{25.4}$$
Applying (25.4) repeatedly ($m$ times) shifts the distribution forward $m$ periods:

$$X_0 \sim \psi_0 \implies X_m \sim \psi_0 P^m \tag{25.5}$$
The general rule is that post-multiplying a distribution by 𝑃 𝑚 shifts it forward 𝑚 units of time.
Hence the following is also valid.
𝑋𝑡 ∼ 𝜓𝑡 ⟹ 𝑋𝑡+𝑚 ∼ 𝜓𝑡 𝑃 𝑚 (25.6)
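For example, with the 2 × 2 matrix used earlier, we can shift an arbitrary distribution forward m periods like so (the choice of ψ here is our own):

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
ψ = np.array([0.7, 0.3])      # an arbitrary current distribution

# shift ψ forward m periods by post-multiplying by P**m
m = 3
print(ψ @ np.linalg.matrix_power(P, m))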
We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦).
It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is 𝑃 𝑚 (𝑥, 𝑦), the (𝑥, 𝑦)-th element of the 𝑚-th
power of 𝑃 .
To see why, consider again (25.6), but now with a 𝜓𝑡 that puts all probability on state 𝑥.
Then 𝜓𝑡 is a vector with 1 in position 𝑥 and zero elsewhere.
Inserting this into (25.6), we see that, conditional on 𝑋𝑡 = 𝑥, the distribution of 𝑋𝑡+𝑚 is the 𝑥-th row of 𝑃 𝑚 .
In particular

$$\mathbb{P}\{X_{t+m} = y \mid X_t = x\} = P^m(x, y)$$
Recall the stochastic matrix 𝑃 for recession and growth considered above.
Suppose that the current state is unknown — perhaps statistics are available only at the end of the current month.
We guess that the probability that the economy is in state 𝑥 is 𝜓𝑡 (𝑥) at time t.
The probability of being in recession (either mild or severe) in 6 months time is given by

$$(\psi_t P^6)(1) + (\psi_t P^6)(2)$$
As seen in (25.4), we can shift a distribution forward one unit of time via postmultiplication by 𝑃 .
25.5 Stationary distributions

Some distributions are invariant under this updating process — for example,
P = np.array([[0.4, 0.6],
[0.2, 0.8]])
ψ = (0.25, 0.75)
ψ @ P
array([0.25, 0.75])
Such distributions are called stationary or invariant. Formally, a distribution $\psi^*$ on $S$ is called stationary for $P$ if $\psi^* = \psi^* P$.

Theorem 25.5.1
Every stochastic matrix 𝑃 has at least one stationary distribution.
Note that there can be many stationary distributions corresponding to a given stochastic matrix 𝑃 .
• For example, if 𝑃 is the identity matrix, then all distributions on 𝑆 are stationary.
To get uniqueness, we need the Markov chain to “mix around,” so that the state doesn’t get stuck in some part of the state
space.
This gives some intuition for the following theorem.
Theorem 25.5.2
If 𝑃 is everywhere positive, then 𝑃 has exactly one stationary distribution.
We will come back to this when we introduce irreducibility in the next lecture on Markov chains.
25.5.1 Example
Recall our model of the employment/unemployment dynamics of a particular worker discussed above.
If 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then the transition matrix is everywhere positive.
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment (state 0).
Using 𝜓∗ = 𝜓∗ 𝑃 and a bit of algebra yields
$$p = \frac{\beta}{\alpha + \beta}$$
This is, in some sense, a steady state probability of unemployment.
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0.
P = [[0.4, 0.6],
[0.2, 0.8]]
mc = qe.MarkovChain(P)
mc.stationary_distributions # Show all stationary distributions
array([[0.25, 0.75]])
Theorem 25.5.3

If there exists an integer $m$ such that all entries of $P^m$ are strictly positive, then $P$ has a unique stationary distribution $\psi^*$ and

$$\psi_0 P^t \to \psi^* \quad \text{as } t \to \infty$$

for any initial distribution $\psi_0$.
Hamilton’s chain satisfies the conditions of the theorem because 𝑃 2 is everywhere positive:
Let’s pick an initial distribution 𝜓0 and trace out the sequence of distributions 𝜓0 𝑃 𝑡 for 𝑡 = 0, 1, 2, …
First, we write a function to iterate the sequence of distributions for ts_length periods

def iterate_ψ(ψ_0, P, ts_length):
    n = len(P)
    ψ_t = np.empty((ts_length, n))
    ψ_t[0] = ψ_0
    for t in range(1, ts_length):
        ψ_t[t] = ψ_t[t-1] @ P
    return ψ_t

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

ψ_0 = (0.0, 0.2, 0.8)        # an arbitrary initial distribution
ψ_t = iterate_ψ(ψ_0, P, 20)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# red dots: the trajectory ψ_0 P^t
ax.scatter(ψ_t[:, 0], ψ_t[:, 1], ψ_t[:, 2], c='r', s=60)

mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)

plt.show()
Here
• 𝑃 is the stochastic matrix for recession and growth considered above.
• The highest red dot is an arbitrarily chosen initial marginal probability distribution 𝜓0 , represented as a vector in
ℝ3 .
• The other red dots are the marginal distributions 𝜓0 𝑃 𝑡 for 𝑡 = 1, 2, ….
• The black dot is 𝜓∗ .
You might like to try experimenting with different initial conditions.
An alternative illustration
We can show this in a slightly different way by focusing on the probability that 𝜓𝑡 puts on each state.
First, we write a function to draw initial distributions 𝜓0 of size num_distributions
def generate_initial_values(num_distributions):
    n = len(P)
    ψ_0s = np.empty((num_distributions, n))
    for i in range(num_distributions):
        draws = np.random.randint(1, 10_000_000, size=n)
        # normalize the draws into a probability distribution
        ψ_0s[i] = draws / sum(draws)
    return ψ_0s
We then write a function to plot the dynamics of (𝜓0 𝑃 𝑡 )(𝑖) as 𝑡 gets large, for each state 𝑖 with different initial distributions
def plot_distribution(P, ts_length, num_distributions):
    n = len(P)
    mc = qe.MarkovChain(P)
    ψ_star = mc.stationary_distributions[0]

    fig, axes = plt.subplots(nrows=1, ncols=n, figsize=(12, 4))
    ψ_0s = generate_initial_values(num_distributions)

    # plot the path of ψ_t(i) for each starting distribution
    for ψ_0 in ψ_0s:
        ψ_t = iterate_ψ(ψ_0, P, ts_length)
        for i in range(n):
            axes[i].plot(range(ts_length), ψ_t[:, i], alpha=0.3)

    # Add labels
    for i in range(n):
        axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color='black',
                        label=fr'$\psi^*({i})$')
        axes[i].set_xlabel('t')
        axes[i].set_ylabel(fr'$\psi_t({i})$')
        axes[i].legend()

    plt.show()

plot_distribution(P, ts_length=20, num_distributions=30)
This convergence does not hold for all stochastic matrices. Consider, for example, the periodic matrix

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
P = np.array([[0, 1],
              [1, 0]])

ts_length = 20
num_distributions = 30
plot_distribution(P, ts_length, num_distributions)
Indeed, this 𝑃 fails our asymptotic stationarity condition, since, as you can verify, 𝑃 𝑡 is not everywhere positive for any
𝑡.
Sometimes we want to compute mathematical expectations of functions of a Markov chain, such as the unconditional expectation

$$\mathbb{E}[h(X_t)] \tag{25.7}$$

and the conditional expectation

$$\mathbb{E}[h(X_{t+k}) \mid X_t = x] \tag{25.8}$$
where
• {𝑋𝑡 } is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃 .
• ℎ is a given function, which, in terms of matrix algebra, we’ll think of as the column vector
$$h = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{pmatrix}.$$
Computing the unconditional expectation (25.7) is easy.
We just sum over the marginal distribution of 𝑋𝑡 to get
𝔼[ℎ(𝑋𝑡 )] = 𝜓𝑃 𝑡 ℎ
For the conditional expectation (25.8), we need to sum over the conditional distribution of 𝑋𝑡+𝑘 given 𝑋𝑡 = 𝑥.
We already know that this is $P^k(x, \cdot)$, so

$$\mathbb{E}[h(X_{t+k}) \mid X_t = x] = (P^k h)(x)$$
Sometimes we want to compute the mathematical expectation of a geometric sum, such as $\sum_t \beta^t h(X_t)$.

In view of the preceding discussion, this is

$$\mathbb{E}\left[ \sum_{j=0}^\infty \beta^j h(X_{t+j}) \,\Big|\, X_t = x \right] = h(x) + \beta (P h)(x) + \beta^2 (P^2 h)(x) + \cdots$$

By the Neumann series lemma, this sum can be calculated using

$$I + \beta P + \beta^2 P^2 + \cdots = (I - \beta P)^{-1}$$
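As a quick illustration (the choices of h and β here are arbitrary), the vector of expected discounted sums can be computed with a single linear solve:

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
h = np.array([1.0, 0.0, -1.0])   # an arbitrary function on the state space
β = 0.96

# v(x) = E[sum_j β^j h(X_{t+j}) | X_t = x], computed as (I - βP)^{-1} h
v = np.linalg.solve(np.eye(3) - β * P, h)
print(v)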
Exercise 25.6.1
Imam and Temple [IT23] used a three-state transition matrix to describe the transition of three states of a regime: growth,
stagnation, and collapse
where rows, from top to down, correspond to growth, stagnation, and collapse.
In this exercise,
1. visualize the transition matrix and show this process is asymptotically stationary
2. calculate the stationary distribution using simulations
3. visualize the dynamics of (𝜓0 𝑃 𝑡 )(𝑖) where 𝑡 ∈ 0, ..., 25 and compare the convergent path with the previous
transition matrix
Compare your solution to the paper.
Note that rows of the transition matrix converge to the stationary distribution.
P_power = np.linalg.matrix_power(P, 20)   # a high power of P (exponent arbitrary)
ψ_star_p = P_power[0]
ψ_star_p
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ψ_star
Solution 3:
We find that the distribution 𝜓 converges to the stationary distribution more quickly than for Hamilton's chain.
ts_length = 10
num_distributions = 25
plot_distribution(P, ts_length, num_distributions)
P_eigenvals = np.linalg.eigvals(P)
P_eigenvals
hamilton_eigenvals = np.linalg.eigvals(P_hamilton)
hamilton_eigenvals
More specifically, it is governed by the spectral gap, the difference between the largest and the second largest eigenvalue.
True
Exercise 25.6.2
We discussed the six-state transition matrix estimated by Imam & Temple [IT23] before.
In this exercise,
1. show this process is asymptotically stationary without simulation
2. simulate and visualize the dynamics starting with a uniform distribution across states (each state will have a prob-
ability of 1/6)
3. change the initial distribution to P(DG) = 1, while all other states have a probability of 0
np.linalg.matrix_power(P,3)
Solution 2:
We find the distribution 𝜓 converges to the stationary distribution quickly regardless of the initial distributions
ts_length = 30
num_distributions = 20
nodes = ['DG', 'DC', 'NG', 'NC', 'AG', 'AC']
plt.show()
Exercise 25.6.3
Prove the following: If 𝑃 is a stochastic matrix, then so is the 𝑘-th power 𝑃 𝑘 for all 𝑘 ∈ ℕ.
TWENTYSIX

MARKOV CHAINS: IRREDUCIBILITY AND ERGODICITY

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon
26.1 Overview
26.2 Irreducibility
We can translate this into a stochastic matrix, putting zeros where there’s no edge between nodes
$$P := \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$
It’s clear from the graph that this stochastic matrix is irreducible: we can eventually reach any state from any other state.
We can also test this using QuantEcon.py's MarkovChain class

P = [[0.9, 0.1, 0.0],
     [0.4, 0.4, 0.2],
     [0.1, 0.1, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
mc.is_irreducible

True
Here's a more pessimistic scenario in which poor people remain poor forever

$$P := \begin{pmatrix} 1.0 & 0.0 & 0.0 \\ 0.1 & 0.8 & 0.1 \\ 0.0 & 0.2 & 0.8 \end{pmatrix}$$

This stochastic matrix is not irreducible since, for example, rich is not accessible from poor.

Let's confirm this

P = [[1.0, 0.0, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
mc.is_irreducible

False
It might be clear to you already that irreducibility is going to be important in terms of long-run outcomes.
For example, poverty is a life sentence in the second graph but not the first.
We’ll come back to this a bit later.
In the previous lecture, our theorem on uniqueness of the stationary distribution required the transition matrix to be everywhere positive. In fact, irreducibility is enough for uniqueness to hold, provided a stationary distribution exists.
We can revise the theorem into the following fundamental theorem:
Theorem 26.2.1
If 𝑃 is irreducible, then 𝑃 has exactly one stationary distribution.
26.3 Ergodicity
Theorem 26.3.1
If 𝑃 is irreducible and 𝜓∗ is the unique stationary distribution, then, for all 𝑥 ∈ 𝑆,
$$\frac{1}{m} \sum_{t=1}^m \mathbb{1}\{X_t = x\} \to \psi^*(x) \quad \text{as } m \to \infty \tag{26.1}$$
Here
• {𝑋𝑡} is a Markov chain with stochastic matrix 𝑃 and initial distribution 𝜓0
• 𝟙{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise.
The result in (26.1) is sometimes called ergodicity.
The theorem tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as time goes to infinity.
This gives us another way to interpret the stationary distribution (provided irreducibility holds).
Importantly, the result is valid for any choice of 𝜓0 .
The theorem is related to the law of large numbers.

It tells us that the law of large numbers can hold even when the sequence of random variables is not IID.
26.3.1 Example 1
Recall our model of employment/unemployment dynamics. Assuming $\alpha \in (0, 1)$ and $\beta \in (0, 1)$, so that irreducibility holds, we saw that the stationary distribution puts probability

$$p = \frac{\beta}{\alpha + \beta}$$

on unemployment (state 0).
In the cross-sectional interpretation, this is the fraction of people unemployed.
In view of our latest (ergodicity) result, it is also the fraction of time that a single worker can expect to spend unemployed.
Thus, in the long run, cross-sectional averages for a population and time-series averages for a given person coincide.
This is one aspect of the concept of ergodicity.
26.3.2 Example 2
Let's simulate Hamilton's chain and compute the fraction of time spent in each state up to date $t$:

$$\hat{p}_t(x) := \frac{1}{t} \sum_{s=1}^{t} \mathbb{1}\{X_s = x\}$$
Here we compare 𝑝𝑡̂ (𝑥) with the stationary distribution 𝜓∗ (𝑥) for different starting points 𝑥0 .
P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

ts_length = 10_000
mc = qe.MarkovChain(P)
n = len(P)
fig, axes = plt.subplots(nrows=1, ncols=n, figsize=(12, 4))
ψ_star = mc.stationary_distributions[0]

for x0 in range(n):
    # simulate a path from initial state x0
    X = mc.simulate(ts_length, init=x0)
    for i in range(n):
        # fraction of time spent in state i, up to each date t
        p_hat = (X == i).cumsum() / np.arange(1, ts_length + 1)
        axes[i].plot(p_hat, alpha=0.5)

for i in range(n):
    axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color='black',
                    label=fr'$\psi^*({i})$')
    axes[i].set_xlabel('t')
    axes[i].set_ylabel(fr'$\hat p_t({i})$')
    axes[i].legend()
plt.show()
Note the convergence to the stationary distribution regardless of the starting point 𝑥0 .
26.3.3 Example 3
Let’s look at one more example with six states discussed before.
The graph for the chain shows all states are reachable, indicating that this chain is irreducible.
Here we visualize the difference between 𝑝𝑡̂ (𝑥) and the stationary distribution 𝜓∗ (𝑥) for each state 𝑥
ts_length = 10_000
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
fig, ax = plt.subplots(figsize=(9, 6))
X = mc.simulate(ts_length)
# Center the plot at 0
ax.set_ylim(-0.25, 0.25)
ax.axhline(0, linestyle='dashed', lw=2, color='black', alpha=0.4)
for x0 in range(6):
# Calculate the fraction of time for each state
p_hat = (X == x0).cumsum() / (1 + np.arange(ts_length, dtype=float))
ax.plot(p_hat - ψ_star[x0], label=f'$x = {x0+1} $')
ax.set_xlabel('t')
ax.set_ylabel(r'$\hat p_t(x) - \psi^* (x)$')
ax.legend()
plt.show()
Similar to previous examples, the sample path averages for each state converge to the stationary distribution.
26.3.4 Example 4
Let's look at a chain with stochastic matrix

$$P := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
In fact it has a periodic cycle — the state cycles between the two states in a regular way.
This is called periodicity.
It is still irreducible so ergodicity holds.
P = np.array([[0, 1],
              [1, 0]])

ts_length = 10_000
mc = qe.MarkovChain(P)
n = len(P)
fig, axes = plt.subplots(nrows=1, ncols=n)
ψ_star = mc.stationary_distributions[0]

for i in range(n):
    axes[i].set_ylim(0.45, 0.55)
    axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color='black',
                    label=fr'$\psi^*({i})$')
    axes[i].set_xlabel('t')
    axes[i].set_ylabel(fr'$\hat p_t({i})$')

    # fraction of time spent in state i, from each initial state
    for x0 in range(n):
        X = mc.simulate(ts_length, init=x0)
        p_hat = (X == i).cumsum() / np.arange(1, ts_length + 1)
        axes[i].plot(p_hat, label=fr'$x_0 = {x0}$')

    axes[i].legend()
plt.show()
This example helps to emphasize that asymptotic stationarity is about the distribution, while ergodicity is about the sample
path.
The proportion of time spent in each state can converge to the stationary distribution even with periodic chains.

However, the sequence of distributions 𝜓𝑡 does not converge in this case.
Sometimes we want to compute the mathematical expectation of a geometric sum, such as $\sum_t \beta^t h(X_t)$.

In view of the preceding discussion, this is

$$\mathbb{E}\left[\sum_{j=0}^{\infty} \beta^j h(X_{t+j}) \,\Big|\, X_t = x\right] = h(x) + \beta (Ph)(x) + \beta^2 (P^2 h)(x) + \cdots$$

By the Neumann series lemma, this sum can be calculated using

$$I + \beta P + \beta^2 P^2 + \cdots = (I - \beta P)^{-1}$$
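As a sketch (with a hypothetical transition matrix and payoff vector $h$), this expectation can be computed for every starting state at once with a single linear solve:

β = 0.96
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])   # hypothetical stochastic matrix
h = np.array([1.0, 2.0])     # hypothetical payoff h(x) at each state

# (I - βP)^{-1} h gives the expected discounted sum for each state x
expected_sum = np.linalg.solve(np.eye(2) - β * P, h)
print(expected_sum)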
26.4 Exercises
Exercise 26.4.1
Benhabib et al. [BBL19] estimated the following transition matrix for social mobility:
P = [
[0.222, 0.222, 0.215, 0.187, 0.081, 0.038, 0.029, 0.006],
[0.221, 0.22, 0.215, 0.188, 0.082, 0.039, 0.029, 0.006],
[0.207, 0.209, 0.21, 0.194, 0.09, 0.046, 0.036, 0.008],
[0.198, 0.201, 0.207, 0.198, 0.095, 0.052, 0.04, 0.009],
[0.175, 0.178, 0.197, 0.207, 0.11, 0.067, 0.054, 0.012],
[0.182, 0.184, 0.2, 0.205, 0.106, 0.062, 0.05, 0.011],
[0.123, 0.125, 0.166, 0.216, 0.141, 0.114, 0.094, 0.021],
[0.084, 0.084, 0.142, 0.228, 0.17, 0.143, 0.121, 0.028]
]
P = np.array(P)
codes_B = ('1','2','3','4','5','6','7','8')
In this exercise,
1. show this process is asymptotically stationary and calculate the stationary distribution using simulations.
2. use simulations to demonstrate ergodicity of this process.
Solution 1:

P = np.array(P)
codes_B = ('1','2','3','4','5','6','7','8')
np.linalg.matrix_power(P, 10)
We find again that the rows of the transition matrix converge to the stationary distribution.
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ψ_star
Solution 2:
ts_length = 1000
mc = qe.MarkovChain(P)
fig, ax = plt.subplots(figsize=(9, 6))
X = mc.simulate(ts_length)
ax.set_ylim(-0.25, 0.25)
ax.axhline(0, linestyle='dashed', lw=2, color='black', alpha=0.4)
for x0 in range(8):
    # Calculate the fraction of time spent in each state
    p_hat = (X == x0).cumsum() / (1 + np.arange(ts_length, dtype=float))
    ax.plot(p_hat - ψ_star[x0], label=f'$x = {x0+1} $')
ax.set_xlabel('t')
ax.set_ylabel(r'$\hat p_t(x) - \psi^* (x)$')
ax.legend()
plt.show()
Note that the fraction of time spent at each state quickly converges to the probability assigned to that state by the stationary
distribution.
Exercise 26.4.2
According to the discussion above, if a worker’s employment dynamics obey the stochastic matrix
$$P := \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix}$$
with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long run, the fraction of time spent unemployed will be
$$p := \frac{\beta}{\alpha + \beta}$$
In other words, if {𝑋𝑡 } represents the Markov chain for employment, then 𝑋̄ 𝑚 → 𝑝 as 𝑚 → ∞, where
$$\bar X_m := \frac{1}{m} \sum_{t=1}^{m} \mathbb{1}\{X_t = 0\}$$
This exercise asks you to illustrate convergence by computing 𝑋̄ 𝑚 for large 𝑚 and checking that it is close to 𝑝.
You will see that this statement is true regardless of the choice of initial condition or the values of 𝛼, 𝛽, provided both lie
in (0, 1).
The result should be similar to the figure plotted above.
α = β = 0.1
ts_length = 10_000
p = β / (α + β)
# reconstructed: simulate the chain and plot the sample average of 1{X_t = 0}
mc = qe.MarkovChain(np.array([[1 - α, α], [β, 1 - β]]))
X_bar = (mc.simulate(ts_length) == 0).cumsum() / (1 + np.arange(ts_length))
plt.plot(X_bar, label=r'$\bar X_m$')
plt.axhline(p, linestyle='dashed', color='black', label='$p$')
plt.legend(loc='upper right')
plt.show()
Exercise 26.4.3
In the quantecon library, irreducibility is tested by checking whether the chain forms a strongly connected component.

However, another way to verify irreducibility is by checking whether $A$ satisfies the following statement:

Assume $A$ is an $n \times n$ square matrix; then $A$ is irreducible if and only if $\sum_{k=0}^{n-1} A^k$ is a positive matrix.
(see more: [Zha12] and here)
Based on this claim, write a function to test irreducibility.
def is_irreducible(P):
    n = len(P)
    result = np.zeros((n, n))
    for i in range(n):
        result += np.linalg.matrix_power(P, i)
    return np.all(result > 0)
P1 = np.array([[0, 1],
[1, 0]])
P2 = np.array([[1.0, 0.0, 0.0],
[0.1, 0.8, 0.1],
[0.0, 0.2, 0.8]])
P3 = np.array([[0.971, 0.029, 0.000],
[0.145, 0.778, 0.077],
[0.000, 0.508, 0.492]])
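The loop that produces the output below is omitted in this extract; a sketch consistent with that output is:

for P in (P1, P2, P3):
    result = 'irreducible' if is_irreducible(P) else 'reducible'
    print(f'{P}: {result}')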
[[0 1]
[1 0]]: irreducible
[[1. 0. 0. ]
[0.1 0.8 0.1]
[0. 0.2 0.8]]: reducible
[[0.971 0.029 0. ]
[0.145 0.778 0.077]
[0. 0.508 0.492]]: irreducible
TWENTYSEVEN
Contents
27.1 Overview
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
where we assume that 𝑦0 and 𝑦−1 are given numbers that we take as initial conditions.
In Samuelson’s model, 𝑦𝑡 stood for national income or perhaps a different measure of aggregate activity called gross
domestic product (GDP) at time 𝑡.
Equation (27.1) is called a second-order linear difference equation.
But actually, it is a collection of 𝑇 simultaneous linear equations in the 𝑇 variables 𝑦1 , 𝑦2 , … , 𝑦𝑇 .
Note: To be able to solve a second-order linear difference equation, we require two boundary conditions that can take
the form either of two initial conditions or two terminal conditions or possibly one of each.
𝐴𝑦 = 𝑏
where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}$$
Evidently 𝑦 can be computed from
𝑦 = 𝐴−1 𝑏
T = 80

# parameters (Greek identifiers restored; initial conditions are consistent
# with the printed vector b below)
α0 = 10.0
α1 = 1.53
α2 = -.9

y_1 = 28.   # y_{-1}
y0 = 24.

# construct A
A = np.identity(T)
for i in range(T):
    if i-1 >= 0:
        A[i, i-1] = -α1
    if i-2 >= 0:
        A[i, i-2] = -α2

# construct b
b = np.full(T, α0)
b[0] = α0 + α1 * y0 + α2 * y_1
b[1] = α0 + α2 * y0
Let’s look at the matrix 𝐴 and the vector 𝑏 for our example.
A, b
(array([[ 1. , 0. , 0. , ..., 0. , 0. , 0. ],
[-1.53, 1. , 0. , ..., 0. , 0. , 0. ],
[ 0.9 , -1.53, 1. , ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 1. , 0. , 0. ],
[ 0. , 0. , 0. , ..., -1.53, 1. , 0. ],
[ 0. , 0. , 0. , ..., 0.9 , -1.53, 1. ]]),
array([ 21.52, -11.6 , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ]))
A_inv = np.linalg.inv(A)
y = A_inv @ b
y_second_method = np.linalg.solve(A, b)
Here we make sure the two methods give the same result, at least up to floating point precision:
np.allclose(y, y_second_method)
True
Note: In general, np.linalg.solve is more numerically stable than using np.linalg.inv directly. However,
stability is not an issue for this small example. Moreover, we will repeatedly use A_inv in what follows, so there is added
value in computing it directly.
plt.plot(np.arange(T)+1, y)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
The steady state value 𝑦∗ of 𝑦𝑡 is obtained by setting 𝑦𝑡 = 𝑦𝑡−1 = 𝑦𝑡−2 = 𝑦∗ in (27.1), which yields
$$y^* = \frac{\alpha_0}{1 - \alpha_1 - \alpha_2}$$
y_star = α0 / (1 - α1 - α2)
y_1_steady = y_star   # y_{-1}
y0_steady = y_star

b_steady = np.full(T, α0)
b_steady[0] = α0 + α1 * y0_steady + α2 * y_1_steady
b_steady[1] = α0 + α2 * y0_steady

y_steady = A_inv @ b_steady
plt.plot(np.arange(T)+1, y_steady)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
To generate some excitement, we’ll follow in the spirit of the great economists Eugen Slutsky and Ragnar Frisch and
replace our original second-order difference equation with the following second-order stochastic linear difference
equation:
where $u_t \sim \mathcal{N}(0, \sigma_u^2)$ and is IID, meaning independent and identically distributed.
We’ll stack these 𝑇 equations into a system cast in terms of matrix algebra.
Let’s define the random vector
$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$$
With $A$, $b$, $y$ defined as above, assume now that $y$ is governed by the system

$$Ay = b + u \tag{27.3}$$

The solution for $y$ becomes

$$y = A^{-1}(b + u) \tag{27.4}$$
σ_u = 2.
u = np.random.normal(0, σ_u, size=T)
y = A_inv @ (b + u)
plt.plot(np.arange(T)+1, y)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
The above time series looks a lot like (detrended) GDP series for a number of advanced countries in recent decades.
We can simulate 𝑁 paths.
N = 100
for i in range(N):
    col = cm.viridis(np.random.rand())  # Choose a random color from viridis
    u = np.random.normal(0, σ_u, size=T)
    y = A_inv @ (b + u)
    plt.plot(np.arange(T)+1, y, lw=0.5, color=col)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
Also consider the case when 𝑦0 and 𝑦−1 are at steady state.
N = 100
for i in range(N):
    col = cm.viridis(np.random.rand())  # Choose a random color from viridis
    u = np.random.normal(0, σ_u, size=T)
    y_steady = A_inv @ (b_steady + u)
    plt.plot(np.arange(T)+1, y_steady, lw=0.5, color=col)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
We can apply standard formulas for multivariate normal distributions to compute the mean vector and covariance matrix
for our time series model
𝑦 = 𝐴−1 (𝑏 + 𝑢).
You can read about multivariate normal distributions in this lecture Multivariate Normal Distribution.
Let’s write our model as
̃ + 𝑢)
𝑦 = 𝐴(𝑏
where 𝐴 ̃ = 𝐴−1 .
Because linear combinations of normal random variables are normal, we know that
𝑦 ∼ 𝒩(𝜇𝑦 , Σ𝑦 )
where
𝜇𝑦 = 𝐴𝑏̃
and
̃ 2𝐼
Σ𝑦 = 𝐴(𝜎 𝑇̃
𝑢 𝑇 ×𝑇 )𝐴
Let’s write a Python class that computes the mean vector 𝜇𝑦 and covariance matrix Σ𝑦 .
class population_moments:
    """
    Compute population moments mu_y, Sigma_y.

    Parameters
    ----------
    alpha0, alpha1, alpha2, T, y_1, y0, sigma_u
    """
    def __init__(self, alpha0, alpha1, alpha2, T, y_1, y0, sigma_u):

        # compute A
        A = np.identity(T)
        for i in range(T):
            if i-1 >= 0:
                A[i, i-1] = -alpha1
            if i-2 >= 0:
                A[i, i-2] = -alpha2

        # compute b
        b = np.full(T, alpha0)
        b[0] = alpha0 + alpha1 * y0 + alpha2 * y_1
        b[1] = alpha0 + alpha2 * y0

        # compute A inverse
        A_inv = np.linalg.inv(A)

        self.A, self.b, self.A_inv = A, b, A_inv
        self.sigma_u, self.T = sigma_u, T

    def sample_y(self, n):
        """
        Give a sample of size n of y.
        (Method body reconstructed from its use below.)
        """
        A_inv, sigma_u, b, T = self.A_inv, self.sigma_u, self.b, self.T
        us = np.random.normal(0, sigma_u, size=(n, T))
        ys = np.vstack([A_inv @ (b + u) for u in us])

        return ys

    def get_moments(self):
        """
        Compute the population moments of y.
        """
        A_inv, sigma_u, b = self.A_inv, self.sigma_u, self.b

        # compute mu_y and Sigma_y
        self.mu_y = A_inv @ b
        self.Sigma_y = sigma_u**2 * (A_inv @ A_inv.T)

        return self.mu_y, self.Sigma_y
my_process = population_moments(
    alpha0=10.0, alpha1=1.53, alpha2=-.9, T=80, y_1=28., y0=24., sigma_u=1)

mu_y, Sigma_y = my_process.get_moments()

# plot mean
N = 100
ys = my_process.sample_y(N)

for i in range(N):
    col = cm.viridis(np.random.rand())  # Choose a random color from viridis
    plt.plot(ys[i, :], lw=0.5, color=col)

plt.plot(mu_y, color='red')

plt.xlabel('t')
plt.ylabel('y')
plt.show()
# plot variance
plt.plot(Sigma_y.diagonal())
plt.show()
Notice that the covariances between $y_t$ and $y_{t-1}$ – the elements on the superdiagonal – are not identical.

This is an indication that the time series represented by our $y$ vector is not stationary.
To make it stationary, we’d have to alter our system so that our initial conditions (𝑦1 , 𝑦0 ) are not fixed numbers but
instead a jointly normally distributed random vector with a particular mean and covariance matrix.
We describe how to do that in the lecture Linear State Space Models.
But just to set the stage for that analysis, let’s print out the bottom right corner of Σ𝑦 .
Please notice how the subdiagonal and superdiagonal elements seem to have converged.
This is an indication that our process is asymptotically stationary.
You can read about stationarity of more general linear time series models in this lecture Linear State Space Models.
There is a lot to be learned about the process by staring at the off diagonal elements of Σ𝑦 corresponding to different time
periods 𝑡, but we resist the temptation to do so here.
[[ 1. 0. 0. 0. 0. 0. 0. 0. ]
[ 1.53 1. 0. -0. -0. 0. -0. -0. ]
[ 1.441 1.53 1. 0. 0. 0. 0. 0. ]
[ 0.828 1.441 1.53 1. 0. 0. 0. 0. ]
[-0.031 0.828 1.441 1.53 1. 0. -0. -0. ]
[-0.792 -0.031 0.828 1.441 1.53 1. 0. 0. ]
[-1.184 -0.792 -0.031 0.828 1.441 1.53 1. 0. ]
[-1.099 -1.184 -0.792 -0.031 0.828 1.441 1.53 1. ]]
Notice how every row ends with the previous row’s pre-diagonal entries.
Since 𝐴−1 is lower triangular, each row represents 𝑦𝑡 for a particular 𝑡 as the sum of
• a time-dependent function 𝐴−1 𝑏 of the initial conditions incorporated in 𝑏, and
• a weighted sum of current and past values of the IID shocks {𝑢𝑡 }
Thus, let 𝐴 ̃ = 𝐴−1 .
Evidently, for 𝑡 ≥ 0,
$$y_{t+1} = \sum_{i=1}^{t+1} \tilde A_{t+1,i} b_i + \sum_{i=1}^{t} \tilde A_{t+1,i} u_i + u_{t+1}$$
Just as system (27.4) constitutes a moving average representation for 𝑦, system (27.3) constitutes an autoregressive
representation for 𝑦.
Samuelson’s model is backwards looking in the sense that we give it initial conditions and let it run.
Let’s now turn to model that is forward looking.
We apply similar linear algebra machinery to study a perfect foresight model widely used as a benchmark in macroeco-
nomics and finance.
As an example, we suppose that 𝑝𝑡 is the price of a stock and that 𝑦𝑡 is its dividend.
We assume that $y_t$ is determined by the second-order difference equation that we analyzed just above, so that
𝑦 = 𝐴−1 (𝑏 + 𝑢)
β = .96

# construct B
B = np.zeros((T, T))
for i in range(T):
    B[i, i:] = β ** np.arange(0, T-i)

σ_u = 0.
u = np.random.normal(0, σ_u, size=T)
y = A_inv @ (b + u)
y_steady = A_inv @ (b_steady + u)

p = B @ y

# plot y and p (plotting lines reconstructed)
plt.plot(np.arange(0, T)+1, y, label='y')
plt.plot(np.arange(0, T)+1, p, label='p')
plt.xlabel('t')
plt.legend()
plt.show()
Can you explain why the trend of the price is downward over time?
Also consider the case when 𝑦0 and 𝑦−1 are at the steady state.
p_steady = B @ y_steady

plt.plot(np.arange(0, T)+1, p_steady, label='p')  # (plot line reconstructed)
plt.legend()
plt.show()
Optimization
In this lecture, we will need the following library. Install ortools using pip.
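A typical install cell (an assumption; adapt it to your environment) looks like:

!pip install ortools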
CHAPTER
TWENTYEIGHT
LINEAR PROGRAMMING
28.1 Overview
Linear programming problems either maximize or minimize a linear objective function subject to a set of linear equality
and/or inequality constraints.
Linear programs come in pairs:
• an original primal problem, and
• an associated dual problem.
If a primal problem involves maximization, the dual problem involves minimization.
If a primal problem involves minimization, the dual problem involves maximization.
We provide a standard form of a linear program and methods to transform other forms of linear programming problems
into a standard form.
We show how to solve a linear programming problem using SciPy and Google OR-Tools.
We describe the important concept of complementary slackness and how it relates to the dual problem.
Let’s start with some standard imports.
import numpy as np
from ortools.linear_solver import pywraplp
from scipy.optimize import linprog
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
%matplotlib inline
               Product 1    Product 2
Material           2            5
Labor              4            2
Revenue            3            4
The following graph illustrates the firm’s constraints and iso-revenue lines.
The blue region is the feasible set within which all constraints are satisfied.
Parallel orange lines are iso-revenue lines.
The firm’s objective is to find the parallel orange lines to the upper boundary of the feasible set.
The intersection of the feasible set and the highest orange line delineates the optimal set.
In this example, the optimal set is the point (2.5, 5).
Let’s try to solve the same problem using the package ortools.linear_solver
The following cell instantiates a solver and creates two variables specifying the range of values that they can have.
Let’s us create two variables 𝑥1 and 𝑥2 such that they can only have nonnegative values.
# Instantiate a solver (reconstructed line; GLOP is OR-Tools' LP solver)
solver = pywraplp.Solver.CreateSolver('GLOP')

# Create the two variables and let them take on any non-negative value.
x1 = solver.NumVar(0, solver.infinity(), 'x1')
x2 = solver.NumVar(0, solver.infinity(), 'x2')
Let’s specify the objective function. We use solver.Maximize method in the case when we want to maximize the
objective function and in the case of minimization we can use solver.Minimize.
Once we solve the problem, we can check whether the solver was successful in solving the problem using it’s status. If it’s
successful, then the status will be equal to pywraplp.Solver.OPTIMAL.
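The cell that adds the objective and constraints is omitted in this extract; a minimal sketch, using the resource limits 30 and 20 that also appear in b_ex1 in the SciPy version below, is:

# Objective: maximize revenue
solver.Maximize(3 * x1 + 4 * x2)

# Material and labor constraints
solver.Add(2 * x1 + 5 * x2 <= 30.0)
solver.Add(4 * x1 + 2 * x2 <= 20.0)

# Solve and record the resulting status
status = solver.Solve()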
if status == pywraplp.Solver.OPTIMAL:
    print('Objective value =', solver.Objective().Value())
    x1_sol = round(x1.solution_value(), 2)
    x2_sol = round(x2.solution_value(), 2)
    print(f'(x1, x2): ({x1_sol}, {x2_sol})')
else:
    print('The problem does not have an optimal solution.')
                   Year 1    Year 2    Year 3
Annuity              x1        x1        x1
Bank account         x2        x3        x4
Corporate bond       0         x5        0
The mutual fund’s decision making proceeds according to the following timing protocol:
1. At the beginning of the first year, the mutual fund decides how much to invest in the annuity and how much to
deposit in the bank. This decision is subject to the constraint:
𝑥1 + 𝑥2 = 100, 000
2. At the beginning of the second year, the mutual fund has a bank balance of 1.06𝑥2 . It must keep 𝑥1 in the annuity.
It can choose to put 𝑥5 into the corporate bond, and put 𝑥3 in the bank. These decisions are restricted by
𝑥1 + 𝑥5 = 1.06𝑥2 − 𝑥3
3. At the beginning of the third year, the mutual fund has a bank account balance equal to 1.06𝑥3 . It must again
invest 𝑥1 in the annuity, leaving it with a bank account balance equal to 𝑥4 . This situation is summarized by the
restriction:
𝑥1 = 1.06𝑥3 − 𝑥4
The mutual fund’s objective function, i.e., its wealth at the end of the third year is:
Let’s try to solve the above problem using the package ortools.linear_solver.
The following cell instantiates a solver and creates two variables specifying the range of values that they can have.
Let’s us create five variables 𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , and 𝑥5 such that they can only have the values defined in the above constraints.
Let’s solve the problem and check the status using pywraplp.Solver.OPTIMAL.
if status == pywraplp.Solver.OPTIMAL:
    print('Objective value =', solver.Objective().Value())
    x1_sol = round(x1.solution_value(), 3)
    x2_sol = round(x2.solution_value(), 3)
    x3_sol = round(x3.solution_value(), 3)
    x4_sol = round(x4.solution_value(), 3)
    x5_sol = round(x5.solution_value(), 3)
    print(f'(x1, x2, x3, x4, x5): ({x1_sol}, {x2_sol}, {x3_sol}, {x4_sol}, {x5_sol})')
else:
    print('The problem does not have an optimal solution.')
For purposes of
• unifying linear programs that are initially stated in superficially different forms, and
• having a form that is convenient to put into black-box software packages,
it is useful to devote some effort to describe a standard form.
Our standard form is:
$$\begin{aligned}
\min_x \quad & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{subject to} \quad & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m \\
& x_1, x_2, \dots, x_n \ge 0
\end{aligned}$$
Let
$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}, \quad
c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
$$\begin{aligned}
\min_x \quad & c' x \\
\text{subject to} \quad & Ax = b \\
& x \ge 0
\end{aligned} \tag{28.1}$$
Here, 𝐴𝑥 = 𝑏 means that the 𝑖-th entry of 𝐴𝑥 equals the 𝑖-th entry of 𝑏 for every 𝑖.
Similarly, $x \ge 0$ means that $x_j$ is greater than or equal to 0 for every $j$.
It is useful to know how to transform a problem that initially is not stated in the standard form into one that is.
By deploying the following steps, any linear programming problem can be transformed into an equivalent standard form
linear programming problem.
1. Objective Function: If a problem is originally a constrained maximization problem, we can construct a new
objective function that is the additive inverse of the original objective function. The transformed problem is then a
minimization problem.
2. Decision Variables: Given a variable $x_j$ satisfying $x_j \le 0$, we can introduce a new variable $x_j' = -x_j$ and substitute it into the original problem. Given a free variable $x_j$ with no restriction on its sign, we can introduce two new variables $x_j^+$ and $x_j^-$ satisfying $x_j^+, x_j^- \ge 0$ and replace $x_j$ by $x_j^+ - x_j^-$.
3. Inequality constraints: Given an inequality constraint $\sum_{j=1}^{n} a_{ij} x_j \le b_i$, we can introduce a new variable $s_i$, called a slack variable, that satisfies $s_i \ge 0$, and replace the original constraint by $\sum_{j=1}^{n} a_{ij} x_j + s_i = b_i$.
Let’s apply the above steps to the two examples described above.
The package scipy.optimize provides a function linprog to solve linear programming problems with a form below:
$$\begin{aligned}
\min_x \quad & c' x \\
\text{subject to} \quad & A_{ub} x \le b_{ub} \\
& A_{eq} x = b_{eq} \\
& l \le x \le u
\end{aligned}$$
Note: By default 𝑙 = 0 and 𝑢 = None unless explicitly specified with the argument ‘bounds’.
# Construct parameters
c_ex1 = np.array([3, 4])
# Inequality constraints
A_ex1 = np.array([[2, 5],
[4, 2]])
b_ex1 = np.array([30,20])
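The solve call itself is omitted in this extract; a sketch (negating c_ex1 because linprog minimizes) is:

# Solve the problem; bounds default to x >= 0
res_ex1 = linprog(-c_ex1, A_ub=A_ex1, b_ub=b_ex1)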
Once we solve the problem, we can check whether the solver was successful in solving the problem using the boolean
attribute success. If it’s successful, then the success attribute is set to True.
if res_ex1.success:
    # We use negative sign to get the optimal value (maximized value)
    print('Optimal Value:', -res_ex1.fun)
    print(f'(x1, x2): {res_ex1.x[0], res_ex1.x[1]}')
else:
    print('The problem does not have an optimal solution.')
The optimal plan tells the factory to produce 2.5 units of Product 1 and 5 units of Product 2; that generates a maximizing
value of revenue of 27.5.
We are using the linprog function as a black box.
Inside it, Python first transforms the problem into standard form.
To do that, for each inequality constraint it generates one slack variable.
Here the vector of slack variables is a NumPy array with one entry per inequality constraint that equals $b_{ub} - A_{ub} x$.
See the official documentation for more details.
Note: This problem is to maximize the objective, so that we need to put a minus sign in front of parameter vector c.
# Construct parameters
rate = 1.06
# Equality constraints (from the timing protocol above)
A_ex2 = np.array([[1, 1, 0, 0, 0],
[1, -rate, 1, 0, 1],
[1, 0, -rate, 1, 0]])
b_ex2 = np.array([100000, 0, 0])
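The solve call is again omitted; a sketch is below. Note that c_ex2 (the vector mapping $x_1, \dots, x_5$ into terminal wealth) is defined in an omitted cell, as are any bounds that permit negative bank balances:

# Solve the problem (c_ex2 and any custom bounds come from omitted cells)
res_ex2 = linprog(-c_ex2, A_eq=A_ex2, b_eq=b_ex2)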
Let’s solve the problem and check the status using success attribute.
if res_ex2.success:
    # We use negative sign to get the optimal value (maximized value)
    print('Optimal Value:', -res_ex2.fun)
    x1_sol = round(res_ex2.x[0], 3)
    x2_sol = round(res_ex2.x[1], 3)
    x3_sol = round(res_ex2.x[2], 3)
    x4_sol = round(res_ex2.x[3], 3)
    x5_sol = round(res_ex2.x[4], 3)
    print(f'(x1, x2, x3, x4, x5): {x1_sol, x2_sol, x3_sol, x4_sol, x5_sol}')
else:
    print('The problem does not have an optimal solution.')
Note: You might notice that the optimal solutions found by OR-Tools and SciPy differ even though the optimal value is the same. This is because there can be many optimal solutions for the same problem.
28.5 Exercises
Exercise 28.5.1
Implement a new extended solution for Problem 1 wherein the factory owner decides that the number of units of Product 1 should not be less than the number of units of Product 2.
# Instantiate a solver (reconstructed line)
solver = pywraplp.Solver.CreateSolver('GLOP')

# Create the two variables and let them take on any non-negative value.
x1 = solver.NumVar(0, solver.infinity(), 'x1')
x2 = solver.NumVar(0, solver.infinity(), 'x2')
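A sketch of the remaining solution steps, adding the new requirement $x_1 \ge x_2$ to the original objective and constraints:

solver.Maximize(3 * x1 + 4 * x2)
solver.Add(2 * x1 + 5 * x2 <= 30.0)
solver.Add(4 * x1 + 2 * x2 <= 20.0)
solver.Add(x1 >= x2)   # Product 1 must not be less than Product 2
status = solver.Solve()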
if status == pywraplp.Solver.OPTIMAL:
    print('Objective value =', solver.Objective().Value())
    x1_sol = round(x1.solution_value(), 2)
    x2_sol = round(x2.solution_value(), 2)
    print(f'(x1, x2): ({x1_sol}, {x2_sol})')
else:
    print('The problem does not have an optimal solution.')
Exercise 28.5.2
A carpenter manufactures 2 products - 𝐴 and 𝐵.
Product 𝐴 generates a profit of 23 and product 𝐵 generates a profit of 10.
It takes 2 hours for the carpenter to produce 𝐴 and 0.8 hours to produce 𝐵.
Moreover, he can’t spend more than 25 hours per week and the total number of units of 𝐴 and 𝐵 should not be greater
than 20.
Find the number of units of 𝐴 and product 𝐵 that he should manufacture in order to maximise his profit.
The problem can be written as

$$\max_{x,y} \; 23x + 10y$$

subject to

$$x + y \le 20, \qquad 2x + 0.8y \le 25$$

where $x$ and $y$ denote the units of $A$ and $B$ respectively.
Let’s us create two variables 𝑥1 and 𝑥2 such that they can only have nonnegative values.
# Instantiate a solver (reconstructed line)
solver = pywraplp.Solver.CreateSolver('GLOP')

# Create the two variables and let them take on any non-negative value.
x = solver.NumVar(0, solver.infinity(), 'x')
y = solver.NumVar(0, solver.infinity(), 'y')
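A sketch of the remaining steps, using the objective and constraints stated above:

solver.Maximize(23 * x + 10 * y)
solver.Add(x + y <= 20.0)
solver.Add(2 * x + 0.8 * y <= 25.0)
status = solver.Solve()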
if status == pywraplp.Solver.OPTIMAL:
    print('Maximum Profit =', solver.Objective().Value())
    x_sol = round(x.solution_value(), 3)
    y_sol = round(y.solution_value(), 3)
    print(f'(x, y): ({x_sol}, {y_sol})')
else:
    print('The problem does not have an optimal solution.')
TWENTYNINE
SHORTEST PATHS
Contents
• Shortest Paths
– Overview
– Outline of the problem
– Finding least-cost paths
– Solving for minimum cost-to-go
– Exercises
29.1 Overview
The shortest path problem is a classic problem in mathematics and computer science with applications in
• Economics (sequential decision making, analysis of social networks, etc.)
• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• etc., etc.
Variations of the methods we discuss in this lecture are used millions of times every day, in applications such as
• Google Maps
• routing packets on the internet
For us, the shortest path problem also provides a nice introduction to the logic of dynamic programming.
Dynamic programming is an extremely powerful optimization technique that we apply in many lectures on this site.
The only scientific library we’ll need in what follows is NumPy:
import numpy as np
The shortest path problem is one of finding how to traverse a graph from one specified node to another at minimum cost.
Consider the following graph
where
• 𝐹𝑣 is the set of nodes that can be reached from 𝑣 in one step.
• 𝑐(𝑣, 𝑤) is the cost of traveling from 𝑣 to 𝑤.
Hence, if we know the function 𝐽 , then finding the best path is almost trivial.
But how can we find the cost-to-go function 𝐽 ?
Some thought will convince you that, for every node 𝑣, the function 𝐽 satisfies
This is known as the Bellman equation, after the mathematician Richard Bellman.
The Bellman equation can be thought of as a restriction that 𝐽 must satisfy.
What we want to do now is use this restriction to compute 𝐽 .
Let’s look at an algorithm for computing 𝐽 and then think about how to implement it.
The standard algorithm for finding $J$ is to start with an initial guess and then iterate.
This is a standard approach to solving nonlinear equations, often called the method of successive approximations.
Our initial guess will be
Now
1. Set 𝑛 = 0
2. Set 𝐽𝑛+1 (𝑣) = min𝑤∈𝐹𝑣 {𝑐(𝑣, 𝑤) + 𝐽𝑛 (𝑤)} for all 𝑣
3. If 𝐽𝑛+1 and 𝐽𝑛 are not equal then increment 𝑛, go to 2
This sequence converges to 𝐽 .
Although we omit the proof, we’ll prove similar claims in our other lectures on dynamic programming.
29.4.2 Implementation
Having an algorithm is a good start, but we also need to think about how to implement it on a computer.
First, for the cost function 𝑐, we’ll implement it as a matrix 𝑄, where a typical element is
$$Q(v, w) = \begin{cases} c(v, w) & \text{if } w \in F_v \\ +\infty & \text{otherwise} \end{cases}$$
Notice that the cost of staying still (on the principal diagonal) is set to
• np.inf for non-destination nodes — moving on is required.
• 0 for the destination node — here is where we stop.
For the sequence of approximations {𝐽𝑛 } of the cost-to-go functions, we can use NumPy arrays.
Let’s try with this example and see how we go:
max_iter = 500
i = 0
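# A sketch of the iteration loop (Q, the cost matrix for this example,
# is defined in a cell omitted from this extract)
J = np.zeros(Q.shape[0])              # initial guess J ≡ 0
while i < max_iter:
    next_J = np.min(Q + J, axis=1)    # Bellman operator: min_w c(v,w) + J(w)
    if np.array_equal(next_J, J):
        break
    J = next_J
    i += 1

print("The cost-to-go function is", J)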
29.5 Exercises
Exercise 29.5.1
The text below describes a weighted directed graph.
The line node0, node1 0.04, node8 11.11, node14 72.21 means that from node0 we can go to
• node1 at cost 0.04
• node8 at cost 11.11
• node14 at cost 72.21
No other nodes can be reached directly from node0.
Other lines have a similar interpretation.
Your task is to use the algorithm given above to find the optimal path and its cost.
Note: You will be dealing with floating point numbers now, rather than integers, so consider replacing np.equal()
with np.allclose().
%%file graph.txt
node0, node1 0.04, node8 11.11, node14 72.21
node1, node46 1247.25, node6 20.59, node13 64.94
node2, node66 54.18, node31 166.80, node45 1561.45
node3, node20 133.65, node6 2.06, node11 42.43
node4, node75 3706.67, node5 0.73, node7 1.02
node5, node45 1382.97, node7 3.33, node11 34.54
node6, node31 63.17, node9 0.72, node10 13.10
node7, node50 478.14, node9 3.15, node10 5.85
node8, node69 577.91, node11 7.45, node12 3.18
node9, node70 2454.28, node13 4.42, node20 16.53
node10, node89 5352.79, node12 1.87, node16 25.16
node11, node94 4961.32, node18 37.55, node20 65.08
node12, node84 3914.62, node24 34.32, node28 170.04
node13, node60 2135.95, node38 236.33, node40 475.33
node14, node67 1878.96, node16 2.70, node24 38.65
node15, node91 3597.11, node17 1.01, node18 2.57
node16, node36 392.92, node19 3.49, node38 278.71
node17, node76 783.29, node22 24.78, node23 26.45
node18, node91 3363.17, node23 16.23, node28 55.84
node19, node26 20.09, node20 0.24, node28 70.54
node20, node98 3523.33, node24 9.81, node33 145.80
node21, node56 626.04, node28 36.65, node31 27.06
node22, node72 1447.22, node39 136.32, node40 124.22
node23, node52 336.73, node26 2.66, node33 22.37
node24, node66 875.19, node26 1.80, node28 14.25
node25, node70 1343.63, node32 36.58, node35 45.55
node26, node47 135.78, node27 0.01, node42 122.00
node27, node65 480.55, node35 48.10, node43 246.24
node28, node82 2538.18, node34 21.79, node36 15.52
node29, node64 635.52, node32 4.22, node33 12.61
node30, node98 2616.03, node33 5.61, node35 13.95
node31, node98 3350.98, node36 20.44, node44 125.88
node32, node97 2613.92, node34 3.33, node35 1.46
(rows for node33 through node99 omitted at a page break)
Overwriting graph.txt
num_nodes = 100
destination_node = 99
def map_graph_to_distance_matrix(in_file):
    # (body omitted in this extract: it reads the node list from in_file and
    # returns the matrix Q of one-step travel costs, with np.inf where no
    # edge exists)
    ...
def compute_cost_to_go(Q):
    num_nodes = Q.shape[0]
    J = np.zeros(num_nodes)   # Initial guess
    max_iter = 500
    i = 0
    while i < max_iter:       # iteration loop reconstructed
        next_J = np.min(Q + J, axis=1)
        if np.allclose(next_J, J):
            break
        J = next_J
        i += 1
    return(J)
We used np.allclose() rather than testing exact equality because we are dealing with floating point numbers now.
Finally, here’s a function that uses the cost-to-go function to obtain the optimal path (and its cost).
def print_best_path(J, Q):
    # (body omitted in this extract: starting from node 0, repeatedly move to
    # the argmin of c(v, w) + J(w), printing each node visited and summing
    # the costs incurred along the way)
    ...
    print(destination_node)
    print('Cost: ', sum_costs)
Okay, now we have the necessary functions, let’s call them to do the job we were assigned.
Q = map_graph_to_distance_matrix('graph.txt')
J = compute_cost_to_go(Q)
print_best_path(J, Q)
0
8
11
18
23
33
41
53
56
57
60
(remaining nodes on the path, ending at node 99, omitted at a page break)
The total cost of the path should agree with 𝐽 [0] so let’s check this.
J[0]
160.55
CHAPTER
THIRTY
Contents
In addition to what’s in Anaconda, this lecture will need the following libraries:
In this lecture we will begin with the foundational concepts in spectral theory.
Then we will explore the Perron-Frobenius Theorem and connect it to applications in Markov chains and networks.
We will use the following imports:
$$A = \begin{bmatrix} 0.5 & 0.1 \\ 0.2 & 0.2 \end{bmatrix}$$

$$B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad B^2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

$$C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
𝐴𝑣 = 𝜆𝑣.
A = np.array([[3, 2],
[1, 4]])
# Keep 5 decimals
np.set_printoptions(precision=5)
We can also use scipy.linalg.eig with argument left=True to find left eigenvectors directly
The eigenvalues are the same while the eigenvectors themselves are different.
(Also note that we take the nonnegative value of the eigenvector associated with the dominant eigenvalue; this is because eig automatically normalizes the eigenvectors.)

Taking the transpose of $A^\top w = \lambda w$, we obtain $w^\top A = \lambda w^\top$.

This is the more common expression, and it is where the name left eigenvectors originates.
For a square nonnegative matrix 𝐴, the behavior of 𝐴𝑘 as 𝑘 → ∞ is controlled by the eigenvalue with the largest absolute
value, often called the dominant eigenvalue.
For any such matrix 𝐴, the Perron-Frobenius Theorem characterizes certain properties of the dominant eigenvalue and
its corresponding eigenvector.
If a matrix 𝐴 ≥ 0 then,
1. the dominant eigenvalue of 𝐴, 𝑟(𝐴), is real-valued and nonnegative.
2. for any other eigenvalue (possibly complex) 𝜆 of 𝐴, |𝜆| ≤ 𝑟(𝐴).
3. we can find a nonnegative and nonzero eigenvector 𝑣 such that 𝐴𝑣 = 𝑟(𝐴)𝑣.
Moreover if 𝐴 is also irreducible then,
4. the eigenvector 𝑣 associated with the eigenvalue 𝑟(𝐴) is strictly positive.
5. there exists no other positive eigenvector 𝑣 (except scalar multiples of 𝑣) associated with 𝑟(𝐴).
(More of the Perron-Frobenius theorem about primitive matrices will be introduced below.)
(This is a relatively simple version of the theorem — for more details see here).
We will see applications of the theorem below.
Let’s build our intuition for the theorem using a simple example we have seen before.
Now let’s consider examples for each case.
A = np.array([[0, 1, 0],
[.5, 0, .5],
[0, 1, 0]])
eig(A)
Now we can see the claims of the Perron-Frobenius Theorem holds for the irreducible matrix 𝐴:
1. The dominant eigenvalue is real-valued and non-negative.
2. All other eigenvalues have absolute values less than or equal to the dominant eigenvalue.
3. A non-negative and nonzero eigenvector is associated with the dominant eigenvalue.
4. As the matrix is irreducible, the eigenvector associated with the dominant eigenvalue is strictly positive.
5. There exists no other positive eigenvector associated with the dominant eigenvalue.
We know that in real-world situations it's hard for a matrix to be everywhere positive (although such matrices have nice properties).

Primitive matrices, however, still give us helpful properties under a looser definition.

Let $A$ be a square nonnegative matrix and let $A^k$ be the $k$-th power of $A$.

A matrix is called primitive if there exists a $k \in \mathbb{N}$ such that $A^k$ is everywhere positive.
Recall the examples given in irreducible matrices:
$$A = \begin{bmatrix} 0.5 & 0.1 \\ 0.2 & 0.2 \end{bmatrix}$$
B = np.array([[0, 1, 1],
[1, 0, 1],
[1, 1, 0]])
np.linalg.matrix_power(B, 2)
array([[2, 1, 1],
[1, 2, 1],
[1, 1, 2]])
eig(B)
Now let’s give some examples to see if the claims of the Perron-Frobenius Theorem hold for the primitive matrix 𝐵:
1. The dominant eigenvalue is real-valued and non-negative.
2. All other eigenvalues have absolute values strictly less than the dominant eigenvalue.
3. A non-negative and nonzero eigenvector is associated with the dominant eigenvalue.
4. The eigenvector associated with the dominant eigenvalue is strictly positive.
5. There exists no other positive eigenvector associated with the dominant eigenvalue.
6. The inequality |𝜆| < 𝑟(𝐵) holds for all eigenvalues 𝜆 of 𝐵 distinct from the dominant eigenvalue.
Furthermore, we can verify the convergence property (7) of the theorem on the following examples:
def compute_perron_projection(M):
    eigval, v = eig(M)       # right eigenvectors
    eigval, w = eig(M.T)     # left eigenvectors
    r = np.max(eigval.real)

    # locate the dominant eigenvalue and its eigenvectors (reconstructed)
    i = np.argmax(eigval.real)
    v_P = v[:, i].real.reshape(-1, 1)
    w_P = w[:, i].real.reshape(-1, 1)

    # normalize so that w_P' v_P = 1 and form the Perron projection
    P = v_P @ w_P.T / (w_P.T @ v_P)
    return P, r

def check_convergence(M):
    P, r = compute_perron_projection(M)
    print("Perron projection:")
    print(P)

    for n in [1, 10, 100, 1_000, 10_000]:
        # Compute (M/r)^n and its distance from the Perron projection
        M_n = np.linalg.matrix_power(M/r, n)
        diff = np.abs(M_n - P)
        print(f"n = {n}, error = {np.max(diff):.10f}")
A1 = np.array([[1, 2],
               [1, 4]])

A2 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
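The third test matrix and the loop that generate the output below are missing from this extract; a reconstruction consistent with the printed matrices is:

A3 = np.array([[0.971, 0.029, 0.1, 1],
               [0.145, 0.778, 0.077, 0.59],
               [0.1, 0.508, 0.492, 1.12],
               [0.2, 0.8, 0.71, 0.95]])

for M in A1, A2, A3:
    print("Matrix:")
    print(M)
    check_convergence(M)
    print("-" * 36)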
Matrix:
[[1 2]
[1 4]]
Perron projection:
[[0.1362 0.48507]
[0.24254 0.8638 ]]
n = 1, error = 0.0989045731
n = 10, error = 0.0000000001
n = 100, error = 0.0000000000
n = 1000, error = 0.0000000000
n = 10000, error = 0.0000000000
------------------------------------
Matrix:
[[0 1 1]
[1 0 1]
[1 1 0]]
Perron projection:
[[0.33333 0.33333 0.33333]
[0.33333 0.33333 0.33333]
[0.33333 0.33333 0.33333]]
n = 1, error = 0.7071067812
n = 10, error = 0.0013810679
n = 100, error = 0.0000000000
n = 1000, error = 0.0000000000
n = 10000, error = 0.0000000000
------------------------------------
Matrix:
[[0.971 0.029 0.1 1. ]
[0.145 0.778 0.077 0.59 ]
[0.1 0.508 0.492 1.12 ]
[0.2 0.8 0.71 0.95 ]]
Perron projection:
(projection matrix and error values for this case omitted at a page break)
------------------------------------
B = np.array([[0, 1, 1],
[1, 0, 0],
[1, 0, 0]])
check_convergence(B)
Matrix:
[[0 1 1]
[1 0 0]
[1 0 0]]
100th power of matrix B:
[[1125899906842624 0 0]
[ 0 562949953421312 562949953421312]
[ 0 562949953421312 562949953421312]]
Perron projection:
[[0.5 0.35355 0.35355]
[0.35355 0.25 0.25 ]
[0.35355 0.25 0.25 ]]
n = 1, error = 1.0000000000
n = 10, error = 1.0000000000
n = 100, error = 1.0000000000
n = 1000, error = 1.0000000000
n = 10000, error = 1.0000000000
The result shows that the matrix is not primitive: its powers never become everywhere positive.
These examples show how the Perron-Frobenius Theorem relates to the eigenvalues and eigenvectors of positive matrices
and the convergence of the power of matrices.
In fact we have already seen the theorem in action before in the Markov chain lecture.
We are now prepared to bridge the languages spoken in the two lectures.
A primitive matrix is both irreducible and aperiodic.
So the Perron-Frobenius theorem explains why both the Imam and Temple matrix and the Hamilton matrix converge to a stationary distribution, which is the Perron projection of the two matrices:
print(compute_perron_projection(P)[0])
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ψ_star
print(compute_perron_projection(P_hamilton)[0])
mc = qe.MarkovChain(P_hamilton)
ψ_star = mc.stationary_distributions[0]
ψ_star
We can also verify other properties hinted by Perron-Frobenius in these stochastic matrices.
Another example is the relationship between convergence gap and convergence rate.
In the exercise, we stated that the convergence rate is determined by the spectral gap, the difference between the largest
and the second largest eigenvalue.
This can be proven using what we have learned here.
Please note that we use 𝟙 for a vector of ones in this lecture.
For a Markov model $M$ with state space $S$ and transition matrix $P$, we can write $P^t$ as

$$P^t = \sum_{i=1}^{n-1} \lambda_i^t v_i w_i^\top + \mathbb{1} \psi^*,$$

where the $\lambda_i$ are eigenvalues of $P$ with corresponding right and left eigenvectors $v_i$ and $w_i$.
Recall that eigenvalues are ordered from smallest to largest from 𝑖 = 1...𝑛.
As we have seen, the largest eigenvalue for a primitive stochastic matrix is one.
This can be proven using Gershgorin Circle Theorem, but it is out of the scope of this lecture.
So by statement (6) of the Perron-Frobenius theorem, $|\lambda_i| < 1$ for all $i < n$, and $\lambda_n = 1$ when $P$ is primitive.
Hence, after taking the Euclidean norm deviation, we obtain
Thus, the rate of convergence is governed by the modulus of the second largest eigenvalue.
30.2 Exercises
0.8444086477164563
Since $r(A) < 1$, we can find the solution using the Neumann Series Lemma.
I = np.identity(3)
B = I - A
d = np.array([4, 5, 12])
d.shape = (3,1)
B_inv = np.linalg.inv(B)
x_star = B_inv @ d
print(x_star)
[[38.30189]
[44.33962]
[46.47799]]
THIRTYONE
INPUT-OUTPUT MODELS
31.1 Overview
This lecture requires the following imports and installs before we proceed.
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import quantecon_book_networks
import quantecon_book_networks.input_output as qbn_io
import quantecon_book_networks.plotting as qbn_plt
import quantecon_book_networks.data as qbn_data
import matplotlib as mpl
from matplotlib.patches import Polygon
quantecon_book_networks.config("matplotlib")
mpl.rcParams.update(mpl.rcParamsDefault)
The following figure illustrates a network of linkages among 15 sectors obtained from the US Bureau of Economic Anal-
ysis’s 2021 Input-Output Accounts Data.
An arrow from 𝑖 to 𝑗 means that some of sector 𝑖’s output serves as an input to production of sector 𝑗.
Economies are characterised by many such links.
A basic framework for their analysis is Leontief’s input-output model.
After introducing the input-output model, we describe some of its connections to linear programming lecture.
Let
• 𝑥0 be the amount of a single exogenous input to production, say labor
• 𝑥𝑗 , 𝑗 = 1, … 𝑛 be the gross output of final good 𝑗
• 𝑑𝑗 , 𝑗 = 1, … 𝑛 be the net output of final good 𝑗 that is available for final consumption
• 𝑧𝑖𝑗 be the quantity of good 𝑖 allocated to be an input to producing good 𝑗 for 𝑖 = 1, … 𝑛, 𝑗 = 1, … 𝑛
• 𝑧0𝑗 be the quantity of labor allocated to producing good 𝑗.
• 𝑎𝑖𝑗 be the number of units of good 𝑖 required to produce one unit of good 𝑗, 𝑖 = 0, … , 𝑛, 𝑗 = 1, … 𝑛.
• 𝑤 > 0 be an exogenous wage of labor, denominated in dollars per unit of labor
• 𝑝 be an 𝑛 × 1 vector of prices of produced goods 𝑖 = 1, … , 𝑛.
The technology for producing good 𝑗 ∈ {1, … , 𝑛} is described by the Leontief function
$$x_j = \min_{i \in \{0, \dots, n\}} \left( \frac{z_{ij}}{a_{ij}} \right)$$
$$(I - A)x \ge d, \qquad a_0^\top x \le x_0 \tag{31.1}$$

$$x = (I - A)^{-1} d \equiv Ld \tag{31.2}$$
$$A = \begin{bmatrix} 0.1 & 40 \\ 0.01 & 0 \end{bmatrix} \quad \text{and} \quad d = \begin{bmatrix} 50 \\ 2 \end{bmatrix} \tag{31.3}$$
A = np.array([[0.1, 40],
[0.01, 0]])
d = np.array([50, 2]).reshape((2, 1))
I = np.identity(2)
B = I - A
B
True
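The computations producing the two arrays below are omitted in this extract; a sketch is:

L = np.linalg.inv(B)   # Leontief inverse
x = L @ d              # gross output x = L d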
array([[2.0e+00, 8.0e+01],
[2.0e-02, 1.8e+00]])
array([[260. ],
[ 4.6]])
$$a_0^\top x = x_0$$

or

$$A_0^\top d = x_0 \tag{31.4}$$

where

$$A_0^\top = a_0^\top (I - A)^{-1}$$
For 𝑖 ∈ {1, … , 𝑛}, the 𝑖th component of 𝐴0 is the amount of labor that is required to produce one unit of final output of
good 𝑖.
Equation (31.4) sweeps out a production possibility frontier of final consumption bundles 𝑑 that can be produced with
exogenous labor input 𝑥0 .
Consider the example in (31.3).
Suppose we are now given
$$a_0^\top = \begin{bmatrix} 4 & 100 \end{bmatrix}$$
a0 = np.array([4, 100])
A0 = a0 @ L
A0
The production possibility frontier is therefore

$$10 d_1 + 500 d_2 = x_0$$
31.4 Prices
[DSS58] argue that relative prices of the 𝑛 produced goods must satisfy
More generally,
𝑝 = 𝐴⊤ 𝑝 + 𝑎0 𝑤
which states that the price of each final good equals the total cost of production, which consists of costs of intermediate
inputs 𝐴⊤ 𝑝 plus costs of labor 𝑎0 𝑤.
This equation can be written as
(𝐼 − 𝐴⊤ )𝑝 = 𝑎0 𝑤 (31.5)
which implies
𝑝 = (𝐼 − 𝐴⊤ )−1 𝑎0 𝑤
Notice how (31.5) with (31.1) forms a conjugate pair through the appearance of operators that are transposes of one
another.
This connection surfaces again in a classic linear program and its dual.
A primal problem is

$$\min_x \; w a_0^\top x$$

subject to

$$(I - A)x \ge d$$

The associated dual problem is

$$\max_p \; p^\top d$$

subject to

$$(I - A)^\top p \le a_0 w$$
The primal problem chooses a feasible production plan to minimize costs for delivering a pre-assigned vector of final
goods consumption 𝑑.
The dual problem chooses prices to maximize the value of a pre-assigned vector of final goods 𝑑 subject to prices covering
costs of production.
By the strong duality theorem, the optimal values of the primal and dual problems coincide:

$$w a_0^\top x^* = p^{*\top} d$$

where the $*$'s denote optimal choices for the primal and dual problems.
The dual problem can be graphically represented as follows.
We have discussed that gross output 𝑥 is given by (31.2), where 𝐿 is called the Leontief Inverse.
Recall the Neumann Series Lemma which states that 𝐿 exists if the spectral radius 𝑟(𝐴) < 1.
In fact

$$L = \sum_{i=0}^{\infty} A^i$$
Consider the impact of a demand shock Δ𝑑 which shifts demand from 𝑑0 to 𝑑1 = 𝑑0 + Δ𝑑.
Gross output shifts from 𝑥0 = 𝐿𝑑0 to 𝑥1 = 𝐿𝑑1 .
If $r(A) < 1$ then a solution exists and $\Delta x = x_1 - x_0 = L \Delta d$.

This illustrates that an element $l_{ij}$ of $L$ shows the total impact on sector $i$ of a unit change in demand for good $j$.
We can further study input output networks through applications of graph theory.
An input output network can be represented by a weighted directed graph induced by the adjacency matrix 𝐴.
The set of nodes $V = [n]$ is the list of sectors and the set of edges is given by $E = \{(i, j) : a_{ij} > 0\}$.
In Fig. 31.1 weights are indicated by the widths of the arrows, which are proportional to the corresponding input-output
coefficients.
We can now use centrality measures to rank sectors and discuss their importance relative to the other sectors.
We plot a bar graph of hub-based eigenvector centrality for the sectors represented in Fig. 31.1.
Another way to rank sectors in input output networks is via output multipliers.
The output multiplier of sector 𝑗 denoted by 𝜇𝑗 is usually defined as the total sector-wide impact of a unit change of
demand in sector 𝑗.
Earlier when disussing demand shocks we concluded that for 𝐿 = (𝑙𝑖𝑗 ) the element 𝑙𝑖𝑗 represents the impact on sector 𝑖
of a unit change in demand in sector 𝑗.
Thus,
$$\mu_j = \sum_{i=1}^{n} l_{ij}$$
𝜇⊤ = 𝟙⊤ (𝐼 − 𝐴)−1
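As a quick illustration (a sketch reusing the two-sector Leontief inverse L computed earlier in this lecture), the multipliers are the column sums of L:

μ = np.ones(2) @ L   # output multipliers for the two-sector example
μ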
31.8 Exercises
Exercise 31.8.1
[DSS58] Chapter 9 discusses an example with the following parameter settings:
$$A = \begin{bmatrix} 0.1 & 1.46 \\ 0.16 & 0.17 \end{bmatrix} \quad \text{and} \quad a_0 = \begin{bmatrix} .04 & .33 \end{bmatrix}$$

$$x = \begin{bmatrix} 250 \\ 120 \end{bmatrix} \quad \text{and} \quad x_0 = 50$$

$$d = \begin{bmatrix} 50 \\ 60 \end{bmatrix}$$
Describe how they infer the input-output coefficients in 𝐴 and 𝑎0 from the following hypothetical underlying “data” on
agricultural and manufacturing industries:
$$z = \begin{bmatrix} 25 & 175 \\ 40 & 20 \end{bmatrix} \quad \text{and} \quad z_0 = \begin{bmatrix} 10 & 40 \end{bmatrix}$$
Exercise 31.8.2
Derive the production possibility frontier for the economy characterized in the previous exercise.
A = np.array([[0.1, 1.46],
[0.16, 0.17]])
a_0 = np.array([0.04, 0.33])
I = np.identity(2)
B = I - A
L = np.linalg.inv(B)
A_0 = a_0 @ L
A_0
array([0.16751071, 0.69224776])
Thus the production possibility frontier for this economy is

$$0.17 d_1 + 0.69 d_2 = 50$$
THIRTYTWO
32.1 Outline
In addition to what’s in Anaconda, this lecture will need the following libraries:
import numpy as np
import matplotlib.pyplot as plt
This model is sometimes called the lake model because there are two pools of workers:
1. those who are currently employed.
2. those who are currently unemployed but are seeking employment.
The “flows” between the two lakes are as follows:
1. workers exit the labor market at rate 𝑑.
2. new workers enter the labor market at rate 𝑏.
3. employed workers separate from their jobs at rate 𝛼.
4. unemployed workers find jobs at rate 𝜆.
The below graph illustrates the lake model.
32.3 Dynamics
Let 𝑒𝑡 and 𝑢𝑡 be the number of employed and unemployed workers at time 𝑡 respectively.
The total population of workers is 𝑛𝑡 = 𝑒𝑡 + 𝑢𝑡 .
The number of unemployed and employed workers thus evolves according to:
We can arrange (32.1) as a linear system of equations in matrix form 𝑥𝑡+1 = 𝐴𝑥𝑡 where
Let us first plot the time series of unemployment 𝑢𝑡 , employment 𝑒𝑡 , and labor force 𝑛𝑡 .
class LakeModel:
    """
    Solves the lake model and computes dynamics of the unemployment stocks and
    rates.

    Parameters:
    ------------
    λ : scalar
        The job finding rate for currently unemployed workers
    α : scalar
        The dismissal rate for currently employed workers
    b : scalar
        Entry rate into the labor force
    d : scalar
        Exit rate from the labor force
    """
    def __init__(self, λ=0.1, α=0.013, b=0.0124, d=0.00822):
        self.λ, self.α, self.b, self.d = λ, α, b, d

        # growth rate of the labor force
        self.g = g = b - d

        # transition matrix for the stocks, x_{t+1} = A x_t with
        # x_t = (u_t, e_t) (matrix reconstructed from the flows above)
        self.A = np.array([[(1-d) * (1-λ) + b,  (1-d) * α + b],
                           [(1-d) * λ,          (1-d) * (1-α)]])

        # long-run unemployment and employment rates
        self.ū = (1 + g - (1 - d) * (1 - α)) / (1 + g - (1 - d) * (1 - α) + (1 - d) * λ)
        self.ē = 1 - self.ū

    def simulate_path(self, x0, T=1000):
        """
        Simulates the sequence of employment and unemployment stocks.

        Parameters
        ----------
        x0 : array
            Contains initial values (u0,e0)
        T : int
            Number of periods to simulate

        Returns
        ----------
        x : iterator
            Contains sequence of employment and unemployment rates
        """
        x0 = np.atleast_1d(x0)  # Recast as array just in case
        x_ts = np.zeros((2, T))
        x_ts[:, 0] = x0
        for t in range(1, T):
            x_ts[:, t] = self.A @ x_ts[:, t-1]
        return x_ts
lm = LakeModel()
e_0 = 0.92   # Initial employment
u_0 = 1 - e_0   # Initial unemployment, given initial n_0 = 1
T = 100   # Simulation length

x_0 = (u_0, e_0)
x_path = lm.simulate_path(x_0, T)

# plot unemployment, employment and the labor force (figure lines reconstructed)
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
for i, title in enumerate(['Unemployment', 'Employment']):
    axes[i].plot(x_path[i, :], lw=2)
    axes[i].set_title(title)
axes[2].plot(x_path.sum(0), lw=2)
axes[2].set_title('Labor force')
for ax in axes:
    ax.grid()
plt.tight_layout()
plt.show()
Intuitively, since the unemployment and employment pools together form the labor force, both should grow at a rate similar to the labor force in the long run.

We next ask whether the long-run growth rates of $e_t$ and $u_t$ are also dominated by $1 + b - d$, the growth rate of the labor force.
The answer will be clearer if we appeal to Perron-Frobenius theorem.
The importance of the Perron-Frobenius theorem stems from the fact that firstly in the real world most matrices we
encounter are nonnegative matrices.
Secondly, many important models are simply linear iterative models that begin with an initial condition 𝑥0 and then evolve
recursively by the rule 𝑥𝑡+1 = 𝐴𝑥𝑡 or in short 𝑥𝑡 = 𝐴𝑡 𝑥0 .
This theorem helps characterise the dominant eigenvalue 𝑟(𝐴) which determines the behavior of this iterative process.
Dominant eigenvector
We now illustrate the power of the Perron-Frobenius theorem by showing how it helps us to analyze the lake model.
Since 𝐴 is a nonnegative and irreducible matrix, the Perron-Frobenius theorem implies that:
• the spectral radius 𝑟(𝐴) is an eigenvalue of 𝐴, where
𝑟(𝐴) ∶= max{|𝜆| ∶ 𝜆 is an eigenvalue of 𝐴}
• any other eigenvalue 𝜆 in absolute value is strictly smaller than 𝑟(𝐴): |𝜆| < 𝑟(𝐴),
• there exist unique and everywhere positive right eigenvector 𝜙 (column vector) and left eigenvector 𝜓 (row vector):
𝐴𝜙 = 𝑟(𝐴)𝜙, 𝜓𝐴 = 𝑟(𝐴)𝜓
• if further $A$ is positive, then with $\langle \psi, \phi \rangle = \psi \phi = 1$ we have

$$r(A)^{-t} A^t \to \phi \psi$$
The last statement implies that the magnitude of 𝐴𝑡 is identical to the magnitude of 𝑟(𝐴)𝑡 in the long run, where 𝑟(𝐴)
can be considered as the dominant eigenvalue in this lecture.
Therefore, the magnitude 𝑥𝑡 = 𝐴𝑡 𝑥0 is also dominated by 𝑟(𝐴)𝑡 in the long run.
Recall that the spectral radius is bounded by column sums: for 𝐴 ≥ 0, we have
$$\min_j \operatorname{colsum}_j(A) \le r(A) \le \max_j \operatorname{colsum}_j(A) \tag{32.2}$$
Note that colsum𝑗 (𝐴) = 1 + 𝑏 − 𝑑 for 𝑗 = 1, 2 and by (32.2) we can thus conclude that the dominant eigenvalue is
𝑟(𝐴) = 1 + 𝑏 − 𝑑.
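As a numerical check (a sketch assuming the LakeModel class defined above), we can confirm that the dominant eigenvalue of $A$ equals $1 + b - d$:

lm = LakeModel()
r = max(abs(np.linalg.eigvals(lm.A)))
print(r, 1 + lm.b - lm.d)   # both ≈ 1.00418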
Denote 𝑔 = 𝑏 − 𝑑 as the overall growth rate of the total labor force, so that 𝑟(𝐴) = 1 + 𝑔.
The Perron-Frobenius theorem implies that there is a unique positive eigenvector $\bar x = \begin{bmatrix} \bar u \\ \bar e \end{bmatrix}$ such that $A \bar x = r(A) \bar x$ and $\begin{bmatrix} 1 & 1 \end{bmatrix} \bar x = 1$:
$$\bar u = \frac{b + \alpha(1 - d)}{b + (\alpha + \lambda)(1 - d)}, \qquad \bar e = \frac{\lambda(1 - d)}{b + (\alpha + \lambda)(1 - d)} \tag{32.3}$$
Since 𝑥̄ is the eigenvector corresponding to the dominant eigenvalue 𝑟(𝐴), we call 𝑥̄ the dominant eigenvector.
This dominant eigenvector plays an important role in determining long-run outcomes as illustrated below.
def plot_time_paths(lm, x0=None, T=1000, ax=None):
    # (function signature and trajectory loop reconstructed; docstring and
    # plotting details are original)
    """
    Plots the sequences of iterates.

    Parameters
    ----------
    lm : class
        Lake Model
    x0 : array
        Contains some different initial values.
    T : int
        Number of periods to simulate
    """
    if x0 is None:
        x0 = np.array([[5.0, 0.1]])

    ū, ē = lm.ū, lm.ē

    x0 = np.atleast_2d(x0)

    show_fig = ax is None
    if show_fig:
        fig, ax = plt.subplots(figsize=(10, 8))

    # Plot line D
    s = 10
    ax.plot([0, s * ū], [0, s * ē], "k--", lw=1, label='set $D$')

    ax.set_xlim(-2, 6)
    ax.set_ylim(-2, 6)
    ax.set_xlabel("unemployed workforce")
    ax.set_ylabel("employed workforce")
    ax.set_xticks((0, 6))
    ax.set_yticks((0, 6))

    # Plot each trajectory of iterates A^t x0
    for x in x0:
        u0, e0 = x
        ax.plot([u0], [e0], "ko", ms=2, alpha=0.6)
        ax.annotate(f'$x_0 = ({u0},{e0})$',
                    xy=(u0, e0),
                    xycoords="data",
                    xytext=(0, 20),
                    textcoords="offset points")
        x_ts = lm.simulate_path(x, T)
        ax.plot(x_ts[0, :], x_ts[1, :], "-", lw=1, alpha=0.6)

    ax.legend()
    if show_fig:
        plt.show()
Since $\bar x$ is an eigenvector corresponding to the eigenvalue $r(A)$, all the vectors in the set $D := \{ x \in \mathbb{R}^2 : x = \alpha \bar x \text{ for some } \alpha > 0 \}$ are also eigenvectors corresponding to $r(A)$.
This set 𝐷 is represented by a dashed line in the above figure.
The graph illustrates that for two distinct initial conditions 𝑥0 the sequences of iterates (𝐴𝑡 𝑥0 )𝑡≥0 move towards 𝐷 over
time.
This suggests that all such sequences share strong similarities in the long run, determined by the dominant eigenvector 𝑥.̄
In the example illustrated above we considered parameters such that overall growth rate of the labor force 𝑔 > 0.
Suppose now we are faced with a situation where $g < 0$, i.e., negative growth in the labor force.
This means that 𝑏 − 𝑑 < 0, i.e., workers exit the market faster than they enter.
What would the behavior of the iterative sequence 𝑥𝑡+1 = 𝐴𝑥𝑡 be now?
This is visualised below.
Thus, while the sequence of iterates still moves towards the dominant eigenvector 𝑥,̄ in this case they converge to the
origin.
This is a result of the fact that 𝑟(𝐴) < 1, which ensures that the iterative sequence (𝐴𝑡 𝑥0 )𝑡≥0 will converge to some
point, in this case to (0, 0).
This leads us to the next result.
32.3.3 Properties
Since the column sums of $A$ all equal $r(A)$, the left eigenvector is $\mathbb{1}^\top = [1, 1]$.
Perron-Frobenius theory implies that
$$r(A)^{-t} A^t \approx \bar x \mathbb{1}^\top = \begin{bmatrix} \bar u & \bar u \\ \bar e & \bar e \end{bmatrix}.$$
$$\begin{aligned}
x_t = A^t x_0 &\approx r(A)^t \begin{bmatrix} \bar u & \bar u \\ \bar e & \bar e \end{bmatrix} \begin{bmatrix} u_0 \\ e_0 \end{bmatrix} \\
&= (1 + g)^t (u_0 + e_0) \begin{bmatrix} \bar u \\ \bar e \end{bmatrix} \\
&= (1 + g)^t n_0 \bar x \\
&= n_t \bar x
\end{aligned}$$
as 𝑡 is large enough.
We see that the growth of $u_t$ and $e_t$ is also dominated by $r(A) = 1 + g$ in the long run: $x_t$ grows along $D$ when $r(A) > 1$ and converges to $(0, 0)$ when $r(A) < 1$.

Moreover, the long-run unemployment and employment are steady fractions of $n_t$.

The latter implies that $\bar u$ and $\bar e$ are the long-run unemployment and employment rates, respectively.

In detail, we have the unemployment and employment rates: $x_t / n_t = A^t x_0 / n_t \to \bar x$ as $t \to \infty$.
To illustrate the dynamics of the rates, let 𝐴 ̂ ∶= 𝐴/(1 + 𝑔) be the transition matrix of 𝑟𝑡 ∶= 𝑥𝑡 /𝑛𝑡 .
The dynamics of the rates follow
$$r_{t+1} = \frac{x_{t+1}}{n_{t+1}} = \frac{x_{t+1}}{(1+g) n_t} = \frac{A x_t}{(1+g) n_t} = \hat A \frac{x_t}{n_t} = \hat A r_t.$$
Observe that the column sums of $\hat A$ are all one, so that $r(\hat A) = 1$.

One can check that $\bar x$ is also the right eigenvector of $\hat A$ corresponding to $r(\hat A)$, that is, $\bar x = \hat A \bar x$.
Moreover, $\hat A^t r_0 \to \bar x$ as $t \to \infty$ for any $r_0 = x_0 / n_0$, since the above discussion implies

$$r_t = \hat A^t r_0 = (1+g)^{-t} A^t r_0 = r(A)^{-t} A^t r_0 \to \begin{bmatrix} \bar u & \bar u \\ \bar e & \bar e \end{bmatrix} r_0 = \begin{bmatrix} \bar u \\ \bar e \end{bmatrix}.$$
lm = LakeModel()
e_0 = 0.92   # Initial employment
u_0 = 1 - e_0   # Initial unemployment, given initial n_0 = 1

T = 100   # Simulation length
x_0 = (u_0, e_0)
x_path = lm.simulate_path(x_0, T)

# plot the unemployment and employment rates (plotting lines reconstructed)
rate_path = x_path / x_path.sum(0)
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
for i, title in enumerate(['Unemployment rate', 'Employment rate']):
    axes[i].plot(rate_path[i, :], lw=2)
    axes[i].set_title(title)
plt.tight_layout()
plt.show()
To provide more intuition for convergence, we further explain the convergence below without the Perron-Frobenius the-
orem.
Suppose that $\hat A = P D P^{-1}$ is diagonalizable, where $P = [v_1, v_2]$ consists of eigenvectors $v_1$ and $v_2$ of $\hat A$ corresponding to eigenvalues $\gamma_1$ and $\gamma_2$ respectively, and $D = \operatorname{diag}(\gamma_1, \gamma_2)$.

Let $\gamma_1 = r(\hat A) = 1$ and $|\gamma_2| < \gamma_1$, so that the spectral radius is a dominant eigenvalue.
The dynamics of the rates follow 𝑟𝑡+1 = 𝐴𝑟̂ 𝑡 , where 𝑟0 is a probability vector: ∑𝑗 𝑟0,𝑗 = 1.
Consider $z_t = P^{-1} r_t$. Then we have $z_{t+1} = P^{-1} r_{t+1} = P^{-1} \hat A r_t = P^{-1} \hat A P z_t = D z_t$.

Hence $z_t = D^t z_0$, and for constants $c_1$ and $c_2$ determined by $z_0$,

$$r_t = P z_t = \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} \gamma_1^t & 0 \\ 0 & \gamma_2^t \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = c_1 \gamma_1^t v_1 + c_2 \gamma_2^t v_2.$$
Since |𝛾2 | < |𝛾1 | = 1, the second term in the right hand side converges to zero.
Therefore, the convergence follows 𝑟𝑡 → 𝑐1 𝑣1 .
Since the column sums of 𝐴 ̂ are one and 𝑟0 is a probability vector, 𝑟𝑡 must be a probability vector.
In this case, 𝑐1 𝑣1 must be a normalized eigenvector, so 𝑐1 𝑣1 = 𝑥̄ and then 𝑟𝑡 → 𝑥.̄
32.4 Exercise
ax.legend(loc='best')
plt.show()
THIRTYTHREE
NETWORKS
33.1 Outline
In recent years there has been rapid growth in a field called network science.
Network science studies relationships between groups of objects.
One important example is the world wide web, where web pages are connected by hyperlinks.
Another is the human brain: studies of brain function emphasize the network of connections between nerve cells (neurons).
Artificial neural networks are based on this idea, using data to build intricate connections between simple processing units.
Epidemiologists studying transmission of diseases like COVID-19 analyze interactions between groups of human hosts.
In operations research, network analysis is used to study fundamental problems such as minimum-cost flow, the traveling salesman problem, shortest paths, and assignment.
This lecture gives an introduction to economic and financial networks.
Some parts of this lecture are drawn from the text https://networks.quantecon.org/ but the level of this lecture is more
introductory.
We will need the following imports.
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import quantecon as qe
import matplotlib.cm as cm
import quantecon_book_networks
import quantecon_book_networks.input_output as qbn_io
import quantecon_book_networks.data as qbn_data
The following figure shows international trade in large commercial aircraft in 2019 based on International Trade Data
SITC Revision 2.
The circles in the figure are called nodes or vertices – in this case they represent countries.
The arrows in the figure are called edges or links.
Node size is proportional to total exports and edge width is proportional to exports to the target country.
(The data is for trade in commercial aircraft weighing at least 15,000kg and was sourced from CID Dataverse.)
The figure shows that the US, France and Germany are major export hubs.
In the discussion below, we learn to quantify such ideas.
Recall that, in our lecture on Markov chains we studied a dynamic model of business cycles where the states are
• “ng” = “normal growth”
• “mr” = “mild recession”
• “sr” = “severe recession”
Let’s examine the following figure
This is an example of a network, where the set of nodes $V$ equals the states: $V = \{$"ng", "mr", "sr"$\}$.
The edges between the nodes show the one month transition probabilities.
For these graphs, the arrows (edges) can be thought of as representing positive transition probabilities over a given unit
of time.
In general, if an edge (𝑢, 𝑣) exists, then the node 𝑢 is called a direct predecessor of 𝑣 and 𝑣 is called a direct successor
of 𝑢.
Also, for 𝑣 ∈ 𝑉 ,
• the in-degree is 𝑖𝑑 (𝑣) = the number of direct predecessors of 𝑣 and
• the out-degree is 𝑜𝑑 (𝑣) = the number of direct successors of 𝑣.
The Python package Networkx provides a convenient data structure for representing directed graphs and implements
many common routines for analyzing them.
As an example, let us recreate Fig. 33.3 using Networkx.
To do so, we first create an empty DiGraph object:
G_p = nx.DiGraph()

# edge list for Fig. 33.3 (reconstructed from the discussion of that
# figure below)
edge_list = [('p', 'p'), ('m', 'p'), ('m', 'm'), ('m', 'r'),
             ('r', 'p'), ('r', 'm'), ('r', 'r')]

for e in edge_list:
    u, v = e
    G_p.add_edge(u, v)

Alternatively, we can add all the edges in a single call:

G_p.add_edges_from(edge_list)
Adding the edges automatically adds the nodes, so G_p is now a correct representation of our graph.
We can verify this by plotting the graph via Networkx with the following code:
fig, ax = plt.subplots()
nx.draw_spring(G_p, ax=ax, node_size=500, with_labels=True,
font_weight='bold', arrows=True, alpha=0.8,
connectionstyle='arc3,rad=0.25', arrowsize=20)
plt.show()
The figure obtained above matches the original directed graph in Fig. 33.3.
DiGraph objects have methods that calculate in-degree and out-degree of nodes.
For example,
G_p.in_degree('p')
33.3.3 Communication
Next, we study communication and connectedness, which have important implications for economic networks.
Node 𝑣 is called accessible from node 𝑢 if either 𝑢 = 𝑣 or there exists a sequence of edges that lead from 𝑢 to 𝑣.
• in this case, we write 𝑢 → 𝑣
(Visually, there is a sequence of arrows leading from 𝑢 to 𝑣.)
For example, suppose we have a directed graph representing a production network, where
• elements of 𝑉 are industrial sectors and
• existence of an edge (𝑖, 𝑗) means that 𝑖 supplies products or services to 𝑗.
Then 𝑚 → ℓ means that sector 𝑚 is an upstream supplier of sector ℓ.
Two nodes 𝑢 and 𝑣 are said to communicate if both 𝑢 → 𝑣 and 𝑣 → 𝑢.
A graph is called strongly connected if all nodes communicate.
For example, Fig. 33.2 is strongly connected; however, in Fig. 33.3, rich is not accessible from poor, so that graph is not strongly connected.
We can verify this by first constructing the graphs using Networkx and then using nx.is_strongly_connected.
fig, ax = plt.subplots()
G1 = nx.DiGraph()

G1.add_edges_from([('p', 'p'), ('p', 'm'), ('p', 'r'),
                   ('m', 'p'), ('m', 'm'), ('m', 'r'),
                   ('r', 'p'), ('r', 'm'), ('r', 'r')])

nx.is_strongly_connected(G1)    # check if G1 is strongly connected

True
fig, ax = plt.subplots()
G2 = nx.DiGraph()

G2.add_edges_from([('p', 'p'),
                   ('m', 'p'), ('m', 'm'), ('m', 'r'),
                   ('r', 'p'), ('r', 'm'), ('r', 'r')])

nx.is_strongly_connected(G2)    # check if G2 is strongly connected

False
We now introduce weighted graphs, where weights (numbers) are attached to each edge.
To motivate the idea, consider the following figure which shows flows of funds (i.e., loans) between private banks, grouped
by country of origin.
An arrow from Japan to the US indicates aggregate claims held by Japanese banks on all US-registered banks, as collected by the Bank for International Settlements (BIS).
The size of each node in the figure is increasing in the total foreign claims of all other nodes on this node.
The widths of the arrows are proportional to the foreign claims they represent.
Notice that, in this network, an edge (𝑢, 𝑣) exists for almost every choice of 𝑢 and 𝑣 (i.e., almost every country in the
network).
(In fact, there are even more small arrows, which we have dropped for clarity.)
Hence the existence of an edge from one node to another is not particularly informative.
To understand the network, we need to record not just the existence or absence of a credit flow, but also the size of the
flow.
The correct data structure for recording this information is a “weighted directed graph”.
33.4.2 Definitions
A weighted directed graph is a directed graph to which we have added a weight function 𝑤 that assigns a positive
number to each edge.
The figure above shows one weighted directed graph, where the weights are the size of fund flows.
The following figure shows a weighted directed graph, with arrows representing edges of the induced directed graph.
Another way that we can represent weights, which turns out to be very convenient for numerical work, is via a matrix.
The adjacency matrix of a weighted directed graph with nodes {𝑣1, … , 𝑣𝑛}, edges 𝐸 and weight function 𝑤 is the matrix

$$
A = (a_{ij})_{1 \leq i,j \leq n}
\quad \text{with} \quad
a_{ij} =
\begin{cases}
w(v_i, v_j) & \text{if } (v_i, v_j) \in E \\
0           & \text{otherwise.}
\end{cases}
$$
Once the nodes in 𝑉 are enumerated, the weight function and adjacency matrix provide essentially the same information.
For example, with {poor, middle, rich} mapped to {1, 2, 3} respectively, the adjacency matrix corresponding to the
weighted directed graph in Fig. 33.5 is
$$
\begin{pmatrix}
0.9 & 0.1 & 0 \\
0.4 & 0.4 & 0.2 \\
0.1 & 0.1 & 0.8
\end{pmatrix}.
$$
In QuantEcon’s DiGraph implementation, weights are recorded via the keyword weighted:
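Here is a minimal sketch using the adjacency matrix above (the variable names are illustrative, and we assume the quantecon package is available as qe):

import numpy as np
import quantecon as qe

A = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])
G = qe.DiGraph(A, weighted=True)   # edge weights are stored alongside the graph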
One of the key points to remember about adjacency matrices is that taking the transpose reverses all the arrows in the
associated directed graph.
For example, the following directed graph can be interpreted as a stylized version of a financial network, with nodes as
banks and edges showing the flow of funds.
G4 = nx.DiGraph()
G4.add_edges_from([('1','2'),
                   ('2','1'), ('2','3'),
                   ('3','4'),
                   ('4','2'), ('4','5'),
                   ('5','1'), ('5','3'), ('5','4')])
pos = nx.circular_layout(G4)
edge_labels = {('1','2'): '100',
               ('2','1'): '50',  ('2','3'): '200',
               ('3','4'): '100',
               ('4','2'): '500', ('4','5'): '50',
               ('5','1'): '150', ('5','3'): '250', ('5','4'): '300'}
nx.draw_networkx(G4, pos=pos, node_size=500, with_labels=True)        # draw nodes and edges
nx.draw_networkx_edge_labels(G4, pos=pos, edge_labels=edge_labels)    # label edges with weights
plt.show()
$$
A = \begin{pmatrix}
0   & 100 & 0   & 0   & 0  \\
50  & 0   & 200 & 0   & 0  \\
0   & 0   & 0   & 100 & 0  \\
0   & 500 & 0   & 0   & 50 \\
150 & 0   & 250 & 300 & 0
\end{pmatrix}.
$$
The transpose is

$$
A^\top = \begin{pmatrix}
0   & 50  & 0   & 0   & 150 \\
100 & 0   & 0   & 500 & 0   \\
0   & 200 & 0   & 0   & 250 \\
0   & 0   & 100 & 0   & 300 \\
0   & 0   & 0   & 50  & 0
\end{pmatrix}.
$$
The corresponding network is visualized in the following figure which shows the network of liabilities after the loans have
been granted.
Both of these networks (original and transpose) are useful for analyzing financial markets.
G5 = nx.DiGraph()
G5.add_edges_from([('1','2'), ('1','5'),
                   ('2','1'), ('2','4'),
                   ('3','2'), ('3','5'),
                   ('4','3'), ('4','5'),
                   ('5','4')])
nx.draw_networkx(G5, pos=nx.circular_layout(G5), node_size=500, with_labels=True)
plt.show()
In general, every nonnegative 𝑛 × 𝑛 matrix 𝐴 = (𝑎𝑖𝑗 ) can be viewed as the adjacency matrix of a weighted directed
graph.
To build the graph we set 𝑉 = {1, … , 𝑛} and take the edge set 𝐸 to be all (𝑖, 𝑗) such that 𝑎𝑖𝑗 > 0.
For the weight function we set 𝑤(𝑖, 𝑗) = 𝑎𝑖𝑗 for all edges (𝑖, 𝑗).
We call this graph the weighted directed graph induced by 𝐴.
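For instance, here is a minimal sketch of building the induced graph with Networkx (the matrix values are illustrative):

A = np.array([[0, 100, 0],
              [50, 0, 200],
              [0, 0, 0]])
G = nx.from_numpy_array(A, create_using=nx.DiGraph)   # edge (i, j) wherever a_ij > 0
G.edges(data=True)                                    # weights stored under the 'weight' key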
33.6 Properties

Consider a weighted directed graph with adjacency matrix 𝐴, and let $a^k_{ij}$ denote element $(i, j)$ of $A^k$, the 𝑘-th power of 𝐴.

Theorem 33.6.1

For distinct nodes 𝑖, 𝑗 in 𝑉 and any integer 𝑘, we have $a^k_{ij} > 0$ if and only if there exists a directed walk of length 𝑘 from 𝑖 to 𝑗.

The above result is obvious when 𝑘 = 1 and a proof of the general case can be found in [SS22].

Now recall from the eigenvalues lecture that a nonnegative matrix 𝐴 is called irreducible if for each $(i, j)$ there is an integer $k \geq 0$ such that $a^k_{ij} > 0$.
From the preceding theorem, it is not too difficult (see [SS22] for details) to get the next result.
Theorem 33.6.2
For a weighted directed graph the following statements are equivalent:
1. The directed graph is strongly connected.
2. The adjacency matrix of the graph is irreducible.
G6 = nx.DiGraph()
G6.add_edges_from([('1','2'), ('1','3'),
                   ('2','1'),
                   ('3','1'), ('3','2')])
nx.is_strongly_connected(G6)    # check strong connectivity

True

# a minimal irreducibility check via S = A^0 + A^1 + ... + A^{n-1}
A = nx.to_numpy_array(G6)
n = len(A)
np.all(sum(np.linalg.matrix_power(A, k) for k in range(n)) > 0)

True
When studying networks of all varieties, a recurring topic is the relative “centrality” or “importance” of different nodes.
Examples include
• ranking of web pages by search engines
• determining the most important bank in a financial network (which one a central bank should rescue if there is a
financial crisis)
• determining the most important industrial sector in an economy.
In what follows, a centrality measure associates to each weighted directed graph a vector 𝑚, where 𝑚𝑖 is interpreted as the centrality (or rank) of node 𝑣𝑖.
Two elementary measures of “importance” of a node in a given directed graph are its in-degree and out-degree.
Both of these provide a centrality measure.
In-degree centrality is a vector containing the in-degree of each node in the graph.
Consider the following simple example.
G7 = nx.DiGraph()
G7.add_nodes_from(['1','2','3','4','5','6','7'])
G7.add_edges_from([('1','2'),('1','6'),
('2','1'),('2','4'),
('3','2'),
('4','2'),
('5','3'),('5','4'),
('6','1'),
('7','4'),('7','6')])
pos = nx.planar_layout(G7)
nx.draw_networkx(G7, pos=pos, node_size=500, with_labels=True)
plt.show()
iG7 = [G7.in_degree(v) for v in G7.nodes()]   # in-degree of each node
for i, d in enumerate(iG7):
    print(i+1, d)
1 2
2 3
3 1
4 3
5 0
6 2
7 0
D = qbn_io.build_unweighted_matrix(Z)   # Z: weighted adjacency matrix of the credit network
indegree = D.sum(axis=0)

fig, ax = plt.subplots()
df = centrality_plot_data(countries, indegree)   # helper defined earlier in the lecture
ax.bar(countries, indegree)   # assumption: a minimal bar chart of in-degree by country
ax.set_ylim((0, 20))
plt.show()
Unfortunately, while in-degree and out-degree centrality are simple to calculate, they are not always informative.
In Fig. 33.4, an edge exists between almost every node, so the in- or out-degree based centrality ranking fails to effectively
separate the countries.
This can be seen in the above graph as well.
Another example is the task of a web search engine, which ranks pages by relevance whenever a user enters a search.
Suppose web page A has twice as many inbound links as page B.
In-degree centrality tells us that page A deserves a higher rank.
But in fact, page A might be less important than page B.
To see why, suppose that the links to A are from pages that receive almost no traffic, while the links to B are from pages
that receive very heavy traffic.
In this case, page B probably receives more visitors, which in turn suggests that page B contains more valuable (or enter-
taining) content.
These examples motivate eigenvector centrality. The (hub-based) eigenvector centrality of a weighted directed graph with adjacency matrix 𝐴 is the vector 𝑒 solving

$$
e = \frac{1}{r(A)} A e,
$$

where $r(A)$ is the spectral radius of 𝐴.

Note the recursive nature of the definition: the centrality of node 𝑖 is proportional to a weighted sum of the centralities of all nodes, with weights given by the rates of flow from 𝑖 into these nodes.
A node 𝑖 is highly ranked if
1. there are many edges leaving 𝑖,
2. these edges have large weights, and
3. the edges point to other highly ranked nodes.
Later, when we study demand shocks in production networks, there will be a more concrete interpretation of eigenvector
centrality.
We will see that, in production networks, sectors with high eigenvector centrality are important suppliers.
In particular, they are activated by a wide array of demand shocks once orders flow backwards through the network.
To compute eigenvector centrality we can use the following function.

def eigenvector_centrality(A, k=40, authority=False):
    """
    Computes the dominant eigenvector of A via the power method.
    """
    A_temp = A.T if authority else A
    n = len(A_temp)
    r = np.max(np.abs(np.linalg.eigvals(A_temp)))              # spectral radius
    e = r**(-k) * (np.linalg.matrix_power(A_temp, k) @ np.ones(n))
    return e / np.sum(e)                                       # normalize to sum to one
Let's compute eigenvector centrality for the graph generated in Fig. 33.6.

A = nx.to_numpy_array(G7)       # adjacency matrix of the graph in Fig. 33.6
e = eigenvector_centrality(A)
n = len(e)
for i in range(n):
    print(i+1, e[i])
1 0.18580570704268035
2 0.18580570704268035
3 0.11483424225608216
4 0.11483424225608216
5 0.14194292957319637
6 0.11483424225608216
7 0.14194292957319637
While nodes 2 and 4 had the highest in-degree centrality, we can see that nodes 1 and 2 have the highest eigenvector
centrality.
Let's revisit the international credit network in Fig. 33.4.

eig_central = eigenvector_centrality(Z)

fig, ax = plt.subplots()
df = centrality_plot_data(countries, eig_central)   # helper defined earlier in the lecture
ax.bar(countries, eig_central)   # assumption: a minimal bar chart of centrality by country
plt.show()
Countries that are rated highly according to this rank tend to be important players in terms of supply of credit.
Japan takes the highest rank according to this measure, although countries with large financial sectors such as Great Britain
and France are not far behind.
The advantage of eigenvector centrality is that it measures a node’s importance while considering the importance of its
neighbours.
A variant of eigenvector centrality is at the core of Google’s PageRank algorithm, which is used to rank web pages.
The main principle is that links from important nodes (as measured by degree centrality) are worth more than links from
unimportant nodes.
One problem with eigenvector centrality is that 𝑟(𝐴) might be zero, in which case 1/𝑟(𝐴) is not defined.
For this and other reasons, some researchers prefer another measure of centrality for networks called Katz centrality.
Fixing 𝛽 in (0, 1/𝑟(𝐴)), the Katz centrality of a weighted directed graph with adjacency matrix 𝐴 is defined as the
vector 𝜅 that solves
$$
\kappa_i = \beta \sum_{1 \leq j \leq n} a_{ij} \kappa_j + 1
\quad \text{for all } i \in \{1, \ldots, n\}.
\tag{33.3}
$$

In vector form,

$$
\kappa = \mathbf{1} + \beta A \kappa,
\tag{33.4}
$$

where $\mathbf{1}$ is a column vector of ones. Since $0 < \beta < 1/r(A)$, the matrix $I - \beta A$ is invertible by the Neumann series lemma, and the unique solution is

$$
\kappa = (I - \beta A)^{-1} \mathbf{1}.
$$
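The linear system above translates directly into code; here is a minimal sketch (the function name and default value of b are illustrative, not from the source):

def katz_centrality(A, b=1, authority=False):
    """
    Computes the Katz centrality of A, assuming b < 1/r(A),
    as the solution x of x = 1 + b A x.
    """
    n = len(A)
    I = np.identity(n)
    C = I - b * (A.T if authority else A)
    return np.linalg.solve(C, np.ones(n))   # solve (I - bA) x = 1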
Search engine designers recognize that web pages can be important in two different ways.
Some pages have high hub centrality, meaning that they link to valuable sources of information (e.g., news aggregation
sites).
Other pages have high authority centrality, meaning that they contain valuable information, as indicated by the number
and significance of incoming links (e.g., websites of respected news organizations).
Similar ideas can and have been applied to economic networks (often using different terminology).
The eigenvector centrality and Katz centrality measures we discussed above measure hub centrality.
(Nodes have high centrality if they point to other nodes with high centrality.)
If we care more about authority centrality, we can use the same definitions except that we take the transpose of the
adjacency matrix.
This works because taking the transpose reverses the direction of the arrows.
(Now nodes will have high centrality if they receive links from other nodes with high centrality.)
For example, the authority-based eigenvector centrality of a weighted directed graph with adjacency matrix 𝐴 is the vector 𝑒 solving

$$
e = \frac{1}{r(A)} A^\top e.
\tag{33.5}
$$

The only difference from the original definition is that 𝐴 is replaced by its transpose.

(Transposes do not affect the spectral radius of a matrix so we wrote 𝑟(𝐴) instead of 𝑟(𝐴⊤).)

Element-by-element, this is given by

$$
e_j = \frac{1}{r(A)} \sum_{1 \leq i \leq n} a_{ij} e_i.
\tag{33.6}
$$
We see 𝑒𝑗 will be high if many nodes with high authority rankings link to 𝑗.
The following figure shows the authority-based eigenvector centrality ranking for the international credit network shown in Fig. 33.4.
ecentral_authority = eigenvector_centrality(Z, authority=True)   # authority-based ranking

fig, ax = plt.subplots()
df = centrality_plot_data(countries, ecentral_authority)   # helper defined earlier in the lecture
ax.bar(countries, ecentral_authority)   # assumption: a minimal bar chart of centrality by country
plt.show()
Highly ranked countries are those that attract large inflows of credit, or credit inflows from other major players.
In this case the US clearly dominates the rankings as a target of interbank credit.
33.9 Exercises
Exercise 33.9.1
Here is a mathematical exercise for those who like proofs.
Let (𝑉 , 𝐸) be a directed graph and write 𝑢 ∼ 𝑣 if 𝑢 and 𝑣 communicate.
Show that ∼ is an equivalence relation on 𝑉 .
Reflexivity: Since 𝑢 = 𝑢, the definition of accessibility gives 𝑢 → 𝑢, and hence 𝑢 ∼ 𝑢.

Symmetry: Suppose 𝑢 ∼ 𝑣
⇒ 𝑢 → 𝑣 and 𝑣 → 𝑢.
By definition, this implies 𝑣 ∼ 𝑢.
Transitivity:
Suppose 𝑢 ∼ 𝑣 and 𝑣 ∼ 𝑤.
This implies 𝑢 → 𝑣 and 𝑣 → 𝑢, and also 𝑣 → 𝑤 and 𝑤 → 𝑣.
Thus, we can conclude 𝑢 → 𝑣 → 𝑤 and 𝑤 → 𝑣 → 𝑢, which means 𝑢 ∼ 𝑤.
Exercise 33.9.2
Consider a directed graph 𝐺 with the set of nodes
𝑉 = {0, 1, 2, 3, 4, 5, 6, 7}

and the set of edges

𝐸 = {(0, 1), (0, 3), (1, 0), (2, 4), (3, 2), (3, 4), (3, 7), (4, 3), (5, 4), (5, 6), (6, 3), (6, 5), (7, 0)}
G = nx.DiGraph()
G.add_nodes_from(range(8))      # nodes 0, ..., 7
G.add_edges_from([(0, 1), (0, 3), (1, 0), (2, 4), (3, 2), (3, 4), (3, 7),
                  (4, 3), (5, 4), (5, 6), (6, 3), (6, 5), (7, 0)])
nx.draw_networkx(G, with_labels=True)
plt.show()

oG = [G.out_degree(v) for v in G.nodes()]   # out-degree of each node
for i, d in enumerate(oG):
    print(i, d)
0 2
1 1
2 1
3 3
4 1
5 2
6 2
7 1
A = nx.to_numpy_array(G)        # adjacency matrix
e = eigenvector_centrality(A)   # eigenvector centrality of each node
n = len(e)
for i in range(n):
    print(i+1, e[i])
1 0.1458980838002507
2 0.0901698980074874
3 0.05572805602479352
4 0.14589810100962305
5 0.0901699482402499
6 0.1803397955498566
7 0.20162621936025152
8 0.0901698980074874
Exercise 33.9.3
Consider a graph 𝐺 with 𝑛 nodes and 𝑛 × 𝑛 adjacency matrix 𝐴.
Let

$$
S = \sum_{k=0}^{n-1} A^k.
$$

Then for any two nodes 𝑖 and 𝑗, node 𝑗 is accessible from 𝑖 if and only if $S_{ij} > 0$.
Devise a function is_accessible that checks if any two nodes of a given graph are accessible.
Consider the graph in Exercise 33.9.2 and use this function to check if
1. 1 is accessible from 2
2. 6 is accessible from 3
def is_accessible(G, i, j):
    A = nx.to_numpy_array(G)
    n = len(A)
    result = np.zeros((n, n))
    for k in range(n):                        # S = A^0 + A^1 + ... + A^{n-1}
        result += np.linalg.matrix_power(A, k)
    if result[i, j] > 0:
        return True
    else:
        return False
G = nx.DiGraph()
G.add_nodes_from(range(8))
G.add_edges_from([(0, 1), (0, 3), (1, 0), (2, 4), (3, 2), (3, 4), (3, 7),
                  (4, 3), (5, 4), (5, 6), (6, 3), (6, 5), (7, 0)])
is_accessible(G, 2, 1)
True
is_accessible(G, 3, 6)
False
CHAPTER
THIRTYFOUR
34.1 Overview
In a previous lecture we studied supply, demand and welfare in a market with a single consumption good.
In this lecture, we study a setting with 𝑛 goods and 𝑛 corresponding prices.
Key infrastructure concepts that we’ll encounter in this lecture are
• inverse demand curves
• marginal utilities of wealth
• inverse supply curves
• consumer surplus
• producer surplus
• social welfare as a sum of consumer and producer surpluses
• competitive equilibrium
We will provide a version of the first fundamental welfare theorem, which was formulated by
• Leon Walras
• Francis Ysidro Edgeworth
• Vilfredo Pareto
Important extensions to the key ideas were obtained by
• Abba Lerner
• Harold Hotelling
• Paul Samuelson
• Kenneth Arrow
• Gerard Debreu
We shall describe two classic welfare theorems:
• first welfare theorem: for a given distribution of wealth among consumers, a competitive equilibrium allocation
of goods solves a social planning problem.
• second welfare theorem: An allocation of goods to consumers that solves a social planning problem can be
supported by a competitive equilibrium with an appropriate initial distribution of wealth.
As usual, we start by importing some Python modules.
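A minimal import block consistent with the code below (an assumption, since the original block is part of the surrounding lecture):

import numpy as np
from scipy.linalg import inv
import matplotlib.pyplot as plt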
We will use the following matrix calculus formulas:

$$
\frac{\partial a^\top x}{\partial x} = \frac{\partial x^\top a}{\partial x} = a
$$

$$
\frac{\partial A x}{\partial x} = A
$$

$$
\frac{\partial x^\top A x}{\partial x} = (A + A^\top) x
$$

$$
\frac{\partial c}{\partial p} = (\Pi^\top \Pi)^{-1}
$$
A consumer faces 𝑝 as a price taker and chooses 𝑐 to maximize the utility function

$$
-\frac{1}{2} (\Pi c - b)^\top (\Pi c - b)
\tag{34.1}
$$

subject to the budget constraint

$$
p^\top (c - e) = 0.
\tag{34.2}
$$

We shall specify examples in which Π and 𝑏 are such that it typically happens that

$$
\Pi c \ll b.
\tag{34.3}
$$

This means that the consumer has much less of each good than he wants.

The deviation in (34.3) will ultimately assure us that competitive equilibrium prices are positive.
The consumer's first-order conditions imply the demand curve

$$
c = \Pi^{-1} b - (\Pi^\top \Pi)^{-1} \mu p,
\tag{34.4}
$$

where 𝜇 is the marginal utility of wealth (the Lagrange multiplier on the budget constraint).

Substituting (34.4) into budget constraint (34.2) and solving for 𝜇 gives

$$
\mu(p, e) = \frac{p^\top (\Pi^\top \Pi)^{-1} \Pi^\top b - p^\top e}{p^\top (\Pi^\top \Pi)^{-1} p}.
\tag{34.5}
$$
Equation (34.5) tells how marginal utility of wealth depends on the endowment vector 𝑒 and the price vector 𝑝.
In the present case where we have imposed budget constraint in the form (34.2), we are free to normalize the price vector
by setting the marginal utility of wealth 𝜇 = 1 (or any other value for that matter).
This amounts to choosing a common unit (or numeraire) in which prices of all goods are expressed.
(Doubling all prices will affect neither quantities nor relative prices.)
We’ll set 𝜇 = 1.
Exercise 34.4.1
Verify that setting 𝜇 = 1 in (34.4) implies that formula (34.5) is satisfied.
Exercise 34.4.2
Verify that setting 𝜇 = 2 in (34.4) also implies that formula (34.5) is satisfied.
class ExchangeEconomy:

    def __init__(self,
                 Π,
                 b,
                 e,
                 thres=1.5):
        """
        Set up the environment for an exchange economy

        Args:
            Π (np.array): shared matrix of substitution
            b (list): the consumer's bliss point
            e (list): the consumer's endowment
            thres (float): a threshold used to check the b >> Π e condition
        """
        # check non-satiation
        if np.min(b / np.max(Π @ e)) <= thres:
            raise Exception('set bliss points further away')

        self.Π, self.b, self.e = Π, b, e

    def competitive_equilibrium(self):
        """
        Compute the competitive equilibrium prices and allocation
        """
        Π, b, e = self.Π, self.b, self.e

        # equilibrium price vector, normalizing the marginal utility of wealth to μ = 1
        p = Π.T @ b - Π.T @ Π @ e

        # consumption: with μ = 1, the demand curve (34.4) evaluated at p delivers c = e
        c = inv(Π) @ b - inv(Π.T @ Π) @ p

        return p, c
Sometimes we’ll use budget constraint (34.2) in situations in which a consumer’s endowment vector 𝑒 is his only source
of income.
Other times we’ll instead assume that the consumer has another source of income (positive or negative) and write his
budget constraint as
𝑝⊤ (𝑐 − 𝑒) = 𝑤 (34.6)
where 𝑤 is measured in “dollars” (or some other numeraire) and component 𝑝𝑖 of the price vector is measured in dollars
per unit of good 𝑖.
Whether the consumer’s budget constraint is (34.2) or (34.6) and whether we take 𝑤 as a free parameter or instead as an
endogenous variable will affect the consumer’s marginal utility of wealth.
Consequently, how we set 𝜇 determines whether we are constructing
• a Marshallian demand curve, as when we use (34.2) and solve for 𝜇 using equation (34.5) above, or
• a Hicksian demand curve, as when we treat 𝜇 as a fixed parameter and solve for 𝑤 from (34.6).
Marshallian and Hicksian demand curves contemplate different mental experiments:
For a Marshallian demand curve, hypothetical changes in a price vector have both substitution and income effects
• income effects are consequences of changes in 𝑝⊤ 𝑒 associated with the change in the price vector
For a Hicksian demand curve, hypothetical price vector changes have only substitution effects
• changes in the price vector leave total wealth 𝑝⊤𝑒 + 𝑤 unaltered because we freeze 𝜇 and solve for 𝑤
Sometimes a Hicksian demand curve is called a compensated demand curve in order to emphasize that, to disarm the
income (or wealth) effect associated with a price change, the consumer’s wealth 𝑤 is adjusted.
We’ll discuss these distinct demand curves more below.
Special cases of our 𝑛-good pure exchange model can be created to represent
• dynamics — by putting different dates on different commodities
• risk — by interpreting delivery of goods as being contingent on states of the world whose realizations are described
by a known probability distribution
Let’s illustrate how.
34.6.1 Dynamics

To capture a two-period economy with discount factor 𝛽 ∈ (0, 1), we set

$$
\Pi = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{\beta} \end{bmatrix},
\qquad
e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix},
\qquad
b = \begin{bmatrix} b_1 \\ \sqrt{\beta} b_2 \end{bmatrix},
$$

so that the budget constraint reads

$$
p_1 c_1 + p_2 c_2 = p_1 e_1 + p_2 e_2.
$$
beta = 0.95

Π = np.array([[1, 0],
              [0, np.sqrt(beta)]])

b = np.array([5, np.sqrt(beta) * 5])   # assumption: illustrative bliss points
e = np.array([1, 1])                   # assumption: illustrative endowment

dynamics = ExchangeEconomy(Π, b, e)
p, c = dynamics.competitive_equilibrium()
We study risk in the context of a static environment, meaning that there is only one period.
By risk we mean that an outcome is not known in advance, but that it is governed by a known probability distribution.
As an example, suppose that
• there are two states of nature, 1 and 2,
• the consumer knows that the probability that state 1 occurs is 𝜆, and
• the consumer knows that the probability that state 2 occurs is (1 − 𝜆).
Before the outcome is realized, the consumer's expected utility is

$$
-\frac{1}{2} \left[ \lambda (c_1 - b_1)^2 + (1 - \lambda)(c_2 - b_2)^2 \right]
$$
where
• 𝑐1 is consumption in state 1
• 𝑐2 is consumption in state 2
To capture these preferences we set

$$
\Pi = \begin{bmatrix} \sqrt{\lambda} & 0 \\ 0 & \sqrt{1 - \lambda} \end{bmatrix},
\qquad
e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix},
\qquad
b = \begin{bmatrix} \sqrt{\lambda}\, b_1 \\ \sqrt{1 - \lambda}\, b_2 \end{bmatrix}.
$$

A consumption vector is

$$
c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}
$$

and a price vector is

$$
p = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}.
$$
The state-contingent goods being traded are often called Arrow securities.
Before the random state of the world 𝑖 is realized, the consumer sells his/her state-contingent endowment bundle and
purchases a state-contingent consumption bundle.
Trading such state-contingent goods is one way economists often model insurance.
We use the tricks described above to interpret 𝑐1 , 𝑐2 as “Arrow securities” that are state-contingent claims to consumption
goods.
Here is an instance of the risk economy:

prob = 0.2

Π = np.array([[np.sqrt(prob), 0],
              [0, np.sqrt(1 - prob)]])

b = np.array([5, 5])   # assumption: illustrative bliss points
e = np.array([1, 1])

risk = ExchangeEconomy(Π, b, e)
p, c = risk.competitive_equilibrium()
Exercise 34.6.1
Consider the instance above.
Please numerically study how each of the following cases affects the equilibrium prices and allocations:
• the consumer gets poorer,
• they like the first good more, or
• the probability that state 1 occurs is higher.
Hints. For each case choose some parameter 𝑒, 𝑏, or 𝜆 different from the instance.
First, consider the case in which the consumer gets poorer:

e = np.array([0.5, 0.5])   # assumption: halve the endowment
risk = ExchangeEconomy(Π, b, e)
p, c = risk.competitive_equilibrium()
If the consumer likes the first (or second) good more, then we can set a larger bliss value for good 1.
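For example (the value 6 is an illustrative assumption):

b = np.array([6, 5])   # assumption: raise the bliss point for good 1
risk = ExchangeEconomy(Π, b, e)
p, c = risk.competitive_equilibrium()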
prob = 0.8
Π = np.array([[np.sqrt(prob), 0],
[0, np.sqrt(1 - prob)]])
e = np.array([1, 1])
risk = ExchangeEconomy(Π, b, e)
p, c = risk.competitive_equilibrium()
Up to now we have described a pure exchange economy in which endowments of goods are exogenous, meaning that they
are taken as given from outside the model.
A competitive firm that can produce goods takes a price vector 𝑝 as given and chooses a quantity 𝑞 to maximize total
revenue minus total costs.
The firm's total revenue equals 𝑝⊤𝑞 and its total cost equals 𝐶(𝑞), where 𝐶(𝑞) is the total cost function

$$
C(q) = h^\top q + \frac{1}{2} q^\top J q
$$

and 𝐽 is a positive definite matrix.
So the firm's total profits are

$$
p^\top q - C(q).
\tag{34.7}
$$

The 𝑛 × 1 vector of marginal costs is

$$
\frac{\partial C(q)}{\partial q} = h + Hq
$$

where

$$
H = \frac{1}{2}(J + J^\top).
$$
The firm maximizes total profits by setting marginal revenue equal to marginal cost.

An 𝑛 × 1 vector of marginal revenues for the price-taking firm is $\frac{\partial p^\top q}{\partial q} = p$.

So price equals marginal revenue for our price-taking competitive firm.

This leads to the following inverse supply curve for the competitive firm:

$$
p = h + Hq
$$
To compute a competitive equilibrium for a production economy where demand curve is pinned down by the marginal
utility of wealth 𝜇, we first compute an allocation by solving a planning problem.
Then we compute the equilibrium price vector using the inverse demand or supply curve.
𝜇 = 1 warmup

As a special case, let's pin down a demand curve by setting the marginal utility of wealth 𝜇 = 1.

Equating supply price to demand price and letting 𝑞 = 𝑐 we get

$$
p = h + Hc = \Pi^\top b - \Pi^\top \Pi c,
$$

which implies the equilibrium quantity vector

$$
c = (\Pi^\top \Pi + H)^{-1} (\Pi^\top b - h).
$$

This equation is the counterpart of equilibrium quantity (7.3) for the scalar 𝑛 = 1 model with which we began.
General 𝜇 ≠ 1 case

Now let's extend the preceding analysis to a more general case by allowing 𝜇 ≠ 1.

Then the inverse demand curve is

$$
p = \mu^{-1} \left[ \Pi^\top b - \Pi^\top \Pi c \right].
\tag{34.9}
$$

Equating this to the inverse supply curve, letting 𝑞 = 𝑐 and solving for 𝑐 gives

$$
c = (\Pi^\top \Pi + \mu H)^{-1} (\Pi^\top b - \mu h).
$$
34.7.3 Implementation
class ProductionEconomy:
def __init__(self,
Π,
b,
h,
J,
μ):
"""
Set up the environment for a production economy
Args:
Π (np.ndarray): matrix of substitution
b (np.array): bliss points
h (np.array): h in cost func
J (np.ndarray): J in cost func
μ (float): welfare weight of the corresponding planning problem
"""
self.n = len(b)
self.Π, self.b, self.h, self.J, self.μ = Π, b, h, J, μ
def competitive_equilibrium(self):
"""
Compute a competitive equilibrium of the production economy
"""
Π, b, h, μ, J = self.Π, self.b, self.h, self.μ, self.J
H = .5 * (J + J.T)
# allocation
c = inv(Π.T @ Π + μ * H) @ (Π.T @ b - μ * h)
# price
p = 1 / μ * (Π.T @ b - Π.T @ Π @ c)
# check non-satiation
if any(Π @ c - b >= 0):
raise Exception('invalid result: set bliss points further away')
return c, p
    def compute_surplus(self):
        """
        Compute consumer and producer surplus for the single-good case
        """
        if self.n != 1:
            raise Exception('not single good')
        h, J, Π, b, μ = self.h.item(), self.J.item(), self.Π.item(), self.b.item(), self.μ
        H = J

        # supply curve p = s0 + s1 q and demand curve p = d0 - d1 q
        s0, s1 = h, H
        d0, d1 = 1 / μ * Π * b, 1 / μ * Π**2

        # competitive equilibrium
        c, p = self.competitive_equilibrium()

        # calculate surplus
        c_surplus = d0 * c - .5 * d1 * c**2 - p * c
        p_surplus = p * c - s0 * c - .5 * s1 * c**2

        return c_surplus, p_surplus
Then define a function that plots demand and supply curves and labels surpluses and equilibrium.
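Here is a single-good instance (the parameter values below are illustrative assumptions):

Π = np.array([[1]])
b = np.array([10])
h = np.array([0.5])
J = np.array([[1]])
μ = 1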
PE = ProductionEconomy(Π, b, h, J, μ)
c, p = PE.competitive_equilibrium()
# plot
plot_competitive_equilibrium(PE)
PE.μ = 2
c, p = PE.competitive_equilibrium()
# plot
plot_competitive_equilibrium(PE)
Now we change the bliss point so that the consumer derives more utility from consumption.
PE.μ = 1
PE.b = PE.b * 1.5
c, p = PE.competitive_equilibrium()
# plot
plot_competitive_equilibrium(PE)
Π = np.array([[1, 0],
[0, 1]])
b = np.array([10, 10])
h = np.array([0.5, 0.5])
J = np.array([[1, 0.5],
[0.5, 1]])
μ = 1
PE = ProductionEconomy(Π, b, h, J, μ)
c, p = PE.competitive_equilibrium()
A competitive firm is a price-taker who regards the price and therefore its marginal revenue as being beyond its control.
A monopolist knows that it has no competition and can influence the price and its marginal revenue by setting quantity.
A monopolist takes a demand curve and not the price as beyond its control.
Thus, instead of being a price-taker, a monopolist sets prices to maximize profits subject to the inverse demand curve
(34.9).
So the monopolist's total profits as a function of its output 𝑞 are

$$
\left[ \mu^{-1} \Pi^\top (b - \Pi q) \right]^\top q - h^\top q - \frac{1}{2} q^\top J q.
\tag{34.11}
$$

After finding first-order necessary conditions for maximizing monopoly profits with respect to 𝑞 and solving them for 𝑞, we find that the monopolist sets

$$
q = (\mu H + 2 \Pi^\top \Pi)^{-1} (\Pi^\top b - \mu h).
\tag{34.12}
$$
We'll soon see that a monopolist sets a lower output 𝑞 than does either a competitive firm or a social planner.
Exercise 34.7.1
Please verify the monopolist’s supply curve (34.12).
34.7.5 A monopolist
class Monopoly(ProductionEconomy):
def __init__(self,
Π,
b,
h,
J,
μ):
"""
Inherit all properties and methods from class ProductionEconomy
"""
super().__init__(Π, b, h, J, μ)
    def equilibrium_with_monopoly(self):
        """
        Compute the equilibrium price and allocation when there is a monopolist supplier
        """
        Π, b, h, μ, J = self.Π, self.b, self.h, self.μ, self.J
        H = .5 * (J + J.T)

        # allocation
        q = inv(μ * H + 2 * Π.T @ Π) @ (Π.T @ b - μ * h)

        # price
        p = 1 / μ * (Π.T @ b - Π.T @ Π @ q)

        return q, p
Define a function that plots the demand, marginal cost and marginal revenue curves with surpluses and equilibrium labelled.
Let’s compare competitive equilibrium and monopoly outcomes in a multiple goods economy.
Π = np.array([[1, 0],
[0, 1.2]])
b = np.array([10, 10])
h = np.array([0.5, 0.5])
J = np.array([[1, 0.5],
[0.5, 1]])
μ = 1
M = Monopoly(Π, b, h, J, μ)
c, p = M.competitive_equilibrium()
q, pm = M.equilibrium_with_monopoly()
A single-good example
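Again with illustrative single-good parameters (an assumption; the original values are not shown):

Π = np.array([[1]])
b = np.array([10])
h = np.array([0.5])
J = np.array([[1]])
μ = 1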
M = Monopoly(Π, b, h, J, μ)
c, p = M.competitive_equilibrium()
q, pm = M.equilibrium_with_monopoly()
# plot
plot_monopoly(M)
Our welfare maximization problem – also sometimes called a social planning problem – is to choose 𝑐 to maximize

$$
-\frac{1}{2} \mu^{-1} (\Pi c - b)^\top (\Pi c - b)
$$

minus the area under the inverse supply curve, namely,

$$
h^\top c + \frac{1}{2} c^\top J c.
$$

So the welfare criterion is

$$
-\frac{1}{2} \mu^{-1} (\Pi c - b)^\top (\Pi c - b) - h^\top c - \frac{1}{2} c^\top J c.
$$

In this formulation, 𝜇 is a parameter that describes how the planner weighs interests of outside suppliers and our representative consumer.

The first-order condition with respect to 𝑐 is

$$
-\mu^{-1} \Pi^\top \Pi c + \mu^{-1} \Pi^\top b - h - H c = 0,
$$

which delivers the same allocation 𝑐 as the competitive equilibrium computed above.
CHAPTER
THIRTYFIVE
35.1 Overview
In the previous lecture, we studied competitive equilibria in an economy with many goods.
While the results of the study were informative, we used a strong simplifying assumption: all of the agents in the economy
are identical.
In the real world, households, firms and other economic agents differ from one another along many dimensions.
In this lecture, we introduce heterogeneity across consumers by allowing their preferences and endowments to differ.
We will examine competitive equilibrium in this setting.
We will also show how a “representative consumer” can be constructed.
Here are some imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.linalg import inv
Competitive equilibrium requires

$$
c_1 + c_2 = e_1 + e_2
$$

together with

$$
\mu_i(p, e) = \frac{p^\top (\Pi^{-1} b_i - e_i)}{p^\top (\Pi^\top \Pi)^{-1} p}
\tag{35.2}
$$

for 𝜇𝑖, 𝑖 = 1, 2.
Exercise 35.2.1
Show that, up to normalization by a positive scalar, the same competitive equilibrium price vector that you computed
in the preceding two-consumer economy would prevail in a single-consumer economy in which a single representative
consumer has utility function
$$
-\frac{1}{2} (\Pi c - b)^\top (\Pi c - b)
$$

and endowment vector 𝑒, where

$$
b = b_1 + b_2
\qquad \text{and} \qquad
e = e_1 + e_2.
$$
Let’s further explore a pure exchange economy with 𝑛 goods and 𝑚 people.
class ExchangeEconomy:

    def __init__(self,
                 Π,
                 bs,
                 es,
                 Ws=None,
                 thres=1.5):
        """
        Set up the environment for an exchange economy with multiple consumers

        Args:
            Π (np.array): shared matrix of substitution
            bs (list): all consumers' bliss points
            es (list): all consumers' endowments
            Ws (list): all consumers' wealth
            thres (float): a threshold used to test the b >> Π e condition
        """
        n, m = Π.shape[0], len(bs)

        # check non-satiation
        for b, e in zip(bs, es):
            if np.min(b / np.max(Π @ e)) <= thres:
                raise Exception('set bliss points further away')

        if Ws is None:
            Ws = np.zeros(m)
        else:
            if sum(Ws) != 0:
                raise Exception('invalid wealth distribution')

        self.Π, self.bs, self.es, self.Ws, self.n, self.m = Π, bs, es, Ws, n, m

    def competitive_equilibrium(self):
        """
        Compute the competitive equilibrium prices and allocation
        """
        Π, bs, es, Ws = self.Π, self.bs, self.es, self.Ws
        n, m = self.n, self.m

        slope_dc = inv(Π.T @ Π)
        Π_inv = inv(Π)

        # aggregate bliss points and endowments
        b = sum(bs)
        e = sum(es)

        # price vector, normalizing the aggregate marginal utility of wealth to 1
        p = Π.T @ b - Π.T @ Π @ e

        # individual marginal utilities of wealth and allocations
        A = p.T @ slope_dc @ p
        μ_s = []
        c_s = []
        for i in range(m):
            μ_i = (-Ws[i] + p.T @ (Π_inv @ bs[i] - es[i])) / A
            c_i = Π_inv @ bs[i] - μ_i * slope_dc @ p
            μ_s.append(μ_i)
            c_s.append(c_i)

        return p, c_s, μ_s
35.4 Implementation
Here we study how the competitive equilibrium 𝑝, 𝑐1, 𝑐2 responds to different 𝑏𝑖 and 𝑒𝑖, 𝑖 ∈ {1, 2}.
Π = np.array([[1, 0],
[0, 1]])
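A baseline instance (the bliss points and endowments below are illustrative assumptions):

bs = [np.array([5, 5]),
      np.array([5, 5])]
es = [np.array([0, 2]),
      np.array([2, 0])]

EE = ExchangeEconomy(Π, bs, es)
p, c_s, μ_s = EE.competitive_equilibrium()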
What happens if the first consumer likes the first good more and the second consumer likes the second good more?
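A sketch of that experiment (the asymmetric bliss points are assumptions):

bs = [np.array([6, 5]),   # consumer 1 likes good 1 more
      np.array([5, 6])]   # consumer 2 likes good 2 more

EE_new = ExchangeEconomy(Π, bs, es)
p, c_s, μ_s = EE_new.competitive_equilibrium()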
Now suppose we redistribute initial wealth between the two consumers:

Ws = [0.5, -0.5]
EE_new = ExchangeEconomy(Π, bs, es, Ws)
p, c_s, μ_s = EE_new.competitive_equilibrium()
Now let's use the tricks described above to study a dynamic economy, one with two periods.

beta = 0.95

Π = np.array([[1, 0],
              [0, np.sqrt(beta)]])

bs = [np.array([5, np.sqrt(beta) * 5])]   # assumption: illustrative bliss points
es = [np.array([1, 1])]

EE_dynamic = ExchangeEconomy(Π, bs, es)   # illustrative instance name
p, c_s, μ_s = EE_dynamic.competitive_equilibrium()
We use the tricks described above to interpret 𝑐1, 𝑐2 as “Arrow securities” that are state-contingent claims to consumption goods.

prob = 0.7

Π = np.array([[np.sqrt(prob), 0],
              [0, np.sqrt(1 - prob)]])

bs = [np.array([5, 5]),   # assumption: illustrative bliss points
      np.array([5, 5])]
es = [np.array([1, 0]),
      np.array([0, 1])]

EE_risk = ExchangeEconomy(Π, bs, es)   # illustrative instance name
p, c_s, μ_s = EE_risk.competitive_equilibrium()
In the class of multiple consumer economies that we are studying here, it turns out that there exists a single representative
consumer whose preferences and endowments can be deduced from lists of preferences and endowments for separate
individual consumers.
Consider a multiple consumer economy with an initial distribution of wealth $W_i$ satisfying $\sum_i W_i = 0$.
We allow an initial redistribution of wealth.
We have the following objects:

• The demand curve:

$$
c_i = \Pi^{-1} b_i - (\Pi^\top \Pi)^{-1} \mu_i p
$$

• The marginal utility of wealth:

$$
\mu_i = \frac{-W_i + p^\top (\Pi^{-1} b_i - e_i)}{p^\top (\Pi^\top \Pi)^{-1} p}
$$

• Market clearing:

$$
\sum_i c_i = \sum_i e_i
$$

Denote aggregate consumption $\sum_i c_i = c$ and $\sum_i \mu_i = \mu$.

Market clearing requires

$$
\Pi^{-1} b - (\Pi^\top \Pi)^{-1} \mu p = e,
$$

where $b = \sum_i b_i$ and

$$
\mu = \sum_i \mu_i = \frac{0 + p^\top (\Pi^{-1} b - e)}{p^\top (\Pi^\top \Pi)^{-1} p}.
$$

For the corresponding representative consumer, the marginal utility of wealth is

$$
\tilde{\mu} = \frac{p^\top (\Pi^{-1} b - e)}{p^\top (\Pi^\top \Pi)^{-1} p}.
$$

In an equilibrium 𝑐 = 𝑒, so

$$
p = \tilde{\mu}^{-1} (\Pi^\top b - \Pi^\top \Pi e).
$$
Thus, we have verified that, up to the choice of a numeraire in which to express absolute prices, the price vector in our
representative consumer economy is the same as that in an underlying economy with multiple consumers.
Estimation
CHAPTER
THIRTYSIX
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
The simple regression model estimates the relationship between two variables 𝑥𝑖 and 𝑦𝑖:

$$
y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots, N
$$

where 𝜖𝑖 represents the error between the line of best fit and the sample values for 𝑦𝑖 given 𝑥𝑖.
Our goal is to choose values for 𝛼 and 𝛽 to build a line of “best” fit for some data that is available for variables 𝑥𝑖 and 𝑦𝑖 .
Let us consider a simple dataset of 10 observations for variables 𝑥𝑖 and 𝑦𝑖 :
𝑦𝑖 𝑥𝑖
1 2000 32
2 1000 21
3 1500 24
4 2500 35
5 500 10
6 900 11
7 1100 22
8 1500 21
9 1800 27
10 250 2
Let us think about 𝑦𝑖 as sales for an ice-cream cart, while 𝑥𝑖 is a variable that records the day’s temperature in Celsius.
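The data can be loaded into a pandas DataFrame; here is a minimal construction consistent with the table above (the original loading code is an assumption):

x = [32, 21, 24, 35, 10, 11, 22, 21, 27, 2]
y = [2000, 1000, 1500, 2500, 500, 900, 1100, 1500, 1800, 250]
df = pd.DataFrame({'X': x, 'Y': y})
df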
   X     Y
0  32  2000
1  21  1000
2  24  1500
3  35  2500
4  10   500
5  11   900
6  22  1100
7  21  1500
8  27  1800
9   2   250
We can use a scatter plot of the data to see the relationship between 𝑦𝑖 (ice-cream sales in dollars ($’s)) and 𝑥𝑖 (degrees
Celsius).
ax = df.plot(
x='X',
y='Y',
kind='scatter',
    ylabel='Ice-Cream Sales ($\'s)',
    xlabel='Degrees Celsius'
)

As you can see, the data suggests that more ice-cream is typically sold on hotter days.
To build a linear model of the data we need to choose values for 𝛼 and 𝛽 that represent a line of “best” fit such that

$$
\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i
$$
α = 5
β = 10
df['Y_hat'] = α + β * df['X']
fig, ax = plt.subplots()
df.plot(x='X',y='Y', kind='scatter', ax=ax)
df.plot(x='X',y='Y_hat', kind='line', ax=ax)
We can see that this model does a poor job of estimating the relationship.
We can continue to guess and iterate towards a line of “best” fit by adjusting the parameters
β = 100
df['Y_hat'] = α + β * df['X']
fig, ax = plt.subplots()
df.plot(x='X',y='Y', kind='scatter', ax=ax)
df.plot(x='X',y='Y_hat', kind='line', ax=ax)
β = 65
df['Y_hat'] = α + β * df['X']
fig, ax = plt.subplots()
df.plot(x='X',y='Y', kind='scatter', ax=ax)
df.plot(x='X',y='Y_hat', kind='line', ax=ax, color='g')
However, we need to formalise this guessing process by framing it as an optimization problem.

Let's consider the error 𝜖𝑖 and define the difference between the observed values 𝑦𝑖 and the estimated values 𝑦𝑖̂, which we will call the residuals:

$$
\hat{e}_i = y_i - \hat{y}_i = y_i - \hat{\alpha} - \hat{\beta} x_i
$$
df['error'] = df['Y_hat'] - df['Y']
df
X Y Y_hat error
0 32 2000 2085 85
1 21 1000 1370 370
2 24 1500 1565 65
3 35 2500 2280 -220
4 10 500 655 155
5 11 900 720 -180
6 22 1100 1435 335
7 21 1500 1370 -130
8 27 1800 1760 -40
9 2 250 135 -115
fig, ax = plt.subplots()
df.plot(x='X',y='Y', kind='scatter', ax=ax)
df.plot(x='X',y='Y_hat', kind='line', ax=ax, color='g')
plt.vlines(df['X'], df['Y_hat'], df['Y'], color='r');
The Ordinary Least Squares (OLS) method, as the name suggests, chooses 𝛼 and 𝛽 in such a way that minimises the
Sum of the Squared Residuals (SSR).
$$
\min_{\alpha, \beta} \sum_{i=1}^{N} \hat{e}_i^2
= \min_{\alpha, \beta} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2
$$
Let us first look at how the total error changes with respect to 𝛽 (holding the intercept 𝛼 constant)
We know from the next section the optimal values for 𝛼 and 𝛽 are:
β_optimal = 64.38
α_optimal = -14.72
errors = {}
for β in np.arange(20,100,0.5):
errors[β] = abs((α_optimal + β * df['X']) - df['Y']).sum()
ax = pd.Series(errors).plot(xlabel='β', ylabel='error')
plt.axvline(β_optimal, color='r');
errors = {}
for α in np.arange(-500,500,5):
errors[α] = abs((α + β_optimal * df['X']) - df['Y']).sum()
ax = pd.Series(errors).plot(xlabel='α', ylabel='error')
plt.axvline(α_optimal, color='r');
Now let us use calculus to solve the optimization problem and compute the optimal values for 𝛼 and 𝛽 to find the ordinary
least squares solution.
First, taking the partial derivative of the cost function 𝐶 (the sum of squared residuals) with respect to 𝛼:

$$
\frac{\partial C}{\partial \alpha}
= \frac{\partial}{\partial \alpha} \left[ \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2 \right]
= -2 \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)
$$

Setting it equal to 0, we can remove the constant −2 from the summation by dividing both sides by −2:

$$
0 = \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)
$$

which implies $\sum_{i=1}^{N} y_i - N\alpha - \beta \sum_{i=1}^{N} x_i = 0$ and hence

$$
\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}
\tag{36.1}
$$

where $\bar{x}$ and $\bar{y}$ are the sample means of $x_i$ and $y_i$.

Now let's take the partial derivative of the cost function 𝐶 with respect to 𝛽:

$$
\frac{\partial C}{\partial \beta}
= \frac{\partial}{\partial \beta} \left[ \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2 \right]
= \sum_{i=1}^{N} -2 x_i (y_i - \alpha - \beta x_i)
$$

and setting it equal to 0:

$$
0 = \sum_{i=1}^{N} -2 x_i (y_i - \alpha - \beta x_i)
$$

we can again take the constant outside of the summation and divide both sides by −2:

$$
0 = \sum_{i=1}^{N} x_i (y_i - \alpha - \beta x_i)
$$

which becomes

$$
0 = \sum_{i=1}^{N} (x_i y_i - \alpha x_i - \beta x_i^2)
$$

Substituting $\alpha = \bar{y} - \beta \bar{x}$ from (36.1) and solving for 𝛽 gives

$$
\hat{\beta} = \frac{\sum_{i=1}^{N} x_i y_i - \bar{y} \sum_{i=1}^{N} x_i}{\sum_{i=1}^{N} x_i^2 - \bar{x} \sum_{i=1}^{N} x_i}
\tag{36.2}
$$
Now computing across the 10 observations and then summing the numerator and denominator
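A minimal sketch of that computation (the column names nr and dr are illustrative):

x_bar = df['X'].mean()
y_bar = df['Y'].mean()
df['nr'] = df['X'] * df['Y'] - y_bar * df['X']   # numerator terms of (36.2)
df['dr'] = df['X']**2 - x_bar * df['X']          # denominator terms of (36.2)
β = df['nr'].sum() / df['dr'].sum()
print(β)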
64.37665782493369
Calculating 𝛼
α = y_bar - β * x_bar
print(α)
-14.72148541114052
df['Y_hat'] = α + β * df['X']
df['error'] = df['Y_hat'] - df['Y']
fig, ax = plt.subplots()
df.plot(x='X',y='Y', kind='scatter', ax=ax)
df.plot(x='X',y='Y_hat', kind='line', ax=ax, color='g')
plt.vlines(df['X'], df['Y_hat'], df['Y'], color='r');
Exercise 36.2.1
Now that you know the equations that solve the simple linear regression model using OLS you can now run your own
regressions to build a model between 𝑦 and 𝑥.
Let’s consider two economic variables GDP per capita and Life Expectancy.
1. What do you think their relationship would be?
2. Gather some data from Our World in Data
3. Use pandas to import the csv formatted data and plot a few different countries of interest
4. Use (36.1) and (36.2) to compute optimal values for 𝛼 and 𝛽
5. Plot the line of best fit found using OLS
6. Interpret the coefficients and write a summary sentence of the relationship between GDP per capita and Life Expectancy

fl = "_static/lecture_specific/simple_linear_regression/life-expectancy-vs-gdp-per-capita.csv"  # TODO: Replace with GitHub link
df = pd.read_csv(fl, nrows=10)
df
Continent
0 Asia
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
You can see that the data downloaded from Our World in Data has provided a global set of countries with the GDP per
capita and Life Expectancy Data.
It is often a good idea to at first import a few lines of data from a csv to understand its structure so that you can then
choose the columns that you want to read into your DataFrame.
You can observe that there are a bunch of columns we won't need to import, such as Continent.

So let's build a list of the columns we want to import:
cols = ['Code', 'Year', 'Life expectancy at birth (historical)', 'GDP per capita']
df = pd.read_csv(fl, usecols=cols)
df
Sometimes it can be useful to rename your columns to make it easier to work with in the DataFrame
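A sketch of the renaming, assuming the column names used later in this lecture (cntry, year, life_expectancy, gdppc):

df.columns = ["cntry", "year", "life_expectancy", "gdppc"]
df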
We can see there are NaN values which represents missing data so let us go ahead and drop those
df.dropna(inplace=True)
df
We have now dropped the number of rows in our DataFrame from 62156 to 12445 removing a lot of empty data rela-
tionships.
Now we have a dataset containing life expectancy and GDP per capita for a range of years.
It is always a good idea to spend a bit of time understanding what data you actually have.
For example, you may want to explore this data to see if there is consistent reporting for all countries across years
Let's first look at the life expectancy data.

le_years = df.pivot(index='cntry', columns='year', values='life_expectancy')   # assumption: reshape to countries × years
le_years
year 1543 1548 1553 1558 1563 1568 1573 1578 1583 1588 ... 2009 \
cntry ...
AFG NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 60.4
AGO NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 55.8
ALB NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 77.8
ARE NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 78.0
ARG NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 75.9
... ... ... ... ... ... ... ... ... ... ... ... ...
VNM NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 73.5
(continues on next page)
year 2010 2011 2012 2013 2014 2015 2016 2017 2018
cntry
AFG 60.9 61.4 61.9 62.4 62.5 62.7 63.1 63.0 63.1
AGO 56.7 57.6 58.6 59.3 60.0 60.7 61.1 61.7 62.1
ALB 77.9 78.1 78.1 78.1 78.4 78.6 78.9 79.0 79.2
ARE 78.3 78.5 78.7 78.9 79.0 79.2 79.3 79.5 79.6
ARG 75.7 76.1 76.5 76.5 76.8 76.8 76.3 76.8 77.0
... ... ... ... ... ... ... ... ... ...
VNM 73.5 73.7 73.7 73.8 73.9 73.9 73.9 74.0 74.0
YEM 67.3 67.4 67.3 67.5 67.4 65.9 66.1 66.0 64.6
ZAF 58.9 60.7 61.8 62.5 63.4 63.9 64.7 65.4 65.7
ZMB 56.8 57.8 58.9 59.9 60.7 61.2 61.8 62.1 62.3
ZWE 50.7 53.3 55.6 57.5 58.8 59.6 60.3 60.7 61.4
As you can see there are a lot of countries where data is not available for the Year 1543!
Which country does report this data?
le_years[~le_years[1543].isna()]
year 1543 1548 1553 1558 1563 1568 1573 1578 1583 1588 \
cntry
GBR 33.94 38.82 39.59 22.38 36.66 39.67 41.06 41.56 42.7 37.05
year ... 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
cntry ...
GBR ... 80.2 80.4 80.8 80.9 80.9 81.2 80.9 81.1 81.2 81.1
You can see that Great Britain (GBR) is the only one available.
You can also take a closer look at the time series to find that it is also non-continuous, even for GBR.
le_years.loc['GBR'].plot()
<Axes: xlabel='year'>
In fact we can use pandas to quickly check how many countries are captured in each year
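A minimal sketch (assuming the le_years table built above):

le_years.notna().sum().plot(xlabel="year", ylabel="number of countries")   # count non-missing entries per year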
So it is clear that if you are doing cross-sectional comparisons, then more recent data will include a wider set of countries.

Now let us consider the most recent year in the dataset, 2018.
df = df[df.year == 2018].reset_index(drop=True).copy()
By specifying logx you can plot the GDP per Capita data on a log scale
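For example (a sketch; the axis labels are assumptions):

df.plot(x='gdppc', y='life_expectancy', kind='scatter',
        xlabel="GDP per capita", ylabel="Life expectancy (years)", logx=True)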
As you can see from this transformation – a linear model fits the shape of the data more closely.
df['log_gdppc'] = df['gdppc'].apply(np.log10)
df
Q4: Use (36.1) and (36.2) to compute optimal values for 𝛼 and 𝛽
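First collect the two columns used below into a DataFrame called data (a sketch consistent with the printed output):

data = df[['log_gdppc', 'life_expectancy']].copy()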
data
log_gdppc life_expectancy
0 3.286581 63.1
1 4.045486 79.2
2 4.153145 76.1
3 3.890502 62.1
4 4.268493 77.0
.. ... ...
161 3.833411 74.0
162 4.182198 72.6
163 3.358865 64.6
164 3.548271 62.3
165 3.207205 61.4
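Then apply (36.2), mirroring the earlier computation (column names nr and dr are illustrative):

x_bar = data['log_gdppc'].mean()
y_bar = data['life_expectancy'].mean()
data['nr'] = data['log_gdppc'] * data['life_expectancy'] - y_bar * data['log_gdppc']
data['dr'] = data['log_gdppc']**2 - x_bar * data['log_gdppc']
β = data['nr'].sum() / data['dr'].sum()
print(β)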
12.643730292819699
α = y_bar - β * x_bar
print(α)
21.702096701389074
data['life_expectancy_hat'] = α + β * df['log_gdppc']
data['error'] = data['life_expectancy_hat'] - data['life_expectancy']
fig, ax = plt.subplots()
data.plot(x='log_gdppc',y='life_expectancy', kind='scatter', ax=ax)
data.plot(x='log_gdppc',y='life_expectancy_hat', kind='line', ax=ax, color='g')
plt.vlines(data['log_gdppc'], data['life_expectancy_hat'], data['life_expectancy'],␣
↪color='r')
<matplotlib.collections.LineCollection at 0x7f4b67f3a9e0>
Exercise 36.2.2
Minimising the sum of squares is not the only way to generate the line of best fit.

For example, we could also consider minimising the sum of the absolute values, which would give less weight to outliers.

Solve for 𝛼 and 𝛽 using the least absolute values.
CHAPTER
THIRTYSEVEN
37.1 Introduction
Consider a situation where a policymaker is trying to estimate how much revenue a proposed wealth tax will raise.
The proposed tax is

$$
h(w) =
\begin{cases}
a w & \text{if } w \leq \bar{w} \\
a \bar{w} + b (w - \bar{w}) & \text{if } w > \bar{w}
\end{cases}
$$

where 𝑤 is wealth.

For example, if 𝑎 = 0.05, 𝑏 = 0.1, and 𝑤̄ = 2.5, this means
• a 5% tax on wealth up to 2.5 and
• a 10% tax on wealth in excess of 2.5.
The unit is 100,000, so 𝑤 = 2.5 means 250,000 dollars.
Let’s go ahead and define ℎ:
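A direct translation of the schedule above (the exact original code is not shown, and the imports are assumptions based on later usage):

import numpy as np
import matplotlib.pyplot as plt

a, b, w_bar = 0.05, 0.1, 2.5

def h(w):
    # piecewise-linear wealth tax
    if w <= w_bar:
        return a * w
    else:
        return a * w_bar + b * (w - w_bar)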
For a population of size 𝑁, where individual 𝑖 has wealth 𝑤𝑖, total revenue raised by the tax will be

$$
T = \sum_{i=1}^{N} h(w_i)
$$
Collecting and maintaining accurate wealth data for all individuals or households in a country is just too hard.
So let’s suppose instead that we obtain a sample 𝑤1 , 𝑤2 , ⋯ , 𝑤𝑛 telling us the wealth of 𝑛 randomly selected individuals.
For our exercise we are going to use a sample of 𝑛 = 10, 000 observations from wealth data in the US in 2016.
n = 10_000
# `sample`: array of n = 10,000 wealth observations (in units of 100,000 dollars)
fig, ax = plt.subplots()
ax.set_xlim(-1,20)
ax.hist(sample, density=True, bins=5_000, histtype='stepfilled', alpha=0.8)
plt.show()
The histogram shows that many people have very low wealth and a few people have very high wealth.
We will take the full population size to be
N = 100_000_000
How can we estimate total revenue from the full population using only the sample data?
Our plan is to assume that wealth of each individual is a draw from a distribution with density 𝑓.
If we obtain an estimate of 𝑓 we can then approximate 𝑇 as follows:
$$
T = \sum_{i=1}^{N} h(w_i)
= N \frac{1}{N} \sum_{i=1}^{N} h(w_i)
\approx N \int_{0}^{\infty} h(w) f(w) \, dw
\tag{37.1}
$$
(The sample mean should be close to the population mean by the law of large numbers.)
ln_sample = np.log(sample)
fig, ax = plt.subplots()
ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
plt.show()
Now our job is to obtain the maximum likelihood estimates of 𝜇 and 𝜎, which we denote by 𝜇̂ and 𝜎̂.

These estimates can be found by maximizing the likelihood function given the data.

The pdf of a lognormally distributed random variable 𝑋 is given by:

$$
f(x, \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}}
\exp\left( -\frac{1}{2} \left( \frac{\ln x - \mu}{\sigma} \right)^2 \right)
$$

To find where this function is maximised we find its partial derivatives with respect to 𝜇 and 𝜎² and equate them to 0.
Writing ℓ(𝜇, 𝜎) for the log-likelihood of the sample, let's first find the maximum likelihood estimate (MLE) of 𝜇:

$$
\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (\ln w_i - \mu) = 0
$$

$$
\implies \sum_{i=1}^{n} \ln w_i - n \mu = 0
$$

$$
\implies \hat{\mu} = \frac{\sum_{i=1}^{n} \ln w_i}{n}
$$

Now let's find the MLE of 𝜎:

$$
\frac{\partial \ell}{\partial \sigma^2}
= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (\ln w_i - \mu)^2 = 0
$$

$$
\implies \frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^{n} (\ln w_i - \mu)^2
$$

$$
\implies \hat{\sigma} = \left( \frac{\sum_{i=1}^{n} (\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
$$
Now that we have derived the expressions for 𝜇̂ and 𝜎,̂ let’s compute them for our wealth sample.
μ_hat = np.mean(ln_sample)
μ_hat
0.0634375526654064

num = (ln_sample - μ_hat)**2
σ_hat = (np.mean(num))**(1/2)
σ_hat

2.1507346258433424
Let's plot the lognormal pdf using the estimated parameters against our sample data.

from math import exp
from scipy.stats import lognorm

dist_lognorm = lognorm(σ_hat, scale=exp(μ_hat))   # fitted lognormal distribution
x = np.linspace(0, 50, 10_000)                    # assumption: grid for plotting pdfs

fig, ax = plt.subplots()
ax.set_xlim(-1, 20)
ax.hist(sample, density=True, bins=5_000, histtype='stepfilled', alpha=0.5)
ax.plot(x, dist_lognorm.pdf(x), 'k-', lw=0.5, label='lognormal pdf')
ax.legend()
plt.show()

Our estimated lognormal distribution appears to be a reasonable fit for the overall data.
We now use (37.1) to calculate total revenue.
We will compute the integral using numerical integration via SciPy's quad function.

from scipy.integrate import quad

def total_revenue(dist):
    integral, _ = quad(lambda x: h(x) * dist.pdf(x), 0, 100_000)
    T = N * integral
    return T
tr_lognorm = total_revenue(dist_lognorm)
tr_lognorm
101105326.82814863
(Our unit was 100,000 dollars, so this means that actual revenue is 100,000 times as large.)
We mentioned above that using maximum likelihood estimation requires us to make a prior assumption of the underlying
distribution.
Previously we assumed that the distribution is lognormal.
Suppose instead we assume that 𝑤𝑖 are drawn from the Pareto Distribution with parameters 𝑏 and 𝑥𝑚 .
In this case, the maximum likelihood estimates are known to be

$$
\hat{b} = \frac{n}{\sum_{i=1}^{n} \ln (w_i / \hat{x}_m)}
\quad \text{and} \quad
\hat{x}_m = \min_i w_i
$$
xm_hat = min(sample)
xm_hat
0.0001
den = np.log(sample/xm_hat)
b_hat = 1/np.mean(den)
b_hat
0.10783091940803055

from scipy.stats import pareto

dist_pareto = pareto(b=b_hat, scale=xm_hat)   # fitted Pareto distribution
tr_pareto = total_revenue(dist_pareto)
tr_pareto

12933168365.762571
tr_pareto / tr_lognorm
127.91777418162562
fig, ax = plt.subplots()
ax.set_xlim(-1, 20)
ax.set_ylim(0, 1.75)
ax.hist(sample, density=True, bins=5_000, histtype='stepfilled', alpha=0.5)
ax.plot(x, dist_pareto.pdf(x), 'k-', lw=0.5, label='pareto pdf')
ax.legend()
plt.show()

We observe that in this case the fit for the Pareto distribution is not very good, so we can probably reject it.
Let's now examine the right hand tail of the sample.

# `sample_tail`: the upper tail of the wealth sample
fig, ax = plt.subplots()
ax.set_xlim(0, 50)
ax.hist(sample_tail, density=True, bins=500, histtype='stepfilled', alpha=0.8)
plt.show()
ln_sample_tail = np.log(sample_tail)
μ_hat_tail = np.mean(ln_sample_tail)
num_tail = (ln_sample_tail - μ_hat_tail)**2
σ_hat_tail = (np.mean(num_tail))**(1/2)
dist_lognorm_tail = lognorm(σ_hat_tail, scale = exp(μ_hat_tail))
fig, ax = plt.subplots()
ax.set_xlim(0,50)
ax.hist(sample_tail, density=True, bins=500, histtype='stepfilled', alpha=0.5)
ax.plot(x, dist_lognorm_tail.pdf(x), 'k-', lw=0.5, label='lognormal pdf')
ax.legend()
plt.show()
While the lognormal distribution was a good fit for the entire dataset, it is not a good fit for the right hand tail.
xm_hat_tail = min(sample_tail)
den_tail = np.log(sample_tail/xm_hat_tail)
b_hat_tail = 1/np.mean(den_tail)
dist_pareto_tail = pareto(b = b_hat_tail, scale = xm_hat_tail)
fig, ax = plt.subplots()
ax.set_xlim(0, 50)
ax.set_ylim(0, 0.65)
ax.hist(sample_tail, density=True, bins=500, histtype='stepfilled', alpha=0.5)
ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
ax.legend()
plt.show()
The Pareto distribution is a better fit for the right hand tail of our dataset.
37.5 Exercises
Exercise 37.5.1
Suppose we assume wealth is exponentially distributed with parameter 𝜆 > 0.

The maximum likelihood estimate of 𝜆 is given by

$$
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} w_i}
$$

Compute the total revenue 𝑇 under this assumption.
λ_hat = 1/np.mean(sample)
λ_hat

0.15234120963403971

from scipy.stats import expon

dist_exp = expon(scale=1/λ_hat)   # fitted exponential distribution
tr_expo = total_revenue(dist_exp)
tr_expo

55246978.53427645
Exercise 37.5.2
Plot the exponential distribution against the sample and check if it is a good fit or not.
fig, ax = plt.subplots()
ax.set_xlim(-1, 20)
ax.hist(sample, density=True, bins=5_000, histtype='stepfilled', alpha=0.5)
ax.plot(x, dist_exp.pdf(x), 'k-', lw=0.5, label='exponential pdf')
ax.legend()
plt.show()
Other
CHAPTER
THIRTYEIGHT
TROUBLESHOOTING
Contents
• Troubleshooting
– Fixing your local environment
– Reporting an issue
This page is for readers experiencing errors when running the code from the lectures.
The basic assumption of the lectures is that code in a lecture should execute whenever
1. it is executed in a Jupyter notebook and
2. the notebook is running on a machine with the latest version of Anaconda Python.
You have installed Anaconda, haven’t you, following the instructions in this lecture?
Assuming that you have, the most common source of problems for our readers is that their Anaconda distribution is not
up to date.
Here’s a useful article on how to update Anaconda.
Another option is to simply remove Anaconda and reinstall.
You also need to keep the external code libraries, such as QuantEcon.py up to date.
For this task you can either
• use conda install -y quantecon on the command line, or
• execute !conda install -y quantecon within a Jupyter notebook.
If your local environment is still not working you can do two things.
First, you can use a remote machine instead, by clicking on the Launch Notebook icon available for each lecture
Second, you can report an issue, so we can try to fix your local set up.
We like getting feedback on the lectures so please don’t hesitate to get in touch.
One way to give feedback is to raise an issue through our issue tracker.
Please be as specific as possible. Tell us where the problem is and as much detail about your local set up as you can
provide.
Another feedback option is to use our discourse forum.
Finally, you can provide direct feedback to [email protected]
CHAPTER
THIRTYNINE
REFERENCES
CHAPTER
FORTY
EXECUTION STATISTICS
These lectures are built on Linux instances through GitHub Actions and Amazon Web Services (AWS) to enable access to a GPU. Specifically, they are built on a p3.2xlarge instance, which has access to 8 vCPUs, a V100 NVIDIA Tesla GPU, and 61 GB of memory.
[AR02] Daron Acemoglu and James A. Robinson. The political economy of the Kuznets curve. Review of Development
Economics, 6(2):183–203, 2002.
[AKM+18] SeHyoun Ahn, Greg Kaplan, Benjamin Moll, Thomas Winberry, and Christian Wolf. When inequality mat-
ters for macro and macro matters for inequality. NBER Macroeconomics Annual, 32(1):1–75, 2018.
[Axt01] Robert L Axtell. Zipf distribution of US firm sizes. Science, 293(5536):1818–1820, 2001.
[BB18] Jess Benhabib and Alberto Bisin. Skewed wealth distributions: theory and empirics. Journal of Economic
Literature, 56(4):1261–91, 2018.
[BBL19] Jess Benhabib, Alberto Bisin, and Mi Luo. Wealth Distribution and Social Mobility in the US: A Quantitative
Approach. American Economic Review, 109(5):1623–1647, May 2019.
[Ber97] Dimitris Bertsimas and John N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[BEGS18] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J Sargent. Inequality, business cycles, and
monetary-fiscal policy. Technical Report, National Bureau of Economic Research, 2018.
[BEJ18] Stephen P Borgatti, Martin G Everett, and Jeffrey C Johnson. Analyzing social networks. Sage, 2018.
[Cag56] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor, Studies in the Quantity
Theory of Money, pages 25–117. University of Chicago Press, Chicago, 1956.
[CB96] Marcus J Chambers and Roy E Bailey. A theory of commodity price fluctuations. Journal of Political Econ-
omy, 104(5):924–957, 1996.
[Coc23] John H Cochrane. The Fiscal Theory of the Price Level. Princeton University Press, Princeton, New Jersey,
2023.
[Cos21] Michele Coscia. The atlas for the aspiring network scientist. arXiv preprint arXiv:2101.00863, 2021.
[DL92] Angus Deaton and Guy Laroque. On the behavior of commodity prices. The Review of Economic Studies,
59:1–23, 1992.
[DL96] Angus Deaton and Guy Laroque. Competitive storage and commodity price dynamics. Journal of Political
Economy, 104(5):896–923, 1996.
[DSS58] Robert Dorfman, Paul A. Samuelson, and Robert M. Solow. Linear Programming and Economic Analysis:
Revised Edition. McGraw Hill, New York, 1958.
[EK+10] David Easley, Jon Kleinberg, and others. Networks, crowds, and markets. Volume 8. Cambridge university
press Cambridge, 2010.
[Fri56] M. Friedman. A Theory of the Consumption Function. Princeton University Press, 1956.
[FDGA+04] Yoshi Fujiwara, Corrado Di Guilmi, Hideaki Aoyama, Mauro Gallegati, and Wataru Souma. Do Pareto–Zipf and Gibrat laws hold true? An analysis with European firms. Physica A: Statistical Mechanics and its Applications, 335(1-2):197–216, 2004.
[Gab16] Xavier Gabaix. Power laws in economics: an introduction. Journal of Economic Perspectives, 30(1):185–206,
2016.
[GSS03] Edward Glaeser, Jose Scheinkman, and Andrei Shleifer. The injustice of inequality. Journal of Monetary
Economics, 50(1):199–222, 2003.
[Goy23] Sanjeev Goyal. Networks: An economics approach. MIT Press, 2023.
[Hal78] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis: Theory and Evi-
dence. Journal of Political Economy, 86(6):971–987, 1978.
[Ham05] James D Hamilton. What's real about the business cycle? Federal Reserve Bank of St. Louis Review, pages
435–452, 2005.
[Har60] Arthur A. Harlow. The hog cycle and the cobweb theorem. American Journal of Agricultural Economics,
42(4):842–853, 1960. doi:https://doi.org/10.2307/1235116.
[Hu18] Y. Hu and Y. Guo. Operations Research. Tsinghua University Press, 5th edition, 2018.
[Haggstrom02] Olle Häggström. Finite Markov chains and algorithmic applications. Volume 52. Cambridge University
Press, 2002.
[IT23] Patrick Imam and Jonathan RW Temple. Political institutions and output collapses. IMF Working Paper, 2023.
[Jac10] Matthew O Jackson. Social and economic networks. Princeton university press, 2010.
[KLS18] Illenin Kondo, Logan T Lewis, and Andrea Stella. On the US firm and establishment size distributions. Technical Report, SSRN, 2018.
[Man63] Benoit Mandelbrot. The variation of certain speculative prices. The Journal of Business, 36(4):394–419, 1963.
[MFD20] Filippo Menczer, Santo Fortunato, and Clayton A Davis. A first course in network science. Cambridge Uni-
versity Press, 2020.
[New18] Mark Newman. Networks. Oxford university press, 2018.
[Rac03] Svetlozar Todorov Rachev. Handbook of heavy tailed distributions in finance: Handbooks in finance. Vol-
ume 1. Elsevier, 2003.
[RRGM11] Hernán D Rozenfeld, Diego Rybski, Xavier Gabaix, and Hernán A Makse. The area and population of cities:
new insights from a different perspective on cities. American Economic Review, 101(5):2205–25, 2011.
[Sam58] Paul A Samuelson. An exact consumption-loan model of interest with or without the social contrivance of money. Journal of Political Economy, 66(6):467–482, 1958.
[Sam71] Paul A Samuelson. Stochastic speculative price. Proceedings of the National Academy of Sciences, 68(2):335–
337, 1971.
[Sam39] Paul A. Samuelson. Interactions between the multiplier analysis and the principle of acceleration. Review of Economics and Statistics, 21(2):75–78, 1939.
[Sar82] Thomas J Sargent. The ends of four big inflations. In Robert E Hall, editor, Inflation: Causes and effects, pages
41–98. University of Chicago Press, 1982.
[Sar13] Thomas J Sargent. Rational Expectations and Inflation. Princeton University Press, Princeton, New Jersey,
2013.
[SS22] Thomas J Sargent and John Stachurski. Economic networks: theory and computation. arXiv preprint
arXiv:2203.11972, 2022.
[SS23] Thomas J Sargent and John Stachurski. Economic networks: theory and computation. arXiv preprint
arXiv:2203.11972, 2023.
[SV02] Thomas J Sargent and François R Velde. The Big Problem of Small Change. Princeton University Press,
Princeton, New Jersey, 2002.
[SS83] Jose A Scheinkman and Jack Schechtman. A simple competitive model with production and storage. The
Review of Economic Studies, 50(3):427–441, 1983.
[Sch69] Thomas C Schelling. Models of Segregation. American Economic Review, 59(2):488–493, 1969.
[ST19] Christian Schluter and Mark Trede. Size distributions reconsidered. Econometric Reviews, 38(6):695–710,
2019.
[Smi10] Adam Smith. The Wealth of Nations: An inquiry into the nature and causes of the Wealth of Nations. Harriman
House Limited, 2010.
[Too14] Adam Tooze. The deluge: the great war, america and the remaking of the global order, 1916–1931. 2014.
[Vil96] Pareto Vilfredo. Cours d'économie politique. Rouge, Lausanne, 1896.
[Wau64] Frederick V. Waugh. Cobweb models. Journal of Farm Economics, 46(4):732–750, 1964.
[WW82] Brian D Wright and Jeffrey C Williams. The economic role of commodity storage. The Economic Journal,
92(367):596–614, 1982.
[Zha12] Dongmei Zhao. Power Distribution and Performance Analysis for Wireless Communication Networks.
SpringerBriefs in Computer Science. Springer US, Boston, MA, 2012. ISBN 978-1-4614-3283-8 978-
1-4614-3284-5. URL: https://link.springer.com/10.1007/978-1-4614-3284-5 (visited on 2023-02-03),
doi:10.1007/978-1-4614-3284-5.