
Lectures on asset pricing for

macroeconomics

Jim Dolmas

February 21, 2013


© 2012 by Jim Dolmas. Feel free to re-distribute these notes, but do let me know
if you find typos or other errors (you will). Comments are always welcome;
email me at [email protected].
Typeset in LaTeX, using Kile 2.1 by KDE.

Contents

0 Introduction
  0.1 Why study asset pricing?
  0.2 Some facts to aim for
  0.3 Putting models on the computer
  0.4 Organization of the lectures

1 The mean-variance model: Portfolio choice and the CAPM
  1.1 Basic ideas: arbitrage, risk aversion and covariance
  1.2 Mean-variance portfolio choice
    1.2.1 No riskless asset
    1.2.2 Mean-variance portfolio choice when there is a riskless asset
  1.3 The CAPM
    1.3.1 The CAPM in return form

2 Basic theory: Stochastic discount factors, state prices, etc.
  2.1 Basic model structure
    2.1.1 Assets and payoffs
    2.1.2 Investors
    2.1.3 Prices & arbitrage
  2.2 The Fundamental Theorem of Asset Pricing
  2.3 The Representation Theorem
    2.3.1 The representations in return form
  2.4 Investor utilities and pricing representations

3 Lucas, Mehra-Prescott, and the Equity Premium Puzzle
  3.1 Lucas's 1978 model
    3.1.1 Some historical context
    3.1.2 The 'tree' economy
    3.1.3 The representative agent
    3.1.4 Recursive formulation
    3.1.5 Characterizing equilibrium
    3.1.6 SDF Representation
    3.1.7 Computation
  3.2 Mehra-Prescott (1985) & the Equity Premium Puzzle
    3.2.1 Differences relative to Lucas's model
    3.2.2 Mehra and Prescott's calibration
    3.2.3 The target and the results
  3.3 Second moment aspects of the puzzle
  3.4 Melino and Yang's insight

4 Responses to the Equity Premium Puzzle, I: The Representative Agent
  4.1 Habit formation
    4.1.1 Internal versus external habits
    4.1.2 Putting habits in the Mehra-Prescott model
    4.1.3 Habits and countercyclical risk aversion
    4.1.4 Campbell and Cochrane's model
    4.1.5 Some additional issues regarding habits
  4.2 Epstein-Zin preferences
    4.2.1 Basic properties of EZ preferences
    4.2.2 Asset pricing with EZ preferences
    4.2.3 Solving for the value function
    4.2.4 Eliminating the value function altogether
    4.2.5 Portfolio choice with EZ preferences
  4.3 State-dependent preferences: More on Melino and Yang (2003)
  4.4 First-order risk aversion
    4.4.1 FORA: The 'what'
    4.4.2 FORA: The 'why'
    4.4.3 FORA: Some results
  4.5 Models with disappointment aversion

5 Responses to the Equity Premium Puzzle, II: The Consumption Process
  5.1 Bansal-Yaron
    5.1.1 Context
    5.1.2 The model
    5.1.3 Computation
    5.1.4 Calibration and results
    5.1.5 Time-varying volatility
  5.2 Consumption disasters
    5.2.1 Modeling rare disasters
    5.2.2 Standard preferences
    5.2.3 Epstein-Zin preferences, i.i.d. disaster risk
    5.2.4 Epstein-Zin preferences, persistent probability of disaster

6 Bond Pricing and the Term Structure of Interest Rates

Bibliography

A An introduction to using MATLAB
  A.1 Introduction
  A.2 Creating matrices
  A.3 Basic matrix operations
    A.3.1 An example—ordinary least squares
  A.4 Array operations
  A.5 Multi-dimensional arrays
  A.6 Structure arrays
  A.7 Eigenvalues and eigenvectors
  A.8 Max and min
  A.9 Special scalars
  A.10 Loops and such
    A.10.1 A 'while' loop example
    A.10.2 A 'for' loop example
  A.11 Programming
    A.11.1 Scripts
    A.11.2 Function files
  A.12 Last tips

Lecture 0

Introduction

These notes were written for a ‘mini-course’ on asset pricing that I gave for
second-year Ph.D. students in macroeconomics at Southern Methodist Univer-
sity in the spring of 2012. The course was half a semester—seven three-hour
lectures. I chose not to use a book for the course. A book like Duffie’s Dynamic
Asset Pricing Theory [Duf92] might have been useful for the very early material,
on the basic theory surrounding stochastic discount factors, but after that the
focus of the lectures was on published papers.
I also hadn’t planned on writing over a hundred pages of notes: after the
first couple lectures, the plan was to have the notes taper off to little more than
lecture outlines. But, given that I compose in LaTeX only a bit slower than I write
by hand—and am better able to read what I wrote in the former case—it was
difficult to stop once I got going.

0.1 Why study asset pricing?


Most grad students in macro need little motivation to study asset pricing. They
can’t help but notice the volume of work being done nowadays at the intersec-
tion of macro and finance.
Nevertheless, I think Sargent, as always, puts the motivation very well in
the following few sentences. In a discussion of a paper by Ravi Bansal on
‘long run risk’, Sargent argued that this is work macroeconomists should pay
attention to:
Why? Because a representative agent’s consumption Euler equa-
tion that links a one-period real interest rate to the consumption
growth rate is the “IS curve” that is central to the policy transmis-
sion mechanism in today’s New Keynesian models. A long list of
empirical failures called puzzles come from applying the stochastic
discount factor implied by that Euler equation. Until we succeed in
getting a consumption-based asset pricing model that works well,
the New Keynesian IS curve is built on sand. [Sar07]


One could also note that how we resolve asset pricing puzzles matters for
how we think about the cost of business cycles, which is the source of my own
interest in the subject.

0.2 Some stylized facts to aim for


If I were doing this again, I would begin with some facts up front—stylized
facts that our models ought to be consistent with. As it was, the facts were
introduced in the context of the various models. Some important facts would
be:

1. The equity premium: The average real return on equity (say, as measured
by a broad value-weighted index of stocks) has been historically large,
on the order of 7 percent. The average real return on a relatively riskless
asset, like a Treasury bill or commercial paper, has been low, on the order
of 1 percent. The difference between the two—the equity premium—has
averaged around 6 percentage points.
2. Return volatility: The standard deviation of the equity return is large,
around 15 or 16 percent; the standard deviation of the riskless rate is
much smaller, on the order of 3 percent.1

3. The market Sharpe ratio: Around 0.5 on average, but conditional Sharpe
ratios are subject to considerable variation, with swings from 0 (near
business cycle peaks) to 1 (at business cycle troughs) not uncommon
[TW11, LN07].
4. Price-dividend ratios move around a lot, and have a lot of low frequency
power. They appear to forecast returns (high P/D ratios imply low re-
turns ahead), more so at longer horizons [Coc08]. Not everyone agrees
with this, though [BRW08].
5. Short term interest rates are quite persistent.2
6. Nominal bond yields on average rise with maturity. Perhaps also for ex
ante real yields.
7. The volatility of bond yields is fairly constant across maturities.
8. Consumption growth is not very volatile—a mean of about 2 percent and
a standard deviation of about the same size (in annual data).

9. Consumption growth may be negatively autocorrelated, positively autocorrelated,
or i.i.d. The persistence isn't a robust fact across different samples.
Mehra and Prescott [MP85] estimate a process with a first-order
autocorrelation of −0.14 in annual data spanning nearly 100 years. In
more recent samples one might find AC1's on the order of +0.30. What is
robust is that the AC1 is small in absolute value: consumption growth is
not very persistent.
10. Aggregate dividend growth (or corporate earnings growth) is much more
volatile than consumption growth. The correlation between the two is
moderate—on the order of 50–60% [LP04].

1 For facts 1 and 2, see [Koc96] among many, many others.
2 An older, but I think still useful, reference for facts 5 through 7—and the problems they create
for consumption-based models—is Wouter Den Haan's [Den95].

0.3 Putting models on the computer


One of my aims for the class was to mix theory and computational work.
Most of the exercises—which are scattered throughout the text—involve writing
some MATLAB code to solve some version of whatever model the lecture
is about. With only seven weeks, though, and many models to cover, there's
not much time to devote exclusively to computational techniques. So, the techniques
involved are decidedly not exotic. Most follow the basic template set by
Mehra and Prescott's approach.
An appendix at the end of these notes covers some basics of using MATLAB.
Guidance for solving particular problems appears at various points in the
notes, wherever necessary.

0.4 Organization of the Lectures


The environment for lectures 1 and 2 is a two-period, today/tomorrow world.
Decisions are taken today; uncertainty resolves tomorrow.
Lecture 1 covers portfolio choice in a mean-variance setting and the CAPM.
In addition to deriving the CAPM in various versions—in terms of prices and
payoffs, returns, and in beta form—the lecture also presents versions of the
“Two Fund Theorem” (in the case of no riskless asset) and the “One Fund The-
orem”, a.k.a. the Tobin Separation Theorem (when there is a riskless asset).
Apart from a desire to “begin at the beginning”, the point of lecture 1 is
to illustrate some recurrent themes, within the context of the simple mean-
variance model. Most important are the ideas of ‘risk as covariance’ and the
pricing of risk.
From a technical standpoint, this lecture relies heavily on linear algebra, in
contrast to the subsequent lectures.
Lecture 2 lays out the basic no-arbitrage approach to asset pricing in the
context of a two period/N asset/S state environment. The key results proven
in the lecture are the Fundamental Theorem of Asset Pricing (relating no ar-
bitrage to the existence of a state price vector) and the Representation Theo-
rem (drawing an equivalence between the existence of a state price vector, a
stochastic discount factor, and risk neutral probabilities). The proofs closely
follow the approach of Dybvig and Ross [DR89] and Ross [Ros04].


The main takeaway from the lecture is the basic pricing relationship

p = E(mx) (1)

where m ≫ 0 is the stochastic discount factor. This is the lens through which
all the subsequent models are viewed.
With those results in hand, we turn to infinite-horizon consumption based
models in Lecture 3, which begins with Lucas’s 1978 paper [Luc78]. The focus
here is less on Lucas’s mathematical machinery and more on the structure of
consumption based models. I also try to put Lucas’s work into context, situat-
ing it against the backdrop of 1970s-style tests of market efficiency and the ran-
dom walk hypothesis. In this and subsequent lectures, we take it for granted
that (1) becomes
pt = Et(mt+1 xt+1) (2)
in a many-period context.
Lucas’s is the first consumption-based model we take to the computer—so
the notes on Lucas include a description of the solution technique, including a
first look at Markov chain approximations to AR(1) processes.
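
The notes don't commit to a particular discretization here, but one standard recipe for approximating an AR(1) with a Markov chain is Tauchen's (1986) method. A minimal Python sketch (the grid size and width are arbitrary illustrative choices, not the notes' own settings):

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function (no SciPy needed)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def tauchen(rho, sigma, n=5, width=3.0):
    """Discretize y' = rho*y + eps, eps ~ N(0, sigma^2), as an n-state chain."""
    sd_y = sigma / np.sqrt(1.0 - rho**2)        # unconditional std dev of y
    grid = np.linspace(-width * sd_y, width * sd_y, n)
    step = grid[1] - grid[0]
    P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            lo = (grid[j] - rho * grid[i] - step / 2) / sigma
            hi = (grid[j] - rho * grid[i] + step / 2) / sigma
            if j == 0:
                P[i, j] = norm_cdf(hi)          # lump the lower tail
            elif j == n - 1:
                P[i, j] = 1.0 - norm_cdf(lo)    # lump the upper tail
            else:
                P[i, j] = norm_cdf(hi) - norm_cdf(lo)
    return grid, P

grid, P = tauchen(rho=0.9, sigma=0.1)
# each row of P is a conditional distribution over next period's state
```

Mehra and Prescott themselves work with a two-state chain calibrated directly to moments of consumption growth; the sketch above is the more general-purpose tool.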
From Lucas, the lecture turns to Mehra and Prescott [MP85] and the equity
premium puzzle. The treatment is brief, given that the computational aspects
have already been presented in the context of Lucas’s model. This section also
gives some alternative characterizations of the puzzle(s) in terms of second
moment implications and bounds on attainable Sharpe ratios.
Lectures 4 and 5 then look at responses to the equity premium puzzle.

Lecture 1

The mean-variance model:


Portfolio choice and the
CAPM

This lecture treats the mean-variance model of portfolio choice and the equi-
librium asset-pricing model based on it, the Capital Asset Pricing Model, or
CAPM. Think of it as some ‘pre-history’ for the more modern approaches we
will focus on for most of the course.
The mean-variance approach to portfolio choice—which emphasizes the re-
duction of risk by taking into account the covariances among asset payoffs—
took a long time to emerge. Markowitz, in a history of portfolio choice [Mar99],
notes that the theory had few real precursors. The few authors who did con-
sider the problem of investment in risky assets—like Hicks [Hic35]—seemed
to believe a version of the Law of Large Numbers meant all risk could be di-
versified away by investing in a large enough range of assets.
The insight of Markowitz [Mar52] was to recognize that asset payoffs were
typically correlated and that no simple diversification could eliminate all risk.
Rather, rational investors—if they care about the mean and variance of their
final wealth—should take account of the covariance among asset payoffs in
structuring their portfolios, so as to minimize the risk associated with any ex-
pected payoff. After Markowitz, it became clear that the risk of any asset was
not a feature of the asset in isolation, but depended on how the asset’s payoffs
impacted the variance of the investor's portfolio. That, in turn, depends not
just on the variance of the asset's payoff, but also on the covariances with other
assets' payoffs.
The CAPM—the Capital Asset Pricing Model, developed independently by
Sharpe [Sha64], Lintner [Lin65] and a couple others—followed very directly
from Markowitz’s model of portfolio choice. Like any model of equilibrium
prices, it combines demands (Markowitz portfolios in this case) with supplies
(exogenous supplies of risky assets) to determine prices. According to the


CAPM, an asset’s price depends on the covariance of its payoffs with an aggre-
gate payoff—that of the market portfolio. That role of covariance between in-
dividual asset payoffs and some aggregate is a feature that will carry over into
all the more modern models we will examine throughout the course, though
the aggregate will take different forms (like the marginal utility of consump-
tion or wealth). In fact, once we’ve developed the more modern approach in
the next lecture—which is based on stochastic discount factors, state prices or
risk neutral probabilities—we’ll reinterpret the CAPM as a stochastic discount
factor model.
Before getting into the mean-variance model, though, we’ll try to introduce
some basic ideas through a series of simple examples, beginning with present
discounted values and working our way to the role of covariance in pricing
assets.

1.1 Basic ideas: The roles of arbitrage, risk aversion, and covariance

To begin, then, just think about present values. Imagine two periods (as we will
through most of this introductory material); call them today and tomorrow.
Agents can borrow and lend between the periods at a riskless interest rate r.
Suppose there is an asset that pays x units of account tomorrow (per unit of
the asset) with certainty. Agents can buy or sell units of the asset, and there is
no restriction on short sales (agents can hold a negative position in the asset—
someone who holds −1 units, for example, owes someone x tomorrow).
Under those conditions, the price p of the asset must obey

p = x/(1 + r),

else an arbitrage opportunity is available. For example, if p < x/(1 + r ), an agent
could borrow p units of account, and buy a unit of the asset. Their net position
today is zero, but tomorrow, they’ll receive x and owe (1 + r ) p < x, gaining
x − (1 + r ) p essentially for free. Since they could do this at any scale, they have
a ’money pump’, a source of unlimited wealth. That’s not an equilibrium. Thus
p ≥ x/(1 + r ) must hold. A converse argument in which agents short the asset
establishes p ≤ x/(1 + r ), so p = x/(1 + r ) must hold.
Now, suppose there is uncertainty tomorrow—to be precise, suppose there
are two possible states s tomorrow, s ∈ S = {1, 2}, which occur with probabilities
1/2 and 1/2. There is now also an asset whose payoff y is risky—
in particular, y : S → R with y(1) = 2x and y(2) = 0. This asset has the
same expected payoff (x) as the first asset, but if agents care at all about risk, it
shouldn’t have the same equilibrium price. In fact, if agents dislike risk—and
there are no other sources of uncertainty—its price p̂ should obey

p̂ < E(y)/(1 + r) = x/(1 + r) = p.


Let’s now add another risky asset, one whose payoffs (call them z) are in a
sense opposite to those of the last asset, z(1) = 0 and z(2) = 2x. We’ll also add
another source of uncertainty that will give us some clue as to how y and z will
be priced relative to one another: suppose that agents also receive endowments
tomorrow, the endowments (call them e) are the same for all agents, and e :
S → R with e(1) ≫ e(2) (endowments if state 1 occurs are much larger than
if state 2 occurs). In this case, even though both assets have expected payoffs
equal to x, the z asset—which pays off in the state where endowments are
low—will be regarded as more valuable by agents. Hence, we’d expect the z
asset’s price—call it p̃—to exceed the price of the y asset, p̂.
We can say one more thing. Note that a portfolio consisting of one-half unit
of the y asset and one-half unit of the z asset exactly replicates the payoffs of
the riskless x asset: (1/2)y(s) + (1/2)z(s) = x for s = 1 or s = 2. An arbitrage
argument thus implies
(1/2) p̂ + (1/2) p̃ = p.
Note that an implication of the last equation (and the result that p̃ > p̂) is that
p̃ > p > p̂—the risky z asset is more valuable than the riskless x asset. Even
though its payoff is uncertain, the z asset's price exceeds its expected present
value because it covaries with the agents' endowments in a way that hedges
their endowment risk.
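
One concrete way to verify these rankings is with state prices, which Lecture 2 introduces formally. The numbers below are hypothetical, chosen only so that the state prices sum to 1/(1 + r) (pricing the riskless payoff) and the state-2 price exceeds the state-1 price (state-2 consumption is scarce):

```python
r, x = 0.05, 1.0
# Hypothetical state prices: q1 + q2 = 1/(1+r), with q2 > q1
q1 = 0.35
q2 = 1.0 / (1 + r) - q1

p       = x / (1 + r)       # riskless asset: pays x in both states
p_hat   = q1 * 2 * x        # y pays 2x in state 1, 0 in state 2
p_tilde = q2 * 2 * x        # z pays 0 in state 1, 2x in state 2

# The half-and-half portfolio replicates the riskless payoff...
assert abs(0.5 * p_hat + 0.5 * p_tilde - p) < 1e-12
# ...and the claimed ordering of prices holds
assert p_tilde > p > p_hat
```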
These are very simple examples, but they illustrate a few concepts that
will be important throughout these lectures—the roles played in the pricing
of assets by arbitrage (precisely, the absence of arbitrage opportunities), risk
aversion (more generally, curvature of the marginal utility of consumption or
wealth), and covariances (between individual asset payoffs and more aggre-
gate sources of uncertainty).

1.2 Mean-variance portfolio choice


1.2.1 No riskless asset
We begin with a framework in which there is no riskless asset. In addition to
developing some basic concepts in mean-variance portfolio analysis, our main
result will be the so-called Two-fund Theorem—all portfolios that are efficient (in
a sense we will make precise below) can be constructed as linear combinations
of precisely two portfolios (think of them as mutual funds, which explains the
name of the theorem).
There are N risky assets, with payoff vector x = (x1, x2, . . . , xN), which is a
random variable taking values in R^N. The assets have expected payoff vector
given by

E(x) = µ
and a variance-covariance matrix Ω, which we assume to be positive definite.1
1 An N × N matrix A is positive definite iff u⊤Au > 0 for all non-zero u ∈ R^N. Note that any
such A is non-singular.


Because Ω is a variance-covariance matrix, it is also symmetric. To be precise,
Ω = [ωi,j], i, j = 1, 2, . . . , N, with

ωi,j = E[(xi − µi)(xj − µj)].

In matrix notation, Ω = E[(x − µ)(x − µ)⊤].


Remark 1.1. Just a bit on notation. In contexts involving matrix algebra, think of
all vectors as initially column vectors (so the mean vector µ, for example, is N × 1).
The superscript ⊤ denotes transposition. For two vectors x and y in R^N, we'll use
interchangeably the matrix product notation x⊤y and the inner product notation x ·
y—both denote ∑i xi yi.
Lastly, x ≥ 0 means xi ≥ 0 for all i, equivalently x ∈ R^N_+; x > 0 means x ≥ 0
and x ≠ 0, equivalently x ∈ R^N_+ \ {0}; and x ≫ 0 means xi > 0 for all i, equivalently
x ∈ R^N_++.

We've already stated that there's no riskless asset, but even if we hadn't,
the assumption that Ω is positive definite would rule that out. A riskless asset
(say the ith) would have ωi,i = 0, in fact ωi,j = 0 for all j. If u is a vector with
ui > 0 and uj = 0 for j ≠ i, then u ≠ 0 and u⊤Ωu = (ui)²ωi,i = 0. As we'll see
below, once we define a portfolio, assuming that Ω is positive definite implies
that there is no (non-zero) portfolio that has a zero variance.
There is also a vector of asset prices p ∈ R^N, with p ≫ 0, which is known
to the investor at date 0. We'll be a bit vague for now about what units assets,
payoffs and prices are in. Think of the payoffs as being in 'units of account'
(eventually, they'll be in terms of consumption); assets as being in 'shares'; and
prices in units of account per share.
A portfolio is a z ∈ R^N where zi denotes the number of shares of the ith
asset held by the investor. We do not rule out short sales—zi < 0 is feasible for
any i. The price of a portfolio z is p · z and its payoff is z · x, a random variable.
The portfolio's expected payoff is E(z · x) = z · E(x) = z · µ ≡ µ(z), using the
linearity of the expectations operator. The variance of the portfolio's payoff is
σ²(z) ≡ E[(z · x − z · µ)²], which has a simple description in terms of z and Ω:

σ²(z) = E[(z⊤(x − µ))²]
      = E[z⊤(x − µ)(x − µ)⊤z]
      = z⊤E[(x − µ)(x − µ)⊤]z
      = z⊤Ωz

The typical investor will have some initial wealth W0 to allocate to his port-
folio,
W0 = p · z, (1.1)
and his final wealth will simply be the realized value of his portfolio

W1 = z · x.


There are no other sources of income. Thus, the mean and variance of his final
wealth will simply be the mean and variance of his portfolio's payoff, µ(z) and
σ²(z).
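
As a sketch of these formulas in code (Python/NumPy rather than the MATLAB used in the notes' exercises; the data are made up), the portfolio mean and variance are just the inner product z · µ and the quadratic form z⊤Ωz:

```python
import numpy as np

# Made-up data for three assets
mu    = np.array([1.05, 1.10, 1.12])              # expected payoffs E(x)
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])             # covariance matrix
z     = np.array([2.0, 1.0, 0.5])                  # shares held

mean_payoff = z @ mu                               # mu(z) = z . mu
variance    = z @ Omega @ z                        # sigma^2(z) = z' Omega z

# Cross-check the quadratic-form formula by simulation
rng    = np.random.default_rng(0)
draws  = rng.multivariate_normal(mu, Omega, size=200_000)
sample = draws @ z                                 # simulated portfolio payoffs
assert abs(sample.var() - variance) < 1e-2
```

The normality in the simulation is purely for convenience; the identity σ²(z) = z⊤Ωz holds whatever the payoff distribution.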
In the mean-variance portfolio choice approach, the investor is assumed to
care only about the mean and variance of final wealth. Keeping variance the
same, a higher mean is preferred, while keeping mean the same, a lower vari-
ance is preferred. The investor’s mean-variance preferences may be justified in
a number of ways, such as:
• This is a primitive assumption about the investor's utility function—
it's simply defined as being over the mean and variance of final wealth,
v[µ(z), σ²(z)] say.
• The investor is an expected utility maximizer, and his von Neumann-
Morgenstern utility function happens to be quadratic—e.g., v(W1) =
aW1 − (b/2)(W1)² with a, b > 0. In this case, the investor's expected utility
takes the form aµ(z) − (b/2)µ(z)² − (b/2)σ²(z), which is decreasing
in σ²(z) and increasing in µ(z) so long as µ(z) < a/b.
• The investor has a utility function defined over the distribution of final
wealth, but payoffs are normally distributed. Because the normal distribution
is completely characterized by its first two moments, under some
regularity conditions the investor's utility function will have a representation
of the form v[µ(z), σ²(z)].
We will assume investor preferences simply have the form v[µ(z), σ²(z)],
with v increasing in its first argument and decreasing in its second. For the
main result of this sub-section, we don't need to be any more concrete than
that. The typical investor's problem will be

max_z v[µ(z), σ²(z)] (1.2)

subject to the budget constraint (1.1).


We imagine that there are many investors with different levels of initial
wealth W0 and possibly different utility functions v; technically, we should
superscript these as W0i and vi , but as long as there is no confusion, we’ll avoid
introducing extra clutter to the notation.
A moment's reflection on the nature of the problem (1.2) will reveal that
any z that solves it must be mean-variance efficient:

Definition 1.1. A portfolio z is mean-variance efficient if any z′ such that µ(z′) >
µ(z) has σ²(z′) > σ²(z) and any z′ such that σ²(z′) < σ²(z) has µ(z′) < µ(z). In
words, a portfolio is mean-variance efficient if any portfolio with a higher mean has a
higher variance, and any portfolio with a lower variance has a lower mean.

Mean-variance efficient portfolios have the lowest variance among all portfolios
having the same mean, and the highest mean among all portfolios having
the same variance.


For a given initial wealth W0, the set of mean-variance efficient portfolios
can be found by solving the following family of problems, as we vary m:

min_z σ²(z) (1.3)

subject to

µ(z) = m
W0 = p · z

Using the matrix results from above, we can re-write problem (1.3) in the
following form:

min_z z⊤Ωz (1.4)

subject to

z · µ = m (1.5)
W0 = p · z (1.6)

Note that this problem will only be interesting if the number of assets N exceeds
2. The reason is that, apart from the case where µ and p are collinear, the
set of z that satisfy the two constraints (1.5) and (1.6) is a subset of dimension
N − 2, which in R² is a singleton—i.e., a single portfolio z will satisfy both constraints,
and we're left with really no problem to solve. Henceforth, we assume
N ≥ 3.
To solve the problem, form the Lagrangian

L = −(1/2)z⊤Ωz + λ(z⊤µ − m) + π(W0 − z⊤p),

where the Lagrange multipliers are λ, π ∈ R. The 1/2 in front of the objective
doesn't alter the solution and saves us from having some 2's floating around in
the first-order conditions. The minus sign is there because I tend to write any
optimization problem as a maximization problem—I find it economizes on the
number of optimality conditions one needs to remember.
Differentiating L with respect to the vector z gives the first-order condition

−Ωz + λµ − π p = 0

which is N equations stacked in vector form. Because Ω is positive definite, we
can invert it to get

z = λΩ⁻¹µ − πΩ⁻¹p. (1.7)
This is not yet a complete solution, because it still involves the multipliers λ
and π. But we can still see an important feature of the solution's structure.
The efficient portfolio is a linear combination of two vectors, Ω⁻¹µ and −Ω⁻¹p,
with weights λ and π.
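
The remaining step—pinning down λ and π from the constraints—can be sketched numerically (Python/NumPy, hypothetical data; an illustration, not the notes' own code). Substituting (1.7) into (1.5) and (1.6) gives two linear equations in (λ, π):

```python
import numpy as np

# Hypothetical data for N = 3 assets (purely illustrative)
mu    = np.array([1.05, 1.10, 1.12])              # expected payoffs
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])             # positive definite
p     = np.array([1.0, 1.0, 1.0])                  # asset prices
W0, m = 10.0, 10.8                                 # wealth and target mean

Oi = np.linalg.inv(Omega)
# Substituting z = lam*Oi@mu - pi*Oi@p into z.mu = m and p.z = W0
# gives a 2x2 linear system in (lam, pi):
A = np.array([[mu @ Oi @ mu, -(mu @ Oi @ p)],
              [p  @ Oi @ mu, -(p  @ Oi @ p)]])
lam, pi_ = np.linalg.solve(A, np.array([m, W0]))
z = lam * (Oi @ mu) - pi_ * (Oi @ p)

# The constraints hold at the solution
assert np.isclose(mu @ z, m) and np.isclose(p @ z, W0)

# Feasible perturbations (directions orthogonal to both mu and p, so the
# constraints still hold) can only raise the variance
d  = np.cross(mu, p)
z2 = z + 0.1 * d
assert z2 @ Omega @ z2 >= z @ Omega @ z
```

The 2×2 system is solvable precisely when µ and p are not collinear—the same caveat flagged above for problem (1.4).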


Note, too, that the efficient portfolio only depends on m and W0 through the
scalars λ and π. For a given W0—which is to say, for a given investor—as we
vary m, λ and π will vary, but the variance-minimizing portfolio will always
be a linear combination of the same two vectors, Ω⁻¹µ and −Ω⁻¹p. The same
holds true as we vary W0, which is in effect varying investors—all investors
choose a linear combination of these same two vectors.
Since Ω⁻¹µ and Ω⁻¹p are vectors in R^N, we can think of them as portfolios
(or funds). In this environment, all investors' asset demands could be met by
two mutual funds, one offering the portfolio Ω⁻¹µ and the other offering the
portfolio −Ω⁻¹p. Given what we've discussed above regarding equation (1.7),
we may state:

Theorem 1.1 (The two-fund theorem). In the model of this section—no riskless
asset, N ≥ 3 assets, variance-covariance matrix Ω positive definite—there exist two
portfolios such that any mean-variance efficient portfolio can be written as a linear
combination of those two portfolios.
We can actually say a bit more by solving for λ and π. This would be
tedious algebra, but Exercise 1.1 asks you to show the following, by taking
account of the constraints (1.5) and (1.6): at a solution to problem (1.4), the
Lagrange multipliers λ and π are linear in W0 and m. That is, there are coefficients
aλ, aπ, bλ, and bπ—real numbers that themselves do not depend on W0
and m—such that

λ = aλW0 + bλm (1.8)
π = aπW0 + bπm (1.9)

Exercise 1.1 (Linearity of λ and π). Combine the expression (1.7) for z with
the constraints (1.5) and (1.6) to prove the claim from the previous paragraph.
You can make this exercise less tedious by doing as much of it in matrix form as
possible, using expressions like µ⊤Ω⁻¹µ, µ⊤Ω⁻¹p, and p⊤Ω⁻¹p.
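
A numerical sanity check of the claim—no substitute for the algebra the exercise asks for—is to solve for the efficient portfolio at the corner cases (W0, m) = (1, 0) and (0, 1) and verify that recombining them reproduces the solution at any other (W0, m). Python/NumPy, made-up data:

```python
import numpy as np

mu    = np.array([1.05, 1.10, 1.12])
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
p     = np.array([1.0, 1.0, 1.0])
Oi    = np.linalg.inv(Omega)
# 2x2 system for (lam, pi) implied by the constraints (1.5) and (1.6)
A     = np.array([[mu @ Oi @ mu, -(mu @ Oi @ p)],
                  [p  @ Oi @ mu, -(p  @ Oi @ p)]])

def efficient_z(W0, m):
    """Variance-minimizing portfolio at wealth W0 and target mean m."""
    lam, pi_ = np.linalg.solve(A, np.array([m, W0]))
    return lam * (Oi @ mu) - pi_ * (Oi @ p)

z_hat1 = efficient_z(1.0, 0.0)    # the 'wealth' fund
z_hat2 = efficient_z(0.0, 1.0)    # the 'mean' fund

# Linearity in (W0, m): any efficient portfolio is W0*z_hat1 + m*z_hat2
W0, m = 10.0, 10.8
assert np.allclose(efficient_z(W0, m), W0 * z_hat1 + m * z_hat2)
```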

Given (1.8) and (1.9), we may re-write the expression (1.7) for a mean-
variance efficient portfolio as follows:

z = λΩ−1 µ − πΩ−1 p
= ( aλ W0 + bλ m)Ω−1 µ − ( aπ W0 + bπ m)Ω−1 p
= W0 ( aλ Ω−1 µ − aπ Ω−1 p) + m(bλ Ω−1 µ − bπ Ω−1 p) (1.10)

In (1.10), aλ Ω−1 µ − aπ Ω−1 p and bλ Ω−1 µ − bπ Ω−1 p are simply vectors in R N , which depend only on the parameters µ, Ω and p. We may think of these two
vectors as portfolios, which for compactness, we’ll define as ẑ1 and ẑ2 , so that

z = W0 ẑ1 + mẑ2 (1.11)
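The two-fund construction is easy to put on the computer. The sketch below is in Python/NumPy rather than the Matlab used in the exercizes, and the three-asset numbers are invented purely for illustration; it solves the 2 × 2 linear system that the constraints (1.5) and (1.6) imply for λ and π, then assembles z as in (1.7).

```python
import numpy as np

def efficient_portfolio(mu, Omega, p, W0, m):
    """Minimum-variance portfolio z with p.z = W0 and mu.z = m.

    From (1.7), z = lam * inv(Omega) @ mu - pi * inv(Omega) @ p;
    imposing the two constraints gives a 2x2 system for (lam, pi).
    """
    Oinv_mu = np.linalg.solve(Omega, mu)   # first fund (up to scale)
    Oinv_p = np.linalg.solve(Omega, p)     # second fund (up to scale and sign)
    A = mu @ Oinv_mu                       # mu' inv(Omega) mu
    B = mu @ Oinv_p                        # mu' inv(Omega) p
    C = p @ Oinv_p                         # p' inv(Omega) p
    # mu.z = lam*A - pi*B = m  and  p.z = lam*B - pi*C = W0
    lam, pi = np.linalg.solve([[A, -B], [B, -C]], [m, W0])
    return lam * Oinv_mu - pi * Oinv_p

# Invented three-asset example
mu = np.array([1.1, 1.2, 1.3])
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
p = np.ones(3)
z = efficient_portfolio(mu, Omega, p, W0=1.0, m=1.2)
print(p @ z, mu @ z)   # both constraints hold: approximately 1.0 and 1.2
```

Evaluating the function on a grid of m values, holding W0 fixed, traces out the frontier that Exercize 1.2 below asks you to plot in Matlab.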


Let’s think now of many investors, assuming all of them have mean-variance
preferences. Our work thus far has not identified the particular portfolio a
utility-maximizing investor would choose, since the expected payoff m is it-
self a choice in the investor’s utility-maximization problem—recall (1.2). We’ve
only identified a family of portfolios to which any solution must belong—those
that, for a given m have the lowest variance. The investor’s preferences—the
trade-off they make between additional mean payoff and additional variance—
together with their initial wealth, will determine their choice of m. Without
modeling that problem, let’s simply let mi denote investor i’s choice of m, W0i
his wealth, and zi his resulting portfolio choice. Then,
zi = W0i ẑ1 + mi ẑ2 . (1.12)
Remark 1.2. Note that even if all investors have identical preferences, their initial
wealth level may affect their choices of m, so different investors would still make differ-
ent choices. This would be the case if their preferences don’t display constant relative
risk aversion—for example if their preferences are quadratic (including both first- and
second-order terms).
Define the market portfolio as the sum of all investors’ portfolios, and denote
it by z M . Summing (1.12) across all investors gives the market portfolio as
z M = (∑i W0i )ẑ1 + (∑i mi )ẑ2 = W0M ẑ1 + m M ẑ2 (1.13)
where W0M is the aggregate initial wealth of the market, and m M is the aggregate expected payoff from all asset holdings in the market. Note that because
the individual zi satisfy the linear constraints p · zi = W0i and zi · µ = mi , we
have p · z M = W0M and z M · µ = m M —that is, the market portfolio satisfies the
constraints at the market initial wealth W0M and expected payoff m M . Thus,
(1.13) describes a mean-variance efficient portfolio, given W0M and m M .
We’ve thus proved the other major result of this section:
Theorem 1.2 (Efficiency of the market portfolio). In the model of this section, if
all investors have mean-variance preferences, the market portfolio is mean-variance
efficient.

Exercize 1.2 (The mean-variance frontier). This exercize asks you to use Mat-
lab to find mean-variance efficient portfolios, then plot what’s known as the mean-
variance frontier, which is normally drawn in mean-standard deviation space.
The data are for five risky assets:
 
µ = (1.38, 1.29, 1.47, 1.65, 1.25)>


and
Ω =
26.63 13.63 10.86 6.15 8.55
13.63 34.46 15.26 7.59 11.01
10.86 15.26 38.32 7.69 13.49
6.15 7.59 7.69 24.7 6.50
8.55 11.01 13.49 6.50 19.01
The price vector is just p = (1, 1, 1, 1, 1)> , so you might think of the payoffs as
gross returns—payoffs per unit of account spent on the asset.
Write some Matlab code to display two portfolios, or funds, from which all
efficient portfolios can be constructed.
Assume an investor’s initial wealth is W0 = 1 and let m vary from
the smallest expected payoff, min(µ), to the largest, max(µ), in 50 evenly-
spaced steps. You might use the Matlab command ‘linspace’: m =
linspace(min(mu),max(mu),50). Write lines that solve for the multipliers λ and π at each of the 50 points, calculate the efficient portfolio at each of the 50 points, and calculate the portfolio standard deviation √(z> Ωz) at each of the 50
points. You don’t need to print out all those results, just show me the portfolios
that correspond to the first and last m values, and all your code.
Finally, plot m versus the portfolio standard deviations, with standard devi-
ation on the horizontal axis. This is the mean-variance frontier (only its upper
portion is relevant to investors). It’s interesting to compare the mean-variance
frontier with the means and standard deviations of the underlying assets. You can
do this with Matlab’s hold command. After you’ve plotted the frontier, if mu and
sd are the vectors of expected payoffs and standard deviations, enter hold; then
scatter(sd,mu). Note the standard deviations are just the square roots of the
diagonal of Ω—sd = sqrt(diag(Omega)).

1.2.2 Mean-variance portfolio choice when there is a riskless asset
We now add a riskless asset to the market. The riskless asset pays one sure
unit of account next period for each unit of the asset purchased today. We’ll let
q denote the price of the riskless asset. An investor who buys z0 > 0 units of
the asset today pays qz0 today and receives z0 units of account tomorrow. An
investor who shorts the riskless asset, choosing z0 < 0 units, in effect borrows
−qz0 today and repays −z0 next period. We can also describe the riskless asset
by a (gross) riskless rate of return, denoted R F , obeying

1
RF = .
q

Assumptions about the N risky assets remain the same as before: µ = E( x ) still denotes the expected payoff vector and Ω = E[( x − µ)( x − µ)> ] still denotes the variance-covariance matrix. We’ll continue to let z = (z1 , z2 , . . . z N ) denote a portfolio of the N risky assets, and p their prices.
A typical investor’s budget constraint now takes the form
W0 = qz0 + p · z (1.14)
and his next-period wealth will be
W1 = z0 + z · x. (1.15)
We can use the budget constraint to eliminate z0 from the expression for
W1 , substituting z0 = (W0 − p · z)/q to get
1 1
W1 = W0 − p · z + z · x
q q
1 1
= W0 + z · ( x − p)
q q
= R F W0 + z · ( x − R F p) (1.16)
where (1.16) uses R F = 1/q. The investor’s expected final wealth is E(W1 ) =
R F W0 + z · (µ − R F p), and W1 − E(W1 ) = z · ( x − µ), so the variance of final
wealth will again take a quadratic form E[(W1 − E(W1 ))2 ] = z> Ωz. If an
investor holds no risky assets (z = 0), his wealth simply grows at the gross
riskless rate; if the investor spends all his initial wealth on his risky portfolio
( p · z = W0 ), his expected final wealth is (as before) z · µ.
We could, as before, consider the problem of finding mean-variance effi-
cient portfolios—for each m, the minimum-variance portfolio attaining E(W1 ) =
m. Instead, though, we will go straight to the typical investor’s utility maxi-
mization problem, assuming the investor has mean-variance preferences. The
problem is
max v[ R F W0 + z · (µ − R F p), z> Ωz]. (1.17)
z
Note that z in this problem is unconstrained—any difference between p · z and
W0 is made up for by long or short positions in the riskless asset.
Letting v1 > 0 and v2 < 0 denote the partial derivatives of v with respect to
mean and variance, the first-order conditions are
v1 (µ − R F p) + 2v2 Ωz = 0 (1.18)
which can be rearranged to give us:
z = −(v1 /2v2 ) Ω−1 (µ − R F p). (1.19)
Note that the only term in (1.19) that is specific to the investor—that may re-
flect his wealth or tastes—is the scalar −v1 /2v2 , the marginal rate of substitu-
tion between mean and variance. The vector Ω−1 (µ − R F p) will be common
to all investors, assuming they all have mean-variance preferences. We’ve thus
derived—very simply—a ‘one fund theorem’ for our market including a risk-
less asset:


Theorem 1.3 (One fund theorem). Add a riskless asset to the model of the last
subsection, and assume all investors have mean-variance preferences. Then, investors’
optimal portfolios of risky assets are all scalar multiples of one another.
Remark 1.3. One can also see this result in the structure of the Lagrangian for the
problem of finding a mean-variance efficient portfolio:

L = −(1/2) z> Ωz + λ( R F W0 + z · (µ − R F p) − m)
The first-order conditions give z = λΩ−1 (µ − R F p), and λ will be the only term that
varies across investors.
Remark 1.4. While (1.19) tells us something important about the structure of the so-
lution to (1.17), it does not necessarily constitute a complete solution for z, since it does
not determine v1 /v2 , assuming this term is not a constant. Put differently, and maybe
more precisely, the first-order conditions (1.18) are N equations in N unknowns, but
not necessarily a system of linear equations.
Remark 1.5. Tobin [Tob58] proved a version of the One Fund Theorem in a 1958
paper on liquidity preference. The riskless asset in Tobin’s model is money, and Tobin
shows that investors’ risky asset portfolios are all scalar multiples of a single portfolio.
The result is thus sometimes known as the Tobin Separation Theorem.
What about demand for the riskless asset? An investor’s holdings of the
riskless asset are obscured by the fact that we’ve substituted them out of the
budget constraint. Once we solve the first-order conditions (1.18) for the risky
asset demands z, we can find the investor’s riskless asset demand by going
back to the original budget constraint, z0 = (1/q)(W0 − p · z). In any case,
the one fund theorem tells us that in a mean-variance environment with one
riskless asset, every investor’s portfolio can be represented as some amount of
the riskless asset plus holdings of one risky portfolio (or fund) common to all
investors.
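A small numerical sketch may help fix ideas. The following Python fragment uses invented data, with the scalar 1/αi standing in for −v1 /2v2 ; it computes the common fund Ω−1 (µ − R F p) from (1.19), scales it for two investors with different risk tolerances, and backs out each investor’s riskless holding from the budget constraint.

```python
import numpy as np

# Hypothetical data, for illustration only
mu = np.array([1.1, 1.2, 1.3])
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
p = np.ones(3)
q = 1 / 1.05                                # riskless price, so R_F = 1.05
RF = 1 / q

fund = np.linalg.solve(Omega, mu - RF * p)  # the common risky fund in (1.19)

for alpha, W0 in [(2.0, 1.0), (5.0, 3.0)]:  # two investors: (alpha_i, W0_i)
    z = fund / alpha                        # risky demands: a scalar multiple of the fund
    z0 = (W0 - p @ z) / q                   # riskless demand from the budget constraint
    print(z, z0)
```

Both printed z vectors are proportional, as the one fund theorem requires; only the riskless position and the overall scale differ across investors.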

1.3 The Capital Asset Pricing Model (CAPM)


The Capital Asset Pricing Model, or CAPM, was developed by Sharpe [Sha64]
and Lintner [Lin65], building on the mean-variance portfolio analysis of Markowitz.
It combines Markowitz’s results on asset demands with some assumptions
about asset supplies to derive equilibrium prices.
Again distinguish investors by their wealth W0i and preferences vi . Let αi =
−2v2i /v1i , which is positive under the assumption that v1i > 0 and v2i < 0. αi is
in units of mean per variance, and measures i’s willingness to trade off higher
(or lower) mean final wealth for higher (or lower) variance of final wealth. An investor with a high value of αi , for example, would accept a large reduction in mean in exchange for a small reduction in variance.


With that bit of notation, investor i’s risky portfolio has the form

zi = (1/αi ) Ω−1 (µ − R F p). (1.20)

As we noted in Remark 1.4, αi is not necessarily a constant, but we can still learn something about the nature of equilibrium prices even if we cannot explicitly solve it out of the investor’s risky asset demands.
If we aggregate (1.20) across all investors, we will obtain the market portfolio z M :
z M = (∑i 1/αi ) Ω−1 (µ − R F p) = (1/α M ) Ω−1 (µ − R F p) (1.21)
αM

where in (1.21) we’ve defined

α M = (∑i 1/αi )−1 .

The key insight to determining p is that z jM , the market’s demand for shares
of asset j, must, in equilibrium, equal the supply of shares in asset j, which we
take to be exogenous. Thus, we can treat z M as exogenous, and rearrange (1.21)
to get
p = (1/R F ) µ − (α M /R F ) Ωz M . (1.22)
Because it involves the potentially endogenous parameter α M , a measure of
market risk aversion, equation (1.22) does not necessarily constitute a complete
solution for equilibrium prices (α M may itself depend on p). Nevertheless, we
can gain the key insights of the CAPM from (1.22).
The equation (1.22) is really N equations stacked in vector form. Let’s con-
sider the ith row—the expression for the ith asset price:

pi = µi /R F − (α M /R F ) ∑ j ωi,j z jM . (1.23)

The first term in (1.23), µi /R F , is just the present value (using the gross risk-
less rate R F ) of the ith asset’s expected payoff, µi . The term ∑ j ωi,j z jM is just
the covariance between the ith asset’s payoff, xi , and the payoff of the market
portfolio, z M · x. For simplicity, call this cov(i, M ). Then,

pi < µi /R F if cov(i, M ) > 0
pi = µi /R F if cov(i, M ) = 0
pi > µi /R F if cov(i, M ) < 0


An asset whose payoff covaries positively with that of the market portfolio
is priced at less than the expected present value of its payoff, while an asset
whose payoff covaries negatively with that of the market portfolio is priced
at more than the expected present value of its payoff—analogous to what we
saw in the very simple examples at the start of this lecture. The middle case—cov(i, M) = 0—corresponds to an asset whose payoff risk is purely idiosyncratic, in the sense of being orthogonal to the market as a whole. Under the
CAPM, the price of such an asset, no matter how large its variance, is just equal to
its expected present value—which would be the value placed on the asset by a
completely risk neutral investor. To summarize:
Result 1.1 (The CAPM, price version). Under the assumptions of this section, an
asset’s equilibrium price is less than or greater than the expected present value of the
asset’s payoff, depending on whether the asset’s payoffs covary positively or negatively
with the payoffs of the market portfolio. Idiosyncratic risk is not priced.
Remark 1.6. The results for individual asset prices also hold for any arbitrary portfolio
z of the assets. The price of the portfolio, p · z, is less than, greater than, or equal to
the present value of the portfolio’s expected payoff, (z · µ)/R F , depending on whether
the covariance of the portfolio’s payoff with the market portfolio payoff, z> Ωz M , is
positive, negative, or zero.
Because this must apply as well to the market portfolio, and because (z M )> Ωz M >
0, the price of the market portfolio is necessarily less than the present value of its ex-
pected payoff.
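The price version of the CAPM is easy to check numerically. In the Python sketch below, all numbers are hypothetical, including the market risk-aversion term α M , which we simply fix rather than solve for; asset 1’s payoff is built to covary negatively with the market, and its price from (1.22) indeed exceeds µ1 /R F .

```python
import numpy as np

# Hypothetical market data, for illustration only
mu = np.array([1.1, 1.2, 1.3])
Omega = np.array([[ 0.04,  0.01, -0.06],
                  [ 0.01,  0.09, -0.04],
                  [-0.06, -0.04,  0.16]])
RF = 1.05
alpha_M = 0.5                # assumed, not solved for
z_M = np.ones(3)             # exogenous asset supplies

p = mu / RF - (alpha_M / RF) * (Omega @ z_M)   # equation (1.22)

cov_iM = Omega @ z_M    # covariance of each payoff with the market payoff
print(cov_iM)           # asset 1: negative; assets 2 and 3: positive
print(p - mu / RF)      # signs are the reverse of cov_iM, as in Result 1.1
```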

1.3.1 The CAPM in return form


So far, we’ve worked with asset prices and payoffs, but the CAPM (as well
as the other results) is more frequently cast in terms of asset returns. We’ve
already introduced the riskless rate of return R F and related it to the price q of
a riskless asset. We now consider returns on the risky assets.
The gross return on the ith risky asset is defined as xi /pi , which we’ll de-
note by Ri . We’ll denote by R the N × 1 vector of gross returns. An asset’s
return is basically its payoff per unit of account invested in the asset. Just as x is a random variable, so is R. The asset’s expected return is E( Ri ) = µi /pi ≡ Rie . We can express this in matrix notation, for all N assets at once, by letting C be
an N × N diagonal matrix with (1/p1 , 1/p2 , . . . 1/p N ) on the diagonal. Then,

Re = E[ R] = E[Cx ] = Cµ

The variance-covariance matrix of returns—call it V—is then given by

V = E[( R − Re )( R − Re )> ]
= E[(Cx − Cµ)(Cx − Cµ)> ]
= CE[( x − µ)( x − µ)> ]C>
= CΩC >


Note too that as a diagonal matrix C is symmetric; that C −1 is also a diagonal matrix, with ( p1 , p2 , . . . p N ) on the diagonal; and that Cp = (1, 1, . . . 1)> , an
N × 1 vector of ones, which we will denote by 1. Having a linear map between
payoffs and returns (for a given p) will allow us to easily toggle back and forth
between results obtained in the ‘payoffs/prices’ framework and corresponding
results in the ‘returns’ framework.
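The toggling is mechanical enough to sketch in a few lines of Python (hypothetical numbers again):

```python
import numpy as np

# Hypothetical payoff-space data
mu = np.array([1.1, 1.2, 1.3])
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
p = np.array([0.9, 1.0, 1.1])

C = np.diag(1 / p)     # maps payoffs to returns: R = C x
Re = C @ mu            # expected returns, C mu
V = C @ Omega @ C.T    # return covariance matrix, C Omega C'
ones = C @ p           # the vector of ones, 1
```

Since C is diagonal, Vi,j is just Ωi,j /( pi p j ), which is a quick way to check the CΩC> formula by hand.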
When working in terms of returns, a portfolio takes the form of a vector θ of
portfolio weights, rather than a vector z of asset demands. The portfolio weights
are fractions of the investor’s initial wealth allocated to each asset. The θi ’s and
zi ’s are related by θi = pi zi /p · z = pi zi /W0 . Using the transformation matrix
C,
θ = (1/W0 ) C −1 z
Letting θ0 = qz0 /W0 , the fraction of initial wealth allocated to the riskless as-
set, the investor’s budget constraint (1.14) becomes a constraint requiring the
portfolio weights to sum to 1:

1 = θ0 + ∑i θi = θ0 + θ · 1 (1.24)

The return on the investor’s portfolio of risky assets is θ · R, and the return
on his whole portfolio of assets is θ0 R F + θ · R. We can show that the investor’s
final wealth is given by

W1 = (θ0 R F + θ · R)W0 .

This follows from (1.15) and the equivalences z0 = (1/q)qz0 = R F θ0 W0 and

z> x = z> C −1 Cx = (C −1 z)> (Cx ) = W0 θ · R.

Expected final wealth is just initial wealth times the expected portfolio re-
turn,

E(W1 ) = (θ0 R F + θ · Re )W0 = R F W0 + θ · ( Re − R F 1)W0 (1.25)

where (1.25) uses the fact that θ0 = 1 − θ · 1. The last line expresses the expected
portfolio return in terms of the riskless return R F and the portfolio-weighted
excess returns on the risky assets, Rie − R F .
The variance of final wealth is just (W0 )2 times the variance of the portfolio
of risky returns:

E[(W1 − E(W1 ))2 ] = (W0 )2 E[(θ · R − θ · Re )2 ]


= (W0 )2 θ > E[( R − Re )( R − Re )> ]θ
= (W0 )2 θ > Vθ (1.26)


At this point, we could plug (1.25) and (1.26) into the investor’s utility func-
tion, maximize with respect to θ, and retrace all the steps that led to our equi-
librium price expression (1.22). Rather than take that long route to the return
version of the CAPM, we’ll begin with (1.20) and apply the transformation ma-
trix C. Recall that (1.20) was an expression describing an investor’s portfolio of
risky assets:
zi = (1/αi ) Ω−1 (µ − R F p)
Pre-multiply both sides by C −1 to get

C −1 zi = (1/αi ) C −1 Ω−1 (µ − R F p)
= (1/αi ) C −1 Ω−1 C −1 C (µ − R F p)
= (1/αi ) C −1 Ω−1 C −1 (Cµ − R F Cp)

We can now use C −1 zi = W0i θ i , Cµ = Re , Cp = 1, and V = CΩC > —which,


together with the symmetry of C, implies C −1 Ω−1 C −1 = V −1 —to arrive at
risky asset demands in terms of portfolio weights:

θ i W0i = (1/αi ) V −1 ( Re − R F 1)

We now sum over all investors. Let W0M = ∑i W0i and

θ M = ∑i (W0i /W0M ) θ i ,
i.e., θ M is an initial-wealth-weighted average of all investors’ portfolios. Again let 1/α M = ∑i (1/αi ). We then obtain the following analogue to (1.21):

θ M W0M = (1/α M ) V −1 ( Re − R F 1).
Rearranging gives
Re − R F 1 = α M W0M Vθ M . (1.27)
As before, we view the market portfolio θ M (and aggregate market initial wealth
W0M ) as exogenous data, dictated by the supplies of the N risky assets. Note
that the ith row of Vθ M is the covariance of the ith asset return with the return
θ M · R on the market portfolio. In a slight abuse of our previous notation, let’s
call this magnitude cov(i, M ). We then obtain the following analogue to Result
1.1:
Result 1.2 (The CAPM, return version). Under the assumptions of this section, an
asset’s equilibrium expected excess return is either positive (Rie − R F > 0) or negative


(Rie − R F < 0), depending on whether the asset’s return covaries positively or negatively with the return on the market portfolio. Idiosyncratic risk is not priced—an
asset whose return has a zero covariance with the return on the market portfolio has a
zero expected excess return.
Early in the history of the CAPM, practitioners viewed the market risk pa-
rameter α M as the only unobservable in (1.27); later, doubts were raised about
whether the market portfolio was even observable. In any event, a version of
the CAPM that eliminated α M was seen as desirable. The result—the ‘beta’
form of the CAPM—eliminated α M at the expense of making the CAPM a
purely relative theory of equilibrium returns. This is the form in which the
CAPM is most commonly expressed.
To arrive at it, note that (1.27) has implications for the excess returns on
portfolios, including the market portfolio. The excess return on the market
portfolio is (θ M )> ( Re − R F 1). Pre-multiply both sides of (1.27) by (θ M )> to get
(θ M )> ( Re − R F 1) = α M W0M (θ M )> Vθ M . (1.28)
(θ M )> Vθ M is simply the variance of the return on the market portfolio, which
we denote by var( M ). Equation (1.28) then gives
α M W0M = (1/var( M )) (θ M )> ( Re − R F 1) (1.29)
Now, plug (1.29) into (1.27) to get the following expression for the equilibrium
expected excess return on the ith asset:
Rie − R F = (cov(i, M )/var( M )) (θ M )> ( Re − R F 1) ≡ β i (θ M )> ( Re − R F 1) (1.30)
Result 1.3 (The CAPM, beta version). Under the assumptions of this section, the
equilibrium expected excess return on any asset i depends only on the expected excess
return on the market portfolio and the asset’s beta, β i = cov(i, M )/var( M ). In
particular, the only source of systematic variation in the asset’s expected excess return
is variation in the expected excess return on the market portfolio, and the only source
of variation in expected excess returns across assets is variation in their betas.
Remark 1.7. Analogous to our point in Remark 1.6, note that from (1.28), since
var( M ) > 0, the market portfolio earns a positive expected excess return.
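A quick numerical confirmation of (1.30): in the Python sketch below, we pick hypothetical return data, construct expected returns so that (1.27) holds with an assumed value for α M W0M , and then verify that each asset’s excess return equals its beta times the market excess return.

```python
import numpy as np

# Hypothetical return-space data, for illustration only
V = np.array([[0.05, 0.01, 0.00],
              [0.01, 0.08, 0.02],
              [0.00, 0.02, 0.12]])
theta_M = np.array([0.5, 0.3, 0.2])    # market portfolio weights
RF = 1.02
k = 2.0                                # stands in for alpha_M * W0_M

Re = RF + k * (V @ theta_M)            # expected returns satisfying (1.27)

cov_iM = V @ theta_M                   # covariance of each return with the market return
var_M = theta_M @ V @ theta_M          # variance of the market return
beta = cov_iM / var_M
mkt_excess = theta_M @ (Re - RF)       # expected excess return on the market

print(np.allclose(Re - RF, beta * mkt_excess))   # True: equation (1.30)
```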

Exercize 1.3 (Calculating betas). For this exercize, use the same data as in Exercize 1.2, but interpret the µ from that exercize as Re and the Ω as V. In
Exercize 1.2 you calculated 50 efficient portfolios; assume the 25th portfolio which
you obtained there is the market portfolio θ M here. Write some Matlab code to
calculate the betas for the 5 assets. Report the betas and turn in the code you
wrote.

Lecture 2

Some basic theory: Stochastic discount factors, state prices, and risk neutral probabilities

The common structure of almost all the models we’ll look at throughout the
course is summarized in the relation

p = E(mx ) (2.1)

where p is an asset’s price, x its payoff (a random variable), and m is a random variable known alternatively as a stochastic discount factor or pricing kernel.1 The existence of a stochastic discount factor is related to the absence of arbitrage opportunities—loosely, there is an m such that the above pricing equation holds for all assets if and only if there is no arbitrage. As we’ll eventually
see, economic models—describing agents’ preferences, endowments, trading
opportunities, etc.—will typically give rise to SDFs, and the asset-pricing im-
plications of the different models can be usefully characterized by the different
m’s they imply.
SDFs are also closely linked to the concept of risk-neutral probabilities. Suppose there is a risk-free asset with price q and payoff 1 in all states, and let R F = 1/q denote the gross risk-free rate of return. Risk-neutral probabilities—or a risk-neutral measure—are probabilities such that when we take expectations using
those probabilities—call the expectation Ê( · )—we get

p = (1/R F ) Ê( x ) (2.2)

That is, every asset is priced according to the present value of its expected
payoff using the risk-neutral measure.
1 In finance, m is often referred to as a state-price density.


We’re going to take an indirect (but standard) route to equations (2.1) and
(2.2), one that begins with the concept of state prices—loosely, a set of prices
(distinct from asset prices) that value a marginal unit of wealth (or consump-
tion or unit of account) in each possible state of the world. A key result—called
the Fundamental Theorem of Asset Pricing—is going to link absence of arbitrage with the existence of state prices. A further result—the so-called ‘Representation Theorem’—is going to prove a type of equivalence between state prices,
risk neutral probabilities, and stochastic discount factors.
The material in this lecture loosely follows the presentation in Duffie [Duf92].
Another good reference is John Cochrane’s book [Coc01]. Dybvig and Ross’s
entry on “Arbitrage” in The New Palgrave: Finance [DR89] and the first chapter
of Ross’s book [Ros04] are also good sources for some of this material, espe-
cially the two theorems.

2.1 Basic structure of the two-period model


2.1.1 Assets and payoffs
As before, there are two periods, zero and one, or ‘today’ and ‘tomorrow’. Decisions are taken today, and uncertainty resolves (and payoffs are realized) tomorrow.
Tomorrow’s uncertainty is discrete: there are S possible states of the world
that may be realized, which we’ll index by s ∈ {1, 2, . . . S} ≡ S . State s occurs
with probability πs > 0, and π denotes the vector of probabilities.
There are N risky assets, indexed by i = 1, 2, . . . N. Assets pay off in units
of account or consumption, depending on the context. Asset i’s payoff in state
s is denoted xi,s , and X is the N × S matrix describing payoffs for all assets in
all states. Thus, each row of X corresponds to an asset, and each column to a
state. The expected payoff of asset i is ∑s xi,s πs ≡ E( xi ), and in matrix notation

E( X ) ≡ Xπ. (2.3)

A portfolio is just a vector z ∈ R N . There are no restrictions on short sales,


unless otherwise noted. A portfolio z pays off ∑i zi xi,s in state s, so—in matrix
notation—z> X ∈ RS describes the portfolio’s payoffs in every possible state.
We may also write the portfolio payoff as X > z when it is necessary to represent
it as a column vector.
Having defined states, assets, payoffs, and portfolios, we’re in a position
to define two concepts that relate to them—asset redundancy and asset market
completeness. A redundant asset is one whose payoffs can be replicated by a
portfolio of other assets, while asset market completeness means any vector in
RS can be realized by some portfolio.
Definition 2.1 (Redundant assets). Asset i is redundant if there exists a portfolio z
such that zi = 0 and, ∀s, xi,s = ∑ j z j x j,s . This is equivalent to saying there is a linear
dependency in the rows of X.


Definition 2.2 (Complete asset markets). Asset markets are complete if for any
y ∈ RS there is a portfolio z ∈ R N with X > z = y. Since linear combinations of
portfolios are again portfolios, we can also state market completeness as: there exist S
portfolios, zi , i = 1, 2, . . . S, such that their payoff vectors { X > zi : i = 1, . . . S} form
a basis for RS .
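Both concepts are statements about the rank of X, which makes them easy to check numerically. A sketch in Python/NumPy (the helper name is ours, not standard):

```python
import numpy as np

def market_flags(X):
    """Given the N x S payoff matrix X, flag redundancy and completeness."""
    N, S = X.shape
    r = np.linalg.matrix_rank(X)
    has_redundant_asset = bool(r < N)  # some row is a combination of the others
    complete = bool(r == S)            # portfolio payoffs X'z span R^S
    return has_redundant_asset, complete

X = np.array([[1, 1, 1],
              [0, 2, 0]])
print(market_flags(X))   # (False, False): no redundant assets, markets incomplete
```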
We’ll often assume there are no redundant assets. We may or may not as-
sume markets are complete. The following easy exercize is to help make sure
you understand the concepts of redundancy and completeness.

Exercize 2.1. Suppose there are two assets and three states. The payoff matrix is
 
X =
1 1 1
0 2 0

Show that asset markets are incomplete, the simple way, by showing a y ∈ R3
that can’t be attained by any portfolio z ∈ R2 . Now, imagine adding a third asset,
and consider two cases. First, add an asset that would be redundant, given the
first two. Make sure to demonstrate that it is redundant. Second, add a third asset
that completes the market. Demonstrate that it completes the market by showing
portfolios z1 , z2 and z3 that attain the basis vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1).

2.1.2 Investors
There will be H investors, indexed by h = 1, 2, . . . H. Investors (potentially)
differ in their preferences, initial wealth, and/or endowments. Investor h’s
preferences, in their most general form, will be described by a utility function
U h : R × RS → R. The arguments are either consumption values or units of
account, as the context dictates.
In the most general case, we’ll write U h (c0 , c1 ) as shorthand for

U h (c0 , c1,1 , . . . , c1,s , . . . c1,S ).

A familiar special form that U h might take is the time-separable, discounted


expected utility form (which is separable across both dates and states):
U h (c0 , c1 ) = uh (c0 ) + β ∑s πs uh (c1,s ). (2.4)

In some cases, we may assume investors only care about final wealth or
consumption: U h (c1 ). In any case—whether preferences are over (c0 , c1 ) or
just c1 —we assume that investors’ utilities are continuous (or at least upper
semicontinuous). We’ll also assume investors’ utilities are strictly increasing in
all their arguments—they prefer more to less.


Investors will have initial wealth W0h , and may receive a stochastic endow-
ment eh = (e1h , e2h , . . . eSh ) in period one. A feasible portfolio for investor h will
obey the budget constraint
W0h ≥ p · z. (2.5)
If they receive an endowment in period one, then the utility from their choice
of portfolio will be
U h (W0h − p · z, eh + X > z). (2.6)
where (2.6) makes use of the fact that, when investors prefer more to less, (2.5)
holds as an equality.
Note that the budget set

Bh = {(c0 , c1 ) ∈ R+ × RS+ : c0 = W0h − p · z, c1 = eh + X > z, some z ∈ R N } (2.7)
is clearly closed and convex. If it is bounded—which we’ll see has to do with
the presence or absence of arbitrage—it is also compact, and therefore an opti-
mal choice of z will exist.
When there’s no chance of confusion—or a representative agent—we’ll drop
the superscript h to avoid unnecessary clutter.

2.1.3 Prices & arbitrage


We distinguish two sorts of prices—asset prices and state prices. The former
are the usual prices investors face in the market: they specify the number of
units of account (or consumption) at date zero an investor has to pay for a
unit (or share) of each asset—which is a claim to the payoffs described above.
The latter—usually just a theoretical construct—specify the value in units of
account (or consumption) at date zero of an additional unit of account (or con-
sumption) in each of the S possible states at date one. All our models will have
asset prices, of course; state prices will exist under certain conditions, which
we’ll make clear shortly.
Investors take asset prices as given. The price of asset i is pi and p ∈ R N is
the vector of asset prices. The price of a portfolio z is then just p · z.
The state prices (when they exist) will be denoted by ψs for s = 1, 2, . . . S.
The state price vector is ψ ∈ RS . If state prices exist, they can be used to
price the payoffs of any of the assets. For example, asset i’s payoffs in each
state ( xi,1 , xi,2 , . . . , xi,S ) form a vector in RS , and their value at state prices ψ is
∑s ψs xi,s . State prices, depending on the context, are sometimes referred to as
Arrow-Debreu prices.
Since a unit of asset i is equivalent to a set of claims to i’s payoffs in each
of the S possible date-one states, when a state price vector exists, we say that it
prices assets correctly if:
pi = ∑s ψs xi,s . (2.8)
holds for all i = 1, 2, . . . N.


Remark 2.1. The existence of state prices is separate from the concept of asset market
completeness. We can have a state price vector that tells us the value of a claim to one
unit of account in state s (and only state s) for each s, even when a set of S assets with
those payoffs—which would constitute a basis for RS —does not exist. Whether or not
markets are complete does matter for the uniqueness of a state price vector.
An arbitrage, or arbitrage portfolio, is—in words—a portfolio that either (a)
costs nothing today, has a positive payoff in at least some states tomorrow, and
no chance of a negative payoff in any state, or (b) costs less than nothing today
(it has a negative price), and has a nonnegative payoff in every state tomorrow.
Formally:
Definition 2.3 (Arbitrage). z is an arbitrage if either of the following two conditions
hold:
p · z ≤ 0 and X > z > 0 (2.9)
or
p · z < 0 and X > z ≥ 0 (2.10)
We say there is no arbitrage if no such z exists.
Remark 2.2. The two conditions in Definition 2.3 can be combined into one if we
stack − p> (which is 1 × N) and X > (which is S × N) into a single (S + 1) × N
matrix. Then, (2.9) and (2.10) can be stated as: an arbitrage is a portfolio z such that
[ −p> ; X> ] z > 0 (2.11)

where the matrix product on the left is (S + 1) × 1, and you’ll recall that for vectors
w > 0 means all wi ≥ 0 and at least some wi > 0. There is no arbitrage if no such z
exists.
Since if z is a portfolio, so is αz for any real number α, an arbitrage (if one
exists) can be run at any scale. Note that the presence or absence of arbitrage is
a feature of asset prices and payoffs together.
Example 2.1. Suppose asset i is redundant: there is a portfolio z ∈ R N with zi = 0
and ( xi,1 , xi,2 , . . . xi,S ) = z> X. If pi 6= p · z, there is an arbitrage opportunity. To see
this, suppose pi > p · z, and consider the portfolio z − ei , where ei is the ith unit basis
vector in R N . This portfolio corresponds to shorting one unit of asset i and buying the
replicating portfolio z. Then,

X > (z − ei ) = X > z − X > ei = X > z − ( xi,1 , xi,2 , . . . xi,S )> = 0

29
2.2. THE FUNDAMENTAL THEOREM LECTURE 2. BASIC SDF THEORY

so the payoffs in all states tomorrow are all zero. But, the cost of the portfolio is p ·
(z − ei ) = p · z − pi < 0, which therefore presents an arbitrage opportunity. For the
case of pi < p · z, just consider the portolio −z + ei .
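The arithmetic in Example 2.1 is easy to verify on a computer. A small Python sketch with made-up numbers—three states, two basis assets, and a redundant third asset priced above its replication cost p · z:

```python
# Asset 3's payoff row equals 0.5*(asset 1) + 0.5*(asset 2), so it is redundant;
# suppose it is overpriced relative to the replicating portfolio.
X = [[2.0, 0.0, 2.0],    # asset 1
     [0.0, 2.0, 2.0],    # asset 2
     [1.0, 1.0, 2.0]]    # asset 3 (redundant)
p = [1.0, 1.0, 1.1]      # replication cost is 0.5 + 0.5 = 1 < 1.1

z = [0.5, 0.5, 0.0]               # replicating portfolio (z_3 = 0)
trade = [z[0], z[1], z[2] - 1.0]  # z - e3: buy the replication, short asset 3

cost = sum(pi * zi for pi, zi in zip(p, trade))
payoffs = [sum(X[i][s] * trade[i] for i in range(3)) for s in range(3)]
print(round(cost, 10))  # -0.1 < 0: the trade pockets money today...
print(payoffs)          # [0.0, 0.0, 0.0]: ...with zero payoff in every state
```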
As the next section shows, arbitrage, asset prices, and the existence of state
prices are intimately linked by a basic result in asset pricing. Once we have
state prices, it will be easy to construct stochastic discount factors and risk-neutral
probabilities.

Exercize 2.2. Suppose that there are two assets (N = 2) and three states (S =
3). The payoff matrix is

X = ( 1 1 1 )
    ( 0 2 0 )

Characterize the set of price vectors p = (p1, p2) that are consistent with there
being no arbitrage opportunities. Show all the steps in your reasoning. Hint:
There is one inequality that p1 and p2 need to obey for there to be no arbitrage.

2.2 The Fundamental Theorem of Asset Pricing


This result is sometimes known as the Fundamental Theorem of Finance.2
Loosely, it states an equivalence between the absence of arbitrage, the existence
of a strictly positive state price vector, and the existence of an optimal portfolio
choice for an investor who prefers more to less. Formally, in the context of our
model:
Theorem 2.1 (The Fundamental Theorem of Asset-Pricing). The following are
equivalent:
1. The asset price vector p and payoff matrix X satisfy no arbitrage—there is no
portfolio z satisfying (2.11).

2. There exists a strictly positive state price vector ψ that correctly prices assets—
that is, there is a ψ ≫ 0 satisfying

p = Xψ,        (2.12)

the matrix version of (2.8).


3. There exists a finite optimal portfolio choice for a (hypothetical) investor who
prefers more to less.
2 See chapter one of Ross [Ros04], available from Princeton University Press here:
http://press.princeton.edu/chapters/s7834.pdf. Other useful references are the beginning of
Duffie’s book [Duf92] or Dybvig and Ross’s New Palgrave entry on “Arbitrage” [DR89].


Proof. Condition (3) of the theorem clearly implies condition (1)—if some
investor who prefers more to less has an optimal choice, there must be no arbi-
trage. If we can show that (1) implies (2), and (2) implies (3), we’ll be done.
So, let’s begin with (1) implies (2). Assume then that (1) holds: there is
no z ∈ R^N satisfying (2.11). We’ll use a separating hyperplane argument to
construct a ψ that satisfies (2). Let’s first define what we mean by a hyperplane
in R^n:
Definition 2.4 (Hyperplanes). A hyperplane in Rn is a set of the form H ( p, α) =
{ x ∈ Rn : p · x = α}, for a given p ∈ Rn and α ∈ R. Associated with any hyperplane
are two closed half-spaces, H ( p, α)− = { x ∈ Rn : p · x ≤ α} and H ( p, α)+ =
{ x ∈ Rn : p · x ≥ α}. The open half-spaces H ( p, α)−− and H ( p, α)++ are defined
similarly, but with strict inequalities. Two sets A, B ⊂ Rn are said to be separated by
a hyperplane H ( p, α) if one of the sets is contained wholly in H ( p, α)− and the other
set is contained wholly in H ( p, α)+ . The sets are said to be strongly separated if one of
them is actually wholly contained in one of H ( p, α)’s open half-spaces. A hyperplane
H ( p, α) is said to support a set A at a point y ∈ A if y ∈ H ( p, α) and A is wholly
contained in either H ( p, α)− or H ( p, α)+ .
Keeping that definition in mind, and referring back to equation (2.11), let

M = [ −p^⊤ ]
    [  X^⊤ ],

an (S + 1) × N matrix. For any portfolio z ∈ R^N, Mz is just a vector in R^(S+1),
and the set

K = {y ∈ R^(S+1) : y = Mz for some z ∈ R^N}

is a subset of R^(S+1). In fact it is a special kind of subset: K is a closed linear
subspace, and is hence also a closed, convex cone.
Definition 2.5 (Linear subspaces and cones). A set A ⊂ Rn is a linear subspace
if x, y ∈ A and a, b ∈ R imply ax + by ∈ A. A linear subspace A is a closed linear
subspace if it’s also a closed set (in the usual sense in Rn ). A set B ⊂ Rn is a cone if
x ∈ B and a ∈ R, a > 0, imply ax ∈ B. A cone B is a closed, convex cone if it’s also
a closed, convex set. Clearly, any closed linear subspace of Rn is also a closed convex
cone in Rn .
The no-arbitrage condition is precisely the statement that the only point
of intersection between K and the nonnegative orthant R^(S+1)_+ is the origin—
K ∩ R^(S+1)_+ = {0}. We are ready for the separation argument. We’ll use (without
proving) the following result, due to Samuel Karlin [Kar59]:3
Result 2.1. Let V be a closed convex cone in R^n intersecting the nonnegative orthant
only at the origin. Then, there is φ ∈ R^n, φ ≫ 0, such that φ · y ≤ 0 for all y ∈ V.
3 See Theorem B.3.5 on page 404 of [Kar59]. A lot of the book is available for free preview on
Google Books; see here: http://books.google.com/books?id=qgMfct0YnFQC


Essentially, Karlin’s result is saying that there is a hyperplane H (φ, α), with
φ ≫ 0, α = 0, that separates the cone V from the nonnegative orthant.
In our case, using Karlin’s result, if there is no arbitrage, there is a strictly
positive φ ∈ RS+1 satisfying φ · y ≤ 0 for all y ∈ K = {y ∈ RS+1 : y =
Mz for z ∈ R N }. Since our K is a linear subspace, y ∈ K implies −y ∈ K. This
means φ must in fact satisfy φ · y = 0 for all y ∈ K, since φ · y < 0 for any y ∈ K
would mean φ · (−y) > 0, contradicting the fact that, since −y ∈ K, it should
obey φ · (−y) ≤ 0.
What does φ · y = 0 (∀y ∈ K) mean? Using the definition of K, we must
have, ∀z ∈ R^N,

0 = φ · (Mz) = (φ^⊤ M)z,

which is only possible—if you think about it, since it must hold for every choice
of z—if φ^⊤ M = 0.
Since φ ∈ R^(S+1), we may abuse notation slightly and write it as (φ0, φ1, φ2, . . . φS).
Then, using the definition of M and the rules for products of transposes,

0 = M^⊤ φ = [ −p  X ] (φ0, φ1, . . . φS)^⊤ = −p φ0 + X (φ1, φ2, . . . φS)^⊤.

Now, let ψ ≡ (1/φ0)(φ1, φ2, . . . φS)—which is feasible since φ0 > 0—and rear-
range to obtain

p = Xψ,

which completes the proof that no arbitrage (condition 1 of the Theorem) im-
plies the existence of a state price vector (condition 2).
Now, for (2) implies (3). This part is easy: we just construct a hypothet-
ical investor whose marginal utilities of consumption in each of the S states
tomorrow are given by the state price vector ψ = (ψ1, . . . ψS). For example,
define

U(W0 − p · z, e + X^⊤ z) ≡ W0 − p · z + ψ^⊤(e + X^⊤ z)
                         = W0 − p · z + ψ · e + ψ^⊤ X^⊤ z
                         = W0 − p · z + ψ · e + p · z
                         = W0 + ψ · e,

using ψ^⊤ X^⊤ z = (Xψ)^⊤ z = p · z in the third line. This utility clearly has a
finite optimal portfolio choice—z = 0 would work—since the investor is indif-
ferent amongst all choices of z.
The theorem does not guarantee uniqueness of the state price vector. In
fact, if asset markets are incomplete, it’s easy to see how non-uniqueness may


arise. When markets are incomplete, there are fewer than S linearly indepen-
dent assets—i.e., X has fewer than S linearly independent rows—so p = Xψ repre-
sents fewer than S independent equations in the S variables (ψ1, ψ2, . . . ψS).
Suppose, for example, that the N assets have linearly independent payoff
vectors (no redundant assets), but that N < S—fewer assets than states. Then,
the rank of X is N, and the set of solutions to the linear system p = Xη, that is,
the set {η ∈ R^S : Xη = p}, has dimension S − N > 0.4

Exercize 2.3. Using the same X from exercize 2.2, and assuming p is any p
satisfying the condition you derived in exercize 2.2, characterize the set of possible
state price vectors ψ satisfying p = Xψ.

Showing that asset market incompleteness leads to non-uniqueness of the


state price vector is not quite the same as showing that complete markets guar-
antee uniqueness. We’ll do that now:

Result 2.2. Suppose there is a strictly positive state price vector ψ which correctly
prices assets ( p = Xψ). If asset markets are complete, then ψ is the unique state price
vector.
Proof. An easy way to show this is to note that when markets are com-
plete, for each unit basis vector es in R^S there is a portfolio zs such that X^⊤ zs =
es. Since ψ correctly prices assets, we must have

p · zs = ψ · es = ψs        (2.13)

for all s = 1, 2, . . . S. Since these equations must hold for any state price vector
that correctly prices assets, state prices are uniquely determined.5
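To make Result 2.2 concrete: with as many linearly independent assets as states, p = Xψ can be solved directly for the unique ψ. A Python sketch for an invented 2 × 2 (complete-market) example, using Cramer's rule:

```python
# Complete market: 2 states, 2 linearly independent assets (rows of X are
# the assets' payoff vectors).  p = X psi is two equations in two unknowns.
# All numbers are invented for illustration.
X = [[1.0, 1.0],    # a riskless bond
     [0.5, 1.5]]    # a risky asset
p = [0.9, 0.95]

# Solve p = X psi by Cramer's rule.
det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
psi1 = (p[0] * X[1][1] - X[0][1] * p[1]) / det
psi2 = (X[0][0] * p[1] - p[0] * X[1][0]) / det
print(psi1, psi2)   # the unique state prices (here both strictly positive)

# psi reprices both assets exactly, as in (2.13):
print(X[0][0] * psi1 + X[0][1] * psi2)  # recovers p[0]
print(X[1][0] * psi1 + X[1][1] * psi2)  # recovers p[1]
```

Since the solved ψ comes out strictly positive, Theorem 2.1 says these prices admit no arbitrage; had a component come out negative, the prices themselves would have been arbitrageable.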

2.3 The Representation Theorem: Going from state prices to SDFs and risk-neutral probabilities
If we were working in infinite-dimensional spaces, the proof of this result
would require some serious mathematical machinery. With only finitely many
states, though, it involves little more than some algebra. Like Theorem
2.1, it is an equivalence result. In this case the existence of a state price vector is
linked to the existence of a stochastic discount factor—see equation (2.1)—and
the existence of risk-neutral probabilities (2.2).
4 See any linear algebra text, for example Shilov [Shi77].
5 Note that by no arbitrage—which, by Theorem 2.1, must hold if there is a ψ ≫ 0 that correctly
prices assets—if ẑs also gives X^⊤ ẑs = es, then p · ẑs = p · zs, so the left-hand side of (2.13) is a
unique value for each s.


Before we can state it, though, it will be useful to have some notation that
represents, for v, w ∈ R^S, the vector (v1 w1, v2 w2, . . . vS wS) ∈ R^S. Let vw—
with no ‘·’, no ⊤—denote this vector. Recall that π = (π1, π2, . . . πS) ≫ 0 are
probabilities over the S possible states. Then,

E(vw) = ∑_{s=1}^S πs vs ws.

Now, suppose we have a positive state price vector ψ that correctly prices
assets. Let ms = ψs/πs > 0, which is feasible since all states have positive
probability. Then, for any asset i,

pi = ∑_{s=1}^S xi,s ψs        (2.14)
   = ∑_{s=1}^S πs (ψs/πs) xi,s        (2.15)
   = ∑_{s=1}^S πs ms xi,s        (2.16)
   = E(mxi)        (2.17)

In the last line, xi denotes the ith row of X. The equation shows that there is an
m ∈ R^S, m ≫ 0, such that pi = E(mxi) for any asset i. By the linearity of ex-
pectations, this is also the case for any portfolio z of assets: p · z = E[m(X^⊤ z)].
Conversely, suppose there is an m ≫ 0 such that pi = E(mxi) holds for
any asset i, and that all states have positive probability. Then ψ defined by
ψs = πs ms (∀s ∈ S) is a strictly positive state price vector that correctly prices
assets. This is the first part of the representation theorem—there is a state price
vector ψ ≫ 0 that correctly prices assets if and only if there is a random variable
m ≫ 0 such that pi = E(mxi) for every asset i.
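The change of measure behind (2.14)–(2.17) is purely mechanical once ψ and π are in hand. A Python sketch with invented numbers (three states); the exercises below ask for the analogous MATLAB computation:

```python
# From state prices to an SDF: m_s = psi_s / pi_s, then p_i = E(m x_i).
# State prices and probabilities below are invented for illustration.
psi = [0.3, 0.4, 0.25]                      # state prices
pi_ = [0.25, 0.5, 0.25]                     # objective probabilities
m = [ps / pr for ps, pr in zip(psi, pi_)]   # SDF realizations, all positive

x_i = [1.0, 2.0, 0.5]                       # some asset's payoffs across states

price_from_psi = sum(ps * xs for ps, xs in zip(psi, x_i))
price_from_sdf = sum(pr * ms * xs for pr, ms, xs in zip(pi_, m, x_i))
print(price_from_psi, price_from_sdf)       # identical by construction
```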
For the next part of the result—which relates to risk-neutral probabilities—
we need another bit of notation, and a risk-free asset. Since we’ll be considering
expectations taken with respect to other probabilities (the risk-neutral proba-
bilities that we’ll be constructing), we need some notation for expectations that
indicates the probabilities we’re using. So, for any vector y ∈ R^S and any
probabilities6 φ ∈ R^S, let Eφ denote the expectation of y with respect to φ:

Eφ(y) = ∑_{s=1}^S φs ys

With that notation, we can write expectations with respect to the true, or objec-
tive, probabilities π as Eπ ( · ). We may also write E( · ) for expectations with
respect to π when there is no chance for confusion.
6 φ ∈ R^S are probabilities if φ ≥ 0 and ∑s φs = 1.


This part of the result also assumes that we have a risk-free asset. Let q
denote the risk-free asset’s price and 1 = (1, 1, . . . 1), a row of ones, its payoff
vector. I want to keep this vector distinct from the payoffs in X, but at the same
time, statements of the form ‘prices assets correctly’ or ‘for all assets’ should be
understood as applying not just to p and X, but to the augmented price vector
and payoff matrix

[ q ]            [ 1 ]
[ p ]    and     [ X ].
Now, suppose there is a stochastic discount factor m ≫ 0 that prices assets
according to pi = Eπ(mxi) and q = Eπ(m1). Note that Eπ(m1) = Eπ(m), so

1/Eπ(m) = 1/q = R_F,

the gross risk-free rate of return, as we defined it in Lecture 1. Note, too, that
the vector

φ ≡ (1/Eπ(m)) πm = (1/Eπ(m)) (π1 m1, π2 m2, . . . πS mS)
obeys φ ≥ 0 and ∑s φs = 1. Thus, φ is a probability measure on S. Now, divide
both sides of pi = E(mxi) by q = E(m), and use the above considerations to
obtain

(1/q) pi = (1/E(m)) E(mxi)
         = ∑_s (πs ms / ∑_r πr mr) xi,s
         = ∑_s φs xi,s
         = Eφ(xi)

Using q = 1/R_F, we may write this more suggestively as saying that, for
any asset i,

pi = (1/R_F) Eφ(xi)

—that is, i’s price is just the discounted present value of its expected payoff
under the probabilities φ.
Conversely, suppose there are probabilities φ such that the last equation
holds for all assets i. We can go in the other direction to derive a stochastic


discount factor m, by defining ms = qφs/πs. The steps in the transformation
are

pi = q ∑_s φs xi,s
   = ∑_s πs (qφs/πs) xi,s
   = ∑_s πs ms xi,s
   = Eπ(mxi)

In sum, we’ve shown:


Theorem 2.2 (Representation theorem). Suppose all the states s = 1, 2, . . . S have
positive probability. Then the following are equivalent (and, by Theorem 2.1, equiva-
lent to no arbitrage):
1. There is a strictly positive state price vector ψ that prices assets correctly—i.e.

p = Xψ.

2. There is a strictly positive stochastic discount factor m that prices assets—i.e.,


for any asset i,
pi = E(mxi ).

3. There are strictly positive probabilities φ that price assets according to the risk-
neutral pricing formula

pi = (1/R_F) Eφ(xi).
Remark 2.3. Since the SDF representation (2) holds for any asset—including a risk-
less asset with payoff vector 1—if q denotes the riskless asset’s price, then (2) gives us
q = E(m1) = E(m). Letting R F = 1/q, we also have R F = 1/E(m).
Remark 2.4. The SDF representation also has a covariance interpretation. For any
asset i, let cov( xi , m) denote the covariance between asset i’s payoff xi and the SDF
m—i.e., cov( xi , m) = E(mxi ) − E(m)E( xi ). With this notation, and using the
previous remark, the SDF representation can be written as

pi = E(mxi)
   = E(m)E(xi) + cov(xi, m)
   = E(xi)/R_F + cov(xi, m)
That is, the price of asset i is the discounted present value of its expected payoff plus the
covariance of its payoff with the SDF. Note the analogy to equation (1.23) from Lecture
1.
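Remark 2.4's decomposition can be checked in a couple of lines. A Python sketch with two equally likely states and an invented SDF:

```python
# p_i = E(x_i)/R_F + cov(x_i, m), with R_F = 1/E(m).  All numbers invented.
pi_ = [0.5, 0.5]                  # state probabilities
m = [1.1, 0.9]                    # SDF realizations; E(m) = 1, so R_F = 1
x = [0.5, 1.5]                    # a payoff that is high when m is low

def expect(v):
    return sum(p * a for p, a in zip(pi_, v))

price = expect([mi * xi for mi, xi in zip(m, x)])   # direct E(m x) pricing
R_F = 1 / expect(m)
cov = price - expect(m) * expect(x)

print(price)                      # the two lines print the same number:
print(expect(x) / R_F + cov)      # discounted expected payoff plus covariance
```

Because x pays off more when m is low, the covariance term is negative and the asset sells for less than its discounted expected payoff.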


Exercize 2.4. Suppose you observe an economy with two states and two assets.
The price vector you observe is p = (1, 1), and the asset payoffs are

X = ( 1.008  1.008 )
    ( 0.905  1.235 )

so the first asset is riskless and the second risky. The probabilities of the two states
are π = (1/2, 1/2). Write a little MATLAB code to find state prices ψ that correctly
price the assets; the stochastic discount factor m; and the risk neutral probabilities
φ. Turn in your results and your code.

2.3.1 The representations in return form


The stochastic discount factor and risk-neutral pricing representations have
useful versions when we work in terms of asset returns, as opposed to prices
and payoffs. Let Ri = (1/pi ) xi ∈ RS denote the vector of gross returns for
asset i; if state s occurs, a unit investment in i returns Ri,s . We’ll continue to
write R F = 1/q for the gross risk-free return.
In terms of returns, the stochastic discount factor representation is easily
seen to be:
1 = E(mRi ). (2.18)
which follows simply from dividing both sides of (2) in Theorem 2.2 by pi .
This is true for R F as well, since—as noted above—R F = 1/E(m), implying
1 = R F E(m) = E(mR F ).
As was the case with the prices and payoffs version, (2.18) can be written in
terms of covariance:

1 = E(mRi)
  = cov(Ri, m) + E(Ri)E(m)
  = cov(Ri, m) + E(Ri)/R_F

using E(m) = 1/R_F in the last line. Rearranging, we get:

E(Ri) − R_F = −R_F cov(Ri, m)        (2.19)

That is, the expected excess return on asset i depends negatively on the covari-
ance between the return on asset i and the stochastic discount factor.
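Equation (2.19) is easy to verify numerically: pick an SDF, rescale an arbitrary payoff pattern so that 1 = E(mRi) holds, and compare the two sides. A Python sketch with invented numbers:

```python
pi_ = [0.5, 0.5]
m = [1.05, 0.85]                       # invented SDF; E(m) = 0.95

def expect(v):
    return sum(p * a for p, a in zip(pi_, v))

R_F = 1 / expect(m)                    # gross riskless rate, about 1.053

# Any correctly priced return must satisfy 1 = E(m R_i); rescale an
# arbitrary payoff pattern so that this holds.
raw = [0.9, 1.25]
scale = 1 / expect([mi * ri for mi, ri in zip(m, raw)])
R_i = [ri * scale for ri in raw]

cov = expect([mi * ri for mi, ri in zip(m, R_i)]) - expect(m) * expect(R_i)
print(expect(R_i) - R_F)               # expected excess return ...
print(-R_F * cov)                      # ... equals -R_F cov(R_i, m), eq. (2.19)
```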
Thinking about equation (2.19), suppose there is an asset whose return, call
it Rm , is perfectly correlated with m. To keep things simple, suppose Rm = λm


for some real scalar λ. Applying (2.19) to that asset’s return gives

E(Rm) − R_F = −R_F cov(Rm, m)
            = −(R_F/λ) cov(Rm, Rm)
            = −(R_F/λ) var(Rm)        (2.20)
Combining (2.19) and (2.20)—and using cov(Ri, m) = (1/λ) cov(Ri, Rm)—we
obtain the beta representation

E(Ri) − R_F = βi (E(Rm) − R_F)        (2.21)

where βi = cov(Ri, Rm)/var(Rm). This is exactly analogous to the CAPM
formula (1.30), with Rm in the role of the market portfolio.
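The beta representation (2.21) can be illustrated the same way: set Rm = λm with λ chosen so that E(mRm) = 1, and compare the two sides of (2.21). A Python sketch, again with invented numbers:

```python
pi_ = [0.25, 0.5, 0.25]
m = [1.2, 0.95, 0.8]                       # invented SDF

def expect(v):
    return sum(p * a for p, a in zip(pi_, v))

def cov(u, v):
    return expect([a * b for a, b in zip(u, v)]) - expect(u) * expect(v)

R_F = 1 / expect(m)
lam = 1 / expect([mi * mi for mi in m])    # chosen so that E(m R_m) = 1
R_m = [lam * mi for mi in m]               # a return perfectly correlated with m

raw = [1.3, 1.05, 0.9]                     # arbitrary pattern for a second asset
scale = 1 / expect([mi * ri for mi, ri in zip(m, raw)])
R_i = [ri * scale for ri in raw]           # rescaled so that E(m R_i) = 1

beta_i = cov(R_i, R_m) / cov(R_m, R_m)
print(expect(R_i) - R_F)                   # left-hand side of (2.21)
print(beta_i * (expect(R_m) - R_F))        # right-hand side of (2.21)
```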
The transformation of the representation in terms of risk-neutral probabil-
ities is even simpler: divide both sides of condition (3) in Theorem 2.2 by pi ,
and multiply both sides by R F to get:

Eφ ( R i ) = R F (2.22)

—that is, the expected return on any asset, under the risk-neutral probability
measure, is simply the riskless rate of return. Note that (2.22) implies that we
can write i’s return Ri (that is, the random variable, not the expectation) as

Ri = R F + vi

where vi is a random variable that has a zero expectation under the probabili-
ties φ.
The utility of either the stochastic discount factor representation or the risk-
neutral probability representation is that, once we know m or φ in a given econ-
omy (assuming they are unique), we can price all assets by these very simple
formulæ.

2.4 Investor utilities and the pricing representations


All our results thus far have been in an essentially preference-free context,
apart from assuming investors prefer more to less. Now, we’ll relate our pric-
ing representations to investors’ utility functions and optimal choices.
Recall that the typical investor’s utility function is given by U^h(c0, c1) : R ×
R^S → R, and the investor has the budget set

B^h = {(c0, c1) ∈ R+ × R^S_+ : c0 = W0^h − p · z, c1 = e^h + X^⊤ z, for some z ∈ R^N}.

Assume that U^h is increasing and continuous. B^h is obviously convex and


closed. If there is no arbitrage, it must also be bounded, hence compact.7 There-


fore, a solution to the problem max{U h (c0 , c1 ) : (c0 , c1 ) ∈ Bh } must exist.
Our first result will relate the investor’s vector of marginal utilities at his
optimal choice to state prices. Assume U^h is also concave and differentiable.
Denote the partial derivatives of U^h by

U0^h(c0, c1) = ∂U^h(c0, c1)/∂c0

and

U1,s^h(c0, c1) = ∂U^h(c0, c1)/∂c1,s.

For compactness, let U1^h(c0, c1) denote the vector of S partial derivatives
(U1,1^h(c0, c1), . . . U1,S^h(c0, c1)).
Our first result is:

Result 2.3. Let (c0∗, c1∗) ∈ B^h, with (c0∗, c1∗) ≫ 0. Then, (c0∗, c1∗) is an optimal choice
if and only if

(1/U0^h(c0∗, c1∗)) U1^h(c0∗, c1∗)

is a state price vector.
This follows simply from the first-order conditions to the investor’s utility-
maximization problem. With concave preferences and a convex feasible set,
those conditions are necessary and sufficient for an interior optimum. Substi-
tute the budget constraint into the utility function—as in (2.6)—and write the
problem as

max_z U^h(W0^h − p · z, e^h + X^⊤ z).

Differentiate with respect to holdings of the ith asset, zi, to obtain

−U0^h(c0, c1) pi + ∑_{s=1}^S U1,s^h(c0, c1) xi,s = 0.

If a feasible (c0∗, c1∗) satisfies this condition for all i = 1, 2, . . . N, it is an opti-
mal choice; conversely, a (c0∗, c1∗) ≫ 0 that is an optimal choice will satisfy this
condition for every i. But this means

pi = (1/U0^h(c0∗, c1∗)) ∑_{s=1}^S U1,s^h(c0∗, c1∗) xi,s
7 This is easiest to see indirectly, by invoking Theorem 2.1 and using the state price vector ψ.
Since p = Xψ, p · z = ψ^⊤ X^⊤ z. Thus, c0 = W0 − ψ^⊤ X^⊤ z and c1 = e + X^⊤ z. It’s clear that if
c0 → +∞, then we must have ψ^⊤ X^⊤ z → −∞. Since ψ ≫ 0, this means some component of X^⊤ z
must go to −∞. But then, some component of c1 would eventually become negative. Conversely,
if some c1,s → +∞, some element of X^⊤ z must diverge to +∞. But then ψ^⊤ X^⊤ z would go to +∞,
eventually making c0 negative.


or, in vector form (treating U1^h as an S × 1 column vector),

p = (1/U0^h(c0∗, c1∗)) X U1^h(c0∗, c1∗).
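Result 2.3 can be made concrete by picking a specific utility function. The Python sketch below uses U^h(c0, c1) = log c0 + β ∑s πs log c1,s—an assumption made purely for illustration, anticipating the time-separable case treated below—forms ψ from marginal utilities, and then confirms that small trades away from the candidate optimum cannot raise utility at the implied prices:

```python
import math

# State prices from marginal utilities, for the (assumed) utility
# U(c0, c1) = log c0 + beta * sum_s pi_s log c1_s.
beta = 0.96
pi_ = [0.5, 0.5]
c0, c1 = 1.0, [0.9, 1.2]            # candidate optimal consumption (invented)

# U0 = 1/c0 and U1_s = beta * pi_s / c1_s, so psi_s = U1_s / U0:
psi = [beta * p * c0 / c for p, c in zip(pi_, c1)]
print(psi)

# Price an asset with payoffs x under these state prices, then check that a
# small trade in it (financed at date 0) cannot raise utility at (c0, c1).
x = [1.0, 1.0]
price = sum(ps * xs for ps, xs in zip(psi, x))

def utility(eps):
    return math.log(c0 - eps * price) + beta * sum(
        p * math.log(c + eps * xs) for p, c, xs in zip(pi_, c1, x))

u0 = utility(0.0)
print(all(utility(e) <= u0 + 1e-12 for e in (-0.01, -0.001, 0.001, 0.01)))  # True
```

The first-order condition −price·U0 + ∑s U1,s·xs = 0 holds by construction, so ε = 0 is the maximum along this trading direction; concavity does the rest.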
Remark 2.5. Under complete markets, the state price vector is unique. In fact, when
markets are complete (and there is no arbitrage), we can write the investor’s problem
equivalently, with no reference to asset choices at all, as

max_{c0, c1} {U^h(c0, c1) : W0 + ψ · e = c0 + ψ · c1}

where ψ is the unique state price vector.

Exercize 2.5. Prove that last remark.

Exercize 2.6 (Reinterpreting the CAPM). Doing this exercize with many in-
vestors would present a number of complications that would obscure the point, so
suppose there is a single representative investor. His utility function over (c0, c1)
is

U(c0, c1) = c0 + ∑_{s=1}^S πs ( a c1,s − (b/2)(c1,s)² ).

In equilibrium he must hold an exogenous supply of assets, so you can treat his
period-one consumption vector (in equilibrium) as some exogenous c1∗—i.e., equi-
librium prices are such that choosing c1∗ is optimal. Derive an expression for the
state price vector, and use it to show that, for any asset i,

pi = E(xi)/R_F − b cov(c1∗, xi),

where cov(c1∗, xi) = E(c1∗ xi) − E(c1∗)E(xi). Hint: Also use the state price vector
to price the riskless asset with price q and payoff 1.

For the remainder of the discussion, let’s specialize preferences to the time-
separable, expected-utility form,

U^h(c0, c1) = u^h(c0) + β ∑_{s=1}^S πs u^h(c1,s).

Then,

(1/U0^h(c0∗, c1∗)) U1^h(c0∗, c1∗) = ( βπ1 u^h′(c1,1∗)/u^h′(c0∗), βπ2 u^h′(c1,2∗)/u^h′(c0∗), . . . βπS u^h′(c1,S∗)/u^h′(c0∗) )        (2.23)


is the form of the state price vector associated with the optimal choice (c0∗, c1∗).
With (2.23) as the form of the state price vector, for any asset i we have

pi = ∑_{s=1}^S βπs (u^h′(c1,s∗)/u^h′(c0∗)) xi,s
   = ∑_{s=1}^S πs ms xi,s
   = E(mxi)

where

m = ( β u^h′(c1,1∗)/u^h′(c0∗), β u^h′(c1,2∗)/u^h′(c0∗), . . . β u^h′(c1,S∗)/u^h′(c0∗) )        (2.24)

is the form taken by the stochastic discount factor under time-separable ex-
pected utility. In a slight abuse of notation, let β u^h′(c1∗)/u^h′(c0∗) stand for the
whole vector of marginal rates of substitution in (2.24). We then get the famil-
iar form

pi = E[ β (u^h′(c1∗)/u^h′(c0∗)) xi ].
And, in terms of excess returns and covariances, we have

E(Ri) − R_F = −R_F cov(Ri, m) = −R_F cov( Ri, β u^h′(c1∗)/u^h′(c0∗) )        (2.25)
Equations such as the last two will play a prominent role in the next lecture,
when we get to Lucas’s model and, after that, the equity premium puzzle.
Next, we can derive the representation of pricing under risk-neutral prob-
abilities when we take into account investor preferences. As above, we can go
from a stochastic discount factor to risk-neutral probabilities using

∑_s πs ms xi,s = E(m) ∑_s (πs ms/E(m)) xi,s
              = (1/R_F) ∑_s φs xi,s
              = (1/R_F) Eφ(xi)

where φ = πm/E(m), and we’ve used E(m) = 1/R_F.
bilities under expected utility then have the form

1 u0h (c1,1
∗ ) u0h (c1,2
∗ ) u0h (c1,S
∗ )
φ= ( π 1 β , π 2 β , . . . π S β ) (2.26)
E( βu0h (c1∗ )/u0h (c0∗ )) u0h (c0∗ ) u0h (c0∗ ) u0h (c0∗ )
Finally, note that the gross risk-free rate, under expected utility, obeys

R_F = 1 / E( β u^h′(c1∗)/u^h′(c0∗) )        (2.27)
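Equation (2.27) pins down the risk-free rate once preferences and the consumption distribution are specified. A Python sketch with CRRA utility, u(c) = c^(1−γ)/(1−γ) so that u′(c) = c^(−γ), and an invented two-state distribution for consumption growth:

```python
beta, gamma = 0.99, 2.0
pi_ = [0.5, 0.5]
growth = [1.04, 0.98]     # c1/c0 in each state (invented numbers)

# m_s = beta * u'(c1_s)/u'(c0) = beta * growth_s ** (-gamma) under CRRA
m = [beta * g ** (-gamma) for g in growth]
E_m = sum(p * ms for p, ms in zip(pi_, m))
R_F = 1 / E_m
print(R_F)   # roughly 1.03; higher expected growth or a lower beta raises R_F
```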

Lecture 3

Some notes on Lucas (1978), Mehra-Prescott (1985), and the Equity Premium Puzzle

In this Lecture, we move beyond two periods to consider asset prices in economies
with an infinite time horizon. Lucas’s 1978 model [Luc78] is the seminal piece
of work in this vein—almost all developments since can be thought of as adding
particular bells and whistles to Lucas’s simple framework.
Mehra and Prescott [MP85]—and the ‘equity premium puzzle’ they uncov-
ered using Lucas’s framework—provided much of the impetus for adding bells
and whistles to the model.
The longer time horizon creates a few wrinkles in our approach, compared
to the two-period framework in which we studied the basic theory.
For one, whereas in our two-period model asset payoffs and returns and the
stochastic discount factor were random variables (and prices were just values
to be determined at the initial date), in these models (and all that we’ll look at
after them), payoffs, returns, prices, and stochastic discount factors will all be
stochastic processes.
What happens to representations like p = E[mx ] = (1/R F )Eφ [ x ]? We’re
going to take it for granted—without going through all the mathematical niceties—
that these translate in a straightforward way. The stochastic discount factor
representation, for example, will become

p t = Et [ m t +1 x t +1 ] ,

where Et [ · ] denotes expectation conditional on information available at date


t. We’ll discuss how risk-neutral pricing gets modified below.
Finally, we’ll now need to take account of long-lived assets. Equity, for
example, will be an asset that pays some dividend d (usually in units of con-
sumption) at each date. The payoff at t + 1 to holding a share of equity, though,


is more than the dividend at t + 1, since—after the dividend is paid—one still
owns the equity share, which could be sold at the date-(t + 1) price: the payoff
becomes xt+1 = dt+1 + pt+1. The return on equity is then

Rt+1 = (dt+1 + pt+1)/pt,

and the stochastic discount factor representation for pt becomes

pt = Et[mt+1 (dt+1 + pt+1)].        (3.1)

Note that the last expression gives us a form of discounted present value. To
see it, update (3.1) one period (to give an expression for pt+1), then substitute
this into the right-hand side of (3.1):

pt = Et[mt+1 (dt+1 + pt+1)]
   = Et[mt+1 dt+1] + Et[mt+1 pt+1]
   = Et[mt+1 dt+1] + Et[mt+1 Et+1[mt+2 (dt+2 + pt+2)]]
   = Et[mt+1 dt+1] + Et[mt+1 mt+2 dt+2] + Et[mt+1 mt+2 pt+2]

In arriving at the last line, we used the Law of Iterated Expectations, one state-
ment of which is Et[Et+1[yt+2]] = Et[yt+2].1 In any case, you can probably see
the pattern that’s developing. Let

ρt,t+k = mt+1 mt+2 mt+3 · · · mt+k        (k = 1, 2, . . .).

Then, assuming

lim_{k→∞} Et[ρt,t+k pt+k] = 0,

we obtain

pt = ∑_{k=1}^∞ Et[ρt,t+k dt+k].        (3.2)
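The logic of (3.1) and (3.2) can be sketched numerically: put (mt+1, dt+1) on a two-state Markov chain, iterate the recursion pt = Et[mt+1(dt+1 + pt+1)] state by state until it converges, and check the answer against a truncated version of the sum (3.2). All numbers below are invented for illustration:

```python
P = [[0.9, 0.1],          # Markov transition probabilities
     [0.2, 0.8]]
d = [1.0, 0.7]            # dividend paid in each state
m = [[0.95, 0.99],        # SDF realization for each (today, tomorrow) pair
     [0.97, 0.93]]

# Iterate p_i = sum_j P_ij m_ij (d_j + p_j) to convergence -- eq. (3.1),
# written state by state.
p = [0.0, 0.0]
for _ in range(2000):
    p = [sum(P[i][j] * m[i][j] * (d[j] + p[j]) for j in range(2))
         for i in range(2)]
print(p)

# Check against a truncation of the present-value sum (3.2): push the
# "discounted mass" over states forward and accumulate E_t[rho_{t,t+k} d_{t+k}].
def pv(i, K=2000):
    w = [1.0 if j == i else 0.0 for j in range(2)]
    total = 0.0
    for _ in range(K):
        w = [sum(w[a] * P[a][b] * m[a][b] for a in range(2)) for b in range(2)]
        total += sum(w[b] * d[b] for b in range(2))
    return total
print(pv(0), pv(1))       # matches the fixed point p
```

The transversality-type condition below (3.1) shows up here as the requirement that the per-period "discount" rows of P·m have norm below one, so the iteration contracts and the tail of the sum vanishes.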

3.1 Lucas (1978) “Asset prices in an exchange economy”
In classical general equilibrium theory (see, for example, [McK02]), an exchange
economy, sometimes called an endowment economy, is one in which there is no
production—agents trade from initial endowments, given prices, and equilib-
rium prices must be such that all markets clear.
Lucas’s model is likewise one without production—there are productive
assets that yield dividends (in the form of a consumption good) each period,
and those dividends are an exogenous stochastic process. Aggregate output
1 A more general statement is: If E[x | I] denotes expectation conditional on an information set
I, then E[E[x | I′] | I] = E[x | I] for I ⊂ I′. Over time, information necessarily accrues: It ⊂ It+1.
Another way of stating it is: forecast errors are unforecastable—E[x − E[x | It+1] | It] = 0.


(equal to aggregate consumption, because the good is nonstorable) is thus ex-


ogenous. Agents (a mass of identical agents, or one representative agent) trade
claims to the assets, and equilibrium prices must be such that shares of all as-
sets are held, agents are maximizing their utility, and the economy’s resource
constraint is satisfied.

3.1.1 Some historical context


We’ve talked about asset market completeness, but not about what, in finance,
is known as market efficiency. This is distinct from economists’ standard notions
of efficiency (the Pareto criterion), and has to do with the extent to which asset
prices reflect available information. The efficient markets hypothesis (EMH) as
formulated by Fama [Fam65, Fam70] and others requires that, at a minimum,
future asset prices cannot be forecast from past prices (the ‘weak form’ of the
EMH). Stronger forms require that price changes not be forecastable using any
publicly available information. Typical time series tests—of which there were
a lot done in the 1970s—focussed on whether stock returns were predictable
or whether stock prices followed random walks. Campbell, Lo and MacKinlay
[CLM96] offer some intuition for these tests: Suppose

p t = Et [ F ∗ ],

where F ∗ is the asset’s fundamental value (which must come out of a model
of price formation). If the expectation is conditioned on all publicly available
information at t, and information accrues over time, then the expectation of
pt+1 must obey

Et [ pt+1 ] = Et [Et+1 [ F ∗ ]]
= Et [ F ∗ ]
= pt

The second line uses the Law of Iterated Expectations. Take away the expecta-
tion, and one gets

pt+1 = pt + ut+1 where Et [ut+1 ] = 0

—which is the definition of a random walk. A random walk is a particular


example of a class of stochastic processes known as martingales, hence Lucas’s
reference to this sort of process in his introductory paragraphs.2
So, one point of Lucas’s paper (in addition to the paper’s methodological
contributions) is to write down a perfectly good asset pricing model, in which
information is used efficiently, and which nonetheless may give rise to equilib-
rium prices that don’t follow random walks (or expected returns that move in
predictable ways).
2 A martingale obeys E[pt+1 | pt, pt−1, pt−2, . . .] = pt.


Around the same time as Lucas’s paper, Breeden [Bre79] formulated the
closely related Consumption-Based CAPM.3 In that model, an asset’s excess
return can be written in single-beta form as

Ri^e − R_F = βi^C [RC^e − R_F],

where RC^e is the expected return on an asset whose return equals the consump-
tion growth rate (so RC^e equals expected consumption growth), and βi^C is asset
i’s beta with respect to asset C—i.e., βi^C = cov(Ri, RC)/var(RC).
Something very similar emerges from Lucas’s model.

3.1.2 An economy with ‘trees’


There are n assets—think of them as trees—indexed by i = 1, 2, . . . n. Asset
i produces yit at date t (units of a nonstorable consumption good—i.e., divi-
dends, or ‘fruit’). yt ≡ (y1t, y2t, . . . ynt) follows a vector Markov process with a
conditional distribution F(y′, y):

F(y′, y) = Pr{yt+1 ≤ y′ | yt = y}.

Aggregate output is the sum of all the fruit produced, and this must equal
aggregate consumption, since the good is nonstorable:
∑_{i=1}^n yit = ct.

This is the economy’s resource constraint.


Agents in the economy trade shares or claims to the trees: zi shares of asset
i at date t are a claim to zi yit units of the asset’s output. There is one perfectly
divisible share of each asset in net supply, so another constraint the economy
needs to obey in equilibrium is

z = (1, 1, . . . 1) = 1.

3.1.3 The representative agent


You can think of the agents in Lucas’s economy as a unit mass of identical
agents or, more simply, one representative agent. The agent has additively
separable expected-utility preferences over consumption. He seeks to maximize

E0 [ ∑_{t=0}^∞ β^t U(ct) ]

subject to a sequence of budget constraints.


3 The paper is available here: http://www.dougbreeden.net/uploads/Breeden_1979_JFE_Consumption_CAPM_Theory.pdf.


The agent’s portfolio at the start of period t is zt = (z1t , z2t , . . . znt ), describ-
ing his holdings of shares of the n assets. This gives him resources at the start
of t equal to

∑_{i=1}^n zit (yit + pit) = zt · (yt + pt),

where pit is the ex dividend price of asset i (its price immediately after it pays its
dividend).
The agent spends his resources on consumption (ct ) and asset holdings for
next period ( pt · zt+1 ). Thus, he faces the sequence of budget constraints

z t · ( y t + p t ) ≥ c t + p t · z t +1

Note that since there are no adjustment costs or transactions costs, the agent
is indifferent between, say, holding an asset in both t and t + 1 and selling it at
the start of t just to buy it again to hold until t + 1. So, the budget constraint is
written as if the agent sells his whole portfolio at the start of the period, then
uses that value (plus his dividend payments) to finance consumption and a
new portfolio.
We must assume the agent starts with an initial portfolio z0 at the start of
t = 0.

3.1.4 Recursive formulation


Saying what it means for an agent to take prices as given in a stochastic envi-
ronment requires some clarification. The formulation Lucas uses is a recursive
one, and his competitive equilibrium concept is a recursive competitive equilib-
rium.
The agent takes as given the law of motion for the aggregate state (y) and
a set of n price functions that map realizations of the state into realizations of
prices:
p(y) = ( p1 (y), p2 (y), . . . pn (y)).
The agent’s resources at each date depend on both the aggregate state y and
the agent’s individual state (in this case, z). Given the nature of the agent’s preferences, and the constraint he faces, his problem is recursive: it obeys Bellman’s
principle of optimality. If v(z, y) is the maximized value of the agent’s lifetime
utility beginning from the state (z, y), then v obeys a Bellman equation:

v(z, y) = max_{c,x} { U (c) + βE[ v( x, y′) | y ] : z · (y + p(y)) ≥ c + p(y) · x }    (3.3)

The optimal choices of c and x (next-period’s portfolio) can be expressed as


functions of the state (z, y): c(z, y) and x (z, y) = ( x1 (z, y) . . . xn (z, y)). These
are often called the agent’s ‘decision rules’, ‘policy functions’ or ‘optimal poli-
cies’.
My definition of equilibrium differs slightly from Lucas’s, because I want to
make the agent’s choices a bit more explicit. A recursive competitive equilibrium


consists of several objects: A value function v(z, y), decision rules c(z, y) and
x (z, y), and price functions p(y) such that:
1. (Agent optimality) The decision rules solve the agent’s maximization problem—
they attain v(z, y)—given the price functions p(y) and the law of motion
for the aggregate state.

2. (Market-clearing) All shares are held in equilibrium—x (1, y) = 1 at all y—and the economy’s resource constraint is satisfied:

c(1, y) = ∑i yi .

Remark 3.1. Note that a version of Walras’s Law holds here—from the budget con-
straint, starting from z0 = 1, if all shares are held (so the portfolio x is 1), then
c = 1 · y. And, conversely, if c = 1 · y holds, and n − 1 of the assets are fully held,
then so is the nth, again assuming the agent’s z0 = 1.

3.1.5 Characterizing equilibrium


Characterizing equilibrium is simple in the Lucas environment, because equi-
librium aggregate consumption must equal the exogenous output ∑i yi . Our
procedure will be to derive the first-order conditions for agent optimality, plug
in the exogenous consumption process, and see what those conditions imply
for the equilibrium price functions p(y).
Example 3.1. As an example, imagine a static two-good exchange economy with many
identical agents. All agents have the same utility function U (c) = U (c1 , c2 ) and the
same endowment e = (e1 , e2 ). They trade at price vector p = ( p1 , p2 ). Since everyone
is identical, in equilibrium, everyone must decide that it’s optimal to simply consume
their own endowment. From this we can deduce the equilibrium price vector must
satisfy

( p1 , p2 ) = α ( ∂U/∂c1 (e), ∂U/∂c2 (e) )

for some constant α.
To see the first-order conditions that Lucas derives, simply substitute c out
of the utility function, using the budget constraint. This gives the agent’s problem as:

v(z, y) = max_x { U ( z · (y + p(y)) − p(y) · x ) + βE[ v( x, y′) | y ] }

Differentiating the objective with respect to xi yields

U′(c) pi (y) = βE[ vi ( x, y′) | y ]    (3.4)


where vi is the partial derivative of the value function with respect to the ith
asset holding, and c is optimal consumption at (z, y). Applying a standard
‘envelope’ argument to v(z, y) tells us that
vi (z, y) = U′(c)(yi + pi (y)).    (3.5)
Advancing the expression in (3.5) by one period and plugging it into (3.4) gives
U′(c) pi (y) = βE[ U′(c′)(y′i + pi (y′)) | y ]  for i = 1, 2, . . . n    (3.6)

where we use the shorthand c′ for optimal consumption at ( x, y′).
Equation (3.6)—a type of Euler equation—is a necessary condition for an
optimal choice by the agent. Now, to derive an equilibrium asset pricing for-
mula, we simply impose the resource constraint c = ∑i yi :
U′(∑i yi ) pi (y) = βE[ U′(∑i y′i )(y′i + pi (y′)) | y ]  for i = 1, 2, . . . n    (3.7)

We can rewrite (3.7) to give


pi (y) = E[ β (U′(∑i y′i )/U′(∑i yi )) (y′i + pi (y′)) | y ]  for i = 1, 2, . . . n    (3.8)

or, stacking the n equations more compactly in vector form,

p(y) = E[ β (U′(∑i y′i )/U′(∑i yi )) (y′ + p(y′)) | y ].    (3.9)
As Lucas emphasizes, (3.8) or (3.9) can be viewed as a set of n functional
equations—that is, mappings whose arguments are functions. In particular,
imagine using an arbitrary function f 0 ( · ) : Rn+ → Rn+ in place of p( · ) on
the right-hand side of (3.9). Performing the operations specified on the right-
hand side of (3.9) then defines a function from Rn+ to Rn+ , call it f 1 ( · ). Its value
at any y ∈ Rn+ is simply:
f1 (y) = E[ β (U′(∑i y′i )/U′(∑i yi )) (y′ + f0 (y′)) | y ].    (3.10)
Equation (3.10) describes a mapping that takes a function as its argument and
creates another function. We can write (3.10) more suggestively as
f1 = T ( f0 )
where T is our mapping defined on the space of functions from Rn+ to Rn+ ,
taking values in the space of functions from Rn+ to Rn+ . The equilibrium price
function is a fixed point of this mapping: equation (3.9) is just saying
p = T ( p ).
Lucas—as always—is careful about specifying the conditions under which
the mapping that I’m calling T has a fixed point.
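The fixed point can be computed by successive approximation: pick any f0 and iterate fk+1 = T(fk) until the iterates stop changing. A minimal sketch on a two-state Markov chain, in plain Python rather than the MATLAB the notes use later (the chain, log utility, and β = 0.95 are hypothetical choices for illustration):

```python
# Successive approximation on T: start from f0 = 0 and iterate f_{k+1} = T(f_k)
# until the iterates stop changing. Two-state chain, log utility, beta = 0.95;
# all of these are illustrative choices, not a calibration.
beta = 0.95
y = [0.97, 1.03]                  # output in the low and high states
P = [[0.7, 0.3], [0.3, 0.7]]      # transition probabilities
S = 2

def U_prime(c):
    return 1.0 / c                # log utility: U'(c) = 1/c

def T(f):
    # (Tf)(i) = sum_j P(i,j) * beta * (U'(y(j))/U'(y(i))) * (y(j) + f(j))
    return [sum(P[i][j] * beta * (U_prime(y[j]) / U_prime(y[i])) * (y[j] + f[j])
                for j in range(S))
            for i in range(S)]

f = [0.0] * S                     # the arbitrary starting function f0
for _ in range(2000):
    f_next = T(f)
    done = max(abs(f_next[i] - f[i]) for i in range(S)) < 1e-12
    f = f_next
    if done:
        break
```

With log utility the answer is known in closed form: the price-dividend ratio is constant at β/(1 − β) = 19 here, so the iterates converge to p(i) = 19 y(i).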


3.1.6 The SDF implied by Lucas’s model


Looking carefully at equation (3.10) or (3.9), we can see immediately what the
stochastic discount factor must be—it is
m(y, y′) = β U′(∑i y′i ) / U′(∑i yi ).    (3.11)
The stochastic discount factor applied to next-period payoffs depends on both
the current and next-period state. Note that in general the return on an asset
from the current period to next period also depends on both the current and
next-period state:
Ri (y, y′) = ( y′i + pi (y′) ) / pi (y).
Returns then must satisfy the pricing relationship

1 = E[ m(y, y′) Ri (y, y′) | y ]  for i = 1, 2, . . . n
An exception would arise if we introduced a one-period riskless bond into


Lucas’s environment (which we will do in Mehra and Prescott’s model). Its
price q and return RF depend only on the current state: q(y) = E[m(y, y′)|y],
which depends only on y,4 and RF (y) = 1/q(y).
Returning to the time series notation we began with—and using ct = ∑i yit —
we can write m in the more familiar form
mt+1 = β U′(ct+1 ) / U′(ct ).    (3.12)
Asset prices obey

pit = Et [mt+1 (yi,t+1 + pi,t+1 )] = Et [ β (U′(ct+1 )/U′(ct )) (yi,t+1 + pi,t+1 ) ]    (3.13)
And if Ri,t+1 is the gross return on some asset between t and t + 1—that is,
Ri,t+1 = (yi,t+1 + pi,t+1 )/pi,t —then

1 = Et [mt+1 Ri,t+1 ] = Et [ β (U′(ct+1 )/U′(ct )) Ri,t+1 ]    (3.14)
Note that by the Law of Iterated Expectations, we can replace the conditional
expectations in the last expression with unconditional expectations.5
Doing that, we can use the last expression—together with E[ xy] = cov( x, y) +
E[ x ] E[y]—to derive a suggestive expression for expected asset returns:

E[ Ri,t+1 ] − 1/E[mt+1 ] = − (1/E[mt+1 ]) cov( β U′(ct+1 )/U′(ct ) , Ri,t+1 )
4 The y′ gets ‘integrated out’ in taking the expectation.
5 If Et [mt+1 Ri,t+1 − 1] = 0 when we’re conditioning on information that’s accumulated through
date t, then the unconditional expectation—when we know nothing at all—is surely zero as well.


3.1.7 Taking Lucas to the computer


There are a number of ways to implement Lucas’s model computationally, but
the simplest by far is to just replace Lucas’s general Markov process F (y′, y)
with a Markov chain. A Markov chain is a set of S states ( x1 , x2 , . . . xS )—which
could simply be (1, 2, . . . S)—and a transition matrix P = [ Pij ], where

Pij = Pr{ xt+1 = x j | xt = xi }.

Since each row of P is a probability vector in RS , P ≥ 0 and ∑ j Pij = 1.


The invariant, or long-run, distribution of the Markov chain is a probability
π ∈ RS that satisfies P⊤ π = π. We’ll talk about calibrating them momentarily.
We’ll sometimes write P(i, j) for Pij and π (i ) for πi . That will help avoid
confusion in places, and it also has the advantage of being closer to MATLAB
conventions.
For now, let’s suppose there is only one tree, so y is a scalar, and there’s
only one price function p(y) to solve for. We can also solve for the price of a
one-period riskless bond, q(y), but that’s easy enough to do once we see how
to get p. Note that if aggregate output/aggregate consumption is assumed to
follow an S-state Markov chain, then:
• y and the price function p will be vectors in RS . Instead of a function p(y)
to solve for, we only need to solve for S numbers, p = ( p(1), p(2), . . . p(S)).
Here, p(i ) is the asset price when today’s state is i—i.e., when aggregate
output is y(i ).6

• If we price a one-period riskless asset, its price q and return RF are also
vectors in RS .
• The stochastic discount factor m and the return to the risky asset R are
both S × S matrices. We’ll write m(i, j) for the discount factor applied
between state i today and state j tomorrow (and similarly for R, writing
R(i, j)). In particular,
m(i, j) = β U′(y( j)) / U′(y(i ))

If P is our transition probability matrix, then the SDF version of the pricing
relationship (3.9) becomes

p(i ) = ∑ j P(i, j)m(i, j) (y( j) + p( j)) .

Let Ψ denote the S × S matrix with typical element Ψ(i, j) = P(i, j)m(i, j).7
6 In other words, we may as well identify the states with (1, 2, . . . S), rather than writing c(y(s)).
7 In MATLAB terms, Psi = P.*m.


Then, in vector form, we can write the pricing relationship compactly as


   
( p(1), p(2), . . . p(S) )⊤ = Ψ ( y(1) + p(1), y(2) + p(2), . . . y(S) + p(S) )⊤

p = Ψ (y + p)

If I − Ψ is invertible, then our solution is immediate:

p = ( I − Ψ)−1 Ψy. (3.15)
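For a two-state chain, (3.15) is just a 2 × 2 linear system. A sketch in plain Python (the states, transition matrix, and CRRA coefficient are hypothetical placeholders, not a calibration):

```python
# Solve p = (I - Psi)^{-1} Psi y for a two-state chain, where
# Psi(i,j) = P(i,j) * m(i,j) and m(i,j) = beta * U'(y(j))/U'(y(i)).
# Illustrative (hypothetical) inputs: CRRA utility with alpha = 2.
beta, alpha = 0.95, 2.0
y = [0.97, 1.03]
P = [[0.7, 0.3], [0.3, 0.7]]

# With U'(c) = c**(-alpha), U'(y(j))/U'(y(i)) = (y(i)/y(j))**alpha
Psi = [[P[i][j] * beta * (y[i] / y[j]) ** alpha for j in range(2)]
       for i in range(2)]

# Right-hand side: (Psi y)(i)
b = [sum(Psi[i][j] * y[j] for j in range(2)) for i in range(2)]

# Invert the 2x2 matrix A = I - Psi by the cofactor formula
A = [[1 - Psi[0][0], -Psi[0][1]], [-Psi[1][0], 1 - Psi[1][1]]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
p = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - A[1][0] * b[0]) / det]
```

Invertibility of I − Ψ holds here because the row sums of Ψ are below one; the resulting p is strictly positive and satisfies p = Ψ(y + p).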

There are a number of other objects of interest one could then calculate,
using the solution for p or the expression for the pricing kernel m. Some of
them bear on the questions that motivate Lucas’s paper—i.e., “Will prices fol-
low random walks?” or “Will expected returns be predictable?” and so forth.
The objects of interest one might construct include:
• The price of a riskless one period bond and its return:

q(i ) = ∑ j P(i, j)m(i, j)

RF (i ) = 1/q(i )

• Return on the risky asset, across every state transition:

R(i, j) = ( p( j) + y( j) ) / p(i )

• The expected return conditional on being in state i:

Ei [ R] = ∑ j P(i, j) R(i, j)

• The expected excess return conditional on being in i, Ei [ R] − R F (i ).


• The (unconditional) expected return on the risky asset and the expected
riskless rate:

E[ R] = ∑i π (i ) Ei [ R]

E[ RF ] = ∑i π (i ) RF (i )


• The unconditional expected excess return on the risky asset (the ‘equity
premium’), E[ R] − E[ RF ].

• Price/dividend ratios in each state, P/D (i ) = p(i )/y(i ).
• Risk neutral probabilities:

φ(i, j) = P(i, j)m(i, j) / ∑h P(i, h)m(i, h) .

• Finally, one could check for mean-reversion in the asset price (which
would be inconsistent with price following a random walk), by exam-
ining Ei [ p] − p(i ) = ∑ j P(i, j) p( j) − p(i ). Loosely, the price would be
mean reverting if 0 < (E[ p] − p(i ))(Ei [ p] − p(i )), where E[ p] is the un-
conditional mean π · p.8
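Once p is in hand, each item on the list above is one more line. A self-contained sketch, again in plain Python with the same hypothetical two-state inputs as before (illustrative numbers, not a calibration):

```python
# Solve for p on a two-state chain, then compute the objects of interest:
# bond price q, riskless return RF, risky returns R(i,j), conditional and
# unconditional expected returns, the equity premium, P/D ratios, and
# risk-neutral probabilities.
beta, alpha = 0.95, 2.0
y = [0.97, 1.03]                      # dividend/consumption by state
P = [[0.7, 0.3], [0.3, 0.7]]          # transition matrix
pi = [0.5, 0.5]                       # long-run distribution (symmetric chain)

m = [[beta * (y[i] / y[j]) ** alpha for j in range(2)] for i in range(2)]
Psi = [[P[i][j] * m[i][j] for j in range(2)] for i in range(2)]

# p = (I - Psi)^{-1} Psi y, via the 2x2 cofactor inverse
b = [Psi[i][0] * y[0] + Psi[i][1] * y[1] for i in range(2)]
A = [[1 - Psi[0][0], -Psi[0][1]], [-Psi[1][0], 1 - Psi[1][1]]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
p = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - A[1][0] * b[0]) / det]

q = [P[i][0] * m[i][0] + P[i][1] * m[i][1] for i in range(2)]    # bond price
RF = [1.0 / q[i] for i in range(2)]                              # riskless return
R = [[(y[j] + p[j]) / p[i] for j in range(2)] for i in range(2)]
ER_i = [P[i][0] * R[i][0] + P[i][1] * R[i][1] for i in range(2)] # E_i[R]
ER = pi[0] * ER_i[0] + pi[1] * ER_i[1]                           # E[R]
ERF = pi[0] * RF[0] + pi[1] * RF[1]                              # E[R^F]
premium = ER - ERF                                               # equity premium
PD = [p[i] / y[i] for i in range(2)]                             # price/dividend
phi = [[P[i][j] * m[i][j] / q[i] for j in range(2)] for i in range(2)]
```

By construction each row of φ sums to one, and with a risk-averse agent holding a claim to a procyclical dividend the unconditional premium E[R] − E[RF] comes out positive (though small at this low α).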
An exercize below will ask you to calculate all these things, and describe
some of the results. First, though, we need a little digression on approximating
autoregressions with Markov chains. We’ll focus on chains with S = 2, which
can be done practically with pencil and paper.9
Making a two-state Markov chain that mimics a first-order autoregressive
process is fairly simple. Suppose we want to mimic a process of the form

x t +1 − µ = ρ ( x t − µ ) + e t +1 (3.16)

We have estimates of the mean µ, the persistence parameter ρ, and the uncon-
ditional standard deviation of xt , call it σx .10 Let hats denote estimates, and
assume that |ρ̂| < 1.
A two-state Markov chain that mimics (3.16) will have a low-x state (xl )
and a high-x state ( xh ), with xl < µ̂ < xh . It will also have a 2 × 2 transition
probability matrix P,

P = [ Pll  Plh
      Phl  Phh ]
say. Since the rows must sum to one, though, P really has only two parame-
ters we would need to determine—for example, the probabilities of remaining
in the low and high states, Pll and Phh . That’s four parameters to determine,
( xl , xh , Pll , Phh ), with only three restrictions—the unconditional mean, uncon-
ditional standard deviation, and persistence—with which to determine them.
8 This one may actually be immediate just from the fact that p inherits the autoregressive property of y.
9 There are a couple popular methods for approximating autoregressions with Markov chains of any order. The most commonly used is due to Tauchen [Tau86]. Based on the results in a paper by Kopecky and Suen [KS10], I’ve lately switched, in my own work, to using Rouwenhorst’s method [Rou95]. If you google these, you’ll find papers with descriptions of them and probably some MATLAB code for implementing them as well. We may use Rouwenhorst’s method in a subsequent lecture.
10 σx is related to the standard deviation of et by σx² = σe²/(1 − ρ²).


We have, in a sense, one free parameter, which one may usefully think of
as the long-run probability of being in one or the other state—we could match
the unconditional mean, standard deviation and persistence, and still have a
parameter to play with that would influence the long-run distribution. Adding
the long-run distribution as another ‘target’ of our calibration will pin down
the free parameter.
Let πl denote the long-run probability of being in the low state, and πh =
1 − πl the probability of being in the high state. The symmetry inherent in the
process (3.16) suggests setting πl = πh = 1/2. Once we make that assump-
tion, as you’ll see, we have enough conditions to pin down all the parameters.
In particular, the assumption of equal long-run probabilities implies that the
transition matrix P is symmetric—and in the 2 × 2 case, that means that if we
can just pin down one entry, we can pin down all four entries.11
In any case, given equal long-run probabilities, one then shows that ( xl , xh ) =
(µ̂ − σ̂x , µ̂ + σ̂x ) satisfies
(1/2) xl + (1/2) xh = µ̂

(1/2)( xl − µ̂)² + (1/2)( xh − µ̂)² = σ̂x²
That leaves one parameter in P to be determined, say Pll , by trying to match
the persistence ρ̂. What we want to try to match is basically either

El [ x − µ̂] = ρ̂( xl − µ̂)

or
Eh [ x − µ̂] = ρ̂( xh − µ̂).
Let’s examine the former, using Plh = 1 − Pll and the definitions of xl and xh :

Pll ( xl − µ̂) + (1 − Pll )( xh − µ̂) = ρ̂( xl − µ̂)


Pll (−σ̂x ) + (1 − Pll )(σ̂x ) = ρ̂(−σ̂x )

−Pll + 1 − Pll = −ρ̂

implying

Pll = (1 + ρ̂)/2
11 π satisfies P⊤ π = π, while P must by definition obey P1 = 1. This implies, in the 2 × 2 case, a simple relation between π and the off-diagonal elements of P:

πl /πh = Phl /Plh .

This can be combined with π · 1 = 1 to give an expression for π in terms of Plh and Phl .


From that, we can fill in


Plh = 1 − Pll = (1 − ρ̂)/2

Phl = Plh = (1 − ρ̂)/2

Phh = 1 − Phl = (1 + ρ̂)/2
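The whole recipe fits in a few lines. A plain-Python sketch (the function name is mine; the inputs shown are the Mehra-Prescott-style numbers used in the next section):

```python
# Two-state chain matching the unconditional mean, standard deviation and
# persistence of an AR(1), with long-run distribution (1/2, 1/2).
def two_state_chain(mu, sigma_x, rho):
    x = [mu - sigma_x, mu + sigma_x]          # (x_l, x_h)
    stay = (1.0 + rho) / 2.0                  # P_ll = P_hh = (1 + rho)/2
    return x, [[stay, 1.0 - stay], [1.0 - stay, stay]]

# Mehra-Prescott-style inputs: mean growth 1.018, std 0.036, autocorr -0.14
x, P = two_state_chain(mu=1.018, sigma_x=0.036, rho=-0.14)

# The persistence condition E_l[x] = mu + rho (x_l - mu) holds by construction
cond_mean_low = P[0][0] * x[0] + P[0][1] * x[1]
```

With these inputs the chain is exactly the one Mehra and Prescott use below: x = (0.982, 1.054), with stay probabilities of 0.43.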
All that said, we’re now ready for the promised computational exercize
related to Lucas’s model. We’ll break it into two parts. The first is about setting
up the Markov chain:

Exercize 3.1. Using some U.S. data on annual log consumption of nondurables and services (detrended using a Hodrick-Prescott filter), from 1950 to
2006, I estimated the following AR(1)

log(ct+1 ) = ρ log(ct ) + et+1

I didn’t include a constant because the detrended series has mean zero. The estimates were ρ̂ = 0.63775 and σ̂e = 0.0093346, so an estimate of σlog(c) would be
0.0093346/√(1 − 0.63775²).
Either in MATLAB or by hand (with MATLAB preferred), construct a 2-state
Markov chain to approximate the process for log(ct ), assuming the long-run dis-
tribution is (1/2, 1/2). Note this process has µlog(c) = 0.
Now, exponentiate your vector of log(c) values to get states in terms of c. That
is, if logc is the 2 × 1 vector of values the chain can take on, set c = exp(logc).
Lastly, normalize c so it has a long-run mean of 1—if pi = [0.5, 0.5] is your
vector of long-run probabilities, set c = c/(pi ∗ c).

Now that you have a Markov chain, you are ready for the main exercize
itself:

Exercize 3.2. Using the Markov chain you constructed in the last problem, write
a MATLAB program to solve for the vector of asset prices p, and all the ‘objects of
interest’ we listed after we derived equation (3.15). Assume that

U (c) = c1−α / (1 − α)

and write a program that calculates the results for given values of α and β.


Report results for α = 5 and β = 0.95. Discuss the extent to which the
price/dividend ratio is useful for forecasting the asset return—i.e., what’s the rela-
tionship between whether the P/D ratio is low or high and whether conditionally
expected returns are low or high? Is this true for expected excess returns?
Now, jack up α to 20 and β = 0.99. Describe the differences between the risk
neutral probabilities and the actual probabilities P in this case. Were they that
different under the original parameters? How does the equity premium compare to
the (α, β) = (5, 0.95) case? How do you interpret the returns (realized returns
by state, not expected returns)?

3.2 Mehra and Prescott (1985) and the Equity Premium Puzzle
If you understood Lucas’s model, Mehra-Prescott should be very simple, since
they modify Lucas’s model in just a couple ways. The meat of the paper is the
calibration, and the findings they report in their Figure 4 on page 155.
This paper spawned a very large literature, one that’s still very much active
today. The paper makes the simple point that the representative agent model
with constant relative risk aversion/constant elasticity of intertemporal sub-
stitution preferences—which is to say, the workhorse model of business cycle
theory at the time, and to some extent even today—is inconsistent with the size
of the historical risk premium stocks have commanded relative to (practically)
risk-free bonds. There are some assumptions there, of course—for example,
that aggregate consumption is a good proxy for the aggregate dividend from
holding equity.
The essence of the result is that, historically, aggregate consumption is just
‘too smooth’—when filtered through power utility, it yields an intertemporal
marginal rate of substitution that doesn’t have much volatility, consequently
little covariance with the aggregate equity return. Little covariance in turn
means a small excess return. Making the agent more risk averse—increasing
the curvature of the marginal utility of consumption—can help get bigger ex-
cess returns, but only at the cost of lowering the expected IMRS—which then
raises the model-implied riskless rate well above the historically low values we
observe.
Like Lucas’s, it’s an exchange economy, but this helps their case—they give
the model the best chance to succeed, because they don’t even require that it
generate, endogenously, a realistic consumption process. They plug in a realistic
consumption process, and the model’s price predictions fail.


3.2.1 Differences relative to Lucas’s model


There are only a few noteworthy differences in Mehra and Prescott’s model, as
it compares to Lucas’s framework—only one of them of great importance.
1. Mehra and Prescott employ a Markov chain model from the very outset.
This is a possibility implicit in Lucas’s framework, but since Mehra and
Prescott’s interest is quantitative, they make it explicit from the begin-
ning.
2. They introduce a one-period riskless bond. This—as we know from the
last section—is not a difficult addition. In terms of equilibrium, the as-
sumption is that the bond is in zero net supply—prices must be such that
in equilibrium, a representative agent holds zero units of the bond.
3. The one really substantive change is the modification they make to the
consumption process. Rather than a Markov chain for the level of ag-
gregate consumption, they assume that aggregate consumption evolves
according to
y t +1 = x t +1 y t
where the growth rate xt+1 follows a Markov chain. This implies that the
log of consumption has a unit root.
The third modification has some implications for the structure of equilib-
rium prices and the stochastic discount factor. With the power utility function
they use, the SDF depends only on the growth rate of consumption:

m t +1 = β ( x t +1 ) − α .
This means that, in writing down the pricing relationships (or putting them
on the computer), m only depends on next period’s state, not also on today’s.
Its distribution, and conditional expected value, still depend on today’s state
through the transition matrix P.12
As for the equilibrium equity price, it depends on both today’s y and to-
day’s x, but it is homogeneous of degree one (in fact linear) in y. This follows
from the present value relationship (3.2) we derived way back at the outset of
the lecture:

pt = ∑∞k=1 Et [ρt,t+k yt+k ] .
Since each yt+k is proportional to yt and all the ρt,t+k terms are products of
mt+h ’s, pt must be proportional to yt . Thus, if p(y, x ) denotes the equilibrium
price function, then p(y, x ) = yw( x ) for some function w.
The equilibrium pricing relationship, in terms of the SDF, and not yet spe-
cializing to a Markov chain, is
p(y, x ) = E[ m( x′) ( x′ y + p( x′ y, x′) ) | x ]
12 I’m sticking with the notation we’ve developed. Mehra and Prescott use φij for elements of the transition matrix.

56
3.2. MEHRA-PRESCOTT (1985) LECTURE 3. LUCAS, MEHRA-PRESCOTT

or—dividing out today’s y—


w( x ) = E[ m( x′) ( x′ + x′ w( x′) ) | x ].    (3.17)
The price of the riskless one-period bond is, as usual,
q( x ) = E[ m( x′) | x ].    (3.18)
And that’s really it—(3.17) and (3.18) are the guts of the formal model. We
specify a Markov chain for x—making x, w, m and q into vectors, and giving
us a P to calculate the conditional expectations—and we’re in a position where
we know how to solve that model very simply.

3.2.2 Mehra and Prescott’s calibration


Mehra and Prescott construct a two state Markov chain to mimic an estimated
AR(1) for annual per capita consumption growth, in particular a long-run
mean of 1.018 (1.8% growth rate), standard deviation of 0.036, and autocor-
relation coefficient −0.14. They assume the long-run distribution of the two
states is π = (1/2, 1/2). Using the techniques described above, this then gives
   
x = ( 1.018 − 0.036, 1.018 + 0.036 ) = ( 0.982, 1.054 )
P11 = P22 = (1 − 0.14)/2 = 0.43,
and
P12 = P21 = (1 + 0.14)/2 = 0.57.
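On the chain, (3.17) is again linear: writing A(i, j) = P(i, j) β x(j)^{1−α} (power utility, so m(x′)x′ = β x′^{1−α}), it becomes w = A1 + Aw, i.e. w = (I − A)^{−1} A1, and (3.18) gives q directly. A plain-Python sketch (the taste parameters β = 0.95, α = 5 are placeholders, not Mehra and Prescott's preferred values):

```python
# Solve w = (I - A)^{-1} A 1 with A(i,j) = P(i,j) * beta * x(j)**(1 - alpha),
# then price the riskless bond: q(i) = sum_j P(i,j) * beta * x(j)**(-alpha).
beta, alpha = 0.95, 5.0                 # placeholder taste parameters
x = [0.982, 1.054]                      # Mehra-Prescott growth states
P = [[0.43, 0.57], [0.57, 0.43]]        # Mehra-Prescott transition matrix

A = [[P[i][j] * beta * x[j] ** (1.0 - alpha) for j in range(2)] for i in range(2)]
b = [A[i][0] + A[i][1] for i in range(2)]           # A * 1

# 2x2 cofactor inverse of M = I - A
M = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
w = [(M[1][1] * b[0] - M[0][1] * b[1]) / det,
     (M[0][0] * b[1] - M[1][0] * b[0]) / det]

q = [sum(P[i][j] * beta * x[j] ** (-alpha) for j in range(2)) for i in range(2)]
RF = [1.0 / q[i] for i in range(2)]     # riskless return by state
```

Equity returns then follow as R(i, j) = x(j)(1 + w(j))/w(i), and averaging against π = (1/2, 1/2) gives the unconditional moments exercize 3.3 asks for.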

3.2.3 The target and the results


That’s the process they feed in, at various combinations of the taste parameters
α and β. What do they hope to get out of the model? The measure of success
(or failure) is how close the model’s average equity return, average risk-free
return, and average equity premium—in our notation from above, E[ R], E[ RF ],
and E[ R] − E[ RF ]—come to the historical averages they document:

E[ R] = 1.07

E[ RF ] = 1.008

E[ R] − E[ RF ] = 0.062

—i.e., 7%, 0.8%, and 6.2 percentage points.


Mehra and Prescott allow the risk aversion parameter α to vary between
zero and ten, and the discount factor β between zero and one.13 The results
13 They give some arguments for not taking α above ten. In a subsequent lecture, we’ll talk about disentangling risk aversion from intertemporal substitution, and what constitutes a ‘plausible’ degree of risk aversion.


are summarized in their Figure 4—getting the equity premium as high as 0.35
(which is presumably at α = 10) entails pushing the riskless rate up to 4%.
What’s going on? Consider the riskless rate first. Increasing α lowers the
elasticity of intertemporal substitution (which is just 1/α). In an economy in
which consumption, on average, is growing, a lower elasticity of intertemporal
substitution will increase the compensation agents require for deferring con-
sumption from today to tomorrow—in a deterministic economy, that would
mean a higher interest rate.14 There’s more at work with regard to R F than just
that deterministic logic, but that mechanism appears to be dominant.15
With regard to the equity premium, given the low volatility of consump-
tion, the stochastic discount factor has a small variance, unless we set α very
high. The volatility of the discount factor puts a bound on how big a reward
the economy can pay an investor for holding a risky asset. The following dis-
cussion draws on chapter 5 of Cochrane’s book [Coc01]. To see the nature of
the bound, consider

Et [ mt+1 ( Rt+1 − RtF ) ] = 0.

By the Law of Iterated Expectations, this must hold unconditionally as well:


E[ mt+1 ( Rt+1 − RtF ) ] = 0.

Using E[ xy] = cov( x, y) + E[ x ] E[y], re-write this as


cov(mt+1 , Rt+1 − RtF ) + E[mt+1 ] E[ Rt+1 − RtF ] = 0

or

E[ Rt+1 − RtF ] = − cov(mt+1 , Rt+1 − RtF ) / E[mt+1 ].
Now, cov(mt+1 , Rt+1 − RtF ) = corr(mt+1 , Rt+1 − RtF )σ(mt+1 )σ ( Rt+1 − RtF ), where
corr(mt+1 , Rt+1 − RtF ) ∈ [−1, 1] is the correlation between mt+1 and the excess
return Rt+1 − RtF , and the σ ( · )’s are standard deviations. Thus,

E[ Rt+1 − RtF ] / σ( Rt+1 − RtF ) = − corr(mt+1 , Rt+1 − RtF ) · σ(mt+1 ) / E[mt+1 ].

The quantity on the left-hand side of the last expression is called a Sharpe
ratio, and it measures an asset or portfolio’s excess return per unit of volatility.
It is, in some sense, a market measure of the price of risk. For the U.S. equity
market as a whole, the mean in the numerator is around 0.062 and the stan-
dard deviation in the denominator is around 0.166 (from Mehra and Prescott’s
14 Recall that in a deterministic economy, g ≈ EIS(r − η ),where g is the growth rate, r is the

interest rate, and η = 1/β − 1 is the rate of time preference. Flipping this around gives, r ≈
η + (1/EIS) g. If g > 0, a lower EIS raises r.
15 In a subsequent lecture, we’ll see if preferences that separate risk aversion from intertemporal

substitution can help with this.


calculations, reported in their Table 1). This implies a Sharpe ratio of around
0.37.
It’s straightforward to show—since correlation is between −1 and 1—that
the last expression implies

| E[ Rt+1 − RtF ] | / σ( Rt+1 − RtF ) ≤ σ(mt+1 ) / E[mt+1 ].    (3.19)

If m prices all assets, then the absolute value of the Sharpe ratio for any asset
is bounded by the magnitude on the right, which is a measure of the volatility
of the stochastic discount factor relative to its mean. For a given mean, the less
volatile a model’s SDF, the smaller the maximum Sharpe ratio that the model
can generate.
When mt+1 = β( xt+1 )−α , we can say more, especially if we assume xt+1 is
lognormally distributed—i.e., log( xt+1 ) ∼ N (µ, σ2 ), so log [( xt+1 )−α ] = −α log( xt+1 ) ∼
N (−αµ, α2 σ2 ). A property of lognormally distributed random variables is that,
for log(y) ∼ N (µy , σy2 ),
σ(y) = E[y] √( exp(σy² ) − 1 )

Thus,
σ(mt+1 ) / E[mt+1 ] = √( exp(α² σ² ) − 1 ).

Using the approximation √( exp(α² σ² ) − 1 ) ≈ ασ, we can express the bound
in (3.19) as approximately:

| E[ Rt+1 − RtF ] | / σ( Rt+1 − RtF ) ≤ ασ.

Under power utility—and assuming lognormality is a good approximation—generating a significant premium for bearing aggregate asset return risk requires either a lot of risk (a big σ) or a lot of aversion to risk (a big α).
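Putting numbers on the bound: the sketch below evaluates √(exp(α²σ²) − 1) and the approximation ασ at σ = 0.036, roughly the Mehra-Prescott consumption volatility. The α grid is arbitrary, and the two-state chain is not literally lognormal, so treat these as ballpark figures.

```python
# Evaluate the lognormal bound sigma(m)/E[m] = sqrt(exp(alpha^2 sigma^2) - 1)
# and the approximation alpha * sigma, for a few risk-aversion coefficients.
import math

sigma = 0.036                      # roughly Mehra-Prescott consumption volatility
bounds = {}
for alpha in (2.0, 10.0, 18.0):    # arbitrary illustrative grid
    exact = math.sqrt(math.exp(alpha ** 2 * sigma ** 2) - 1.0)
    bounds[alpha] = (exact, alpha * sigma)
```

At α = 10 the bound is about 0.372, essentially the historical Sharpe ratio of 0.37 quoted above; at α = 2 it is only about 0.07.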

Exercize 3.3. Write a MATLAB program to replicate Mehra and Prescott’s exercize. Report results for nine parameter combinations—eight given by pairs of β ∈
{0.95, 0.99} and α ∈ {1, 5, 10, 20}, plus one with (β, α) = (1.125, 18). For each
combination, report 100(E[ R] − 1), 100(E[ RF ] − 1), and 100(E[ R] − E[ RF ])
(all long-run, unconditional expectations), and σ (m)/E[m], the ratio of the unconditional standard deviation of the SDF to the unconditional mean of the SDF.


3.3 Additional dimensions of the equity premium puzzle: Second moment aspects
In solving exercize 3.3, you hopefully discovered that setting α = 18 and β =
1.125 gets you pretty close to the first moments of asset returns that Mehra and
Prescott set as the target of their exercize—an average risk-free rate just under
1% and an average equity return of around 7%. With α = 18, and the Mehra-
Prescott Markov chain for consumption, the maximum allowable Sharpe ratio
in the model—given by the bound

E[ R − RF ] / σ[ R − RF ] ≤ σ(m)/E[m]    (3.20)

that we derived in section 3.2.3—is actually higher than the average Sharpe
ratio we observe in the data (0.56 versus about 0.4).
Are we therefore done with the equity premium puzzle—should we declare
success and move on to a different topic? Even if we set aside the plausibility
of α so large and a β > 1—which says that, in a deterministic environment,
starting from a constant consumption path, and agent would be willing to save
at a significantly negative real interest rate—we probably shouldn’t declare
victory just yet.16
As it turns out, the second moment implications of our model with (α, β) =
(18, 1.125) are off—our model has an unconditional standard deviation of the
risk-free rate which is too high and an unconditional standard deviation of the
equity return which is too low.
Mehra and Prescott estimate from their historical data that σ ( R F ) = 0.056
and σ ( R) = 0.165—i.e., 5.6 percentage points for the risk-free rate, and 16.5
percentage points for the equity return. For the model of exercize 3.3, with
(α, β) = (18, 1.125), we get

σ ( R F ) = 0.079
σ ( R) = 0.139

These numbers may seem close to the historical data, but we should first
bear in mind that the volatility of the risk-free rate in Mehra and Prescott’s 100-
year sample is quite high compared to estimates over more recent samples—in
fact, Campbell and Cochrane [CC99] (discussed more below) look at the more
recent data, and set a constant riskless rate as one of the targets of their model.
Even taking the Mehra-Prescott volatility data at face value, though, the
distribution of our model’s volatility across the low- and high-growth states is
such that the model produces only negligible variation in the conditional ex-
cess return to equity and its volatility, hence little variation in the conditional
16 You may wonder whether β > 1 poses a problem for existence. If consumption has a long-run growth rate of x, the relevant sufficient condition for existence of optimal paths is βx1−α < 1, as proved by Brock and Gale in 1969 [BG69] and later re-stated by Kocherlakota in 1990 [Koc90].


Sharpe ratio. This is in contrast to the data, which show large swings in condi-
tional Sharpe ratios from business cycle peaks (where the conditional Sharpe
ratio is low) to business cycle troughs (where it is high). A difference on the
order of 1.0 between highs and lows is not uncommon.17
Our model with (α, β) = (18, 1.125) produces:
\[
\begin{array}{lccc}
 & E[R - R^F \,|\, x] & \sigma(R - R^F \,|\, x) & E[R - R^F \,|\, x]/\sigma(R - R^F \,|\, x) \\
\text{Low } x \text{ state} & 0.069 & 0.114 & 0.605 \\
\text{High } x \text{ state} & 0.051 & 0.099 & 0.516
\end{array}
\]
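The entries in a table like this follow mechanically from the state-contingent returns. A sketch in Python (the notes use MATLAB; the function name and the illustrative inputs in the usage note are ours, not values from the model):

```python
import math

def conditional_sharpe(R, RF, P):
    """Conditional mean, standard deviation, and Sharpe ratio of the
    excess equity return in each current state i, for a two-state chain.

    R[i][j] -- equity return on a transition from state i to state j
    RF[i]   -- risk-free rate, known at the start of state i
    P[i][j] -- transition probabilities
    """
    rows = []
    for i in range(2):
        mean_R = sum(P[i][j] * R[i][j] for j in range(2))
        mean_ex = mean_R - RF[i]                 # E[R - R^F | x_i]
        # RF[i] is known in state i, so it adds nothing to the variance
        sd_ex = math.sqrt(sum(P[i][j] * (R[i][j] - mean_R) ** 2
                              for j in range(2)))
        rows.append((mean_ex, sd_ex, mean_ex / sd_ex))
    return rows
```

Feeding in the equity return matrix and risk-free rates from a solved model (as in exercize 3.3) reproduces the three columns of the table, state by state.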

Are these discrepancies between model and data all part of the same puz-
zle? One could think of three, or perhaps even four separate puzzles we might
give names to—an equity premium puzzle (it’s hard to get a big equity pre-
mium), a risk-free rate puzzle (it’s hard to get a low risk-free rate), a volatility
puzzle (it’s hard to get a big standard deviation of equity returns and a low
standard deviation for the risk-free rate), and a Sharpe ratio puzzle (it’s hard to
get a strongly countercyclical Sharpe ratio). Giving them different names may
help to clarify the phenomena we’re trying to explain, but in some sense they
are all part of one puzzle, since we would not want to say we’ve solved one, if
in doing so we’re failing along the other dimensions.

3.4 The Melino and Yang insight: Getting SDFs consistent with first and second moments of returns
Time permitting, we will talk about Melino and Yang’s [MY03] proposed resolution of the equity premium and related puzzles, which relies on state-
dependent preferences. Of interest to us here, though, is a trick they use to
characterize the process for returns that would be consistent with the con-
sumption process in the Mehra-Prescott model and consistent with Mehra and
Prescott’s estimates for the unconditional means and standard deviations of
returns.
To see the value of that, imagine that you had risk-free rates for the low- and high-growth states
\[
\hat{R}^F = \begin{pmatrix} \hat{R}^F(1) \\ \hat{R}^F(2) \end{pmatrix}
\]
and a matrix of equity returns for all state transitions
\[
\hat{R} = \begin{pmatrix} \hat{R}(1,1) & \hat{R}(1,2) \\ \hat{R}(2,1) & \hat{R}(2,2) \end{pmatrix}
\]
17 See, for example, Ludvigson and Ng [LN07], Lettau and Ludvigson [LL10] or Tang and

Whitelaw [TW11].


that, under the Mehra-Prescott transition matrix P for consumption growth,


and its long-run distribution π = (1/2, 1/2), had unconditional first and sec-
ond moments that matched Mehra and Prescott’s historical estimates. Assum-
ing these didn’t imply a collinearity, you could then solve
2
∑ P(i, j)m(i, j) R̂(i, j) = 1 (i = 1, 2)
j =1
2
∑ P(i, j)m(i, j) = R̂F (i) (i = 1, 2)
j =1

for the stochastic discount factor m that would exactly match those first two
moments of returns in the historical data.
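Each current state i gives an independent 2-by-2 linear system, so the solution can be written in closed form. A Python sketch (the notes work in MATLAB; `solve_sdf` is a name introduced here):

```python
def solve_sdf(P, R, RF):
    """Solve, state by state, the pair of equations

        sum_j P(i,j) m(i,j) R(i,j) = 1
        sum_j P(i,j) m(i,j)        = 1 / RF(i)

    for the state-contingent SDF m(i,j)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        # Let a = P(i,1)m(i,1) and b = P(i,2)m(i,2); then
        #   a R(i,1) + b R(i,2) = 1  and  a + b = 1/RF(i).
        # Non-collinear returns guarantee R(i,2) != R(i,1) below.
        b = (1.0 - R[i][0] / RF[i]) / (R[i][1] - R[i][0])
        a = 1.0 / RF[i] - b
        m[i][0], m[i][1] = a / P[i][0], b / P[i][1]
    return m
```

By construction the returned m prices both assets exactly in each current state.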
Knowing what such a stochastic discount factor m looks like—or what the
implied risk-neutral probabilities look like—would be of great help in identi-
fying models that can (or cannot) match the data. Given the behavior of m, the
question would be—What sort of model of preferences would map the con-
sumption growth process into that stochastic discount factor?18
How do they construct their R̂^F and R̂? They assume that consumption growth is a sufficient statistic for the risk-free rate and the equity ‘price-dividend’ ratio—which is the w in the formula (3.17). This means (as we’ve seen in our solution) that R^F takes on two values and w takes on two values. They then calculate values of R̂^F = (R̂^F(1), R̂^F(2)) and ŵ = (ŵ(1), ŵ(2)) that produce unconditional means and standard deviations for returns that match Mehra and Prescott’s historical estimates. The equity return from state i to state j is related to w by R̂(i, j) = x(j)(1 + ŵ(j))/ŵ(i).
The risk-free rate is the easier of the two. R̂^F solves:
\[
\tfrac{1}{2}\hat{R}^F(1) + \tfrac{1}{2}\hat{R}^F(2) = 1.008
\]
\[
\sqrt{\tfrac{1}{2}\big(\hat{R}^F(1) - 1.008\big)^2 + \tfrac{1}{2}\big(\hat{R}^F(2) - 1.008\big)^2} = 0.056
\]
This is a quadratic equation, which has two solutions:
\[
\hat{R}^F_1 = \begin{pmatrix} 1.064 \\ 0.952 \end{pmatrix}
\quad \text{and} \quad
\hat{R}^F_2 = \begin{pmatrix} 0.952 \\ 1.064 \end{pmatrix}
\tag{3.21}
\]

Assuming that state 1 is the low-growth state, R̂1F implies a countercyclical risk-
free rate, R̂2F a procyclical one. At this point, we can’t say which solution is the
relevant one, though we will be able to do so after finding the candidate R̂’s.
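With long-run weights (1/2, 1/2), a two-point distribution with mean mu and standard deviation sigma must place its points at mu + sigma and mu − sigma, which is where the two labelings in (3.21) come from. A minimal Python check (the helper name is ours):

```python
def two_state_values(mu, sigma):
    """Under long-run weights (1/2, 1/2), matching mean mu and standard
    deviation sigma forces the two values to be mu + sigma and mu - sigma;
    the two orderings are the two roots of the quadratic in the text."""
    return [mu + sigma, mu - sigma], [mu - sigma, mu + sigma]

# The two candidate risk-free rate vectors of (3.21)
RF_1, RF_2 = two_state_values(1.008, 0.056)
```

To see why: the mean condition pins down the midpoint, and with equal weights the variance condition forces both deviations to equal sigma in absolute value.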
Finding the candidate R̂’s is considerably harder—it’s a much more complex quadratic equation to solve. Using
\[
\hat{R}(i,j) = \frac{x(j)\,(1 + \hat{w}(j))}{\hat{w}(i)}
\]
18 In fact, as we’ll see, Melino and Yang find that preferences must display counter-cyclical risk

aversion—this motivates their focus on state-dependent preferences.


and the Mehra-Prescott values for the Markov chain (x, P),
\[
x = \begin{pmatrix} 0.982 \\ 1.054 \end{pmatrix}
\tag{3.22}
\]
and
\[
P = \begin{pmatrix} 0.43 & 0.57 \\ 0.57 & 0.43 \end{pmatrix}
\tag{3.23}
\]
Melino and Yang find the values of ŵ that satisfy
\[
\frac{1}{2}\sum_{j=1}^{2} P(1,j)\hat{R}(1,j) + \frac{1}{2}\sum_{j=1}^{2} P(2,j)\hat{R}(2,j) = 1.07
\]
\[
\sqrt{\frac{1}{2}\sum_{j=1}^{2} P(1,j)\big(\hat{R}(1,j) - 1.07\big)^2 + \frac{1}{2}\sum_{j=1}^{2} P(2,j)\big(\hat{R}(2,j) - 1.07\big)^2} = 0.165
\]

As with R̂^F, they find two solutions for ŵ:
\[
\hat{w}_1 = \begin{pmatrix} 23.467 \\ 27.839 \end{pmatrix}
\quad \text{and} \quad
\hat{w}_2 = \begin{pmatrix} 27.839 \\ 23.467 \end{pmatrix}
\]
which implies two possibilities for R̂:
\[
\hat{R}_1 = \begin{pmatrix} 1.02385 & 1.29528 \\ 0.86306 & 1.09186 \end{pmatrix}
\quad \text{and} \quad
\hat{R}_2 = \begin{pmatrix} 1.01727 & 0.92633 \\ 1.20680 & 1.09891 \end{pmatrix}
\tag{3.24}
\]
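A candidate ŵ can be checked by mapping it into returns and computing the implied unconditional moments. A Python sketch (helper names are ours; the notes use MATLAB):

```python
import math

x = [0.982, 1.054]                      # (3.22)
P = [[0.43, 0.57], [0.57, 0.43]]        # (3.23)

def returns_from_w(w):
    """Equity returns implied by a candidate price-dividend vector w,
    via R(i,j) = x(j)(1 + w(j)) / w(i)."""
    return [[x[j] * (1 + w[j]) / w[i] for j in range(2)] for i in range(2)]

def uncond_moments(R):
    """Unconditional mean and standard deviation of the equity return,
    weighting the two current states by the long-run distribution (1/2, 1/2)."""
    mean = sum(0.5 * P[i][j] * R[i][j] for i in range(2) for j in range(2))
    var = sum(0.5 * P[i][j] * (R[i][j] - mean) ** 2
              for i in range(2) for j in range(2))
    return mean, math.sqrt(var)
```

Running `uncond_moments(returns_from_w([23.467, 27.839]))` reproduces the targets 1.07 and 0.165 up to rounding in the reported ŵ.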

This gives four possible combinations of R̂^F and R̂, but—as Melino and Yang note—three of the four can be ruled out on the grounds of violating no arbitrage. The combination that remains after eliminating the three that allow arbitrage is:
\[
\hat{R}^F = \begin{pmatrix} 1.064 \\ 0.952 \end{pmatrix}
\tag{3.25}
\]
\[
\hat{R} = \begin{pmatrix} 1.02385 & 1.29528 \\ 0.86306 & 1.09186 \end{pmatrix}
\tag{3.26}
\]

Exercize 3.4. Show that the other three combinations from (3.21) and (3.24) imply arbitrage opportunities. You can do it in MATLAB if you like. Since these are returns, treat the price vector as p = (1, 1). The wrinkle here, compared to Lecture 2, is the two possible states today. For each combination of R̂^F_i and R̂_j, you’ll want to ask—Is there an arbitrage opportunity if today’s state is state 1? Is there an arbitrage opportunity if today’s state is state 2? ‘Yes’ to either of those questions is enough to constitute an arbitrage opportunity—an arbitrage need not be available in both states. Each of those questions is answered using the definition of arbitrage 2.3 from Lecture 2.


R̂ and R̂^F are all you need to back out implied risk-neutral probabilities. Routledge and Zin [RZ10] perform this calculation, and get
\[
\hat{\psi} = \begin{pmatrix} 0.85 & 0.15 \\ 0.61 & 0.39 \end{pmatrix}
\tag{3.27}
\]

Compare this with the transition matrix P (3.23). If today’s state is the high-
growth state, the risk-neutral probabilities (the second row of ψ̂) are not that
different from the objective probabilities (the second row of P), indicating lit-
tle risk aversion. By contrast, if today’s state is the low-growth state, the risk-
neutral probabilities put a much bigger weight on remaining in the low-growth
state (and a much smaller weight on moving to the high-growth state), as com-
pared to P. That indicates significant risk aversion. In other words—the struc-
ture of the asset returns R̂ and R̂ F implies countercyclical risk aversion.
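The calculation behind (3.27) can be reproduced in a few lines: solve each row's 2-by-2 system for the probability-weighted SDF terms, then rescale by the risk-free rate. A Python sketch, using the returns (3.25) and (3.26):

```python
P  = [[0.43, 0.57], [0.57, 0.43]]              # (3.23)
R  = [[1.02385, 1.29528], [0.86306, 1.09186]]  # (3.26)
RF = [1.064, 0.952]                            # (3.25)

psi = []
for i in range(2):
    # With a = P(i,1)m(i,1) and b = P(i,2)m(i,2):
    #   a R(i,1) + b R(i,2) = 1  and  a + b = 1/RF(i);
    # the risk-neutral probabilities are then RF(i) * (a, b).
    b = (1.0 - R[i][0] / RF[i]) / (R[i][1] - R[i][0])
    a = 1.0 / RF[i] - b
    psi.append([RF[i] * a, RF[i] * b])
```

Each row of `psi` sums to one by construction, since RF(i)(a + b) = 1, and rounding to two decimals recovers the ψ̂ reported in (3.27).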
This is consistent with observations on the countercyclicality of the con-
ditional Sharpe ratio (and in fact is a sort of confirmation of it). The sort of
story one could tell goes like this—in recessions, risk aversion is high, and
consequently the price of risk (measured as the compensation in excess return
required to bear risk) is high.
It also tells you why the unadorned Mehra-Prescott model we solved in
exercize 3.3 fails to generate a strongly countercyclical Sharpe ratio. The SDF
for that model depends only on the consumption growth rate realized next
period, and is independent of this period’s state. In other words—it’s a vector,
not a matrix. The next exercize asks you to calculate the SDF consistent with
Melino and Yang’s R̂ and R̂^F, from (3.25) and (3.26). It’s exactly identified, and you’ll see
it has two non-collinear rows.

Exercize 3.5. Using the Melino-Yang returns (3.25) and (3.26) and the Mehra-Prescott
Markov chain transition matrix (3.23), calculate conditional Sharpe ratios in the
two states, and solve for the stochastic discount factor m̂ consistent with the re-
turns and the transition matrix P.

As we go through the various responses to the equity premium puzzle,


we’ll want to keep the Melino-Yang characterization in mind. The first set of
responses we’ll look at are models with habit formation.

Lecture 4

Responses to the Equity Premium Puzzle, I: Modifying the Representative Agent

Over the next two lectures, we’ll look at various responses to the equity pre-
mium puzzle. Given time constraints, this won’t be an exhaustive catalog of
the models spawned by Mehra and Prescott’s observation, though we’ll try
to hit the ones that are currently the most significant (in terms of both their
success and the amount of work currently being done on them).
I’ve grouped the responses into two broad categories. In this lecture, we’ll
look at responses that modify the representative agent’s preferences. In the
next, we’ll look at models that tinker in some way with the consumption pro-
cess.
We’ll begin with models that incorporate habit formation in the agent’s
preferences.

4.1 Models with habit formation


Models with habits were among the earliest responses to Mehra and Prescott’s
puzzle. Constantinides [Con90] is an early example. See the review of the
literature (as of 1999) in Campbell and Cochrane [CC99] for more examples.
The idea is quite simple. Replace the agent’s preference over consumption streams—
\[
E_0 \left[ \sum_t \beta^t u(c_t) \right]
\]
—with
\[
E_0 \left[ \sum_t \beta^t u(c_t - h_t) \right]
\]


where h_t, the habit stock, is a function of past consumption, and is predetermined at date t. For a given curvature of u(·), the presence of the habit stock will translate fluctuations in c_t of a given size into larger fluctuations in the marginal utility of consumption. We’ll see momentarily that large fluctuations in the intertemporal marginal rate of substitution are possible as a result.
As noted, the habit stock is assumed to be a function of past consumption.
For example, h might take the form

\[
h_t = D(L)\, c_{t-1}
\tag{4.1}
\]

where D ( L) is a polynomial in the lag operator. For the most part, we’ll work
with a simple version of (4.1), in which only last period’s consumption matters:

\[
h_t = \delta c_{t-1}
\tag{4.2}
\]

where δ ∈ (0, 1) measures the response of the habit stock to past consumption.

4.1.1 Internal versus external habits


Note that if the agent takes account of the effect of today’s consumption on
tomorrow’s habit stock (an internal habit), marginal rates of substitution can
get quite complicated, even in the simple case of (4.2).
To see what happens, forget about uncertainty (and asset pricing) for a mo-
ment, and just think about calculating marginal utilities from

\[
U(C) = \sum_{t=0}^{\infty} \beta^t u(c_t - \delta c_{t-1}).
\tag{4.3}
\]

where C = {c_t}_{t=0}^∞. Let U_t(C) denote the marginal utility of date t consumption, c_t. Then,

\[
U_t(C) = \beta^t u'(c_t - \delta c_{t-1}) - \delta \beta^{t+1} u'(c_{t+1} - \delta c_t),
\tag{4.4}
\]
\[
U_{t+1}(C) = \beta^{t+1} u'(c_{t+1} - \delta c_t) - \delta \beta^{t+2} u'(c_{t+2} - \delta c_{t+1})
\tag{4.5}
\]
and the marginal rate of substitution between c_t and c_{t+1}, call it MRS_{t,t+1}, is
\[
MRS_{t,t+1} = \beta\, \frac{u'(c_{t+1} - \delta c_t) - \delta \beta u'(c_{t+2} - \delta c_{t+1})}{u'(c_t - \delta c_{t-1}) - \delta \beta u'(c_{t+1} - \delta c_t)}
\tag{4.6}
\]
which is a bit unpleasant to look at, and would pose some difficulties to work with computationally.
Beginning at least with Abel [Abe90], and especially since the success of
Campbell and Cochrane’s paper [CC99], it’s become increasingly routine to
avoid the complexities inherent in (4.6) by assuming the habit is external—i.e.,
the agent takes as given the evolution of ht , which is viewed as a function of
past aggregate or average consumption. As such, external habits reflect a type
of ‘keeping up with the Joneses’ phenomenon.1
1 http://en.wikipedia.org/wiki/Keeping_up_with_the_Joneses


In our simple context, with utility u(c_t − δc_{t−1}), the agent assumes the habit stock δc_{t−1} is out of his control—we would want to write this more precisely as u(c_t − δc̄_{t−1}), with a bar to denote an aggregate or average quantity. The marginal rate of substitution relevant for the agent’s choices then becomes
\[
MRS_{t,t+1} = \beta\, \frac{u'(c_{t+1} - \delta \bar{c}_t)}{u'(c_t - \delta \bar{c}_{t-1})}
\]
In a representative agent model (or a model with a unit mass of identical agents), we would take first-order conditions for the agent’s intertemporal choice problem, then impose the equilibrium requirement c_t = c̄_t to get
\[
MRS_{t,t+1} = \beta\, \frac{u'(c_{t+1} - \delta c_t)}{u'(c_t - \delta c_{t-1})}
\tag{4.7}
\]
This differs only slightly—but nonetheless significantly—from the marginal rates of substitution we’ve worked with thus far.
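To see concretely how much simpler the external-habit case is, one can evaluate (4.6) and (4.7) on a short consumption path. A Python sketch (the parameter values and the path are purely illustrative, not taken from the notes):

```python
beta, alpha, delta = 0.95, 2.0, 0.5    # illustrative values only

def up(r):
    """Marginal utility u'(r) = r**(-alpha)."""
    return r ** (-alpha)

def mrs_internal(c, t):
    """Equation (4.6): the internal-habit MRS between c_t and c_{t+1};
    note it needs the path from c_{t-1} all the way out to c_{t+2}."""
    num = up(c[t+1] - delta*c[t])   - delta*beta*up(c[t+2] - delta*c[t+1])
    den = up(c[t]   - delta*c[t-1]) - delta*beta*up(c[t+1] - delta*c[t])
    return beta * num / den

def mrs_external(c, t):
    """Equation (4.7): the external-habit MRS after imposing c = c-bar;
    only c_{t-1}, c_t and c_{t+1} enter."""
    return beta * up(c[t+1] - delta*c[t]) / up(c[t] - delta*c[t-1])

path = [1.0, 1.02, 1.01, 1.05]         # a made-up consumption path
```

Evaluating both at the same date shows they genuinely differ: the internal version prices in the effect of today's consumption on tomorrow's habit, the external one does not.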

4.1.2 Putting habits in the Mehra-Prescott model


Henceforth, we’re going to work with external habits. Putting them in the
Mehra-Prescott model is straightforward. We won’t re-derive everything, but
let’s at least state the agent’s problem. After that, we’ll jump to how the stochas-
tic discount factor gets modified, and from there we’ll be able to price assets.
First, we need to expand the aggregate state to include last period’s aggre-
gate consumption, call it y−1 . The aggregate state, call it s for short, is now
s ≡ (y, y−1 , x ). From one period to the next, the aggregate state evolves ac-
cording to
\[
s' = (y', y'_{-1}, x') = (x' y,\; y,\; x')
\]
where we assume that x 0 evolves according to a Markov chain { x (1), . . . , x (S); P},
as in the original Mehra-Prescott model.
The individual state is (z, b), where z is the agent’s beginning-of-period
equity holding and b his riskless asset holding. The Bellman equation becomes
\[
v(z, b, s) = \max \; u(c - \delta y_{-1}) + \beta E\left[ v(z', b', s') \,|\, s \right]
\]
subject to
\[
z(y + p(s)) + b \geq c + p(s)z' + q(s)b'
\]
Since the habit is external, there’s not much change to the agent’s first-order conditions; combined with an envelope condition, they yield, for the choice of z':
\[
u'(c - \delta y_{-1})\, p(s) = \beta E\left[ u'(c' - \delta y)(y' + p(s')) \,|\, s \right].
\tag{4.8}
\]
For the choice of b' we get:
\[
u'(c - \delta y_{-1})\, q(s) = \beta E\left[ u'(c' - \delta y) \,|\, s \right].
\tag{4.9}
\]


We now impose the equilibrium condition c = y (together with z' = 1, b' = 0) and re-arrange to get the pricing relationships
\[
p(s) = E\left[ \beta\, \frac{u'(x'y - \delta y)}{u'(y - \delta y_{-1})}\, (x'y + p(s')) \,\Big|\, s \right]
\tag{4.10}
\]
and
\[
q(s) = E\left[ \beta\, \frac{u'(x'y - \delta y)}{u'(y - \delta y_{-1})} \,\Big|\, s \right].
\tag{4.11}
\]
The form of the stochastic discount factor is easily seen. Note that if u'(r) = r^{−α}, then the SDF has the form
\[
\beta\, \frac{u'(x'y - \delta y)}{u'(y - \delta y_{-1})}
= \beta \left( \frac{x'y - \delta y}{y - \delta y_{-1}} \right)^{-\alpha}
= \beta \left( \frac{x'y}{y}\, \frac{1 - \delta/x'}{1 - \delta/x} \right)^{-\alpha}
= \beta\, (x')^{-\alpha} \left( \frac{1 - \delta/x'}{1 - \delta/x} \right)^{-\alpha}
\tag{4.12}
\]
The stochastic discount factor thus depends on both this period’s and next pe-
riod’s consumption growth—from the Melino-Yang characterization, we know
that dependence is a critical feature, if we’re to be consistent with countercycli-
cal risk aversion.
Since the discount factor just depends on x and x 0 , and because the present
value relationship is linear, it is again the case that the equity price can be
written as p(s) = w( x )y—there’s no dependence on y−1 and the dependence
on y is linear. The riskless asset price q depends only on x—abusing notation a
bit, q(s) = q( x ).
The key pricing relations become
\[
w(x) = E\left[ m(x, x')\, x' (1 + w(x')) \,|\, x \right]
\tag{4.13}
\]
and
\[
q(x) = E\left[ m(x, x') \,|\, x \right]
\tag{4.14}
\]
where, from (4.12),
\[
m(x, x') = \beta\, (x')^{-\alpha} \left( \frac{1 - \delta/x'}{1 - \delta/x} \right)^{-\alpha}.
\tag{4.15}
\]
1 − δ/x
Based on equations (4.13) to (4.15), it’s easy to operationalize the model in MATLAB:
• Using the Mehra-Prescott Markov chain (3.22) and (3.23), x, w, and q are 2 × 1 vectors, m is a 2 × 2 matrix.
• Form the stochastic discount factor using (4.15) to fill in m(i, j) at all the pairs x(i) and x(j).


• After that, everything else follows as in exercize 3.3.
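The recipe above can be sketched as follows, in Python rather than the notes' MATLAB (the parameter triple here is illustrative, not a calibration from the notes). Since (4.13) is linear in w, the fixed point is one 2-by-2 linear solve:

```python
beta, alpha, delta = 0.99, 2.0, 0.5     # illustrative (alpha, beta, delta)
x = [0.982, 1.054]                      # Mehra-Prescott growth states (3.22)
P = [[0.43, 0.57], [0.57, 0.43]]        # transition matrix (3.23)

# Stochastic discount factor (4.15): m(i,j) depends on both x(i) and x(j)
m = [[beta * x[j] ** (-alpha) *
      ((1 - delta / x[j]) / (1 - delta / x[i])) ** (-alpha)
      for j in range(2)] for i in range(2)]

# Riskless price (4.14): q(i) = sum_j P(i,j) m(i,j)
q = [sum(P[i][j] * m[i][j] for j in range(2)) for i in range(2)]

# Equity pricing (4.13): w(i) = sum_j A(i,j)(1 + w(j)), A(i,j) = P(i,j)m(i,j)x(j),
# i.e. the linear system (I - A) w = A 1; solve it by Cramer's rule.
A = [[P[i][j] * m[i][j] * x[j] for j in range(2)] for i in range(2)]
b = [A[0][0] + A[0][1], A[1][0] + A[1][1]]
det = (1 - A[0][0]) * (1 - A[1][1]) - A[0][1] * A[1][0]
w = [((1 - A[1][1]) * b[0] + A[0][1] * b[1]) / det,
     (A[1][0] * b[0] + (1 - A[0][0]) * b[1]) / det]
```

From w and q, returns follow exactly as in exercize 3.3: R(i, j) = x(j)(1 + w(j))/w(i) and R^F(i) = 1/q(i).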


You of course need to specify values for the preference parameters, which
are now a triple (α, β, δ). In an exercize, you’ll experiment with a few com-
binations to see the impact of habits, and the new parameter governing their
strength. Note that since we need consumption to exceed the habit stock in all
states (else the argument of the utility function becomes negative), this means
the size of δ is limited by the requirement that 1 − δ/x (i ) > 0 for all i, or
δ < min{ x (i )}.
Note, too, that in Mehra and Prescott’s two-state framework, the term in the expression for the stochastic discount factor that is added when habits are introduced—that is, the
\[
\left( \frac{1 - \delta/x'}{1 - \delta/x} \right)^{-\alpha}
\]
—will be 1 for x' = x, in other words for transitions from state one to state one or from state two to state two. Its values for the ‘off-diagonal’ state transitions—from one to two and from two to one—are inversely related:
\[
\left( \frac{1 - \delta/x(2)}{1 - \delta/x(1)} \right)^{-\alpha}
= \frac{1}{\left( \dfrac{1 - \delta/x(1)}{1 - \delta/x(2)} \right)^{-\alpha}}
\tag{4.16}
\]

If we assume—as we did above—that we’re fixing state one as the low-


growth state, then the term on the left is between zero and one for 0 ≤ δ <
x (1).2 Its inverse for the other transition ranges over [1, ∞).
This suggests a parsimonious way to parametrize the model with the simple external habit—at least for purposes of exploring the impact of habit. Let θ denote the quantity on the left side of (4.16), and let m = (m(1), m(2)) represent the stochastic discount factor from the original, no-habit model—i.e., the model of exercize 3.3. The stochastic discount factor for the habit model is then given by
\[
\begin{pmatrix}
m(1) & \theta \times m(2) \\
\theta^{-1} \times m(1) & m(2)
\end{pmatrix}
\tag{4.17}
\]
It turns out that because habit only modifies the stochastic discount factor
in the very limited way given by (4.17), it can’t produce an m that matches
the Melino-Yang stochastic discount factor you derived in exercize 3.5. Its ef-
fects in the low- and high- growth states are simply too symmetric. It can de-
liver the first moments of equity premium data, with minimal utility curvature
and a β that’s only slightly bigger than one. It fails on the second moments,
though—which motivates the more complex habit introduced by Campbell
and Cochrane. The next exercize asks you to explore this.
2 The term inside the parentheses has a minimum of one (at δ = 0) and diverges to +∞ as δ approaches x(1). Raising it to the −α < 0 power flips that range into (0, 1].
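The claim that the habit SDF relates to the no-habit SDF exactly as in (4.17) is easy to verify numerically: build m(i, j) directly from (4.15), then rebuild it from the no-habit vector and θ. A Python sketch (parameter values are illustrative):

```python
alpha, beta, delta = 1.0, 1.01, 0.4     # illustrative; delta < min x(i)
x = [0.982, 1.054]

# No-habit SDF (a vector) and the habit distortion theta from (4.16)
m_nohabit = [beta * xi ** (-alpha) for xi in x]
theta = ((1 - delta / x[1]) / (1 - delta / x[0])) ** (-alpha)

# Direct construction from (4.15)
m_direct = [[beta * x[j] ** (-alpha) *
             ((1 - delta / x[j]) / (1 - delta / x[i])) ** (-alpha)
             for j in range(2)] for i in range(2)]

# Reconstruction in the form (4.17)
m_417 = [[m_nohabit[0],         theta * m_nohabit[1]],
         [m_nohabit[0] / theta, m_nohabit[1]]]
```

The two constructions agree entry by entry, and with state one the low-growth state θ falls in (0, 1), as the text notes.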


Exercize 4.1. For this exercize, set α = 1 and β = 1.01, and use the Mehra-Prescott Markov chain (3.22) and (3.23). You’re going to incorporate habit into the stochastic discount factor as in (4.17). Set theta = linspace(.01, 1, 100)—note you have to start at a θ > 0. Write a MATLAB program that calculates, for each theta(i), the average returns, the average equity premium, the unconditional volatility of the risk-free rate, and the difference between the conditional Sharpe ratio in the low-growth state and the high-growth state.
First, (a) find the theta(i) that gets you closest to an equity premium of 0.062, or 6.2%. A simple way to do this is to use MATLAB’s min and abs functions—if EP is the vector of average equity premia, use min(abs(EP − 0.062)). What do the other results look like at that theta(i)? What’s the implied habit parameter δ at that value of theta(i)?
Repeat this for (b) the theta(i) that gets the unconditional volatility of the risk-free rate closest to 0.056, and (c) the theta(i) that gets the change in the conditional Sharpe ratio closest to 1.
Make some plots of theta versus the average equity premium, versus the volatility of the risk-free rate, and versus the change in the Sharpe ratio.

4.1.3 Habits and countercyclical risk aversion


Under external habits, the agent’s Arrow-Pratt measure of relative risk aver-
sion is non-constant, and in fact increases as consumption gets closer to the
habit stock. Thus, the agent’s risk aversion will be higher in those states where
consumption growth is low.
Recall that the coefficient of relative risk aversion is essentially the (abso-
lute value of the) elasticity of the marginal utility of consumption with respect
to consumption—i.e., the proportionate change in the marginal utility of con-
sumption for a given proportionate change in consumption. It’s a measure of
the curvature of marginal utility.
Formally, for the usual case without habits, we have
\[
\text{Coefficient of RRA} = \frac{\Delta(u'(c))/u'(c)}{\Delta c/c}
= \frac{u''(c)\,\Delta c / u'(c)}{\Delta c/c}
= \frac{u''(c)\, c}{u'(c)}.
\]

When u'(r) = r^{−α}, this gives us α as the coefficient of relative risk aversion.
When an external habit is present (so we treat h as fixed when we vary c, and changing c today has no perceived consequences for tomorrow), we obtain
\[
\text{Coefficient of RRA} = \frac{u''(c - h)\, c}{u'(c - h)}
= \frac{u''(c - h)(c - h)}{u'(c - h)} \cdot \frac{c}{c - h}
\]
or α c/(c − h) in the case of u'(r) = r^{−α}.
If we think about habits of the simple form h_t = δc_{t−1}, and c_t = x_t c_{t−1}, we have
\[
\alpha\, \frac{c_t}{c_t - h_t} = \alpha\, \frac{c_t}{c_t - \delta c_{t-1}} = \alpha\, \frac{1}{1 - \delta/x_t}
\]
—low consumption growth brings δ/x_t closer to one, thus raising the agent’s relative risk aversion.

The Arrow-Pratt measure of relative risk aversion can be motivated with a thought experiment along the following lines. Imagine the agent faces uncertainty over today’s consumption. His level of consumption will be θc, where θ is the realization of a positive random variable with mean 1 and variance σ_θ². We’re interested in calculating the fraction of certain consumption c = E[θc] he would be willing to give up, call it s, such that he’s indifferent between having (1 − s)c for sure and having uncertain consumption θc:
\[
E[u(\theta c)] = u\big((1 - s)c\big)
\]
Taking a second-order Taylor series expansion of u(θc) around θ = 1, then passing the expectation operator through, gives
\[
E[u(\theta c)] \approx u(c) + \tfrac{1}{2} u''(c)\, c^2 \sigma_\theta^2.
\]
A first-order Taylor series expansion of u((1 − s)c) around s = 0 gives
\[
u\big((1 - s)c\big) \approx u(c) - u'(c)\, c\, s.
\]
Combining these gives us an approximate expression for the relative risk premium s, assuming u' > 0 and u'' < 0:
\[
s \approx -\frac{1}{2}\, \frac{u''(c)\, c}{u'(c)}\, \sigma_\theta^2
\tag{4.18}
\]
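The quality of approximation (4.18) can be checked by Monte Carlo in the CRRA case, where it reduces to s ≈ (1/2)ασ_θ². A Python sketch (the lognormal specification for θ and all parameter values are our own illustrative assumptions):

```python
import math, random

alpha, c, sigma = 4.0, 1.0, 0.1

def u(r):
    """CRRA utility with u'(r) = r**(-alpha)."""
    return r ** (1 - alpha) / (1 - alpha)

# Lognormal theta with mean 1 and standard deviation exactly sigma
s_ln = math.sqrt(math.log(1 + sigma ** 2))
random.seed(0)
draws = [math.exp(random.gauss(-0.5 * s_ln ** 2, s_ln)) for _ in range(200000)]
Eu = sum(u(th * c) for th in draws) / len(draws)

# Exact premium: invert u((1 - s)c) = E[u(theta c)]
s_exact = 1 - ((1 - alpha) * Eu) ** (1 / (1 - alpha)) / c
# Approximation (4.18): s = -(1/2)(u''(c)c/u'(c)) sigma^2 = (1/2) alpha sigma^2
s_approx = 0.5 * alpha * sigma ** 2
```

For small risks the two agree to a few basis points; the gap grows with σ_θ, as one expects from a second-order approximation.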

4.1.4 Campbell and Cochrane’s model


Campbell and Cochrane [CC99] take as a starting point the stochastic discount factor, which we’ll write with time subscripts as
\[
m_{t+1} = \beta \left( \frac{c_{t+1} - h_{t+1}}{c_t - h_t} \right)^{-\alpha}.
\]


Rather than specify how the habit stock h_t depends on past consumption, they define what they call the ‘surplus consumption ratio’,
\[
S_t = \frac{c_t - h_t}{c_t}
\]
and make assumptions about its dynamics. The pricing kernel, with this definition, becomes
\[
m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\alpha} \left( \frac{S_{t+1}}{S_t} \right)^{-\alpha}.
\]
Like the stochastic discount factor we derived above, it has a ‘standard’ part (depending on a discount rate, consumption growth and a curvature parameter) and something ‘non-standard’, the part which depends on growth of the surplus ratio.
Just as habit led to time-varying risk aversion above, it does as well in Campbell and Cochrane’s model, with the coefficient of relative risk aversion inversely related to the surplus ratio:
\[
\frac{u''(c_t - h_t)\, c_t}{u'(c_t - h_t)} = \frac{\alpha}{S_t}.
\]

The process they specify for the surplus ratio has consumption growth as its driving impulse, but allows for much richer dynamics than we could achieve in the context of the simple habit model we laid out above. In particular, they assume that
\[
\log(S_{t+1}) = (1 - \phi)\bar{s} + \phi \log(S_t) + \lambda(\log S_t)\left[ \log(c_{t+1}/c_t) - g \right]
\tag{4.19}
\]

where g is the mean of log consumption growth, φ controls the persistence of


the surplus ratio process, and the crucial function λ(St ) controls the sensitivity
of changes in the surplus ratio to shocks to consumption growth.
Log consumption growth is assumed to be a very simple i.i.d. process with normal innovations—log(c_{t+1}/c_t) = g + v_{t+1}, with v_{t+1} ∼ N(0, σ²). Much of the analysis is carried out in terms of logarithmic approximations and exploits the properties of lognormal random variables. Conditional on S_t, for example, S_{t+1} inherits lognormality from log consumption growth. They can write the stochastic discount factor as
\[
m_{t+1} = \beta G^{-\alpha} e^{-\alpha\left(\log(S_{t+1}/S_t) + v_{t+1}\right)}
= \beta G^{-\alpha} e^{-\alpha\left((\phi - 1)(\log(S_t) - \bar{s}) + [1 + \lambda(\log(S_t))] v_{t+1}\right)}
\]
where G = e^g. The SDF is also conditionally lognormal.


This means, in particular, that conditional on St , the ratio of the standard
deviation of mt+1 to its mean is approximately equal to the conditional stan-
dard deviation of log(mt+1 ), following the same results we used subsequent to
equation (3.19) near the end of Lecture 3. Thus, the Sharpe ratio bound ((3.19)
or (3.20)) from this discount factor is approximately ασ[1 + λ(log(St ))]. Since


low values of the surplus ratio correspond to bad times—periods with low
consumption—having λ be a decreasing function of the surplus ratio makes
possible a countercyclical Sharpe ratio.
The function λ also plays a role in transmitting consumption volatility into
volatility of the risk-free rate. Here again, a decreasing λ works in the right
direction. In fact, Campbell and Cochrane ‘reverse engineer’ the λ function so
as to guarantee a constant risk-free rate. Their choice of λ at the same time
guarantees that the habit stock is unresponsive to consumption innovations
at the model’s steady state, and moves positively with consumption near the
steady state.
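The dynamics (4.19) are straightforward to simulate. Campbell and Cochrane's actual λ(·) is the particular decreasing function reverse-engineered as just described; the constant λ below is only a placeholder to exhibit the autoregressive structure, and the parameter values are roughly in the neighborhood of their calibration rather than taken from the notes:

```python
import math, random

g, sigma, phi = 0.0189, 0.015, 0.87     # assumed CC-style annual values
s_bar = math.log(0.057)                 # log of an assumed steady-state surplus ratio
lam = 16.5                              # constant placeholder for lambda(log S_t)

random.seed(1)
log_S, path = s_bar, []
for _ in range(10000):
    dc = g + random.gauss(0.0, sigma)   # log consumption growth: g + v_{t+1}
    log_S = (1 - phi) * s_bar + phi * log_S + lam * (dc - g)
    path.append(log_S)

mean_log_S = sum(path) / len(path)      # hovers near s_bar in a long simulation
```

With a constant λ, the implied Sharpe-ratio bound ασ(1 + λ) would also be constant; it is precisely the dependence of λ on log S_t that lets the bound move over the cycle.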
How does the Campbell-Cochrane model translate into the two-state Mehra-
Prescott framework we’ve been using throughout the last two Lectures? The
first thing to note is that if, within our model, consumption growth is a suffi-
cient statistic for the surplus ratio—so St takes on two values, one in the low-
growth state and another in the high-growth state, then we can add nothing
beyond what we achieved with the simple habit of the previous subsection. As
was the case with the simpler habit, the resulting SDF using the surplus ratio
would necessarily relate to the basic Mehra-Prescott SDF by (4.17). What we
called θ in the discussion would simply be S(2)/S(1).
There’s really no easy way to get the rich dynamics inherent in Campbell
and Cochrane’s specification (4.19) into our two-state version of the Mehra-
Prescott model without adding another state variable (probably St itself). Adding
a state variable isn’t necessarily a bad idea—we’ll need to do it eventually, to
incorporate disasters or long-run risk—but it’s hard to see how to create a sim-
ple Markov chain that would mimic (4.19).
To do anything more in the two-state framework, we would need to assume
that growth of the surplus ratio differs across the four possible state transitions,
which means St+1 /St must be a more complicated function of xt and xt+1 .
Without specifying that function, we might nonetheless ask what it would have
to look like to match the exactly identified stochastic discount factor we found
using the Melino-Yang returns (exercize 3.5).
That is, simply write, say, γ_S for S_{t+1}/S_t, and solve for the
\[
\gamma_S = \begin{pmatrix}
\gamma_S(1,1) & \gamma_S(1,2) \\
\gamma_S(2,1) & \gamma_S(2,2)
\end{pmatrix}
\]
such that βx(j)^{−α} γ_S(i,j)^{−α} equals the m̂(i,j) we derived in exercize 3.5.

Exercize 4.2. Do that, assuming α = β = 1. You can (sort of) interpret
\[
\lambda_1 \equiv \frac{\log\big(\gamma_S(1,2)/\gamma_S(1,1)\big)}{\log\big(x(2)/x(1)\big)}
\qquad
\lambda_2 \equiv \frac{\log\big(\gamma_S(2,2)/\gamma_S(2,1)\big)}{\log\big(x(2)/x(1)\big)}
\]


as analogous to values of the λ function in Campbell and Cochrane’s (4.19). What


do you find?

4.1.5 Some additional issues regarding habits


There are a couple additional issues regarding habits that deserve some men-
tion.
First, regarding the assumption of external habits, to paraphrase some re-
marks I heard Lars Hansen give in a talk at a meeting of the Econometric Soci-
ety, either external habits are just a gimmick, or the externality is a real one, and
we should be thinking about the optimal policy response (as we would with
any externality). If it’s the latter case, then Lars Ljungqvist and Harald Uh-
lig’s “Optimal Endowment Destruction under Campbell-Cochrane Habit For-
mation” [LU09] makes for a cautionary analysis.
Ljungqvist and Uhlig approach the model laid out by Campbell and Cochrane from the perspective of a social planner. They solve the social planner’s problem and find that, for the model calibrated as in Campbell-Cochrane, a society of agents with Campbell and Cochrane’s preferences and endowment process would experience a welfare gain equal to about a permanent 16% increase in consumption if the planner could enforce one month of fasting each year.
A second issue pertains to what happens if we introduce production. It’s
turning out that we’re probably going to have little time to consider asset-
pricing in production economies, but the issue is still worth mentioning. The
problem is that the bells and whistles that ‘help’, when consumption is exoge-
nous, may create problems when consumption is endogenously determined
by agents’ production and investment decisions. In production economies we
face the added constraint that whatever we come up with should not only be
consistent with the asset return facts, but also with business cycle facts. That’s
a tall order to fill.
Specifically regarding habits, the problem entailed by allowing production
is this: agents with habits strong enough to generate high equity premia in
endowment economies will want to keep their consumption so smooth that—
when allowed the chance to do so in production economies—they choose con-
sumption paths that are extremely smooth. Wildly volatile investment—the
residual—then absorbs the effects of technology, and other, shocks. Beginning
with Jermann [Jer98], this has meant: If we augment the basic stochastic growth
model to include habits (to match asset returns data), we must also add capital
adjustment costs (to foil agents’ attempts to achieve extremely smooth con-
sumption). To paraphrase Jermann, in order for a model to get both the asset
return and business cycle facts right, agents must really, really want to smooth
consumption—but be prevented from doing it.
Related to the desire to keep consumption smooth in the presence of habits,
habits can also lead to perverse labor supply responses, as agents respond


to negative labor productivity shocks—which push consumption toward the


habit level—with higher work effort. See Graham [Gra08] or some of the im-
pulse responses plotted in [Dol11].
Nevertheless, habits are now quite common in many DSGE models far re-
moved from asset-pricing; see for example the medium-scale model of Smets
and Wouters [SW03].

4.2 Epstein-Zin preferences


Epstein-Zin preferences, also sometimes known as Epstein-Zin-Weil prefer-
ences or Kreps-Porteus preferences, preserve some of the most attractive fea-
tures of standard time-additively separable, constant discounting, expected
utility preferences—in particular, their recursivity, which makes dynamic pro-
gramming possible—while breaking (at least to some extent) the tight link be-
tween risk aversion and intertemporal substitution implied by the standard
preferences.3

4.2.1 Basic properties


We’ve already seen that with preferences of the form
\[
E\left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right]
= E\left[ \sum_{t=0}^{\infty} \beta^t\, \frac{c_t^{1-\alpha}}{1-\alpha} \right]
\tag{4.20}
\]

the parameter α is the coefficient of relative risk aversion, and 1/α is the elastic-
ity of intertemporal substitution.4 With Epstein-Zin preferences, two separate
parameters govern the degree of risk aversion (for timeless gambles) and will-
ingness to substitute consumption over time (in deterministic settings).
To motivate the Epstein-Zin form, think about writing (4.20) in a recursive
way, letting Ut denote lifetime utility from date t onward, which is a stochastic
process, assuming consumption is a stochastic process. We get

Ut = u(ct ) + βEt [Ut+1 ] (4.21)

—lifetime utility from today on is an aggregate of within-period utility from ct ,


and discounted expected lifetime utility starting tomorrow. It’s the additivity
in (4.21) that gives us separability over time; that we’re taking expectations of
future utility implies separability over states.5
3 The main references for the theoretical development are [KP78], [EZ89], and [Wei90].
4 The elasticity of intertemporal substitution is the elasticity of the ratio ct+1 /ct with respect to
the relative price of consumption in periods t and t + 1. It is only cleanly defined for deterministic
consumption paths.
5 There are a number of ways to define separability. For a differentiable U(x_1, x_2, . . . , x_n, . . .),
a standard definition is that x_i and x_j are separable from x_k if the marginal rate of substitution
U_i(x)/U_j(x) does not depend on x_k. See [BPR78] for many more definitions.

4.2. EPSTEIN-ZIN PREFERENCES LECTURE 4. PUZZLE RESPONSES, I

We can imagine relaxing the separability over time by replacing the linear
‘aggregator’ in (4.21) with something more general:

Ut = W (ct , Et [Ut+1 ]). (4.22)

This is in fact the form created by Kreps and Porteus [KP78], who gave ax-
ioms on a primitive preference ordering over temporal lotteries such that the
ordering was representable by a recursive function of the form (4.22).
Epstein and Zin [EZ89] and, independently, Weil [Wei90] wrote down para-
metric versions of Kreps-Porteus preferences. These preferences give a specific
CES form to the aggregator. They also make a convenient monotone transfor-
mation of the utility process Ut from (4.22), in such a way that the parameter
governing (timeless) risk aversion is very clearly separated from the parame-
ter governing (deterministic) intertemporal substitution. The Epstein-Zin, or
Epstein-Zin-Weil, form is:
    U_t = [ (1 − β)c_t^ρ + β µ_t(U_{t+1})^ρ ]^{1/ρ},                (4.23)

where µ_t(·) is, in the language of Epstein and Zin, a ‘certainty equivalent’
operator, conditional on information at date t, having the form:
    µ_t(U_{t+1}) = ( E_t[ U_{t+1}^{1−α} ] )^{1/(1−α)}.                (4.24)

We assume that ρ ≤ 1 in (4.23) and α > 0 in (4.24).6 Some key features of
the preferences described by (4.23) and (4.24) are:
• The certainty equivalent operator and the CES aggregator are both ho-
mogeneous of degree one; thus Ut is homogeneous of degree one in con-
sumption.
• If there’s no uncertainty, µ_t(U_{t+1}) = U_{t+1}, and the resulting preferences
are ordinally equivalent to time-additively separable preferences with a
constant elasticity of intertemporal substitution given by 1/(1 − ρ).
• The utility of a constant consumption path c is just c (think about it); call it
U (c) = c. Thus, for gambles over constant consumption paths (timeless
gambles), preferences are expected utility, with a constant coefficient of
relative risk aversion equal to α:
    µ_0[U(c̃)] = ( E_0[ c̃^{1−α} ] )^{1/(1−α)}.

6 In the case of ρ = 0, we get U_t = c_t^{1−β} µ_t(U_{t+1})^β, which is ordinally equivalent to log additivity.
For α = 1, the certainty equivalent can be thought of as exp(E_t[log(U_{t+1})]).


• If α = 1 − ρ—i.e., if the risk aversion coefficient is the inverse of the
intertemporal substitution elasticity—the preferences collapse to a form
ordinally equivalent to (4.20).7
• In contrast to (4.20), the timing of the resolution of uncertainty matters
for agents with Epstein-Zin preferences. The standard preferences (4.20)
conform to the ‘reduction of compound lotteries’ axiom of expected util-
ity. This is not so with (4.23) and (4.24). Consider two lotteries over
consumption streams. Lottery A gives c today, c tomorrow, and then
either (c, c, c, . . .) or (c0 , c0 , c0 , . . .) with probabilities δ and 1 − δ. Lottery
B gives c today, and then either (c, c, c, . . .) or (c, c0 , c0 , . . .) with probabil-
ities δ and 1 − δ.8 Lottery A has a later resolution of uncertainty than
lottery B. An agent with preferences given by (4.20) would view them as
equivalent. Under Epstein-Zin preferences, early resolution is preferred
if 1 − α − ρ < 0, while late resolution is preferred if 1 − α − ρ > 0.
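The timing comparison can be checked numerically. In the sketch below (the numbers for β, ρ, α, c, c′, and δ are arbitrary choices, not values from the text), a constant path’s utility equals its consumption level, so lottery A is worth W(c, W(c, µ({c, c′}))) and lottery B is worth W(c, µ({c, W(c, c′)})):

```python
import numpy as np

def ez_example(beta, rho, alpha, c, cp, delta):
    """Compare EZ utility of lottery A (late resolution) and lottery B (early).

    Constant paths have utility equal to their consumption level, so the
    continuation values follow from one or two applications of the CES
    aggregator W and the certainty-equivalent operator mu.
    """
    W = lambda c_, mu_: ((1 - beta) * c_**rho + beta * mu_**rho) ** (1 / rho)
    mu = lambda vals, probs: (probs @ np.asarray(vals, float) ** (1 - alpha)) ** (1 / (1 - alpha))
    p = np.array([delta, 1 - delta])
    # Lottery A: consume c at t = 0 and t = 1; uncertainty resolves at t = 2.
    V_A = W(c, W(c, mu([c, cp], p)))
    # Lottery B: consume c at t = 0; uncertainty resolves at t = 1.
    # The (c, c', c', ...) branch is worth W(c, c') from date 1 on.
    V_B = W(c, mu([c, W(c, cp)], p))
    return V_A, V_B

# 1 - alpha - rho < 0: early resolution (lottery B) preferred.
VA, VB = ez_example(beta=0.9, rho=0.5, alpha=5.0, c=1.0, cp=2.0, delta=0.5)
print(VB > VA)  # True
# 1 - alpha - rho > 0: late resolution (lottery A) preferred.
VA2, VB2 = ez_example(beta=0.9, rho=0.5, alpha=0.25, c=1.0, cp=2.0, delta=0.5)
print(VA2 > VB2)  # True
```

Flipping α across 1 − ρ flips the ranking, matching the condition stated above.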

4.2.2 Asset pricing with Epstein-Zin preferences


We can pretty easily put Epstein-Zin preferences into the Mehra-Prescott model.
The key question, of course, is—What does the stochastic discount factor look
like?
Because Epstein-Zin preferences are recursive, we can do dynamic pro-
gramming as before. As in our discussion of the habit model, let s stand for
the aggregate state, which in this case is s = ( x, y), just as in the basic Mehra-
Prescott model. We write µs ( · ) for the certainty equivalent conditional on the
state s. The agent’s dynamic program is then given by:
    v(z, b, s) = max_{c,z′,b′} [ (1 − β)c^ρ + β µ_s( v(z′, b′, s′) )^ρ ]^{1/ρ}                (4.25)

subject to
    z(p(s) + y) + b ≥ c + p(s)z′ + q(s)b′.                (4.26)
For compactness, let W(c, µ) = [(1 − β)c^ρ + βµ^ρ]^{1/ρ}, and note that the par-
tial derivatives of W obey W_1(c, µ) = (1 − β)W(c, µ)^{1−ρ} c^{ρ−1} and W_2(c, µ) =
β W(c, µ)^{1−ρ} µ^{ρ−1}. Note too that, under some mild assumptions, we can pass
derivatives through µ as follows. Consider the case of z′ (the case of b′ is anal-
ogous):
    ∂/∂z′ µ_s( v(z′, b′, s′) ) = ∂/∂z′ ( E[ v(z′, b′, s′)^{1−α} | s ] )^{1/(1−α)}
        = (1/(1−α)) ( E[ v(z′, b′, s′)^{1−α} | s ] )^{1/(1−α) − 1} ∂/∂z′ E[ v(z′, b′, s′)^{1−α} | s ]
        = µ_s( v(z′, b′, s′) )^α E[ v(z′, b′, s′)^{−α} ∂v(z′, b′, s′)/∂z′ | s ]
7 Utility function V is ordinally equivalent to utility function U if U = f(V) for some increasing
function f. In this case define V = (1/ρ)U^ρ.
8 See Figure 1 in Weil [Wei90].


With all that in mind, the first-order condition for the choice of z′ is

    W_1(c, µ_s(v(z′, b′, s′))) p(s) = W_2(c, µ_s(v(z′, b′, s′))) ∂/∂z′ µ_s(v(z′, b′, s′))

which we can write as

    (1 − β)c^{ρ−1} p(s) = β µ_s( v(z′, b′, s′) )^{ρ+α−1} E[ v(z′, b′, s′)^{−α} ∂v(z′, b′, s′)/∂z′ | s ]                (4.27)
We can use an envelope argument to get an expression for the derivative of
v with respect to z. At state (z, b, s), this is given by

    ∂v(z, b, s)/∂z = W_1(c, µ_s(v(z′, b′, s′))) (p(s) + y)
                   = (1 − β) W(c, µ_s(v(z′, b′, s′)))^{1−ρ} c^{ρ−1} (p(s) + y)
                   = (1 − β) v(z, b, s)^{1−ρ} c^{ρ−1} (p(s) + y)

where the last line uses the fact that v(z, b, s) = W(c, µ_s(v(z′, b′, s′))) at the
optimum. We now advance this expression one period and plug it into the
right-hand side of (4.27) to get

    c^{ρ−1} p(s) = β µ_s( v(z′, b′, s′) )^{ρ+α−1} E[ v(z′, b′, s′)^{1−α−ρ} (c′)^{ρ−1} ( p(s′) + y′ ) | s ]

The first-order condition for holdings of the riskless asset is analogous, sim-
ply plugging in q(s) for p(s) and 1 for the payoff, rather than p(s′) + y′:

    c^{ρ−1} q(s) = β µ_s( v(z′, b′, s′) )^{ρ+α−1} E[ v(z′, b′, s′)^{1−α−ρ} (c′)^{ρ−1} | s ]

As we’ve done before, we now impose equilibrium (c = y, z = z′ = 1,
b = b′ = 0) and rearrange to obtain the model’s pricing formulas. It’ll be
convenient to let V(s) = v(1, 0, s)—i.e., V(s) is the agent’s value function in
equilibrium. With the imposition of equilibrium and some re-arranging, we
get, for the equity price,

    p(s) = E[ β ( V(s′) / µ_s(V(s′)) )^{1−α−ρ} (y′/y)^{ρ−1} ( p(s′) + y′ ) | s ]                (4.28)

We can immediately see the form of the stochastic discount factor, which
for now we’ll write as depending on both s and s0 . In a moment we’ll think
more carefully about which variables it depends on. We have:
    m(s, s′) = β ( V(s′) / µ_s(V(s′)) )^{1−α−ρ} (y′/y)^{ρ−1}                (4.29)

This stochastic discount factor—like the stochastic discount factor in the habit
model—incorporates a ‘standard’ part and a ‘non-standard’ part. The standard


part is β(y′/y)^{ρ−1}—the utility discount factor times a decreasing power func-
tion of aggregate consumption growth.9 This is analogous to the β(x′)^{−α} piece
in the stochastic discount factors we’ve previously encountered.
The non-standard part is the part involving the agent’s value function. Sup-
pose that 1 − α − ρ < 0. Then, other things the same, payoffs in states where
realized lifetime utility falls short of its conditional certainty equivalent value
will be weighted more heavily in pricing assets than payoffs in states where
realized lifetime utility exceeds its conditional certainty equivalent value. This
has the potential to increase the volatility of the stochastic discount factor, by
reinforcing the volatility coming from the standard β(x′)^{ρ−1} channel.
Note that in the special case of α = 1 − ρ, the stochastic discount factor
reduces to the standard β(x′)^{−α}.
So, what does m really depend on?—(x, y, x′, y′)? Or maybe (x, x′)? Or just
x′? Make the Mehra-Prescott assumption that y′ = x′y, with x′ following a
Markov chain. Under that assumption, m depends on just x and x′, with the
dependence on x coming through the conditioning in the certainty equivalent
present in (4.29). There is no dependence on the level of consumption y, be-
cause of the degree-one homogeneity of preferences. That homogeneity allows
us to write the equilibrium value function as

V (s) = φ( x )y

for some function φ. The ‘non-standard’ term in the stochastic discount factor
then becomes
    ( V(s′) / µ_s(V(s′)) )^{1−α−ρ} = ( φ(x′)y′ / µ_x(φ(x′)y′) )^{1−α−ρ}
                                   = ( φ(x′)x′y / µ_x(φ(x′)x′y) )^{1−α−ρ}
                                   = ( φ(x′)x′ / µ_x(φ(x′)x′) )^{1−α−ρ}

where the last line uses the homogeneity of µ.


Plugging this into (4.29) gives
    m(x, x′) = β (x′)^{ρ−1} ( φ(x′)x′ / µ_x(φ(x′)x′) )^{1−α−ρ}                (4.30)

Under the assumption that x′ follows a Markov chain {x(1), . . . , x(S), P}, x′ and
φ are in fact just vectors, and m is a matrix:

    m(i, j) = β x(j)^{ρ−1} ( φ(j)x(j) / µ_i(φ′x′) )^{1−α−ρ}                (4.31)
9 Recall ρ ≤ 1.


where

    µ_i(φ′x′) = ( ∑_{j=1}^S P(i, j) (φ(j)x(j))^{1−α} )^{1/(1−α)}.                (4.32)
As in the basic Mehra-Prescott model, the equity price here is again ho-
mogeneous in aggregate consumption—we have p( x, y) = w( x )y—and the
riskless asset price depends only on x, q = q( x ). Under the Markov chain
assumption, these are vectors, and the pricing equations become
    w(i) = ∑_{j=1}^S P(i, j) m(i, j) ( w(j)x(j) + x(j) )                (4.33)

    q(i) = ∑_{j=1}^S P(i, j) m(i, j)                (4.34)

Once we know the matrix m = [m(i, j)], we know how to solve (4.33) and (4.34)
for w and q, and from them, all the objects we might be interested in.
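Concretely, (4.33) is linear in the vector w, so no iteration is needed once m is in hand. A NumPy sketch (the two-state P, x, and the CRRA-style m below are placeholder inputs for illustration, not the Mehra-Prescott calibration):

```python
import numpy as np

# Placeholder two-state inputs: transition matrix P, gross growth rates x,
# and an SDF matrix m -- here just beta*x(j)^(-gamma), the CRRA special case.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
x = np.array([0.98, 1.03])
beta, gamma = 0.95, 2.0
m = beta * x[np.newaxis, :] ** (-gamma)          # m(i,j); constant across rows here

# (4.33) says w = A(1 + w) with A(i,j) = P(i,j) m(i,j) x(j),
# so w = (I - A)^{-1} A 1.
A = P * m * x[np.newaxis, :]
w = np.linalg.solve(np.eye(2) - A, A @ np.ones(2))

# (4.34) is a direct row sum: q(i) = sum_j P(i,j) m(i,j).
q = (P * m) @ np.ones(2)

# From w and q: state-by-state equity returns R(i,j) = x(j)(1+w(j))/w(i)
# and risk-free rates RF(i) = 1/q(i).
R = x * (1 + w)[np.newaxis, :] / w[:, np.newaxis]
RF = 1.0 / q
print(w, q)
```

The MATLAB analogue of the solve is a one-line backslash operation, `w = (eye(S) - A)\(A*ones(S,1))`.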
Finding m poses a new challenge, though, because of its dependence on the
value function, through φ. We first need to find φ, and that will require some-
thing we’ve not had to do thus far—solving for something iteratively, rather
than just inverting some matrices.

4.2.3 Solving for φ


Recall that φ( x )y is the value function in equilibrium—i.e., the agent’s maxi-
mized lifetime utility from state ( x, y) on, given that c = y at all dates and
states. It thus follows the Bellman-like equation
    φ(x)y = [ (1 − β)y^ρ + β µ_x( φ(x′)y′ )^ρ ]^{1/ρ}

or—dividing both sides by y, using y′ = x′y and the homogeneity of µ—

    φ(x) = [ 1 − β + β µ_x( φ(x′)x′ )^ρ ]^{1/ρ}

Under the Markov chain assumption, this becomes

    φ(i) = [ 1 − β + β µ_i( φ′x′ )^ρ ]^{1/ρ}                (4.35)

where µ_i(φ′x′) is the same object as in (4.32).


To solve for φ, we treat (4.35)—together with (4.32)—as a mapping that
takes a vector φ0 into a new vector φ1 . Given a φ0 , the steps in that mapping,
in MATLAB, are:
1. Use (4.32) to create a vector of certainty equivalents—call it, for example,
mu. If you’ve set up φ0 and x so they are both column vectors, you can do
this in one ‘vectorized’ step:
mu = (P*((phi0.*x).^(1-alpha))).^(1/(1-alpha));

80
4.2. EPSTEIN-ZIN PREFERENCES LECTURE 4. PUZZLE RESPONSES, I

2. Given mu, update φ using (4.35). Again, this step can be done in one
operation, without resorting to a ‘for’ loop:

phi1 = (1 - beta + beta*(mu.^rho)).^(1/rho);

You would perform those two steps repeatedly, inside a ‘while’ loop, until
the successive iterates are changing only negligibly from one iteration to the
next. Before starting the loop, you need to set initial conditions and create
some variables to control the loop’s behavior. You start with some φ0 —say a
column of ones. I create one variable called chk which will record the distance
between iterates, and a variable tol that is my threshold for stopping the loop
(when chk gets less than tol). You can use MATLAB’s norm function to measure
the distance between iterates. I set tol to be something very small, 10−5 , say,
or 10−7 .10
I also create a variable called maxits—the maximum number of times to
iterate, regardless of whether the iterates converge. This is just in case some-
thing is not right, so the loop doesn’t go on forever. There’s a variable t which
starts at 0 and gets incremented by 1 with each pass through the loop. The loop
breaks when either chk < tol or t > maxits.
The general form would be:
phi0 = ones(2,1);
chk = 1;
tol = 1e-7;
maxits = 1000;
t = 0;
while chk>tol && t<maxits;
[Steps to make φ1 here]
chk = norm(phi0-phi1)
t = t+1;
phi0 = phi1;
end;
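For readers working outside MATLAB, here is a NumPy translation of the same iteration, wrapped in a function (the two-state P and x fed to it below are placeholder values, not the Mehra-Prescott calibration):

```python
import numpy as np

def solve_phi(P, x, beta, rho, alpha, tol=1e-10, maxits=1000):
    """Fixed-point iteration on (4.35):
    phi(i) = [1 - beta + beta * mu_i(phi'x')^rho]^(1/rho)."""
    phi0 = np.ones(len(x))
    for _ in range(maxits):
        # (4.32): mu_i = (sum_j P(i,j) (phi(j) x(j))^(1-alpha))^(1/(1-alpha))
        mu = (P @ (phi0 * x) ** (1 - alpha)) ** (1 / (1 - alpha))
        phi1 = (1 - beta + beta * mu ** rho) ** (1 / rho)
        if np.linalg.norm(phi0 - phi1) < tol:
            return phi1
        phi0 = phi1
    raise RuntimeError("phi iteration did not converge")

# Placeholder two-state example (illustrative parameters only).
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
x = np.array([0.98, 1.03])
phi = solve_phi(P, x, beta=0.95, rho=0.5, alpha=5.0)
print(phi)
```

The structure mirrors the MATLAB loop above: the two vectorized steps inside the loop, a norm-based convergence check, and a maximum iteration count as a safety valve.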
Note the last line, which sets φ0 = φ1 —we’re repeatedly mapping a φ into a
new φ, then plugging that result into the mapping to get another iterate.

Exercize 4.3. Mehra-Prescott meet Epstein-Zin—or, reproducing some results in


Weil [Wei89]. Write some MATLAB code to solve the Mehra-Prescott model with
Epstein-Zin preferences. Report the average equity premium and the average risk-
free rate for the following parameter combinations:

10 This choice depends a bit on what you’re trying to find and what you plan on using it for.


     β      ρ      α
1    0.95   1/2    5
2    0.95   1/2    45
3    0.95   −9     5
4    0.95   −9     45
5    0.98   1/2    5
6    0.98   1/2    45
7    0.98   −9     5
8    0.98   −9     45

These are some of the parameter combinations for which Weil reports results
in the tables on page 413 of his 1989 J.M.E. paper. His γ is our α, and his ρ is
our 1 − ρ; in the table above, ρ = 1/2 corresponds to an EIS of 2, and ρ = −9
corresponds to an EIS of 0.1.
When I did this, I found some minor differences (at the first or second decimal
places) between my results and Weil’s, so don’t be surprised if you don’t obtain
exact matches. If you’re doing it right, though, your numbers should be very,
very close to Weil’s.

4.2.4 An alternative for eliminating the value function from the pricing rules altogether
When you’re working in the ‘laboratory’ of computational experiments, the
dependence of the equilibrium pricing rules on the value function is not a big
deal. One can simply solve for the value function computationally. The pres-
ence of the value function in expressions like (4.28) is more problematic if one
wants to do econometrics. The value function is unobserved.
Epstein and Zin [EZ89] first noted that it was possible to eliminate the value
function from the stochastic discount factor, replacing it with a term related
to the return on the representative agent’s wealth. The steps are somewhat
involved, so let’s state the result up front, then go through the derivation. It’s
convenient to use time subscripts for much of this. We’re going to show how
to re-write the stochastic discount factor

    m_{t+1} = β (c_{t+1}/c_t)^{ρ−1} ( v_{t+1} / µ_t(v_{t+1}) )^{1−α−ρ}                (4.36)
as
    m_{t+1} = ( β (c_{t+1}/c_t)^{ρ−1} )^{(1−α)/ρ} ( R_{t+1} )^{(1−α)/ρ − 1}                (4.37)
where Rt+1 is the equilibrium equity return.


Ignoring holdings of the riskless bond (which is assumed to be in zero net
supply, anyway), define the agent’s wealth at the start of a period by

    W_t = z_t(p_t + y_t).

We can then relate wealth to wealth one period earlier by

    W_{t+1} = z_{t+1}(p_{t+1} + y_{t+1})
            = [ z_{t+1}(p_{t+1} + y_{t+1}) / (p_t z_{t+1}) ] · p_t z_{t+1}
            = R^z_{t+1} [ z_t(p_t + y_t) − c_t ]
            = R^z_{t+1} (W_t − c_t)                (4.38)

Here R^z_{t+1} is the return on the agent’s portfolio; in equilibrium this is, of course,
the market equity return, which we denote as before by R_{t+1}.11
Now write the agent’s dynamic program with W as the individual state (s
still denotes the aggregate state, and we assume R is a function of s):

    v(W, s) = max_c [ (1 − β)c^ρ + β µ_s( v(R′(W − c), s′) )^ρ ]^{1/ρ}.

In this expression, we’ll assume the agent holds the market portfolio, and we
write R′ as shorthand for R(s, s′), the market return between states s and s′.12
The degree-one homogeneity of utility implies that v is degree-one homoge-
neous in W—i.e., we can write

v(W, s) = ξ (s)W

for some function ξ. Note that since (W − c) is non-stochastic (conditional on


s),

µ s v ( R 0 (W − c ) , s 0 ) = µ s ξ ( s 0 ) R 0 (W − c )
 

= µ s ξ ( s 0 ) R 0 (W − c )


Thus, the agent’s maximization problem has the simple form:

    max_c [ (1 − β)c^ρ + β µ_s( ξ(s′)R′ )^ρ (W − c)^ρ ]^{1/ρ}.                (4.39)

The problem looks almost static, but the term β µ_s( ξ(s′)R′ )^ρ is capturing the
trade-off between consumption today and wealth tomorrow. The first-order
condition for the problem is

    (1 − β)c^{ρ−1} = β µ_s( ξ(s′)R′ )^ρ (W − c)^{ρ−1}


11 It should be clear that including b, or applying this to the Lucas case of multiple trees, makes
no difference for the form.
12 We’ll briefly discuss portfolio choice below, in section 4.2.5.


which has a solution of the form

    c = κ(s)W                (4.40)

where

    κ(s) ≡ [ 1 + ( β µ_s(ξ(s′)R′)^ρ / (1 − β) )^{1/(1−ρ)} ]^{−1}.
Of use later will be the following relation between κ(s) and µ_s(ξ(s′)R′)^ρ im-
plied by the first-order conditions:

    (1 − κ(s)) / κ(s) = ( β µ_s(ξ(s′)R′)^ρ / (1 − β) )^{1/(1−ρ)}.                (4.41)
Also, substituting c = κ(s)W back into the Bellman equation gives a direct
relationship between ξ and κ. Making this substitution gives us:

    ξ(s)^ρ = (1 − β)κ(s)^ρ + β µ_s(ξ(s′)R′)^ρ (1 − κ(s))^ρ
           = (1 − β)κ(s)^ρ [ 1 + ( β µ_s(ξ(s′)R′)^ρ / (1 − β) ) ( (1 − κ(s))/κ(s) )^ρ ]
           = (1 − β)κ(s)^ρ [ 1 + ( (1 − κ(s))/κ(s) )^{1−ρ} ( (1 − κ(s))/κ(s) )^ρ ]
           = (1 − β)κ(s)^ρ [ 1 + (1 − κ(s))/κ(s) ]
           = (1 − β)κ(s)^{ρ−1}                (4.42)

where the third line uses (4.41).
The equations (4.41) and (4.42), plus the budget constraint (4.38) (with the
equilibrium market return Rt+1 ), and the decision rule (4.40) are all the pieces
we need. We proceed by asking what all these relationships imply for the
growth rate of consumption between states st and st+1 . There are several steps,
at various times utilizing (4.38), and (4.40)–(4.42):
    c_{t+1}/c_t = κ(s_{t+1})W_{t+1} / ( κ(s_t)W_t )
                = κ(s_{t+1}) R_{t+1} (1 − κ(s_t)) W_t / ( κ(s_t)W_t )
                = R_{t+1} κ(s_{t+1}) (1 − κ(s_t)) / κ(s_t)
                = R_{t+1} ( ξ(s_{t+1})^ρ / (1 − β) )^{1/(ρ−1)} ( β µ_{s_t}( ξ(s_{t+1})R_{t+1} )^ρ / (1 − β) )^{1/(1−ρ)}
                = β^{1/(1−ρ)} R_{t+1}^{1/(1−ρ)} ( ξ(s_{t+1})R_{t+1} )^{ρ/(ρ−1)} µ_{s_t}( ξ(s_{t+1})R_{t+1} )^{ρ/(1−ρ)}
                = ( β R_{t+1} )^{1/(1−ρ)} ( ξ(s_{t+1})R_{t+1} / µ_{s_t}(ξ(s_{t+1})R_{t+1}) )^{ρ/(ρ−1)}


Now, rearrange to solve for the terms involving the value function—i.e., the
terms in ξ:

    ξ(s_{t+1})R_{t+1} / µ_{s_t}(ξ(s_{t+1})R_{t+1}) = ( β R_{t+1} )^{1/ρ} ( c_{t+1}/c_t )^{−(1−ρ)/ρ}.                (4.43)
But, note, the term on the left is precisely v_{t+1}/µ_t(v_{t+1}):

    ξ(s_{t+1})R_{t+1} / µ_{s_t}(ξ(s_{t+1})R_{t+1}) = ξ(s_{t+1})R_{t+1}(W_t − c_t) / µ_{s_t}( ξ(s_{t+1})R_{t+1}(W_t − c_t) ) = v_{t+1}/µ_t(v_{t+1}).

Now, just substitute the right-hand side of (4.43) for vt+1 /µt (vt+1 ) in (4.36) and
simplify the resulting expression to obtain (4.37).
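Although the algebra is involved, the equivalence of the two forms is easy to confirm numerically in the Markov-chain setting of section 4.2.2: solve (4.35) for φ, build the value-function form of the SDF, (4.31), price equity via (4.33), and compare with the return form (4.37) state by state. A NumPy sketch with placeholder inputs (the chain below is illustrative, not the Mehra-Prescott calibration):

```python
import numpy as np

# Placeholder two-state chain and parameters (illustrative values only).
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
x = np.array([0.98, 1.03])
beta, rho, alpha = 0.95, 0.5, 5.0

# Step 1: iterate (4.35) to (numerical) convergence to get phi.
phi = np.ones(2)
for _ in range(2000):
    mu = (P @ (phi * x) ** (1 - alpha)) ** (1 / (1 - alpha))
    phi = (1 - beta + beta * mu ** rho) ** (1 / rho)

# Step 2: the value-function form of the SDF, (4.31).
mu = (P @ (phi * x) ** (1 - alpha)) ** (1 / (1 - alpha))
m_vf = beta * x ** (rho - 1) * ((phi * x)[np.newaxis, :] / mu[:, np.newaxis]) ** (1 - alpha - rho)

# Step 3: solve (4.33) for w and build returns R(i,j) = x(j)(1+w(j))/w(i).
A = P * m_vf * x[np.newaxis, :]
w = np.linalg.solve(np.eye(2) - A, A @ np.ones(2))
R = x * (1 + w)[np.newaxis, :] / w[:, np.newaxis]

# Step 4: the return form of the SDF, (4.37).
m_ret = (beta * x[np.newaxis, :] ** (rho - 1)) ** ((1 - alpha) / rho) * R ** ((1 - alpha) / rho - 1)

print(np.max(np.abs(m_vf - m_ret)))  # essentially zero: the two forms coincide
```

The agreement is exact (up to the convergence tolerance on φ), since in equilibrium the value function and the price-dividend ratio are linked through the consumption-wealth ratio.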

4.2.5 A note on portfolio choice in the EZ framework


We assumed at the start of this section that the agent held the market portfo-
lio, but the treatment of portfolio choice in the Epstein-Zin framework is itself
interesting.
Suppose there are many assets, with R′ = (R′_1, R′_2, . . . , R′_n) the vector of re-
turns. One asset, perhaps the first, may be the risk-free asset. Let θ denote
the vector of portfolio weights; as in our treatment of mean-variance portfo-
lio choice, the budget constraint becomes a constraint that the weights sum to
one—θ · 1 = 1. The portfolio return is ∑_i θ_i R′_i = θ · R′.
The agent’s dynamic programming problem becomes:

    v(W, s) = max_{c,θ} { [ (1 − β)c^ρ + β µ_s( v(θ · R′(W − c), s′) )^ρ ]^{1/ρ} : θ · 1 = 1 }

As before, we still have (from the homogeneity of preferences) that v(W, s) =
ξ(s)W. The choice of c still solves a maximization problem of the form (4.39),
given a choice of θ.
Since the aggregator W(c, µ) is increasing in µ, the choice of θ just maxi-
mizes µ_s( ξ(s′)θ · R′ ), that is, θ solves:

    max_θ { µ_s( ξ(s′)θ · R(s, s′) ) : θ · 1 = 1 }.

This problem appears to be essentially a static one, but the normalized value
function ξ (s0 ) encodes information about the marginal value to the agent of
consumption today versus wealth in different states tomorrow.
Taking account of the form of the certainty equivalent operator µ_s, the prob-
lem can be written as

    max_θ { ( E_s[ ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{1−α} ] )^{1/(1−α)} : ∑_i θ_i = 1 }.                (4.44)


The first-order condition for the kth portfolio weight is:

    E_s[ ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{−α} ξ(s′) R_k(s, s′) ] = λ(s)                (4.45)

where λ(s) is the Lagrange multiplier on the constraint ∑_i θ_i = 1. Since this
condition holds for every asset k, we have, for k and h,

    E_s[ ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{−α} ξ(s′) ( R_k(s, s′) − R_h(s, s′) ) ] = 0.

Alternatively, multiply (4.45) by θ_k and sum over all k to get

    λ(s) = E_s[ ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{1−α} ]
         = µ_s( ξ(s′) ∑_i θ_i R_i(s, s′) )^{1−α}

Then, for any asset k,

    E_s[ ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{−α} ξ(s′) R_k(s, s′) / µ_s( ξ(s′) ∑_i θ_i R_i(s, s′) )^{1−α} ] = 1                (4.46)
But then, (4.46) indicates that

    ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{−α} ξ(s′) / µ_s( ξ(s′) ∑_i θ_i R_i(s, s′) )^{1−α}                (4.47)

is a stochastic discount factor. In fact, it is the same stochastic discount fac-
tor as in (4.36) or (4.37), assuming the agent holds the market portfolio—i.e.,
∑_i θ_{i,t} R_{i,t+1} equals the equilibrium equity return R_{t+1}.

Exercize 4.4. Use (4.43) and ∑i θi,t Ri,t+1 = Rt+1 to show that (4.47) is the same
stochastic discount factor described by either (4.36) or (4.37).

Forgetting about asset pricing momentarily, within the framework of the
last two sections, if an agent faces i.i.d. returns, then it’s straightforward to
show that his consumption-wealth ratio κ and marginal utility of wealth ξ must
be constants. In that case, the portfolio problem (4.44) does not depend on ξ
and consists simply in choosing a portfolio with a maximal certainty equivalent
return, µ(∑_i θ_i R_{i,t+1}).13
13 Because returns are i.i.d., there is no dependence of µ on t.


Given that return, the agent chooses consumption today in the same way
he would if he faced a constant certain return of R̄ = µ(∑i θi Ri,t+1 ). Under
certainty, Epstein-Zin preferences are ordinally equivalent to

    ∑_{t=0}^∞ β^t c_t^ρ / ρ.

An agent with those preferences facing a constant rate of return on savings
equal to R̄ would choose a path of consumption satisfying the Euler equations

    c_t^{ρ−1} = β R̄ c_{t+1}^{ρ−1},

which are solved by a decision rule of the form c_t = κW_t, for a constant κ.14
The κ that solves these first-order conditions is given by

    1 − κ = ( β R̄^ρ )^{1/(1−ρ)}.

This is the same κ one obtains from combining (4.41) and (4.42) under the
assumption of i.i.d. returns.
If there is one risky asset—so R̄ = µ( Rt+1 )—then a mean-preserving spread
of the distribution of R_{t+1} lowers µ(R_{t+1}). This can either increase or decrease
the agent’s savings rate (1 − κ), depending on whether ρ/(1 − ρ) is negative or
positive. Recalling that ρ = 1 − 1/EIS, an increase in rate-of-return uncertainty,
in an i.i.d. world, raises the agent’s savings rate only when his EIS < 1, and
lowers it when his EIS > 1.
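A quick numerical check of this comparative static, using the formula 1 − κ = (βR̄^ρ)^{1/(1−ρ)} with R̄ the certainty equivalent of an i.i.d. two-point return (all numbers below are arbitrary illustrations, not values from the text):

```python
import numpy as np

def savings_rate(beta, rho, alpha, R_vals, probs):
    """Savings rate 1 - kappa = (beta * Rbar^rho)^(1/(1-rho)), with
    Rbar = mu(R) the certainty equivalent of the i.i.d. return."""
    Rbar = (probs @ np.asarray(R_vals, float) ** (1 - alpha)) ** (1 / (1 - alpha))
    return (beta * Rbar ** rho) ** (1 / (1 - rho))

p = np.array([0.5, 0.5])
safe = [1.05, 1.05]                # a sure return
risky = [0.95, 1.15]               # mean-preserving spread of the same mean
for rho in (0.5, -1.0):            # EIS = 1/(1-rho): 2 and 0.5
    s_safe = savings_rate(0.96, rho, 5.0, safe, p)
    s_risky = savings_rate(0.96, rho, 5.0, risky, p)
    # The spread lowers the certainty-equivalent return, so the effect on
    # saving takes the sign of rho/(1-rho).
    print(rho, s_risky > s_safe)
```

With ρ = −1 (EIS = 0.5) the spread raises saving; with ρ = 0.5 (EIS = 2) it lowers it.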
This result is worth keeping in mind when we get to the long-run risk
model of Bansal and Yaron [BY04]—an EIS greater than one is critical to their
results. This property of Epstein-Zin preferences—and others—were shown
by Weil in his Q.J.E. paper [Wei90].

4.3 State-dependent preferences: More on Melino and Yang (2003)
In section 3.4, we saw Melino and Yang’s [MY03] argument for some form of
state-dependent preferences as a way of matching the first and second mo-
ments of asset returns in the simple two-state Mehra-Prescott model. In this
section, we’ll look at the form of state-dependence they studied.
Their approach is basically to take the Epstein-Zin preferences described
by (4.23) and (4.24), and allow the three parameters ρ, β and α to vary with the
aggregate state—i.e., letting ρ = ρ(s), β = β(s) and α = α(s). In the context of
the Mehra-Prescott model, each parameter (potentially) takes on two values,
depending on whether consumption growth is low, x(1), or high, x(2).
They experiment with allowing one, two or all three of the parameters to
vary with the growth state. Allowing two of the parameters to depend on
14 You can verify this by plugging in κWt for ct and κ R̄(1 − κ )Wt for ct+1 .


x (while the third is a constant) gives exactly the degrees of freedom neces-
sary to make the model’s stochastic discount factor consistent with the return
process—(3.25)—that Melino and Yang derived as consistent with the first two
moments of asset returns data, given the two-state Mehra-Prescott consump-
tion process.15 But, as Melino and Yang show, just having the necessary de-
grees of freedom doesn’t mean the resulting state-dependent parameters will
satisfy natural requirements like ρ ≤ 1 or α > 0.
If only one parameter is allowed to vary with the state, they have to choose
what to target; in these cases, they seek values of the state-dependent pa-
rameters that are consistent with the pricing relation for the equity return,
Et (mt+1 Rt+1 ) = 1, then see what those parameters imply for the values taken
on by the risk-free rate. When all three parameters are allowed to vary with the
state, they have an extra degree of freedom, and there are consequently many
ways to match the data.
Their notation is almost identical to ours, though they use α for our 1 − α,
and denote the consumption growth rate (our x) by g. The price-dividend ratio
we have called w, they denote by P—i.e., whereas we have written the equity
return from state i to state j as R(i, j) = x ( j)(1 + w( j))/w(i ), they would write
g( j)(1 + P( j))/P(i ). One major difference, though—and this becomes appar-
ent if you try to derive the form of their stochastic discount factor within our
model of Epstein-Zin preferences—is that whereas we write the CES aggrega-
tor W(c, µ) as

    W(c, µ) = [ (1 − β)c^ρ + βµ^ρ ]^{1/ρ},

they write

    W(c, µ) = [ c^ρ + βµ^ρ ]^{1/ρ}.
With a constant utility discount factor, these preferences just differ by a factor
of proportionality—they imply the same marginal rates of substitution, hence
are equivalent representations. This is not the case when β can vary with the
state.
The key expression is their equation (6.9), which describes the stochastic
discount factor when all three parameters are allowed to vary with the state.
Translated into our notation, and with our form for the aggregator, it reads:

    m_{t+1} = β(s_t) x_{t+1}^{−α(s_t)} ( w_t / β(s_t) )^{1 − (1−α(s_t))/ρ(s_t)} ( 1 + w_{t+1} )^{(1−α(s_t))/ρ(s_{t+1}) − 1}
              × ( (1 − β(s_{t+1}))^{1/ρ(s_{t+1})} / (1 − β(s_t))^{1/ρ(s_t)} )^{1−α(s_t)}                (4.48)
The last piece—the one involving β(st ) and β(st+1 )—is one that comes about
because of the form we use for the aggregator. Setting that last term equal to
one gives the pricing kernel studied by Melino and Yang.
15 Alternatively, they have the degrees of freedom to equate the model’s stochastic discount factor

to the m̂ you derived in exercize 3.5.


How do they get this? If none of the parameters vary with the state, (4.48) is just a
version of our (4.37)—simply plug in xt+1 for ct+1 /ct , xt+1 (1 + wt+1 )/wt for Rt+1 , and
rearrange.
When the parameters vary, there is another route, that relies on an equilibrium re-
lationship between the consumption-wealth ratio—the κ of the last section—and the
price-dividend ratio w. At the start of the section 4.2.4, we defined the agent’s wealth as
zt ( pt + yt ). In equilibrium, with z = 1 and c = y, the consumption-wealth ratio κ obeys
    κ_t = c_t / W_t = y_t / (p_t + y_t) = 1 / (p_t/y_t + 1) = 1 / (w_t + 1)
If you begin with the version (4.47) of the stochastic discount factor, taking account of
the dependence of the preference parameters on the state—that is, begin with
    m_{t+1} = ξ_{t+1}^{1−α(s_t)} R_{t+1}^{−α(s_t)} / µ_t( ξ_{t+1} R_{t+1} )^{1−α(s_t)}

—and (a) use (4.41) to replace the µt (ξ t+1 Rt+1 ) with an expression in wt , ρ(st ), and
β(st ); (b) use (4.42) to replace the ξ t+1 with an expression in wt+1 , ρ(st+1 ), and β(st+1 );
and (c) use xt+1 (1 + wt+1 )/wt to replace Rt+1 , and re-arrange some terms, you should
obtain (4.48).

In terms of the Markov chain representation for the evolution of the
state, we can re-write (4.48) as

    m(i, j) = β(i) x(j)^{−α(i)} ( w(i) / β(i) )^{1 − (1−α(i))/ρ(i)} ( 1 + w(j) )^{(1−α(i))/ρ(j) − 1}
              × ( (1 − β(j))^{1/ρ(j)} / (1 − β(i))^{1/ρ(i)} )^{1−α(i)}.                (4.49)

To perform the sort of computational experiments that Melino and Yang


perform, one would plug the Mehra-Prescott growth rates ( x ) and the Melino-
Yang price-dividend ratios (w) into (4.49). The Melino-Yang price-dividend
ratios are (see the Pl and Ph they settle on toward the bottom of page 810):
   
    w(1) = 23.467,   w(2) = 27.839


Also, plug in parameter values for any of the taste parameters that are not going
to be state-varying. Then, seek values of the state-dependent parameters to try
to satisfy one or more of the four pricing relations
    ∑_{j=1}^2 P(i, j) m(i, j) R̂(i, j) = 1                (4.50)

    ∑_{j=1}^2 P(i, j) m(i, j) = 1/R̂_F(i)                (4.51)

where P is the Mehra-Prescott transition matrix and R̂ and R̂ F are the Melino-
Yang returns (3.25).
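A sketch of the mechanics in NumPy (the growth rates, transition matrix, and returns fed to these functions are placeholders for illustration; the actual Mehra-Prescott P and the Melino-Yang returns R̂, R̂_F from (3.25) are not reproduced here):

```python
import numpy as np

def my_sdf(i, j, x, w, beta, rho, alpha):
    """Melino-Yang style SDF (4.49); beta, rho, alpha are length-2 arrays of
    (possibly) state-dependent parameters, x growth rates, w price-dividend ratios."""
    return (beta[i] * x[j] ** (-alpha[i])
            * (w[i] / beta[i]) ** (1 - (1 - alpha[i]) / rho[i])
            * (1 + w[j]) ** ((1 - alpha[i]) / rho[j] - 1)
            * ((1 - beta[j]) ** (1 / rho[j]) / (1 - beta[i]) ** (1 / rho[i])) ** (1 - alpha[i]))

x = np.array([0.982, 1.054])        # placeholder growth rates
w = np.array([23.467, 27.839])      # the Melino-Yang price-dividend ratios

def residuals(P, Rhat, RhatF, beta, rho, alpha):
    """Residuals of the pricing relations (4.50)-(4.51), given returns Rhat, RhatF."""
    m = np.array([[my_sdf(i, j, x, w, beta, rho, alpha) for j in range(2)] for i in range(2)])
    eq = (P * m * Rhat).sum(axis=1) - 1.0            # (4.50)
    rf = (P * m).sum(axis=1) - 1.0 / RhatF           # (4.51)
    return eq, rf

# Sanity check: with state-independent parameters and alpha = 1 - rho,
# (4.49) collapses to the standard m(i,j) = beta * x(j)^(-alpha).
const = np.full(2, 0.95), np.full(2, 0.5), np.full(2, 0.5)
print(my_sdf(0, 1, x, w, *const))
```

One would then search over the state-dependent parameters until the residuals (or the subset being targeted) are zero.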
Melino and Yang consider several combinations of state-dependence in one
or more parameters, while keeping the other(s) constant. The cases they report
results for are:
1. β and ρ fixed, α state-dependent.
2. β and α fixed, ρ state-dependent.
3. ρ fixed, α and β state-dependent.
4. β fixed, α and ρ state-dependent.
5. All three parameters state-dependent.
In (1) and (2) they seek values to satisfy (4.50), then check what those values
imply for the risk-free rate—i.e., they check how badly they miss on (4.51). In
(3) and (4), they can hit both sets of targets, but not necessarily with plausible
parameter values.
Probably their most striking finding is that countercyclical risk aversion
alone—that is, with constant ρ and β—doesn’t help: there’s no real improve-
ment over what you could get with all parameters constant. A procyclical
elasticity of intertemporal substitution allows you to match first moments of
returns, keeping the other parameters fixed. But, countercyclical risk aversion
together with a slightly procyclical willingness to substitute intertemporally
allows you to match both first and second moments of the returns data.
Since the risk-free rate is countercyclical—in the data as well as in their R̂_F—
it’s not surprising that we get a procyclical EIS: in bad times agents must be
demanding more compensation to substitute consumption over time. Not sur-
prisingly, given the risk-neutral probabilities implied by their returns R̂ and R̂_F,
the risk aversion parameter α alternates between extreme risk aversion and ap-
proximate risk neutrality. For a constant utility discount factor of β = 0.98, for
example, they match first and second moments exactly with risk aversion alter-
nating between α ≈ 23 and α ≈ 0, while ρ moves slightly between about −1.98
and −2.10—i.e., the agent’s EIS alternates between roughly 0.34 and 0.32.16
16 And, in a way, that’s the truly striking part. Just countercyclical risk aversion?—no real gain.

Countercyclical risk aversion plus an almost imperceptible change in the EIS?—Bingo.


4.4 Preferences displaying first-order risk aversion


Habits and Epstein-Zin preferences both relax the state separability character-
istic of the standard time-additive, expected-utility preferences used by Mehra
and Prescott (and in most RBC/DSGE models). For timeless gambles, they still
conform to the axioms of expected utility (in particular, state separability). The
Melino-Yang preferences are a bit harder to categorize, but conditional on the
state of the economy, they are simply EZ preferences, and thus conform to ex-
pected utility as well (again, conditional on the state). The preferences in this
section and the next, each, in different ways, dispense with the independence
axiom, and thus depart from expected utility, even for timeless gambles.17
The first form we’ll look at goes by different names—‘first order risk aver-
sion’, ‘rank dependent expected utility’, ‘expected utility with rank dependent
probabilities’, ‘anticipated utility’, or ‘Yaari’ preferences, to name a few. We’ll
refer to them as first-order risk averse, or FORA, preferences, though bear in
mind that’s a bit vague (the disappointment aversion preferences of the next
section also display first-order risk aversion).
It’s probably easiest to describe what they are, and then talk about why they
are what they are. FORA risk preferences were first applied to the equity pre-
mium puzzle by Epstein and Zin [EZ90]. In terms of functional structure, they
are like standard EZ preferences—a CES aggregator plus a certainty equivalent
operator. What’s different is the form of the certainty equivalent operator.

4.4.1 FORA: The ‘what’


Suppose φ follows an n-state Markov chain with transition probabilities P, and
importantly, suppose that states are ordered such that
φ(1) ≤ φ(2) ≤ φ(3) ≤ · · · ≤ φ(n).
Then, we define the certainty equivalent, for α ≥ 0, by:
µ̂_i(φ) = [ ∑_{j=1}^{n} G_P(i,j) φ(j)^{1−α} ]^{1/(1−α)},    (4.52)

where the GP (i, j) are defined, for γ ∈ (0, 1], by:


G_P(i,1) = P(i,1)^γ,  (j = 1)    (4.53)

G_P(i,j) = [ ∑_{h=1}^{j} P(i,h) ]^γ − [ ∑_{h=1}^{j−1} P(i,h) ]^γ,  (j = 2, 3, . . . , n).    (4.54)

Thus, for example, in the two-state case, we have

G_P(i,1) = P(i,1)^γ
G_P(i,2) = 1 − P(i,1)^γ
17 The best reference work on issues concerning choice under uncertainty is, to my mind, Kreps's
'underground classic', Notes on the Theory of Choice [Kre88].


Note that the G_P(i,j) have all the properties of Markov chain probabilities:
G_P(i,j) ≥ 0 for all i and j, and ∑_j G_P(i,j) = 1 for all i. Thinking of them as
probabilities, note that their 'CDF' obeys:

Pr{φ_{t+1} ≤ φ(j) | φ_t = φ(i)} = ∑_{k=1}^{j} G_P(i,k)
                                = [ ∑_{k=1}^{j} P(i,k) ]^γ
                                ≥ ∑_{k=1}^{j} P(i,k).

The last line follows from the fact that when γ ∈ (0,1], r^γ ≥ r for all r ∈ [0,1].
In a sense, G_P gives more weight to the lower-valued outcomes than does
the true probability P. This is most apparent, again, in the two-state case. Sup-
pose that P(i,1) = P(i,2) = 1/2, and that γ = 0.9. Then

( G_P(i,1), G_P(i,2) ) = ( (1/2)^{0.9}, 1 − (1/2)^{0.9} ) ≈ (0.54, 0.46).
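The distorted weights are mechanical to compute. Here is a small Python sketch (the notes' computational examples are in MATLAB; the function name and the matrix below are mine, purely illustrative) that builds G_P row-by-row from a transition matrix whose states are already ordered worst-to-best, and reproduces the two-state example:

```python
import numpy as np

def distorted_weights(P, gamma):
    """G_P(i,j) = F(i,j)^gamma - F(i,j-1)^gamma, where F(i,j) is the
    conditional CDF sum_{h<=j} P(i,h); states assumed ordered worst-to-best."""
    F = np.cumsum(P, axis=1)
    padded = np.concatenate([np.zeros((P.shape[0], 1)), F ** gamma], axis=1)
    return np.diff(padded, axis=1)

P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
G = distorted_weights(P, gamma=0.9)
print(G[0])   # roughly [0.536, 0.464]: extra weight on the worst state
```

Each row of G still sums to one, and the distorted 'CDF' lies everywhere above the true one, which is exactly the pessimistic shift described in the text.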

In terms of asset-pricing, everything works here as in our treatment of the


basic Epstein-Zin model—we just replace the original µ( · ) with the new µ̂( · ),
and all the expectations that the agent takes with expectations using the ‘prob-
abilities’ GP .18
Let Ê denote expectation taken with respect to GP . The key pricing equation
becomes (for any asset return R)
Ê_t [ β (c_{t+1}/c_t)^{−1/ψ} ( v_{t+1} / µ̂_t(v_{t+1}) )^{(1/ψ)−α} R_{t+1} ] = 1    (4.55)

The rank-ordering of the outcomes is based on the values of vt+1 —the Markov
states need to be ordered so that state 1 has the lowest value of vt+1 and state
n has the highest.
Because the aggregator and certainty equivalent are still homogeneous of
degree one, we get all the useful homogeneity properties we had before—vt
still equals Φ( xt )ct in equilibrium when consumption follows a Mehra-Prescott-
type process, and the price of a consumption claim still has the form p( xt , ct ) =
w( xt )ct .
If you’ve written good code to solve the Mehra-Prescott model with Epstein-
Zin preferences, it’s easy to modify that code to solve the model with FORA
preferences. An exercise below will ask you to do just that.
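For concreteness, here is one way that modification might look as a Python sketch of the Mehra-Prescott-style fixed point (the notation is mine, not the notes': P is the transition matrix, lam the vector of gross consumption growth rates, and Phi the value-consumption ratio, so v_t = Phi(x_t)c_t). The only genuinely new step relative to the Epstein-Zin version is re-ranking the states by continuation value each iteration before building the distorted weights:

```python
import numpy as np

def solve_fora_phi(P, lam, beta, psi, alpha, gamma, tol=1e-10, max_iter=20_000):
    """Iterate on Phi(i) = [(1-beta) + beta*mu_i^rho]^(1/rho), rho = 1 - 1/psi,
    where mu_i is the FORA certainty equivalent of next period's Phi(j)*lam(j)."""
    rho = 1.0 - 1.0 / psi
    phi = np.ones(len(lam))
    for _ in range(max_iter):
        v_next = phi * lam                 # continuation value in each successor state
        order = np.argsort(v_next)         # rank states: worst outcome first
        mu = np.empty(len(lam))
        for i in range(len(lam)):
            F = np.cumsum(P[i, order])     # conditional CDF in rank order
            g = np.diff(np.concatenate(([0.0], F ** gamma)))   # distorted weights
            mu[i] = (g @ v_next[order] ** (1 - alpha)) ** (1.0 / (1 - alpha))
        phi_new = ((1 - beta) + beta * mu ** rho) ** (1.0 / rho)
        if np.max(np.abs(phi_new - phi)) < tol:
            return phi_new
        phi = phi_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Setting gamma=1 makes the weights collapse to the true transition probabilities, so the function then reproduces the Epstein-Zin solution, which is a handy check on the code.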
18 And note well that when we get to the steps where we’re calculating expectations—e.g., the

expected return on equity—we use the true probability P, not GP .


4.4.2 FORA: The ‘why’


At first glance, FORA preferences may seem a bit ad hoc—we’ve basically just
made the agent more pessimistic about the evolution of the state of the econ-
omy than is really the case. But, these preferences actually have a very good
theoretical pedigree. First, though, let’s understand what’s meant by ‘first-
order’ risk aversion.
The ‘first-order’ in the name refers to the risk premia these preferences
imply—that is, the compensation an agent requires to be indifferent between
a risky consumption outcome and a certain one. With standard expected util-
ity preferences—as we saw back in equation (4.18)—the risk premium asso-
ciated with a small gamble is proportional to the gamble’s variance (which is
second-order small); those are ‘second-order risk averse’ preferences. For gam-
bles with a small standard deviation, σ2 is much smaller than σ—consumption
growth, for example, has a standard deviation around 0.03 and a variance
around 0.0009. In fact, as σ gets small, σ^2 vanishes much faster than σ, so expected
utility agents are approximately risk neutral for sufficiently small gambles.19
Yet, evidence suggests that people are non-negligibly averse to small gam-
bles.20 Getting expected utility maximizing agents to care about small risks
requires extreme amounts of curvature in their utility functions. In contrast,
preferences that display ‘first order’ risk aversion generate risk premia that
(for small gambles) are proportional to gambles’ standard deviations—so risk
premia decline linearly with the size of the gamble, rather than quadratically.
A first-order risk averse agent might think about extending the warranty
on his refrigerator. An expected utility agent never would—or, if he would do
that, then he would necessarily behave in a wildly risk averse way with respect
to sizeable gambles.21
Epstein and Zin’s specification of first-order risk aversion is based on the
non-expected utility formulations of Yaari [Yaa87] and Quiggin [Qui82]. Risk
preferences of this sort can be derived under various sets of axioms (see Wakker
[Wak94], and the references therein). A key feature of these preferences—like
many other alternatives to expected utility—is that they are non-linear in prob-
abilities, hence will violate the independence axiom underlying expected util-
ity. Among the aims of the authors who originally formulated risk preferences
of this form was to elaborate models of choice under risk capable of rationaliz-
ing the apparent fact that individuals often make choices that are inconsistent
with the independence axiom—for example, the Allais paradox, or the com-
19 The distinction between first-order and second-order risk aversion was first made by Segal

and Spivak [SS90].


20 Not everyone agrees on the evidence, but people do routinely insure against small risks—e.g.,

paying $50 or so to extend the warranty on a $500 refrigerator. More seriously, see, for example,
the application in Bernasconi [Ber98], which uses first-order risk aversion to rationalize what ap-
pear (from an EU perspective) to be puzzlingly high rates of tax compliance in most developed
economies.
21 This point—which we discuss in more detail below—is formalized in Rabin’s [Rab00] ‘calibra-

tion theorem’. See also [SS08].


mon ratio effect documented by Kahneman and Tversky [KT79].22

The Allais Paradox is the following observation about choices individuals


typically make. Consider the following choice problems (parameters from
Kahneman and Tversky [KT79]):
(P1) Choose between:
A $2,500 with probability 0.33, $2,400 with probability 0.66, $0
with probability 0.01.
B $2,400 with probability 1.
(P2) Choose between:
C $2,500 with probability 0.33, $0 with probability 0.67.
D $2,400 with probability 0.34, $0 with probability 0.66.
Most people choose B in problem P1 and C in problem P2, in violation of
expected utility.
The Common Ratio Effect is the following observation about the choices peo-
ple typically make. Again, consider the following two choice problems
(parameters again from Kahneman and Tversky):
(P1) Choose between:
A $4,000 with probability 0.80, $0 with probability 0.20.
B $3,000 with probability 1.
(P2) Choose between:
C $4,000 with probability 0.20, $0 with probability 0.80.
D $3,000 with probability 0.25, $0 with probability 0.75.

Again, most people choose B in problem P1 and C in problem P2, in viola-


tion of expected utility.

When α ≠ 0 in (4.52) and γ ≠ 1 in (4.53) and (4.54), µ̂ incorporates aspects of
preferences featuring both first-order risk aversion and more standard constant
relative risk aversion (CRRA). If γ = 1, we are in the case of basic Epstein-Zin
preferences, while if α = 0 we have risk preferences that match the formulation
of Yaari. Of course, if γ = 1 and α = 1/ψ, we obtain the case of time-additively
separable expected utility.
The fact that risk preferences of this form are non-linear in probabilities
gives them another attractive feature: the ability to at least partially divorce
agents’ attitudes towards risk from their attitudes towards wealth.23 Under
expected utility, aversion to risk is equivalent to diminishing marginal util-
ity of wealth, and the intimate connection between the two concepts has been
22 Starmer [Sta00] is an excellent recent survey of this literature.
23 As Yaari [Yaa87] puts it:“At the level of fundamental principles, risk aversion and diminishing
marginal utility of wealth, which are synonymous under expected utility, are horses of different
colors.” In Yaari’s theory the divorce of the two concepts is complete.


shown to be problematic for the EU model. For example, Chetty [Che06] has
shown that estimates of labor supply elasticity (and the degree of complemen-
tarity between consumption and leisure) can put sharp bounds on admissible
coefficients of relative risk aversion, since both values are linked to the curva-
ture of agents’ von Neumann-Morgenstern utilities over consumption. Chetty
finds that the mean coefficient of relative risk aversion implied by 33 studies of
labor supply elasticity is roughly unity, which would mean that the EU model
is incapable of rationalizing both observed labor supply behavior and the de-
grees of risk aversion observed in many risky choice settings, many of which
imply double-digit coefficients of relative risk aversion.
As we hinted at above, another attractive feature of FORA preferences is
the fact that they can be parametrized to give a reasonable amount of risk aver-
sion for both large and small gambles. This is in contrast to the standard ex-
pected utility specification. In the CRRA class, for example, if the coefficient of
risk aversion is calibrated so that an agent with those preferences gives plau-
sible answers to questions about large gambles, the agent will be roughly risk
neutral for small gambles. If, on the other hand, the coefficient of risk aver-
sion is set sufficiently large that the agent gives plausible answers to questions
about small gambles, he will appear extremely risk averse when confronted
with large gambles.24
One way to visualize the approximate risk neutrality of the standard ex-
pected utility specification with constant relative risk aversion is to note that
it’s “smooth at certainty”—the agent’s indifference curves between consump-
tion in different states of nature are smooth and tangent (at the certainty point)
to the indifference curves of a risk neutral agent. This is true for EU with any
differentiable von Neumann-Morgenstern utility function.
FORA preferences introduce a kink into agents’ indifference curves at the
certainty point; the kink is what allows for a plausible calibration of risk aver-
sion for small gambles.25 The parameter γ—which makes outcome rankings
matter—is the source of the kink. The parameter α, analogous to the risk aver-
sion coefficient in CRRA preferences, governs curvature away from the cer-
tainty point and allows for a plausible calibration of risk aversion for large
gambles.
24 This point was made formally by Rabin [Rab00], though EU preferences aren’t the only form

susceptible to this critique. Indeed, as Safra and Segal show in a recent paper [SS08], almost all
common alternatives to expected utility are susceptible to this criticism. The one exception noted
by Safra and Segal is Yaari’s dual theory of choice under risk, which is the special case of (4.52)
when α = 0. A useful perspective is offered by Palacios-Huerta, Serrano and Volij [PSV04]: “[I]t
is more useful not to argue whether expected utility is literally true (we know that it is not, since
many violations of its underpinning axioms have been exhibited). Rather, one should insist on the
identification of a useful range of empirical applications where expected utility is a useful model
to approximate, explain, and predict behavior.”
25 See figure 1 in [EZ90]. The “disappointment aversion” preferences used by Routledge and Zin

[RZ10] and Campanale et al. [CCC10] share this feature.


4.4.3 FORA: Some results


Blah blah.

4.5 Models with disappointment aversion


This is a promising approach that, unfortunately, we won’t have time to go
over in class. The key papers are: Gul [Gul91], for the axiomatic background;
Routledge and Zin [RZ10], for the asset-pricing application; and Campanale,
Castro and Clementi [CCC10], for situating preferences of this form in a production
economy.

Lecture 5

Responses to the Equity Premium Puzzle, II:
Modifying the Consumption Process

The models of the last lecture all modified, in some way, the preferences of
Mehra and Prescott’s representative agent.1 This lecture will focus on two
models that alter the consumption process faced by the representative agent—
Bansal and Yaron’s [BY04] ‘long run risk’ approach, and the ‘rare disasters’
model, originally due to Rietz [Rie88], but lately revived by Barro [Bar06], Gou-
rio [Gou08], and Gabaix [Gab08].
Your first thought might be—“The consumption process is whatever it is
in the data; you can’t just plug in another one.” That would be true if the
data spoke definitively on the subject, but, given a limited number of observa-
tions, the data may not be sufficient to discriminate between alternatives that,
while close to one another in some statistical sense, have dramatically different
implications for the behavior of economic models. Is the distribution of log
consumption growth rates better described by a normal distribution or by a
distribution with fatter tails? Whether a consumption disaster is likely to oc-
cur once every couple hundred years or a couple times every hundred years is
difficult to decide with just 100 years’ worth of data, but makes an enormous
difference for pricing claims to aggregate consumption.
Looking ahead to Bansal and Yaron’s long-run risk, it’s difficult to distin-
guish between a consumption process with log differences that are i.i.d. about a
1 Unfortunately, we didn't have time to look at models that dispense with the representative
agent altogether—for example, [Guv09]. Our treatment was also far from exhaustive, omitting the
interesting work on disappointment aversion by Routledge and Zin [RZ10] or Campanale et al.
[CCC10].


constant mean, and a process with i.i.d. fluctuations about a conditional mean
subject to very small but very persistent fluctuations. Figure 5.1 plots some ar-
tificial data I created in MATLAB. In one of the series in the top panel I used
a constant mean growth rate, in the other a fluctuating conditional mean. The
fluctuating conditional mean—the difference between the two series in the top
panel—is shown in the lower panel. The parameters are Bansal and Yaron’s,
so the standard deviation of the innovations to the conditional mean is very
small compared to the innovations around the conditional mean.

Figure 5.1: Simulated data using parameters from Bansal and Yaron's con-
sumption process. Top panel shows log growth rates with and without long-
run risk component. Bottom panel shows long-run risk component.

The i.i.d. innovations swamp the innovations to the conditional mean, making it difficult


to see much difference between the series. The Bansal-Yaron parameters are
calibrated for monthly data, so the 120 observations in the figure correspond
to ten years’ worth of data. Given enough data, the difference between the two
processes is readily apparent in the level of consumption.
Figure 5.2 plots the log-levels of consumption—i.e., cumulated log growth
rates—for series with and without a long-run risk component. To highlight
the differences, in this case I've plotted simulated series of 10,000 observations
each—about 800 years' worth.

Figure 5.2: Simulated data using parameters from Bansal and Yaron's con-
sumption process. The figure shows cumulated log growth rates for series
with and without a long-run risk component.

The differences are clear even over shorter subsamples, but I've chosen to
show 10,000 periods just because anything shorter
might look like I’m cherry-picking the data. And, it’s certainly not movements
at that low a frequency that are driving the difference between the asset pric-
ing implications with and without long-run risk. In an exercise to follow, you'll
be asked to solve a version of the Bansal-Yaron model. That solution will in-
volve iterating on the representative agent’s (equilibrium) value function. By
checking how many iterations it takes for the iterates of the value function to
converge to within a reasonably small tolerance of one another—say 10^{−7}—
you’ll see that the agent effectively looks into the future far fewer than 10, 000
periods.2
2 To be sure, technically the agent looks over the whole infinite horizon; practically speaking,

though, anything beyond 150 or so periods is discounted so heavily that it has only a negligible
impact on the agent’s utility.


5.1 Long-run risk in consumption growth: Bansal


and Yaron (2004)
5.1.1 Some context: Explaining movements in price-dividend
ratios
Let’s begin with a little bit in the way of context, though. Long-run risk emerged
as a potential resolution of the equity premium puzzle only after movements
in the long-run growth rate of dividends were suggested as an explanation for
the apparent excessive volatility of price-dividend ratios.
That ‘excess volatility’ literature begins in the early 1980s, with the papers
of Shiller [Shi81] and Leroy and Porter [LP81]. It was initially framed in terms
of the volatility of stock prices relative to the present discounted value of divi-
dends. While that initial approach suffered from some econometric flaws that
were quickly pointed out, it’s still worth a quick look, since its descendants still
frame current discussions.3
Shiller’s paper was probably the more influential of the two. Shiller argued
that stock prices should be expectations of what he called their ‘ex post rational
value’, or pt = Et ( p∗t ). His ‘ex post rational value’ was the discounted sum of
actual future dividend payments, discounted at a constant rate γ:

p∗_t = ∑_{j=1}^{∞} γ^j d_{t+j}    (5.1)

Note that the dt+ j ’s represent actual, realized dividends.


Then, if pt = Et ( p∗t ), it follows that we can remove the expectation and
write
p∗t = pt + ut (5.2)
where ut is orthogonal to pt . If we take the variance of both sides of (5.2), we
obtain
var( p∗t ) = var( pt ) + var(ut )
since pt and ut are orthogonal. Since variance is always nonnegative, we obtain
the variance bound
var( p∗t ) ≥ var( pt ) (5.3)
which states that the variance of the stock price cannot exceed the variance of
the ‘ex post rational price’.
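The logic of the bound is easy to see in a simulation. Everything below (the AR(1) dividend process, the parameter values, the truncation length) is my own illustrative setup, not Shiller's: when the price really is the conditional expectation of the discounted dividend stream, the sample variance of p* exceeds that of p.

```python
import numpy as np

rng = np.random.default_rng(0)
gam, rho_d, dbar, T, J = 0.95, 0.9, 10.0, 6000, 400

# AR(1) dividends: d_{t+1} = dbar + rho_d*(d_t - dbar) + eps_{t+1}
d = np.empty(T + J)
d[0] = dbar
for t in range(T + J - 1):
    d[t + 1] = dbar + rho_d * (d[t] - dbar) + rng.standard_normal()

# p_t = E_t sum_{j>=1} gam^j d_{t+j}, in closed form for the AR(1)
p = gam * dbar / (1 - gam) + gam * rho_d * (d[:T] - dbar) / (1 - gam * rho_d)

# p*_t: discounted sum of *realized* dividends, truncated at J terms
w = gam ** np.arange(1, J + 1)
pstar = np.array([w @ d[t + 1 : t + J + 1] for t in range(T)])

print(pstar.var() > p.var())   # True: var(p*) = var(p) + var(u) >= var(p)
```

Shiller's point, of course, was that in the data the inequality appears to run the other way.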
Shiller then constructed an empirical counterpart to p∗t using data on ac-
tual dividends, an assumption about γ, and approximating the infinite series
in (5.1) with a truncated one. He then compared detrended pt and p∗t and con-
cluded that p_t was much more volatile than p∗_t, in apparent violation of (5.3).
See Figure 5.3, which reproduces Figure 1 from Shiller's paper.
3 Leroy’s Palgrave entry on “Excess Volatility Tests” [Ler08] gives a nice history of this literature.

A version of this short, informative paper is available from Leroy’s website, here http://www.
econ.ucsb.edu/~sleroy/downloads/excess.pdf.


Figure 5.3: Figure taken from Shiller [Shi81]. (Shiller's notes: real Standard and
Poor's Composite Stock Price Index, solid line p, and ex post rational price, dotted
line p*, 1871-1979; his second panel shows the real modified Dow Jones Industrial
Average, 1928-1979; both detrended by dividing by a long-run exponential growth
factor.)

While this approach had some econometric problems associated with it—
having to do with the possible nonstationarity of both prices and dividends—
as well as some economic problems,4 the point still stuck. Eventually, the idea
of excess price volatility was formulated in a more robust way as a question of
what explains the volatility of price-dividend ratios.

Campbell and Shiller [CS88] provided a useful framework—referred to now
as the Campbell-Shiller approximation—for thinking about the sources of volatil-
ity of price-dividend ratios. They start with an identity based on the definition
of an ex post return on a stock (or market index of stocks):

R_{t+1} = ( d_{t+1}/d_t ) ( 1 + p_{t+1}/d_{t+1} ) / ( p_t/d_t ).

They then take a Taylor series approximation of this identity to get

r_{t+1} = κ_0 + g_{t+1} + κ_1 z_{t+1} − z_t    (5.4)

where g_{t+1} = log(d_{t+1}/d_t) and r_{t+1} = log(R_{t+1}) are the continuous divi-
dend growth rate and continuous return, and z_t is the log price-dividend ratio
log(p_t/d_t). The coefficients κ_0 and κ_1 > 0 come out of the Taylor series approx-
imation, and involve the means of r, z and g.
As it stands, (5.4) has no economic content—it’s an approximation of an
identity that must hold ex post. One can give it economic content by turning
4 What model predicts a constant-discounted, expected present value as the price of an asset?

Oddly, Shiller’s paper came out after Lucas’s 1978 paper, though was apparently not at all informed
by it.


it around, applying rational expectations, and treating it as determining to-


day’s price-dividend ratio given expectations of tomorrow’s return, dividend
growth, and price-dividend ratio. Recursively substituting for the future price-
dividend ratio gives an expression relating today’s price-dividend ratio to the
expected value of all future returns and dividend growth rates:

z_t = E_t[ κ_0 + g_{t+1} − r_{t+1} + κ_1 z_{t+1} ]
    = κ_0/(1 − κ_1) + E_t [ ∑_{j=1}^{∞} κ_1^{j−1} ( g_{t+j} − r_{t+j} ) ]    (5.5)

The expression (5.5) reveals two sources of variation in an asset or portfo-


lio’s price-dividend ratio, changes in expected dividend growth rates or changes
in expected returns. For example, a little manipulation of (5.5) shows that
zt+1 − zt is given by

z_{t+1} − z_t = − (1/κ_1) E_t( g_{t+1} − r_{t+1} )
              + ( (1 − κ_1)/κ_1 ) ( z_t − κ_0/(1 − κ_1) )
              + ( E_{t+1} − E_t ) [ ∑_{j=2}^{∞} κ_1^{j−1} ( g_{t+j} − r_{t+j} ) ]

I think it’s fair to say that for the past two decades, the assumption has
been that most of the action here had to come from changing expectations of
returns—if g_{t+1} is i.i.d., or close to it, then (E_{t+1} − E_t)[ ∑_{j=2}^{∞} κ_1^{j−1} g_{t+j} ] ≈ 0.
This is the perspective, for example, of Campbell and Cochrane [CC99], who
assume the growth rates are in fact i.i.d., and seek a mechanism for generating
swings in the expected returns.
However, Barsky and DeLong [BDL93], and later Bansal and Lundblad
[BL02], pointed out that permanent or very highly persistent changes in the ex-
pected growth rate, even if very small, could have large effects on asset prices.
From (5.5), if κ1 is close to one, a small permanent increase in the conditional
mean of dividend growth can lead to a large increase in the log price-dividend
ratio, holding fixed the returns rt+ j . Suppose Et+1 ( gt+ j ) − Et ( gt+ j ) = ∆ for all
j. Then,

( E_{t+1} − E_t ) [ ∑_{j=2}^{∞} κ_1^{j−1} g_{t+j} ] = κ_1 ∆ / (1 − κ_1),

which can be much larger than ∆ if κ1 is close to one.5
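To put a number on the amplification: using κ_1 = 1/(1 + exp(−z̄)) (from the footnote), the multiplier κ_1/(1 − κ_1) works out to exp(z̄), i.e., to the mean price-dividend ratio itself. A quick check, with an illustrative mean price-dividend ratio of 30:

```python
import math

pd_ratio = 30.0                         # illustrative mean price-dividend ratio
zbar = math.log(pd_ratio)               # mean log price-dividend ratio
kappa1 = 1.0 / (1.0 + math.exp(-zbar))
multiplier = kappa1 / (1.0 - kappa1)
print(round(kappa1, 4), round(multiplier, 1))   # 0.9677 30.0
```

So a permanent shift ∆ in expected dividend growth moves the log price-dividend ratio by roughly 30∆, which is how tiny movements in conditional means can generate large swings in valuations.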


A similar conclusion holds if the increase is not permanent, but highly
persistent—say E_{t+1}(g_{t+j}) − E_t(g_{t+j}) = ρ^{j−1} ∆ for j = 2, 3, . . ., with ρ near
one. This is, in fact, the assumption made by Bansal and Yaron—in particular,
they assume that log consumption growth has both an i.i.d. component and a
5 Recall that κ_1 was a Taylor series coefficient in the Campbell-Shiller approximation. It's given
by 1/(1 + exp(−z̄)), where z̄ is the mean of z. Since prices are typically at least a few multiples of
current dividends, it's reasonable to assume that κ_1 will in fact be very close to one.


persistent component. They model dividends as sharing the persistent compo-


nent in consumption growth, but subject to larger fluctuations.

5.1.2 The model


Barsky-DeLong and Bansal-Lundblad are not equilibrium models. The chal-
lenge that Bansal and Yaron take on is to write down a consumption-based
equilibrium model that incorporates the mechanism described above, whereby
small persistent movements in expected dividend growth rates cause large
movements in price-dividend ratios. Such a model needs to have two features:
(1) the change in expected growth rate has to dominate any resulting change
in the expected return; (2) the model’s stochastic discount factor has to price
the expected growth rate risk. This leads them to a model with Epstein-Zin
preferences with an elasticity of intertemporal substitution greater than one.
Why an EIS greater than one? It’s not reasonable to expect that a change
in the conditional mean growth rate of dividends, in (5.5), will not also entail
a change in expected returns. The assumption of an EIS greater than one is a
way of guaranteeing the net effect is still positive. The intuition for this can be
had by considering the long-run relationship between the consumption growth
rate and the real interest rate under certainty:

r = − log(β) + (1/ψ) g    (5.6)

where ψ is the elasticity of intertemporal substitution, g is the log consumption


growth rate, and r is the log, or continuous, real interest rate. Equation (5.6) is
one we’ve seen several times before. It implies

∆r = (1/ψ) ∆g,

so that

∆g − ∆r = ( 1 − 1/ψ ) ∆g,

which is positive if ψ > 1.
The need for Epstein-Zin preferences will become clear shortly.
Bansal and Yaron assume the following process for log consumption growth,
log(ct+1 /ct ) ≡ gt+1 :

gt+1 = ν + xt + σηt+1 (5.7)


xt+1 = ρxt + φe σet+1 (5.8)

where ηt+1 and et+1 are both i.i.d. standard normal—i.e., N (0, 1)—variables.
I’ve used ν for their µ, since we’ve already been using µ to denote the Epstein-
Zin certainty equivalent operator. Thus, the log consumption growth rate is
conditionally normal with conditional mean Et ( gt+1 ) = ν + xt and constant


conditional variance vart ( gt+1 ) = σ2 . The process for xt has an uncondi-


tional mean of zero; its conditional variance is (φe σ )2 , and its unconditional
variance—much larger if ρ is near one—is (φe σ )2 /(1 − ρ2 ).
They also model dividends separately from consumption, and price both
claims to the consumption process and claims to the dividend process. Log
dividend growth, log(dt+1 /dt ) ≡ gd,t+1 , is assumed to obey

gd,t+1 = νd + φxt + φd σut+1 (5.9)

where ut+1 is N (0, 1) and independent of ηt+1 and et+1 . They will assume φd
is large, so that dividend growth has a much higher conditional variance than
consumption growth. φ will also be large, so innovations to xt —the et+1 ’s—
will have a larger impact on the conditional mean of dividend growth than on
the conditional mean of consumption growth.6
Some features of the Bansal-Yaron process:
• φe is going to be calibrated so that the variance of xt+1 is small compared
to σ2 , making the white noise component of consumption growth domi-
nant over short horizons.
• ρ is going to be calibrated close to one, so that the fluctuations in the
conditional mean Et ( gt+1 ) = ν + xt will be highly persistent. This also
means that the unconditional variance of xt+1 will be large.
• Our Mehra-Prescott process matches up with the i.i.d. part of gt+1 . To be
sure, we modeled consumption growth as having a slight negative auto-
correlation, but Mehra and Prescott's process is not that far from i.i.d.
• The state variables—on which prices and the agent’s value will depend—
will be ct (or dt ) and xt . As before, homogeneity will allow us to divide
out the dependence on levels. The i.i.d. shocks are not state variables—
while the stochastic discount factor will be seen to depend on xt , xt+1 ,
and ηt+1 , the i.i.d. disturbance will integrate out when we take expecta-
tions.
• Bansal and Yaron also incorporate time-varying volatility (σ varies over
time). Time permitting, we’ll add that in after we examine the more basic
model with constant volatility.
Let pc ( xt , ct ) denote the price of a claim to the consumption process, pd ( xt , dt )
the price of a claim to the dividend process, and q( xt ) the price of a riskless
claim to one unit of consumption next period. Using the price of the consump-
tion claim as an example, the assets are priced in the by-now-familiar way

pc ( xt , ct ) = Et [mt+1 ( pc ( xt+1 , ct+1 ) + ct+1 )] . (5.10)


6 Aggregate dividends are in fact much more volatile than aggregate consumption.


With Epstein-Zin preferences, we’ll continue to obtain our homogeneity result,


that we may write pc ( xt , ct ) = wc ( xt )ct for some function wc ( xt ), the price-
dividend ratio for the consumption claim. Then,
 
w_c(x_t) = E_t\left[ m_{t+1}\, \frac{c_{t+1}}{c_t}\, (w_c(x_{t+1}) + 1) \right]
         = E_t\left[ m_{t+1}\, e^{\nu + x_t + \sigma\eta_{t+1}}\, (w_c(x_{t+1}) + 1) \right]   (5.11)
         = e^{\nu + x_t}\, E_t\left[ m_{t+1}\, e^{\sigma\eta_{t+1}}\, (w_c(x_{t+1}) + 1) \right]   (5.12)

where (5.11) uses the fact that ct+1 /ct = e gt+1 , and (5.12) relies on the fact that
xt can be taken outside the date-t conditional expectation.
Bansal and Yaron solve their model using the return form of the Epstein-Zin
stochastic discount factor, as we derived in section 4.2.4, equation (4.37). They
describe their results in terms of log-linear approximations of the Campbell-
Shiller type, though the computational method they use to obtain the numbers
in their tables is a polynomial projection method.7
I prefer to examine the model using the Epstein-Zin stochastic discount fac-
tor in its form
m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-1/\psi} \left( \frac{v_{t+1}}{\mu_t(v_{t+1})} \right)^{1/\psi - \alpha}
where I’ve written 1/ψ for our previous ρ − 1, both to make the dependence on
the value of the EIS more explicit, and to avoid confusion with the persistence
parameter ρ from (5.8).
The (equilibrium) lifetime utility process vt still obeys the recursion
v_t = \left[ (1 - \beta)\, c_t^{1 - 1/\psi} + \beta\, \mu_t(v_{t+1})^{1 - 1/\psi} \right]^{1/(1 - 1/\psi)},

where c_t is equilibrium consumption. The only variables at t that are informative about t + 1 and beyond are the level of consumption, c_t, and the persistent
part of consumption growth, xt . Moreover, nothing that Bansal and Yaron have
added to the basic Epstein-Zin model changes the essential homogeneity prop-
erties of the model. Thus, we can write

vt = Φ( xt )ct

for some function Φ.


Taking account of this homogeneity, we can divide ct out of the utility pro-
7 We don’t have time to go into polynomial methods, but this model is well-suited for that

approach. Judd [Jud98] discusses computationally efficient methods for solving dynamic models
using polynomial projections.


cess to get
\Phi(x_t) = \left[ 1 - \beta + \beta\, \mu_t\!\left( \Phi(x_{t+1})\, \frac{c_{t+1}}{c_t} \right)^{1 - 1/\psi} \right]^{1/(1 - 1/\psi)}
          = \left[ 1 - \beta + \beta\, \mu_t\!\left( \Phi(x_{t+1})\, e^{\nu + x_t + \sigma\eta_{t+1}} \right)^{1 - 1/\psi} \right]^{1/(1 - 1/\psi)}
          = \left[ 1 - \beta + \beta\, e^{(1 - 1/\psi)(\nu + x_t)}\, \mu_t\!\left( \Phi(x_{t+1})\, e^{\sigma\eta_{t+1}} \right)^{1 - 1/\psi} \right]^{1/(1 - 1/\psi)}   (5.13)

Likewise, the stochastic discount factor can be written as


m_{t+1} = \beta\, e^{-(1/\psi)(\nu + x_t + \sigma\eta_{t+1})} \left( \frac{\Phi(x_{t+1})\, e^{\sigma\eta_{t+1}}}{\mu_t\!\left( \Phi(x_{t+1})\, e^{\sigma\eta_{t+1}} \right)} \right)^{1/\psi - \alpha}   (5.14)

Note that both (5.13) and (5.14) involve µt (Φ( xt+1 )eσηt+1 ). Because ηt+1 is
independent from xt+1 , we can split the certainty equivalent of Φ( xt+1 )eσηt+1
as
µt (Φ( xt+1 )eσηt+1 ) = µ (eσηt+1 ) µt (Φ( xt+1 )) .
This follows from the fact—which you can verify from the definitions of µ
and statistical independence—that if y and z are independent, then µ(yz) =
µ(y)µ(z). The certainty equivalent of eσηt+1 , using the rules for expectations of
lognormal random variables,8 is
\mu(e^{\sigma\eta_{t+1}}) = e^{\frac{1}{2}(1-\alpha)\sigma^2}.
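Both certainty-equivalent facts are easy to check by Monte Carlo. In the Python sketch below, μ(z) = (E[z^{1−α}])^{1/(1−α)} as in the text, and the α and σ values are arbitrary illustrations, not a calibration.

```python
import numpy as np

# Check mu(e^{sigma*eta}) = exp((1/2)(1-alpha)sigma^2) and, for
# independent y and z, mu(yz) = mu(y)mu(z).  alpha and sigma here are
# illustrative values only.
alpha, sigma = 10.0, 0.1

def mu(z, alpha):
    # certainty equivalent: mu(z) = (E[z^(1-alpha)])^(1/(1-alpha))
    return np.mean(z ** (1.0 - alpha)) ** (1.0 / (1.0 - alpha))

rng = np.random.default_rng(1)
mc = mu(np.exp(sigma * rng.standard_normal(1_000_000)), alpha)
exact = np.exp(0.5 * (1.0 - alpha) * sigma**2)
print(mc, exact)

# Product rule for independent y and z:
y = np.exp(sigma * rng.standard_normal(1_000_000))
z = np.exp(sigma * rng.standard_normal(1_000_000))
lhs = mu(y * z, alpha)
rhs = mu(y, alpha) * mu(z, alpha)
print(lhs, rhs)
```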

Applying these results to (5.13) and (5.14), after some algebra, we can derive
the two expressions that will be the basis for our computational solution of the
model:
m_{t+1} = m_0\, e^{-(1/\psi)x_t}\, e^{-\alpha\sigma\eta_{t+1}} \left( \frac{\Phi(x_{t+1})}{\mu_t(\Phi(x_{t+1}))} \right)^{1/\psi - \alpha}   (5.15)

\Phi(x_t) = \left[ 1 - \beta + \beta\, e^{(1 - 1/\psi)(\nu + x_t + (1/2)(1-\alpha)\sigma^2)}\, \mu_t(\Phi(x_{t+1}))^{1 - 1/\psi} \right]^{1/(1 - 1/\psi)}   (5.16)

where the m_0 in the expression for the stochastic discount factor collects together several constants, m_0 = \beta \exp\left( -(1/\psi)\nu + (1/2)(\alpha - (1/\psi))(1 - \alpha)\sigma^2 \right).

From (5.16), we can see that Φ(x_t) is increasing in x_t.9 Also, the effect on Φ(x_t) of a given increase in x_t is bigger the more persistent the x process is—i.e., the closer ρ is to one.10
 
8 If z ∼ N[E(z), var(z)], then E[e^z] = exp(E(z) + (1/2)var(z)).
9 This should be intuitive. Because of the positive autocorrelation in x, an increase in xt raises
future growth rates (conditionally), and a higher growth rate must produce higher lifetime utility


[Figure 5.4 here; y-axis: log(Φ(x_{t+1})/μ_t(Φ(x_{t+1}))), x-axis: 100·x_{t+1}, with lines for ρ = .979 and ρ = .900.]

Figure 5.4: The figure plots log(Φ(x_{t+1})/μ_t(Φ(x_{t+1}))) for ρ = .979 and ρ = .9, illustrating the role of persistence in the pricing of x_{t+1} risk. Φ is solved for using the Markov chain method described in the next section. The parameters are as in Bansal and Yaron.

All of this tells how the stochastic discount factor given by (5.15) is going to price risk associated with innovations to the conditional mean of the consumption growth rate. The i.i.d. risk is priced by the e^{−ασηt+1} term,
which is standard—a high growth realization makes the agent less ‘hungry’
(in Cochrane’s terminology), so assets whose payoffs covary positively with
ηt+1 are less valuable than assets whose payoffs covary negatively with ηt+1 .
The key parameter here is the risk aversion parameter α.
The xt+1 risk—risk associated with the innovations et+1 in xt+1 = ρxt +
φe σet+1 —is priced by the scaled value function Φ( xt+1 ). A positive realization
of et+1 raises the conditional mean rate of consumption growth xt+1 , gener-
ating an increase in Φ( xt+1 ); the increase is larger the higher is the degree of
persistence in the x process. If ψ1 − α < 0, which it will be if ψ > 1 and α > 0, a
positive innovation to xt+1 makes the agent less ‘hungry’. Assets whose pay-
offs covary positively with xt+1 will require a risk premium to convince the
agent to hold them in equilibrium. This characterizes both the consumption

per unit of consumption today (which is what Φ( xt ) measures). Proving it is a bit more subtle and
involves treating (5.16) as a mapping that (one can show) preserves monotonicity. And, the limit
of a sequence of increasing functions is, at the least, a nondecreasing function. Note that all this is
true independent of whether ψ > 1 or not.
10 At the other extreme (ρ = 0), the x process is i.i.d., and μ_t(Φ(x_{t+1})) = μ(Φ(x_{t+1})) is a constant for all t. In this case, the effect of changes in x_t on Φ(x_t) is fully captured by the exp((1 − 1/ψ)x_t) term in (5.16).


and dividend claims, under the assumption that ψ > 1.


We can write the return on the consumption claim as

R^c_{t+1} = \frac{p_c(x_{t+1}, c_{t+1}) + c_{t+1}}{p_c(x_t, c_t)} = e^{\nu + x_t + \sigma\eta_{t+1}}\, \frac{w_c(x_{t+1}) + 1}{w_c(x_t)}   (5.17)
Under the assumption that ψ > 1, for the reasons described above at the start of
this section, the consumption price-dividend ratio wc ( xt+1 ) will be increasing
in xt+1 , so the consumption claim is exposed to the xt+1 risk. Similarly, the
return on the dividend claim is
R^d_{t+1} = e^{\nu_d + \phi x_t + \phi_d\sigma u_{t+1}}\, \frac{w_d(x_{t+1}) + 1}{w_d(x_t)}.   (5.18)

The dividend claim is exposed to the x_{t+1} risk as well, through w_d(x_{t+1}). Note
that ut+1 , the i.i.d. part of the dividend claim’s return, is independent of ev-
erything in the SDF mt+1 . This means that the i.i.d. risk in dividend growth is
not priced—if the only risk associated with the dividend claim were from ut+1 ,
then the dividend claim’s return would equal the risk-free rate.

5.1.3 Computation
As mentioned above, introducing polynomial methods is beyond the scope of
this course, so our computational approach here will be similar to ones we’ve
used before—namely, discretizing the state space using Markov chains. In par-
ticular, we’re going to use a Markov chain to approximate the process for xt ,
given in (5.8). Let ( x (1), x (2), . . . x (S); P) denote the Markov chain.
As before, under the Markov chain assumption, the stochastic discount fac-
tor m will be a matrix, while the scaled value function Φ, and the prices wc , wd
and q, will be vectors. The certainty equivalent of Φ conditional on x = x (i )
today is
\mu_i(\Phi) = \left[ \sum_{j=1}^{S} P(i,j)\, \Phi(j)^{1-\alpha} \right]^{1/(1-\alpha)}.

Φ itself we can solve for iteratively, using the Bellman-like equation


\Phi(i) = \left[ 1 - \beta + \beta\, e^{(1 - 1/\psi)(\nu + x(i) + (1/2)(1-\alpha)\sigma^2)} \left( \sum_{j=1}^{S} P(i,j)\, \Phi(j)^{1-\alpha} \right)^{\frac{1 - 1/\psi}{1-\alpha}} \right]^{1/(1 - 1/\psi)}   (5.19)
The next step is to evaluate the pricing relations

w_c(x_t) = E_t\left[ m_{t+1}\, e^{\nu + x_t + \sigma\eta_{t+1}}\, (w_c(x_{t+1}) + 1) \right]

w_d(x_t) = E_t\left[ m_{t+1}\, e^{\nu_d + \phi x_t + \phi_d\sigma u_{t+1}}\, (w_d(x_{t+1}) + 1) \right]

q(x_t) = E_t[m_{t+1}]


One thing to note is that, because ut+1 and ηt+1 are independent of xt+1 , the
terms involving these innovations (including the ηt+1 term in mt+1 ) can be
collected together, and their expectations evaluated separately from the terms in
xt+1 . Under the Markov chain assumption, these relations then become—using
(5.15)—

w_c(i) = A_c\, e^{(1 - 1/\psi)x(i)} \sum_{j=1}^{S} P(i,j) \left( \frac{\Phi(j)}{\mu_i(\Phi)} \right)^{1/\psi - \alpha} (w_c(j) + 1)   (5.20)

w_d(i) = A_d\, e^{(\phi - 1/\psi)x(i)} \sum_{j=1}^{S} P(i,j) \left( \frac{\Phi(j)}{\mu_i(\Phi)} \right)^{1/\psi - \alpha} (w_d(j) + 1)   (5.21)

q(i) = A_q\, e^{-(1/\psi)x(i)} \sum_{j=1}^{S} P(i,j) \left( \frac{\Phi(j)}{\mu_i(\Phi)} \right)^{1/\psi - \alpha}   (5.22)

where A_c, A_d, and A_q collect together various constant terms:

A_c = m_0\, e^{\nu}\, E\!\left[ e^{(1-\alpha)\sigma\eta_{t+1}} \right]

A_d = m_0\, e^{\nu_d}\, E\!\left[ e^{-\alpha\sigma\eta_{t+1}} \right] E\!\left[ e^{\phi_d\sigma u_{t+1}} \right]

A_q = m_0\, E\!\left[ e^{-\alpha\sigma\eta_{t+1}} \right]

With all this machinery in hand, it’s straightforward to put the model on
the computer and see what it spits out. At this point, you should be familiar
enough with M ATLAB and models of this sort to put together a program that
solves the model as described above. Most of the parameters, I’ll take from
Bansal and Yaron.11 We still need to specify the Markov chain, though.
Rather than use a two-state Markov chain, as we’ve been doing, I wanted
the flexibility to add more states, so I used Rouwenhorst’s method.12 I exper-
imented with the number of states ranging from 3 to 101—adding any more
than around 21, though, didn’t change the results much at all.
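For concreteness, here is a minimal version of the pipeline in Python/numpy (the program the notes describe is in MATLAB). To keep it short, the AR(1) for x is replaced by the three-state Rouwenhorst chain written out in closed form—a coarse stand-in for the finer chain used below. It iterates (5.19) for Φ, solves (5.20) for w_c as a linear system, and evaluates (5.22) for q; w_d from (5.21) is exactly analogous.

```python
import numpy as np

# Minimal solver sketch for the constant-volatility long-run risk model:
# Phi from (5.19), w_c from (5.20) written as a linear system, q from
# (5.22).  Parameters follow Table 5.1 with beta=0.999, alpha=10,
# psi=1.5; the 3-state chain is a coarse stand-in for 31 states.
nu, sigma, rho, phi_e = 0.0015, 0.0078, 0.979, 0.044
beta, alpha, psi = 0.999, 10.0, 1.5
om = 1.0 - 1.0 / psi                      # 1 - 1/psi

# 3-state Rouwenhorst chain for x, written out in closed form
p = (1.0 + rho) / 2.0
sd_x = phi_e * sigma / np.sqrt(1.0 - rho**2)
x = np.array([-1.0, 0.0, 1.0]) * np.sqrt(2.0) * sd_x
P = np.array([[p**2, 2*p*(1-p), (1-p)**2],
              [p*(1-p), p**2 + (1-p)**2, p*(1-p)],
              [(1-p)**2, 2*p*(1-p), p**2]])

# Iterate (5.19) to convergence
K = np.exp(om * (nu + x + 0.5 * (1.0 - alpha) * sigma**2))
Phi = np.ones(3)
for _ in range(300_000):
    mu_Phi = (P @ Phi ** (1.0 - alpha)) ** (1.0 / (1.0 - alpha))
    Phi_new = (1.0 - beta + beta * K * mu_Phi ** om) ** (1.0 / om)
    if np.max(np.abs(Phi_new - Phi)) < 1e-10:
        Phi = Phi_new
        break
    Phi = Phi_new
mu_Phi = (P @ Phi ** (1.0 - alpha)) ** (1.0 / (1.0 - alpha))

# Constants and the 'risk-adjusted' transition terms from (5.15)
m0 = beta * np.exp(-nu / psi + 0.5 * (alpha - 1/psi) * (1 - alpha) * sigma**2)
A_c = m0 * np.exp(nu + 0.5 * ((1 - alpha) * sigma) ** 2)
A_q = m0 * np.exp(0.5 * (alpha * sigma) ** 2)
M = P * (Phi[None, :] / mu_Phi[:, None]) ** (1.0 / psi - alpha)

Bc = A_c * np.exp(om * x)[:, None] * M        # (5.20): w_c = Bc (w_c + 1)
wc = np.linalg.solve(np.eye(3) - Bc, Bc @ np.ones(3))
q = A_q * np.exp(-x / psi) * (M @ np.ones(3)) # (5.22)

rf_annual_pct = -np.log(q[1]) * 1200          # log riskless rate, % per year
print(Phi, wc, rf_annual_pct)
```

Even with this crude chain, the middle-state riskless rate lands in the low single digits annually, in the neighborhood of the numbers reported in Table 5.2 below.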

5.1.4 Calibration and results


You should consult Bansal and Yaron for the justifications they give for their
parameter choices. They calibrate the model to a monthly frequency—that’s
new for us—and the parameters they pick for the consumption, long-run risk,
and dividend processes in their basic version of the model are given in Table
5.1.
11 An exercize will ask you to solve the model with some parameters estimated by Con-

stantinides and Ghosh [CG11], available here: http://faculty.chicagobooth.edu/george.


constantinides/documents/AssetPricingTestsKEEPAugust11_11.pdf
12 I’ve mentioned Rouwenhorst’s [Rou95] method before in passing. In a way, it just extends

the technique we learned for two states to many states. The invariant distribution that results is
exactly binomial, so with enough states, it approximates a normal distribution. Kopecky and Suen
[KS10] describe the method, and show that it does a good job of approximating very persistent
AR(1) processes.


Table 5.1: Parameters for stochastic consumption, long-run risk, and dividend
processes in Bansal-Yaron’s basic model.

ν σ ρ φe νd φ φd
0.0015 0.0078 0.979 0.044 0.0015 3.0 4.5

I approximated the AR(1) for xt with a Markov chain—using Rouwen-


horst’s method—with 31 states.
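For reference, here is a sketch of Rouwenhorst's method in Python (the code posted with these notes is MATLAB); the recursion and grid follow the description in Kopecky and Suen [KS10].

```python
import numpy as np

# Rouwenhorst discretization of x' = rho*x + eps, eps ~ N(0, sigma_eps^2).
# The chain matches the AR(1)'s unconditional mean, variance and
# first-order autocorrelation exactly; its invariant distribution is
# binomial, so with many states it approximates a normal.
def rouwenhorst(n, rho, sigma_eps):
    p = (1.0 + rho) / 2.0
    Theta = np.array([[p, 1.0 - p], [1.0 - p, p]])
    for m in range(3, n + 1):
        T = np.zeros((m, m))
        T[:-1, :-1] += p * Theta
        T[:-1, 1:] += (1.0 - p) * Theta
        T[1:, :-1] += (1.0 - p) * Theta
        T[1:, 1:] += p * Theta
        T[1:-1, :] /= 2.0        # interior rows were double-counted
        Theta = T
    sd_x = sigma_eps / np.sqrt(1.0 - rho**2)   # unconditional s.d.
    grid = np.linspace(-1.0, 1.0, n) * sd_x * np.sqrt(n - 1)
    return grid, Theta

# Bansal-Yaron x process: rho = 0.979, sigma_eps = phi_e * sigma
grid, P = rouwenhorst(31, 0.979, 0.044 * 0.0078)
print(np.allclose(P.sum(axis=1), 1.0))
print(np.allclose(P @ grid, 0.979 * grid))   # E[x'|x] = rho*x exactly
```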
The preferences parameters β, ψ and α, we’ll play around with a bit. Bansal
and Yaron report some results for α either 7.5 or 10, and ψ either 1.5 or 0.5, with
β = 0.998. I have a hard time getting the riskless rate as low as they report
with β = 0.998, and do much better with β = 0.999—almost exactly matching
their numbers, in fact. This could be for a couple reasons—one, we’re using
different approximation methods, and two, I’m not quite sure I’m getting the
time aggregation right.13 In any case, the results here are for β = 0.999. That’s
relatively high—it annualizes to 0.988.
Like Bansal and Yaron, I’ll report results for the risk-free rate and the re-
turn on the dividend claim (what they refer to as the market return) rather
than the aggregate consumption claim. I solved for Φ iteratively from (5.19),
then solved the pricing equations (5.20)–(5.22), and then formed returns according to R^F = 1/q, and R^d given by (5.18). I focused on log returns, log(R^F_t) = −log(q_t), and log(R^d_{t+1}) = ν_d + φx_t + φ_dσu_{t+1} + log(w_{d,t+1} + 1) − log(w_{d,t}).
I also looked at the properties of log(wd,t ), the log price-dividend ratio for the
dividend claim.
Note that, when you put these in terms of the Markov chain, log(R^F_t) = log(R^F(i)) = −log(q(i)) and log(w_{d,t}) = log(w_d(i)). The log return on the dividend claim has both a Markov chain piece, ν_d + φx(i) + log(w_d(j) + 1) − log(w_d(i)), and an i.i.d. piece, φ_dσu_{t+1}. For the expectation of log(R^d_{t+1}), the
i.i.d. piece is zero. If π ∗ is the invariant probability distribution of the Markov
transition matrix P, then
E[\log(R^F)] = \sum_i \pi^*(i)\, \log(R^F(i)),

E[\log(w_d)] = \sum_i \pi^*(i)\, \log(w_d(i)),
13 If you just want to annualize a monthly rate, you either multiply it by 12 (if it’s a log rate)

or raise the gross monthly rate to the 12th power, and subtract one. If you want the average for an
annual period, then, if we’re working with log rates, and they’re i.i.d., the annual mean and variance
are both 12 times the monthly mean and variance. If they’re not i.i.d., then it’s more complicated,
but I didn’t really have the time to work it out before class. The issue may be the approximation,
though. They do have another paper [BKY07], which employs another solution technique, where
they too assume β is higher—β = 0.9989.


and

E[\log(R^d)] = \sum_i \pi^*(i) \sum_j P(i,j) \left[ \nu_d + \phi x(i) + \log\left( \frac{w_d(j) + 1}{w_d(i)} \right) \right]

For standard deviations, computing s.d.(log(R^F)) and s.d.(log(w_d)) is simple: we just use the invariant probabilities π^* and the means we calculated in the previous step. For log(R^d), we take the square root of the sum of (1) the variance of the Markov chain part (using P and π^*) and (2) the variance of the i.i.d. part, (φ_dσ)^2.
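In code, these moment calculations are just weighted sums against π*. The Python helper below sketches the pattern; the two-state chain and the y values fed to it are placeholders, standing in for P and a state-contingent quantity like log R^F(i) or log w_d(i).

```python
import numpy as np

# Unconditional mean and s.d. of a state-contingent quantity y(i) under
# the chain's invariant distribution pi*.  The chain and y below are
# placeholders, not the model's output.
def invariant(P, iters=5_000):
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

def chain_moments(P, y):
    pi = invariant(P)
    mean = pi @ y
    sd = np.sqrt(pi @ (y - mean) ** 2)
    return mean, sd

P = np.array([[0.43, 0.57], [0.57, 0.43]])   # symmetric placeholder chain
y = np.array([0.01, 0.02])
mean, sd = chain_moments(P, y)
# For log R^d, add the i.i.d. variance before taking the square root:
# sd_total = np.sqrt(sd**2 + (phi_d * sigma)**2)
print(mean, sd)
```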
To put the return-related quantities into annual average percent terms we multiply the means by 1200 and the standard deviations by √12 × 100.14 We leave log(wd) in monthly units. The results are in Table 5.2; lower-case letters denote logs, and pd − d denotes log(wd).

Table 5.2: Some results for the long-run risk model. Parameters for consump-
tion, long-run risk, and dividends are as in Table 5.1. For all cases here,
β = .999.

α ψ E[r d − r F ] E[r F ] s.d.(r d ) s.d.(r F ) s.d.( pd − d)


7.5 0.5 0.56 4.82 13.16 1.17 0.07
7.5 1.5 2.93 1.63 16.90 0.39 0.17
10.0 0.5 1.16 4.93 13.08 1.17 0.07
10.0 1.5 4.25 1.36 16.49 0.38 0.16
10.0 0.1 −5.6 15.55 33.95 5.83 0.44
10.0 2.0 4.68 0.88 17.05 0.29 0.17

Exercize 5.1. Constantinides and Ghosh [CG11] estimated an annual version


of the Bansal-Yaron model using GMM. One set of results they obtain (they try
several specifications), estimating only the parameters of the stochastic processes,
is given by:

ν σ ρ φe νd φ φd
0.02 0.006 0.437 5.20 0.01 2.06 15.8

Write a M ATLAB program to solve the model—following the steps outlined


above—and report results for the same quantities as in Table 5.2, for α = 10,
ψ = 1.5 and β = 0.99.
They also do an estimation of the preference parameters and stochastic process
parameters together. In this case, they obtain:

14 See footnote 13.


β α ψ ν σ ρ φe νd φ φd
0.968 9.34 1.41 0.021 0.012 0.482 0.90 0.018 5.14 3.06

Solve the model and report results for these parameters. Note that since this is an annual model, you only need to multiply things by 100, not 1200 or √12 × 100.
Compare the results in both cases to row 4 of Table 5.2. What might explain
the differences in the results you obtain? (Answer—a lot of things; just try to get
some intuitive feel for how the model works.)
You should use Rouwenhorst’s method for calibrating the Markov chain. I’ll
post the code on the website.
Optional. Investigate the sensitivity of the results to the number of states in
the Markov chain—if you tried n = 3, 5, 7 . . ., is there an n above which the re-
sults stop changing by very much? (Note: using odd numbers of states guarantees
that the mean of zero is one of the states of the Markov chain.)

5.1.5 Time-varying volatility


We don’t have time to go into this in any detail, but basically, the model as
described above produces almost no variation in the conditional expected ex-
cess returns (across x (i ) states)—the conditionally expected dividend return
and the risk-free return have, in effect, roughly the same ‘slope’ with respect to
variation in x. Since the dividend return is conditionally i.i.d., its conditional
standard deviation is constant at φd σ. This means the model produces virtually
no variation in the conditional Sharpe ratio.
As a remedy to this, Bansal and Yaron introduce time-varying volatility. In
particular, the standard deviation σ in (5.7), (5.8) and (5.9) is replaced with a
time-varying σt , the square of which is assumed to follow an AR(1) process:

σt2+1 = σ2 + ρ1 (σt2 − σ2 ) + σe et+1 (5.23)

where it is assumed that et+1 is i.i.d., N (0, 1), and independent of the other i.i.d.
innovations.15
In terms of the pricing framework we’ve developed, σt is now a second
state variable. In particular, Φ( xt ) becomes Φ( xt , σt ), which is decreasing in σt ,
the more so the closer is the persistence parameter ρ1 to one.16 So the pricing
kernel will now price volatility risk in the sense that a high σt+1 realization
means greater ‘hunger’ (if 1/ψ − α < 0).
Bansal and Yaron show that when ψ > 1, the model’s price-dividend ratios
are decreasing in σt , the more so the greater is the persistence of the volatility
process. This means that the consumption and dividend claims are exposed to
15 The standard deviation σ_e must be chosen to be very small, since we don't want the process to violate σ_t^2 > 0. Technically speaking, the e_{t+1}'s shouldn't really be assumed to be normal.
16 Basically, more volatile consumption is less valuable given a concave certainty equivalent.


the volatility risk (they’re worth less when the agent is more hungry), so the
time-varying volatility raises their conditionally expected returns when σt is
high.
Of course, variation in σt also affects the risk-free rate, but the effects are
much smaller—thus conditionally expected excess returns vary positively with
σt .
σt will also affect the conditional standard deviation of returns (positively,
of course), so for this mechanism to produce a time-varying Sharpe ratio, the
movements in conditionally expected excess returns need to be larger than the
movements in the conditional standard deviation. That turns out to be the case
for Bansal and Yaron’s calibration of the model.

5.2 Rare consumption disasters


Rare consumption disasters were one of the first proposed potential resolutions
of the equity premium puzzle—Thomas Rietz’s “The equity risk premium:
A solution” [Rie88] appeared in the J.M.E. just three years after Mehra and
Prescott’s paper. Rietz’s idea was that equity claims are subject to occasional
large crashes that are not well-captured by the Mehra-Prescott consumption
process. In fact, writing the process, as Mehra and Prescott do, as a two-state
Markov chain, the best and worst outcomes are just one standard deviation
above and below the mean.
One could use a chain with more states, but techniques like Rouwenhorst’s
or Tauchen’s assume normality of the innovations to the underlying AR(1)
process. Adding states at ±2σ, ±3σ, ±4σ, etc. will have little effect because the
probabilities of being in those states will be negligibly small. Disasters matter
only if the tails of the distribution of growth rates are fatter than normal.
If you read the background in Barro [Bar06] you’ll see that Rietz’s resolu-
tion was eventually dismissed because no one really thought the tails of the
distribution were as fat as Rietz needed them to be to get his results. Rietz
considered many combinations of disaster size (as the percent decline in aggre-
gate consumption) and frequency. He found a number of combinations such
that the model’s first moment implications roughly match the data. The least
extreme examples were for a roughly one in one hundred chance of a 25% de-
cline in consumption; with risk aversion around ten and β very close to one,
the model roughly reproduced the average risk-free rate and equity premium.
Another criticism of Rietz’s model was that it assumed the risk-free asset
would remain risk-free even in the event of disaster. This didn’t seem right, as
crises often lead to defaults on even normally riskless government debt (some-
times implicitly, through unexpected inflation).17
Two decades after Rietz, Barro revived interest in disaster models, not so
much by offering anything new on the modeling front (though he did include
partial default on the ‘riskless’ asset), but rather by documenting empirically
17 Mehra and Prescott responded with all these criticisms in a subsequent issue of the J.M.E

[MP88] with the cleverly-titled “The equity risk premium: A solution?”.


that disasters may in fact be as large and as common as Rietz supposed. Given
that we have only short time series for any one country, Barro uses data from
many countries to characterize the probability of large declines in consumption
(actually per capita GDP). You should read through his evidence. He sums
up his findings as implying a disaster probability of between 1.5% and 2.0% with
associated declines in per capita GDP ranging from 15% to 64%. Wars account
for many of the disaster episodes he catalogs.
Barro’s analysis was subsequently refined by Gourio [Gou08], Gabaix [Gab08],
and others. The model we’ll look at in this section is the one presented in Gou-
rio’s paper. First, though, we’ll look at an approach closer to Rietz’s.

5.2.1 Modeling rare disasters


Disasters are actually quite easy to incorporate into our basic model. For ex-
ample, consider augmenting the Mehra-Prescott process in the following way.
In either ‘normal’ state i = 1, 2, the distribution of next period’s growth rate is
given by: with probability 1 − f , xt+1 is distributed according to P(i, {1, 2})—
i.e., xt+1 = x ( j) with probability P(i, j); and, with probability f , xt+1 = x d ,
the disaster outcome, where x d ≪ { x (1), x (2)}.18 And, if the current state
is x d , next period’s state is either x (1) or x (2), with probabilities given by the
invariant distribution of P—in our case, {1/2, 1/2}.
This is similar to Rietz’s approach. It implies a three-state Markov chain for
xt+1 , given by
x ∈ { x (1), x (2), x d } ≡ X
and

\Pi = \begin{pmatrix} (1-f)P(1,1) & (1-f)P(1,2) & f \\ (1-f)P(2,1) & (1-f)P(2,2) & f \\ 1/2 & 1/2 & 0 \end{pmatrix}

One can show that with a symmetric P, the invariant distribution of \Pi is given by

\pi^* = \left( \frac{1}{2(1+f)}, \frac{1}{2(1+f)}, \frac{f}{1+f} \right).
Given the new Markov chain ( X, Π), one just proceeds to solve the model ex-
actly as before, as we did in the basic Mehra-Prescott model.
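Building (X, Π) takes only a few lines. The Python sketch below uses the Mehra-Prescott two-state values for the normal chain, while f and x^d are illustrative; it also confirms the invariant-distribution formula above numerically.

```python
import numpy as np

# Construct the three-state disaster chain (X, Pi) from the two-state
# Mehra-Prescott chain; f and x_d here are illustrative values.
x1, x2 = 1.054, 0.982                         # Mehra-Prescott growth states
P = np.array([[0.43, 0.57], [0.57, 0.43]])    # symmetric normal-times chain
f, xd = 0.01, 0.75

X = np.array([x1, x2, xd])
Pi = np.vstack([
    np.column_stack([(1 - f) * P, np.full(2, f)]),
    [0.5, 0.5, 0.0],
])

# Invariant distribution by iteration; with symmetric P it should equal
# (1/(2(1+f)), 1/(2(1+f)), f/(1+f)).
pi = np.full(3, 1.0 / 3.0)
for _ in range(5_000):
    pi = pi @ Pi
print(pi)
```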

Remark 5.1. Actually, what Rietz did was to recalibrate P and { x (1), x (2)} for
each choice of f and x d , so as to guarantee that the three-state Markov chain ( X, Π)
has mean, standard deviation and first-order autocorrelation consistent with Mehra-
Prescott’s estimates of E( x ) = 1.018, s.d.( x ) = 0.036, and AC1 ( x ) = −0.14. This
has always struck me as the wrong approach. Apart from the Great Depression—which
is big, but not ‘a-25%-decline’ big— the Mehra-Prescott numbers are estimated for an
economy operating in ‘normal’ times. So I think it’s better to use those estimates for
18 Note that since we’re moving on from Bansal and Yaron, we’ll go back to letting xt+1 denote
the gross rate of growth of consumption from t to t + 1.


the P and { x (1), x (2)} as I’ve put them in the three-state chain above—they describe
the behavior of growth conditional on ‘no disaster’.
Maybe it’s intuitive that disaster risk (which affects the payoff from a claim
to aggregate consumption) raises the equity premium. How, though, does it
keep the risk-free rate at a reasonable level, as Rietz was able to achieve?
If, as I want to suppose, P and { x (1), x (2)} correspond to the Mehra-Prescott
process, then you can see immediately that disasters will lower the risk-free
rate—it will be lower than in the comparable Mehra-Prescott economy with
the same α and β.
If there is no default on the riskless asset in a disaster, then its price in state
i = 1, 2 will be

q(i) = E_i(m)
     = (1 - f) \sum_{j=1}^{2} P(i,j)\, \beta x(j)^{-\alpha} + f\, \beta (x^d)^{-\alpha}
     = (1 - f)\, q_{MP}(i) + f\, \beta (x^d)^{-\alpha}   (5.24)

where qMP (i ) is the riskless asset price from the corresponding Mehra-Prescott
economy. If β( x d )−α is large, even a small f can greatly increase q(i ) compared
to qMP (i ), and thus lower R F (i ) = 1/q(i ) in the ‘normal states’.
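Numerically the effect in (5.24) is dramatic. The Python sketch below uses the Mehra-Prescott chain for normal times, with illustrative α, β, f and x^d in the ranges Rietz considered.

```python
import numpy as np

# Effect of disaster risk on the riskless price, equation (5.24).
# alpha, beta, f and x_d are illustrative; x and P are Mehra-Prescott.
alpha, beta, f, xd = 9.0, 0.97, 0.01, 0.75
x = np.array([1.054, 0.982])
P = np.array([[0.43, 0.57], [0.57, 0.43]])

q_mp = P @ (beta * x ** -alpha)               # no-disaster benchmark
q = (1 - f) * q_mp + f * beta * xd ** -alpha  # (5.24)

# beta*(x^d)^(-alpha) is around 13 here, so even f = 0.01 adds roughly
# 0.13 to each price and pulls the riskless rate down sharply.
print(beta * xd ** -alpha)
print(q_mp, 1 / q_mp - 1)   # prices / net riskless rates, no disasters
print(q, 1 / q - 1)         # with disasters
```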
In state 3, the disaster state,
q(3) = \frac{1}{2}\, \beta x(1)^{-\alpha} + \frac{1}{2}\, \beta x(2)^{-\alpha}
which one can show is between qMP (1) and qMP (2).
To incorporate default on the riskless asset, we would no longer assume
it pays one unit in every state, but rather (1, 1, 1 − d) in states 1, 2, and 3. The
parameter d ∈ [0, 1] can be thought of as either (a) in a disaster, the issuer of the
riskless claim only pays d, or (b) conditional on a disaster, the issuer defaults
entirely with probability d (and with probability 1 − d pays one unit).
It turns out that, in the Rietz-like model described here, adding default on
the normally riskless asset harms the predictions of the model—unless d is
very small, it will greatly increase the risk-free rate. In terms of equation (5.24),
as d → 1, it’s essentially eliminating the f β( x d )−α term that was holding the
risk-free rate down. The next exercize asks you to explore some of these phe-
nomena.

Exercize 5.2. This should be simple, given that you know how to solve the basic
Mehra-Prescott model, as in Exercize 3.3. Use the Mehra-Prescott process for
{ x (1), x (2); P}. Write a M ATLAB program and solve the model described above
for the following three sets of parameters:


α β f xd d
9 .97 0 0 0
9 .97 0.01 0.75 0
9 .97 0.01 0.75 0.30

Report the average (unconditional mean) equity premium, risk-free rate and
equity return for each combination (in percent terms). Line one of the table is just
the basic Mehra-Prescott model. You should find that results look pretty good for
the middle row, but bad for the first and last.
Now, keeping f , x d , and d as in the last row of the table, explore the effects of
varying α and β. Are there any combinations that will get your results back in
the range of the nice ones you obtained for the middle configuration? Document
what you find, maybe with a table or graph. I don’t know the answer to this part,
so I’m curious to see what you find.

Gourio’s approach simplifies this in one dimension and complicates it in


another. Making things simpler, Gourio assumes that consumption growth in
non-disaster periods is i.i.d.. In the event of a disaster, consumption growth has
both an i.i.d. component and a (negative) disaster component. This eliminates
the need to model the escape from the disaster state to the set of ‘normal’ states
(the third row in the Rietz-style Π above). Complicating things a bit, Gourio
assumes that the probability of a disaster varies over time. Given that the Rietz
(or Barro) models can match the unconditional means of the equity and riskless
asset returns, Gourio is exploring whether a model with time-varying disaster
risk can match other features of the data—in particular the volatility of the
equity price-dividend ratio, and the price-dividend ratio’s predictive power
for returns and excess returns.
For preferences, Gourio considers both the standard case (time-additively
separable) and Epstein-Zin. He touches only briefly on the issue of default on
the riskless asset.
Formally, the consumption growth process is given in logs by19
\log(c_{t+1}/c_t) = \begin{cases} \nu + \sigma\eta_{t+1} & \text{with probability } 1 - f_t, \\ \nu + \sigma\eta_{t+1} + \log(1 - b) & \text{with probability } f_t \end{cases}

where ηt+1 ∼ N (0, 1) and b ∈ (0, 1) is the size of the disaster. The disaster
probability f t is assumed to follow a first-order Markov process with transition
probabilities P( f t , f t+1 )—the probability of going from f t to f t+1 —and such
that f t ∈ [ f , f ] for all t. In one case, Gourio assumes the Markov process is
actually i.i.d.—P( f t , f t+1 ) = P( f t+1 ). Consistent with the way I’ve written the
transition probability, we’ll be treating the Markov process as a Markov chain
throughout our discussion of the model.
19 We modify the notation a bit compared to Gourio.


5.2.2 Standard preferences


The time-varying disaster probability doesn’t add that much complication when
preferences take the standard form,

" #
E_0\left[ \sum_{t=0}^{\infty} \beta^t\, \frac{c_t^{1-\alpha}}{1-\alpha} \right].

The stochastic discount factor is

m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\alpha},

or

m_{t+1} = \begin{cases} \beta e^{-\alpha(\nu + \sigma\eta_{t+1})} & \text{if there is no disaster at } t+1 \\ (1-b)^{-\alpha}\, \beta e^{-\alpha(\nu + \sigma\eta_{t+1})} & \text{if there is a disaster at } t+1 \end{cases}
Conditional on the probability of disaster f t , the price of the riskless asset—
call it q( f t )—is easily computed as

q(f_t) = E[m_{t+1} : f_t]
       = (1 - f_t)\, \beta e^{-\alpha\nu + (1/2)(\alpha\sigma)^2} + f_t (1-b)^{-\alpha}\, \beta e^{-\alpha\nu + (1/2)(\alpha\sigma)^2}
       = \left( 1 - f_t + f_t (1-b)^{-\alpha} \right) \beta e^{-\alpha\nu + (1/2)(\alpha\sigma)^2}


so the log risk-free rate is given by

\log R^F_t = -\log(\beta) + \alpha\nu - \frac{1}{2}(\alpha\sigma)^2 - \log\left( 1 - f_t + f_t (1-b)^{-\alpha} \right).   (5.25)
Note that because 0 < b < 1, (1 − b)−α is bigger than one, and possibly very
large—with, say, b = 0.25 and α = 10, (1 − b)−α ≈ 18. For f t in the neighbor-
hood of 0.01, log (1 − f t + f t (1 − b)−α ) ≈ f t ((1 − b)−α − 1), so small changes
in f t will have large consequences for the log risk-free rate.20
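A quick numerical check of (5.25) makes the point. In the Python sketch below, b = 0.25 and α = 10 are the values just mentioned, while β, ν and σ are placeholder values.

```python
import numpy as np

# Sensitivity of the log riskless rate (5.25) to the disaster
# probability f.  b and alpha are from the text; beta, nu and sigma are
# placeholders.
alpha, b = 10.0, 0.25
beta, nu, sigma = 0.99, 0.018, 0.036

def log_rf(f):
    return (-np.log(beta) + alpha * nu - 0.5 * (alpha * sigma) ** 2
            - np.log(1.0 - f + f * (1.0 - b) ** -alpha))

print((1.0 - b) ** -alpha)                  # about 18, as in the text
# Moving f from 0.01 to 0.02 cuts the log rate by over 13 percentage
# points here (the linear approximation in the text suggests about 17).
print(100 * (log_rf(0.01) - log_rf(0.02)))
```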
Likewise, the price-dividend ratio for a claim to aggregate consumption—
call it wc ( f t )—obeys
 
w_c(f_t) = E\left[ m_{t+1}\, \frac{c_{t+1}}{c_t}\, (w_c(f_{t+1}) + 1) : f_t \right]
         = \left( 1 - f_t + f_t (1-b)^{1-\alpha} \right) \beta e^{(1-\alpha)\nu + (1/2)((1-\alpha)\sigma)^2}\, E(w_c(f_{t+1}) + 1 : f_t)   (5.26)

Note that in obtaining this expression (and the expression for q( f t )), we’re
taking expectations with respect to several things that, conditional on f t , are
20 Remember, you'd multiply log(R^F_t) by 100 to put it in percent terms. For the b and α mentioned in the text, a 0.01 increase in f_t, say from 0.01 to 0.02, subtracts roughly 17 percentage points off the
log risk-free rate. Assuming that you can get the level right at some value of f , you would want f t
to have a very small variance around that value.


independent—expectations with respect to ηt+1 , the occurrence (or not) of a


disaster, and the new probability of a disaster given the current probability.
See footnote 6 in Gourio.
An important feature of (5.26) is that we can re-arrange it to give

E\left[ \frac{w_c(f_{t+1}) + 1}{w_c(f_t)} : f_t \right] = \frac{1}{\left( 1 - f_t + f_t (1-b)^{1-\alpha} \right) \beta e^{(1-\alpha)\nu + (1/2)((1-\alpha)\sigma)^2}}   (5.27)

This is useful because, given the assumptions about the consumption process,
the expected return on the consumption claim (conditional on f t ) obeys
   
E\left[ R^c_{t+1} : f_t \right] = E\left[ \frac{c_{t+1}}{c_t} : f_t \right] E\left[ \frac{w_c(f_{t+1}) + 1}{w_c(f_t)} : f_t \right]

where

E\left[ \frac{c_{t+1}}{c_t} : f_t \right] = (1 - f_t)\, e^{\nu + (1/2)\sigma^2} + f_t (1-b)\, e^{\nu + (1/2)\sigma^2}
                                          = \left( 1 - f_t + f_t (1-b) \right) e^{\nu + (1/2)\sigma^2}
Thus, in this simple version of the model, we can evaluate the conditionally expected equity return without having to solve for the price-dividend ratios. It is
\[
\begin{aligned}
E\left[ R^c_{t+1} : f_t \right] &= \frac{\left(1 - f_t + f_t(1-b)\right) e^{\nu + \frac{1}{2}\sigma^2}}{\left(1 - f_t + f_t(1-b)^{1-\alpha}\right)\beta e^{(1-\alpha)\nu + \frac{1}{2}((1-\alpha)\sigma)^2}} \\
&= \beta^{-1} e^{\alpha\nu + \frac{1}{2}\sigma^2 - \frac{1}{2}((1-\alpha)\sigma)^2} \left( \frac{1 - f_t + f_t(1-b)}{1 - f_t + f_t(1-b)^{1-\alpha}} \right)
\end{aligned}
\]
In log terms,
\[
\log E\left[ R^c_{t+1} : f_t \right] = -\log(\beta) + \alpha\nu + \frac{1}{2}\sigma^2 - \frac{1}{2}((1-\alpha)\sigma)^2 + \log\left( \frac{1 - f_t + f_t(1-b)}{1 - f_t + f_t(1-b)^{1-\alpha}} \right) \tag{5.28}
\]

If you differentiate the term inside the last log( · ) you’ll find that the expected
return is increasing in f t iff α > 1.
Combining (5.25) and (5.28), we can immediately obtain an expression for the conditional equity premium, in log terms:
\[
\log E\left[ \frac{R^c_{t+1}}{R_t^F} : f_t \right] = \alpha\sigma^2 + \log\left( \frac{\left(1 - f_t + f_t(1-b)^{-\alpha}\right)\left(1 - f_t + f_t(1-b)\right)}{1 - f_t + f_t(1-b)^{1-\alpha}} \right) \tag{5.29}
\]
Not surprisingly, it's a typical i.i.d. piece (ασ^2) à la Mehra-Prescott, plus a disaster-related piece. Using (5.29), we can explore the effects of variation in f_t on the conditional equity premium (as well as the roles of b and α).
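For example, here is a small Python sketch of (5.29). The parameter values are illustrative (Gourio's b = 0.43 and σ = 0.02, with α = 4):

```python
import math

def log_premium(f, b=0.43, alpha=4.0, sigma=0.02):
    # Conditional log equity premium, equation (5.29): an i.i.d. piece
    # plus a disaster-related piece
    iid_piece = alpha * sigma ** 2
    disaster_piece = math.log(
        (1 - f + f * (1 - b) ** (-alpha)) * (1 - f + f * (1 - b))
        / (1 - f + f * (1 - b) ** (1 - alpha))
    )
    return iid_piece + disaster_piece

for f in (0.0, 0.01, 0.02):
    print(f, round(100 * log_premium(f), 2))  # premium in percent
```

With f_t = 0 only the Mehra-Prescott piece ασ² survives (0.16 percent at these values); at f_t = 0.01 the premium is already several percent, and it rises with f_t.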

Gourio's Proposition 2 establishes some results on how magnitudes in the model vary with f_t: the risk-free rate is decreasing in f_t; the equity return is increasing in f_t iff α > 1; the price-dividend ratio is increasing in f_t iff α > 1; and the equity premium is increasing in f_t if f_t is small.
Thinking about time series behavior, there is a problem with the directions
of change proven in the proposition—since we’ll likely be assuming α > 1, the
proposition describes a model where the conditional equity return and condi-
tional excess return covary positively with the price-dividend ratio, which is
the opposite of what’s found in the data.
This leads Gourio to examine the behavior of the model using Epstein-Zin
preferences, to which we now turn.

5.2.3 Epstein-Zin preferences, i.i.d. disaster risk


Gourio initially specializes the process for the disaster probability f_t to be i.i.d.—P(f_t, f_{t+1}) = P(f_{t+1}). It's still a state variable that will affect current prices, but the i.i.d. assumption means that the current realization is not informative about future realizations. Thinking about Epstein-Zin preferences, this means that the certainty equivalent of scaled lifetime utility from tomorrow onward—that is, v_{t+1} once we divide out the level of current consumption, or what we've been calling Φ_{t+1}—will be a constant.
Specifically, the (equilibrium) lifetime utility of the representative agent, normalized by c_t, will follow:
\[
\Phi_t = \left[ 1 - \beta + \beta\, \mu\left( \Phi_{t+1} \frac{c_{t+1}}{c_t} \right)^{1 - \frac{1}{\psi}} \right]^{\frac{1}{1 - \frac{1}{\psi}}}, \tag{5.30}
\]
where
\[
\mu\left( \Phi_{t+1} \frac{c_{t+1}}{c_t} \right) = \mu\left( \Phi_{t+1} e^{\log(c_{t+1}/c_t)} \right) = e^{\nu + \frac{1}{2}(1-\alpha)\sigma^2} \left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1}{1-\alpha}} \mu(\Phi_{t+1}) \tag{5.31}
\]
and μ(Φ_{t+1}) is the constant
\[
\mu(\Phi_{t+1}) = \left[ \sum_{f'} P(f')\, \Phi(f')^{1-\alpha} \right]^{\frac{1}{1-\alpha}}. \tag{5.32}
\]

Note that these steps utilize the various independences built into the process
for log(ct+1 /ct ). Equation (5.31), for example, uses the fact that ηt+1 is i.i.d.
and independent of the realization of the disaster state. And the realization
of the disaster state, in turn, is independent of the realization of next-period’s
disaster probability.21 In (5.32), I've assumed that the distribution for f_{t+1} is discrete—P(f') is the probability of f', and Φ(f') is the value of Φ_{t+1} in that state.

21 Hence, we've really done a μ(xyz) = μ(x)μ(y)μ(z) split of the certainty equivalent.
Combining these expressions, we can write the (scaled) utility process as
\[
\Phi(f_t) = \left[ 1 - \beta + \beta e^{(1 - \frac{1}{\psi})(\nu + \frac{1}{2}(1-\alpha)\sigma^2)} \left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1 - 1/\psi}{1-\alpha}} \mu(\Phi)^{1 - \frac{1}{\psi}} \right]^{\frac{1}{1 - 1/\psi}}
\]
More compactly, letting θ = ν + (1/2)(1 − α)σ^2 and A = μ(Φ), we can write the process as
\[
\Phi(f_t) = \left[ 1 - \beta + \beta \left( e^{\theta} A \right)^{1 - \frac{1}{\psi}} \left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1 - 1/\psi}{1-\alpha}} \right]^{\frac{1}{1 - 1/\psi}}. \tag{5.33}
\]

Note that Φ is decreasing in f_t, independent of whether α ≷ 1 or ψ ≷ 1. This follows because the term
\[
\left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1}{1-\alpha}}
\]
is decreasing in f_t for both α > 1 and α < 1.22
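That claim is easy to verify numerically. A Python sketch (b = 0.43 is just an illustrative value), checking the term on a grid of f values for α on either side of one:

```python
def ce_term(f, alpha, b=0.43):
    # The term (1 - f + f*(1-b)^(1-alpha))^(1/(1-alpha)) discussed above
    return (1 - f + f * (1 - b) ** (1 - alpha)) ** (1.0 / (1 - alpha))

for alpha in (0.5, 4.0):  # one case on each side of alpha = 1
    vals = [ce_term(f, alpha) for f in (0.0, 0.01, 0.02, 0.05)]
    assert all(x > y for x, y in zip(vals, vals[1:])), "decreasing in f"
    print(alpha, [round(v, 4) for v in vals])
```

For α < 1 the inner expression falls with f and is raised to a positive power; for α > 1 it rises with f but is raised to a negative power. Either way the whole term declines.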


Computationally, (5.33) defines a function Φ( f ) for an arbitrary choice of A;
the equilibrium Φ is the one that satisfies (5.33) and A = µ(Φ). So, one would
solve this iteratively—from an initial Φ0 , calculate A = µ(Φ0 ), use (5.33) to
calculate Φ1 , and repeat with Φ0 = Φ1 .
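Here is what that iteration looks like in Python, for an illustrative two-point i.i.d. distribution over f (the distribution and the round-number taste parameters below are placeholders for the sketch, not Gourio's calibration):

```python
import math

beta, alpha, psi = 0.97, 4.0, 1.5
nu, sigma, b = 0.025, 0.02, 0.43
fs, probs = [0.01, 0.03], [0.5, 0.5]      # illustrative i.i.d. distribution P(f')
theta = nu + 0.5 * (1 - alpha) * sigma ** 2
rho = 1 - 1 / psi

def mu(phi):
    # certainty equivalent of Phi', equation (5.32)
    return sum(p * v ** (1 - alpha) for p, v in zip(probs, phi)) ** (1 / (1 - alpha))

phi = [1.0, 1.0]                           # initial guess Phi_0
for _ in range(2000):                      # iterate (5.33): A = mu(Phi_0), then Phi_1, ...
    A = mu(phi)
    phi = [(1 - beta
            + beta * (math.exp(theta) * A) ** rho
            * (1 - f + f * (1 - b) ** (1 - alpha)) ** (rho / (1 - alpha))
           ) ** (1 / rho)
           for f in fs]
print([round(v, 4) for v in phi])          # Phi(f) at each grid point
```

The convergence rate is governed by β, so the loop count is generous. Note that Φ comes out decreasing in f, as claimed above.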
The recursion (5.33) is one of two building blocks for pricing assets in the model. The other, of course, is the stochastic discount factor. The general form is the by-now-familiar
\[
m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\frac{1}{\psi}} \left( \frac{v_{t+1}}{\mu_t(v_{t+1})} \right)^{\frac{1}{\psi} - \alpha}.
\]
In pricing assets, mt+1 involves three independent random variables with re-
spect to which expectations must be taken—whether or not a disaster occurs,
with probability f t ; the i.i.d. innovation to consumption growth, ηt+1 ; and the
draw of next-period’s disaster probability, f t+1 , from the distribution P( f t+1 ).
The price-dividend ratio for a claim to aggregate consumption will be (as before) a function of the disaster probability, and the return on the consumption claim will be
\[
R^c_{t+1} = \frac{c_{t+1}}{c_t}\, \frac{1 + w_c(f_{t+1})}{w_c(f_t)}.
\]
The price-dividend ratio obeys the pricing relation
\[
w_c(f_t) = E_t\left[ \beta \left( \frac{c_{t+1}}{c_t} \right)^{1 - \frac{1}{\psi}} \left( \frac{v_{t+1}}{\mu_t(v_{t+1})} \right)^{\frac{1}{\psi} - \alpha} \left( 1 + w_c(f_{t+1}) \right) \right] \tag{5.34}
\]
22 If α < 1, (1 − b)^{1−α} < 1, and so 1 − f + f(1 − b)^{1−α} is decreasing in f; since 1/(1 − α) > 0, the whole expression is decreasing in f. On the other hand, if α > 1, then (1 − b)^{1−α} > 1, and the stuff inside the parentheses is increasing in f, but the whole thing is raised to the 1/(1 − α) < 0 power.

It's left as an exercise for you to show that (5.34) specializes to
\[
w_c(f_t) = \beta e^{(1 - \frac{1}{\psi})\theta} A^{\alpha - \frac{1}{\psi}} \left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1 - (1/\psi)}{1-\alpha}} \sum_{f'} P(f')\, \Phi(f')^{\frac{1}{\psi} - \alpha} \left( 1 + w_c(f') \right) \tag{5.35}
\]

The price of the riskless asset q obeys
\[
q(f_t) = \beta e^{\theta_1} A^{\alpha - \frac{1}{\psi}} \frac{1 - f_t + f_t(1-b)^{-\alpha}}{\left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{(1/\psi) - \alpha}{1-\alpha}}} \sum_{f'} P(f')\, \Phi(f')^{\frac{1}{\psi} - \alpha} \tag{5.36}
\]
where θ_1 = (α − (1/ψ))θ − αν + (1/2)(ασ)^2.

Exercise 5.3. Derive the expressions (5.35) and (5.36). Then, show that when the elasticity of intertemporal substitution, ψ, is equal to one, future utility (Φ) doesn't matter for the equity premium—we obtain the same result for log(E(R^c_{t+1})/R^F_t) as we did in the case of standard preferences—i.e., equation (5.29). A key step is to show that a constant w_c solves (5.35) when ψ = 1.

As complicated as these last expressions appear, Gourio is still able to prove some results analytically. The key is that, under the i.i.d. assumption on f, terms like Σ_{f'} P(f')Φ(f')^{(1/ψ)−α} and Σ_{f'} P(f')Φ(f')^{(1/ψ)−α}(1 + w_c(f')) are constants with respect to variation in f_t. Thus, for example,
\[
\log R_t^F = -\log(q(f_t)) = \text{constants} + \log\left( \frac{\left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{(1/\psi) - \alpha}{1-\alpha}}}{1 - f_t + f_t(1-b)^{-\alpha}} \right)
\]
Gourio shows that if α ≥ 1, then log R_t^F is decreasing in f_t for small values of f_t.
It's only slightly more complicated to characterize the dependence on f_t of the price-dividend ratio and the return on the consumption claim. First, note that from (5.35),
\[
w_c(f_t) = (\text{positive constants}) \times \left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1 - (1/\psi)}{1-\alpha}} \tag{5.37}
\]
This is decreasing in f_t if 1 − (1/ψ) > 0—i.e., if the EIS ψ > 1. Having w_c(f_t) decreasing in f_t is a desirable feature, at least on intuitive grounds—if there's an increase in the probability of disaster striking tomorrow, one would expect today's equity price to fall.

Conditional on f_t, the expected return on the consumption claim is
\[
E\left[ R^c_{t+1} : f_t \right] = e^{\nu + \frac{1}{2}\sigma^2} \left( 1 - f_t + f_t(1-b) \right) \frac{\sum_{f'} P(f')\left( 1 + w_c(f') \right)}{w_c(f_t)}. \tag{5.38}
\]
Using (5.37), and collecting together constants (relative to f_t), we can write the log expected return as
\[
\log E\left[ R^c_{t+1} : f_t \right] = \text{constants} + \log\left( \frac{1 - f_t + f_t(1-b)}{\left( 1 - f_t + f_t(1-b)^{1-\alpha} \right)^{\frac{1 - (1/\psi)}{1-\alpha}}} \right)
\]
Whether log E[R^c_{t+1} : f_t] is increasing or decreasing in f_t depends on whether ψ exceeds one. If ψ ≤ 1, the log expected return is definitely decreasing in f_t. If ψ > 1, it can be increasing or decreasing depending on the precise parameter values, and the value of f_t.
Intuitively, there are two effects on log E[R^c_{t+1} : f_t] of an increase in f_t. On the one hand, an increase in f_t lowers the asset's expected payoff next period (the 1 − f_t + f_t(1 − b) term). On the other hand, an increase in f_t also moves the asset's price today—w_c(f_t), captured in the (1 − f_t + f_t(1 − b)^{1−α})^{(1−(1/ψ))/(1−α)} term. The expected return rises only if the price falls by more than the expected payoff does. If ψ < 1, that never happens—the price actually rises while the payoff falls. If ψ > 1, the answer depends on parameter values and on f_t.
It turns out that, for α sufficiently large, ψ need only be a bit bigger than one to obtain the result that log E[R^c_{t+1} : f_t] goes up when f_t goes up. Note that the increase in log E[R^c_{t+1} : f_t] in that case is entirely due to a wider risk spread—we've already seen that higher f_t lowers R^F_t for α ≥ 1.


Figure 5.5 plots my calculation of the set of (α, ψ) pairs consistent with an increasing log expected equity return. I calculated this assuming b = 0.43 (as Gourio does) and f_t = 0.012. It turns out that for f_t of that order of magnitude—around a 1% chance of disaster, give or take half a percent or so—I see little noticeable difference in the curve shown in the figure.
From the figure we also see that if risk aversion α is too low—especially
around α = 2—the necessary ψ values get unreasonably high.23
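Under the i.i.d. assumption, the curve in the figure can be computed directly: the "constants" in the log expected return don't vary with f_t, so for each α one can solve for the ψ at which the derivative of the f_t-dependent term is exactly zero at the assumed f_t. A Python sketch of that calculation (b = 0.43 and f_t = 0.012, as in the figure; the function name is mine):

```python
def psi_threshold(alpha, b=0.43, f=0.012):
    # Solve, for psi,
    #   d/df [ log(1-f+f(1-b)) - ((1-1/psi)/(1-alpha)) * log(1-f+f(1-b)^(1-alpha)) ] = 0.
    # Pairs (alpha, psi) with psi above this threshold give a log expected
    # equity return that is increasing in the disaster probability.
    B = (1 - b) ** (1 - alpha)
    payoff = b / (1 - f + f * (1 - b))   # rate at which the expected payoff falls
    price = (B - 1) / (1 - f + f * B)    # rate at which the price-dividend term moves
    return 1.0 / (1.0 - (alpha - 1) * payoff / price)

for a in (2.0, 4.0, 10.0):
    print(a, round(psi_threshold(a), 3))
```

At these settings the threshold falls from about ψ = 2.4 at α = 2 to about ψ = 1.08 at α = 10, matching the shape of the curve in Figure 5.5; at low risk aversion the required EIS is implausibly high.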
In any case, these hopeful results in the i.i.d. case prompt Gourio to explore,
computationally, the behavior of the model with Epstein-Zin preferences and
persistence in the process describing the evolution of f t .

5.2.4 Epstein-Zin preferences, persistent probability of disaster
If you’ve followed Gourio’s model thus far, incorporating persistence in f t is
really not a big deal. What changes? First, and most importantly, the constant
23 As you may have gathered from reading Bansal and Yaron, even an EIS as high as ψ = 1.5—their assumption—is pushing the envelope, as far as EIS values that most people would consider plausible.

[Figure 5.5 here. Title: "Region giving log E(R^c_{t+1}) increasing in f_t"; axes: ψ (EIS) on the vertical, α (RRA coefficient), from 2 to 10, on the horizontal. Annotation: parameter combinations in the area above the curve give a log expected return that is increasing in the disaster probability.]
Figure 5.5: Combinations of α and ψ that produce a log expected equity return
that is increasing in the probability of disaster. The pairs above the curve have
this property. The calculations were made assuming b = 0.43 and f t = 0.012.
From the model of Gourio [Gou08], with EZ preferences and f t ∼ i.i.d.

that we'd labelled A—that is, μ(Φ)—is no longer a constant. Rather, it's now
\[
\mu(\Phi : f_t) = \left[ \sum_{f'} P(f_t, f')\, \Phi(f')^{1-\alpha} \right]^{\frac{1}{1-\alpha}}.
\]
Also, expressions like Σ_{f'} P(f')Φ(f')^{(1/ψ)−α}, Σ_{f'} P(f')Φ(f')^{(1/ψ)−α}(1 + w_c(f')), and Σ_{f'} P(f')(1 + w_c(f')) become
\[
\sum_{f'} P(f_t, f')\, \Phi(f')^{\frac{1}{\psi} - \alpha}, \qquad
\sum_{f'} P(f_t, f')\, \Phi(f')^{\frac{1}{\psi} - \alpha} \left( 1 + w_c(f') \right), \qquad
\sum_{f'} P(f_t, f') \left( 1 + w_c(f') \right),
\]

resulting in obvious modifications of the key equations (5.35), (5.36), and (5.38). While these modifications are not especially difficult to make, they do make it impossible to draw any conclusions about the model on a purely analytical basis, as Gourio did with the first two versions of the model. That makes it necessary to solve the model computationally. Gourio assumes a simple, symmetric two-state Markov chain for f_t—f_t ∈ { f_l, f_h }, and
\[
P = \begin{pmatrix} 1-\pi & \pi \\ \pi & 1-\pi \end{pmatrix}
\]

Gourio calibrates f_l and f_h so as to give an unconditional mean probability of disaster equal to 1.7%.24 This gives f_l = 0.017 − e and f_h = 0.017 + e, where e is the unconditional standard deviation of f_t. Gourio experiments with different values for e, as well as for π, which governs the persistence of the process.25 Initially, Gourio chooses e = 0.01 and π = 0.1—the latter choice implies a first-order autocorrelation of AC1 = 1 − 2π = 0.8. As Gourio notes, there's very little empirical guidance for the choice of either e or π.
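The chain is simple to set up and sanity-check in a few lines. A Python sketch, using those initial values:

```python
e, pi_ = 0.01, 0.1
fl, fh = 0.017 - e, 0.017 + e
P = [[1 - pi_, pi_], [pi_, 1 - pi_]]   # symmetric two-state transition matrix

# The invariant distribution of a symmetric chain is (1/2, 1/2), so:
mean_f = 0.5 * fl + 0.5 * fh           # unconditional mean, should be 0.017
var_f = 0.5 * (fl - mean_f) ** 2 + 0.5 * (fh - mean_f) ** 2
ac1 = 1 - 2 * pi_                      # first-order autocorrelation
print(mean_f, var_f ** 0.5, ac1)
```

The unconditional standard deviation comes out equal to e, and π = 0.1 gives AC1 = 0.8, as in the text.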
The choice of b is guided by Barro's work—Gourio sets b = 0.43, which is about the mid-point of the range of disasters Barro catalogs. Gourio also assumes partial default on the riskless bond (only the fraction 1 − b is repaid), which occurs with probability 0.4 in the event of a disaster. In terms of the variable 'd' we introduced at the beginning of this section on disasters, Gourio is effectively setting 1 − d = 0.6(1 − 0.43).
The other parameters are the taste parameters, β, α and ψ, and the param-
eters of the i.i.d. part of consumption growth, ν and σ. Gourio sets ν = 0.025,
σ = 0.02, and β = 0.97. For the most part, he sets α = 4, apart from one case
with α = 3.17.26 He experiments with values for the EIS ranging from ψ = 0.25
to ψ = 1.5.
Some variations that Gourio also considers include:
• Leverage—as in Bansal and Yaron, Gourio also prices a dividend stream
that’s much more volatile than aggregate consumption. In particular,
log(dt+1 /dt ) = λ log(ct+1 /ct ), with λ = 3. Leverage proves to be im-
portant for matching the volatility of the price-dividend ratio and the
price-dividend ratio’s ability to forecast future returns.
• Having the probability of disaster tomorrow depend on the occurrence (or not) of a disaster today. The f_t process above is independent of whether disasters actually occur. Gourio tries some experiments where a disaster today makes a disaster tomorrow either more or less likely. This amounts to having different P matrices depending on the occurrence or non-occurrence of a disaster. This channel turns out to have little effect on the results.

24 Since P is symmetric, its invariant distribution is (1/2, 1/2).
25 Recall that a symmetric two-state Markov chain that mimics an autoregressive process will have π = (1 − AC1)/2, where AC1 is the process's first-order autocorrelation.
26 Our α is his θ.
• Picking parameters—in particular, α, e, and π—that best match the fol-
lowing moments in the data: the volatility of the price-dividend ratio,
the mean equity premium, and the coefficient of a regression of excess
returns on the (inverse of) the price-dividend ratio. Those results match
many features of the data, though the model implies too low a volatility
of the riskless rate and too high a volatility of dividend growth.

Lecture 6

Bond Pricing and the Term Structure of Interest Rates

So far, we've worked with models where we've priced either infinitely-lived equity or one-period bonds.
References

[Abe90] Andrew B. Abel. Asset prices under habit formation and keeping up
with the Joneses. American Economic Review, 80(2):38–42, 1990.
[Bar06] Robert J. Barro. Rare disasters and asset markets in the twentieth
century. Quarterly Journal of Economics, 121(3):823–866, 2006.
[BDL93] Robert B. Barsky and J. Bradford De Long. Why does the stock mar-
ket fluctuate? The Quarterly Journal of Economics, 108(2):291–311,
1993.
[Ber98] Michele Bernasconi. Tax evasion and orders of risk aversion. Journal
of Public Economics, 67:123–134, 1998.
[BG69] William A. Brock and David Gale. Optimal growth under factor aug-
menting progress. Journal of Economic Theory, 1(3):229–243, 1969.
[BKY07] Ravi Bansal, Dana Kiku, and Amir Yaron. Risks for the long run:
Estimation and inference. Unpublished manuscript, 2007.
[BL02] Ravi Bansal and Christian Lundblad. Market efficiency, asset returns,
and the size of the risk premium in global equity markets. Journal of
Econometrics, 109(2):195–237, 2002.
[BPR78] Charles Blackorby, Daniel Primont, and R. Robert Russell. Duality,
Separability, and Functional Structure: Theory and Economic Applica-
tions. North-Holland, 1978.

[Bre79] Douglas T. Breeden. An intertemporal asset pricing model with


stochastic consumption and investment opportunities. Journal of Fi-
nancial Economics, 7(3):265–296, 1979.
[BRW08] Jacob Boudoukh, Matthew Richardson, and Robert Whitelaw. The myth of long-horizon predictability. Review of Financial Studies, 21(4):1577–1605, 2008.
[BY04] Ravi Bansal and Amir Yaron. Risks for the long run: A potential res-
olution of asset-pricing puzzles. Journal of Finance, 59(4):1481–1509,
2004.


[CC99] John Y. Campbell and John H. Cochrane. By force of habit: A


consumption-based explanation of aggregate stock market behavior.
Journal of Political Economy, 107(2):205–251, 1999.
[CCC10] Claudio Campanale, Rui Castro, and Gian Luca Clementi. Asset pric-
ing in a production economy with Chew-Dekel preferences. Review
of Economic Dynamics, 13(2):379–402, 2010.
[CG11] George M. Constantinides and Anisha Ghosh. Asset pricing tests
with long run risk in consumption growth. Unpublished manuscript,
University of Chicago Booth School of Business, August 2011.
[Che06] Raj Chetty. A new method of estimating risk aversion. American
Economic Review, 96:1821–1834, 2006.
[CLM96] John Y. Campbell, Andrew W. Lo, and A. Craig MacKinlay. The
Econometrics of Financial Markets. Princeton University Press, 1996.
[Coc01] John H. Cochrane. Asset Pricing. Princeton University Press, 2001.
[Coc08] John H. Cochrane. The dog that did not bark: A defense of return
predictability. Review of Financial Studies, 21(4):1533–1575, 2008.
[Con90] George M. Constantinides. Habit formation: A resolution of the
equity premium puzzle. Journal of Political Economy, 98(3):519–543,
1990.
[CS88] John Y. Campbell and Robert J. Shiller. The dividend-price ratio and
expectations of future dividends and discount factors. Review of Fi-
nancial Studies, 1(3):195–228, 1988.
[Den95] Wouter J. den Haan. The term structure of interest rates in real and monetary economies. Journal of Economic Dynamics and Control, 19(5-7):909–940, 1995.
[Dol11] Jim Dolmas. Risk preferences, intertemporal substitution, and busi-
ness cycle dynamics. Unpublished manuscript, May 2011.
[DR89] Philip H. Dybvig and Stephen A. Ross. Arbitrage. In John Eatwell,
Murray Milgate, and Peter Newman, editors, The New Palgrave: Fi-
nance, pages 57–71. W. W. Norton Publishing Company, 1989.
[Duf92] Darrell Duffie. Dynamic Asset Pricing Theory. Princeton University
Press, 1992.
[EZ89] Larry Epstein and Stanley Zin. Substitution, risk aversion and the
temporal behavior of asset returns: A theoretical analysis. Economet-
rica, 57:937–969, 1989.
[EZ90] Larry Epstein and Stanley Zin. ‘First-order’ risk aversion and the
equity premium puzzle. Journal of Monetary Economics, 26:387–407,
1990.


[Fam65] Eugene F. Fama. The behavior of stock market prices. Journal of


Business, 38(1):34–105, 1965.
[Fam70] Eugene F. Fama. Efficient capital markets: A review of theory and
empirical work. Journal of Finance, 25(2):383–417, 1970.

[Gab08] Xavier Gabaix. Variable rare disasters: A tractable theory of ten puzzles in macro-finance. American Economic Review, 98(2):64–67, 2008.
[Gou08] François Gourio. Time-series predictability in the disaster model.
Finance Research Letters, 5(4):191–203, 2008.
[Gra08] Liam Graham. Consumption habits and labor supply. Journal of
Macroeconomics, 30(1):382–395, 2008.
[Gul91] Faruk Gul. A theory of disappointment aversion. Econometrica,
59(3):667–686, 1991.
[Guv09] Fatih Guvenen. A parsimonious macroeconomic model for asset
pricing. Econometrica, 77(6):1711–1750, 2009.
[Hic35] John R. Hicks. A suggestion for simplifying the theory of money. Economica, pages 1–19, February 1935.
[Jer98] Urban Jermann. Asset pricing in production economies. Journal of
Monetary Economics, 41(2):257–275, 1998.

[Jud98] Kenneth L. Judd. Numerical Methods in Economics. MIT Press, 1998.


[Kar59] Samuel Karlin. Mathematical Methods and Theory in Games, Program-
ming, and Economics. Addison-Wesley Publishing Company, Inc.,
1959.

[Koc90] Narayana R. Kocherlakota. On the ‘discount’ factor in growth


economies. Journal of Monetary Economics, 25(1):43–47, 1990.
[Koc96] Narayana R. Kocherlakota. The equity premium: It’s still a puzzle.
Journal of Economic Literature, 34(1):42–71, March 1996.

[KP78] David Kreps and Evan Porteus. Temporal resolution of uncertainty


and dynamic choice theory. Econometrica, 46(1), 1978.
[Kre88] David M. Kreps. Notes on the Theory of Choice. Westview Press, 1988.
[KS10] Karen A. Kopecky and Richard M. H. Suen. Finite state Markov-
chain approximations to highly persistent processes. Review of Eco-
nomic Dynamics, 13:701–714, 2010.
[KT79] Daniel Kahneman and Amos Tversky. Prospect theory: An analysis
of decision under risk. Econometrica, 47(2):263–291, 1979.


[Ler08] Stephen F. Leroy. Excess volatility tests. In Steven N. Durlauf and


Lawrence E. Blume, editors, The New Palgrave Dictionary of Economics.
Palgrave MacMillan, 2nd edition, 2008.
[Lin65] John Lintner. The valuation of risk assets and the selection of risky
investments in stock portfolios and capital budgets. Review of Eco-
nomics and Statistics, 47(1):13–37, 1965.
[LL10] Martin Lettau and Sydney C. Ludvigson. Measuring and modeling variation in the risk-return tradeoff. In Yacine Aït-Sahalia and Lars Peter Hansen, editors, Handbook of Financial Econometrics, volume 1, chapter 11, pages 617–690. Elsevier Science B.V., 2010.

[LN07] Sydney C. Ludvigson and Serena Ng. The empirical risk-return re-
lation: A factor analysis approach. Journal of Financial Economics,
83(1):171–222, 2007.
[LP81] Stephen F. Leroy and Richard D. Porter. The present value relation:
Tests based on implied variance bounds. Econometrica, 49(3):555–574,
May 1981.
[LP04] Francis A. Longstaff and Monika Piazzesi. Corporate earnings and
the equity premium. Journal of Financial Economics, 74(3):400–421,
2004.

[LU09] Lars Ljungqvist and Harald Uhlig. Optimal endowment destruction under Campbell-Cochrane habit formation. NBER Working Paper No. 14772, 2009.
[Luc78] Robert E. Lucas, Jr. Asset prices in an exchange economy. Economet-
rica, 46(6):1429–1445, November 1978.

[Mar52] Harry M. Markowitz. Portfolio selection. Journal of Finance, 7(1):77–


91, 1952.
[Mar99] Harry M. Markowitz. The early history of portfolio theory: 1600-
1960. Financial Analysts Journal, 55(4):5–16, 1999.

[McK02] Lionel W. McKenzie. Classical General Equilibrium Theory. MIT Press,


2002.
[MP85] Rajnish Mehra and Edward C. Prescott. The equity premium: A puz-
zle. Journal of Monetary Economics, 15:145–161, 1985.
[MP88] Rajnish Mehra and Edward C. Prescott. The equity risk premium: A
solution? Journal of Monetary Economics, 22:133–136, 1988.
[MY03] Angelo Melino and Alan X. Yang. State-dependent preferences can
explain the equity premium puzzle. Review of Economic Dynamics,
6(4):806–830, 2003.


[PSV04] Ignacio Palacios-Huerta, Roberto Serrano, and Oscar Volij. Rejecting


small gambles under expected utility. Economics Letters, 91:250–259,
2004.
[Qui82] John Quiggin. A theory of anticipated utility. Journal of Economic
Behavior and Organization, 3:323–343, 1982.

[Rab00] Matthew Rabin. Risk aversion and expected-utility theory: A cali-


bration result. Econometrica, 68:1281–1292, 2000.
[Rie88] Thomas A. Rietz. The equity risk premium: A solution. Journal of
Monetary Economics, 22(1):117–131, 1988.

[Ros04] Stephen A. Ross. Neoclassical Finance. Princeton University Press,


2004.
[Rou95] K. Geert Rouwenhorst. Asset pricing implications of equilibrium
business cycle models. In Thomas F. Cooley and Edward C. Prescott,
editors, Frontiers of Business Cycle Research, pages 294–330. Princeton
University Press, Princeton, NJ, 1995.
[RZ10] Bryan R. Routledge and Stanley E. Zin. Generalized disappointment
aversion and asset prices. Journal of Finance, 65(4):1303–1332, 2010.
[Sar07] Thomas J. Sargent. Commentary. Federal Reserve Bank of St. Louis
Review, 89(4):301–303, 2007.
[Sha64] William F. Sharpe. Capital asset prices: A theory of market equilib-
rium under conditions of risk. Journal of Finance, 19(3):425–442, 1964.
[Shi77] Georgi E. Shilov. Linear Algebra. Dover Publications, 1977.

[Shi81] Robert J. Shiller. Do stock prices move too much to be justified


by subsequent changes in dividends? American Economic Review,
71(3):421–436, 1981.
[SS90] Uzi Segal and Avia Spivak. First order versus second order risk aver-
sion. Journal of Economic Theory, 51(1):111–125, 1990.

[SS08] Zvi Safra and Uzi Segal. Calibration results for non-expected utility
theories. Econometrica, 76:1143–1166, 2008.
[Sta00] Chris Starmer. Developments in non-expected utility theory: The
hunt for a descriptive theory of choice under risk. Journal of Economic
Literature, 38:333–382, 2000.

[SW03] Frank Smets and Raf Wouters. An estimated dynamic stochastic gen-
eral equilibrium model of the euro area. Journal of the European Eco-
nomic Association, 1(5):1123–1175, 2003.


[Tau86] George Tauchen. Finite state Markov chain approximations to uni-


variate and vector autoregressions. Economics Letters, 20:177–181,
1986.
[Tob58] James Tobin. Liquidity preference as behavior toward risk. Review of
Economic Studies, 25(2):65–86, 1958.

[TW11] Yi Tang and Robert F. Whitelaw. Time-varying Sharpe ratios and


market timing. Quarterly Journal of Finance, 1(3):465–493, 2011.
[Wak94] Peter Wakker. Separating marginal utility and probabilistic risk aver-
sion. Theory and Decision, 36:1–44, 1994.

[Wei89] Philippe Weil. The equity premium puzzle and the risk-free rate puz-
zle. Journal of Monetary Economics, 24(3):401–422, 1989.
[Wei90] Philippe Weil. Non-expected utility in macroeconomics. Quarterly
Journal of Economics, 105(1):29–42, 1990.

[Yaa87] Menachem Yaari. The dual theory of choice under risk. Econometrica,
55:95–115, 1987.

Appendix A

An introduction to using MATLAB

A.1 Introduction

MATLAB is a matrix-based numerical calculation and visualization tool. It is much like an extremely powerful calculator, with the ability to run scripts—i.e., programs—and generate high-quality plots and graphs. It is also an extremely easy package to learn.

When you start up MATLAB by double-clicking on the MATLAB icon, the 'MATLAB desktop' opens. The layout you see will depend on who's used it last, and whether they've tinkered with the desktop layout. At the very least, you'll see the Command Window and (maybe) views of the workspace, file directory, or a list of recently used commands. MATLAB uses standard Windows conventions—e.g., typing ALT+F brings down the FILE menu, ALT+E brings down the EDIT menu, etc.

The 'prompt' in the Command Window—the spot where you type in commands—looks sort of like this: >>.

One way to familiarize yourself quickly is to enter demo or help at the prompt. Entering demo gives you access to a video tour of MATLAB's features (enter demo 'matlab' 'getting started' to see a menu of videos about basic features). Entering help gives you a long list of topics you can get help on.

A.2 Creating matrices


There are several ways to create matrices in Matlab. You can create a matrix
by typing in the elements of the matrix, inside square brackets. Use a space or
comma to separate the elements of a row and use a semicolon to separate rows.
Thus, either A = [1 2 3; 4 5 6] or A = [1, 2, 3; 4, 5, 6] will create the


2 × 3 matrix A which is
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}
\]
You'll note that if you enter A = [1 2 3;4 5 6];—i.e., this time ending the line with a semi-colon—MATLAB apparently does nothing. This is not the case. MATLAB still records that A now denotes the above matrix; the semi-colon at the end of the line simply tells MATLAB to suppress displaying the result. You'll want to do this in most cases, especially with large matrices or vectors, or when you are running an iterative program—displaying the execution of each line greatly slows the program down.
The colon (':') is useful for creating certain types of vectors: in MATLAB, b = 1:5; produces a row vector b = (1, 2, 3, 4, 5). Using the colon, you could write our matrix A above by typing A = [1:3; 4:6];
Matlab also has several built-in functions for creating specialized matrices,
among them: zeros (matrix of zeros), ones (matrix of ones), eye (identity ma-
trix), and rand (matrix of pseudorandom variables drawn from a uniform dis-
tribution on [0, 1]).1 The syntax for all these functions is the same: zeros(N,M)
creates an N × M matrix of zeros, and zeros(N) makes a square N × N matrix
of zeros. For these, or any function, typing help function name will display help
related to the function function name.
Having created a matrix, say A, you can call an element of it—say the (2, 1) element—by typing A(2,1). If A is the matrix described above, typing A(2,1) at the command line returns ans = 4 ('ans' is Matlab's shorthand for 'answer'). Entering A(2,1) is treated as the question 'What's the 2–1 element of A?', the answer of which is four. If A is still the matrix
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}
\]
entering A(3,1) will produce an error message, 'Index exceeds matrix dimensions', since A only has two rows and we have asked about the first element in its (non-existent) third row.
You can change an element of a matrix by giving it a new value: if you type A(2,1)=0, Matlab returns
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 5 & 6 \end{pmatrix}
\]
You can call or assign values to all the elements of a row or column or some submatrix of a matrix using the colon. In MATLAB, A(i:j,h:k) is the submatrix of A consisting of A's rows i through j and columns h through k, A(i,:) is the ith row, and A(:,k) is the kth column.
Type A(1,:) and Matlab will return ans = [1 2 3], all the elements of the first row. Likewise, A(:,2) returns
\[
\text{ans} = \begin{pmatrix} 2 \\ 5 \end{pmatrix}
\]
1 Purely deterministic computers can’t create real random variables, but they can create things

like random numbers. See http://en.wikipedia.org/wiki/Pseudorandom_number_generator.


all the elements of the second column. A(1:2,2:3) returns the 2 × 2 submatrix
\[
\text{ans} = \begin{pmatrix} 2 & 3 \\ 5 & 6 \end{pmatrix}
\]
In the same way, entering A(2,:)=ones(1,3) will change the second row of A to a row of ones, and returns
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}
\]

A convenient feature of MATLAB's indexing of matrices is its use of the word 'end': A(i,end) is the last element of the ith row of A, A(i,j:end) is the jth through last elements of the ith row, and A(end,end) or just A(end) is the last element of the last row and column.

A.3 Basic matrix operations

Multiply matrices with '*', add matrices with '+', subtract matrices with '-'. The matrices have to be conformable for these operations in the usual linear algebra sense: adding (A+B) and subtracting (A-B) requires the matrices A and B to be of the same dimension. Matrix multiplication (A*B) is only feasible if the number of columns of A equals the number of rows of B.
Rules for conformability are relaxed for scalar multiplication or scalar addition. You can multiply a matrix by a scalar using '*' or add a scalar to every element of a matrix using '+', so 5*A multiplies every element of A by 5, and 5+A adds 5 to every element of A.
Transpose matrices using the prime (') symbol: if A is the original matrix we were manipulating above, entering B = A' returns
\[
B = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}
\]

Other transformations which are occasionally useful are those that reorient a matrix or vector in various ways. For example, if you enter x=1:5 you will get the vector
\[
x = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix}
\]
If you then enter y=fliplr(x) you'll get the vector
\[
y = \begin{pmatrix} 5 & 4 & 3 & 2 & 1 \end{pmatrix}
\]
—fliplr is short for 'flip left-to-right'. More information about these is available by entering help elmat, where the 'elmat' stands for 'elementary matrices and matrix manipulation'.


To find the inverse of a matrix A in MATLAB, you can type inv(A). Of
course, A must be a square matrix. For example, enter A = [1,2;3,4]; then B
= inv(A). You should get

B =
   -2.0000    1.0000
    1.5000   -0.5000

If you then enter C = B*A or C = A*B, you should see an identity matrix. Well,
actually, you'll see

C =
    1.0000         0
    0.0000    1.0000
which leads us to the issue of precision. Given the precision of the calculations
M ATLAB uses here, C(1,2) is zero, but C(2,1) is not. You can see this by using
M ATLAB’s method for checking equality, the double equal sign. In general,
entering x==y for scalars x and y returns a one if x = y and a zero otherwise. If
x and y are vectors or matrices of the same size, then x==y returns a vector or
matrix consisting of ones for the indices where the elements of x and y are equal
and zeros elsewhere.2 So, entering C(1,2)==0 returns a one, while C(2,1)==0
returns a zero. You can see what C(2,1) actually is by just entering C(2,1). It
is a very, very small number, but nonetheless not equal to zero.
MATLAB also has operations for matrix division, which use the slash '/'
and backslash '\'. For scalars a and b, a/b and b\a are ordinary division of a
by b. When A and B are matrices, A/B is matrix 'right division' of B into A,
which is the same as AB^(-1), while B\A is matrix left division, or B^(-1)A. Enter
help slash for more information on these operations. The methods MATLAB
uses for these calculations are more precise than the ones it uses for inv, and
one can use the slashes for finding inverses—if A is an n × n matrix, then B =
eye(n)/A calculates A^(-1). Using the previous A, set B = eye(2)/A. If you then
enter C = A*B, you should see the more exact3

C =
     1     0
     0     1

A.3.1 An example—ordinary least squares


Suppose you wanted to calculate the ordinary least squares estimate of β in the
regression
y = Xβ + u
where y is an N × 1 vector of N observations on the ‘dependent variable’, X is
an N × k matrix of N observations on each of k ‘independent’ variables, β is a
2 Type help relop for more information about ‘==’ and M ATLAB ’s other ‘relational operators’—

e.g., operators for checking whether x ≥ y or whether x has any non-zero elements and so forth.
3 Calculating inverses using slashes is also faster than using inv. This could be an important

consideration if you have an iterative program that needs to calculate a large number of inverses.
Just as a test, I made a random 100 × 100 matrix, and calculated its inverse 1000 times, first as
inv(A), then as eye(100)/A. The amount of time required for the slash method was just 3% of the
time required for the inv method.


k × 1 vector of 'coefficients', and u is an N × 1 vector of disturbances, whose
expectation conditional on X is zero. The idea is that the y_i's are random
variables whose expected values, conditional on X, have the following linear form:

E[y_i | X_i] = X_i1 β_1 + X_i2 β_2 + · · · + X_ik β_k

If this is the true underlying process generating the data, then realizations—our
observations—should obey the regression equation above. The point of least-
squares regression analysis is to use the observations X and y to construct an
estimate of the (unobserved) vector of coefficients β. In a purely mathematical
sense, what least squares does is pick β to minimize the Euclidean distance
between the vector y ∈ R^N and the subspace of R^N generated by the k columns
of X.4 The least-squares estimate of β—call it β̂_OLS—is thus the solution to the
following quadratic minimization problem:

min_b (y − Xb)^T (y − Xb).

If you multiply the quadratic form out, you'll see that the problem can be
written as

min_b y^T y − 2b^T X^T y + b^T X^T Xb.

The first-order condition is

−2X^T y + 2X^T Xb = 0,

or

X^T Xb = X^T y,

which gives

β̂_OLS = (X^T X)^(-1) X^T y
if X^T X is nonsingular. In any case, in MATLAB one can perform OLS regression
easily. If y and X are matrices containing your observations of the dependent
and independent variables, then entering beta = inv(X'*X)*X'*y, or
beta = (X'*X)\(X'*y), will yield the vector of OLS estimates.
A good exercise to try is to create some data X and y and apply the regression
formula to them. Try X = [ones(50,1),2*rand(50,1)] and set the true
beta as beta_true = [5;5]. Make y by adding some normally distributed
disturbances to the conditional mean X*beta_true. MATLAB's randn function
makes N(0, 1) pseudorandom variables, so let's create y = X*beta_true
+ 2*randn(50,1). The mean of X should be about [1, 1], so the conditional
mean of our y will be about 10 and the disturbances will then have a standard
deviation about 20 percent of the mean. Now use either of the expressions from
the last paragraph to calculate the OLS beta, and compare it to beta_true. It
should be different, but close.
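Collecting the steps of that exercise into one short script (a sketch; the names beta_true and beta_hat are illustrative):

```matlab
N = 50;
X = [ones(N,1), 2*rand(N,1)];     % constant plus one uniform regressor
beta_true = [5; 5];               % true coefficient vector
y = X*beta_true + 2*randn(N,1);   % add N(0,4) disturbances

beta_hat = (X'*X)\(X'*y);         % OLS estimate via left division
disp([beta_true, beta_hat])       % compare: different, but close
```

Running it a few times shows how the estimate varies across random samples.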
4 If x_1 and x_2 are two vectors in R^n, the subspace generated by x_1 and x_2 is the set of all vectors
of the form αx_1 + βx_2 for α, β ∈ R. So, {z = Xb : b ∈ R^k} is the subspace in R^N one gets from
taking all possible linear combinations—with coefficients b = (b_1, b_2, . . . , b_k)—of the columns of X.


A.4 Array operations and elementary functions


M ATLAB also has so-called ‘array’ operations, which are done by putting a
period ‘.’ in front of a regular operation. If A and B are two matrices of the
same dimension, then A.*B multiplies every element of A by the corresponding
element of B. That is, if A = [a_ij] and B = [b_ij], then A.*B is the matrix whose ijth
element is a_ij b_ij. A./B performs a similar sort of division—the typical element
of the resulting matrix is a_ij/b_ij. If a is a scalar and B is a matrix, then a./B
results in a matrix whose typical element is a/b_ij.
Other useful operations or functions are raising matrices to powers, taking
their square roots, logarithms or exponentials. If A is a square matrix and n
an integer, then A^n multiplies A by itself n times—i.e., this is the matrix power A^n. The array
version, A.^n, raises every element of A to the nth power, for which operation
A need not be square.
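A quick side-by-side illustration of matrix versus array operations (small matrices chosen just for the example):

```matlab
A = [1 2; 3 4];
B = [10 20; 30 40];

A.*B      % elementwise product: [10 40; 90 160]
A./B      % elementwise division: [0.1 0.1; 0.1 0.1]
A^2       % matrix power, i.e. A*A: [7 10; 15 22]
A.^2      % elementwise squares: [1 4; 9 16]
```

Note in particular the difference between A^2 and A.^2.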
With square roots, logarithms and exponentials, the convention is reversed
in that B = sqrt(A) produces a matrix B consisting of the square roots of the
elements of A; B = log(A) produces a matrix B consisting of the natural log-
arithms of the elements of A; and B = exp(A) creates a matrix B whose ijth
element is e^(a_ij). That is, these unadorned functions operate in an 'array' sense.
The ‘matrix’ versions of these functions, sqrtm, logm and expm, perform ma-
trix square roots, logs and exponentials—e.g., if A is a matrix, B = sqrtm(A)
tries to calculate a matrix B satisfying BB = A, and B = expm(A) tries to ap-
proximate the matrix exponential

e^A = I + A + (1/2!)A^2 + (1/3!)A^3 + · · ·
These functions don’t get used too often in most of the applications we nor-
mally do.
Help on these topics can be had by entering help ops for basic operations
and help elfun for elementary functions like logs and exponentials. As usual,
just typing help gives you a list of specific categories about which you can look
for help.

A.5 Multi-dimensional arrays


M ATLAB allows you to work with matrices, or ‘arrays’, with more than two
dimensions. Three-dimensional arrays are sometimes useful in solving dy-
namic programming problems or asset pricing problems. Arrays with more
than three dimensions may be useful, but are less intuitive, or at least less ca-
pable of being visualized.
Enter B = rand(2,2,3) at the command line (no semi-colon at the end) to
see what a three-dimensional array looks like. It’s just like three 2 × 2 matri-
ces. M ATLAB will display them in a vertical list, B(:,:,1) then B(:,:,2) then
B(:,:,3). When programming with three-dimensional arrays, I find it useful
to visualize the third dimension as ‘depth’ (the first two dimensions, rows and


columns, corresponding to 'height' and 'width'), so the three-dimensional array
is like three 2 × 2 matrices printed on three cards, the three cards arranged
so that B(:,:,1) is in front, B(:,:,2) is behind it, and B(:,:,3) is behind
B(:,:,2).
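For instance, continuing with the array created above:

```matlab
B = rand(2,2,3);   % three 2-by-2 matrices stacked in 'depth'
B(:,:,2)           % the middle 'card'
B(1,2,3)           % row 1, column 2 of the back 'card'
size(B)            % the dimensions: 2 2 3
```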

A.6 Structure arrays


Structure arrays are a convenient form for collecting together, and moving
around, groups of objects of different dimensions. I use them a lot in setting
the parameters of a model to be solved or collecting the results.
It’s easiest to describe structure arrays just by creating one as an example.
Suppose we have a model to solve that will depend on an elasticity of intertem-
poral substitution (a scalar, η say), a discount factor (a scalar β), and a Markov
chain describing the process for consumption (which is a vector of consump-
tion states c and a transition probability matrix P). Our program to solve the
model will need those parameters, and our program may call functions that
we’ve written that also need those parameters. It would be convenient to have
an array parameters that contains all those variables, so we can pass them
simply to the program and the functions used by the program. Since they are
all different dimensions, we can’t use a matrix or multi-dimensional array. So,
we’ll make parameters a structure array.
Suppose we want to set η = 1/2, β = .94, c = (.96, 1.04) and

P =
    0.975    0.025
    0.025    0.975

To create a structure parameters with the objects in it, we just enter:


parameters.eta = 1/2;
parameters.beta = .94;
parameters.c = [.96; 1.04];
parameters.P = [0.975 0.025; 0.025 0.975];
That’s all there is to it. If you now type parameters and hit E NTER, you’ll
see a list of what’s in the structure. You can call an object in the structure just
by typing parameters.object. If the object is a vector or matrix you can call
an element of the object just by typing parameters.object(i,j). For example,
entering parameters.P(2,1) returns 0.025.

A.7 Eigenvalues and eigenvectors


M ATLAB allows you to easily calculate things like eigenvalues and eigenvec-
tors and to diagonalize or otherwise decompose matrices. If A is a square
matrix—say

A =
     1     2
     3     4


—then eig(A) will calculate the eigenvalues of A and display them. The eig
function is one which can have multiple outputs. If you enter [P,D] = eig(A);
M ATLAB calculates both the eigenvectors and eigenvalues of A. The eigenvec-
tors get stored in the matrix P, as P’s columns, and the eigenvalues get stored
in D, where D is a diagonal matrix with the eigenvalues on the diagonal. The
matrices are arranged so that eigenvalues and eigenvectors are matched in the
sense that the eigenvector associated with the eigenvalue in the ith column of
D is in the ith column of P.
Try entering [P,D]=eig(A) for the 2 × 2 matrix A defined above; then, com-
pare A*P(:,1) with D(1,1)*P(:,1). They should be the same, or at least their
difference should be fairly close to zero (on the order of 10^-16), allowing for some
imprecision in the calculations.
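The diagonalization mentioned at the start of this section can be checked the same way (a sketch; this assumes A has a full set of linearly independent eigenvectors):

```matlab
A = [1 2; 3 4];
[P, D] = eig(A);
A_check = P*D/P;    % P*D*inv(P); should reproduce A up to rounding
A - A_check         % entries should be on the order of machine precision
```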

A.8 Max and min


M ATLAB has a ‘max’ function and a ‘min’ function for finding the biggest or
smallest elements in matrices or vectors. If A and B are matrices of the same
dimension (say n × k), then entering C = max(A,B) returns the n × k matrix
C whose i, jth element is the maximum of A(i,j) and B(i,j). The operation
of min is analogous. Both max and min will also work in this manner if A is a
matrix and B is a scalar (or vice versa).
More typical for what we do, though, is the following use of min and max. If
x is a vector, then max(x) by itself returns the maximal value in x. This function
has multiple outputs, too. [M,m] = max(x) returns the maximal value in M
and the number of the element of x where the maximum occurs in m. The min
function can do the same for finding minima. Thus if you enter x = -5:5; and
y = x.^2;—which creates the vectors

x =
    -5    -4    -3    -2    -1     0     1     2     3     4     5

and

y =
    25    16     9     4     1     0     1     4     9    16    25

—then entering [M,m] = min(y), you should get M = 0 and m = 6. If you en-
ter [M,m] = max(y), you’ll find that when M ATLAB is ‘indifferent’—here both
y(1) and y(11) are maxima—it opts for the first occurrence in the vector, in
this case returning M = 25 and m = 1.
When max and min are applied to matrices, the operation described above
(by default) looks for the max and min in each column of the matrix—so it’s
operating along, or searching up and down, the rows of each column. Thus,
if A is an n × k matrix, [M,m] = max(A) finds the biggest element in each of
the k columns of A and stores them in M which is a 1 × k row vector. For each
column, the row number where the maximum in that column occurs is stored
in m. Thus, if

A =
     1     4
     2     3


then [M,m] = max(A) returns M = [2 4] and m = [2 1].


You can make M ATLAB change the dimension max and min operate on by
entering the desired dimension as an additional argument. Since, as we saw
above, a second argument already has a specified purpose in max and min,
the dimension is entered as a third argument, and we put an empty matrix—
square brackets with nothing inside—as the second argument. Enter [M,m]
= max(A,[],1) and M ATLAB returns the same answer as before—taking the
max along the rows (dimension 1), in each column, is the default. [M,m] =
max(A,[],2) takes the max along the columns, in each row. Note that M and
m are column vectors in that case. You can also use max and min along the
third (or higher) dimensions of multi-dimensional arrays. Try entering A =
randn(2,2,2), then [M,m]=max(A,[],3). Do the results make sense to you?
I should add that, though I keep using M and m for the output, you can
use whatever variable name you like, as long as it begins with a letter, not a
number. M ATLAB will accept almost anything as the name of a variable, so
in applications you should feel free to give things mnemonic names—e.g., you
can use, say, K2L for the capital-labor ratio in some model or wage for the wage
rate. There are rules about avoiding certain special characters, and, I think, a
limit on the number of characters in the variable name.

A.9 Special scalars


M ATLAB has a few special scalars which come up often. One example is Inf,
for plus infinity. Inf can result from division by zero—try entering 10/0—
or from ‘overflow’—when a number is so big that to machine precision it is
infinite.5 Loosely, Inf satisfies the properties of +∞—if you add something
to Inf, the answer’s still Inf, and dividing something by Inf (other than Inf
itself) yields an answer of zero.
If you enter Inf/Inf or 0/0, you will get to see another special scalar, M AT-
LAB’s NaN, or ‘not a number’. It is usually not good to see this in your output,
unless you put it there intentionally.6
MATLAB also uses i, j and pi to denote particular numbers—i and j stand
for √−1, and pi stands for the irrational number π. Within a given session,
you can make i, j and pi whatever you’d like. This is useful as i and j make
nice indices for so-called ‘for loops’—a loop that executes some operation ‘for
5 Any computer has to describe a number in a certain finite number of bytes, so numbers larger

than some critical size are effectively infinite while numbers smaller than some size are regarded
as zero. On the computer I’m using right now, for example, anything much over 1.7e+308 is just
Inf, and anything less than around 4.9e-324 is just 0. Note that machine precision varies with
the order of magnitude of the numbers you're working with. On most systems, the smallest positive
number which MATLAB can recognize is 4.9407e-324. The difference between 1 and the next
representable number above 1 (called eps) is 2.2204e-016. Try entering 1+eps==1; MATLAB will return
a 0, indicating the statement is false. Now try 1+eps/2==1; M ATLAB returns a 1, indicating the
statement is true. But, enter eps/2>0, and you’ll see that that’s true, too.
6 Like with zeros or ones, NaN(N,M) creates an N × M matrix of NaN’s. These are sometimes

useful in programming.


i = 1, 2, . . . n’ or ‘for j = 1, 2, . . . n’. The next time you start up M ATLAB—or if


you ‘clear’ the renamed variables—i, j and pi will be restored to their default
values. For example—and this also illustrates the use of clear—type pi = 3
at the prompt. You’ve now set pi equal to 3. If you subsequently want to
use the real π, or rather the computer’s approximation to it, type clear pi.
This removes your variable pi from the workspace. If you then type pi at the
prompt, you will see that pi again denotes π.
M ATLAB’s clear command, just illustrated, is used to remove variables
from your ‘workspace’—which is, roughly, the memory where M ATLAB keeps
track of what matrices you’ve created. You can clear particular variables, as we
did above—clear X Y Z removes variables X, Y and Z from the workspace—
or remove all variables from the workspace, by typing clear by itself. You can
find out what variables are currently in the workspace, and some information
about them such as size and memory allocated to them, by typing whos.

A.10 Loops and such


M ATLAB has all the standard sorts of ‘loops’ you might use in programming—
‘for’ loops, ‘while’ loops, and ‘if–then’ structures. Typing help for, help while
or help if gives M ATLAB’s help on each of these. Personally, I think learning
through examples is a good way to quickly see how these loops work.

A.10.1 A ‘while’ loop example


Typically, a ‘while’ loops performs some specified set of commands repeat-
edly as long as—‘while’—something is less than or greater than something
else. Here’s a really simple example. Enter t = 0; at the M ATLAB prompt.
Then, enter while t<10000;. You’ll note that as soon as you hit E NTER, the
prompt disappears. Do not be alarmed. This is normal; the prompt will come
back after you’ve ended your loop with an end;. Now, type t = t + 1;—don’t
forget the semi-colon!—and hit E NTER. Finally, type end; and hit E NTER. The
prompt should quickly reappear. You have just written and executed a ‘while’
loop. What your loop did was increase t by increments of 1, starting from t
equal to zero, until t was at least as big as 10,000. If you now type t, you
should see t = 10000. A bit silly, but you get the idea.
Here’s a richer example of a ‘while’ loop—maybe also a silly one, though.
We know that the transcendental number e is defined as
e = 1 + 1/1! + 1/2! + 1/3! + 1/4! + · · ·
Suppose we wanted to approximate this by terminating the series at some n—
i.e., approximating e with

1 + 1/1! + 1/2! + 1/3! + 1/4! + · · · + 1/n!


for some n. How big an n do we need to get within some tolerance ε of e?—or, rather,
how big an n do we need to get within ε of MATLAB's approximation to e,
since computers can't handle nonterminating, nonrepeating decimals either.
In MATLAB, e is exp(1). What our while loop will do is start with e = 1 and
add terms of the form 1/n! for n = 1, 2, . . ., continuing until the difference
between our e and MATLAB's exp(1) is less than some ε.
Let me describe roughly how the loop will work, and then show you how
to execute it in M ATLAB. There are two things we’ll need to keep track of, the
value of e and the value of n. We’ll also want to calculate n!, which we can
do by introducing a third variable—call it x—which will begin at x = 1 and be
updated to n times its current value at each pass through the loop. Thus, on the
first pass it will be 1; on the second (n = 2), it will be 2 · 1; on the third (n = 3),
it will be 3 · 2 · 1; and so on. Starting with e = n = x = 1, each pass through
the loop will perform the following commands: e = e + (1/x ); n = n + 1; and
x = nx. The 'while' statement will be such that this will go on as long as e
is more than ε away from exp(1). If you think about the commands I listed,
you’ll see that the first pass will set

e = 1 + (1/1) = 1 + 1/1!,

n = 1 + 1 = 2,

and

x = 2 · 1 = 2!.

The next pass will set

e = 1 + 1/1! + (1/2!) = 1 + 1/1! + 1/2!,

n = 2 + 1 = 3,

and

x = 3 · 2! = 3!,

and so forth.
So, this loop will do exactly what it is supposed to do. Now, what about
the while statement? Suppose our ε is 10^-15, which can be entered as 1e-15
in MATLAB. Our while statement will be: while abs(e-exp(1))>1e-15;. The
'abs' is for absolute value—our loop will go on as long as the absolute value of
the difference between e and exp(1) exceeds 10^-15. Don't worry—this doesn't
take long either.
All that said, we’ll now write out the steps as we would perform them in
M ATLAB. First, we need our ‘initial conditions’:

e = 1;
n = 1;
x = 1;


Then, we have the loop itself:


while abs(e-exp(1))>1e-15;
e = e + (1/x);
n = n + 1;
x = n*x;
end;
Of course, our interest in writing this loop was to find out the value of n which
brings our approximation within ε of MATLAB's exp(1). To see what the value
is, just type n. The answer should be n = 18.
Do not despair if a loop goes haywire, seeming never to end. To break a
loop that’s gone haywire or is going on longer than you’d like, just hit C TRL +C,
the ‘Control’ key together with the letter ‘C’.

A.10.2 A ‘for’ loop example


Again, rather than try to describe abstractly how a ‘for’ loop works, we will try
to learn through an example.
Suppose you wanted to generate an artificial time series {x_t}_{t=1}^T which
obeys the first-order autoregression, or AR(1),

x_t = ρ x_{t-1} + ξ_t,

starting from some initial value x_0, where the ξ_t's are i.i.d. normal random
variables with mean zero and variance σ^2. Let's take T = 100, ρ = .9, x_0 = 0
and suppose that the standard deviation of the ξ_t's is σ = .035. The first step
is to construct the ξ_t's, of which there will be 100. Recall that if z is a standard
normal random variable—that is, z is normal with mean zero and variance
one—then x = µ + σz is distributed normally with mean µ and variance σ^2.
So, 'xi = .035*randn(1,100);' creates a 1 × 100 row vector ξ consisting of
realizations of random variables drawn from a normal distribution with mean
zero and standard deviation .035. If you want to see what ξ looks like, enter
'plot(xi);'.
Now we want to create the x_t's. Let's begin by creating a vector x consisting
of x_0 followed by T zeros. Our loop will then turn the zeros into realizations
following the AR(1) process above. Since x_0 = 0 itself, you can accomplish
this with 'x = zeros(1,101);' which creates a 101-element row vector x with
all zeros.7 Note that given the way MATLAB numbers elements of a vector, x_0
corresponds to x(1) in the vector x, and the T realizations of x_1 through x_T
will correspond to elements x(2) through x(101). Then, the following 'for'
loop creates x_1 through x_T according to the AR(1) process above:
for t = 1:100;
x(t+1) = .9*x(t) + xi(t);
end;
7 If x_0 were something other than zero, 1 say, you would want to create a vector x consisting of
x_0 followed by 100 zeros. You could do this with x = [1,zeros(1,100)];.


You’ll notice that, just as with the ‘while’ loop, as soon as you enter the
‘for’ statement here, the prompt disappears until the loop is completed with
an ‘end’ statement. You can see what x looks like by entering plot(x) at the
prompt. It’s interesting to compare what the disturbances xi look like as com-
pared to x. One way to put both on the same plot is by using M ATLAB’s hold
command. If you enter plot(xi), then hold, you’ll get a message like ‘Current
plot held’, which means that any additional plot statements will superimpose
their plots on the existing plot. Alternatively, create a vector t=(1:100); Then
enter the command plot(t,xi,t,x(2:end)). The ‘x(2:end)’ is the second
through last elements of x—that is, the 100 values after x0 that we created with
our loop.
On your own, you could play around a little bit and see what a random
walk looks like—set ρ = 1—and also how difficult it must be to distinguish
between that sort of process and the AR(1) with ρ, say, equal to .98. You can
also see what a random walk with drift looks like—the process is

x_t = δ + x_{t-1} + ξ_t

where δ is the ‘drift’ term. In terms of the loop, you would just change the
‘x(t+1) =. . .’ line to ‘x(t+1) = delta + x(t) + xi(t);’ where delta is what-
ever you want to set the drift to—you could put an actual number there, or
preface the whole loop with a definition delta = [number], then write the line
exactly as I have written it here.

A.11 Writing programs, scripts and function files


A.11.1 Scripts
Programs, or scripts, are collections of M ATLAB commands stored in a file.
When the program is run, M ATLAB simply executes whatever commands are
contained in the file. As you’ll see, writing programs in M ATLAB is very easy.
To create a new script, which M ATLAB refers to as an ‘M-file’, open the F ILE
menu, select N EW, then M- FILE. As soon as you do this, the M ATLAB editor
opens, with a blank document, in which you can type your program.
M ATLAB ignores spaces, and carriage returns or semi-colons are used to
separate commands. M ATLAB also ignores anything on a line that begins with
the percent ‘%’ sign, which is useful for adding comment lines to your pro-
grams. To write a program, then, simply type the commands you want M AT-
LAB to execute—with semi-colons at the end of each if you want to suppress
displaying the execution of each command—and separate the commands with
carriage returns.
As a simple example, let’s write a program that executes the same com-
mands as in our AR(1) example. Type the following:
T = 100;
rho = .98;


x0 = 0;
sigma = .035;
xi = sigma*randn(1,T);
x = [x0,zeros(1,T)];
for t = 1:T;
x(t+1) = rho*x(t) + xi(t);
end;
figure;
plot(x);
This program will generate the AR(1) process we looked at above. The
only slight differences are that we've created variables for T, ρ, x_0 and σ. This
will make it easy to go back and change any of their values, since we'll only
have to change each one in one place. Another difference is the line 'figure;' which
will open a new figure window before the plot commands are executed. We’ll
need this if we want to run the program a couple of times and compare the
figures. You will have noticed by now that if no figure windows are open, a
‘plot’ command opens a new figure window. What you may not have noticed
is that if a figure window is already open, a ‘plot’ command will put its plot in
the open window, replacing whatever was in that window before. If multiple
figure windows are open, the plot command will put its plot in whichever
window is the ‘current’ one—i.e., whichever one was most recently viewed.
See help figure for more information. In any case, if we do not want our
program to disrupt any of our existing figures, we will want it to open a new
window before it executes the plot command.
Now you want to leave the editor and go back to the command window, but
first you'll want to save your program. From the editor's F ILE menu, select
S AVE. You’ll need to give the file a name, and the extension must be ‘.m’—i.e.,
if I wanted to name this program ‘ar1maker’, I would enter ‘ar1maker.m’ as
the filename. I assume that the default directory where the editor will save the
file is somewhere on M ATLAB’s ‘path’—that is, in a place where M ATLAB can
find it. If you get a message that a file with this name already exists—which
is very likely if you all run out and try and name it ‘ar1maker.m’—please be
courteous, and do not overwrite the existing file (unless it’s your own). Come
up with a new name—maybe ar1maker2.m or ar1maker3.m. Whatever name
you come up with—let’s say it’s ‘ar1maker.m’—after you save it and return
to M ATLAB’s command window, you simply type ar1maker at the prompt (no
need for the ‘.m’). Your program will then be run. You’ll know that it worked
if a figure window opens up and you see xt plotted.
You can now check out one of the main advantages to writing scripts—
the ability to go back and change some parameter quickly and easily, and re-
execute the commands, without a lot of needless typing. To tinker with your
program, just open the file using the F ILE menu. You might change the value
of ρ or T, save the changes to the file, and run the program again.


A.11.2 Function files


M ATLAB lets you create your own functions, which is very useful in solving
or simulating models in economics. A function file is a special type of M-file,
one that declares in the first line that it is a function, and what the syntax of the
function is.
An extremely simple introduction to creating function files is to turn the
M-file you just created into a function file. So, go back to the M-file you just
created in the last sub-section using the M ATLAB editor. If the filename you
gave it was ‘ar1maker’, then type the following as the first line of the file:
function x = ar1maker(rho,sigma,x0,T)
We’ve declared our file is a function file and specified the function’s syntax.
Our function will produce (and plot) the sample path x, given inputs for the
parameters rho and sigma, the initial value x0 and the number of periods T.
Next, since our original M-file specified values for rho, sigma, x0 and T,
we need to either delete or comment out those lines. The editor allows you to
easily comment out a block of text by selecting it with the cursor, then choosing
E DIT > C OMMENT from the editor's menu bar. If you take this route (rather than
deleting), your finished product will look like this:
function x = ar1maker(rho,sigma,x0,T)
% T = 100;
% rho = .98;
% x0 = 0;
% sigma = .035;
xi = sigma*randn(1,T);
x = [x0,zeros(1,T)];
for t = 1:T;
x(t+1) = rho*x(t) + xi(t);
end;
figure;
plot(x);
Now, save and return to the command window. Test your function by typ-
ing something like x = ar1maker(.9,.05,1,20); The function should create
the simulated series (x) and make the plot, just as it did before.
If you’re really eager to test how well you’re learning M ATLAB, create a
simulated xt series, and—referring back to the OLS regression example—figure
out how to calculate ρ̂OLS in the regression xt = ρxt−1 + ut .

A.12 Last tips


This document is just meant as a quick introduction to get you started using
M ATLAB. There are a lot of topics we haven’t covered—notably, using ‘strings’
(pieces of text) in programs or functions, set operations (union, intersection),


and the whole area of logical operations. You can learn more about these topics
from M ATLAB’s help facility.
As you write increasingly complex programs, an important lesson to keep
in mind is that M ATLAB is designed to do array operations really, really fast.
Things like ‘for’ loops—well, not so much. Anywhere you can replace a loop
with an array operation will greatly speed up your code.
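As a small illustration of that point (not from the text above), compare a loop with its one-line array equivalent:

```matlab
x = randn(1,1e6);

% loop version: element by element
s = 0;
for i = 1:length(x);
    s = s + x(i)^2;
end;

% array version: one line, and much faster
s2 = sum(x.^2);
```

The two results agree up to rounding; timing them with tic and toc makes the speed difference obvious.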
Also, look around for what’s available on the internet to learn from. Most
researchers who use M ATLAB will post their programs on their websites along
with their papers. Read how other people have solved problems or written
programs to perform different tasks. If you borrow someone’s code, make sure
to give them credit in your work, and—as important—make sure you under-
stand what their code is doing, if you’re going to use it. You don’t need to
always be reinventing the wheel (unless you’re assigned to reinvent a wheel
as homework), but you don’t want the code you use to just be a ‘black box’,
either.

