Asset Book
Jim Dolmas
Contents
0 Introduction
0.1 Why study asset pricing?
0.2 Some facts to aim for
0.3 Putting models on the computer
0.4 Organization of the lectures
Bibliography
Lecture 0
Introduction
These notes were written for a ‘mini-course’ on asset pricing that I gave for
second-year Ph.D. students in macroeconomics at Southern Methodist Univer-
sity in the spring of 2012. The course was half a semester—seven three-hour
lectures. I chose not to use a book for the course. A book like Duffie’s Dynamic
Asset Pricing Theory [Duf92] might have been useful for the very early material,
on the basic theory surrounding stochastic discount factors, but after that the
focus of the lectures was on published papers.
I also hadn’t planned on writing over a hundred pages of notes: after the
first couple lectures, the plan was to have the notes taper off to little more than
lecture outlines. But, given that I compose in LaTeX only a bit slower than I write
by hand—and am better able to read what I wrote in the former case—it was
difficult to stop once I got going.
One could also note that how we resolve asset pricing puzzles matters for
how we think about the cost of business cycles, which is the source of my own
interest in the subject.
1. The equity premium: The average real return on equity (say, as measured
by a broad value-weighted index of stocks) has been historically large,
on the order of 7 percent. The average real return on a relatively riskless
asset, like a Treasury bill or commercial paper, has been low, on the order
of 1 percent. The difference between the two—the equity premium—has
averaged around 6 percentage points.
2. Return volatility: The standard deviation of the equity return is large,
around 15 or 16 percent; the standard deviation of the riskless rate is
much smaller, on the order of 3 percent.1
3. The market Sharpe ratio: Around 0.5 on average, but conditional Sharpe
ratios are subject to considerable variation, with swings from 0 (near
business cycle peaks) to 1 (at business cycle troughs) not uncommon
[TW11, LN07].
4. Price-dividend ratios move around a lot, and have a lot of low frequency
power. They appear to forecast returns (high P/D ratios imply low re-
turns ahead), more so at longer horizons [Coc08]. Not everyone agrees
with this, though [BRW08].
5. Short term interest rates are quite persistent.2
6. Nominal bond yields on average rise with maturity. Perhaps also for ex
ante real yields.
7. The volatility of bond yields is fairly constant across maturities.
8. Consumption growth is not very volatile—a mean of about 2 percent and
a standard deviation of about the same size (in annual data).
The main takeaway from the lecture is the basic pricing relationship
p = E(mx)    (1)
where m ≫ 0 is the stochastic discount factor. This is the lens through which
all the subsequent models are viewed.
With those results in hand, we turn to infinite-horizon consumption based
models in Lecture 3, which begins with Lucas’s 1978 paper [Luc78]. The focus
here is less on Lucas’s mathematical machinery and more on the structure of
consumption based models. I also try to put Lucas’s work into context, situat-
ing it against the backdrop of 1970s-style tests of market efficiency and the ran-
dom walk hypothesis. In this and subsequent lectures, we take it for granted
that (1) becomes
p_t = E_t(m_{t+1} x_{t+1})    (2)
in a many-period context.
Lucas’s is the first consumption-based model we take to the computer—so
the notes on Lucas include a description of the solution technique, including a
first look at Markov chain approximations to AR(1) processes.
From Lucas, the lecture turns to Mehra and Prescott [MP85] and the equity
premium puzzle. The treatment is brief, given that the computational aspects
have already been presented in the context of Lucas’s model. This section also
gives some alternative characterizations of the puzzle(s) in terms of second
moment implications and bounds on attainable Sharpe ratios.
Lectures 4 and 5 then look at responses to the equity premium puzzle.
Lecture 1
This lecture treats the mean-variance model of portfolio choice and the equi-
librium asset-pricing model based on it, the Capital Asset Pricing Model, or
CAPM. Think of it as some ‘pre-history’ for the more modern approaches we
will focus on for most of the course.
The mean-variance approach to portfolio choice—which emphasizes the re-
duction of risk by taking into account the covariances among asset payoffs—
took a long time to emerge. Markowitz, in a history of portfolio choice [Mar99],
notes that the theory had few real precursors. The few authors who did con-
sider the problem of investment in risky assets—like Hicks [Hic35]—seemed
to believe a version of the Law of Large Numbers meant all risk could be di-
versified away by investing in a large enough range of assets.
The insight of Markowitz [Mar52] was to recognize that asset payoffs were
typically correlated and that no simple diversification could eliminate all risk.
Rather, rational investors—if they care about the mean and variance of their
final wealth—should take account of the covariance among asset payoffs in
structuring their portfolios, so as to minimize the risk associated with any ex-
pected payoff. After Markowitz, it became clear that the risk of any asset was
not a feature of the asset in isolation, but depended on how the asset’s payoffs
impacted the variance of the investor’s portfolio. That, in turn, depends not
just on the variance of the asset's payoff, but also on the covariances with other assets'
payoffs.
The CAPM—the Capital Asset Pricing Model, developed independently by
Sharpe [Sha64], Lintner [Lin65] and a couple others—followed very directly
from Markowitz’s model of portfolio choice. Like any model of equilibrium
prices, it combines demands (Markowitz portfolios in this case) with supplies
(exogenous supplies of risky assets) to determine prices. According to the
CAPM, an asset’s price depends on the covariance of its payoffs with an aggre-
gate payoff—that of the market portfolio. That role of covariance between in-
dividual asset payoffs and some aggregate is a feature that will carry over into
all the more modern models we will examine throughout the course, though
the aggregate will take different forms (like the marginal utility of consump-
tion or wealth). In fact, once we’ve developed the more modern approach in
the next lecture—which is based on stochastic discount factors, state prices or
risk neutral probabilities—we’ll reinterpret the CAPM as a stochastic discount
factor model.
Before getting into the mean-variance model, though, we’ll try to introduce
some basic ideas through a series of simple examples, beginning with present
discounted values and working our way to the role of covariance in pricing
assets.
p̂ < E(y)/(1 + r) = x/(1 + r) = p.
Let’s now add another risky asset, one whose payoffs (call them z) are in a
sense opposite to those of the last asset, z(1) = 0 and z(2) = 2x. We’ll also add
another source of uncertainty that will give us some clue as to how y and z will
be priced relative to one another: suppose that agents also receive endowments
tomorrow, the endowments (call them e) are the same for all agents, and e :
S → R with e(1) ≫ e(2) (endowments if state 1 occurs are much larger than
if state 2 occurs). In this case, even though both assets have expected payoffs
equal to x, the z asset—which pays off in the state where endowments are
low—will be regarded as more valuable by agents. Hence, we’d expect the z
asset’s price—call it p̃—to exceed the price of the y asset, p̂.
We can say one more thing. Note that a portfolio consisting of one-half unit
of the y asset and one-half unit of the z asset exactly replicates the payoffs of
the riskless x asset: (1/2)y(s) + (1/2)z(s) = x for s = 1 or s = 2. An arbitrage
argument thus implies
(1/2) p̂ + (1/2) p̃ = p
Note that an implication of the last equation (and the result that p̃ > p̂) is that
p̃ > p > p̂—the risky z asset is more valuable than the riskless x asset. Even
though its payoff is uncertain, the z asset's price exceeds its expected present
value because it covaries with the agents’ endowments in a way that hedges
their endowment risk.
These are very simple examples, but they illustrate a few concepts that
will be important throughout these lectures—the roles played in the pricing
of assets by arbitrage (precisely, the absence of arbitrage opportunities), risk
aversion (more generally, curvature of the marginal utility of consumption or
wealth), and covariances (between individual asset payoffs and more aggre-
gate sources of uncertainty).
such A is non-singular.
ω_{i,j} = E[(x_i − µ_i)(x_j − µ_j)]
x ∈ R^N_{++}.
We’ve already stated that there’s no riskless asset, but even if we hadn’t,
the assumption that Ω is positive definite would rule that out. A riskless asset
(say the ith) would have ωi,i = 0, in fact ωi,j = 0 for all j. If u is a vector with
u_i > 0 and u_j = 0 for j ≠ i, then u ≠ 0 and u^⊤Ωu = (u_i)^2 ω_{i,i} = 0. As we'll see
below, once we define a portfolio, assuming that Ω is positive definite implies
that there is no (non-zero) portfolio that has a zero variance.
There is also a vector of asset prices p ∈ R^N, with p ≫ 0, which is known
to the investor at date 0. We’ll be a bit vague for now about what units assets,
payoffs and prices are in. Think of the payoffs as being in ‘units of account’
(eventually, they’ll be in terms of consumption); assets as being in ‘shares’; and
prices in units of account per share.
A portfolio is a z ∈ R N where zi denotes the number of shares of the ith
asset held by the investor. We do not rule out short sales—zi < 0 is feasible for
any i. The price of a portfolio z is p · z and its payoff is z · x, a random variable.
The portfolio’s expected payoff is E(z · x ) = z · E( x ) = z · µ ≡ µ(z), using the
linearity of the expectations operator. The variance of the portfolio’s payoff is
σ^2(z) ≡ E[(z · x − z · µ)^2], which has a simple description in terms of z and Ω: σ^2(z) = z^⊤Ωz.
The typical investor will have some initial wealth W0 to allocate to his port-
folio,
W0 = p · z, (1.1)
and his final wealth will simply be the realized value of his portfolio
W1 = z · x.
There are no other sources of income. Thus, the mean and variance of his final
wealth will simply be the mean and variance of his portfolio’s payoff, µ(z) and
σ 2 ( z ).
In the mean-variance portfolio choice approach, the investor is assumed to
care only about the mean and variance of final wealth. Keeping variance the
same, a higher mean is preferred, while keeping mean the same, a lower vari-
ance is preferred. The investor’s mean-variance preferences may be justified in
a number of ways, such as:
• This is a primitive assumption about the investor’s utility function—
it’s simply defined as being over the mean and variance of final wealth,
v[µ(z), σ2 (z)] say.
• The investor is an expected utility maximizer, and his von Neumann-
Morgenstern utility function happens to be quadratic—e.g., v(W1 ) =
aW1 − (b/2)(W1 )2 with a, b > 0. In this case, the investor’s expected util-
ity takes the form aµ(z) − (b/2)µ(z)2 − (b/2)σ2 (z), which is decreasing
in σ2 (z) and increasing in µ(z) so long as µ(z) < a/b.
• The investor has a utility function defined over the distribution of final
wealth, but payoffs are normally distributed. Because the normal distri-
bution is completely characterized by its first two moments, under some
regularity conditions the investor’s utility function will have a represen-
tation of the form v[µ(z), σ2 (z)].
We will assume investor preferences simply have the form v[µ(z), σ2 (z)],
with v increasing in its first argument and decreasing in its second. For the
main result of this sub-section, we don’t need to be any more concrete than
that. The typical investor’s problem will be
max_z v[µ(z), σ^2(z)]    (1.2)
subject to
W_0 = p · z.
A mean-variance efficient portfolio, for a given expected payoff m, solves the variance-minimization problem
min_z σ^2(z)    (1.3)
subject to
µ(z) = m
W_0 = p · z
Using the matrix results from above, we can re-write problem (1.3) in the
following form:
min_z z^⊤Ωz    (1.4)
subject to
z·µ = m (1.5)
W0 = p · z (1.6)
Note that this problem will only be interesting if the number of assets N ex-
ceeds 2. The reason is that, apart from the case where µ and p are collinear, the
set of z that satisfy the two constraints (1.5) and (1.6) is a subset of dimension
N − 2, which for N = 2 is a singleton—i.e., a single portfolio z will satisfy both con-
straints, and we’re left with really no problem to solve. Henceforth, we assume
N ≥ 3.
To solve the problem, form the Lagrangian
L = −(1/2) z^⊤Ωz + λ(z^⊤µ − m) + π(W_0 − z^⊤p),
where the Lagrange multipliers are λ, π ∈ R. The 1/2 in front of the objective
doesn’t alter the solution and saves us from having some 2’s floating around in
the first-order conditions. The minus sign is there because I tend to write any
optimization problem as a maximization problem—I find it economizes on the
number of optimality conditions one needs to remember.
Differentiating L with respect to the vector z gives the first-order condition
−Ωz + λµ − πp = 0,
which can be solved for the variance-minimizing portfolio
z = λΩ^{-1}µ − πΩ^{-1}p.    (1.7)
Note, too, that the efficient portfolio only depends on m and W0 through the
scalars λ and π. For a given W0 —which is to say, for a given investor—as we
vary m, λ and π will vary, but the variance-minimizing portfolio will always
be a linear combination of the same two vectors, Ω−1 µ and −Ω−1 p. The same
holds true as we vary W0 , which is in effect varying investors—all investors
choose a linear combination of these same two vectors.
Since Ω−1 µ and Ω−1 p are vectors in R N , we can think of them as portfolios
(or funds). In this environment, all investors’ asset demands could be met by
two mutual funds, one offering the portfolio Ω−1 µ and the other offering the
portfolio −Ω−1 p. Given what we’ve discussed above regarding equation (1.7),
we may state:
Theorem 1.1 (The two-fund theorem). In the model of this section—no riskless
asset, N ≥ 3 assets, variance-covariance matrix Ω positive definite—there exist two
portfolios such that any mean-variance efficient portfolio can be written as a linear
combination of those two portfolios.
We can actually say a bit more by solving for λ and π. This would be
tedious algebra, but Exercize 1.1 asks you to show the following, by taking
account of the constraints (1.5) and (1.6): at a solution to problem (1.4), the
Lagrange multipliers λ and π are linear in W0 and m. That is, there are coef-
ficients aλ , aπ , bλ , and bπ —real numbers that themselves do not depend on W0
and m—such that
λ = aλ W0 + bλ m (1.8)
π = aπ W0 + bπ m (1.9)
Exercize 1.1 (Linearity of λ and π). Combine the expression (1.7) for z with
the constraints (1.5) and (1.6) to prove the claim from the previous paragraph.
You can make this exercize less tedious by doing as much of it in matrix form as
possible, using expressions like µ> Ω−1 µ, µ> Ω−1 p, and p> Ω−1 p.
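In case it helps to see the shape of the argument, here is the route the hint points to—a sketch only, using nothing beyond the expressions named in the hint:
Substituting z = λΩ^{-1}µ − πΩ^{-1}p into the two constraints gives
z · µ = λ (µ^⊤Ω^{-1}µ) − π (µ^⊤Ω^{-1}p) = m
p · z = λ (µ^⊤Ω^{-1}p) − π (p^⊤Ω^{-1}p) = W_0,
a 2 × 2 linear system in (λ, π) whose coefficients µ^⊤Ω^{-1}µ, µ^⊤Ω^{-1}p and p^⊤Ω^{-1}p involve neither m nor W_0; solving it (e.g., by Cramer's rule) therefore expresses λ and π as linear functions of m and W_0.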
Given (1.8) and (1.9), we may re-write the expression (1.7) for a mean-
variance efficient portfolio as follows:
z = λΩ−1 µ − πΩ−1 p
= ( aλ W0 + bλ m)Ω−1 µ − ( aπ W0 + bπ m)Ω−1 p
= W0 ( aλ Ω−1 µ − aπ Ω−1 p) + m(bλ Ω−1 µ − bπ Ω−1 p) (1.10)
Let’s think now of many investors, assuming all of them have mean-variance
preferences. Our work thus far has not identified the particular portfolio a
utility-maximizing investor would choose, since the expected payoff m is it-
self a choice in the investor’s utility-maximization problem—recall (1.2). We’ve
only identified a family of portfolios to which any solution must belong—those
that, for a given m have the lowest variance. The investor’s preferences—the
trade-off they make between additional mean payoff and additional variance—
together with their initial wealth, will determine their choice of m. Without
modeling that problem, let’s simply let mi denote investor i’s choice of m, W0i
his wealth, and z^i his resulting portfolio choice. Then, from (1.10),
z^i = W_0^i ẑ_1 + m^i ẑ_2,    (1.12)
where ẑ_1 ≡ a_λ Ω^{-1}µ − a_π Ω^{-1}p and ẑ_2 ≡ b_λ Ω^{-1}µ − b_π Ω^{-1}p.
Remark 1.2. Note that even if all investors have identical preferences, their initial
wealth level may affect their choices of m, so different investors would still make differ-
ent choices. This would be the case if their preferences don’t display constant relative
risk aversion—for example if their preferences are quadratic (including both first- and
second-order terms).
Define the market portfolio as the sum of all investors’ portfolios, and denote
it by z M . Summing (1.12) across all investors gives the market portfolio as
z^M = (∑_i W_0^i) ẑ_1 + (∑_i m^i) ẑ_2
    = W_0^M ẑ_1 + m^M ẑ_2    (1.13)
where W_0^M is the aggregate initial wealth of the market, and m^M is the aggre-
gate expected payoff from all assets holdings in the market. Note that because
the individual zi satisfy the linear constraints p · zi = W0i and zi · µ = mi , we
have p · z M = W0M and z M · µ = m M —that is, the market portfolio satisfies the
constraints at the market initial wealth W0M and expected payoff m M . Thus,
(1.13) describes a mean-variance efficient portfolio, given W0M and m M .
We’ve thus proved the other major result of this section:
Theorem 1.2 (Efficiency of the market portfolio). In the model of this section, if
all investors have mean-variance preferences, the market portfolio is mean-variance
efficient.
Exercize 1.2 (The mean-variance frontier). This exercize asks you to use Mat-
lab to find mean-variance efficient portfolios, then plot what’s known as the mean-
variance frontier, which is normally drawn in mean-standard deviation space.
The data are for five risky assets:
µ = (1.38, 1.29, 1.47, 1.65, 1.25)^⊤
and
Ω = [ 26.63 13.63 10.86  6.15  8.55
      13.63 34.46 15.26  7.59 11.01
      10.86 15.26 38.32  7.69 13.49
       6.15  7.59  7.69 24.70  6.50
       8.55 11.01 13.49  6.50 19.01 ]
The price vector is just p = (1, 1, 1, 1, 1)> , so you might think of the payoffs as
gross returns—payoffs per unit of account spent on the asset.
Write some Matlab code to display two portfolios, or funds, from which all
efficient portfolios can be constructed.
Assume an investor’s initial wealth is W0 = 1 and let m vary from
the smallest expected payoff, min(µ), to the largest, max(µ), in 50 evenly-
spaced steps. You might use the Matlab command ‘linspace’: m =
linspace(min(mu),max(mu),50). Write lines that solve for the multipliers
λ and π at each of the 50 points, calculate the efficient portfolio at each of the
50 points, and calculate the portfolio standard deviation √(z^⊤Ωz) at each of the 50
points. You don’t need to print out all those results, just show me the portfolios
that correspond to the first and last m values, and all your code.
Finally, plot m versus the portfolio standard deviations, with standard devi-
ation on the horizontal axis. This is the mean-variance frontier (only its upper
portion is relevant to investors). It’s interesting to compare the mean-variance
frontier with the means and standard deviations of the underlying assets. You can
do this with Matlab’s hold command. After you’ve plotted the frontier, if mu and
sd are the vectors of expected payoffs and standard deviations, enter hold; then
scatter(sd,mu). Note the standard deviations are just the square roots of the
diagonal of Ω—sd = sqrt(diag(Omega)).
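For concreteness, here is one way the computation might be organized in Matlab—a rough sketch rather than a prescribed solution, with my own variable names (mu, Omega, p, W0, Z, sd); it uses the linear system for the multipliers described in Exercize 1.1.

% Sketch of Exercize 1.2: mean-variance frontier for the five assets above.
mu    = [1.38; 1.29; 1.47; 1.65; 1.25];
Omega = [26.63 13.63 10.86  6.15  8.55;
         13.63 34.46 15.26  7.59 11.01;
         10.86 15.26 38.32  7.69 13.49;
          6.15  7.59  7.69 24.70  6.50;
          8.55 11.01 13.49  6.50 19.01];
p  = ones(5,1);
W0 = 1;

iOmu = Omega \ mu;                            % the two 'funds' Omega^{-1}*mu and Omega^{-1}*p
iOp  = Omega \ p;
A = [mu'*iOmu, -mu'*iOp; p'*iOmu, -p'*iOp];   % coefficients of the 2x2 system for (lambda, pi)

m  = linspace(min(mu), max(mu), 50);
sd = zeros(1, 50);
Z  = zeros(5, 50);                            % efficient portfolios, one per column
for k = 1:50
    lp = A \ [m(k); W0];                      % multipliers: lp(1) = lambda, lp(2) = pi
    Z(:, k) = lp(1)*iOmu - lp(2)*iOp;         % efficient portfolio, as in (1.7)
    sd(k)   = sqrt(Z(:, k)'*Omega*Z(:, k));   % portfolio standard deviation
end

plot(sd, m); hold on
scatter(sqrt(diag(Omega)), mu)                % individual assets for comparison
xlabel('standard deviation'); ylabel('expected payoff')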
R_F = 1/q.
Theorem 1.3 (One fund theorem). Add a riskless asset to the model of the last
subsection, and assume all investors have mean-variance preferences. Then, investors’
optimal portfolios of risky assets are all scalar multiples of one another.
Remark 1.3. One can also see this result in the structure of the Lagrangian for the
problem of finding a mean-variance efficient portfolio:
L = −(1/2) z^⊤Ωz + λ( R_F W_0 + z · (µ − R_F p) − m )
The first-order conditions give z = λΩ−1 (µ − R F p), and λ will be the only term that
varies across investors.
Remark 1.4. While (1.19) tells us something important about the structure of the so-
lution to (1.17), it does not necessarily constitute a complete solution for z, since it does
not determine v1 /v2 , assuming this term is not a constant. Put differently, and maybe
more precisely, the first-order conditions (1.18) are N equations in N unknowns, but
not necessarily a system of linear equations.
Remark 1.5. Tobin [Tob58] proved a version of the One Fund Theorem in a 1958
paper on liquidity preference. The riskless asset in Tobin’s model is money, and Tobin
shows that investors’ risky asset portfolios are all scalar multiples of a single portfolio.
The result is thus sometimes known as the Tobin Separation Theorem.
What about demand for the riskless asset? An investor’s holdings of the
riskless asset are obscured by the fact that we’ve substituted them out of the
budget constraint. Once we solve the first-order conditions (1.18) for the risky
asset demands z, we can find the investor’s riskless asset demand by going
back to the original budget constraint, z0 = (1/q)(W0 − p · z). In any case,
the one fund theorem tells us that in a mean-variance environment with one
riskless asset, every investor’s portfolio can be represented as some amount of
the riskless asset plus holdings of one risky portfolio (or fund) common to all
investors.
With that bit of notation, investor i’s risky portfolio has the form
z^i = (1/α^i) Ω^{-1}(µ − R_F p).    (1.20)
Summing across investors, and defining the market risk-aversion parameter by
α_M = (∑_i 1/α^i)^{-1},
aggregate demand for the risky assets is
z^M ≡ ∑_i z^i = (1/α_M) Ω^{-1}(µ − R_F p).    (1.21)
The key insight to determining p is that z jM , the market’s demand for shares
of asset j must, in equilibrium, equal the supply of shares in asset j, which we
take to be exogenous. Thus, we can treat z M as exogenous, and rearrange (1.21)
to get
p = (1/R_F) µ − (α_M/R_F) Ω z^M.    (1.22)
Because it involves the potentially endogenous parameter α M , a measure of
market risk aversion, equation (1.22) does not necessarily constitute a complete
solution for equilibrium prices (α M may itself depend on p). Nevertheless, we
can gain the key insights of the CAPM from (1.22).
The equation (1.22) is really N equations stacked in vector form. Let’s con-
sider the ith row—the expression for the ith asset price:
p_i = µ_i/R_F − (α_M/R_F) ∑_{j=1}^N ω_{i,j} z_j^M.    (1.23)
The first term in (1.23), µi /R F , is just the present value (using the gross risk-
less rate R F ) of the ith asset’s expected payoff, µi . The term ∑ j ωi,j z jM is just
the covariance between the ith asset’s payoff, xi , and the payoff of the market
portfolio, z M · x. For simplicity, call this cov(i, M ). Then,
p_i < µ_i/R_F   if cov(i, M) > 0
p_i = µ_i/R_F   if cov(i, M) = 0
p_i > µ_i/R_F   if cov(i, M) < 0
An asset whose payoff covaries positively with that of the market portfolio
is priced at less than the expected present value of its payoff, while an asset
whose payoff covaries negatively with that of the market portfolio is priced
at more than the expected present value of its payoff—analogous to what we
saw in the very simple examples at the start of this lecture. The middle
case—cov(i, M) = 0—corresponds to an asset whose payoff risk is purely id-
iosyncratic, in the sense of being orthogonal to the market as a whole. Under the
CAPM, the price of such an asset, no matter how large its variance, is just equal to
its expected present value—which would be the value placed on the asset by a
completely risk neutral investor. To summarize:
Result 1.1 (The CAPM, price version). Under the assumptions of this section, an
asset’s equilibrium price is less than or greater than the expected present value of the
asset's payoff, depending on whether the asset's payoffs covary positively or negatively
with the payoffs of the market portfolio. Idiosyncratic risk is not priced.
Remark 1.6. The results for individual asset prices also hold for any arbitrary portfolio
z of the assets. The price of the portfolio, p · z, is less than, greater than, or equal to
the present value of the portfolio’s expected payoff, (z · µ)/R F , depending on whether
the covariance of the portfolio’s payoff with the market portfolio payoff, z> Ωz M , is
positive, negative, or zero.
Because this must apply as well to the market portfolio, and because (z M )> Ωz M >
0, the price of the market portfolio is necessarily less than the present value of its ex-
pected payoff.
R^e = E[R] = E[Cx] = Cµ
V = E[(R − R^e)(R − R^e)^⊤]
  = E[(Cx − Cµ)(Cx − Cµ)^⊤]
  = C E[(x − µ)(x − µ)^⊤] C^⊤
  = CΩC^⊤
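A two-line numerical illustration of the transformation, reading C as diag(1./p)—which is what R = Cx and Cp = 1 below require, though that reading is mine; mu and Omega are the Exercize 1.2 data from the sketch above, and the prices here are made up so that C is non-trivial.

% Payoff moments -> return moments with C = diag(1./p) (illustrative prices).
pr = [1.2; 1.1; 1.3; 1.4; 1.0];
C  = diag(1./pr);
Re = C*mu;          % expected returns, R^e = C*mu
V  = C*Omega*C';    % return covariance matrix, V = C*Omega*C'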
1 = θ_0 + ∑_{i=1}^N θ_i = θ_0 + θ · 1    (1.24)
The return on the investor’s portfolio of risky assets is θ · R, and the return
on his whole portfolio of assets is θ0 R F + θ · R. We can show that the investor’s
final wealth is given by
W1 = (θ0 R F + θ · R)W0 .
Expected final wealth is just initial wealth times the expected portfolio re-
turn,
E[W_1] = (θ_0 R_F + θ · R^e) W_0
       = W_0 [ R_F + θ · (R^e − R_F 1) ],    (1.25)
where (1.25) uses the fact that θ_0 = 1 − θ · 1. The last line expresses the expected
portfolio return in terms of the riskless return R F and the portfolio-weighted
excess returns on the risky assets, Rie − R F .
The variance of final wealth is just (W0 )2 times the variance of the portfolio
of risky returns:
var(W_1) = (W_0)^2 θ^⊤Vθ.    (1.26)
At this point, we could plug (1.25) and (1.26) into the investor’s utility func-
tion, maximize with respect to θ, and retrace all the steps that led to our equi-
librium price expression (1.22). Rather than take that long route to the return
version of the CAPM, we’ll begin with (1.20) and apply the transformation ma-
trix C. Recall that (1.20) was an expression describing an investor’s portfolio of
risky assets:
z^i = (1/α^i) Ω^{-1}(µ − R_F p)
Pre-multiply both sides by C^{-1} to get
C^{-1} z^i = (1/α^i) C^{-1}Ω^{-1}(µ − R_F p)
           = (1/α^i) C^{-1}Ω^{-1}C^{-1} C(µ − R_F p)
           = (1/α^i) C^{-1}Ω^{-1}C^{-1} (Cµ − R_F Cp),
or, since C^{-1}z^i = θ^i W_0^i, V^{-1} = C^{-1}Ω^{-1}C^{-1}, Cµ = R^e and Cp = 1,
θ^i W_0^i = (1/α^i) V^{-1}(R^e − R_F 1).
Define the market portfolio of return weights as
θ^M = ∑_i (W_0^i/W_0^M) θ^i;
summing the previous expression across investors then gives
θ^M W_0^M = (1/α_M) V^{-1}(R^e − R_F 1).
Rearranging gives
Re − R F 1 = α M W0M Vθ M . (1.27)
As before, we view the market portfolio θ M (and aggregate market initial wealth
W0M ) as exogenous data, dictated by the supplies of the N risky assets. Note
that the ith row of Vθ M is the covariance of the ith asset return with the return
θ M · R on the market portfolio. In a slight abuse of our previous notation, let’s
call this magnitude cov(i, M ). We then obtain the following analogue to Result
1.1:
Result 1.2 (The CAPM, return version). Under the assumptions of this section, an
asset’s equilibrium expected excess return is either positive (Rie − R F > 0) or negative
(R_i^e − R_F < 0), depending on whether the asset's return covaries positively or neg-
atively with the return on the market portfolio. Idiosyncratic risk is not priced—an
asset whose return has a zero covariance with the return on the market portfolio has a
zero expected excess return.
Early in the history of the CAPM, practitioners viewed the market risk pa-
rameter α M as the only unobservable in (1.27); later, doubts were raised about
whether the market portfolio was even observable. In any event, a version of
the CAPM that eliminated α M was seen as desirable. The result—the ‘beta’
form of the CAPM—eliminated α M at the expense of making the CAPM a
purely relative theory of equilibrium returns. This is the form in which the
CAPM is most commonly expressed.
To arrive at it, note that (1.27) has implications for the excess returns on
portfolios, including the market portfolio. The excess return on the market
portfolio is (θ M )> ( Re − R F 1). Pre-multiply both sides of (1.27) by (θ M )> to get
(θ M )> ( Re − R F 1) = α M W0M (θ M )> Vθ M . (1.28)
(θ M )> Vθ M is simply the variance of the return on the market portfolio, which
we denote by var( M ). Equation (1.28) then gives
α_M W_0^M = (1/var(M)) (θ^M)^⊤(R^e − R_F 1)    (1.29)
Now, plug (1.29) into (1.27) to get the following expression for the equilibrium
expected excess return on the ith asset:
R_i^e − R_F = (cov(i, M)/var(M)) (θ^M)^⊤(R^e − R_F 1)
            ≡ β_i (θ^M)^⊤(R^e − R_F 1)    (1.30)
Result 1.3 (The CAPM, beta version). Under the assumptions of this section, the
equilibrium expected excess return on any asset i depends only on the expected excess
return on the market portfolio and the asset’s beta, β i = cov(i, M )/var( M ). In
particular, the only source of systematic variation in the asset’s expected excess return
is variation in the expected excess return on the market portfolio, and the only source
of variation in expected excess returns across assets is variation in their betas.
Remark 1.7. Analogous to our point in Remark 1.6, note that from (1.28), since
var( M ) > 0, the market portfolio earns a positive expected excess return.
Exercize 1.3 (Calculating betas). For this exercize, use the same data from
Exercize 1.2, but interpret the µ from that exercize as Re and the Ω as V. In
Exercize 1.2 you calculated 50 efficient portfolios; assume the 25th portfolio which
you obtained there is the market portfolio θ M here. Write some Matlab code to
calculate the betas for the 5 assets. Report the betas and turn in the code you
wrote.
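One possible shape for the calculation—continuing the sketch above, so Z, mu and Omega are assumed to be in memory; again just a sketch, not the assigned answer:

% Sketch of Exercize 1.3: betas of the five assets with respect to thetaM.
Re     = mu;                 % reinterpret expected payoffs as expected returns
V      = Omega;              % ...and the payoff covariance matrix as V
thetaM = Z(:, 25);           % take the 25th efficient portfolio as the market portfolio
covs   = V*thetaM;           % i-th element: covariance of asset i's return with the market return
varM   = thetaM'*V*thetaM;   % variance of the market return
betas  = covs/varM           % beta_i = cov(i,M)/var(M); no semicolon, so it prints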
Lecture 2
The common structure of almost all the models we’ll look at throughout the
course is summarized in the relation
p = E(mx ) (2.1)
p = (1/R_F) Ê(x)    (2.2)
That is, every asset is priced according to the present value of its expected
payoff using the risk-neutral measure.
1 In finance, m is often referred to as a state-price density.
We’re going to take an indirect (but standard) route to equations (2.1) and
(2.2), one that begins with the concept of state prices—loosely, a set of prices
(distinct from asset prices) that value a marginal unit of wealth (or consump-
tion or unit of account) in each possible state of the world. A key result—called
the Fundamental Theorem of Asset Pricing—is going to link absence of arbi-
trage with the existence of state prices. A further result—the so-called ‘Repre-
sentation Theorem'—is going to prove a type of equivalence between state prices,
risk neutral probabilities, and stochastic discount factors.
The material in this lecture loosely follows the presentation in Duffie [Duf92].
Another good reference is John Cochrane’s book [Coc01]. Dybvig and Ross’s
entry on “Arbitrage” in The New Palgrave: Finance [DR89] and the first chapter
of Ross’s book [Ros04] are also good sources for some of this material, espe-
cially the two theorems.
E( X ) ≡ Xπ. (2.3)
Definition 2.2 (Complete asset markets). Asset markets are complete if for any
y ∈ RS there is a portfolio z ∈ R N with X > z = y. Since linear combinations of
portfolios are again portfolios, we can also state market completeness as: there exist S
portfolios, zi , i = 1, 2, . . . S, such that their payoff vectors { X > zi : i = 1, . . . S} form
a basis for RS .
We’ll often assume there are no redundant assets. We may or may not as-
sume markets are complete. The following easy exercize is to help make sure
you understand the concepts of redundancy and completeness.
Exercize 2.1. Suppose there are two assets and three states. The payoff matrix is
X = [ 1 1 1
      0 2 0 ]
Show that asset markets are incomplete, the simple way, by showing a y ∈ R3
that can’t be attained by any portfolio z ∈ R2 . Now, imagine adding a third asset,
and consider two cases. First, add an asset that would be redundant, given the
first two. Make sure to demonstrate that it is redundant. Second, add a third asset
that completes the market. Demonstrate that it completes the market by showing
portfolios z1 , z2 and z3 that attain the basis vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1).
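If you want to double-check your pencil-and-paper answers numerically, a rank computation along the following lines is one way to do it; the particular third assets added here are hypothetical choices of mine, not the ones you are asked to construct.

% Completeness is equivalent to the payoff matrix having rank S = 3.
X = [1 1 1; 0 2 0];
rank(X)             % 2 < 3: markets are incomplete
Xr = [X; 1 3 1];    % third asset whose payoff is row 1 plus row 2: redundant
rank(Xr)            % still 2
Xc = [X; 1 1 0];    % a third asset that is not a linear combination of the first two
rank(Xc)            % 3: this asset completes the market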
2.1.2 Investors
There will be H investors, indexed by h = 1, 2, . . . H. Investors (potentially)
differ in their preferences, initial wealth, and/or endowments. Investor h’s
preferences, in their most general form, will be described by a utility function
U h : R × RS → R. The arguments are either consumption values or units of
account, as the context dictates.
In the most general case, we'll write U^h(c_0, c_1) as shorthand for U^h(c_0, c_{1,1}, c_{1,2}, . . . , c_{1,S}).
In some cases, we may assume investors only care about final wealth or
consumption: U h (c1 ). In any case—whether preferences are over (c0 , c1 ) or
just c1 —we assume that investors’ utilities are continuous (or at least upper
semicontinuous). We’ll also assume investors’ utilities are strictly increasing in
all their arguments—they prefer more to less.
Investors will have initial wealth W0h , and may receive a stochastic endow-
ment eh = (e1h , e2h , . . . eSh ) in period one. A feasible portfolio for investor h will
obey the budget constraint
W0h ≥ p · z. (2.5)
If they receive an endowment in period one, then the utility from their choice
of portfolio will be
U h (W0h − p · z, eh + X > z). (2.6)
where (2.6) makes use of the fact that, when investors prefer more to less, (2.5)
holds as an equality.
Note that the budget set
Remark 2.1. The existence of state prices is separate from the concept of asset market
completeness. We can have a state price vector that tells us the value of a claim to one
unit of account in state s (and only state s) for each s, even when a set of S assets with
those payoffs—which would constitute a basis for RS —does not exist. Whether or not
markets are complete does matter for the uniqueness of a state price vector.
An arbitrage, or arbitrage portfolio, is—in words—a portfolio that either (a)
costs nothing today, has a positive payoff in at least some states tomorrow, and
no chance of a negative payoff in any state, or (b) costs less than nothing today
(it has a negative price), and has a nonnegative payoff in every state tomorrow.
Formally:
Definition 2.3 (Arbitrage). z is an arbitrage if either of the following two conditions
hold:
p · z ≤ 0 and X^⊤z > 0    (2.9)
or
p · z < 0 and X^⊤z ≥ 0    (2.10)
We say there is no arbitrage if no such z exists.
Remark 2.2. The two conditions in Definition 2.3 can be combined into one if we
stack −p^⊤ (which is 1 × N) and X^⊤ (which is S × N) into a single (S + 1) × N
matrix. Then, (2.9) and (2.10) can be stated as: an arbitrage is a portfolio z such that
[ −p^⊤
   X^⊤ ] z > 0    (2.11)
where the matrix product on the left is (S + 1) × 1, and you'll recall that for vectors
w > 0 means all wi ≥ 0 and at least some wi > 0. There is no arbitrage if no such z
exists.
Since αz is a portfolio for any portfolio z and any α > 0, an arbitrage (if one
exists) can be run at any scale. Note that the presence or absence of arbitrage is
a feature of asset prices and payoffs together.
Example 2.1. Suppose asset i is redundant: there is a portfolio z ∈ R N with zi = 0
and (x_{i,1}, x_{i,2}, . . . , x_{i,S}) = z^⊤X. If p_i ≠ p · z, there is an arbitrage opportunity. To see
this, suppose pi > p · z, and consider the portfolio z − ei , where ei is the ith unit basis
vector in R N . This portfolio corresponds to shorting one unit of asset i and buying the
replicating portfolio z. Then,
X^⊤(z − e_i) = X^⊤z − x_i = 0,
so the payoffs in all states tomorrow are all zero. But the cost of the portfolio is
p · (z − e_i) = p · z − p_i < 0, which therefore presents an arbitrage opportunity. For the
case of p_i < p · z, just consider the portfolio −z + e_i.
As the next section shows, arbitrage, asset prices and the existence of state
prices are intimately linked by a basic result in asset pricing. Once we have
state prices, it will be easy to construct stochastic discount factors and risk
neutral probabilities.
Exercize 2.2. Suppose that there are two assets ( N = 2) and three states (S =
3). The payoff matrix is
X = [ 1 1 1
      0 2 0 ]
Characterize the set of price vectors p = (p_1, p_2) that are consistent with there
being no arbitrage opportunities. Show all the steps in your reasoning. Hint:
There is one inequality that p1 and p2 need to obey for there to be no arbitrage.
2. There exists a strictly positive state price vector ψ that correctly prices assets—
that is, there is a ψ ≫ 0 satisfying
p = Xψ, (2.12)
PROOF. Condition (3) of the theorem clearly implies condition (1)—if some
investor who prefers more to less has an optimal choice, there must be no arbi-
trage. If we can show that (1) implies (2), and (2) implies (3), we’ll be done.
So, let’s begin with (1) implies (2). Assume then that (1) holds: there is
no z ∈ R N satisfying (2.11). We’ll use a separating hyperplane argument to
construct a ψ that satisfies (2). Let's first define what we mean by a hyperplane
in Rn :
Definition 2.4 (Hyperplanes). A hyperplane in Rn is a set of the form H ( p, α) =
{ x ∈ Rn : p · x = α}, for a given p ∈ Rn and α ∈ R. Associated with any hyperplane
are two closed half-spaces, H ( p, α)− = { x ∈ Rn : p · x ≤ α} and H ( p, α)+ =
{ x ∈ Rn : p · x ≥ α}. The open half-spaces H ( p, α)−− and H ( p, α)++ are defined
similarly, but with strict inequalities. Two sets A, B ⊂ Rn are said to be separated by
a hyperplane H ( p, α) if one of the sets is contained wholly in H ( p, α)− and the other
set is contained wholly in H ( p, α)+ . The sets are said to be strongly separated if one of
them is actually wholly contained in one of H(p, α)'s open half-spaces. A hyperplane
H ( p, α) is said to support a set A at a point y ∈ A if y ∈ H ( p, α) and A is wholly
contained in either H ( p, α)− or H ( p, α)+ .
Keeping that definition in mind, and referring back to equation (2.11), let
M = [ −p^⊤
       X^⊤ ]
Essentially, Karlin’s result is saying that there is a hyperplane H (φ, α), with
φ ≫ 0, α = 0, that separates the cone V from the nonnegative orthant.
In our case, using Karlin’s result, if there is no arbitrage, there is a strictly
positive φ ∈ RS+1 satisfying φ · y ≤ 0 for all y ∈ K = {y ∈ RS+1 : y =
Mz for z ∈ R N }. Since our K is a linear subspace, y ∈ K implies −y ∈ K. This
means φ must in fact satisfy φ · y = 0 for all y ∈ K, since φ · y < 0 for any y ∈ K
would mean φ · (−y) > 0, contradicting the fact that, since −y ∈ K, it should
obey φ · (−y) ≤ 0.
What does φ · y = 0 (∀y ∈ K ) mean? Using the definition of K, we must
have, ∀z ∈ R N ,
0 = φ · (Mz)
  = (φ^⊤M) z
which is only possible—if you think about it, since it must hold for every choice
of z—if φ> M = 0.
Since φ ∈ RS+1 , we may abuse notation slightly and write it as (φ0 , φ1 , φ2 , . . . φS ).
Then, using the definition of M, and the rules for products of transposes,
0 = M^⊤φ = −p φ_0 + X (φ_1, φ_2, . . . , φ_S)^⊤.
Now, let ψ ≡ (1/φ0 )(φ1 , φ2 , . . . φS )—which is feasible since φ0 > 0—and rear-
range to obtain
p = Xψ
which completes the proof that no arbitrage (condition 1 of the Theorem) im-
plies the existence of a state price vector (condition 2).
Now, for (2) implies (3). This part is easy: we just construct a hypothet-
ical investor whose marginal utilities of consumption in each of the S states
tomorrow are given by the state price vector ψ = (ψ1 , . . . ψS ). For example,
define
arise. When markets are incomplete, there are fewer than S linearly indepen-
dent assets—i.e., X has fewer than S linearly independent rows, so p = Xψ repre-
sents fewer than S independent equations in the S variables (ψ1 , ψ2 , . . . ψS ).
Suppose, for example, that the N assets have linearly independent payoff
vectors (no redundant assets), but that N < S—fewer assets than states. Then,
the rank of X is N, and the set of solutions to the linear system p = Xη, that is,
the set {η ∈ R^S : Xη = p}, has dimension S − N > 0.4
Exercize 2.3. Using the same X from exercize 2.2, and assuming p is any p
satisfying the condition you derived in exercize 2.2, characterize the set of possible
state price vectors ψ satisfying p = Xψ.
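A small numerical companion to the exercize—a sketch only; the price vector used here is an arbitrary choice of mine, assumed to satisfy whatever no-arbitrage condition you found in Exercize 2.2:

% The solutions of p = X*psi form a line in R^3: a particular solution plus
% any multiple of a direction in the null space of X.
X    = [1 1 1; 0 2 0];
p    = [0.9; 0.8];        % illustrative prices (assumed arbitrage-free)
psi0 = pinv(X)*p;         % one particular solution
d    = null(X);           % X*(psi0 + t*d) = p for every scalar t
% The admissible state-price vectors are the psi0 + t*d with all entries > 0.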
Result 2.2. Suppose there is a strictly positive state price vector ψ which correctly
prices assets ( p = Xψ). If asset markets are complete, then ψ is the unique state price
vector.
PROOF. An easy way to show this is to note that when markets are com-
plete, for each unit basis vector es in RS there is a portfolio zs such that X > zs =
es . Since ψ correctly prices assets, we must have
p · zs = ψ · es = ψs (2.13)
for all s = 1, 2, . . . S. Since these equations must hold for any state price vector
that correctly prices assets, state prices are uniquely determined.5
Before we can state it, though, it will be useful to have some notation that
represents, for v, w ∈ R^S, the vector (v_1 w_1, v_2 w_2, . . . , v_S w_S) ∈ R^S. Let vw—
with no '·' and no '⊤'—denote this vector. Recall that π = (π_1, π_2, . . . , π_S) ≫ 0 are
probabilities over the S possible states. Then,
E(vw) = ∑_{s=1}^S π_s v_s w_s.
Now, suppose we have a positive state price vector ψ that correctly prices
assets. Let ms = ψs /πs > 0, which is feasible since all states have positive
probability. Then, for any asset i,
p_i = ∑_{s=1}^S x_{i,s} ψ_s    (2.14)
    = ∑_{s=1}^S π_s (ψ_s/π_s) x_{i,s}    (2.15)
    = ∑_{s=1}^S π_s m_s x_{i,s}    (2.16)
    = E(m x_i)    (2.17)
In the last line, xi denotes the ith row of X. The equation shows that there is an
m ∈ R^S, m ≫ 0, such that p_i = E(m x_i) for any asset i. By the linearity of ex-
pectations, this is also the case for any portfolio z of assets: p · z = E[m(X^⊤z)].
Conversely, suppose there is an m ≫ 0 such that p_i = E(m x_i) holds for
any asset i, and that all states have positive probability. Then ψ defined by
ψ_s = π_s m_s (∀s ∈ S) is a strictly positive state price vector that correctly prices
assets. This is the first part of the representation theorem—there is a state price
vector ψ ≫ 0 that correctly prices assets if and only if there is a random variable
m ≫ 0 such that p_i = E(m x_i) for every asset i.
For the next part of the result—which relates to risk-neutral probabilities—
we need another bit of notation, and a risk-free asset. Since we’ll be considering
expectations taken with respect to other probabilities (the risk-neutral proba-
bilities that we’ll be constructing), we need some notation for expectations that
indicates the probabilities we’re using. So, for any vector y ∈ RS and any
probabilities6 φ ∈ RS , let Eφ denote the expectation of y with respect to φ:
E_φ(y) = ∑_{s=1}^S φ_s y_s
With that notation, we can write expectations with respect to the true, or objec-
tive, probabilities π as Eπ ( · ). We may also write E( · ) for expectations with
respect to π when there is no chance for confusion.
6φ ∈ RS are probabilities if φ ≥ 0 and ∑s φs = 1.
This part of the result also assumes that we have a risk-free asset. Let q
denote the risk-free asset’s price and 1 = (1, 1, . . . 1), a row of ones, its payoff
vector. I want to keep this vector distinct from the payoffs in X, but at the same
time, statements of the form ‘prices assets correctly’ or ‘for all assets’ should be
understood as applying not just to p and X, but to the augmented price vector
(q, p) and the augmented payoff matrix with the row 1 stacked on top of X.
Now, suppose there is a stochastic discount factor m ≫ 0 that prices assets
according to p_i = E_π(m x_i) and q = E_π(m1). Note that E_π(m1) = E_π(m), so
1/E_π(m) = 1/q = R_F,
the gross risk-free rate of return, as we defined it in Lecture 1. Note, too, that
the vector
φ ≡ (1/E_π(m)) πm = (1/E_π(m)) (π_1 m_1, π_2 m_2, . . . , π_S m_S)
obeys φ ≥ 0 and ∑s φs = 1. Thus, φ is a probability measure on S . Now, divide
both sides of pi = E(mxi ) by q = E(m), and use the above considerations to
obtain
(1/q) p_i = (1/E(m)) E(m x_i)
          = ∑_s (π_s m_s / ∑_r π_r m_r) x_{i,s}
          = ∑_s φ_s x_{i,s}
          = E_φ(x_i)
Using q = 1/R_F, we may write this more suggestively as saying that for
any asset i,
p_i = (1/R_F) E_φ(x_i)
—that is, i’s price is just the discounted present value of its expected payoff
under the probabilities φ.
Conversely, suppose there are probabilities φ such that the last equation
holds for all assets i. We can go in the other direction to derive a stochastic
discount factor: let m_s ≡ qφ_s/π_s for each state s. Then, for any asset i,
p_i = q ∑_s φ_s x_{i,s}
    = ∑_s π_s (qφ_s/π_s) x_{i,s}
    = ∑_s π_s m_s x_{i,s}
    = E_π(m x_i)
To summarize, we have the Representation Theorem: the following three statements are equivalent.
1. There is a strictly positive state price vector ψ that correctly prices assets,
p = Xψ.
2. There is a strictly positive stochastic discount factor m that prices assets according to
p_i = E(m x_i) for every asset i.
3. There are strictly positive probabilities φ that price assets according to the risk-
neutral pricing formula
p_i = (1/R_F) E_φ(x_i).
Remark 2.3. Since the SDF representation (2) holds for any asset—including a risk-
less asset with payoff vector 1—if q denotes the riskless asset’s price, then (2) gives us
q = E(m1) = E(m). Letting R F = 1/q, we also have R F = 1/E(m).
Remark 2.4. The SDF representation also has a covariance interpretation. For any
asset i, let cov( xi , m) denote the covariance between asset i’s payoff xi and the SDF
m—i.e., cov( xi , m) = E(mxi ) − E(m)E( xi ). With this notation, and using the
previous remark, the SDF representation can be written as
p_i = E(m x_i)
    = E(m)E(x_i) + cov(x_i, m)
    = E(x_i)/R_F + cov(x_i, m)
That is, the price of asset i is the discounted present value of its expected payoff plus the
covariance of its payoff with the SDF. Note the analogy to equation (1.23) from Lecture
1.
Exercize 2.4. Suppose you observe an economy with two states and two assets.
The price vector you observe is p = (1, 1), and the asset payoffs are
X = [ 1.008 1.008
      0.905 1.235 ]
so the first asset is riskless and the second risky. The probabilities of the two states
are π = (1/2, 1/2). Write a little MATLAB code to find state prices ψ that correctly
price the assets; the stochastic discount factor m; and the risk neutral probabilities
φ. Turn in your results and your code.
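A sketch of how the computation might look (variable names are mine; pi_ is used to avoid shadowing Matlab's built-in pi):

% Exercize 2.4 sketch: state prices, SDF and risk-neutral probabilities.
p   = [1; 1];
X   = [1.008 1.008; 0.905 1.235];
pi_ = [0.5; 0.5];

psi = X \ p                        % state prices: solve p = X*psi
m   = psi ./ pi_                   % stochastic discount factor, m_s = psi_s/pi_s
q   = sum(psi);                    % price of a sure unit of account tomorrow
RF  = 1/q;                         % gross risk-free rate (should equal 1.008 here)
phi = (pi_ .* m) / sum(pi_ .* m)   % risk-neutral probabilities, phi = pi*m/E(m)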
1 = E(m R_i)
  = cov(R_i, m) + E(R_i)E(m)
  = cov(R_i, m) + E(R_i)/R_F,
so that
E(R_i) − R_F = −R_F cov(R_i, m)    (2.19)
That is, the expected excess return on asset i depends negatively on the covari-
ance between the return on asset i and the stochastic discount factor.
Thinking about equation (2.19), suppose there is an asset whose return, call
it Rm , is perfectly correlated with m. To keep things simple, suppose Rm = λm
for some real scalar λ. Applying (2.19) to that asset’s return gives
E(R_m) − R_F = −R_F cov(R_m, m)
             = −(R_F/λ) cov(R_m, R_m)
             = −(R_F/λ) var(R_m)    (2.20)
Combining (2.19) and (2.20)—and using cov( Ri , m) = (1/λ)cov( Ri , Rm )—we
obtain the beta representation
E(R_i) − R_F = β_i (E(R_m) − R_F),    (2.21)
where β_i ≡ cov(R_i, R_m)/var(R_m).
Under the risk-neutral probabilities φ, the same pricing relationship reads
E_φ(R_i) = R_F    (2.22)
—that is, the expected return on any asset, under the risk-neutral probability
measure, is simply the riskless rate of return. Note that (2.22) implies that we
can write i’s return Ri (that is, the random variable, not the expectation) as
Ri = R F + vi
where vi is a random variable that has a zero expectation under the probabili-
ties φ.
The utility of either the stochastic discount factor representation or the risk-
neutral probability representation is that, once we know m or φ in a given econ-
omy (assuming they are unique), we can price all assets by these very simple
formulæ.
max_{c_0, c_1} { U^h(c_0, c_1) : W_0 + ψ · e = c_0 + ψ · c_1 }
Exercize 2.6 (Reinterpreting the CAPM). Doing this exercize with many in-
vestors would present a number of complications that would obscure the point, so
suppose there is a single representative investor. His utility function over (c0 , c1 )
is
U(c_0, c_1) = c_0 + ∑_{s=1}^S π_s ( a c_{1,s} − (b/2)(c_{1,s})^2 ).
In equilibrium he must hold an exogenous supply of assets, so you can treat his
period-one consumption vector (in equilibrium) as some exogenous c1∗ —i.e., equi-
librium prices are such that choosing c1∗ is optimal. Derive an expression for the
state price vector, and use it to show that, for any asset i
p_i = E(x_i)/R_F − b cov(c_1^*, x_i),
where cov(c1∗ , xi ) = E(c1∗ xi ) − E(c1∗ )E( xi ). Hint: Also use the state price vector
to price the riskless asset with price q and payoff 1.
For the remainder of the discussion, let’s specialize preferences to the time-
separable, expected utility form,
U^h(c_0, c_1) = u_h(c_0) + β ∑_{s=1}^S π_s u_h(c_{1,s}).
Then,
U_1^h(c_0^*, c_1^*)/U_0^h(c_0^*, c_1^*) = ( β π_1 u_h'(c_{1,1}^*)/u_h'(c_0^*), β π_2 u_h'(c_{1,2}^*)/u_h'(c_0^*), . . . , β π_S u_h'(c_{1,S}^*)/u_h'(c_0^*) )    (2.23)
is the form of the state price vector associated with the optimal choice (c0∗ , c1∗ ).
With (2.23) as the form of the state price vector, for any asset i we have
p_i = ∑_{s=1}^S β (π_s u_h'(c_{1,s}^*)/u_h'(c_0^*)) x_{i,s}
    = ∑_{s=1}^S π_s m_s x_{i,s}
    = E(m x_i)
where
m = ( β u_h'(c_{1,1}^*)/u_h'(c_0^*), β u_h'(c_{1,2}^*)/u_h'(c_0^*), . . . , β u_h'(c_{1,S}^*)/u_h'(c_0^*) )    (2.24)
is the form taken by the stochastic discount factor under time-separable ex-
pected utility. In a slight abuse of notation, let β u_h'(c_1^*)/u_h'(c_0^*) stand for the
whole vector of marginal rates of substitution in (2.24). We then get the familiar form
p_i = E[ β (u_h'(c_1^*)/u_h'(c_0^*)) x_i ].
And, in terms of excess returns and covariances, we have
E(R_i) − R_F = −R_F cov(R_i, m) = −R_F cov( R_i, β u_h'(c_1^*)/u_h'(c_0^*) )    (2.25)
Equations such as the last two will play a prominent role in the next lecture,
when we get to Lucas’s model and, after that, the equity premium puzzle.
Next, we can derive the representation of pricing under risk neutral prob-
abilities when we take into account investor preferences. As above, we can go
from a stochastic discount factor to risk neutral probabilities using
∑_s π_s m_s x_{i,s} = E(m) ∑_s (π_s m_s/E(m)) x_{i,s}
                    = (1/R_F) ∑_s φ_s x_{i,s}
                    = (1/R_F) E_φ(x_i)
where φ = πm/E(m), and we’ve used E(m) = 1/R F . The risk neutral proba-
bilities under expected utility then have the form
φ = (1/E[β u_h'(c_1^*)/u_h'(c_0^*)]) ( π_1 β u_h'(c_{1,1}^*)/u_h'(c_0^*), π_2 β u_h'(c_{1,2}^*)/u_h'(c_0^*), . . . , π_S β u_h'(c_{1,S}^*)/u_h'(c_0^*) )    (2.26)
Finally, note that the gross risk-free rate, under expected utility, obeys
R_F = 1/E[ β u_h'(c_1^*)/u_h'(c_0^*) ]    (2.27)
Lecture 3
In this Lecture, we move beyond two periods to consider asset prices in economies
with an infinite time horizon. Lucas’s 1978 model [Luc78] is the seminal piece
of work in this vein—almost all developments since can be thought of as adding
particular bells and whistles to Lucas’s simple framework.
Mehra and Prescott [MP85]—and the ‘equity premium puzzle’ they uncov-
ered using Lucas’s framework—provided much of the impetus for adding bells
and whistles to the model.
The longer time horizon creates a few wrinkles in our approach, compared
to the two-period framework in which we studied the basic theory.
For one, whereas in our two-period model asset payoffs and returns and the
stochastic discount factor were random variables (and prices were just values
to be determined at the initial date), in these models (and all that we’ll look at
after them), payoffs, returns, prices, and stochastic discount factors will all be
stochastic processes.
What happens to representations like p = E[mx ] = (1/R F )Eφ [ x ]? We’re
going to take it for granted—without going through all the mathematical niceties—
that these translate in a straightforward way. The stochastic discount factor
representation, for example, will become
p_t = E_t[m_{t+1} x_{t+1}],
while the gross return on an asset with price p_t that pays a dividend d_{t+1} is
R_{t+1} = (d_{t+1} + p_{t+1})/p_t,
and the price itself obeys
p_t = E_t[m_{t+1}(d_{t+1} + p_{t+1})].    (3.1)
Note that the last expression gives us a form of discounted present value. To
see it, update (3.1) one period (to give an expression for p_{t+1}), then substitute
this into the right-hand side of (3.1):
p_t = E_t[m_{t+1} d_{t+1}] + E_t[ m_{t+1} E_{t+1}[m_{t+2}(d_{t+2} + p_{t+2})] ]
    = E_t[m_{t+1} d_{t+1}] + E_t[m_{t+1} m_{t+2} d_{t+2}] + E_t[m_{t+1} m_{t+2} p_{t+2}].
In arriving at the last line, we used the Law of Iterated Expectations, one state-
ment of which is E_t[E_{t+1}[y_{t+2}]] = E_t[y_{t+2}].1 In any case, you can probably see
the pattern that's developing. Let
ρ_{t,t+k} ≡ m_{t+1} m_{t+2} · · · m_{t+k}.
Then, assuming
lim_{k→∞} E_t[ρ_{t,t+k} p_{t+k}] = 0,
we obtain
p_t = ∑_{k=1}^∞ E_t[ρ_{t,t+k} d_{t+k}].    (3.2)
p_t = E_t[F^*],
where F ∗ is the asset’s fundamental value (which must come out of a model
of price formation). If the expectation is conditioned on all publicly available
information at t, and information accrues over time, then the expectation of
pt+1 must obey
Et [ pt+1 ] = Et [Et+1 [ F ∗ ]]
= Et [ F ∗ ]
= pt
The second line uses the Law of Iterated Expectations. Take away the expecta-
tion, and one gets p_{t+1} = p_t + ε_{t+1}, where the forecast error ε_{t+1} is orthogonal to date-t information—that is, the price follows a random walk (more precisely, a martingale).
Around the same time as Lucas’s paper, Breeden [Bre79] formulated the
closely related Consumption-Based CAPM.3 In that model, an asset’s excess
return can be written in single beta form as
R_i^e − R_F = β_i^C [R_C^e − R_F],
where R_C^e is the expected return on an asset whose return equals the consump-
tion growth rate (so R_C^e equals expected consumption growth), and β_i^C is asset
i's beta with respect to asset C—i.e., β_i^C = cov(R_i, R_C)/var(R_C).
Something very similar emerges from Lucas’s model.
Aggregate output is the sum of all the fruit produced, and this must equal
aggregate consumption, since the good is nonstorable:
∑_{i=1}^n y_{it} = c_t.
z = (1, 1, . . . 1) = 1.
The agent’s portfolio at the start of period t is zt = (z1t , z2t , . . . znt ), describ-
ing his holdings of shares of the n assets. This gives him resources at the start
of t equal to
∑_{i=1}^n z_{it}(y_{it} + p_{it}) = z_t · (y_t + p_t)
where pit is the ex dividend price of asset i (its price immediately after it pays its
dividend).
The agent spends his resources on consumption (ct ) and asset holdings for
next period ( pt · zt+1 ). Thus, he faces the sequence of budget constraints
z_t · (y_t + p_t) ≥ c_t + p_t · z_{t+1}.
Note that since there are no adjustment costs or transactions costs, the agent
is indifferent between, say, holding an asset in both t and t + 1 and selling it at
the start of t just to buy it again to hold until t + 1. So, the budget constraint is
written as if the agent sells his whole portfolio at the start of the period, then
uses that value (plus his dividend payments) to finance consumption and a
new portfolio.
We must assume the agent starts with an initial portfolio z0 at the start of
t = 0.
An equilibrium for this economy consists of several objects: a value function v(z, y), decision rules c(z, y) and
x (z, y), and price functions p(y) such that:
1. (Agent optimality) The decision rules solve the agent’s maximization problem—
they attain v(z, y)—given the price functions p(y) and the law of motion
for the aggregate state.
Remark 3.1. Note that a version of Walras’s Law holds here—from the budget con-
straint, starting from z0 = 1, if all shares are held (so the portfolio x is 1), then
c = 1 · y. And, conversely, if c = 1 · y holds, and n − 1 of the assets are fully held,
then so is the nth, again assuming the agent’s z0 = 1.
The first-order condition for the agent's choice of asset i takes the form
U′(c) p_i(y) = βE[ v_i(x, y′) | y ],  i = 1, 2, . . . , n,    (3.4)
where v_i is the partial derivative of the value function with respect to the ith
asset holding, and c is optimal consumption at (z, y). Applying a standard
‘envelope’ argument to v(z, y) tells us that
v_i(z, y) = U′(c)(y_i + p_i(y)).    (3.5)
Advancing the expression in (3.5) by one period and plugging it into (3.4) gives
U′(c) p_i(y) = βE[ U′(c′)(y_i′ + p_i(y′)) | y ]  for i = 1, 2, . . . , n    (3.6)
where we use the shorthand c′ for optimal consumption at (x, y′).
Equation (3.6)—a type of Euler equation—is a necessary condition for an
optimal choice by the agent. Now, to derive an equilibrium asset pricing for-
mula, we simply impose the resource constraint c = ∑i yi :
" #
U 0 (∑ yi ) pi (y) = βE U 0 (∑ yi0 )(yi0 + pi (y0 ))|y for i = 1, 2, . . . n (3.7)
i i
p_{it} = E_t[m_{t+1}(y_{i,t+1} + p_{i,t+1})] = E_t[ β (U′(c_{t+1})/U′(c_t)) (y_{i,t+1} + p_{i,t+1}) ]    (3.13)
And if R_{i,t+1} is the gross return on some asset between t and t + 1—that is,
R_{i,t+1} = (y_{i,t+1} + p_{i,t+1})/p_{i,t}—then
1 = E_t[m_{t+1} R_{i,t+1}] = E_t[ β (U′(c_{t+1})/U′(c_t)) R_{i,t+1} ]    (3.14)
Note that by the Law of Iterated Expectations, we can replace the conditional
expectations in the last expression with unconditional expectations.5
Doing that, we can use the last expression—together with E[ xy] = cov( x, y) +
E[ x ] E[y]—to derive a suggestive expression for expected asset returns:
E[R_{i,t+1}] − 1/E[m_{t+1}] = −(1/E[m_{t+1}]) cov( β U′(c_{t+1})/U′(c_t), R_{i,t+1} )
4 The y′ gets ‘integrated out’ in taking the expectation.
5 If E_t[ m_{t+1} R_{i,t+1} − 1 ] = 0 when we're conditioning on information that's accumulated through
date t, then the unconditional expectation—when we know nothing at all—is surely zero as well.
• If we price a one-period riskless asset, its price q and return R^F are also
vectors in R^S.
• The stochastic discount factor m and the return to the risky asset R are
both S × S matrices. We’ll write m(i, j) for the discount factor applied
between state i today and state j tomorrow (and similarly for R, writing
R(i, j)). In particular,
m(i, j) = β U′(y(j)) / U′(y(i))
If P is our transition probability matrix, then the SDF version of the pricing
relationship (3.9) becomes
p(i) = ∑_{j=1}^S P(i, j) m(i, j) ( y(j) + p(j) ).
Let Ψ denote the S × S matrix with typical element Ψ(i, j) = P(i, j)m(i, j).7
6 In other words, we may as well identify the states with (1, 2, . . . S), rather than writing c(y(s)).
7 In MATLAB terms, Psi = P.*m.
p = Ψ(y + p)
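Rearranging, p = (I − Ψ)⁻¹Ψy whenever I − Ψ is invertible. A minimal MATLAB sketch of that solve, assuming the vector y and the matrices P and m for the chain have already been built (the variable names here are placeholders, not code from the text):

S   = length(y);                 % number of Markov states
Psi = P.*m;                      % element-by-element product, as in footnote 7
p   = (eye(S) - Psi) \ (Psi*y);  % solves p = Psi*(y + p) for the S-by-1 price vector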
There are a number of other objects of interest one could then calculate,
using the solution for p or the expression for the pricing kernel m. Some of
them bear on the questions that motivate Lucas’s paper—i.e., “Will prices fol-
low random walks?” or “Will expected returns be predictable?” and so forth.
The objects of interest one might construct include:
• The price of a riskless one period bond and its return:
q(i) = ∑_j P(i, j) m(i, j),   R^F(i) = 1/q(i).
• The equity return between today's state i and tomorrow's state j, and its conditional and unconditional expectations (π denotes the chain's long-run distribution):
R(i, j) = ( p(j) + y(j) ) / p(i)
E_i[R] = ∑_j P(i, j) R(i, j),   E[R] = ∑_i π(i) E_i[R],   E[R^F] = ∑_i π(i) R^F(i)
• Price/dividend ratios in each state, P/D(i) = p(i)/y(i).
• Risk neutral probabilities:
φ(i, j) = P(i, j) m(i, j) / ∑_h P(i, h) m(i, h).
• Finally, one could check for mean-reversion in the asset price (which would be inconsistent with the price following a random walk), by examining E_i[p] − p(i) = ∑_j P(i, j) p(j) − p(i). Loosely, the price would be mean reverting if 0 < (E[p] − p(i))(E_i[p] − p(i)), where E[p] is the unconditional mean π · p.8
An exercize below will ask you to calculate all these things, and describe
some of the results. First, though, we need a little digression on approximating
autoregressions with Markov chains. We’ll focus on chains with S = 2, which
can be done practically with pencil and paper.9
Making a two-state Markov chain that mimics a first-order autoregressive
process is fairly simple. Suppose we want to mimic a process of the form
x_{t+1} − µ = ρ( x_t − µ ) + e_{t+1}   (3.16)
We have estimates of the mean µ, the persistence parameter ρ, and the uncon-
ditional standard deviation of xt , call it σx .10 Let hats denote estimates, and
assume that |ρ̂| < 1.
A two-state Markov chain that mimics (3.16) will have a low-x state (xl )
and a high-x state ( xh ), with xl < µ̂ < xh . It will also have a 2 × 2 transition
probability matrix P,
P = [ P_ll   P_lh
      P_hl   P_hh ]
say. Since the rows must sum to one, though, P really has only two parame-
ters we would need to determine—for example, the probabilities of remaining
in the low and high states, Pll and Phh . That’s four parameters to determine,
( xl , xh , Pll , Phh ), with only three restrictions—the unconditional mean, uncon-
ditional standard deviation, and persistence—with which to determine them.
8 This one may actually be immediate just from the fact that p inherits the autoregressive prop-
erty of y.
9 There are a couple popular methods for approximating autoregressions with Markov chains of
any order. The most commonly used is due to Tauchen [Tau86]. Based on the results in a paper by
Kopecky and Suen [KS10], I’ve lately switched, in my own work, to using Rouwenhorst’s method
[Rou95]. If you google these, you’ll find papers with descriptions of them and probably some
M ATLAB code for implementing them as well. We may use Rouwenhorst’s method in a subsequent
lecture.
10 σ_x is related to the standard deviation of e_t by σ_x² = σ_e² / (1 − ρ²).
We have, in a sense, one free parameter, which one may usefully think of
as the long-run probability of being in one or the other state—we could match
the unconditional mean, standard deviation and persistence, and still have a
parameter to play with that would influence the long-run distribution. Adding
the long-run distribution as another ‘target’ of our calibration will pin down
the free parameter.
Let πl denote the long-run probability of being in the low state, and πh =
1 − πl the probability of being in the high state. The symmetry inherent in the
process (3.16) suggests setting πl = πh = 1/2. Once we make that assump-
tion, as you’ll see, we have enough conditions to pin down all the parameters.
In particular, the assumption of equal long-run probabilities implies that the
transition matrix P is symmetric—and in the 2 × 2 case, that means that if we
can just pin down one entry, we can pin down all four entries.11
In any case, given equal long-run probabilities, one then shows that (x_l, x_h) = (µ̂ − σ̂_x, µ̂ + σ̂_x) satisfies
(1/2) x_l + (1/2) x_h = µ̂
(1/2)( x_l − µ̂ )² + (1/2)( x_h − µ̂ )² = σ̂_x²
That leaves one parameter in P to be determined, say P_ll, by trying to match
the persistence ρ̂. What we want to try to match is basically either
E_l[ x − µ̂ ] = ρ̂( x_l − µ̂ )
or
E_h[ x − µ̂ ] = ρ̂( x_h − µ̂ ).
Let's examine the former, using P_lh = 1 − P_ll and the definitions of x_l and x_h:
P_ll( x_l − µ̂ ) + (1 − P_ll)( x_h − µ̂ ) = ρ̂( x_l − µ̂ ),   i.e.,   −P_ll σ̂_x + (1 − P_ll)σ̂_x = −ρ̂ σ̂_x,
implying
P_ll = (1 + ρ̂)/2
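A minimal MATLAB sketch of this construction (mu_hat, sig_hat, and rho_hat are the estimated mean, unconditional standard deviation, and persistence; the names are placeholders):

xl  = mu_hat - sig_hat;            % low state
xh  = mu_hat + sig_hat;            % high state
Pll = (1 + rho_hat)/2;             % probability of staying in the low state
P   = [Pll, 1-Pll; 1-Pll, Pll];    % symmetric transition matrix; long-run distribution is (1/2, 1/2)
x   = [xl; xh];                    % the two states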
11 π satisfies Pᵀπ = π, while P must by definition obey P1 = 1. This implies, in the 2 × 2 case, a
Exercize 3.1. Using some U.S. data on annual log consumption of non-
durables and services (detrended using a Hodrick-Prescott filter), from 1950 to
2006, I estimated the following AR(1)
I didn’t include a constant because the detrended series has mean zero. The esti-
mates were ρ̂ = 0.63775 and σ̂_e = 0.0093346, so an estimate of σ_log(c) would be
0.0093346 / √( 1 − 0.63775² ).
Either in M ATLAB or by hand (with M ATLAB preferred), construct a 2-state
Markov chain to approximate the process for log(ct ), assuming the long-run dis-
tribution is (1/2, 1/2). Note this process has µlog(c) = 0.
Now, exponentiate your vector of log(c) values to get states in terms of c. That
is, if logc is the 2 × 1 vector of values the chain can take on, set c = exp(logc).
Lastly, normalize c so it has a long-run mean of 1—if pi = [0.5, 0.5] is your
vector of long-run probabilities, set c = c/(pi*c).
Now that you have a Markov chain, you are ready for the main exercize
itself:
Exercize 3.2. Using the Markov chain you constructed in the last problem, write
a M ATLAB program to solve for the vector of asset prices p, and all the ‘objects of
interest’ we listed after we derived equation (3.15). Assume that
U(c) = c^{1−α} / (1 − α)
and write a program that calculates the results for given values of α and β.
Report results for α = 5 and β = 0.95. Discuss the extent to which the
price/dividend ratio is useful for forecasting the asset return—i.e., what’s the rela-
tionship between whether the P/D ratio is low or high and whether conditionally
expected returns are low or high? Is this true for expected excess returns?
Now, jack up α to 20 and β = 0.99. Describe the differences between the risk
neutral probabilities and the actual probabilities P in this case. Were they that
different under the original parameters? How does the equity premium compare to
the (α, β) = (5, 0.95) case? How do you interpret the returns (realized returns
by state, not expected returns)?
m_{t+1} = β( x_{t+1} )^{−α}.
This means that, in writing down the pricing relationships (or putting them
on the computer), m only depends on next period’s state, not also on today’s.
Its distribution, and conditional expected value, still depend on today's state
through the transition matrix P.12
As for the equilibrium equity price, it depends on both today’s y and to-
day’s x, but it is homogeneous of degree one (in fact linear) in y. This follows
from the present value relationship (3.2) we derived way back at the outset of
the lecture:
p_t = ∑_{k=1}^∞ E_t[ ρ_{t,t+k} y_{t+k} ].
Since each yt+k is proportional to yt and all the ρt,t+k terms are products of
mt+h ’s, pt must be proportional to yt . Thus, if p(y, x ) denotes the equilibrium
price function, then p(y, x ) = yw( x ) for some function w.
The equilibrium pricing relationship, in terms of the SDF, and not yet spe-
cializing to a Markov chain, is
p(y, x) = E[ m(x′)( x′y + p(x′y, x′) ) | x ]
12 I'm sticking with the notation we've developed. Mehra and Prescott use φ_ij for elements of the transition matrix.
E[R] = 1.07
E[R^F] = 1.008
E[R] − E[R^F] = 0.062
disentangling risk aversion from intertemporal substitution, and what constitutes a ‘plausible’
degree of risk aversion.
are summarized in their Figure 4—getting the equity premium as high as 0.35 percent
(which is presumably at α = 10) entails pushing the riskless rate up to 4%.
What’s going on? Consider the riskless rate first. Increasing α lowers the
elasticity of intertemporal substitution (which is just 1/α). In an economy in
which consumption, on average, is growing, a lower elasticity of intertemporal
substitution will increase the compensation agents require for deferring con-
sumption from today to tomorrow—in a deterministic economy, that would
mean a higher interest rate.14 There’s more at work with regard to R F than just
that deterministic logic, but that mechanism appears to be dominant.15
With regard to the equity premium, given the low volatility of consump-
tion, the stochastic discount factor has a small variance, unless we set α very
high. The volatility of the discount factor puts a bound on how big a reward
the economy can pay an investor for holding a risky asset. The following dis-
cussion draws on chapter 5 of Cochrane’s book [Coc01]. To see the nature of
the bound, consider
E_t[ m_{t+1}( R_{t+1} − R_t^F ) ] = 0,
or
E[ R_{t+1} − R_t^F ] = − cov( m_{t+1}, R_{t+1} − R_t^F ) / E[ m_{t+1} ].
Now, cov(mt+1 , Rt+1 − RtF ) = corr(mt+1 , Rt+1 − RtF )σ(mt+1 )σ ( Rt+1 − RtF ), where
corr(mt+1 , Rt+1 − RtF ) ∈ [−1, 1] is the correlation between mt+1 and the excess
return Rt+1 − RtF , and the σ ( · )’s are standard deviations. Thus,
E[ R_{t+1} − R_t^F ] / σ( R_{t+1} − R_t^F ) = −corr( m_{t+1}, R_{t+1} − R_t^F ) · σ( m_{t+1} ) / E[ m_{t+1} ].
The quantity on the left-hand side of the last expression is called a Sharpe
ratio, and it measures an asset or portfolio’s excess return per unit of volatility.
It is, in some sense, a market measure of the price of risk. For the U.S. equity
market as a whole, the mean in the numerator is around 0.062 and the stan-
dard deviation in the denominator is around 0.166 (from Mehra and Prescott’s
14 Recall that in a deterministic economy, g ≈ EIS(r − η), where g is the growth rate, r is the
interest rate, and η = 1/β − 1 is the rate of time preference. Flipping this around gives r ≈
η + (1/EIS)g. If g > 0, a lower EIS raises r.
15 In a subsequent lecture, we’ll see if preferences that separate risk aversion from intertemporal
calculations, reported in their Table 1). This implies a Sharpe ratio of around
0.37.
It’s straightforward to show—since correlation is between −1 and 1—that
the last expression implies
| E[ R_{t+1} − R_t^F ] | / σ( R_{t+1} − R_t^F ) ≤ σ( m_{t+1} ) / E[ m_{t+1} ].   (3.19)
If m prices all assets, then the absolute value of the Sharpe ratio for any asset
is bounded by the magnitude on the right, which is a measure of the volatility
of the stochastic discount factor relative to its mean. For a given mean, the less
volatile a model’s SDF, the smaller the maximum Sharpe ratio that the model
can generate.
When mt+1 = β( xt+1 )−α , we can say more, especially if we assume xt+1 is
lognormally distributed—i.e., log( xt+1 ) ∼ N (µ, σ2 ), so log [( xt+1 )−α ] = −α log( xt+1 ) ∼
N (−αµ, α2 σ2 ). A property of lognormally distributed random variables is that,
for log(y) ∼ N(µ_y, σ_y²),
σ(y) = E[y] √( exp(σ_y²) − 1 )
Thus,
σ( m_{t+1} ) / E[ m_{t+1} ] = √( exp(α²σ²) − 1 ).
Using the approximation √( exp(α²σ²) − 1 ) ≈ ασ, we can express the bound
in (3.19) as approximately
| E[ R_{t+1} − R_t^F ] | / σ( R_{t+1} − R_t^F ) ≤ ασ.
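To get a feel for the magnitudes (round numbers, not the exact Mehra-Prescott figures): with annual consumption growth volatility of roughly σ ≈ 0.036, generating a Sharpe ratio of about 0.37 requires ασ ≥ 0.37, i.e. risk aversion on the order of α ≈ 0.37/0.036 ≈ 10, and that is the best case, in which the SDF and the excess return are perfectly correlated.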
Exercize 3.3. Write a MATLAB program to replicate Mehra and Prescott's exer-
cize. Report results for nine parameter combinations—eight given by pairs of β ∈
{0.95, 0.99} and α ∈ {1, 5, 10, 20}, plus one with (β, α) = (1.125, 18). For each
combination, report 100(E[R] − 1), 100(E[R^F] − 1), and 100(E[R] − E[R^F])
(all long-run, unconditional expectations), and σ(m)/E[m], the ratio of the un-
conditional standard deviation of the SDF to the unconditional mean of the SDF.
E[ R − R^F ] / σ[ R − R^F ] ≤ σ(m) / E[m]   (3.20)
that we derived in section 3.2.3—is actually higher than the average Sharpe
ratio we observe in the data (0.56 versus about 0.4).
Are we therefore done with the equity premium puzzle—should we declare
success and move on to a different topic? Even if we set aside the plausibility
of α so large and a β > 1—which says that, in a deterministic environment,
starting from a constant consumption path, an agent would be willing to save
at a significantly negative real interest rate—we probably shouldn’t declare
victory just yet.16
As it turns out, the second moment implications of our model with (α, β) =
(18, 1.125) are off—our model has an unconditional standard deviation of the
risk-free rate which is too high and an unconditional standard deviation of the
equity return which is too low.
Mehra and Prescott estimate from their historical data that σ ( R F ) = 0.056
and σ ( R) = 0.165—i.e., 5.6 percentage points for the risk-free rate, and 16.5
percentage points for the equity return. For the model of exercize 3.3, with
(α, β) = (18, 1.125), we get
σ(R^F) = 0.079
σ(R) = 0.139
These numbers may seem close to the historical data, but we should first
bear in mind that the volatility of the risk-free rate in Mehra and Prescott’s 100-
year sample is quite high compared to estimates over more recent samples—in
fact, Campbell and Cochrane [CC99] (discussed more below) look at the more
recent data, and set a constant riskless rate as one of the targets of their model.
Even taking the Mehra-Prescott volatility data at face value, though, the
distribution of our model’s volatility across the low- and high-growth states is
such that the model produces only negligible variation in the conditional ex-
cess return to equity and its volatility, hence little variation in the conditional
16 You may wonder whether β > 1 poses a problem for existence. If consumption has a long-run
growth rate of x, the relevant sufficient condition for existence of optimal paths is βx^{1−α} < 1, as
proved by Brock and Gale in 1969 [BG69] and later re-stated by Kocherlakota in 1990 [Koc90].
Sharpe ratio. This is in contrast to the data, which show large swings in condi-
tional Sharpe ratios from business cycle peaks (where the conditional Sharpe
ratio is low) to business cycle troughs (where it is high). A difference on the
order of 1.0 between highs and lows is not uncommon.17
Our model with (α, β) = (18, 1.125) produces:
E[ R − R^F | x ]     σ( R − R^F | x )     E[ R − R^F | x ] / σ( R − R^F | x )
Are these discrepancies between model and data all part of the same puz-
zle? One could think of three, or perhaps even four separate puzzles we might
give names to—an equity premium puzzle (it’s hard to get a big equity pre-
mium), a risk-free rate puzzle (it’s hard to get a low risk-free rate), a volatility
puzzle (it’s hard to get a big standard deviation of equity returns and a low
standard deviation for the risk-free rate), and a Sharpe ratio puzzle (it’s hard to
get a strongly countercyclical Sharpe ratio). Giving them different names may
help to clarify the phenomena we’re trying to explain, but in some sense they
are all part of one puzzle, since we would not want to say we’ve solved one, if
in doing so we’re failing along the other dimensions.
Whitelaw [TW11].
for the stochastic discount factor m that would exactly match those first two
moments of returns in the historical data.
Knowing what such a stochastic discount factor m looks like—or what the
implied risk-neutral probabilities look like—would be of great help in identi-
fying models that can (or cannot) match the data. Given the behavior of m, the
question would be—What sort of model of preferences would map the con-
sumption growth process into that stochastic discount factor?18
How do they construct their R̂ F and R̂? They assume that consumption
growth is a sufficient statistic for the risk-free rate and the equity ‘price-dividend’
ratio—which is the w in the formula (3.17). This means (as we’ve seen in our
solution) that R F takes on two values and w takes on two values. They then
calculate values of R̂ F = R̂ F (1), R̂ F (2) and ŵ = (ŵ(1), ŵ(2)) that produce
unconditional means and standard deviations for returns that match Mehra
and Prescott’s historical estimates. The equity return from state i to state j is
related to w by R̂(i, j) = x ( j) (1 + ŵ( j)) /ŵ(i ).
The risk-free rate is the easier of the two. R̂^F solves:
(1/2) R̂^F(1) + (1/2) R̂^F(2) = 1.008
√( (1/2)( R̂^F(1) − 1.008 )² + (1/2)( R̂^F(2) − 1.008 )² ) = 0.056
This is a quadratic equation, which has two solutions:
R̂_1^F = (1.064, 0.952)′   and   R̂_2^F = (0.952, 1.064)′   (3.21)
Assuming that state 1 is the low-growth state, R̂1F implies a countercyclical risk-
free rate, R̂2F a procyclical one. At this point, we can’t say which solution is the
relevant one, though we will be able to do so after finding the candidate R̂’s.
Finding the candidate R̂’s is considerably harder—it’s a much more com-
plex quadratic equation to solve. Using
R̂(i, j) = x(j)( 1 + ŵ(j) ) / ŵ(i)
18 In fact, as we’ll see, Melino and Yang find that preferences must display counter-cyclical risk
This gives four possible combinations of R̂ F and R̂, but—as Melino and
Yang note—three of the four can be ruled out on the grounds of violating no
arbitrage. The combination that remains after eliminating the three that allow
arbitrage is:
R̂^F = (1.064, 0.952)′   (3.25)
R̂ = [ 1.02385   1.29528
      0.86306   1.09186 ]   (3.26)
Exercize 3.4. Show that the other three combinations from (3.21) and (3.24) im-
ply arbitrage opportunities. You can do it in M ATLAB if you like. Since these
are returns, treat the price vector as p = (1, 1). The wrinkle here, compared to
Lecture 2, is the two possible states today. For each combination of R̂iF and R̂ j ,
you’ll want to ask—Is there an arbitrage opportunity if today’s state is state 1? Is
there an arbitrage opportunity if today’s state is state 2? ‘Yes’ to either of those
questions is enough to constitute an arbitrage opportunity—an arbitrage need not
be available in both states. Each of those questions is answered using the definition
of arbitrage 2.3 from Lecture 2.
R̂ and R̂ F are all you need to back out implied risk neutral probabilities.
Routledge and Zin [RZ10] perform this calculation, and get
ψ̂ = [ 0.85   0.15
      0.61   0.39 ]   (3.27)
Compare this with the transition matrix P (3.23). If today’s state is the high-
growth state, the risk-neutral probabilities (the second row of ψ̂) are not that
different from the objective probabilities (the second row of P), indicating lit-
tle risk aversion. By contrast, if today’s state is the low-growth state, the risk-
neutral probabilities put a much bigger weight on remaining in the low-growth
state (and a much smaller weight on moving to the high-growth state), as com-
pared to P. That indicates significant risk aversion. In other words—the struc-
ture of the asset returns R̂ and R̂ F implies countercyclical risk aversion.
This is consistent with observations on the countercyclicality of the con-
ditional Sharpe ratio (and in fact is a sort of confirmation of it). The sort of
story one could tell goes like this—in recessions, risk aversion is high, and
consequently the price of risk (measured as the compensation in excess return
required to bear risk) is high.
It also tells you why the unadorned Mehra-Prescott model we solved in
exercize 3.3 fails to generate a strongly countercyclical Sharpe ratio. The SDF
for that model depends only on the consumption growth rate realized next
period, and is independent of this period’s state. In other words—it’s a vector,
not a matrix. The next exercize asks you to calculate the SDF consistent with
Melino and Yang’s R̂ and R̂ F , from (3.25). It’s exactly identified, and you’ll see
it has two non-collinear rows.
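To see why it is exactly identified: for each current state i, the two pricing conditions ∑_j P(i,j) m̂(i,j) R̂(i,j) = 1 and ∑_j P(i,j) m̂(i,j) = 1/R̂^F(i) are two linear equations in the two unknowns m̂(i,1) and m̂(i,2). A minimal MATLAB sketch of that piece, assuming P, Rhat, and RFhat hold the transition matrix and the Melino-Yang returns (the names are placeholders):

mhat = zeros(2,2);
for i = 1:2
    A = [ P(i,:).*Rhat(i,:) ;   % equity pricing condition in state i
          P(i,:)            ];  % risk-free pricing condition in state i
    b = [ 1 ; 1/RFhat(i) ];
    mhat(i,:) = (A \ b)';       % row i of the SDF matrix
end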
Exercize 3.5. Using the Melino-Yang returns (3.25) and the Mehra-Prescott
Markov chain transition matrix (3.23), calculate conditional Sharpe ratios in the
two states, and solve for the stochastic discount factor m̂ consistent with the re-
turns and the transition matrix P.
Lecture 4
Puzzle Responses, I
Over the next two lectures, we’ll look at various responses to the equity pre-
mium puzzle. Given time constraints, this won’t be an exhaustive catalog of
the models spawned by Mehra and Prescott’s observation, though we’ll try
to hit the ones that are currently the most significant (in terms of both their
success and the amount of work currently being done on them).
I’ve grouped the responses into two broad categories. In this lecture, we’ll
look at responses that modify the representative agent’s preferences. In the
next, we’ll look at models that tinker in some way with the consumption pro-
cess.
We’ll begin with models that incorporate habit formation in the agent’s
preferences.
—with lifetime utility of the form
E_0[ ∑_t β^t u( c_t − h_t ) ]
and a habit stock h_t that depends on past consumption,
h_t = D(L) c_{t−1}   (4.1)
where D ( L) is a polynomial in the lag operator. For the most part, we’ll work
with a simple version of (4.1), in which only last period’s consumption matters:
h_t = δ c_{t−1}   (4.2)
where δ ∈ (0, 1) measures the response of the habit stock to past consumption.
where C = {c_t}_{t=0}^∞. Let U_t(C) denote the marginal utility of date-t consumption, c_t. Then,
and the marginal rate of substitution between c_t and c_{t+1}, call it MRS_{t,t+1}, is
In our simple context, with utility u(ct − δct−1 ), the agent assumes the habit
stock δct−1 is out of his control—we would want to write this more precisely
as u(ct − δc̄t−1 ), with a bar to denote an aggregate or average quantity. The
marginal rate of substitution relevant for the agent’s choices then becomes
MRS_{t,t+1} = β u′( c_{t+1} − δc̄_t ) / u′( c_t − δc̄_{t−1} )
In a representative agent model (or a model with a unit mass of identical
agents), we would take first order conditions for the agent’s intertemporal
choice problem, then impose the equilibrium requirement ct = c¯t to get
MRS_{t,t+1} = β u′( c_{t+1} − δc_t ) / u′( c_t − δc_{t−1} )   (4.7)
This differs only slightly—but nonetheless significantly—from the marginal
rates of substitution we've worked with thus far.
subject to
z( y + p(s) ) + b ≥ c + p(s)z′ + q(s)b′
Since the habit is external, there’s not much change to the agent’s first-order
conditions; combined with an envelope condition, they yield, for the choice of
z′:
u′( c − δy_{−1} ) p(s) = β E[ u′( c′ − δy )( y′ + p(s′) ) | s ].   (4.8)
For the choice of b′ we get:
u′( c − δy_{−1} ) q(s) = β E[ u′( c′ − δy ) | s ].   (4.9)
w(x) = E[ m(x, x′) x′ ( 1 + w(x′) ) | x ]   (4.13)
and
q(x) = E[ m(x, x′) | x ]   (4.14)
where, from (4.12),
m(x, x′) = β (x′)^{−α} ( (1 − δ/x′) / (1 − δ/x) )^{−α}.   (4.15)
Based on equations (4.13) to (4.15), it's easy to operationalize the model in
MATLAB:
• Using the Mehra-Prescott Markov chain (3.22) and (3.23), x, w, and q are
2 × 1 vectors, m is a 2 × 2 matrix.
• Form the stochastic discount factor using (4.15) to fill in m(i, j) at all the
pairs x(i) and x(j).
( (1 − δ/x′) / (1 − δ/x) )^{−α}
—will be 1 for x′ = x, in other words for transitions from state one to state one
or from state two to state two. Its values for the ‘off-diagonal’ state transitions—
from one to two and from two to one—are inversely related:
( (1 − δ/x(2)) / (1 − δ/x(1)) )^{−α} = 1 / ( (1 − δ/x(1)) / (1 − δ/x(2)) )^{−α}   (4.16)
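A minimal MATLAB sketch of the bullet points above (x is the 2 × 1 vector of growth states and P the transition matrix from (3.22)–(3.23); alpha, beta, and delta are given; all names are placeholders):

m = zeros(2,2);
for i = 1:2
    for j = 1:2
        m(i,j) = beta * x(j)^(-alpha) * ((1 - delta/x(j))/(1 - delta/x(i)))^(-alpha);  % equation (4.15)
    end
end
A = (P.*m) * diag(x);              % A(i,j) = P(i,j)*m(i,j)*x(j)
w = (eye(2) - A) \ (A*ones(2,1));  % equation (4.13): w = A*(1 + w)
q = (P.*m) * ones(2,1);            % equation (4.14)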
Exercize 4.1. For this exercize, set α = 1 and β = 1.01, and use the Mehra-
Prescott Markov chain (3.22) and (3.23). You’re going to incorporate habit into
the stochastic discount factor as in (4.17). Set theta = linspace(.01, 1, 100)—
note you have to start at a θ > 0. Write a M ATLAB program that calculates, for
each theta(i), the average returns, the average equity premium, the uncondi-
tional volatility of the risk-free rate, and the difference between the conditional
Sharpe ratio in the low-growth state and the high-growth state.
First, (a) find the theta(i) that gets you closest to an equity pre-
mium of 0.062, or 6.2%. A simple way to do this is to use MATLAB's
min and abs functions—if EP is the vector of average equity premia, use
min(abs(EP − 0.062)). What do the other results look like at that theta(i)?
What’s the implied habit parameter δ at that value of theta(i)?
Repeat this for (b) the theta(i) that gets the unconditional volatility of the
risk-free rate closest to 0.056, and (c) the theta(i) that gets the change in the
conditional Sharpe ratio closest to 1.
Make some plots of theta versus the average equity premium, versus the
volatility of the risk-free rate, and versus the change in the Sharpe ratio.
or α c/(c − h) in the case of u′(r) = r^{−α}.
If we think about habits of the simple form h_t = δc_{t−1}, and c_t = x_t c_{t−1}, we have
α c_t/(c_t − h_t) = α c_t/(c_t − δc_{t−1}) = α / (1 − δ/x_t)
—low consumption growth brings δ/xt closer to one, thus raising the agent’s
relative risk aversion.
The Arrow-Pratt measure of relative risk aversion can be motivated with a thought ex-
periment along the following lines. Imagine the agent faces uncertainty over today’s
consumption. His level of consumption will be θc, where θ is the realization of a posi-
tive random variable with mean 1 and variance σθ2 . We’re interested in calculating the
fraction of certain consumption c = E[θc] he would be willing to give up, call it s, such
that he’s indifferent between having (1 − s)c for sure and having uncertain consump-
tion θc:
E[u(θc)] = u ((1 − s)c)
Taking a second-order Taylor series expansion of u(θc) around θ = 1, then passing
the expectation operator through, gives
E[u(θc)] ≈ u(c) + (1/2) u″(c) c² σ_θ².
A first-order Taylor series expansion of u((1 − s)c) around s = 0 gives
u((1 − s)c) ≈ u(c) − u′(c) c s.
Combining these gives us an approximate expression for the relative risk premium s,
assuming u′ > 0 and u″ < 0:
s ≈ −(1/2) ( u″(c)c / u′(c) ) σ_θ²   (4.18)
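For example, with CRRA utility u(c) = c^{1−α}/(1 − α), −u″(c)c/u′(c) = α, so s ≈ (α/2)σ_θ²: an agent with α = 5 facing a gamble with a 10 percent standard deviation (σ_θ = 0.10) would give up roughly (5/2)(0.01) = 2.5 percent of consumption to avoid it.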
Rather than specify how the habit stock ht depends on past consumption,
they define what they call the ‘surplus consumption ratio’,
S_t = ( c_t − h_t ) / c_t
and make assumptions about its dynamics. The pricing kernel, with this defi-
nition, becomes
m_{t+1} = β ( c_{t+1}/c_t )^{−α} ( S_{t+1}/S_t )^{−α}.
Like the stochastic discount factor we derived above, it has a ‘standard’ part
(depending on a discount rate, consumption growth and a curvature parame-
ter) and something ‘non-standard’, the part which depends on growth of the
surplus ratio.
Just as habit led to time-varying risk aversion above, it does as well in
Campbell and Cochrane's model, with the coefficient of relative risk aversion
inversely related to the surplus ratio:
−u″( c_t − h_t ) c_t / u′( c_t − h_t ) = α / S_t.
The process they specify for the surplus ratio has consumption growth as its
driving impulse, but allows for much richer dynamics than we could achieve
in the context of the simple habit model we laid out above. In particular, they
assume that
low values of the surplus ratio correspond to bad times—periods with low
consumption—having λ be a decreasing function of the surplus ratio makes
possible a countercyclical Sharpe ratio.
The function λ also plays a role in transmitting consumption volatility into
volatility of the risk-free rate. Here again, a decreasing λ works in the right
direction. In fact, Campbell and Cochrane ‘reverse engineer’ the λ function so
as to guarantee a constant risk-free rate. Their choice of λ at the same time
guarantees that the habit stock is unresponsive to consumption innovations
at the model's steady state, and moves positively with consumption near the
steady state.
How does the Campbell-Cochrane model translate into the two-state Mehra-
Prescott framework we’ve been using throughout the last two Lectures? The
first thing to note is that if, within our model, consumption growth is a suffi-
cient statistic for the surplus ratio—so St takes on two values, one in the low-
growth state and another in the high-growth state, then we can add nothing
beyond what we achieved with the simple habit of the previous subsection. As
was the case with the simpler habit, the resulting SDF using the surplus ratio
would necessarily relate to the basic Mehra-Prescott SDF by (4.17). What we
called θ in the discussion would simply be S(2)/S(1).
There’s really no easy way to get the rich dynamics inherent in Campbell
and Cochrane’s specification (4.19) into our two-state version of the Mehra-
Prescott model without adding another state variable (probably St itself). Adding
a state variable isn’t necessarily a bad idea—we’ll need to do it eventually, to
incorporate disasters or long-run risk—but it’s hard to see how to create a sim-
ple Markov chain that would mimic (4.19).
To do anything more in the two-state framework, we would need to assume
that growth of the surplus ratio differs across the four possible state transitions,
which means St+1 /St must be a more complicated function of xt and xt+1 .
Without specifying that function, we might nonetheless ask what it would have
to look like to match the exactly identified stochastic discount factor we found
using the Melino-Yang returns (exercize 3.5).
That is, simply write, say, γ_S for S_{t+1}/S_t, and solve for the matrix
γ_S = [ γ_S(1,1)   γ_S(1,2)
        γ_S(2,1)   γ_S(2,2) ]
such that β x(j)^{−α} γ_S(i,j)^{−α} equals the m̂(i,j) we derived in exercize 3.5.
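Backing γ_S out of m̂ is a one-line calculation; a minimal MATLAB sketch (mhat is the SDF from exercize 3.5, x the vector of growth states; the names are placeholders):

denom  = beta * repmat(x', 2, 1).^(-alpha);   % denom(i,j) = beta*x(j)^(-alpha)
gammaS = (mhat ./ denom).^(-1/alpha);         % since mhat(i,j) = beta*x(j)^(-alpha)*gammaS(i,j)^(-alpha)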
In the standard time-additive expected-utility specification,
E[ ∑_{t=0}^∞ β^t u(c_t) ] = E[ ∑_{t=0}^∞ β^t c_t^{1−α} / (1 − α) ]   (4.20)
the parameter α is the coefficient of relative risk aversion, and 1/α is the elastic-
ity of intertemporal substitution.4 With Epstein-Zin preferences, two separate
parameters govern the degree of risk aversion (for timeless gambles) and will-
ingness to substitute consumption over time (in deterministic settings).
To motivate the Epstein-Zin form, think about writing (4.20) in a recursive
way, letting Ut denote lifetime utility from date t onward, which is a stochastic
process, assuming consumption is a stochastic process. We get
We can imagine relaxing the separability over time by replacing the linear
‘aggregator’ in (4.21) with something more general:
This is in fact the form created by Kreps and Porteus [KP78], who gave ax-
ioms on a primitive preference ordering over temporal lotteries such that the
ordering was representable by a recursive function of the form (4.22).
Epstein and Zin [EZ89] and, independently, Weil [Wei90] wrote down para-
metric versions of Kreps-Porteus preferences. These preferences give a specific
CES form to the aggregator. They also make a convenient monotone transfor-
mation of the utility process Ut from (4.22), in such a way that the parameter
governing (timeless) risk aversion is very clearly separated from the parame-
ter governing (deterministic) intertemporal substitution. The Epstein-Zin, or
Epstein-Zin-Weil, form is:
U_t = [ (1 − β) c_t^ρ + β µ_t(U_{t+1})^ρ ]^{1/ρ},   (4.23)
subject to
z( p(s) + y ) + b ≥ c + p(s)z′ + q(s)b′.   (4.26)
For compactness, let W(c, µ) = [ (1 − β)c^ρ + βµ^ρ ]^{1/ρ}, and note that the par-
tial derivatives of W obey W_1(c, µ) = (1 − β)W(c, µ)^{1−ρ} c^{ρ−1} and W_2(c, µ) =
βW(c, µ)^{1−ρ} µ^{ρ−1}. Note too that, under some mild assumptions, we can pass
derivatives through µ as follows. Consider the case of z′ (the case of b′ is anal-
ogous):
∂/∂z′ µ_s( v(z′, b′, s′) ) = ∂/∂z′ { E[ v(z′, b′, s′)^{1−α} | s ] }^{1/(1−α)}
= ( 1/(1−α) ) { E[ v(z′, b′, s′)^{1−α} | s ] }^{1/(1−α) − 1} · ∂/∂z′ E[ v(z′, b′, s′)^{1−α} | s ]
= µ_s( v(z′, b′, s′) )^α E[ v(z′, b′, s′)^{−α} ∂v(z′, b′, s′)/∂z′ | s ]
7 Utility function V is ordinally equivalent to utility function U if U = f (V ) for some increasing
With all that in mind, the first-order condition for the choice of z′ is
W_1( c, µ_s(v(z′, b′, s′)) ) p(s) = W_2( c, µ_s(v(z′, b′, s′)) ) ∂/∂z′ µ_s( v(z′, b′, s′) )
which we can write as
(1 − β) c^{ρ−1} p(s) = β µ_s( v(z′, b′, s′) )^{ρ+α−1} E[ v(z′, b′, s′)^{−α} ∂v(z′, b′, s′)/∂z′ | s ]
(4.27)
We can use an envelope argument to get an expression for the derivative of
v with respect to z. At state (z, b, s), this is given by
∂v(z, b, s)/∂z = W_1( c, µ_s(v(z′, b′, s′)) ) ( p(s) + y )
= (1 − β) W( c, µ_s(v(z′, b′, s′)) )^{1−ρ} c^{ρ−1} ( p(s) + y )
= (1 − β) v(z, b, s)^{1−ρ} c^{ρ−1} ( p(s) + y )
where the last line uses the fact that v(z, b, s) = W( c, µ_s(v(z′, b′, s′)) ) at the
optimum. We now advance this expression one period and plug it into the
right hand side of (4.27) to get
c^{ρ−1} p(s) = β µ_s( v(z′, b′, s′) )^{ρ+α−1} E[ v(z′, b′, s′)^{1−α−ρ} (c′)^{ρ−1} ( p(s′) + y′ ) | s ]
The first-order condition for holdings of the riskless asset is analogous, sim-
ply plugging in q(s) for p(s) and 1 for the payoff, rather than p(s′) + y′—
c^{ρ−1} q(s) = β µ_s( v(z′, b′, s′) )^{ρ+α−1} E[ v(z′, b′, s′)^{1−α−ρ} (c′)^{ρ−1} | s ]
We can immediately see the form of the stochastic discount factor, which
for now we'll write as depending on both s and s′. In a moment we'll think
more carefully about which variables it depends on. We have:
m(s, s′) = β ( V(s′) / µ_s(V(s′)) )^{1−α−ρ} ( y′/y )^{ρ−1}   (4.29)
This stochastic discount factor—like the stochastic discount factor in the habit
model—incorporates a ‘standard’ part and a ‘non-standard’ part. The standard
part is β(y′/y)^{ρ−1}—the utility discount factor times a decreasing power func-
tion of aggregate consumption growth.9 This is analogous to the β(x′)^{−α} piece
in the stochastic discount factors we've previously encountered.
The non-standard part is the part involving the agent’s value function. Sup-
pose that 1 − α − ρ < 0. Then, other things the same, payoffs in states where
realized lifetime utility falls short of its conditional certainty equivalent value
will be weighted more heavily in pricing assets than payoffs in states where
realized lifetime utility exceeds its conditional certainty equivalent value. This
has the potential to increase the volatility of the stochastic discount factor, by
reinforcing the volatility coming from the standard β(x′)^{ρ−1} channel.
Note that in the special case of α = 1 − ρ, the stochastic discount factor
reduces to the standard β(x′)^{−α}.
So, what does m really depend on?—(x, y, x′, y′)? Or maybe (x, x′)? Or just
x′? Make the Mehra-Prescott assumption that y′ = x′y, with x′ following a
Markov chain. Under that assumption, m depends on just x and x′, with the
dependence on x coming through the conditioning in the certainty equivalent
present in (4.29). There is no dependence on the level of consumption y, be-
cause of the degree-one homogeneity of preferences. That homogeneity allows
us to write the equilibrium value function as
V (s) = φ( x )y
for some function φ. The ‘non-standard’ term in the stochastic discount factor
then becomes
( V(s′) / µ_s(V(s′)) )^{1−α−ρ} = ( φ(x′)y′ / µ_x(φ(x′)y′) )^{1−α−ρ}
= ( φ(x′)x′y / µ_x(φ(x′)x′y) )^{1−α−ρ}
= ( φ(x′)x′ / µ_x(φ(x′)x′) )^{1−α−ρ}
Under the assumption that x′ follows a Markov chain {x(1), . . . , x(S), P}, x′ and
φ are in fact just vectors, and m is a matrix:
m(i, j) = β ( x(j) )^{ρ−1} ( φ(j)x(j) / µ_i(φ′x′) )^{1−α−ρ}   (4.31)
9 Recall ρ ≤ 1.
where
µ_i(φ′x′) = ( ∑_{j=1}^S P(i, j) ( φ(j)x(j) )^{1−α} )^{1/(1−α)}.   (4.32)
As in the basic Mehra-Prescott model, the equity price here is again ho-
mogeneous in aggregate consumption—we have p( x, y) = w( x )y—and the
riskless asset price depends only on x, q = q( x ). Under the Markov chain
assumption, these are vectors, and the pricing equations become
w(i) = ∑_{j=1}^S P(i, j) m(i, j) ( w(j)x(j) + x(j) )   (4.33)
q(i) = ∑_{j=1}^S P(i, j) m(i, j)   (4.34)
Once we know the matrix m = [m(i, j)], we know how to solve (4.33) and (4.34)
for w and q, and from them, all the objects we might be interested in.
Finding m poses a new challenge, though, because of its dependence on the
value function, through φ. We first need to find φ, and that will require some-
thing we’ve not had to do thus far—solving for something iteratively, rather
than just inverting some matrices.
2. Given mu, update φ using (4.35). Again, this step can be done in one
operation, without resorting to a ‘for’ loop:
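As a rough sketch of what the two steps might look like in MATLAB, assuming phi0 is the current iterate, x and P are the Markov chain objects, and that (4.35) has the fixed-point form φ = [(1 − β) + βµ^ρ]^{1/ρ} implied by the recursion (4.23) with V(s) = φ(x)y (the variable names and that reading of (4.35) are my assumptions, not the text's):

mu   = ( P * ((phi0.*x).^(1-alpha)) ).^(1/(1-alpha));  % step 1: certainty equivalents, as in (4.32)
phi1 = ( (1-beta) + beta*mu.^rho ).^(1/rho);           % step 2: update phi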
You would perform those two steps repeatedly, inside a ‘while’ loop, until
the successive iterates are changing only negligibly from one iteration to the
next. Before starting the loop, you need to set initial conditions and create
some variables to control the loop’s behavior. You start with some φ0 —say a
column of ones. I create one variable called chk which will record the distance
between iterates, and a variable tol that is my threshold for stopping the loop
(when chk gets less than tol). You can use MATLAB's norm function to measure
the distance between iterates. I set tol to be something very small, 10−5 , say,
or 10−7 .10
I also create a variable called maxits—the maximum number of times to
iterate, regardless of whether the iterates converge. This is just in case some-
thing is not right, so the loop doesn’t go on forever. There’s a variable t which
starts at 0 and gets incremented by 1 with each pass through the loop. The loop
breaks when either chk < tol or t > maxits.
The general form would be:
phi0 = ones(2,1);
chk = 1;
tol = 1e-7;
maxits = 1000;
t = 0;
while chk>tol && t<maxits;
[Steps to make φ1 here]
chk = norm(phi0-phi1)
t = t+1;
phi0 = phi1;
end;
Note the last line, which sets φ0 = φ1 —we’re repeatedly mapping a φ into a
new φ, then plugging that result into the mapping to get another iterate.
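Once the loop has converged, the remaining steps are just (4.31)–(4.34). A minimal sketch, with phi the converged value and the other names placeholders:

S  = length(x);
mu = ( P * ((phi.*x).^(1-alpha)) ).^(1/(1-alpha));   % equation (4.32)
m  = zeros(S,S);
for i = 1:S
    for j = 1:S
        m(i,j) = beta * x(j)^(rho-1) * ( phi(j)*x(j)/mu(i) )^(1-alpha-rho);  % equation (4.31)
    end
end
A = (P.*m) * diag(x);              % A(i,j) = P(i,j)*m(i,j)*x(j)
w = (eye(S) - A) \ (A*ones(S,1));  % equation (4.33): w = A*(1 + w)
q = (P.*m) * ones(S,1);            % equation (4.34)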
10 This choice depends a bit on what you’re trying to find and what you plan on using it for.
β ρ α
1 0.95 1/2 5
2 0.95 1/2 45
3 0.95 −9 5
4 0.95 −9 45
5 0.98 1/2 5
6 0.98 1/2 45
7 0.98 −9 5
8 0.98 −9 45
These are some of the parameter combinations for which Weil reports results
in the tables on page 413 of his 1989 J.M.E. paper. His γ is our α, and his ρ is
our 1 − ρ; in the table above, ρ = 1/2 corresponds to an EIS of 2, and ρ = −9
corresponds to an EIS of 0.1.
When I did this, I found some minor differences (at the first or second decimal
places) between my results and Weil’s, so don’t be surprised if you don’t obtain
exact matches. If you’re doing it right, though, your numbers should be very,
very close to Weil’s.
m_{t+1} = β ( c_{t+1}/c_t )^{ρ−1} ( v_{t+1} / µ_t(v_{t+1}) )^{1−α−ρ}   (4.36)
as
m_{t+1} = [ β ( c_{t+1}/c_t )^{ρ−1} ]^{(1−α)/ρ} ( R_{t+1} )^{(1−α)/ρ − 1}   (4.37)
where R_{t+1} is the equilibrium equity return.
W_t = z_t ( p_t + y_t ).
Here R^z_{t+1} is the return on the agent's portfolio; in equilibrium this is, of course,
the market equity return, which we denote as before by R_{t+1}.11
Now write the agent’s dynamic program with W as the individual state (s
still denotes the aggregate state, and we assume R is a function of s):
v(W, s) = max_c [ (1 − β)c^ρ + β µ_s( v( R′(W − c), s′ ) )^ρ ]^{1/ρ}.
In this expression, we’ll assume the agent holds the market portfolio, and we
write R0 as shorthand for R(s, s0 ), the market return between states s and s0 .12
The degree-one homogeneity of utility implies that v is degree-one homoge-
neous in W—i.e., we can write
v(W, s) = ξ(s)W,
in which case
µ_s( v( R′(W − c), s′ ) ) = µ_s( ξ(s′) R′ (W − c) ) = µ_s( ξ(s′) R′ ) (W − c).
The problem looks almost static, but the term β µ_s( ξ(s′)R′ )^ρ is capturing the
11 It should be clear that including b, or applying this to the Lucas case of multiple trees, makes
= (1 − β) κ(s)^ρ [ 1 + ( β µ_s(ξ(s′)R′)^ρ / (1 − β) ) ( (1 − κ(s))/κ(s) )^ρ ]
= (1 − β) κ(s)^ρ [ 1 + ( (1 − κ(s))/κ(s) )^{1−ρ} ( (1 − κ(s))/κ(s) )^ρ ]
= (1 − β) κ(s)^ρ [ 1 + (1 − κ(s))/κ(s) ]
= (1 − β) κ(s)^{ρ−1}   (4.42)
where the third line uses (4.41).
The equations (4.41) and (4.42), plus the budget constraint (4.38) (with the
equilibrium market return Rt+1 ), and the decision rule (4.40) are all the pieces
we need. We proceed by asking what all these relationships imply for the
growth rate of consumption between states st and st+1 . There are several steps,
at various times utilizing (4.38), and (4.40)–(4.42):
c_{t+1}/c_t = κ(s_{t+1}) W_{t+1} / ( κ(s_t) W_t )
= κ(s_{t+1}) R_{t+1} (1 − κ(s_t)) W_t / ( κ(s_t) W_t )
= R_{t+1} κ(s_{t+1}) (1 − κ(s_t)) / κ(s_t)
= R_{t+1} ( ξ(s_{t+1})^ρ / (1 − β) )^{1/(ρ−1)} ( β µ_{s_t}( ξ(s_{t+1}) R_{t+1} )^ρ / (1 − β) )^{1/(1−ρ)}
= β^{1/(1−ρ)} R_{t+1}^{1/(1−ρ)} ( ξ(s_{t+1}) R_{t+1} )^{ρ/(ρ−1)} µ_{s_t}( ξ(s_{t+1}) R_{t+1} )^{ρ/(1−ρ)}
= β^{1/(1−ρ)} R_{t+1}^{1/(1−ρ)} ( ξ(s_{t+1}) R_{t+1} / µ_{s_t}( ξ(s_{t+1}) R_{t+1} ) )^{ρ/(ρ−1)}
Now, rearrange to solve for the terms involving the value function—i.e., the
terms in ξ:
ξ(s_{t+1}) R_{t+1} / µ_{s_t}( ξ(s_{t+1}) R_{t+1} ) = ( β R_{t+1} )^{1/ρ} ( c_{t+1}/c_t )^{−(1−ρ)/ρ}.   (4.43)
But, note, the term on the left is precisely v_{t+1}/µ_t(v_{t+1}): since W_{t+1} = R_{t+1}(1 − κ(s_t))W_t and the date-t factor (1 − κ(s_t))W_t passes through µ_t,
v_{t+1}/µ_t(v_{t+1}) = ξ(s_{t+1}) W_{t+1} / µ_{s_t}( ξ(s_{t+1}) W_{t+1} ) = ξ(s_{t+1}) R_{t+1} / µ_{s_t}( ξ(s_{t+1}) R_{t+1} ).
Now, just substitute the right-hand side of (4.43) for vt+1 /µt (vt+1 ) in (4.36) and
simplify the resulting expression to obtain (4.37).
turns. One asset, perhaps the first, may be the risk-free asset. Let θ denote
the vector of portfolio weights; as in our treatment of mean-variance portfo-
lio choice, the budget constraint becomes a constraint that the weights sum to
one—θ · 1 = 1. The portfolio return is ∑_i θ_i R_i′ = θ · R′.
The agent’s dynamic programming problem becomes:
v(W, s) = max_{c,θ} { [ (1 − β)c^ρ + β µ_s( v( θ · R′ (W − c), s′ ) )^ρ ]^{1/ρ} : θ · 1 = 1 }
This problem appears to be essentially a static one, but the normalized value
function ξ (s0 ) encodes information about the marginal value to the agent of
consumption today versus wealth in different states tomorrow.
Taking account of the form of the certainty equivalent operator µs , the prob-
lem can be written as
max_θ { E_s[ ( ξ(s′) ∑_i θ_i R_i(s, s′) )^{1−α} ]^{1/(1−α)} : ∑_i θ_i = 1 }.   (4.44)
Exercize 4.4. Use (4.43) and ∑i θi,t Ri,t+1 = Rt+1 to show that (4.47) is the same
stochastic discount factor described by either (4.36) or (4.37).
Given that return, the agent chooses consumption today in the same way
he would if he faced a constant certain return of R̄ = µ(∑_i θ_i R_{i,t+1}). Under
certainty, Epstein-Zin preferences are ordinally equivalent to
∑_{t=0}^∞ β^t c_t^ρ / ρ.
which are solved by a decision rule of the form ct = κWt , for a constant κ.14
The κ that solves these first order conditions is given by
1 − κ = ( β R̄^ρ )^{1/(1−ρ)}.
This is the same κ one obtains from combining (4.41) and (4.42) under the
assumption of i.i.d. returns.
If there is one risky asset—so R̄ = µ(R_{t+1})—then a mean-preserving spread
of the distribution of R_{t+1} lowers µ(R_{t+1}). This can either increase or decrease
the agent's savings rate (1 − κ), depending on whether ρ/(1 − ρ) is positive or
not. Recalling that ρ = 1 − 1/EIS, an increase in rate-of-return uncertainty, in
an i.i.d. world, raises the agent's savings rate only if his EIS < 1, and lowers it if his EIS > 1.
This result is worth keeping in mind when we get to the long-run risk
model of Bansal and Yaron [BY04]—an EIS greater than one is critical to their
results. This and other properties of Epstein-Zin preferences were shown
by Weil in his Q.J.E. paper [Wei90].
x (while the third is a constant) gives exactly the degrees of freedom neces-
sary to make the model's stochastic discount factor consistent with the return
process—(3.25)—that Melino and Yang derived as consistent with the first two
moments of asset returns data, given the two-state Mehra-Prescott consump-
tion process.15 But, as Melino and Yang show, just having the necessary de-
grees of freedom doesn’t mean the resulting state-dependent parameters will
satisfy natural requirements like ρ ≤ 1 or α > 0.
If only one parameter is allowed to vary with the state, they have to choose
what to target; in these cases, they seek values of the state-dependent pa-
rameters that are consistent with the pricing relation for the equity return,
Et (mt+1 Rt+1 ) = 1, then see what those parameters imply for the values taken
on by the risk-free rate. When all three parameters are allowed to vary with the
state, they have an extra degree of freedom, and there are consequently many
ways to match the data.
Their notation is almost identical to ours, though they use α for our 1 − α,
and denote the consumption growth rate (our x) by g. The price-dividend ratio
we have called w, they denote by P—i.e., whereas we have written the equity
return from state i to state j as R(i, j) = x ( j)(1 + w( j))/w(i ), they would write
g( j)(1 + P( j))/P(i ). One major difference, though—and this becomes appar-
ent if you try to derive the form of their stochastic discount factor within our
model of Epstein-Zin preferences—is that whereas we write the CES aggrega-
tor W (c, µ) as
W(c, µ) = [ (1 − β)c^ρ + βµ^ρ ]^{1/ρ},
they write
W(c, µ) = [ c^ρ + βµ^ρ ]^{1/ρ}.
With a constant utility discount factor, these preferences just differ by a factor
of proportionality—they imply the same marginal rates of substitution, hence
are equivalent representations. This is not the case when β can vary with the
state.
The key expression is their equation (6.9), which describes the stochastic
discount factor when all three parameters are allowed to vary with the state.
Translated into our notation, and with our form for the aggregator, it reads:
m_{t+1} = β(s_t) x_{t+1}^{−α(s_t)} ( w_t / β(s_t) )^{1 − (1−α(s_t))/ρ(s_t)} ( 1 + w_{t+1} )^{(1−α(s_t))/ρ(s_{t+1}) − 1}
× ( (1 − β(s_{t+1}))^{1/ρ(s_{t+1})} / (1 − β(s_t))^{1/ρ(s_t)} )^{1−α(s_t)}   (4.48)
The last piece—the one involving β(st ) and β(st+1 )—is one that comes about
because of the form we use for the aggregator. Setting that last term equal to
one gives the pricing kernel studied by Melino and Yang.
15 Alternatively, they have the degrees of freedom to equate the model’s stochastic discount factor
How do they get this? If none of the parameters vary with the state, (4.48) is just a
version of our (4.37)—simply plug in xt+1 for ct+1 /ct , xt+1 (1 + wt+1 )/wt for Rt+1 , and
rearrange.
When the parameters vary, there is another route, that relies on an equilibrium re-
lationship between the consumption-wealth ratio—the κ of the last section—and the
price-dividend ratio w. At the start of the section 4.2.4, we defined the agent’s wealth as
zt ( pt + yt ). In equilibrium, with z = 1 and c = y, the consumption-wealth ratio κ obeys
κ_t = c_t / W_t = y_t / ( p_t + y_t ) = 1 / ( p_t/y_t + 1 ) = 1 / ( w_t + 1 )
If you begin with the version (4.47) of the stochastic discount factor, taking account of
the dependence of the preference parameters on the state—that is, begin with
m_{t+1} = ξ_{t+1}^{1−α(s_t)} R_{t+1}^{−α(s_t)} / µ_t( ξ_{t+1} R_{t+1} )^{1−α(s_t)}
—and (a) use (4.41) to replace the µt (ξ t+1 Rt+1 ) with an expression in wt , ρ(st ), and
β(st ); (b) use (4.42) to replace the ξ t+1 with an expression in wt+1 , ρ(st+1 ), and β(st+1 );
and (c) use xt+1 (1 + wt+1 )/wt to replace Rt+1 , and re-arrange some terms, you should
obtain (4.48).
In terms of the Markov chain representation for the evolution of the
state, we can re-write (4.48) as
m(i, j) = β(i) x(j)^{−α(i)} ( w(i) / β(i) )^{1 − (1−α(i))/ρ(i)} ( 1 + w(j) )^{(1−α(i))/ρ(j) − 1}
× ( (1 − β(j))^{1/ρ(j)} / (1 − β(i))^{1/ρ(i)} )^{1−α(i)}.   (4.49)
Also, plug in parameter values for any of the taste parameters that are not going
to be state-varying. Then, seek values of the state-dependent parameters to try
to satisfy one or more of the four pricing relations
∑_{j=1}^2 P(i, j) m(i, j) R̂(i, j) = 1,   i = 1, 2   (4.50)
∑_{j=1}^2 P(i, j) m(i, j) = 1/R̂^F(i),   i = 1, 2   (4.51)
where P is the Mehra-Prescott transition matrix and R̂ and R̂^F are the Melino-
Yang returns (3.25).
Melino and Yang consider several combinations of state-dependence in one
or more parameters, while keeping the other(s) constant. The cases they report
results for are:
1. β and ρ fixed, α state-dependent.
2. β and α fixed, ρ state-dependent.
3. ρ fixed, α and β state-dependent.
4. β fixed, α and ρ state-dependent.
5. All three parameters state-dependent.
In (1) and (2) they seek values to satisfy (4.50), then check what those values
imply for the risk-free rate—i.e., they check how badly they miss on (4.51). In
(3) and (4), they can hit both sets of targets, but not necessarily with plausible
parameter values.
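As a rough sketch of how one of these searches might be set up in MATLAB (this is my illustration, not Melino and Yang's code): take case (1), with β and ρ fixed and α(1), α(2) chosen to satisfy (4.50) in both states. With P, x, w, and Rhat holding the transition matrix, growth states, price-dividend ratios, and Melino-Yang equity returns, one could solve the two-equation system with fsolve, or minimize squared residuals with fminsearch:

function r = mz_resid(a, P, x, w, Rhat, beta, rho)   % e.g., saved as mz_resid.m
    m = zeros(2,2);
    for i = 1:2
        for j = 1:2
            % SDF (4.49) with constant beta and rho and state-dependent alpha = a
            m(i,j) = beta * x(j)^(-a(i)) * (w(i)/beta)^(1 - (1-a(i))/rho) ...
                     * (1 + w(j))^((1-a(i))/rho - 1);
        end
    end
    r = sum(P.*m.*Rhat, 2) - 1;                      % equation (4.50), one residual per state
end

% alpha_hat = fsolve(@(a) mz_resid(a, P, x, w, Rhat, beta, rho), [5; 5]);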
Probably their most striking finding is that countercyclical risk aversion
alone—that is, with constant ρ and β—doesn’t help: there’s no real improve-
ment over what you could get with all parameters constant. A procyclical
elasticity of intertemporal substitution allows you to match first moments of
returns, keeping the other parameters fixed. But, countercyclical risk aversion
together with a slightly procyclical willingness to substitute intertemporally
allows you to match both first and second moments of the returns data.
Since the risk-free rate is countercyclical—in the data as well as in their R̂^F—
it's not surprising that we get a procyclical EIS: in bad times agents must be
demanding more compensation to substitute consumption over time. Not sur-
prisingly, given the risk neutral probabilities implied by their returns R̂ and R̂^F,
the risk aversion parameter α alternates between extreme risk aversion and ap-
proximate risk neutrality. For a constant utility discount factor of β = 0.98, for
example, they match first and second moments exactly with risk aversion alter-
nating between α ≈ 23 and α ≈ 0, while ρ moves slightly between about −1.98
and −2.10—i.e., the agent’s EIS alternates between roughly 0.34 and 0.32.16
16 And, in a way, that’s the truly striking part. Just countercyclical risk aversion?—no real gain.
Note that the G_P(i, j) have all the properties of Markov chain probabilities:
G_P(i, j) ≥ 0 for all i and j, and ∑_j G_P(i, j) = 1 for all i. Thinking of them as
probabilities, note that their ‘CDF’ obeys:
Pr{ φ_{t+1} ≤ φ(j) : φ_t = φ(i) } = ∑_{k=1}^j G_P(i, k)
= ( ∑_{k=1}^j P(i, k) )^γ
≥ ∑_{k=1}^j P(i, k)
The last line follows from the fact that when γ ∈ (0, 1], r^γ ≥ r for all r ∈ [0, 1].
In a sense, GP gives more weight to the lower valued outcomes than does
the true probability P. This is most apparent, again, in the two-state case. Sup-
pose that P(i, 1) = P(i, 2) = 12 , and that γ = 0.9. Then
( GP (i, 1), GP (i, 2)) = (1/2)0.9 , 1 − (1/2)0.9 ≈ (0.54, 0.46) .
The rank-ordering of the outcomes is based on the values of vt+1 —the Markov
states need to be ordered so that state 1 has the lowest value of vt+1 and state
n has the highest.
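A minimal MATLAB sketch of this construction, assuming the states have already been ordered that way (the names are placeholders):

CDF  = cumsum(P, 2);                   % row-wise CDFs of the true transition probabilities
GCDF = CDF.^gamma;                     % distorted CDFs
GP   = [GCDF(:,1), diff(GCDF, 1, 2)];  % difference back to get the distorted 'probabilities' GP(i,j)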
Because the aggregator and certainty equivalent are still homogeneous of
degree one, we get all the useful homogeneity properties we had before—vt
still equals Φ( xt )ct in equilibrium when consumption follows a Mehra-Prescott-
type process, and the price of a consumption claim still has the form p( xt , ct ) =
w( xt )ct .
If you’ve written good code to solve the Mehra-Prescott model with Epstein-
Zin preferences, it’s easy to modify that code to solve the model with FORA
preferences. An exercize below will ask you to do just that.
18 And note well that when we get to the steps where we’re calculating expectations—e.g., the
paying $50 or so to extend the warranty on a $500 refrigerator. More seriously, see, for example,
the application in Bernasconi [Ber98], which uses first-order risk aversion to rationalize what ap-
pear (from an EU perspective) to be puzzlingly high rates of tax compliance in most developed
economies.
21 This point—which we discuss in more detail below—is formalized in Rabin’s [Rab00] ‘calibra-
shown to be problematic for the EU model. For example, Chetty [Che06] has
shown that estimates of labor supply elasticity (and the degree of complemen-
tarity between consumption and leisure) can put sharp bounds on admissible
coefficients of relative risk aversion, since both values are linked to the curva-
ture of agents’ von Neumann-Morgenstern utilities over consumption. Chetty
finds that the mean coefficient of relative risk aversion implied by 33 studies of
labor supply elasticity is roughly unity, which would mean that the EU model
is incapable of rationalizing both observed labor supply behavior and the de-
grees of risk aversion observed in many risky choice settings, many of which
imply double-digit coefficients of relative risk aversion.
As we hinted at above, another attractive feature of FORA preferences is
the fact that they can be parametrized to give a reasonable amount of risk aver-
sion for both large and small gambles. This is in contrast to the standard ex-
pected utility specification. In the CRRA class, for example, if the coefficient of
risk aversion is calibrated so that an agent with those preferences gives plau-
sible answers to questions about large gambles, the agent will be roughly risk
neutral for small gambles. If, on the other hand, the coefficient of risk aver-
sion is set sufficiently large that the agent gives plausible answers to questions
about small gambles, he will appear extremely risk averse when confronted
with large gambles.24
One way to visualize the approximate risk neutrality of the standard ex-
pected utility specification with constant relative risk aversion is to note that
it’s “smooth at certainty”—the agent’s indifference curves between consump-
tion in different states of nature are smooth and tangent (at the certainty point)
to the indifference curves of a risk neutral agent. This is true for EU with any
differentiable von Neumann-Morgenstern utility function.
FORA preferences introduce a kink into agents’ indifference curves at the
certainty point; the kink is what allows for a plausible calibration of risk aver-
sion for small gambles.25 The parameter γ—which makes outcome rankings
matter—is the source of the kink. The parameter α, analogous to the risk aver-
sion coefficient in CRRA preferences, governs curvature away from the cer-
tainty point and allows for a plausible calibration of risk aversion for large
gambles.
24 This point was made formally by Rabin [Rab00], though EU preferences aren’t the only form
susceptible to this critique. Indeed, as Safra and Segal show in a recent paper [SS08], almost all
common alternatives to expected utility are susceptible to this criticism. The one exception noted
by Safra and Segal is Yaari’s dual theory of choice under risk, which is the special case of (4.52)
when α = 0. A useful perspective is offered by Palacios-Huerta, Serrano and Volij [PSV04]: “[I]t
is more useful not to argue whether expected utility is literally true (we know that it is not, since
many violations of its underpinning axioms have been exhibited). Rather, one should insist on the
identification of a useful range of empirical applications where expected utility is a useful model
to approximate, explain, and predict behavior.”
25 See figure 1 in [EZ90]. The “disappointment aversion” preferences used by Routledge and Zin
95
4.5. DISAPPOINTMENT AVERSION LECTURE 4. PUZZLE RESPONSES, I
96
Lecture 5
The models of the last lecture all modified, in some way, the preferences of
Mehra and Prescott’s representative agent.1 This lecture will focus on two
models that alter the consumption process faced by the representative agent—
Bansal and Yaron’s [BY04] ‘long run risk’ approach, and the ‘rare disasters’
model, originally due to Rietz [Rie88], but lately revived by Barro [Bar06], Gou-
rio [Gou08], and Gabaix [Gab08].
Your first thought might be—“The consumption process is whatever it is
in the data; you can’t just plug in another one.” That would be true if the
data spoke definitively on the subject, but, given a limited number of observa-
tions, the data may not be sufficient to discriminate between alternatives that,
while close to one another in some statistical sense, have dramatically different
implications for the behavior of economic models. Is the distribution of log
consumption growth rates better described by a normal distribution or by a
distribution with fatter tails? Whether a consumption disaster is likely to oc-
cur once every couple hundred years or a couple times every hundred years is
difficult to decide with just 100 years’ worth of data, but makes an enormous
difference for pricing claims to aggregate consumption.
Looking ahead to Bansal and Yaron’s long-run risk, it’s difficult to distin-
guish between a consumption process with log differences that are i.i.d. about a
1 Unfortunately,we didn’t have time to look at models that dispense with the representative
agent altogether—for example, [Guv09]. Our treatment was also far from exhaustive, omitting the
interesting work on disappointment aversion by Routledge and Zin [RZ10] or Campanale et al.
[CCC10].
97
LECTURE 5. PUZZLE RESPONSES, II
constant mean, and a process with i.i.d. fluctuations about a conditional mean
subject to very small but very persistent fluctuations. Figure 5 plots some ar-
tificial data I created in M ATLAB. In one of the series in the top panel I used
a constant mean growth rate, in the other a fluctuating conditional mean. The
fluctuating conditional mean—the difference between the two series in the top
panel—is shown in the lower panel. The parameters are Bansal and Yaron’s,
so the standard deviation of the innovations to the conditional mean is very
small compared to the innovations around the conditional mean. The i.i.d. in-
0.01
-0.01
-0.02
-0.03
0 20 40 60 80 100 120
0.01
-0.01
-0.02
0 20 40 60 80 100 120
Figure 5.1: Simulated data using parameters from Bansal and Yaron’s con-
sumption process. Top panel shows log growth rates with and without long-
run risk component. Bottom panel shows long-run risk component.
98
LECTURE 5. PUZZLE RESPONSES, II
14
12
10
0
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Figure 5.2: Simulated data using parameters from Bansal and Yaron’s con-
sumption process. The figure shows cumulated log growth rates for series
with and without a long-run risk component.
samples, but I’ve chosen to show 10, 000 periods just because anything shorter
might look like I’m cherry-picking the data. And, it’s certainly not movements
at that low a frequency that are driving the difference between the asset pric-
ing implications with and without long-run risk. In an exercize to follow, you’ll
be asked to solve a version of the Bansal-Yaron model. That solution will in-
volve iterating on the representative agent’s (equilibrium) value function. By
checking how many iterations it takes for the iterates of the value function to
converge to within a reasonably small tolerance of one another—say 10−7 —
you’ll see that the agent effectively looks into the future far fewer than 10, 000
periods.2
2 To be sure, technically the agent looks over the whole infinite horizon; practically speaking,
though, anything beyond 150 or so periods is discounted so heavily that it has only a negligible
impact on the agent’s utility.
99
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
A version of this short, informative paper is available from Leroy’s website, here http://www.
econ.ucsb.edu/~sleroy/downloads/excess.pdf.
100
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
Index
300- Index
2000-
225!
1500
p
150- *
1000-
75-
500-
0 I year yeor
0 I I I I 1
1870 1890 1910 1930 1950 1970 1928 1938 1948 1958 1968 1978
FIGURE 1 FIGURE 2
Note: Real Standardand Poor'sCompositeStock Price Note:RealmodifiedDow JonesIndustrialAverage(solid
Index (solid line p) and ex post rationalprice (dotted line p) and ex post rational price (dotted line p*),
line p*), 1871- 1979,both detrendedby dividinga long- 1928-1979,both detrendedby dividingby a long-run
run exponentialgrowth factor. The variablep* is the exponentialgrowthfactor.Thevariablep* is the present
present value of actual subsequentreal detrendeddi- value of actual subsequentreal detrendeddividends,
vidends, subject to an assumptionabout the present subject to an assumptionabout the present value in
value in 1979 of dividendsthereafter.Data are from 1979of dividendsthereafter.Data are from Data Set 2,
Data Set 1, Appendix. Appendix.
Figure 5.3:path
growth Figure
for taken from Shiller
the Standard [Shi81].
and Poor's that p, =E,( p*), i.e., p, is the mathematical
series, 16-38 percentbelow the growthpath expectation conditional on all information
for the Dow Series)only for a few depression availableat time t of p*. In otherwords,p, is
While this approach years: 1933,
had 1935,and 1938.problems
1934,econometric
some The mov- associated
the optimalwithforecast
it—of p*. One can define
ing averagewhich determinesp* will smooth the forecast error as u,= p* -pt. A funda-
having to do with the outpossible nonstationarity
such short-run fluctuations.of boththe
Clearly pricesmental
and dividends—
principleof optimal forecastsis that
as well as some economic problems, 4 the point still stuck. Eventually, the idea
stock market decline beginningin 1929 and the forecast error u, must be uncorrelated
ending
of excess price volatility wasin 1932 could not in
formulated be more
rationalized
robustin waywithasthe forecast;that
a question of is, the covariancebe-
terms of subsequentdividends!Nor could it tween p, and u, must be zero. If a forecast
what explains the volatility of price-dividend ratios.
be rationalizedin terms of subsequentearn- error showed a consistent correlationwith
Campbell and Shiller [CS88]
ings, since provided
earnings a useful
are relevant framework—referred
in this model to now
the forecastitself, then that would in itself
as the Campbell-Shiller as indicators of laterthinking
onlyapproximation—for dividends. Of the
about imply that the
sources forecast could be improved.
of volatil-
it can be shown from the
ity of price-dividendcourse,
ratios.
say
the efficient marketsmodel does not
They start with an identity basedMathematically,
p=p*. Might one still suppose that this
on the
theory of definition
conditional expectations that u,
of an ex post return on a stock
kind (or market
of stock index crash
of stocks):
was a rational must be uncorrelatedwithp,.
mistake,a forecasterrorthat rationalpeople If one uses the principlefrom elementary
Rt+might make?This paperwill explorehere the
1 = ( dt+1 /dt )(1 + pt+1 /dt+1 ) / ( pt /dt ).
statisticsthat the varianceof the sum of two
notion that the very volatility of p (i.e., the uncorrelatedvariables is the sum of their
tendency of big movements in p to occur variances,one then has var(p*) var(u)+
They then take a Taylor series approximation of this
again and again) implies that the answeris identity to get Since variancescannot be negative,
var(p).
no. this means var(p)) ?var(p*) or, converting
Tortgive
+1 = + goft+the
anκ0idea 1 +kind
κ1 zt+ − zt
of1 volatility (5.4)
to more easily interpreted standard devia-
comparisonsthat will be made here, let us tions,
where gt+1 = log(dconsider
t+1 /dputs
which
at this point the simplestinequality
t ) and
limitsrton = measure
+1 one log( Rtof ) are the continuous divi-
+1volatil- (1) (p or(P*)
dend growth rate and continuous return, and
ity: the standarddeviationof p. The z t is the log price-dividend ratio
efficient
marketsmodel
log( pt /dt ). The coefficients κ0 andcan κ1be >described
0 come out as asserting Thisseries
of the Taylor inequality (employed before in the
approx-
imation, and involve means of r, z and g.
As it stands, (5.4) has no economic content—it’s an approximation of an
identity that must hold ex post. One can give it economic content by turning
4 What model predicts a constant-discounted, expected present value as the price of an asset?
Oddly, Shiller’s paper came out after Lucas’s 1978 paper, though was apparently not at all informed
by it.
101
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
z t = Et [ κ 0 + g t +1 − r t +1 + κ 1 z t +1 ]
∞
" #
κ0 j −1
+ Et ∑ κ 1
= gt + j − r t + j (5.5)
1 − κ1 j =1
1
z t +1 − z t = − Et( gt+1 − rt+1 )
κ1
∞
" #
1 − κ1 κ0 j −1
) + (Et +1 − Et ) ∑
+ (zt − κ1 gt + j − r t + j
κ1 1 − κ1 j =2
I think it’s fair to say that for the past two decades, the assumption has
been that most of the action here had to come from changing h expectations of
∞ j −1 i
returns—if gt+1 is i.i.d., or close to it, then (Et+1 − Et ) ∑ j=2 κ1 gt+ j ≈ 0.
This is the perspective, for example, of Campbell and Cochrane [CC99], who
assume the growth rates are in fact i.i.d., and seek a mechanism for generating
swings in the expected returns.
However, Barsky and DeLong [BDL93], and later Bansal and Lundblad
[BL02], pointed out that permanent or very highly persistent changes in the ex-
pected growth rate, even if very small, could have large effects on asset prices.
From (5.5), if κ1 is close to one, a small permanent increase in the conditional
mean of dividend growth can lead to a large increase in the log price-dividend
ratio, holding fixed the returns rt+ j . Suppose Et+1 ( gt+ j ) − Et ( gt+ j ) = ∆ for all
j. Then,
∞
" #
j −1 κ ∆
(Et +1 − Et ) ∑ κ 1 gt + j = 1 ,
j =2
1 − κ1
102
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
1
r = − log( β) + g (5.6)
ψ
1
∆r = ∆g,
ψ
so that
1
∆g − ∆r = 1− ∆g
ψ
which is positive if ψ > 1.
The need for Epstein-Zin preferences will become shortly.
Bansal and Yaron assume the following process for log consumption growth,
log(ct+1 /ct ) ≡ gt+1 :
where ηt+1 and et+1 are both i.i.d. standard normal—i.e., N (0, 1)—variables.
I’ve used ν for their µ, since we’ve already been using µ to denote the Epstein-
Zin certainty equivalent operator. Thus, the log consumption growth rate is
conditionally normal with conditional mean Et ( gt+1 ) = ν + xt and constant
103
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
where ut+1 is N (0, 1) and independent of ηt+1 and et+1 . They will assume φd
is large, so that dividend growth has a much higher conditional variance than
consumption growth. φ will also be large, so innovations to xt —the et+1 ’s—
will have a larger impact on the conditional mean of dividend growth than on
the conditional mean of consumption growth.6
Some features of the Bansal-Yaron process:
• φe is going to be calibrated so that the variance of xt+1 is small compared
to σ2 , making the white noise component of consumption growth domi-
nant over short horizons.
• ρ is going to be calibrated close to one, so that the fluctuations in the
conditional mean Et ( gt+1 ) = ν + xt will be highly persistent. This also
means that the unconditional variance of xt+1 will be large.
• Our Mehra-Prescott process matches up with the i.i.d. part of gt+1 . To be
sure, we modeled consumption growth as having a slight negative auto-
correlation, but Mehra and Prescott’s process is not that far from i.i.d..
• The state variables—on which prices and the agent’s value will depend—
will be ct (or dt ) and xt . As before, homogeneity will allow us to divide
out the dependence on levels. The i.i.d. shocks are not state variables—
while the stochastic discount factor will be seen to depend on xt , xt+1 ,
and ηt+1 , the i.i.d. disturbance will integrate out when we take expecta-
tions.
• Bansal and Yaron also incorporate time-varying volatility (σ varies over
time). Time permitting, we’ll add that in after we examine the more basic
model with constant volatility.
Let pc ( xt , ct ) denote the price of a claim to the consumption process, pd ( xt , dt )
the price of a claim to the dividend process, and q( xt ) the price of a riskless
claim to one unit of consumption next period. Using the price of the consump-
tion claim as an example, the assets are priced in the by-now-familiar way
104
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
where (5.11) uses the fact that ct+1 /ct = e gt+1 , and (5.12) relies on the fact that
xt can be taken outside the date-t conditional expectation.
Bansal and Yaron solve their model using the return form of the Epstein-Zin
stochastic discount factor, as we derived in section 4.2.4, equation (4.37). They
describe their results in terms of log-linear approximations of the Campbell-
Shiller type, though the computational method they use to obtain the numbers
in their tables is a polynomial projection method.7
I prefer to examine the model using the Epstein-Zin stochastic discount fac-
tor in its form
1 1 −α
c t +1 − ψ
v t +1 ψ
m t +1 = β
ct µ t ( v t +1 )
where I’ve written 1/ψ for our previous ρ − 1, both to make the dependence on
the value of the EIS more explicit, and to avoid confusion with the persistence
parameter ρ from (5.8).
The (equilibrium) lifetime utility process vt still obeys the recursion
11
1− ψ1
1− 1 1− ψ
v t = (1 − β ) c t + βµt (vt+1 ) ψ ,
vt = Φ( xt )ct
approach. Judd [Jud98] discusses computationally efficient methods for solving dynamic models
using polynomial projections.
105
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
cess to get
" 1− 1 # 1−1 1
c ψ ψ
Φ( xt ) = 1 − β + βµt Φ ( x t +1 ) t +1
ct
1
1− ψ1 1− ψ1
= 1 − β + βµt Φ( xt+1 )eν+ xt +σηt+1
1
1− ψ1 (ν+ xt ) 1− 1 1− ψ1
= 1 − β + βe µt (Φ( xt+1 )eσηt+1 ) ψ (5.13)
Note that both (5.13) and (5.14) involve µt (Φ( xt+1 )eσηt+1 ). Because ηt+1 is
independent from xt+1 , we can split the certainty equivalent of Φ( xt+1 )eσηt+1
as
µt (Φ( xt+1 )eσηt+1 ) = µ (eσηt+1 ) µt (Φ( xt+1 )) .
This follows from the fact—which you can verify from the definitions of µ
and statistical independence—that if y and z are independent, then µ(yz) =
µ(y)µ(z). The certainty equivalent of eσηt+1 , using the rules for expectations of
lognormal random variables,8 is
1 2
µ(eσηt+1 ) = e 2 (1−α)σ .
Applying these results to (5.13) and (5.14), after some algebra, we can derive
the two expressions that will be the basis for our computational solution of the
model:
1 −α
Φ ( x t +1 )
− ψ1 xt −ασηt+1 ψ
m t +1 = m 0 e e (5.15)
µt (Φ( xt+1 ))
1
1
(1− ψ1 )(ν+ xt +(1/2)(1−α)σ2 ) 1− ψ1 1− ψ
Φ( xt ) = 1 − β + βe µt (Φ( xt+1 )) (5.16)
where the m0 in the expression for the stochastic discount factor collects to-
gether several constants, m0 = β exp −(1/ψ)ν + (1/2)(α − (1/ψ))(1 − α)σ2 .
106
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
= .979
0.3 = .900
0.2
0.1
log[(t+1)/t()]
-0.1
-0.2
-0.3
-0.4
-0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
100*x(t+1)
Figure 5.4: The figure plots log (Φ( xt+1 )/µt (Φ( xt+1 ))) for ρ = .979 and ρ = .9,
illustrating the role of persistence in the pricing of xt+1 risk. Φ is solved for
using the Markov chain method described in the next section. The parameters
are as in Bansal and Yaron
All of this tells how the stochastic discount factor given by (5.15) is go-
ing to price risk associated with innovations to the conditional mean of of
the consumption growth rate. The i.i.d. risk is priced by the e−ασηt+1 term,
which is standard—a high growth realization makes the agent less ‘hungry’
(in Cochrane’s terminology), so assets whose payoffs covary positively with
ηt+1 are less valuable than assets whose payoffs covary negatively with ηt+1 .
The key parameter here is the risk aversion parameter α.
The xt+1 risk—risk associated with the innovations et+1 in xt+1 = ρxt +
φe σet+1 —is priced by the scaled value function Φ( xt+1 ). A positive realization
of et+1 raises the conditional mean rate of consumption growth xt+1 , gener-
ating an increase in Φ( xt+1 ); the increase is larger the higher is the degree of
persistence in the x process. If ψ1 − α < 0, which it will be if ψ > 1 and α > 0, a
positive innovation to xt+1 makes the agent less ‘hungry’. Assets whose pay-
offs covary positively with xt+1 will require a risk premium to convince the
agent to hold them in equilibrium. This characterizes both the consumption
per unit of consumption today (which is what Φ( xt ) measures). Proving it is a bit more subtle and
involves treating (5.16) as a mapping that (one can show) preserves monotonicity. And, the limit
of a sequence of increasing functions is, at the least, a nondecreasing function. Note that all this is
true independent of whether ψ > 1 or not.
10 At the other extreme ( ρ = 0), the x process is i.i.d., and µ ( Φ ( x
t t+1 )) = µ ( Φ ( xt+1 )) is a
for all t. In this case, the effect of changes in xt on Φ( xt ) is fully captured by the
constant
1
exp 1− ψ xt term in (5.16).
107
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
p c ( x t +1 , c t +1 ) + c t +1 w c ( x t +1 ) + 1
Rct+1 = = eν+ xt +σηt+1 (5.17)
p c ( x t +1 , c t +1 ) wc ( xt )
Under the assumption that ψ > 1, for the reasons described above at the start of
this section, the consumption price-dividend ratio wc ( xt+1 ) will be increasing
in xt+1 , so the consumption claim is exposed to the xt+1 risk. Similarly, the
return on the dividend claim is
w d ( x t +1 ) + 1
Rdt+1 = eνd +φxt +φd σut+1 . (5.18)
wd ( xt )
The dividend claim is exposed to the xt+1 as well, through wd ( xt+1 ). Note
that ut+1 , the i.i.d. part of the dividend claim’s return, is independent of ev-
erything in the SDF mt+1 . This means that the i.i.d. risk in dividend growth is
not priced—if the only risk associated with the dividend claim were from ut+1 ,
then the dividend claim’s return would equal the risk-free rate.
5.1.3 Computation
As mentioned above, introducing polynomial methods is beyond the scope of
this course, so our computational approach here will be similar to ones we’ve
used before—namely, discretizing the state space using Markov chains. In par-
ticular, we’re going to use a Markov chain to approximate the process for xt ,
given in (5.8). Let ( x (1), x (2), . . . x (S); P) denote the Markov chain.
As before, under the Markov chain assumption, the stochastic discount fac-
tor m will be a matrix, while the scaled value function Φ, and the prices wc , wd
and q, will be vectors. The certainty equivalent of Φ conditional on x = x (i )
today is
" # 1
S 1− α
µi ( Φ ) = ∑ P(i, j)Φ( j)1−α .
j =1
(5.19)
The next step is to evaluate the pricing relations
q ( x t ) = Et [ m t +1 ]
108
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
One thing to note is that, because ut+1 and ηt+1 are independent of xt+1 , the
terms involving these innovations (including the ηt+1 term in mt+1 ) can be
collected together, and their expectations evaluated separate from the terms in
xt+1 . Under the Markov chain assumption, these relations then become—using
(5.15)—
S 1 −α
Φ( j)
(1− ψ1 ) x (i ) ψ
wc (i ) = A c e ∑ P(i, j) µi ( Φ )
( w c ( j ) + 1) (5.20)
j =1
S 1 −α
Φ( j)
(φ− ψ1 ) x (i ) ψ
wd (i ) = A d e ∑ P(i, j) µi ( Φ )
( w d ( j ) + 1) (5.21)
j =1
S 1 −α
Φ( j)
− ψ1 x (i ) ψ
q (i ) = A q e ∑ P(i, j) µi ( Φ )
(5.22)
j =1
Aq = m0 E e−ασηt+1
With all this machinery in hand, it’s straightforward to put the model on
the computer and see what it spits out. At this point, you should be familiar
enough with M ATLAB and models of this sort to put together a program that
solves the model as described above. Most of the parameters, I’ll take from
Bansal and Yaron.11 We still need to specify the Markov chain, though.
Rather than use a two-state Markov chain, as we’ve been doing, I wanted
the flexibility to add more states, so I used Rouwenhorst’s method.12 I exper-
imented with the number of states ranging from 3 to 101—adding any more
than around 21, though, didn’t change the results much at all.
the technique we learned for two states to many states. The invariant distribution that results is
exactly binomial, so with enough states, it approximates a normal distribution. Kopecky and Suen
[KS10] describe the method, and show that it does a good job of approximating very persistent
AR(1) processes.
109
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
Table 5.1: Parameters for stochastic consumption, long-run risk, and dividend
processes in Bansal-Yaron’s basic model.
ν σ ρ φe νd φ φd
0.0015 0.0078 0.979 0.044 0.0015 3.0 4.5
or raise the gross monthly rate to the 12th power, and subtract one. If you want the average for an
annual period, then, if we’re working with log rates, and they’re i.i.d., the annual mean and variance
are both 12 times the monthly mean and variance. If they’re not i.i.d., then it’s more complicated,
but I didn’t really have the time to work it out before class. The issue may be the approximation,
though. They do have another paper [BKY07], which employs another solution technique, where
they too assume β is higher—β = 0.9989.
110
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
and
h i wd ( j ) + 1
E log( Rd ) = ∑ π ∗
( i ) ∑ P ( i, j ) νd + φx ( i ) + log
wd (i )
i j
ple: we just use the invariant probabilities π ∗ and the means we calculated in
the previous step. For log( Rd ), we take the square root of the sum of (1) the
variance of the Markov chain part (using P and π ∗ ) and (2) the variance of the
i.i.d. part, (φd σ )2 .
To put the return-related quantities into annual average percent
√ terms we
multiply the means by 1200 and the standard deviations by 12 × 100.14 We
leave log(wd ) in monthly units. The results are in Table 5.2; lower-case letters
denote logs, and pd − d denotes log(wd ).
Table 5.2: Some results for the long-run risk model. Parameters for consump-
tion, long-run risk, and dividends are as in Table 5.1. For all cases here,
β = .999.
ν σ ρ φe νd φ φd
0.02 0.006 0.437 5.20 0.01 2.06 15.8
111
5.1. BANSAL-YARON LECTURE 5. PUZZLE RESPONSES, II
β α ψ ν σ ρ φe νd φ φd
0.968 9.34 1.41 0.021 0.012 0.482 0.90 0.018 5.14 3.06
Solve the model and report results for these parameters. Note since
√ this is an
annual model, you only need to multiply things by 100, not 1200 or 12 × 100.
Compare the results in both cases to row 4 of Table 5.2. What might explain
the differences in the results you obtain? (Answer—a lot of things; just try to get
some intuitive feel for how the model works.)
You should use Rouwenhorst’s method for calibrating the Markov chain. I’ll
post the code on the website.
Optional. Investigate the sensitivity of the results to the number of states in
the Markov chain—if you tried n = 3, 5, 7 . . ., is there an n above which the re-
sults stop changing by very much? (Note: using odd numbers of states guarantees
that the mean of zero is one of the states of the Markov chain.)
where it is assumed that et+1 is i.i.d., N (0, 1), and independent of the other i.i.d.
innovations.15
In terms of the pricing framework we’ve developed, σt is now a second
state variable. In particular, Φ( xt ) becomes Φ( xt , σt ), which is decreasing in σt ,
the more so the closer is the persistence parameter ρ1 to one.16 So the pricing
kernel will now price volatility risk in the sense that a high σt+1 realization
means greater ‘hunger’ (if 1/ψ − α < 0).
Bansal and Yaron show that when ψ > 1, the model’s price-dividend ratios
are decreasing in σt , the more so the greater is the persistence of the volatility
process. This means that the consumption and dividend claims are exposed to
15 The standard deviation σ must be chosen to be very small, since we don’t want the process to
e
violate σt2 > 0. Technically speaking, the et+1 ’s shouldn’t really be assumed to be normal.
16 Basically, more volatile consumption is less valuable given a concave certainty equivalent.
112
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
the volatility risk (they’re worth less when the agent is more hungry), so the
time-varying volatility raises their conditionally expected returns when σt is
high.
Of course, variation in σt also affects the risk-free rate, but the effects are
much smaller—thus conditionally expected excess returns vary positively with
σt .
σt will also affect the conditional standard deviation of returns (positively,
of course), so for this mechanism to produce a time-varying Sharpe ratio, the
movements in conditionally expected excess returns need to be larger than the
movements in the conditional standard deviation. That turns out to be the case
for Bansal and Yaron’s calibration of the model.
113
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
that disasters may in fact be as large and as common as Rietz supposed. Given
that we have only short time series for any one country, Barro uses data from
many countries to characterize the probability of large declines in consumption
(actually per capita GDP). You should read through his evidence. He sums
up his findings as implying a disaster probability of between 1.5–2.0% with
associated declines in per capita GDP ranging from 15% to 64%. Wars account
for many of the disaster episodes he catalogs.
Barro’s analysis was subsequently refined by Gourio [Gou08], Gabaix [Gab08],
and others. The model we’ll look at in this section is the one presented in Gou-
rio’s paper. First, though, we’ll look at an approach closer to Rietz’s.
Remark 5.1. Actually, what Rietz did was to recalibrate P and { x (1), x (2)} for
each choice of f and x d , so as to guarantee that the three-state Markov chain ( X, Π)
has mean, standard deviation and first-order autocorrelation consistent with Mehra-
Prescott’s estimates of E( x ) = 1.018, s.d.( x ) = 0.036, and AC1 ( x ) = −0.14. This
has always struck me as the wrong approach. Apart from the Great Depression—which
is big, but not ‘a-25%-decline’ big— the Mehra-Prescott numbers are estimated for an
economy operating in ‘normal’ times. So I think it’s better to use those estimates for
18 Note that since we’re moving on from Bansal and Yaron, we’ll go back to letting xt+1 denote
the gross rate of growth of consumption from t to t + 1.
114
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
the P and { x (1), x (2)} as I’ve put them in the three-state chain above—they describe
the behavior of growth conditional on ‘no disaster’.
Maybe it’s intuitive that disaster risk (which affects the payoff from a claim
to aggregate consumption) raises the equity premium. How, though, does it
keep the risk-free rate at a reasonable level, as Rietz was able to achieve?
If, as I want to suppose, P and { x (1), x (2)} correspond to the Mehra-Prescott
process, then you can see immediately that disasters will lower the risk-free
rate—it will be lower than in the comparable Mehra-Prescott economy with
the same α and β.
If there is no default on the riskless asset in a disaster, then its price in state
i = 1, 2 will be
q ( i ) = Ei ( m )
2
= (1 − f ) ∑ P(i, j) βx ( j)−α + f β( x d )−α
j =1
where qMP (i ) is the riskless asset price from the corresponding Mehra-Prescott
economy. If β( x d )−α is large, even a small f can greatly increase q(i ) compared
to qMP (i ), and thus lower R F (i ) = 1/q(i ) in the ‘normal states’.
In state 3, the disaster state,
1 1
q (3) = βx (1)−α + βx (2)−α
2 2
which one can show is between qMP (1) and qMP (2).
To incorporate default on the riskless asset, we would no longer assume
it pays one unit in every state, but rather (1, 1, 1 − d) in states 1, 2, and 3. The
parameter d ∈ [0, 1] can be thought of as either (a) in a disaster, the issuer of the
riskless claim only pays d, or (b) conditional on a disaster, the issuer defaults
entirely with probability d (and with probability 1 − d pays one unit).
It turns out that, in the Rietz-like model described here, adding default on
the normally riskless asset harms the predictions of the model—unless d is
very small, it will greatly increase the risk-free rate. In terms of equation (5.24),
as d → 1, it’s essentially eliminating the f β( x d )−α term that was holding the
risk-free rate down. The next exercize asks you to explore some of these phe-
nomena.
Exercize 5.2. This should be simple, given that you know how to solve the basic
Mehra-Prescott model, as in Exercize 3.3. Use the Mehra-Prescott process for
{ x (1), x (2); P}. Write a M ATLAB program and solve the model described above
for the following three sets of parameters:
115
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
α β f xd d
9 .97 0 0 0
9 .97 0.01 0.75 0
9 .97 0.01 0.75 0.30
Report the average (unconditional mean) equity premium, risk-free rate and
equity return for each combination (in percent terms). Line one of the table is just
the basic Mehra-Prescott model. You should find that results look pretty good for
the middle row, but bad for the first and last.
Now, keeping f , x d , and d as in the last row of the table, explore the effects of
varying α and β. Are there any combinations that will get your results back in
the range of the nice ones you obtained for the middle configuration? Document
what you find, maybe with a table or graph. I don’t know the answer to this part,
so I’m curious to see what you find.
where ηt+1 ∼ N (0, 1) and b ∈ (0, 1) is the size of the disaster. The disaster
probability f t is assumed to follow a first-order Markov process with transition
probabilities P( f t , f t+1 )—the probability of going from f t to f t+1 —and such
that f t ∈ [ f , f ] for all t. In one case, Gourio assumes the Markov process is
actually i.i.d.—P( f t , f t+1 ) = P( f t+1 ). Consistent with the way I’ve written the
transition probability, we’ll be treating the Markov process as a Markov chain
throughout our discussion of the model.
19 We modify the notation a bit compared to Gourio.
116
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
q ( f t ) = E [ m t +1 : f t ]
2 2
= (1 − f t ) βe−αν+(1/2)(ασ) + f t (1 − b)−α βe−αν+(1/2)(ασ)
2
= 1 − f t + f t (1 − b)−α βe−αν+(1/2)(ασ)
Note that in obtaining this expression (and the expression for q( f t )), we’re
taking expectations with respect to several things that, conditional on f t , are
20 Remember, you’d multiply log( R F ) by 100 to put it in percent terms. For the b and α mentioned
t
in the text, a 0.01 increase in f t , say from 0.01 to 0.02, subtracts roughly 17 percentage points off the
log risk-free rate. Assuming that you can get the level right at some value of f , you would want f t
to have a very small variance around that value.
117
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
This is useful because, given the assumptions about the consumption process,
the expected return on the consumption claim (conditional on f t ) obeys
c c t +1 w c ( f t +1 ) + 1
E R t +1 : f t = E E
: ft
ct wc ( f t )
where
c t +1 2 2
E = (1 − f t )eν+(1/2)σ + f t (1 − b)eν+(1/2)σ
ct
2
= (1 − f t + f t (1 − b)) eν+(1/2)σ
Thus, in this simple version of the model, we can evaluate the conditionally
expected equity return without having to solve for the price-dividend ratios. It
is
2
(1 − f t + f t (1 − b)) eν+(1/2)σ
E Rct+1 : f t =
2
(1 − f t + f t (1 − b)1−α ) βe(1−α)ν+(1/2)((1−α)σ)
1 − f t + f t (1 − b )
−1 αν+(1/2)σ2 −(1/2)((1−α)σ)2
=β e
1 − f t + f t (1 − b )1− α
In log terms,
1 1
log E Rct+1 : f t = − log( β) + αν + σ2 − ((1 − α)σ)2
2 2
1 − f t + f t (1 − b )
+ log (5.28)
1 − f t + f t (1 − b )1− α
If you differentiate the term inside the last log( · ) you’ll find that the expected
return is increasing in f t iff α > 1.
Combining (5.25) and (5.28), we can immediately obtain an expression for
the conditional equity premium, in log terms:
!
E Rct+1
(1 − f t + f t (1 − b)−α ) (1 − f t + f t (1 − b))
2
log : f t = ασ + log
RtF 1 − f t + f t (1 − b )1− α
(5.29)
Not surprisingly, it’s a typical i.i.d. piece (ασ2 ) à la Mehra-Prescott, plus a
disaster-related piece. Using (5.29), we can explore the effects of variation in f t
on the conditional equity premium (as well as the roles of b and α).
118
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
1
1−α
µ ( Φ t +1 ) = ∑ P ( f 0 ) Φ ( f 0 )1− α . (5.32)
f0
Note that these steps utilize the various independences built into the process
for log(ct+1 /ct ). Equation (5.31), for example, uses the fact that ηt+1 is i.i.d.
and independent of the realization of the disaster state. And the realization
of the disaster state, in turn, is independent of the realization of next-period’s
disaster probability.21 In (5.32), I’ve assumed that the distribution for f t+1 is
21 Hence, we’ve really done a µ( xyz) = µ( x )µ(y)µ(z) split of the certainty equivalent.
119
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
whole expression is decreasing in f . On the other hand, if α > 1, then (1 − b)1−α > 1, and the stuff
inside the parentheses is increasing in f , but the whole thing is raised to the 1/(1 − α) < 0 power.
120
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
1−(1/ψ)
(1− ψ1 )θ
1− α
wc ( f t ) = βe Aα−1/ψ 1 − f t + f t (1 − b)1−α
1 −α
× ∑ P( f 0 )Φ( f 0 ) ψ 1 + wc ( f 0 )
(5.35)
f0
Exercize 5.3. Derive the expressions (5.35) and (5.36). Then, show that when
the elasticity of intertemporal substitution, ψ, is equal to one, future utility
(Φ) doesn’t matter for the equity premium—we obtain the same result for
log E( Rct+1 )/RtF as we did in the case of standard preferences—i.e., equation
(5.29). A key step is to show that a constant wc solves (5.35) when ψ = 1.
Gourio shows that if α ≥ 1, then log RtF is decreasing in f t for small values of
ft.
It’s only slightly more complicated to characterize the dependence on f t or
the price-dividend ratio and return on the consumption claim. First, note that
from (5.35)
1−(1/ψ)
1− α
wc ( f t ) = (positive constants) × 1 − f t + f t (1 − b)1−α (5.37)
121
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
1 2 ∑ f 0 P( f 0 )(1 + wc ( f 0 ))
E Rct+1 : f t = eν+ 2 σ (1 − f t + f t (1 − b))
. (5.38)
wc ( f t )
Using (5.37), and collecting together constants (relative to f t ), we can write the
log expected return as
1 − f t + f t ( 1 − b )
log E Rct+1 : f t = constants + log
1−(1/ψ)
(1 − f t + f t (1 − b ) ) 1 − α 1 − α
the one hand, an increase in f t lowers the asset’s expected payoff next period
(the 1 − f t + f t (1 − b) term). But an increase in f t also lowers the asset’s price
1−(1/ψ)
today—i.e., wc ( f t ) falls, which is captured in the 1 − f t + f t (1 − b)1−α 1−α
term. Whether the expected return increases or decreases will depend on whether
the expected falls by less or more than the fall in price. If ψ < 1, the answer is
always ‘by less’. If ψ > 1, the answer depends on parameter values and on f t .
It turns out that, for α sufficiently large,ψ need only be a bit bigger than
one to obtain the result that log E Rct+1 : f t goes up when f t goes up. Note
that the increase in log E Rct+1 : f t in that case is entirely due to a wider risk
their assumption—is pushing the envelope, as far as EIS values that most people would consider
plausible.
122
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
c
Region giving log E(Rt+1) increasing in ft
2.5
2
(EIS)
1
2 3 4 5 6 7 8 9 10
(RRA coefficient)
Figure 5.5: Combinations of α and ψ that produce a log expected equity return
that is increasing in the probability of disaster. The pairs above the curve have
this property. The calculations were made assuming b = 0.43 and f t = 0.012.
From the model of Gourio [Gou08], with EZ preferences and f t ∼ i.i.d.
that we’d labelled A—that is, µ (Φ)—is no longer a constant. Rather, it’s now
1
1− α
µ ( Φ : f t ) = ∑ P ( f t , f 0 ) Φ ( f 0 )1− α .
f0
1 −α 1 −α
Also, expressions like ∑ f 0 P( f 0 )Φ( f 0 ) ψ , ∑ f 0 P( f 0 )Φ( f 0 ) ψ (1 + wc ( f 0 )), and
∑ f 0 P( f 0 ) (1 + wc ( f 0 )) become
1 −α
∑0 P( f t , f 0 )Φ( f 0 ) ψ ,
f
1 −α
∑0 P( f t , f 0 )Φ( f 0 ) ψ 1 + wc ( f 0 ) ,
f
123
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
and
∑0 P( f t , f 0 ) 1 + wc ( f 0 ) ,
f
resulting in obvious modifications of the key equations (5.35), (5.36), and (5.38).
While these modifications are not especially difficult to make, they do make
it impossible to draw any conclusions about the model on a purely analytical
basis, as Gourio did with the first two versions of the model. That makes it
necessary to solve the model computationally. Gourio assumes a simple, sym-
metric two-state Markov chain for f t — f t ∈ { f l , f h }, and
1−π π
P=
π 1−π
124
5.2. CONSUMPTION DISASTERS LECTURE 5. PUZZLE RESPONSES, II
125
Lecture 6
So far, we’ve worked with models where we’ve priced either infinitely-lived
equity or one-period bonds.
126
References
[Abe90] Andrew B. Abel. Asset prices under habit formation and keeping up
with the Joneses. American Economic Review, 80(2):38–42, 1990.
[Bar06] Robert J. Barro. Rare disasters and asset markets in the twentieth
century. Quarterly Journal of Economics, 121(3):823–866, 2006.
[BDL93] Robert B. Barsky and J. Bradford De Long. Why does the stock mar-
ket fluctuate? The Quarterly Journal of Economics, 108(2):291–311,
1993.
[Ber98] Michele Bernasconi. Tax evasion and orders of risk aversion. Journal
of Public Economics, 67:123–134, 1998.
[BG69] William A. Brock and David Gale. Optimal growth under factor aug-
menting progress. Journal of Economic Theory, 1(3):229–243, 1969.
[BKY07] Ravi Bansal, Dana Kiku, and Amir Yaron. Risks for the long run:
Estimation and inference. Unpublished manuscript, 2007.
[BL02] Ravi Bansal and Christian Lundblad. Market efficiency, asset returns,
and the size of the risk premium in global equity markets. Journal of
Econometrics, 109(2):195–237, 2002.
[BPR78] Charles Blackorby, Daniel Primont, and R. Robert Russell. Duality,
Separability, and Functional Structure: Theory and Economic Applica-
tions. North-Holland, 1978.
127
REFERENCES REFERENCES
128
REFERENCES REFERENCES
[Gab08] Xavier Gabaix. Variable rare disaster: A tractable theory of ten puz-
zles in macro-finance. American Economic Review, 98(2):64–67, 2008.
[Gou08] François Gourio. Time-series predictability in the disaster model.
Finance Research Letters, 5(4):191–203, 2008.
[Gra08] Liam Graham. Consumption habits and labor supply. Journal of
Macroeconomics, 30(1):382–395, 2008.
[Gul91] Faruk Gul. A theory of disappointment aversion. Econometrica,
59(3):667–686, 1991.
[Guv09] Fatih Guvenen. A parsimonious macroeconomic model for asset
pricing. Econometrica, 77(6):1711–1750, 2009.
[Hic35] John R. Hicks. A suggestion for simplyfying the theory of money.
Economica, pages 1–19, February 1935.
[Jer98] Urban Jermann. Asset pricing in production economies. Journal of
Monetary Economics, 41(2):257–275, 1998.
129
REFERENCES REFERENCES
[LN07] Sydney C. Ludvigson and Serena Ng. The empirical risk-return re-
lation: A factor analysis approach. Journal of Financial Economics,
83(1):171–222, 2007.
[LP81] Stephen F. Leroy and Richard D. Porter. The present value relation:
Tests based on implied variance bounds. Econometrica, 49(3):555–574,
May 1981.
[LP04] Francis A. Longstaff and Monika Piazzesi. Corporate earnings and
the equity premium. Journal of Financial Economics, 74(3):400–421,
2004.
130
REFERENCES REFERENCES
[SS08] Zvi Safra and Uzi Segal. Calibration results for non-expected utility
theories. Econometrica, 76:1143–1166, 2008.
[Sta00] Chris Starmer. Developments in non-expected utility theory: The
hunt for a descriptive theory of choice under risk. Journal of Economic
Literature, 38:333–382, 2000.
[SW03] Frank Smets and Raf Wouters. An estimated dynamic stochastic gen-
eral equilibrium model of the euro area. Journal of the European Eco-
nomic Association, 1(5):1123–1175, 2003.
131
REFERENCES REFERENCES
[Wei89] Philippe Weil. The equity premium puzzle and the risk-free rate puz-
zle. Journal of Monetary Economics, 24(3):401–422, 1989.
[Wei90] Philippe Weil. Non-expected utility in macroeconomics. Quarterly
Journal of Economics, 105(1):29–42, 1990.
[Yaa87] Menachem Yaari. The dual theory of choice under risk. Econometrica,
55:95–115, 1987.
132
Appendix A
An introduction to using
M ATLAB
A.1 Introduction
M ATLAB is a matrix-based numerical calculation and visualization tool. It is
much like an extremely powerful calculator, with the ability to run scripts—i.e.,
programs—and generate high-quality plots and graphs. It is also an extremely
easy package to learn.
When you start up M ATLAB by double-clicking on the M ATLAB icon, the
‘M ATLAB desktop’ opens. The layout you see will depend on who’s used it
last, and whether they’ve tinkered with the desktop layout. At the very least,
you’ll see the Command Window and (maybe) views of the workspace, file di-
rectory, or a list of recently used commands. M ATLAB uses standard Windows
conventions—e.g., typing A LT +F brings down the F ILE menu, A LT +E brings
down the E DIT menu, etc.
The ‘prompt’ in the Command Window—the spot where you type in commands—
looks sort of like this: >>.
One way to familiarize yourself quickly is to enter demo or help at the
prompt. Entering demo gives you access to a video tour of M ATLAB’s features
(enter demo 'matlab''getting started' to see a menu of videos about basic
features). Entering help gives you a long list of topics you can get help on.
133
A.2. CREATING MATRICES APPENDIX A. USING MATLAB
2 × 3 matrix A which is
1 2 3
A=
4 5 6
You’ll note that if you enter A = [1 2 3;4 5 6];—i.e., this time ending the
line with a semi-colon—M ATLAB apparently does nothing. This is not the
case. M ATLAB still records that A now denotes the above matrix; the semi-
colon at the end of the line simply tells M ATLAB to suppress displaying the
result. You’ll want to do this in most cases, especially with large matrices or
vectors, or when you are running an iterative program—displaying the execu-
tion of each line greatly slows the program down.
The colon (‘:’) is useful for creating certain types of vectors: in M ATLAB,
b = 1:5; produces a row vector b = (1, 2, 3, 4, 5). Using the colon, you could
write our matrix A above by typing A = [1:3; 4:6];
Matlab also has several built-in functions for creating specialized matrices,
among them: zeros (matrix of zeros), ones (matrix of ones), eye (identity ma-
trix), and rand (matrix of pseudorandom variables drawn from a uniform dis-
tribution on [0, 1]).1 The syntax for all these functions is the same: zeros(N,M)
creates an N × M matrix of zeros, and zeros(N) makes a square N × N matrix
of zeros. For these, or any function, typing help function name will display help
related to the function function name.
Having created a matrix, say A, you can call an element of it—say the (2, 1)
element—by typing A(2,1). If A is the matrix described above, typing A(2,1)
at the command line returns ans = 4 (‘ans’ is Matlab’s shorthand for ‘an-
swer’). Entering A(2,1) is treated as the question ‘What’s the 2–1 element of
A?’, the answer of which is four. If A is still the matrix
1 2 3
A=
4 5 6
entering A(3,1) will produce an error message, ‘Index exceeds matrix dimensions’,
since A only has two rows and we have asked about the first element in its (non-
existent) third row.
You can change an element of a matrix by giving it a new value: if you type
A(2,1)=0, Matlab returns
1 2 3
A=
0 5 6
You can call or assign values to all the elements of a row or column or
some submatrix of a matrix using the colon. In M ATLAB, A(i:j,h:k) is the
submatrix of A consisting of A’s rows i through j and columns h through k,
A(i,:) is the ith row, and A(:,k) is the kth column.
Type A(1,:) and Matlab will return ans = [1 2 3], all the elements of the
first row. Likewise, A(:,2) returns
2
ans =
5
1 Purely deterministic computers can’t create real random variables, but they can create things
134
A.3. BASIC MATRIX OPERATIONS APPENDIX A. USING MATLAB
all the elements of the second column. A(1:2,2:3) returns the 2 × 2 submatrix
2 3
ans =
5 6
In the same way, entering A(2,:)=ones(1,3) will change the second row
of A to a row of ones, and returns
1 2 3
A=
1 1 1
Other transformations which are occasionally useful are those that reorient
a matrix or vector in various ways. For example, if you enter x=1:5 you will
get the vector
x= 1 2 3 4 5
If you then enter y=fliplr(x) you’ll get the vector
y= 5 4 3 2 1
—fliplr is short for ‘flip left-to-right’. More information about these is avail-
able by entering help elmat, where the ‘elmat’ stands for ‘elementary matrices
and matrix manipulation’.
135
A.3. BASIC MATRIX OPERATIONS APPENDIX A. USING MATLAB
e.g., operators for checking whether x ≥ y or whether x has any non-zero elements and so forth.
3 Calculating inverses using slashes is also faster than using inv. This could be an important
consideration if you have an iterative program that needs to calculate a large number of inverses.
Just as a test, I made a random 100 × 100 matrix, and calculated its inverse 1000 times, first as
inv(A), then as eye(100)\A. The amount of time required for the slash method was just 3% of the
time required for the inv method.
136
A.3. BASIC MATRIX OPERATIONS APPENDIX A. USING MATLAB
If this is the true underlying process generating the data, then realizations—our
observations—should obey the regression equation above. The point of least-
squares regression analysis is to use the observations X and y to construct an
estimate of the (unobserved) vector of coefficients β. In a purely mathematical
sense, what least squares does is pick β to minimize the Euclidean distance
between the vector y ∈ R N and the subspace of R N generated by the k columns
of X.4 The least-squares estimate of β—call it β̂ OLS —is thus the solution to the
following quadratic minimization problem:
If you multiply the quadratic form out, you’ll see that the problem can be writ-
ten as
min y> y − 2b> X > y + b> X > Xb
b
The first-order condition is
or
X > Xb = X > y,
which gives
β̂ OLS = ( X > X )−1 X > y
if X > X is nonsingular. In any case, in M ATLAB one can perform OLS regres-
sion easily. If y and X are matrices containing your observations of the de-
pendent and independent variables, then entering beta = inv(X'X)*X'y, or
beta=(X'X)\X'y will yield the vector of OLS estimates.
A good exercize to try is to create some data X and y and apply the re-
gression formula to it. Try X = [ones(50,1),2*rand(50,1)] and set the true
beta as beta true = [5;5]. Make y by adding some normally distributed
disturbances to the conditional mean X*beta true. M ATLAB’s randn func-
tion makes N (0, 1) pseudorandom variables, so let’s create y = X*beta true
+ 2*randn(50,1). The mean of X should be about [1, 1], so the conditional
mean of our y will be about 10 and the disturbances will then have a standard
deviation about 20 percent of the mean. Now use either of the expressions from
the last paragraph to calculate the OLS beta, and compare it to beta true. It
should be different, but close.
4 If x and x are two vectors in Rn , the subspace generated by x and x is the set of all vectors
1 2 1 2
of the form αx1 + βx2 for α, β ∈ R. So, {z = Xb : b ∈ Rk } is the subspace in R N one gets from
taking all possible linear combinations—with coefficients b = (b1 , b2 , . . . bk )—of the columns of X.
137
A.4. ARRAY OPERATIONS APPENDIX A. USING MATLAB
1 2 1
eA = I + A + A + A3 + · · ·
2! 3!
These functions don’t get used too often in most of the applications we nor-
mally do.
Help on these topics can be had by entering help ops for basic operations
and help elfun for elementary functions like logs and exponentials. As usual,
just typing help gives you a list of specific categories about which you can look
for help.
138
A.6. STRUCTURE ARRAYS APPENDIX A. USING MATLAB
139
A.8. MAX AND MIN APPENDIX A. USING MATLAB
—then eig(A) will calculate the eigenvalues of A and display them. The eig
function is one which can have multiple outputs. If you enter [P,D] = eig(A);
M ATLAB calculates both the eigenvectors and eigenvalues of A. The eigenvec-
tors get stored in the matrix P, as P’s columns, and the eigenvalues get stored
in D, where D is a diagonal matrix with the eigenvalues on the diagonal. The
matrices are arranged so that eigenvalues and eigenvectors are matched in the
sense that the eigenvector associated with the eigenvalue in the ith column of
D is in the ith column of P.
Try entering [P,D]=eig(A) for the 2 × 2 matrix A defined above; then, com-
pare A*P(:,1) with D(1,1)*P(:,1). They should be the same, or at least their
difference should fairly close to zero (on the order of 10−16 ), allowing for some
imprecision in the calculations.
and
y = 25 16 9 4 1 0 1 4 9 16 25
—then [M,m] = min(y), you should get that M = 0 and m = 6. If you en-
ter [M,m] = max(y), you’ll find that when M ATLAB is ‘indifferent’—here both
y(1) and y(11) are maxima—it opts for the first occurrence in the vector, in
this case returning M = 25 and m = 1.
When max and min are applied to matrices, the operation described above
(by default) looks for the max and min in each column of the matrix—so it’s
operating along, or searching up and down, the rows of each column. Thus,
if A is an n × k matrix, [M,m] = max(A) finds the biggest element in each of
the k columns of A and stores them in M which is a 1 × k row vector. For each
column, the row number where the maximum in that column occurs is stored
in m. Thus, if
1 4
A=
2 3
140
A.9. SPECIAL SCALARS APPENDIX A. USING MATLAB
than some critical size are effectively infinite while numbers smaller than some size are regarded
as zero. On the computer I’m using right now, for example, anything much over 1.7e+308 is just
Inf, and anything less than around 4.9e-324 is just 0. Note that machine precision varies with
the order of magnitude of the numbers you’re working with. On most systems, the next biggest
number than 0 which M ATLAB can recognize is 4.9407e-324. The difference between 1 and the next
biggest number than 1 (called eps) is 2.2204e-016. Try entering 1+eps==1; M ATLAB will return
a 0, indicating the statement is false. Now try 1+eps/2==1; M ATLAB returns a 1, indicating the
statement is true. But, enter eps/2>0, and you’ll see that that’s true, too.
6 Like with zeros or ones, NaN(N,M) creates an N × M matrix of NaN’s. These are sometimes
useful in programming.
141
A.10. LOOPS AND SUCH APPENDIX A. USING MATLAB
1 1 1 1 1
1+ + + + +···+
1! 2! 3! 4! n!
142
A.10. LOOPS AND SUCH APPENDIX A. USING MATLAB
for some n. How big an n do we need to get within some e of e?—or, rather,
how big an n do we need to get within e of M ATLAB’s approximation to e,
since computers can’t handle nonterminating, nonrepeating decimals either.
In M ATLAB, e is exp(1). What our while loop will do is start with e = 1 and
add terms of the form 1/n! for n = 1, 2, . . ., continuing until the difference
between our e and M ATLAB’s exp(1)is less than some e.
Let me describe roughly how the loop will work, and then show you how
to execute it in M ATLAB. There are two things we’ll need to keep track of, the
value of e and the value of n. We’ll also want to calculate n!, which we can
do by introducing a third variable—call it x—which will begin at x = 1 and be
updated to n times its current value at each pass through the loop. Thus, on the
first pass it will be 1; on the second (n = 2), it will be 2 · 1; on the third (n = 3),
it will be 3 · 2 · 1; and so on. Starting with e = n = x = 1, each pass through
the loop will perform the following commands: e = e + (1/x ); n = n + 1; and
x = nx. The ‘while’ statement will be such that this will go on as long as e
is more than e away from exp(1). If you think about the commands I listed,
you’ll see that the first pass will set
1
e = 1 + (1/1) = 1 + ,
1!
n = 1 + 1 = 2,
and
x = 2 · 1 = 2!.
The next pass will set
1 1
e = 1 + 1 + (1/2!) = 1 + + ,
1! 2!
n = 2 + 1 = 3,
and
x = 3 · 2! = 3!
and so forth.
So, this loop will do exactly what it is supposed to do. Now, what about
the while statement. Suppose our e is 10−15 , which can be entered as 1e-15
in M ATLAB. Our while statement will be: while abs(e-exp(1))>1e-15;. The
‘abs’ is for absolute value—our loop will go on as long as the absolute value of
the difference between e and exp(1) exceeds 10−15 . Don’t worry—this doesn’t
take long either.
All that said, we’ll now write out the steps as we would perform them in
M ATLAB. First, we need our ‘initial conditions’:
e = 1;
n = 1;
x = 1;
143
A.10. LOOPS AND SUCH APPENDIX A. USING MATLAB
144
A.11. PROGRAMMING APPENDIX A. USING MATLAB
You’ll notice that, just as with the ‘while’ loop, as soon as you enter the
‘for’ statement here, the prompt disappears until the loop is completed with
an ‘end’ statement. You can see what x looks like by entering plot(x) at the
prompt. It’s interesting to compare what the disturbances xi look like as com-
pared to x. One way to put both on the same plot is by using M ATLAB’s hold
command. If you enter plot(xi), then hold, you’ll get a message like ‘Current
plot held’, which means that any additional plot statements will superimpose
their plots on the existing plot. Alternatively, create a vector t=(1:100); Then
enter the command plot(t,xi,t,x(2:end)). The ‘x(2:end)’ is the second
through last elements of x—that is, the 100 values after x0 that we created with
our loop.
On your own, you could play around a little bit and see what a random
walk looks like—set ρ = 1—and also how difficult it must be to distinguish
between that sort of process and the AR(1) with ρ, say, equal to .98. You can
also see what a what a random walk with drift looks like—the process is
x t = δ + x t −1 + ξ t
where δ is the ‘drift’ term. In terms of the loop, you would just change the
‘x(t+1) =. . .’ line to ‘x(t+1) = delta + x(t) + xi(t);’ where delta is what-
ever you want to set the drift to—you could put an actual number there, or
preface the whole loop with a definition delta = [number], then write the line
exactly as I have written it here.
145
A.11. PROGRAMMING APPENDIX A. USING MATLAB
x0 = 0;
sigma = .035;
xi = sigma*randn(1,T);
x = [x0,zeros(1,T)];
for t = 1:T;
x(t+1) = rho*x(t) + xi(t);
end;
figure;
plot(x);
This program will generate the AR (1) process we looked at above. The
only slight differences are that we’ve created variables for T, ρ, x0 and σ. This
will make it easy to go back and change any of their values, since we’ll only
have to change it one place. Another difference is the line ‘figure;’ which
will open a new figure window before the plot commands are executed. We’ll
need this if we want to run the program a couple of times and compare the
figures. You will have noticed by now that if no figure windows are open, a
‘plot’ command opens a new figure window. What you may not have noticed
is that if a figure window is already open, a ‘plot’ command will put its plot in
the open window, replacing whatever was in that window before. If multiple
figure windows are open, the plot command will puts its plot in whichever
window is the ‘current’ one—i.e., whichever one was most recently viewed.
See help figure for more information. In any case, if we do not want our
program to disrupt any of our existing figures, we will want it to open a new
window before it executes the plot command.
Now you want to leave the editor and go back to the command window, but
first you’ll want to first save your program. From the editor’s F ILE menu, select
S AVE. You’ll need to give the file a name, and the extension must be ‘.m’—i.e.,
if I wanted to name this program ‘ar1maker’, I would enter ‘ar1maker.m’ as
the filename. I assume that the default directory where the editor will save the
file is somewhere on M ATLAB’s ‘path’—that is, in a place where M ATLAB can
find it. If you get a message that a file with this name already exists—which
is very likely if you all run out and try and name it ‘ar1maker.m’—please be
courteous, and do not overwrite the existing file (unless it’s your own). Come
up with a new name—maybe ar1maker2.m or ar1maker3.m. Whatever name
you come up with—let’s say it’s ‘ar1maker.m’—after you save it and return
to M ATLAB’s command window, you simply type ar1maker at the prompt (no
need for the ‘.m’). Your program will then be run. You’ll know that it worked
if a figure window opens up and you see xt plotted.
You can now check out one of the main advantages to writing scripts—
the ability to go back and change some parameter quickly and easily, and re-
execute the commands, without a lot of needless typing. To tinker with your
program, just open the file using the F ILE menu. You might change the value
of ρ or T, save the changes to the file, and run the program again.
146
A.12. LAST TIPS APPENDIX A. USING MATLAB
147
A.12. LAST TIPS APPENDIX A. USING MATLAB
and the whole area of logical operations. You can learn more about these topics
from M ATLAB’s help facility.
As you write increasingly complex programs, an important lesson to keep
in mind is that M ATLAB is designed to do array operations really, really fast.
Things like ‘for’ loops—well, not so much. Anywhere you can replace a loop
with an array operation will greatly speed up your code.
Also, look around for what’s available on the internet to learn from. Most
researchers who use M ATLAB will post their programs on their websites along
with their papers. Read how other people have solved problems or written
programs to perform different tasks. If you borrow someone’s code, make sure
to give them credit in your work, and—as important—make sure you under-
stand what their code is doing, if you’re going to use it. You don’t need to
always be reinventing the wheel (unless you’re assigned to reinvent a wheel
as homework), but you don’t want the code you use to just be a ‘black box’,
either.
148