
Economics 210 Section Notes

Luke C.D. Stein


Autumn 2009∗

Contents

1 Welcome
2 A taxonomy of economic models
3 Some thoughts about utility functions
4 Some math reminders
  4.1 Taylor approximation
  4.2 Concave functions
  4.3 The envelope theorem
5 Introduction to sequence problems
  5.1 Two-period saving problem in partial equilibrium
  5.2 Neoclassical Growth Model
6 Balanced growth practice question
7 Introduction to dynamic programming
  7.1 Where do functions live? A Metric space!
  7.2 Convergence in metric spaces
  7.3 How will convergence help us?
  7.4 Appendix: Additional proofs
8 Algorithms for solving dynamic programming problems
  8.1 “Guess and verify”
  8.2 Value iteration
  8.3 Policy iteration
  8.4 Continuous spaces
9 From the steady state towards explicit dynamics
  9.1 Taylor approximation (a.k.a. “linearization”)
  9.2 Eigenvectors and eigenvalues
10 Introducing competitive equilibrium
11 Comparing planners’ outcomes with competitive equilibria
  11.1 Asset-free economy
  11.2 Partial equilibrium asset market
  11.3 General equilibrium asset markets: social planner
  11.4 General equilibrium sequential asset markets: competitive equilibrium
  11.5 General equilibrium Arrow-Debreu asset market: competitive equilibrium
12 Pareto Optimality
  12.1 A simple Pareto problem
  12.2 A simple social planner problem
13 Competitive equilibrium practice question
14 Constant returns to scale production
15 Continuous-time optimization
  15.1 Finite-horizon
  15.2 Infinite time
  15.3 The Hamiltonian “cookbook”
    15.3.1 One control, one state variable
    15.3.2 Multiple control or state variables
  15.4 Current-value Hamiltonians
16 Log-linearization
  16.1 Why are approximations in terms of x̂ ≡ (x − x∗)/x∗ called “log-linear”?
  16.2 Log-linearization: first approach
  16.3 Log-linearization: second approach
17 Log-linearizing the NCGM in continuous time
18 Optimal taxation
  18.1 The Ramsey model
  18.2 The Primal approach
19 Introducing uncertainty
  19.1 Probability
  19.2 Utility functions
20 Markov chains
  20.1 Unconditional distributions
  20.2 Conditional distributions
  20.3 Stationary distributions
  20.4 Ergodic distributions
21 Risk-sharing properties of competitive markets
22 Perfect and imperfect insurance practice question
23 Asset pricing with complete markets
24 Introducing incomplete markets
25 Econometrics of incomplete markets practice question
26 Hall’s martingale hypothesis with durables practice question
27 General equilibrium in incomplete markets
  27.1 A constant absolute risk aversion example
  27.2 Aiyagari (1994) model
28 Where we have been: a brief reminder

These notes are based on (and at points copy without explicit citation from) notes by previous TAs Max Floetotto, Marcello Miccoli, Romans Pancs, and William “Gui” Woolston; Manuel Amador and Nir Jaimovich’s slides and solution sets; Krueger, Macroeconomic Theory (manuscript), section 2.3; Ljungqvist and Sargent, Recursive Macroeconomic Theory, chapter 8; Stachurski, Economic Dynamics: Theory and Computation, chapters 3 and 6; Stokey, Lucas, and Prescott, Recursive Methods in Economic Dynamics, section 3.1; and the Wikipedia “Eigenspace” page.

∗ This version updated April 19, 2013.

1 Welcome
• email: [email protected]

• homepage: http://www.lukestein.com
• Office Hours: Wednesdays, 3:15–5:05 in Landau Economics room 350
• Sections: Fridays, 12:15–2:05 in building 240 room 110

2 A taxonomy of economic models


1. Static Focus of Microeconomics core.
Dynamic Focus of Macroeconomics core.
2. Deterministic e.g., x_1 = 6, x_2 = 4; x_t = x_{t−1} + 1 with x_0 = 0 (a recursive sequence defined by a difference equation; if the subscripts index time, we also call this a dynamical system). Nir’s class.
Stochastic e.g., x_t ∼ {H, T} IID with probability 1/2 each; x_t = x_{t−1} + ε_t with ε_t ∼ N(0, 1) IID and x_0 = 0 (a stochastic recursive sequence). Manuel’s class.
3. Finite horizon (discrete) Use Kuhn-Tucker (i.e., Lagrangians).
Infinite horizon (discrete) as Sequence Problem: Kuhn-Tucker; as Functional Equation: dynamic
programming.
Continuous time Use Hamiltonians.
4. Discrete state/choice spaces Good for computer.
Continuous spaces Good for mathematical analysis.

5. Steady state
Explicit dynamics Use linearization (Taylor approximation) and “diagonalization” (eigenvalue de-
composition) or phase diagrams.
6. One “agent” Either a true single agent (partial equilibrium problems) or a social planner.
Many agents interacting through markets; competitive equilibrium either in period-by-period (i.e.,
sequential) markets or in one big market at t = 0 (Arrow-Debreu).

3 Some thoughts about utility functions
Consider our canonical utility function (for problems with discrete time):
U({c_t}_{t=0}^T) = Σ_{t=0}^T β^t u(c_t),

where T may be infinite. The function u(·) is called the felicity or instantaneous utility function. This
utility function has several important properties, including:

Time separability The period utility at time t only depends on consumption at time t. For example, there
is no habit persistence.
Exponential discounting β constant and β < 1 mean the agent values consumption today more than
consumption tomorrow, with a constant “strength of preference” for consumption sooner vs. later.
Stationarity The felicity function is time invariant. Thus when T = ∞, the utility function evaluated over
future consumption looks the same from any point in the future.

Most of the time we use one of a small number of felicity functions:


• u(x) = x^α.
• u(x) = (x^{1−σ} − 1)/(1 − σ); this represents the same preferences as the previous example.
• u(x) = log(x); the previous example approaches log utility as σ ↗ 1.
These felicity functions give rise to additional useful properties of U (·), including:
Strict monotonicity U (·) is strictly increasing in ct for all t.
Continuity Mathematically convenient.
Twice continuous differentiability Mathematically convenient.
Strict concavity ∂²U/∂c_t² < 0 for all t, corresponding to decreasing marginal utility.
Inada conditions lim_{c_t ↘ 0} ∂U/∂c_t = +∞ and lim_{c_t ↗ +∞} ∂U/∂c_t = 0 for all t, assuring that optimal c_t ∈ (0, +∞) for all t.¹
Homotheticity Meaning that {c_t}_t ≿ {c̃_t}_t if and only if {λc_t}_t ≿ {λc̃_t}_t, implying that the units in which consumption is measured don’t matter (in many models).
Constant relative risk aversion (and elasticity of substitution). The Arrow-Pratt coefficient of relative risk aversion −c_t u″(c_t)/u′(c_t) and the intertemporal elasticity of substitution ε_{c_{t+1}/c_t, 1/R} do not depend on t, c_t, or c_{t+1}.
In particular, we will basically always assume the first four conditions, which help ensure that we can
solve optimization problems using the Kuhn-Tucker (i.e., Lagrangian) algorithm.
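The constant relative risk aversion property is easy to check numerically. The sketch below (ours, not part of the notes) approximates the Arrow-Pratt coefficient −x u″(x)/u′(x) by finite differences for the CRRA felicity and confirms it equals σ at every consumption level tried:

```python
# Numerical check (an illustration, not from the notes) that the CRRA
# felicity u(x) = (x^(1-sigma) - 1)/(1 - sigma) has Arrow-Pratt coefficient
# -x u''(x)/u'(x) equal to sigma at every consumption level.
def crra(x, sigma):
    return (x ** (1 - sigma) - 1) / (1 - sigma)

def relative_risk_aversion(u, x, h=1e-5):
    """Approximate -x u''(x)/u'(x) with central finite differences."""
    u1 = (u(x + h) - u(x - h)) / (2 * h)           # u'(x)
    u2 = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2  # u''(x)
    return -x * u2 / u1

sigma = 2.0
for c in [0.5, 1.0, 3.0]:
    rra = relative_risk_aversion(lambda x: crra(x, sigma), c)
    assert abs(rra - sigma) < 1e-3  # constant in c, as claimed
```

Running the same check with u(x) = log(x) gives a coefficient of 1, consistent with log utility being the σ ↗ 1 limit of the CRRA family.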

4 Some math reminders


Including these here is a bit ad hoc, but the following are three mathematical concepts you need to be familiar
with.
¹In all, there are six Inada conditions. In addition to the two listed limit conditions, they are (1) continuous differentiability, (2) strict monotonicity, (3) strict concavity, and (4) U(0⃗) = 0.

4.1 Taylor approximation
Consider a differentiable real-valued function on (some subset of) Euclidean space, f : Rn → R. The function
can be approximated in the region around some arbitrary point y ∈ Rn by its tangent hyperplane.
If f : R → R, this approximation takes the form

f(x) ≈ f(y) + f′(y)(x − y).

If f : Rⁿ → R, this approximation takes the form

f(x) ≈ f(y) + Σ_{i=1}^n f_i′(y)(x_i − y_i) = f(y) + ∇f(y) · (x − y),

where · is the vector dot product operator.
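Since the neglected terms are second order, the approximation error should fall roughly with the square of the distance from the expansion point. A quick sketch (not from the notes; the function and points are arbitrary choices):

```python
import math

# A quick numerical sketch (not from the notes) of the first-order Taylor
# approximation f(x) ≈ f(y) + f'(y)(x - y), for f = log expanded about y = 1.
def taylor1(f, df, y, x):
    """First-order Taylor approximation to f(x), expanded about y."""
    return f(y) + df(y) * (x - y)

f, df, y = math.log, lambda t: 1.0 / t, 1.0
err_far = abs(f(1.1) - taylor1(f, df, y, 1.1))     # step size 0.1
err_near = abs(f(1.01) - taylor1(f, df, y, 1.01))  # step size 0.01
# shrinking the step 10x shrinks the error roughly 100x (quadratically)
assert err_near < err_far / 50
```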

4.2 Concave functions


Gui Woolston has an excellent note on this subject, on which this section is based.
The following are all necessary and sufficient (i.e., equivalent) conditions for concavity of a twice-differentiable (real-to-real) function f : R → R: for all x, y ∈ R,

1. “Mixtures” give higher values than “extremes”: for all α ∈ [0, 1],

f(αx + (1 − α)y) ≥ αf(x) + (1 − α)f(y);

2. f″(x) ≤ 0;

3. x ≥ y if and only if f′(x) ≤ f′(y), which can also be stated compactly as (f′(y) − f′(x))/(y − x) ≤ 0 or (y − x)(f′(y) − f′(x)) ≤ 0; and


4. f(·) lies below its first-order Taylor approximation:

f(x) ≤ f(y) + f′(y)(x − y),

where the right-hand side is the first-order Taylor approximation at x about y.

Also, for all concave functions (not just differentiable ones), a local maximum is a global maximum. And
a concave function must be continuous on the interior of its domain (although it need not be continuous on
the boundaries).
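The equivalence of these conditions can be spot-checked numerically; the following sketch (ours, not from the notes) verifies all four for the concave function f(x) = log x at arbitrary sample points:

```python
import math

# A numerical spot-check (not from the notes) that the four equivalent
# concavity conditions all hold for f(x) = log(x) at sample points.
f = math.log
df = lambda t: 1.0 / t          # f'
d2f = lambda t: -1.0 / t ** 2   # f''

x, y, alpha = 2.0, 5.0, 0.3
# 1. mixtures give higher values than extremes
assert f(alpha * x + (1 - alpha) * y) >= alpha * f(x) + (1 - alpha) * f(y)
# 2. nonpositive second derivative
assert d2f(x) <= 0 and d2f(y) <= 0
# 3. f' is weakly decreasing
assert (y - x) * (df(y) - df(x)) <= 0
# 4. f lies below its first-order Taylor approximation about y
assert f(x) <= f(y) + df(y) * (x - y)
```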

4.3 The envelope theorem


Envelope theorems relate the derivative of a value function to the derivative of the objective function. Here is a simple envelope theorem for unconstrained optimization:

v(q) = max_x f(x, q) = f(x*(q), q);

dv/dq = ∂f/∂q(x*(q), q) + [∂f/∂x(x*(q), q)] · ∂x*/∂q = ∂f/∂q(x*(q), q),

where the ∂f/∂x term vanishes because it is zero by the first-order condition.
The fact that the derivative of the envelope equals the derivative of the objective function holding the choice variable fixed is illustrated for v(z) ≡ max_x [−5(x − z)² − z(z − 1)]:

[Figure: the envelope v(z) plotted over z ∈ [0, 1].]

As an exercise, use the first-order and envelope conditions for the functional equation form of the NCGM (given in equation 8) to derive its intereuler. The derivation should really include an argument that the optimal k′ is interior to the feasible set.
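The simple envelope result can also be verified numerically for the example pictured above. In the sketch below (not from the notes), the first-order condition gives the maximizer x*(z) = z, and the derivative of the envelope matches the partial derivative of the objective with x held fixed there:

```python
# Numerical check (an illustration, not from the notes) of the envelope
# theorem for v(z) = max_x [-5(x - z)^2 - z(z - 1)]: dv/dz equals the
# z-derivative of the objective holding x fixed at the maximizer x*(z) = z.
def f(x, z):
    return -5 * (x - z) ** 2 - z * (z - 1)

xgrid = [i / 1000 for i in range(-2000, 2001)]  # choice grid for x

def v(z):
    return max(f(x, z) for x in xgrid)          # the envelope

z, h = 0.3, 1e-3
dv = (v(z + h) - v(z - h)) / (2 * h)            # slope of the envelope
df_dz = (f(z, z + h) - f(z, z - h)) / (2 * h)   # objective, x fixed at x* = z
assert abs(dv - df_dz) < 1e-2
```

Both derivatives come out to 1 − 2z = 0.4 at z = 0.3, matching the closed form v(z) = z − z².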
A more complete envelope theorem for constrained optimization is:
Theorem 1. Consider a constrained optimization problem v(θ) = max_x f(x, θ) such that g_1(x, θ) ≥ 0, . . . , g_K(x, θ) ≥ 0.
Comparative statics on the value function are given by:

∂v/∂θ_i = ∂f/∂θ_i |_{x*} + Σ_{k=1}^K λ_k ∂g_k/∂θ_i |_{x*} = ∂L/∂θ_i |_{x*}

(for Lagrangian L(x, θ, λ) ≡ f(x, θ) + Σ_k λ_k g_k(x, θ)) for all θ such that the set of binding constraints does not change in an open neighborhood.
Roughly, this states that the derivative of the value function is the derivative of the Lagrangian.
Proof. The proof is given for a single constraint (but is similar for K constraints): v(θ) = max_x f(x, θ) such that g(x, θ) ≥ 0.
The Lagrangian L(x, θ) ≡ f(x, θ) + λg(x, θ) gives FOC

∂f/∂x |_* + λ ∂g/∂x |_* = 0  ⇐⇒  ∂f/∂x |_* = −λ ∂g/∂x |_*,   (1)

where the notation ·|_* means “evaluated at (x*(θ), θ) for some θ.”
If g(x*(θ), θ) = 0, take the derivative in θ of this equality condition to get

∂g/∂x |_* · ∂x*/∂θ + ∂g/∂θ |_* = 0  ⇐⇒  ∂g/∂θ |_* = −(∂g/∂x |_*) · ∂x*/∂θ.   (2)

Note that ∂L/∂θ = ∂f/∂θ + λ ∂g/∂θ. Evaluating at (x*(θ), θ) gives

∂L/∂θ |_* = ∂f/∂θ |_* + λ ∂g/∂θ |_*.

If λ = 0, this gives that ∂L/∂θ |_* = ∂f/∂θ |_*; if λ > 0, complementary slackness ensures g(x*(θ), θ) = 0 so we can apply equation 2. In either case, we get that

∂L/∂θ |_* = ∂f/∂θ |_* − λ (∂g/∂x |_*) · ∂x*/∂θ.   (3)

 
Applying the chain rule to v(θ) = f(x*(θ), θ) and evaluating at (x*(θ), θ) gives

∂v/∂θ |_* = (∂f/∂x |_*) · ∂x*/∂θ + ∂f/∂θ |_*
         = −λ (∂g/∂x |_*) · ∂x*/∂θ + ∂f/∂θ |_*
         = ∂L/∂θ |_*,

where the last two equalities obtain by equations 1 and 3, respectively.

5 Introduction to sequence problems


5.1 Two-period saving problem in partial equilibrium
In the first period, you can buy a risk-free bond a, which returns Ra in the second period. Incomes y1 and y2 are available in the first and second periods, respectively. We get intereuler

u′(y1 − a) = βR u′(y2 + Ra)

(with c1 = y1 − a and c2 = y2 + Ra),

which can be interpreted as follows: the left-hand side is the marginal benefit of consuming an extra unit
today, the right-hand side is the marginal cost of consuming an extra unit today, comprising
1. R: Conversion from a unit of consumption today to units of consumption tomorrow,
2. u′(c2): Conversion from units of consumption tomorrow to felicits tomorrow,
3. β: Conversion from felicits tomorrow to utils.


Things in other models can interfere with an intereuler like this holding; e.g.,

1. Incomplete markets, such as if R were stochastic;


2. Non-time-separable utility;
3. Budget/borrowing constraints including irreversible investment might prevent the agent from borrowing or saving all he would like to, resulting in u′(c1) ≷ βRu′(c2).
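The intereuler of this two-period problem is easy to verify numerically. The following sketch (not from the notes) uses log felicity and arbitrary parameter values, solves for the optimal bond holding by grid search, and checks that u′(c1) = βRu′(c2) holds at the optimum:

```python
import math

# A sketch (not part of the notes) solving the two-period saving problem
# with log felicity, then checking the intereuler
# u'(y1 - a) = beta R u'(y2 + R a). The parameter values are arbitrary.
beta, R, y1, y2 = 0.95, 1.05, 1.0, 1.0

def lifetime_utility(a):
    return math.log(y1 - a) + beta * math.log(y2 + R * a)

# crude grid search over feasible bond holdings a
grid = [i / 10000 for i in range(-9000, 9001)]
a_star = max((a for a in grid if y1 - a > 0 and y2 + R * a > 0),
             key=lifetime_utility)

lhs = 1 / (y1 - a_star)             # u'(c1) for u = log
rhs = beta * R / (y2 + R * a_star)  # beta R u'(c2)
assert abs(lhs - rhs) < 1e-2
# for log utility the analytic answer is a* = (beta R y1 - y2)/(R (1 + beta))
```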

5.2 Neoclassical Growth Model


The canonical NCGM can be written as

max_{{(c_t, k_{t+1})}_{t=0}^∞} Σ_{t=0}^∞ β^t u(c_t)
s.t. ∀t, c_t + k_{t+1} ≤ F(k_t) + (1 − δ)k_t ≡ f(k_t),   (4)
     c_t, k_{t+1} ≥ 0;
     k_0 given.

It can also be written many other ways; e.g., in terms of investment:

max_{{(c_t, i_t)}_{t=0}^∞} Σ_{t=0}^∞ β^t u(c_t)
s.t. ∀t, c_t + i_t ≤ F(k_t),   (5)
     k_{t+1} = (1 − δ)k_t + i_t,
     c_t, k_{t+1} ≥ 0;
     k_0 given.

There is an art to choosing the simplest formulation. We can turn some inequality constraints into equality
constraints with simple arguments (e.g., monotone utility means no consumption will be “thrown away”),
and Inada conditions (together with a no-free-lunch production function) can ensure that non-negativity
constraints will never bind. Further, in deterministic problems, it is usually best to let the choice space be
the state space; here, that means eliminating consumption and investment:

max_{{k_{t+1}}_{t=0}^∞} Σ_{t=0}^∞ β^t u(f(k_t) − k_{t+1})   (6)
s.t. k_0 given.

The resulting intereulers

u′(f(k_t) − k_{t+1}) = β f′(k_{t+1}) u′(f(k_{t+1}) − k_{t+2}),

where the u′ arguments are c_t and c_{t+1} respectively, have an interpretation almost identical to that of the two-period saving problem, with f′(k_{t+1}) playing the role of R: the marginal rate of transformation between c_{t+1} and c_t.
So do these intereulers give us a solution to the original problem? No. They give a second-order difference equation (in k_t) with only one additional condition: the initial condition that k_0 is given. It turns out we need another condition, called the transversality condition (TVC), which is sufficient, together with the intereulers, for a solution (see SL Theorem 4.15):

lim_{t→∞} β^t u′(c_t) f′(k_t) k_t = 0.

The intuition for the TVC is that if you consume too little and save too much, k_t and u′(c_t) will grow so fast as to overwhelm the shrinking β^t and f′(k_t). The intuition for the terms is:

1. k_t: Amount of capital,
2. f′(k_t): Conversion from amount of capital to amount of consumption (at marginal—not average—product rate),
3. u′(c_t): Conversion from amount of consumption to felicits at time t,
4. β^t: Conversion from felicits at time t to utils.

A final observation about the steady state of the NCGM: at a steady state c_t = c_{t+1} = c* and k_t = k_{t+1} = k*, so the intereuler reduces to f′(k*) = 1/β.
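Given a functional form for F, the steady-state condition pins down k* in closed form. A quick sketch (not from the notes) with F(k) = k^α and arbitrary parameter values:

```python
# A quick sketch (not from the notes): computing the NCGM steady state from
# f'(k*) = 1/beta with F(k) = k^alpha, so that
# alpha k*^(alpha - 1) + 1 - delta = 1/beta. Parameter values are arbitrary.
alpha, beta, delta = 0.3, 0.96, 0.1

k_star = (alpha / (1 / beta - 1 + delta)) ** (1 / (1 - alpha))
c_star = k_star ** alpha - delta * k_star   # steady-state resource constraint

# confirm the steady-state intereuler f'(k*) = 1/beta
f_prime = alpha * k_star ** (alpha - 1) + 1 - delta
assert abs(f_prime - 1 / beta) < 1e-10
assert c_star > 0
```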

6 Balanced growth practice question


This question comes from the Economics 211 midterm examination in 2006. Its content is straightforward,
but successfully completing it requires diligent care—it is very easy to get bogged down.

Question
Consider an economy with a measure one of identical households. Preferences are given by:

Σ_{t=0}^∞ β^t C_t^{1−σ} / (1 − σ).

Households have L units of labor, which they supply inelastically to firms that produce a consumption
good using technology:
C_t = (u_t K_t)^α L^{1−α},
where Kt is physical capital and ut ∈ [0, 1] is the fraction of the capital stock used to produce consumption
goods.
Investment goods are produced according to:

It = A(1 − ut )Kt ,

where 1 − ut represents the share of the capital stock used to produce capital goods.
The stock of capital evolves according to:

Kt+1 = (1 − δ)Kt + It .

1. Write the problem of a Social Planner and find the Euler equation.
2. For the balanced growth path of this economy, find expressions for the growth rate of Kt , Ct and It in
terms of the balanced growth path level of ut = u∗ .

3. Find the level of ut along the balanced growth path.

Solution
1. Write the problem of a Social Planner and find the Euler equation.
As we have discussed, there are several ways to write the same problem. Although it is tempting to
substitute to get a problem written only in terms of capital, it turns out that here (and in many other
problems with lots of choice variables), the notation can wind up getting very nasty. I started down
that route, but returned to including more Lagrange multipliers instead. The social planner maximizes

max_{{k_{t+1}, c_t, u_t}_t} Σ_{t=0}^∞ β^t c_t^{1−σ} / (1 − σ)
s.t. c_t = u_t^α k_t^α L^{1−α}, ∀t;
     k_{t+1} = (1 − δ + A − Au_t)k_t, ∀t;
     u_t ∈ [0, 1], ∀t.

With an appeal to an Inada condition on u(·) and the fact that ut = 0 =⇒ ct = 0, we will not worry
about ut ≥ 0 binding. Without argument, we will also ignore ut ≤ 1; this could actually be a problem.
The Lagrangian is

L = Σ_{t=0}^∞ [ β^t c_t^{1−σ}/(1 − σ) + λ_t (u_t^α k_t^α L^{1−α} − c_t) + µ_t ((1 − δ + A − Au_t)k_t − k_{t+1}) ].

Taking first-order conditions, we get

λ_t = β^t c_t^{−σ};

λ_t α u_t^{α−1} k_t^α L^{1−α} = µ_t A k_t  =⇒  µ_t = β^t α c_t^{1−σ} / (u_t A k_t)

(using α u_t^{α−1} k_t^α L^{1−α} = α c_t / u_t and the first FOC); and

µ_t = λ_{t+1} α u_{t+1}^α k_{t+1}^{α−1} L^{1−α} + µ_{t+1} (1 − δ + A − Au_{t+1})

(using α u_{t+1}^α k_{t+1}^{α−1} L^{1−α} = α c_{t+1} / k_{t+1}), which implies

β^t α c_t^{1−σ} / (u_t A k_t) = β^{t+1} α c_{t+1}^{1−σ} / k_{t+1} + β^{t+1} α c_{t+1}^{1−σ} (1 − δ + A − Au_{t+1}) / (u_{t+1} A k_{t+1})

c_t^{1−σ} / (u_t k_t) = β A u_{t+1} c_{t+1}^{1−σ} / (u_{t+1} k_{t+1}) + β c_{t+1}^{1−σ} (1 − δ + A − Au_{t+1}) / (u_{t+1} k_{t+1})

(c_t / c_{t+1})^{1−σ} = β(1 − δ + A) (u_t k_t) / (u_{t+1} k_{t+1}).

Note there are other possible simplifications of this intereuler, based on the fact that c_t / c_{t+1} = ((u_t k_t) / (u_{t+1} k_{t+1}))^α.

2. For the balanced growth path of this economy, find expressions for the growth rate of Kt , Ct and It in
terms of the balanced growth path level of ut = u∗ .
Note that the production technology for the investment good is i_t ∝ A k_t, which violates Inada conditions:

lim_{k_t→∞} ∂i_t/∂k_t = A(1 − u_t) ≠ 0.

This can give us endogenous growth, since decreasing marginal returns never kick in.
Along the balanced growth path, the law of motion for capital gives the growth rate of capital:

γ_k ≡ k_{t+1}/k_t = 1 − δ + A − Au*.

The investment goods production technology gives us the growth rate of investment:

γ_i ≡ i_{t+1}/i_t = A(1 − u*)k_{t+1} / (A(1 − u*)k_t) = γ_k.

The consumption goods production technology gives us the growth rate of consumption:

γ_c ≡ c_{t+1}/c_t = (u* k_{t+1})^α L^{1−α} / ((u* k_t)^α L^{1−α}) = (k_{t+1}/k_t)^α = γ_k^α.

3. Find the level of ut along the balanced growth path.


Plugging the growth rates we just found, along with u_t = u_{t+1} = u*, into our intereuler gives

γ_c^{σ−1} = β(1 − δ + A) γ_k^{−1}
(1 − δ + A − Au*)^{1−α(1−σ)} = β(1 − δ + A)
u* = 1 + (1 − δ − [β(1 − δ + A)]^{1/(1−α+ασ)}) / A.
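As a sanity check on this answer, the following sketch (ours, not part of the original solution) recovers u* from γ_k = 1 − δ + A − Au* and confirms that the implied growth rates satisfy the intereuler; the parameter values are arbitrary:

```python
# A numerical sanity check (an illustration, not part of the original
# answer): recover u* from gamma_k = 1 - delta + A - A u* and confirm the
# implied growth rates satisfy the intereuler
# gamma_c^(sigma - 1) = beta (1 - delta + A) / gamma_k.
alpha, beta, sigma, delta, A = 0.3, 0.95, 2.0, 0.05, 0.25

gamma_k = (beta * (1 - delta + A)) ** (1 / (1 - alpha * (1 - sigma)))
u_star = 1 + (1 - delta - gamma_k) / A   # balanced growth path level of u
gamma_c = gamma_k ** alpha               # consumption grows at gamma_k^alpha

assert 0 < u_star < 1                    # interior allocation of capital
assert abs(gamma_c ** (sigma - 1) - beta * (1 - delta + A) / gamma_k) < 1e-12
```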

7 Introduction to dynamic programming
We consider “transforming” a sequence problem of the form given in equations 4–6 into a functional equation

V(k) = max_{c,k′} [u(c) + βV(k′)]
s.t. c + k′ ≤ f(k),   (7)
     c, k′ ≥ 0.

As in the SP, we can recast the choice space to be the state space (also arguing that the inequality constraints do not bind):

V(k) = max_{k′ ∈ (0, f(k))} [u(f(k) − k′) + βV(k′)].   (8)

While the solution to the SP (as written in equation 4) is a joint sequence {(c_t, k_{t+1})}_{t=0}^∞, the solution to the FE is a value function V(k) and a policy function k′ = g(k). Therefore in order to find a solution to the FE problem we need to understand a few things, namely:
• In what space do functions “live”?

• How do we define distance in this space? Convergence?


Once we know how to solve FEs, we have a few remaining questions.
• Is the solution unique? If so, we expect the solution to also solve the SP (this is ensured by the Principle of Optimality, but is not the direction we are interested in).

• If there are multiple solutions to the FE, which one(s) solve(s) the SP?
Why do we want to solve the FE? We are shifting the problem from studying an infinite sequence to
a function. Is this really easier? Analytically the answer is not clear; there are properties of the solution
that are proved more easily using the SP formulation than the FE one. However, there is at least one clear
advantage to a dynamic programming approach: many problems of interest to macroeconomists cannot be
solved analytically. When we need to find numerical solutions, the FE formulation makes things easier.2

7.1 Where do functions live? A Metric space!


Definition 2. A real vector space is a set X, with elements x ∈ X, together with two operations, addition
and scalar multiplication,3 satisfying the following axioms for any x, y, z ∈ X and α, β ∈ R:
Axioms for addition
1. x + y ∈ X (closure under addition),
2. x + y = y + x (commutativity),
3. (x + y) + z = x + (y + z) (associativity),

4. there is a 0⃗ ∈ X such that x + 0⃗ = x (identity existence), and

5. there is −x ∈ X such that x + (−x) = 0⃗ (inverse existence).
Axioms for scalar multiplication
²This may not be intuitive. The solution to the SP is a vector in R^∞, while the solution to the latter problem is a function V : R → R. The dimensionality of the former object is lower (countably infinite vs. uncountably so) but the function can much more easily be approximated. We do this by somehow restricting its domain, and then determining its value on a discrete grid of points.
³Actually there are two operations: left scalar multiplication and right scalar multiplication (with an additional requirement that these two give the same result).

1. α · x ∈ X (closure under scalar multiplication),
2. α · x = x · α (commutativity),
3. (α · β) · x = α · (β · x) (associativity),
4. 1 · x = x (identity existence), and
5. 0 · x = 0⃗.
Distributive laws
1. α · (x + y) = α · x + α · y, and
2. (α + β) · x = α · x + β · x.
Examples of real vector spaces include:
• Euclidean spaces (Rⁿ),
• The set of real-valued functions f : [a, b] → R,
• The set of continuous (real-to-real) functions f : [a, b] → R, and
• The set of continuous, bounded (real-to-real) functions f : [a, b] → R; we refer to this set as C[a, b].
Definition 3. A normed vector space is a vector space S, together with a norm ‖·‖ : S → R, such that, for all x, y ∈ S and α ∈ R:
1. ‖x‖ = 0 if and only if x = 0⃗,
2. ‖α · x‖ = |α| · ‖x‖, and
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.
As an exercise, you can prove that these assumptions ensure that ‖x‖ ≥ 0.
Examples of normed vector spaces include:
• On R, the absolute value ‖x‖ = |x|;
• On Rⁿ, the Euclidean norm ‖x‖ = (Σ_{i=1}^n x_i²)^{1/2};
• On Rⁿ, the Manhattan norm ‖x‖ = Σ_{i=1}^n |x_i|;
• On Rⁿ, the sup norm ‖x‖ = sup_{i∈{1,...,n}} |x_i| (a proof that this is a norm is given in the appendix to this section); and
• On C[a, b], the sup norm ‖f‖ = sup_{t∈[a,b]} |f(t)|.
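The norms above are easy to compare concretely. A small sketch (not from the notes) computing all three on R² and spot-checking two of the norm properties:

```python
import math

# A small illustration (not from the notes) of three norms on R^n, with a
# spot-check of homogeneity and the triangle inequality for each.
def euclidean(x):
    return math.sqrt(sum(xi ** 2 for xi in x))

def manhattan(x):
    return sum(abs(xi) for xi in x)

def sup_norm(x):
    return max(abs(xi) for xi in x)

x, y = [3.0, -4.0], [1.0, 2.0]
z = [xi + yi for xi, yi in zip(x, y)]
for norm in (euclidean, manhattan, sup_norm):
    assert norm(z) <= norm(x) + norm(y)                   # triangle inequality
    assert norm([2.0 * xi for xi in x]) == 2.0 * norm(x)  # homogeneity

# the three norms disagree about the length of x = (3, -4): 5, 7, and 4
assert (euclidean(x), manhattan(x), sup_norm(x)) == (5.0, 7.0, 4.0)
```

Sketching the unit ball of each norm (a circle, a diamond, and a square, respectively) is a good way to internalize how they differ.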
If we have a normed vector space, we can also define a metric space, which has a “metric” (also known as a “distance”) function ρ : S × S → R defined by ρ(x, y) = ‖x − y‖.
It may be useful for each of the example norms given above to sketch a ball: the locus of points in S less than distance r away from some element x ∈ S.
Although every normed vector space can give rise to a metric space, not every metric space can be generated in this way. The general definition of a metric space follows; it is not hard to prove that the properties of a norm imply that ρ(x, y) = ‖x − y‖ satisfies the properties required of a metric. (A proof is given in the appendix to this section.)
Definition 4. A metric space is a set S, together with a metric ρ : S × S → R, such that, for all x, y, z ∈ S:
1. ρ(x, y) = 0 if and only if x = y (which implies, with the later axioms, that the distance between two
distinct elements is strictly positive),
2. ρ(x, y) = ρ(y, x) (symmetry), and
3. ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (triangle inequality).

7.2 Convergence in metric spaces

Definition 5. Let (S, ρ) be a metric space and {x_n}_{n=0}^∞ a sequence in S. We say that {x_n} converges to x ∈ S, or that the sequence has limit x, if

∀ε > 0, ∃N(ε) such that n > N(ε) =⇒ ρ(x_n, x) < ε.

That is, a sequence is convergent if its terms get closer and closer to some point x, to the extent that, given an arbitrarily small number ε > 0, we can always find some positive integer N(ε) such that all terms of the sequence beyond N(ε) will be no further than ε from x.
Checking directly whether a sequence converges requires knowing its limit. Most of the time, we don’t know the limit. The definition of Cauchy sequences will help us with that.

Definition 6. A sequence {x_n}_{n=0}^∞ in a metric space (S, ρ) is a Cauchy sequence if

∀ε > 0, ∃N(ε) such that ∀m, n > N(ε), ρ(x_m, x_n) < ε.

It’s clear that any convergent sequence is also a Cauchy sequence, but the converse is not true: not every Cauchy sequence converges.⁴ However, there is a particular class of metric spaces, called complete metric spaces, in which this converse does hold.
Definition 7. A metric space (S, ρ) is complete if every Cauchy sequence contained in S converges to some
point in S.
Checking completeness is very difficult. We will take it as given that the real line R with the metric
ρ(x, y) = |x − y| is a complete metric space.
Theorem 8. Let X ⊆ Rⁿ, and let C(X) be the set of bounded, continuous functions f : X → R with the sup norm, ‖f‖ = sup_{x∈X} |f(x)|. Then C(X) is a complete metric space.
A proof is given in the appendix to this section.

7.3 How will convergence help us?


Consider a metric space (S, ρ) and a function f : S → S. Define a sequence as follows, for some given s_0 ∈ S:

s_0, f(s_0), f(f(s_0)), . . . .

Clearly, this sequence can also be written as a recursive sequence {s_n}_{n=0}^∞ defined by the difference equation s_{n+1} = f(s_n) and the initial condition.
Does this sequence converge? It depends on f(·) and s_0. For example, consider the sequence defined as above for

• S = R, ρ(x, y) = |x − y|, and f(s) = s + 1: the sequence diverges for any s_0;
• S = R, ρ(x, y) = |x − y|, and f(s) = s²: the sequence converges to 1 for s_0 = ±1, converges to 0 for s_0 ∈ (−1, 1), and diverges otherwise;
• S = R₊, ρ(x, y) = |x − y|, and f(s) = √s: the sequence converges for any s_0 (to 0 if s_0 = 0 and to 1 otherwise).
⁴Consider the metric space S = (0, 1] with the metric derived from the absolute value norm ρ(x, y) = |x − y|. (This is a metric space, even though S does not form a vector space with regular addition and multiplication.) The Cauchy sequence 1, 1/2, 1/3, . . . lies in S but does not converge in S.

Drawing pictures is useful here: graph f (·) and the 45-degree line on the same axes.5
Here’s the important thing: suppose that for some s0 the sequence converges to s. Then as long as f
is continuous, s is a fixed point of f ; i.e., f (s) = s. Believe it or not, this will help us solve functional
equations.
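As a quick concrete check of this fixed-point logic (a sketch, not from the notes), iterating f(s) = √s from s_0 = 4:

```python
import math

# A quick check of the fixed-point logic (a sketch, not from the notes):
# iterate f(s) = sqrt(s) on S = R+ starting from s0 = 4.
s = 4.0
for _ in range(60):
    s = math.sqrt(s)           # s_{n+1} = f(s_n)

assert abs(s - 1.0) < 1e-10           # the sequence has converged near 1...
assert abs(math.sqrt(s) - s) < 1e-10  # ...and the limit is a fixed point of f
```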
Recall the FE of equation 8:

V(k) = max_{k′ ∈ (0, f(k))} [u(f(k) − k′) + βV(k′)]   for all k.

Just as our functions f(·) above mapped real numbers to real numbers, we consider an operator T(·) that maps functions to functions. In particular, let

T(v) ≡ max_{k′ ∈ (0, f(k))} [u(f(k) − k′) + βv(k′)]

for any function v. This may take a bit to get your head around, but note that while T maps functions to functions, the object T(v) is itself a function, T(v) : R → R. Given this definition of T, we can write our functional equation compactly as V(k) = [T(V)](k) for all k, or even more simply as V = T(V).
That means that V, the solution to our functional equation, is a fixed point of the operator T. How can we find such a thing? Our earlier intuition suggests the following strategy:
1. Show the operator T is continuous (we will use this later);
2. Pick some initial function v0 ;
3. Consider the sequence {vn }∞
n=0 defined by vn+1 = T (vn ) and the initial condition;

4. Show that this sequence converges under some distance over function spaces (we use that induced by
the sup norm);
5. Find the thing V to which the sequence converges;
6. Note that continuity of T implies V is a fixed point of T and hence solves the functional equation.

How do we do this? If we can show that T is a contraction (perhaps using Blackwell’s Theorem) then we
are assured continuity, that the resulting sequence is Cauchy (and hence converges by the completeness of
the metric space), and that it converges to the unique fixed point of T from any starting v0 . We will often
have to use numerical methods to actually find the convergence point V , however.
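The whole strategy can be tried out numerically. The sketch below (ours, not from the notes) runs value iteration for the NCGM in the special case of log utility, full depreciation, and F(k) = k^α, where the optimal policy is known in closed form (k′ = αβk^α) and so provides a benchmark; the grid and parameters are arbitrary choices:

```python
import math

# A minimal value-iteration sketch (not from the notes) for the NCGM with
# log utility, full depreciation (delta = 1), and F(k) = k^alpha. In this
# special case the optimal policy k' = alpha*beta*k^alpha is known in
# closed form, which gives a benchmark for the iterated solution.
alpha, beta = 0.3, 0.95
grid = [0.01 + 0.0015 * i for i in range(120)]   # capital grid

def bellman(v):
    """One application of T: (Tv)(k) = max_{k'} u(f(k) - k') + beta v(k')."""
    out = []
    for k in grid:
        y = k ** alpha   # output f(k) = k^alpha (full depreciation)
        out.append(max(math.log(y - k2) + beta * v[j]
                       for j, k2 in enumerate(grid) if k2 < y))
    return out

v = [0.0] * len(grid)   # arbitrary initial guess v0
for _ in range(150):    # v_{n+1} = T(v_n); T is a contraction with modulus beta
    v = bellman(v)

# recover the policy at a sample capital stock, vs. the analytic policy
k = grid[60]
y = k ** alpha
j_star = max((j for j in range(len(grid)) if grid[j] < y),
             key=lambda j: math.log(y - grid[j]) + beta * v[j])
assert abs(grid[j_star] - alpha * beta * k ** alpha) < 0.01
```

With finer grids the recovered policy approaches the analytic one; the contraction property guarantees the value iterates converge from any initial guess v0.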

7.4 Appendix: Additional proofs


Proofs of several results covered in this section follow.
Theorem 9. If S and k·k are a normed vector space, then S and ρ(x, y) ≡ kx − yk are a metric space.

Proof. We must show that only identical elements have zero distance, that the distance function is symmetric,
and that it satisfies the triangle inequality.
1. Let S be the vector space, for x, y ∈ S let z = (x − y) ∈ S (we can derive this from the axioms on
vector spaces). Then ρ(x, y) = kx − yk = kzk ≥ 0 by property (1) of the norm. Notice also that we can
say that z = θ if and only if x = y, which implies that ρ(x, y) = 0 if and only if x = y.

2. Again let z = (x − y) ∈ S and −z = −(x − y) = (y − x) ∈ S. Then kx − yk = kzk and ky − xk =


k−zk = |−1|kzk = kzk. Hence kx − yk = ky − xk.
5 Another good exercise is to consider over what subset of R (if any) these functions are contractions. You can check for a set S ⊆ R (make sure to confirm that f : S → S) using intuition and/or the definition of a contraction.

3. We want to show that ρ(x, z) ≤ ρ(x, y) + ρ(y, z) holds for our metric, i.e., that ‖x − z‖ ≤ ‖x − y‖ + ‖y − z‖.
Again let t = x − y ∈ S and w = y − z ∈ S. Then ‖x − z‖ = ‖x − y + y − z‖ = ‖t + w‖ ≤ ‖t‖ + ‖w‖ = ‖x − y‖ + ‖y − z‖.
Theorem 10. The sup norm ‖x‖ ≡ supi∈{1,...,n} |xi | over Rn is a norm.

Proof. We must show that only the zero element has zero norm, that scalar multiplication can be taken
outside the norm, and that it satisfies a triangle inequality.
1. ‖x‖ = supi {|xi |, i = 1, . . . , n} = |xj | ≥ 0, where j is an index attaining the sup (one exists since n is finite).
2. ‖α · x‖ = supi {|α · xi |, i = 1, . . . , n} = supi {|α| · |xi |, i = 1, . . . , n} = |α| · supi {|xi |, i = 1, . . . , n} = |α| · ‖x‖.
3. Let j be an index at which |xi + yi | is maximal. Then ‖x + y‖ = supi {|xi + yi |, i = 1, . . . , n} = |xj + yj | ≤ |xj | + |yj | ≤ supi {|xi |, i = 1, . . . , n} + supi {|yi |, i = 1, . . . , n} = ‖x‖ + ‖y‖.
Theorem 11. Let X ⊆ Rn , and let C(X) be the set of bounded continuous functions f : X → R with the sup norm, ‖f ‖ = supx∈X |f (x)|. Then C(X) is a complete metric space.

Proof. We take as given that C(X) with the sup norm metric is a metric space. Hence we must show that if
{fn } is a Cauchy sequence, there exists f ∈ C(X) such that

for any ε > 0, there exists N (ε) such that ‖fn − f ‖ ≤ ε for all n ≥ N (ε).

There are three steps:

1. find f ;
2. show that {fn } converges to f in the sup norm;
3. show that f ∈ C(X), i.e., f is continuous and bounded.

So let’s start:
1. We want to find the candidate function f . In what follows, we will indicate a general element of X as x
and a particular element as x0 .
Consider a Cauchy sequence {fn }. Fix x0 ∈ X; then {fn (x0 )} defines a sequence of real numbers. Let's
focus on this sequence of real numbers (notice that now we are talking about something different than
the Cauchy sequence of functions {fn }): by the definition of the sup and of the sup norm we can say that

|fn (x0 ) − fm (x0 )| ≤ supx∈X |fn (x) − fm (x)| = ‖fn − fm ‖.

But the sequence of functions {fn } is Cauchy by hypothesis, so for any ε > 0 we have ‖fn − fm ‖ ≤ ε
whenever n and m are sufficiently large. But then:

|fn (x0 ) − fm (x0 )| ≤ supx∈X |fn (x) − fm (x)| = ‖fn − fm ‖ ≤ ε.

Thus the sequence of real numbers {fn (x0 )} is also a Cauchy sequence, and since R is a complete
metric space, it converges to some limit point f (x0 ). Therefore we now have our candidate function
f : X → R.

2. We want to show that the sequence of functions {fn } converges to f in the sup norm, i.e., that
‖fn − f ‖ → 0 as n → ∞.
Fix an arbitrary ε > 0 and choose N (ε) so that n, m ≥ N (ε) implies ‖fn − fm ‖ ≤ ε/2 (we know that
we can do this since {fn } is Cauchy).
Now, for any fixed arbitrary x0 ∈ X and all m ≥ n ≥ N (ε),

|fn (x0 ) − f (x0 )| ≤ |fn (x0 ) − fm (x0 )| + |fm (x0 ) − f (x0 )|   (by the triangle inequality)
≤ ‖fn − fm ‖ + |fm (x0 ) − f (x0 )|   (by def of sup and sup norm)
≤ ε/2 + |fm (x0 ) − f (x0 )|   (by {fn } being Cauchy).

Since {fm (x0 )} converges to f (x0 ) (this is how we constructed f (x0 )), we can choose m for each
fixed x0 ∈ X so that |fm (x0 ) − f (x0 )| ≤ ε/2. Hence we have:

|fn (x0 ) − f (x0 )| ≤ ε.

But since the choice of x0 was arbitrary, this holds for all x ∈ X, and hence for the supremum:
‖fn − f ‖ = supx∈X |fn (x) − f (x)| ≤ ε. And since the choice of ε was also arbitrary, we have obtained
that for any ε > 0, ‖fn − f ‖ ≤ ε for all n ≥ N (ε).
3. Now we want to show that f ∈ C(X), i.e., that f is bounded and continuous.
f is bounded by construction. To prove that f is continuous, we need to show that for every ε > 0
and every x ∈ X, there exists δ > 0 such that |f (x) − f (y)| ≤ ε if ‖x − y‖E < δ, where
‖x − y‖E = (Σni=1 (xi − yi )2 )1/2 is the Euclidean norm on Rn .
Fix arbitrary ε > 0 and x0 ∈ X. Then choose k such that ‖f − fk ‖ < ε/3 (we can do this since {fn }
converges to f in the sup norm). Then notice that fk is continuous (by hypothesis the sequence is in
C(X)). Therefore there exists δ > 0 such that:

‖x0 − y‖E < δ implies |fk (x0 ) − fk (y)| < ε/3.

Then, almost as we did before:

|f (x0 ) − f (y)| ≤ |f (x0 ) − fk (x0 )| + |fk (x0 ) − fk (y)| + |fk (y) − f (y)|   (by the triangle inequality)
≤ 2 · ‖f − fk ‖ + |fk (x0 ) − fk (y)|   (by def of sup and sup norm)
< 2ε/3 + ε/3 = ε   (by the choice of k and continuity of fk ).

But since the choice of ε and x0 was arbitrary, this holds generally, so we have proved our
statement.

8 Algorithms for solving dynamic programming problems


There are several different algorithms for solving dynamic programming problems like the canonical

V (k) = maxk′∈Γ(k) [u(k, k′) + βV (k′)].

The two algorithms you are expected to “know” for Economics 210 are “guess and verify” and value (function)
iteration. They are discussed below, along with a third algorithm: policy (function) iteration.

8.1 “Guess and verify”
This is an analytical algorithm; the two following algorithms will be numerical.
1. Develop an initial “guess” for the value function, V0 (·), with an appropriately parameterized functional
form.
2. Iterate over Vi using

Vi+1 (k) = maxk′∈Γ(k) [u(k, k′) + βVi (k′)]

until Vi and Vi+1 have the same functional form.


3. Equate the parameters of Vi and Vi+1 so that the two functions are equal; this is a solution to the FE.
This will only work for some u and Γ, and perhaps only with a very good guess for V0 . If we are lucky,
V0 (k) = 0 will work.
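One case where the algorithm does work is the workhorse example with u(c) = log c, Cobb-Douglas production, and full depreciation (a model assumed here purely for illustration). There the guess V (k) = A + B log k is preserved by iteration, and the standard closed-form constants can be checked numerically to be a fixed point of the Bellman operator:

```python
import numpy as np

alpha, beta = 0.3, 0.95                        # hypothetical parameter values
# textbook closed form for V(k) = max_{k'} [log(k^alpha - k') + beta V(k')]:
B = alpha / (1 - alpha * beta)
A = (np.log(1 - alpha * beta)
     + alpha * beta / (1 - alpha * beta) * np.log(alpha * beta)) / (1 - beta)

def V(k):
    return A + B * np.log(k)

def TV(k):
    """Bellman operator applied to V at a point k, via a fine grid search over k'."""
    kp = np.linspace(1e-6, k**alpha - 1e-6, 20001)
    return (np.log(k**alpha - kp) + beta * V(kp)).max()

# V is a fixed point of T, so T(V) should equal V at every k we check
errors = [abs(TV(k) - V(k)) for k in (0.2, 1.0, 2.0)]
```

If the guessed functional form were wrong, T(V) would not reproduce V and the parameter-matching step would fail.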

8.2 Value iteration


1. Develop an initial “guess” for the value function, V0 (·). If the space is finite and discrete, this guess can
be represented by a vector of V0 evaluated at all the values k can take on.
2. Iterate over Vi using

Vi+1 (k) = maxk′∈Γ(k) [u(k, k′) + βVi (k′)]

until Vi and Vi+1 are sufficiently close to each other (typically measured by their sup norm distance).
3. Conclude that Vi+1 is an approximate solution to the FE.

For some u and Γ, we may be able to use the contraction mapping theorem to ensure that this process
converges for any initial guess, V0 .
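A minimal implementation on a discrete grid follows. The model details (log utility, Cobb-Douglas production with full depreciation, the grid bounds, and the parameter values) are invented for the example:

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.5, 200)             # discrete values k may take on
# one-period payoff u[i, j] = log(c) with k = grid[i], k' = grid[j]
c = grid[:, None] ** alpha - grid[None, :]
u = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)  # infeasible -> -inf

V = np.zeros(len(grid))                        # initial guess V0 = 0
for it in range(10_000):
    V_new = (u + beta * V[None, :]).max(axis=1)
    diff = np.abs(V_new - V).max()             # sup-norm stopping rule
    V = V_new
    if diff < 1e-8:
        break

policy = grid[(u + beta * V[None, :]).argmax(axis=1)]
# for this particular model the exact policy is k' = alpha*beta*k^alpha,
# which the grid solution approximates up to the grid spacing
```

Note that the whole maximization is vectorized: each row of u + βV is the maximand for one value of k.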

8.3 Policy iteration


This algorithm works explicitly with the optimal policy function

g(k) = argmaxk′∈Γ(k) [u(k, k′) + βV (k′)],

where V (·) is the solution to the FE.

1. Develop an initial “guess” for the policy function, g0 (·). If the space is finite and discrete, this guess
can be represented by a vector of g0 evaluated at all the values k can take on.
2. Iterate over gi as follows:
• Form Vi from gi by

Vi (k) = u(k, gi (k)) + βu(gi (k), gi (gi (k))) + β 2 u(gi (gi (k)), gi (gi (gi (k)))) + · · ·
= Σ∞t=0 β^t u(gi^t (k), gi^{t+1} (k)).

One way to implement this is to approximate Vi using the first T terms of this sum for some large T .6
6 The policy function can be expressed as a transition matrix Gi (containing all zeros, except for a single one in each row), in which case Vi = Σt β^t Gi^t u for an appropriately formed vector u.

• Form gi+1 from Vi by
gi+1 (k) = argmaxk′∈Γ(k) [u(k, k′) + βVi (k′)].

Continue iterating until gi and gi+1 are sufficiently close to each other (typically measured by their sup
norm distance).
3. Conclude that gi+1 is an approximately optimal policy (and the associated Vi an approximate solution to the FE).
As with value iteration, for some u and Γ we may be able to use the contraction mapping theorem to
ensure that this process converges for any initial guess, g0 .
This may seem like a more complex algorithm than value iteration, but can in fact be easier to implement.
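A sketch under the same illustrative assumptions as the value-iteration example (log utility, Cobb-Douglas production, full depreciation; all parameter values invented). Rather than truncating the infinite sum, it uses the transition-matrix representation from the footnote to evaluate each policy exactly with one linear solve:

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.5, 200)
n = len(grid)
c = grid[:, None] ** alpha - grid[None, :]
u = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -1e10)  # big penalty if infeasible

g = np.zeros(n, dtype=int)                     # initial policy: save the least
for it in range(500):
    # policy evaluation: V = u_g + beta G V  =>  (I - beta G) V = u_g
    G = np.zeros((n, n))
    G[np.arange(n), g] = 1.0                   # transition matrix of policy g
    V = np.linalg.solve(np.eye(n) - beta * G, u[np.arange(n), g])
    # policy improvement
    g_new = (u + beta * V[None, :]).argmax(axis=1)
    if np.array_equal(g_new, g):               # policies agree: done
        break
    g = g_new

policy = grid[g]
```

On a finite grid, policy iteration typically converges in a handful of passes, which is why it can be easier in practice than value iteration despite the extra machinery.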

8.4 Continuous spaces


Solving models with continuous spaces—like all the models we have seen—with numerical methods will
typically rely on “discretizing” the state space in some way; that is, approximating the model with one in
which the state and choice variables can only take on a finite number of values.
The first important thing is to be smart about setting up this “grid” of discrete values. It often makes
sense to choose values whose spacing is logarithmic rather than linear; this allows the grid to be finer in areas
where the value or policy function has more curvature.
Another thing to consider is where to discretize. For example, let K be the grid of values that we restrict
k to taking on. A naïve approach (which you will see taken in the solutions for your first problem set) might
conduct value iteration by finding
Vi+1 (k) = maxk′∈Γ(k)∩K [u(k, k′) + βVi (k′)]

for each k ∈ K by exhaustively considering all the k′ ∈ Γ(k) ∩ K to find the maximizer. A more sophisticated
approach still only considers k ∈ K, but would allow k′ to take on any value. This suggests attempting to
iterate with
Vi+1 (k) ≟ maxk′∈Γ(k) [u(k, k′) + βVi (k′)].

This proposed approach raises two questions. First, exhaustive search of a continuous choice set is not feasible;
how do we solve the maximization when k′ can take on a continuum of values? Fortunately, numerical
computing tools (e.g., Matlab, Scilab, SciPy) offer many built-in optimization algorithms. Second, how
can we even evaluate the maximand for k′ ∉ K when Vi was only defined for values in the grid? There are
a number of ways of doing this; the simplest (which also has attractive theoretical properties) is simply to
linearly interpolate between adjacent elements of K. Thus the algorithm, which is called fitted value iteration,
actually iterates using

Vi+1 (k) = maxk′∈Γ(k) [u(k, k′) + β V̄i (k′)],

where V̄i means the function that linearly interpolates Vi between grid points. There is also a related algorithm
called fitted policy iteration.
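A sketch of fitted value iteration for the same illustrative model used above (log utility, full depreciation; parameters invented), with V̄i implemented by linear interpolation between grid points. For simplicity the inner maximization searches a fine set of candidate k′ values rather than calling a continuous optimizer; the key point is that k′ is no longer restricted to the coarse grid K on which V lives:

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.5, 50)              # coarse grid K for storing V
V = np.zeros(len(grid))

for it in range(2000):
    V_new = np.empty_like(V)
    for i, k in enumerate(grid):
        kp = np.linspace(1e-6, k**alpha - 1e-6, 2001)   # k' may leave the grid K
        # V-bar: linearly interpolate the current iterate between grid points
        V_new[i] = (np.log(k**alpha - kp) + beta * np.interp(kp, grid, V)).max()
    diff = np.abs(V_new - V).max()
    V = V_new
    if diff < 1e-7:
        break
```

Even with only 50 grid points, the interpolated solution tracks the model's known closed-form value function reasonably closely.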

9 From the steady state towards explicit dynamics


Once we have “solved” a deterministic infinite-horizon dynamic model, typically the first thing we will do is
look to characterize its steady state. This is not hard: just find a recursive formulation of the solution (i.e.,
one or more difference equations pinning it down), and then substitute, for each time-varying value x,
x∗ ≡ · · · = xt−1 = xt = xt+1 = · · · .

Typically, the requisite recursive formulation will come from the intereuler(s).
Once we have characterized a steady state, we may
• Evaluate comparative statics of the steady state with regard to exogenous parameters,
• Draw conclusions about dynamics following small deviations from the steady state, or

• Explicitly characterize the system’s dynamics.


How does this last activity relate to identifying a steady state? In a sense, it does not. In fact, we have already
been investigating transition dynamics without solving for a steady state (for example, solving for policy
functions by “guessing and verifying” or using numerical algorithms). A different approach recognizes that a
major challenge in analytically investigating dynamics comes from the fact that they are nonlinear—consider
for example the second-order difference equation

u′(f (kt ) − kt+1 ) = βf ′(kt+1 ) u′(f (kt+1 ) − kt+2 )

for arbitrary u and f . One way of making this system more tractable is to consider instead a linear
approximation of this difference equation:

kt+2 ≈ α0 + α1 kt+1 + α2 kt

for some appropriately chosen α0 , α1 , α2 ∈ R. We generally conduct such approximations about the steady
state (which is usually the only point on g(·) we can find analytically).
There will be more talk of approximating dynamic systems soon. When we get there, several mathematical
tools will be essential.

9.1 Taylor approximation (a.k.a. “linearization”)


Consider a differentiable real-valued function on (some subset of) Euclidean space, g : Rn → R. The function
can be approximated in the region around some arbitrary point x∗ ∈ Rn by its tangent hyperplane.
If g : R → R, this approximation takes the form7

g(x) ≈ g(x∗ ) + g ′(x∗ )(x − x∗ ).

If a system evolves according to xt = g(xt−1 ), we get a "linearized" system that evolves according to
xt ≈ g(x∗ ) + g ′(x∗ )(xt−1 − x∗ ). If x∗ is the system's steady state, x∗ = g(x∗ ); at this point, the approximation
is perfect.
The slope g ′(x∗ ) tells us about the speed of convergence of the linearized system: if g ′(x∗ ) = 0, then
the approximation tells us that xt ≈ g(x∗ ) = x∗ for all xt−1 ; convergence is instantaneous. In contrast, if
g ′(x∗ ) = 1, then the approximation tells us that xt ≈ g(x∗ ) + xt−1 − x∗ = xt−1 for all xt−1 , and there is no
convergence towards x∗ whatsoever.
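To see these claims in numbers, take the policy function g(k) = αβk^α from the log-utility growth model (an example borrowed for illustration; one can check by differentiating that g ′(k∗ ) = α at the steady state):

```python
import numpy as np

alpha, beta = 0.3, 0.95
g = lambda k: alpha * beta * k ** alpha        # true (nonlinear) law of motion
k_star = (alpha * beta) ** (1 / (1 - alpha))   # steady state solves k* = g(k*)
slope = alpha                                   # g'(k*) works out to alpha here

k_true = k_lin = 0.9 * k_star                  # start 10% below the steady state
for _ in range(10):
    k_true = g(k_true)                          # exact dynamics
    k_lin = k_star + slope * (k_lin - k_star)   # linearized dynamics

# both paths converge to k*, with the deviation from k* shrinking by
# roughly the factor g'(k*) = alpha each period
```

Since 0 < α < 1, both paths converge monotonically; a slope of 0 or 1 would reproduce the instantaneous-convergence and no-convergence cases discussed above.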

9.2 Eigenvectors and eigenvalues


An eigenvector p (≠ 0) of a square matrix W has an associated eigenvalue λ if W p = λp. If you think of
left-multiplication by the matrix W as representing some linear transformation in Euclidean space (rotation,
reflection, stretching/compression, shear, or any combination of these), W ’s eigenvectors point in directions
that are unchanged by the transformation, while eigenvalues tell us how much these vectors’ magnitudes
change.
7 If g : Rn → R, this approximation takes the form g(x) ≈ g(x∗ ) + Σni=1 gi′ (x∗ )(xi − x∗i ) = g(x∗ ) + ∇g(x∗ ) · (x − x∗ ), where · is the vector dot product operator.

How do we find eigenvectors and eigenvalues? Eigenvector p is associated with eigenvalue λ if
W p = λp
W p − λp = 0
(W − λI)p = 0 (9)

where I is the identity matrix. To solve for the eigenvalues, note that this equation can be satisfied if and
only if W − λI is singular (i.e., non-invertible), or

det[W − λI] = 0.
This is called the characteristic equation; the left-hand side is a polynomial in λ whose order is the dimension
of W . Unfortunately, for polynomials of degree exceeding four, there is no general solution in radicals (per
the Abel-Ruffini Theorem). Fortunately, there are other ways to calculate eigenvalues and you are unlikely to
need to calculate eigenvalues by hand for a matrix larger than 2 × 2.8 If you don’t remember the quadratic
formula or how to find the determinant of a 2 × 2 matrix, now would be a good time to remind yourself.
After solving for the eigenvalues, you can find the eigenvectors using W p = λp. Of course, p will not be
pinned down entirely: if p is an eigenvector, then αp is also an eigenvector for any α ≠ 0. In practice, people
sometimes set the first element of each eigenvector equal to 1 (as long as the eigenvector does not have a 0 in
that entry) and solve for the rest of the vector.9
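For instance (a made-up symmetric 2 × 2 matrix), the by-hand recipe and a numerical routine agree:

```python
import numpy as np

W = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# characteristic equation: (w11 - lam)(w22 - lam) - w12 w21 = 0,
# i.e. lam^2 - (trace W) lam + det W = 0; here lam^2 - 4 lam + 3 = 0
lam_hand = np.roots([1.0, -np.trace(W), np.linalg.det(W)])

lam_np, P = np.linalg.eig(W)                   # numerical eigenvalues/eigenvectors

# eigenvector check: W p = lam p, with the first entry normalized to 1;
# solving (W - 3I) p = 0 by hand gives p = (1, 1)
p = np.array([1.0, 1.0])
```

The quadratic factors as (λ − 1)(λ − 3), so the eigenvalues are 1 and 3; any rescaling of p is also an eigenvector, which is exactly the normalization issue noted above.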
Theorem 12 (Eigen Decomposition Theorem). Consider a square matrix W ; denote its k distinct eigenvectors
p1 , . . . , pk and the associated eigenvalues λ1 , . . . , λk . Let P be the matrix containing the eigenvectors as
columns and Λ be the diagonal matrix with the eigenvalues on the diagonal:
 
P ≡ [p1 p2 · · · pk ]    Λ ≡ diag(λ1 , λ2 , . . . , λk ).
If P is a square matrix, W can be decomposed into
W = P ΛP −1 .
Proof.

P Λ = [λ1 p1 · · · λk pk ] = [W p1 · · · W pk ] = W P,

where the first and third equalities follow from the matrix multiplication algorithm, and the second from the
definition of an eigenvector and its associated eigenvalue. Postmultiplying by P −1 completes the proof.
We will use this result extensively to analyze dynamic systems characterized by linear difference equations
(where the linearity typically arises through linear approximation around the steady state). Here’s how:
suppose that a dynamic system evolves according to
   
xt+1 = W xt ,   where xt ≡ [kt+1 kt ]′ and xt+1 ≡ [kt+2 kt+1 ]′.

8 In the 2 × 2 case, the characteristic equation is det [[w11 − λ, w12 ], [w21 , w22 − λ]] = 0, or (w11 − λ)(w22 − λ) − w21 w12 = 0. Note that there may be zero, one, or two real solutions to this equation.
9 A complete specification of this normalization is that the first non-zero element of each eigenvector must be 1. Other options include requiring that Σi pi = 1, or that ‖p‖ ≡ (Σi p2i )1/2 = 1.

We therefore have that xt = W t x0 . Off the bat, we might not have much to say about the matrix W raised
to a high power. However, using the eigen decomposition of W gives

xt = (P ΛP −1 )t x0 = P Λ(P −1 P )Λ(P −1 P )Λ · · · ΛP −1 x0 = P Λt P −1 x0 ,

where the structure of Λ makes it very easy to calculate Λt :

Λt = diag(λ1 , . . . , λk )t = diag(λt1 , . . . , λtk ).

If all the eigenvalues of W have magnitude less than one, then xt = W t x0 = P Λt P −1 x0 forms a
convergent process no matter what x0 = [k1 k0 ]′ we start from; systems like this are sometimes called
"sinks." However, if W has an eigenvalue with magnitude greater than one (an "exploding" eigenvalue), the
system only converges if x0 is such that the element of P −1 x0 corresponding to the exploding eigenvalue
is zero. Such a system is said to be a "source" (if all eigenvalues are explosive) or to have a "saddle path"
(if only some are). More on all this is still to come.
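A numerical sketch of the "sink" case (the coefficients a1, a2 below are invented so that both eigenvalues lie inside the unit circle):

```python
import numpy as np

a1, a2 = 1.1, -0.18                            # hypothetical difference-equation coefficients
W = np.array([[a1, a2],                        # k_{t+2} = a1 k_{t+1} + a2 k_t,
              [1.0, 0.0]])                     # written as x_{t+1} = W x_t

lam, P = np.linalg.eig(W)                      # characteristic roots: 0.9 and 0.2
x0 = np.array([1.0, 1.0])                      # x0 = (k1, k0)'

t = 20
x_direct = np.linalg.matrix_power(W, t) @ x0               # brute-force W^t x0
x_eig = P @ np.diag(lam ** t) @ np.linalg.inv(P) @ x0      # P Lambda^t P^{-1} x0
```

Both routes give the same xt; since both eigenvalues have magnitude below one, the path decays to zero from any x0. Making one coefficient large enough to push an eigenvalue outside the unit circle would instead produce the source/saddle-path behavior described above.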

10 Introducing competitive equilibrium


A single agent—typically the “social planner”—has gotten us thus far, but most economic models have
multiple agents. For multiple agents’ presence to have any real significance, these agents will need to interact
with each other; in macroeconomic models, this interaction usually takes place in a market where
• Pricing is linear (e.g., there are no quantity discounts),
• Pricing is anonymous (i.e., the price paid by one buyer is paid by all buyers and the price received by
one seller is received by all sellers),

• All agents are price-takers,


• The price paid is the price received, and
• The quantity sold is the quantity bought.

In investigating a multi-agent model, we will look for competitive equilibrium. Since the meaning of
a competitive equilibrium can depend on the economic environment, typically the first thing we do is to
define one. A canonical definition is that a competitive equilibrium comprises prices and quantities (or
allocations) such that
1. All agents maximize given the prices (and any other parameters they cannot control) they face,

2. Markets clear (i.e., prices paid are prices received and quantities sold are quantities bought), and
3. Physical resource constraints are satisfied.

11 Comparing planners’ outcomes with competitive equilibria


Consider a set of economies, each of which has a single perishable good, and a mass of agents with utility
function U ({ct }∞t=0 ) = Σt β^t u(ct ), each of whom receives a constant endowment each period equal to y.

11.1 Asset-free economy
This economy has no assets. It therefore does not matter whether we consider a social planner problem or
a competitive equilibrium: there are no prices, and the economy’s resource constraint is equivalent to the
representative agent’s budget constraint. The problem of the economy is therefore to
max{ct }t Σt β^t u(ct )   s.t. ct = y, ∀t.

The solution is trivial: c∗t = y for all t.


We also know how to write this problem as something like a functional equation, although the value
“function” has no arguments (since there are no state variables):

V = u(y) + βV, so V = u(y)/(1 − β).

11.2 Partial equilibrium asset market


In this economy, you can borrow or save money in a bank that exists outside the economy and pays/charges
gross interest rate Rt (known in the previous period). The only price in the economy is exogenous, and the
economy has no aggregate resource constraint. Thus it again does not matter whether we consider a social
planner problem or a competitive equilibrium. The problem of the economy is to
max{ct ,at+1 }t Σt β^t u(ct )   s.t.
ct + at+1 = y + Rt at , ∀t;
a0 = 0; and
an appropriate TVC.

To solve the model, we set up a Lagrangian for the representative agent's problem

L = Σt β^t u(y + Rt at − at+1 ),

where the argument of u is ct , and use the FOCs to generate the intereuler: u′(ct ) = βRt+1 u′(ct+1 ) for all t.
We can also form a functional equation as long as the interest rates are constant (Rt = R for all t):

V (a) = maxa′ [u(y + Ra − a′) + βV (a′)].

11.3 General equilibrium asset markets: social planner


In this economy, you can borrow or save money, but only with other agents inside the economy. That means
that one agent can only borrow when another lends her his savings. Let the gross interest rate (known in the
previous period) be Rt , which now arises endogenously to clear the financial market.
We first consider a social planner in this economy. Market clearing means that at = 0 for all t, and the
economy-wide resource constraint means ct = y for all t. We are effectively back in the asset-free economy.

11.4 General equilibrium sequential asset markets: competitive equilibrium


This economy is almost the same as the last one, but each period t, a competitive market for financial
claims determines the gross interest rate Rt+1 between the current and subsequent periods. A competitive
equilibrium comprises allocations {ct , at+1 }t and prices {Rt+1 }t such that

• Agents maximize: {ct , at+1 }t solve
max{ct ,at+1 }t Σt β^t u(ct )   s.t.
ct + at+1 = y + Rt at , ∀t;
a0 = 0; and
an appropriate TVC.

This gives the same intereuler as in the partial equilibrium asset market: u′(ct ) = βRt+1 u′(ct+1 ) for
all t. Note that this will always be the case; individual agents are price-takers, so there is no
difference (yet) between GE and PE.
• Markets clear: at = 0 for all t.
• The resource constraint holds: ct = y for all t.
By either market clearing and the agents' budget constraint, or by the economy's resource constraint, we
have that u′(ct+1 ) = u′(ct ) for all t. Thus the intereuler pins down the equilibrium interest rate Rt = β −1 for
all t ≥ 1.
Note that the allocations are the same as the social planner’s solution under general equilibrium asset
markets. This is a direct implication of the First Welfare Theorem.

11.5 General equilibrium Arrow-Debreu asset market: competitive equilibrium


This economy is almost the same as the last one, but instead of trading financial claims each period (one
period ahead of the contracted delivery), all trading takes place before the economy starts. In other words, at
time −1, all the agents buy and sell promises to deliver units of the consumption good at each future period.
Let the price of one consumption unit in period t be pt . We could think of measuring the number of contracts
each agent buys and sells for each period, but it is easier just to measure her consumption, which pins down
her contract position (given y).
A competitive equilibrium thus comprises allocations {ct }t and prices {pt }t such that
• Agents maximize: {ct }t solve
max{ct }t Σt β^t u(ct )   s.t.
Σt pt ct = Σt pt y; and
an appropriate TVC.

The Lagrangian

L = Σt β^t u(ct ) + λ Σt pt (y − ct )

gives FOCs β^t u′(ct ) = λpt for all t.
• Markets clear: y − ct = 0 for all t.
• The resource constraint holds: ct = y for all t.
By either market clearing or by the economy's resource constraint, we have that u′(ct ) = u′(y) for all
t. Thus the FOCs imply that pt ≡ β^t ρ for all t, where ρ ≡ u′(y)/λ. (This just gives our one allowable
normalization in prices.)
What are equilibrium interest rates? They are Rt = pt−1 /pt = β −1 , just as we would expect. Again, the
allocations are the same as the social planner's solution under general equilibrium asset markets.

12 Pareto Optimality
Definition 13. An allocation is Pareto optimal if it is:
1. Feasible: the sum of consumptions is less than or equal to the total endowment; and
2. Pareto: it is not possible to make any person better off without making at least one other person worse
off.

The sections below consider two ways of modelling an economy: as a “Pareto problem,” and as a social
planner problem. We argue that the solutions are the same, and further note that the First Welfare Theorem
ensures that any competitive equilibrium can also be found as the solution to a Pareto or social planner
problem. We will not actually have too much more to say in this class about the Fundamental Welfare
Theorems,10 but there is further treatment in the general equilibrium section of Economics 202/202n.
Keep in mind that these results (as usual) rely on the concavity of the utility function.

12.1 A simple Pareto problem


There is a single period in which a quantity Y of a consumption good must be shared between I > 2 agents,
each with a utility function that satisfies the Inada conditions. The Pareto problem can be specified as:

max{ci }Ii=1 u1 (c1 )   s.t.
ui (ci ) ≥ u∗i , ∀i ≥ 2;
ci ≥ 0, ∀i ≥ 1;
ΣIi=1 ci ≤ Y.

The first set of constraints can be thought of as “promise-keeping” constraints: the Pareto optimizer seeks
to maximize the utility of agent one conditional on promises that she has made to deliver utility of at least
u∗i to each other agent i ≥ 2.
At the optimum, we will clearly have

c∗i ≡ c∗i (u∗i ) = inf x {x | ui (x) ≥ u∗i }

for i ≥ 2; that is, c∗i is the minimum level of consumption that i needs to achieve utility u∗i . By varying
{u∗i }i≥2 and solving the above optimization, we can identify the full set of Pareto optimal outcomes. Note
that under these assumptions, since utility is increasing, it must always be the case that ΣIi=1 c∗i = Y , so
that knowing {c∗i }i≥2 allows us to back out the optimal c∗1 = Y − Σi≥2 c∗i .
Setting up a Lagrangian (and ignoring non-negativity constraints as usual) gives
L ≡ u1 (c1 ) + ΣIi=2 θi [ui (ci ) − u∗i ] + ω [Y − ΣIi=1 ci ].

Note that the problem is well behaved: the choice set is convex, and the objective function is concave and
differentiable. Thus the FOCs characterize the solution. Taking FOCs,

u1′ (c∗1 ) = ω,
θi ui′ (c∗i ) = ω, ∀i ≥ 2.
10 The Second Welfare Theorem ensures that any solution to a Pareto or social planner problem can be “supported” as a

competitive equilibrium.

Thus

u1′ (c∗1 )/ui′ (c∗i ) = θi

for all i ≥ 2 (or for all i, if we define an extra parameter θ1 ≡ 1).

12.2 A simple social planner problem


As above, there is a single period in which a quantity Y of a consumption good must be shared between
I > 2 agents, each with a utility function that satisfies the Inada conditions. The social planner problem is to
maximize the weighted sum of the utility of all agents. That is, the social planner wants to
max{ci }Ii=1 ΣIi=1 λi ui (ci )   s.t.
ci ≥ 0, ∀i;
ΣIi=1 ci ≤ Y,

where the λs are the weights—exogenous to the problem—that the social planner places on each agent. Note
that only the ratios of the λs matter (we could double the value of each and leave the problem unchanged),
so without loss of generality we can normalize λ1 = 1.
Setting up a Lagrangian (and ignoring non-negativity constraints as usual) gives
L ≡ ΣIi=1 λi ui (ci ) + γ [Y − ΣIi=1 ci ].

As in the Pareto problem, this is well behaved (the choice set is convex, and the objective function is concave
and differentiable) so the FOCs characterize the solution:

λi ui′ (c∗i ) = γ, ∀i.

These imply that

u1′ (c∗1 )/ui′ (c∗i ) = λi

for all i (recalling the normalization λ1 = 1).
Thus we can achieve any Pareto optimal allocation in the social planner problem by choosing appropriate
weights: λi = θi . There is an equivalence between the two problems.
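A quick numerical check of the planner's FOCs (log utility and the weight values below are invented for the example):

```python
import numpy as np

Y = 10.0
lam = np.array([1.0, 2.0, 0.5])                # planner weights; lambda_1 normalized to 1
# with u_i(c) = log(c), the FOC lam_i / c_i = gamma plus the resource
# constraint sum(c) = Y give c_i = lam_i * Y / sum(lam)
c = lam * Y / lam.sum()
mu = 1.0 / c                                    # marginal utilities u_i'(c_i)
ratios = mu[0] / mu                             # u_1'(c_1) / u_i'(c_i) for each i
# each ratio equals lam_i, so hitting a given Pareto optimum is just a
# matter of choosing the weights to match the Pareto problem's multipliers
```

Scaling every weight by the same constant leaves c unchanged, which is the "only ratios matter" normalization noted above.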

13 Competitive equilibrium practice question


The question that follows was the longest question on the Fall 2006 midterm. Per the question's point value,
you might have expected to spend about 36 minutes on it. It did not test any dynamic programming or tricky
math, but it was long and required that you correctly answer the first part of the question before going on
to the later parts. This problem tests your ability to take FOCs, interpret the budget constraint, and do some
algebraic manipulation. It also tests something you have already been asked to do on your problem set and
in interpreting Nir's notes: read between the lines to fully specify a model, some details of which may have
been left out.

Question
Consider the neoclassical growth model with endogenous hours and government spending. That is, the
representative agent maximizes

Σ∞t=0 β^t U (Ct , Nt )
s.t. Ct + Kt+1 − (1 − δ)Kt + gt = Wt Nt + Rt Kt

and the representative firm produces output according to

Yt = F (Kt , Nt ).

The functions U and F are strictly increasing in each argument, strictly concave, differentiable and they
satisfy the Inada conditions. We also assume that the corresponding TVC holds.

1. Assume that
U (Ct , Nt ) = log(Ct ) − Nt^{1+χ}/(1 + χ)
and that the production function is Cobb-Douglas

F (Kt , Nt ) = Ktα Nt1−α .

(a) Derive the FOCs of the hh and the firm and explain each of the four equations you get (two for
the hh and two for the firm).
(b) Define a competitive equilibrium.
(c) Is this allocation Pareto-optimal? Give a short intuitive argument.
(d) Describe the steady state of the economy. Hint: you will not be able to find closed form solutions
for the endogenous variables. Use the intereuler condition to pin down the capital-labor ratio,
which you can define as X. Use X to simplify the intraeuler condition and the resource constraint
of the economy.
(e) Consider now a permanent increase in gt . In the new steady state (i.e., ignoring transition
dynamics), what is the effect of this change on the allocations in the economy (N , K, C, and Y )?
2. Now, assume that the momentary utility function is given by
U (Ct , Nt ) = log( Ct − Nt^{1+χ}/(1 + χ) ).

(a) Derive the FOCs for this economy (two equations for the hh; the firm problem did not change).
(b) Consider again a permanent increase in gt with this new utility function. In the new steady state
(i.e., ignoring transition dynamics), what is the effect of this change on the allocations in the
economy (N , K, C, and Y )?
(c) Compare the effect of changing g for the two cases with different utility functions. Provide intuition
for your result.

Solution
The first thing to do is figure out what’s going on in this economy. Common questions include,
1. Who owns capital?

2. What are consumers’ sources and uses of income?
3. What are firms’ expenses?
4. What happens to firms’ profit (i.e., who owns firms)?
5. In what units are prices measured?

6. What is the physical resource constraint of the economy?


Answers are not always given explicitly in the model, so you may need to use your judgment. In this
model, the answers to these questions are:
1. Households.

2. Sources: wages, rental income of capital, (depreciated) capital stock. Uses: consumption, future capital
stock, tax.
3. Wages and capital rental.

4. It’s not clear who owns firms—perhaps someone outside the economy? However, it will turn out that
the form of the production function ensures there are no profits. (This is not a coincidence; we will
often see this.)
5. Consumption goods. An equivalent way to state this is that the price of consumption is normalized to
one.

6. Ct + gt + Kt+1 = Yt + (1 − δ)Kt .
We now proceed with a solution. Note that the discussion below goes into significantly more depth of
explanation than would be expected on an examination. If you were short on time, you could still do
much of this problem quickly. You should be able to take the FOCs (1a), define a competitive equilibrium
(1b), have a quick discussion of Pareto optimality (1c), define a steady state and evaluate the FOCs at a
steady state (1d), take the new FOCs (2a), and mention/define income effects (2c).

1. (a) You should take the FOCs correctly. The rest of the problem relies on them, so you need to get
them correct. As Nir wrote in his solution set,
“Make sure not to lose points on the FOCs! This problem was certainly not easy in the
later parts, but very standard in its general setup. You should not lose any points and/or
time on questions like part 1a, 1b and 2a. Those alone gave you half the points for this
question.”
As for “explaining” them, most of the explanations are not interesting (“you need to balance
marginal this against marginal that, discounted by the interest rate and β”). The intuition should
be straightforward for each FOC, and you might consider spending most of your time on the
math rather than writing out long explanations of the FOCs. In fact, on Nir’s solutions, he didn’t
explain the FOCs at all. Do write something (something correct!), but you need not write more
than a sentence or two here.

Firm Problem We have to set up the firms’ problem, which is not given explicitly in the
problem; this is the first of several things that we will just have to use our judgment about.
Looking at the households’ budget constraints suggests that
• Households own capital, which they rent to firms at a price of Rt ,
• Firms hire workers at a wage of Wt , and
• The “units” in which these prices are measured are consumption units.

Thus a firm's revenue is the quantity it produces, $Y_t$ (since the sale price is one), and its cost is
$R_t K_t + W_t N_t$. Thus it solves
$$\max_{K_t, N_t} \; K_t^\alpha N_t^{1-\alpha} - R_t K_t - W_t N_t.$$

We have not included any non-negativity constraints, justified with an appeal to $F$'s satisfaction
of Inada conditions. The FOCs are below:
$$\text{[firm, labor]} \qquad W_t = (1-\alpha) K_t^\alpha N_t^{-\alpha}.$$
The firm should be willing to hire more workers if the wage is less than the worker's marginal
product ($W_t < (1-\alpha)K_t^\alpha N_t^{-\alpha}$) and should want to fire workers otherwise. Hence in equilibrium,
$W_t = (1-\alpha)K_t^\alpha N_t^{-\alpha}$.
$$\text{[firm, capital]} \qquad R_t = \alpha K_t^{\alpha-1} N_t^{1-\alpha}.$$
The firm should be able to rent more capital if the rental price is less than the marginal product
of capital and should want to get rid of capital ("rent it to the market") otherwise. Hence in
equilibrium, $R_t = \alpha K_t^{\alpha-1} N_t^{1-\alpha}$.

Household problem Each household seeks to
$$\max_{\{K_{t+1}, N_t\}_t} \; \sum_{t=0}^{\infty} \beta^t \Bigg[ \log\Big( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} \Big) - \frac{N_t^{1+\chi}}{1+\chi} \Bigg]$$
given $K_0$ (and again ignoring non-negativity) or, writing out the terms around a given period $t$,
$$\cdots + \beta^t \Bigg[ \log\Big( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} \Big) - \frac{N_t^{1+\chi}}{1+\chi} \Bigg] + \beta^{t+1} \Bigg[ \log\Big( \underbrace{W_{t+1} N_{t+1} + R_{t+1} K_{t+1} - K_{t+2} - g_{t+1} + (1-\delta)K_{t+1}}_{C_{t+1}} \Big) - \frac{N_{t+1}^{1+\chi}}{1+\chi} \Bigg] + \cdots.$$

Taking FOCs, we get that
$$\text{[hh, capital]} \qquad \frac{1}{C_t} = \beta \frac{1}{C_{t+1}} \big( R_{t+1} + (1-\delta) \big).$$
This is the standard intertemporal Euler equation (intereuler). The LHS is the marginal utility
of consumption today; the RHS is the marginal utility of consumption tomorrow, discounted by
$\beta$ and multiplied by the effective interest rate, $R_{t+1} + (1-\delta)$, which can be thought of as the
marginal rate of transformation between consumption today and consumption tomorrow.

$$\text{[hh, labor]} \qquad N_t^\chi C_t = W_t \quad \Longleftrightarrow \quad N_t^\chi = \frac{1}{C_t} W_t.$$
This is the standard within-period Euler equation (intraeuler). Working a bit more gets you $W_t$ which,
multiplied by the marginal utility of consumption, $\frac{1}{C_t}$, gives you the utility gain from working a
bit more. The household needs to balance this against the marginal disutility from working, $N_t^\chi$.
(b) A competitive equilibrium is prices $\{W_t, R_t\}_{t=0}^\infty$ and allocations $\{C_t, N_t^s, K_t^s, N_t^d, K_t^d, Y_t\}_{t=0}^\infty$ such
that

i. Given prices $\{W_t, R_t\}_{t=0}^\infty$, allocations $\{C_t, N_t^s, K_{t+1}^s\}_{t=0}^\infty$ solve the household problem;
ii. Given prices $\{W_t, R_t\}_{t=0}^\infty$, allocations $\{N_t^d, K_t^d\}_{t=0}^\infty$ solve the firm problem;
iii. Markets clear: $N_t^s = N_t^d \equiv N_t$ and $K_t^s = K_t^d \equiv K_t$ for all $t$; and
iv. The resource constraint is satisfied: $C_t + g_t + K_{t+1} = Y_t + (1-\delta)K_t$ for all $t$.
(c) There are no externalities and no market failures, so a CE is Pareto Optimal by the First
Fundamental Welfare Theorem. We could also formally show that the FOC of the social planner
is the same, but since Nir asked for a “short intuitive” answer, this is beyond the scope of the
question. Note that reducing government spending is not a feasible Pareto Improvement, as g is an
exogenous parameter here. You lost points if you said that reducing g was a Pareto Improvement.
That argument, that reducing g makes everyone better off, would be analogous to saying that we
can have a Pareto Improvement by lowering the depreciation rate δ.
(d) Recall the intereuler
$$\frac{1}{C_t} = \beta \frac{1}{C_{t+1}} \big( R_{t+1} + (1-\delta) \big).$$
In the steady state, $C_t = C_{t+1}$, so
$$\frac{1}{\beta} - (1-\delta) = R_* = \alpha \left( \frac{K_*}{N_*} \right)^{\alpha-1}$$
(where the second equality comes from the firm's FOC with respect to capital), or
$$X_* \equiv \frac{K_*}{N_*} = \left( \frac{1 - \beta(1-\delta)}{\alpha\beta} \right)^{\frac{1}{\alpha-1}}.$$

Thus the capital/labor ratio $X_*$ is pinned down by $\beta$, $\delta$, and $\alpha$; this pins down $R_* = \alpha X_*^{\alpha-1}$.
From the intraeuler and the firm's FOC with respect to labor, we have that
$$C_* N_*^\chi = W_* = (1-\alpha) X_*^\alpha.$$
From the economy's resource constraint, at steady state
$$C_* + g_* + K_* = K_*^\alpha N_*^{1-\alpha} + (1-\delta)K_*$$
$$C_* + g_* + \delta K_* = K_*^\alpha N_*^{1-\alpha} = X_*^\alpha N_*.$$
Substituting in $K_* = X_* N_*$ and $C_* = (1-\alpha) X_*^\alpha N_*^{-\chi}$ gives that
$$(1-\alpha) X_*^\alpha N_*^{-\chi} + g_* + \delta X_* N_* = X_*^\alpha N_*.$$
This pins down $N_*$, from which we can also solve $K_*$ and $C_*$.


(e) Suppose now that $g_*$ increases. Note that the intereuler does not change; the capital/labor ratio
$X_*$ remains fixed. Recall from the last part that
$$g_* = (X_*^\alpha - \delta X_*) N_* - (1-\alpha) X_*^\alpha N_*^{-\chi}.$$
If $g_*$ increases,
• Labor $N_*$ increases as well, since the right-hand side of the equation above is increasing in $N_*$ (check this!¹¹),
¹¹ There are two terms on the right-hand side. Notice the second term ($-(1-\alpha)X_*^\alpha N_*^{-\chi}$) is increasing in $N_*$, and is negative.
Since the left-hand side of the equation is positive, the right-hand side must be as well, so its first term ($(X_*^\alpha - \delta X_*)N_*$) must
be positive, and therefore increasing in $N_*$.

• Capital $K_* = X_* N_*$ increases (by the same proportion as labor),
• Output $Y_* = K_*^\alpha N_*^{1-\alpha}$ increases (by the same proportion as labor and capital), and
• Consumption $C_* \propto N_*^{-\chi}$ decreases.
We can think of this as the (long-run) effect of an ongoing lump-sum tax.
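These comparative statics can be checked numerically. Below is a minimal Python sketch; the parameter values ($\beta = 0.96$, $\delta = 0.1$, $\alpha = 0.33$, $\chi = 1$, and the two levels of $g_*$) are hypothetical, chosen only for illustration. It solves the equation pinning down $N_*$ by bisection:

```python
# Solve the steady state of the model with income effects and compare two
# levels of government spending. All parameter values are hypothetical.
beta, delta, alpha, chi = 0.96, 0.10, 0.33, 1.0

# Capital/labor ratio from the intereuler and the firm's capital FOC.
X = ((1 - beta * (1 - delta)) / (alpha * beta)) ** (1 / (alpha - 1))

def steady_state(g):
    """Solve (1-a) X^a N^(-chi) + g + delta X N = X^a N for N by bisection."""
    def resid(N):
        return X**alpha * N - delta * X * N - (1 - alpha) * X**alpha * N**(-chi) - g
    lo, hi = 1e-6, 100.0           # resid is increasing in N on this bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if resid(mid) < 0 else (lo, mid)
    N = 0.5 * (lo + hi)
    K = X * N                                  # K* = X* N*
    C = (1 - alpha) * X**alpha * N**(-chi)     # from the intraeuler
    Y = K**alpha * N**(1 - alpha)
    return N, K, C, Y

N0, K0, C0, Y0 = steady_state(g=0.2)
N1, K1, C1, Y1 = steady_state(g=0.3)
print(N1 > N0, K1 > K0, Y1 > Y0, C1 < C0)  # labor, capital, output rise; consumption falls
```

The bisection works because, as the footnote argues, the right-hand side of the equation for $g_*$ is increasing in $N_*$ at any admissible steady state.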
2. (a) Here, the household's problem is
$$\max_{\{K_{t+1}, N_t\}_t} \; \sum_{t=0}^{\infty} \beta^t \log\Bigg( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} - \frac{N_t^{1+\chi}}{1+\chi} \Bigg)$$

or, writing out the terms around a given period $t$,
$$\cdots + \beta^t \log\Bigg( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} - \frac{N_t^{1+\chi}}{1+\chi} \Bigg) + \beta^{t+1} \log\Bigg( \underbrace{W_{t+1} N_{t+1} + R_{t+1} K_{t+1} - K_{t+2} - g_{t+1} + (1-\delta)K_{t+1}}_{C_{t+1}} - \frac{N_{t+1}^{1+\chi}}{1+\chi} \Bigg) + \cdots.$$

With this utility function the FOCs are
$$\text{[hh, capital]} \qquad \left( C_t - \frac{N_t^{1+\chi}}{1+\chi} \right)^{-1} = \beta \left( C_{t+1} - \frac{N_{t+1}^{1+\chi}}{1+\chi} \right)^{-1} \big( R_{t+1} + (1-\delta) \big)$$
and
$$\text{[hh, labor]} \qquad N_t^\chi = W_t.$$
(b) The intereuler here gives the same steady-state result as with the other utility function: $\beta^{-1} - (1-\delta) = R_*$. Since the firm's FOCs are the same, we get exactly the same capital/labor ratio:
$$X_* \equiv \frac{K_*}{N_*} = \left( \frac{1 - \beta(1-\delta)}{\alpha\beta} \right)^{\frac{1}{\alpha-1}}.$$
From the intraeuler and the firm's FOC with respect to labor, we have that
$$N_*^\chi = W_* = (1-\alpha) X_*^\alpha.$$

If $g_*$ increases,
• Labor $N_*$ remains fixed, since $X_*$ does,
• Capital $K_* = X_* N_*$ remains fixed,
• Output $Y_* = K_*^\alpha N_*^{1-\alpha}$ remains fixed, and
• Consumption $C_* = K_*^\alpha N_*^{1-\alpha} - \delta K_* - g_*$ (from the economy's resource constraint) falls by
the same amount that government expenditures increase.
(c) Can we say something about the fall in C∗ in the two models? Yes! Since output increased in the
economy with income effects (the first economy) but remained constant in the second, the same
change in g∗ will result in a smaller consumption decline in the first model.12 In the model with
income effects, the influence of the lump sum tax on consumption is partly offset by the increasing
supply of capital and labor. Hence for every dollar the government taxes, consumption falls by less
¹² To be more precise, we should consider not output, but $Y_* - \delta K_*$. However, since $Y_*$ and $K_*$ increase by the same fraction,
the difference must increase as long as it was positive to begin with.

than one dollar in the first example. In the second example, there are no income effects. Having
the government tax a household one dollar causes consumption to fall by one dollar.
Note that the first model has income effects in the sense that a change in consumption changes
the within-period problem, illustrated by the intraeuler $C_t N_t^\chi = W_t$. Increasing $C_t$ must make $N_t$
fall; hence hours depend on income and there is an income effect.
In the second model, in contrast, increasing consumption does not change the household's within-period
problem, per the intraeuler $N_t^\chi = W_t$. Given a saving rate, the household must each period
maximize
$$\log\left( C_t - \frac{N_t^{1+\chi}}{1+\chi} \right),$$
which is the same as simply maximizing $C_t - \frac{N_t^{1+\chi}}{1+\chi}$. Since the within-period decision is quasi-linear
in consumption, there are no income effects.

14 Constant returns to scale production


In the question above, we noted that firms earned no profits. This was ensured by the Cobb-Douglas
production function, and in particular the fact that it demonstrates constant returns to scale (CRS, or
constant RTS).
Definition 14. A production function $f$ exhibits constant returns to scale if
$$f(\alpha k, \alpha n) = \alpha f(k, n)$$
for all $\alpha > 0$.
One reason that CRS technologies are attractive for economic models is that if firms exhibit increasing
RTS (f (αk, αn) > αf (k, n) for α > 1), multiple firms will always want to combine and if they exhibit
decreasing RTS (f (αk, αn) < αf (k, n) for α > 1), firms will always want to split into a multiplicity of smaller
firms. It is also the case that at a competitive equilibrium, firms with CRS production functions will earn
zero profits, ensuring there is no incentive for firms to enter or exit the market.
Claim 15. If $f(\alpha k, \alpha n) = \alpha f(k, n)$ for all $\alpha > 0$, then
$$\max_{k,n} \; f(k, n) - rk - wn = 0.$$

Proof. Consider the definition of CRS:
$$\alpha f(k, n) = f(\alpha k, \alpha n).$$
Differentiating with respect to $\alpha$,
$$f(k, n) = \frac{\partial f}{\partial k}(\alpha k, \alpha n) \cdot k + \frac{\partial f}{\partial n}(\alpha k, \alpha n) \cdot n.$$
Since this must hold for all $\alpha > 0$, $k$, and $n$, we can plug in $\alpha = 1$, $k = k_*$, and $n = n_*$:
$$f(k_*, n_*) = \frac{\partial f}{\partial k}(k_*, n_*) \cdot k_* + \frac{\partial f}{\partial n}(k_*, n_*) \cdot n_*.$$
By the first-order conditions for the maximization problem, $\frac{\partial f}{\partial k}(k_*, n_*) = r$ and $\frac{\partial f}{\partial n}(k_*, n_*) = w$, so
$$f(k_*, n_*) = r k_* + w n_*$$
$$\underbrace{f(k_*, n_*) - r k_* - w n_*}_{= \max_{k,n} f(k,n) - rk - wn} = 0.$$

A production function demonstrates CRS if and only if it is homogeneous of degree one, where a function
$f$ is said to be homogeneous of degree $k$ if
$$f(\alpha x) = \alpha^k f(x),$$
where $x$ may be multidimensional. Using techniques very similar to those above, we could prove
Theorem 16 (Euler's Law). If $f$ is homogeneous of degree $k$,
$$\nabla f(x) \cdot x = k f(x)$$
for all $x$.

The following result is often cited as a corollary of Euler's Law:
Theorem 17. If $f$ is homogeneous of degree one, then $\nabla f$ is homogeneous of degree zero.
Proof. Homogeneity of degree one means that
$$f(\lambda p) = \lambda f(p).$$
Taking the derivative with respect to $p$,
$$\lambda \nabla f(\lambda p) = \lambda \nabla f(p)$$
$$\nabla f(\lambda p) = \nabla f(p).$$
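Both Claim 15 and Theorem 17 can be illustrated numerically for a Cobb-Douglas technology, using finite-difference marginal products (the parameter value and evaluation points below are arbitrary):

```python
# Verify numerically that paying each factor its marginal product exhausts
# output for f(k, n) = k^alpha n^(1-alpha), which is homogeneous of degree one.
alpha = 0.3

def f(k, n):
    return k**alpha * n**(1 - alpha)

def marginal_products(k, n, h=1e-6):
    # central finite differences for f_k and f_n
    fk = (f(k + h, n) - f(k - h, n)) / (2 * h)
    fn = (f(k, n + h) - f(k, n - h)) / (2 * h)
    return fk, fn

k, n = 2.0, 3.0
r, w = marginal_products(k, n)     # competitive factor prices r = f_k, w = f_n
profit = f(k, n) - r * k - w * n   # Euler's Law: f = f_k k + f_n n, so profit = 0
print(abs(profit) < 1e-6)

# Theorem 17: the gradient is homogeneous of degree zero, so scaling (k, n)
# leaves marginal products (and hence factor prices) unchanged.
r2, w2 = marginal_products(2 * k, 2 * n)
print(abs(r2 - r) < 1e-6 and abs(w2 - w) < 1e-6)
```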

15 Continuous-time optimization
Much of these notes comprises (what I hope is) a slightly clearer version of the lecture notes' treatment of the
same subject.
The key notation we will use in the following is that “dotted” variables represent derivatives with respect
to time (ẋ ≡ ∂x/∂t).

15.1 Finite-horizon
Consider the following maximization problem:
$$V(0) = \max_{c \colon [0,T] \to \mathbb{R}} \int_0^T v\big(k(t), c(t), t\big)\,dt \quad \text{s.t.}$$
$$\dot{k} \le g\big(k(t), c(t), t\big), \ \forall t;$$
$$k(0) = k_0 > 0 \text{ given; and}$$
$$\text{a no-Ponzi condition.}$$

Note that capital is a state variable and consumption is a control. The first constraint is the "transition
equation" or "equation of motion" for the state variable $k$. The no-Ponzi condition ensures that capital at
the end of time, $k(T)$, cannot be "too negative."¹³
First, imagine setting up a Lagrangian
$$\mathcal{L} = \int_0^T v\big(k(t), c(t), t\big)\,dt + \int_0^T \mu(t) \Big[ g\big(k(t), c(t), t\big) - \dot{k} \Big]\,dt + \gamma k(T) e^{-T \bar{r}(T)}. \tag{10}$$
¹³ In this problem, the no-Ponzi condition takes the form $k(T)\exp\big({-T\bar{r}(T)}\big) \ge 0$, where $\bar{r}(t)$ is the average interest rate from
time 0 to $t$. This implies that assets at time $T$ are weakly positive.

When we have a countable number of constraints, we multiply each by a Lagrange multiplier and then add
these; here the transition equation gives a continuum of constraints, each of which we multiply by a Lagrange
multiplier µ(t) and then integrate across these.
If we try to solve this Lagrangian using typical methods, we face difficulty due to the presence of k̇; thus
we attempt to eliminate it. Note that
$$\int_0^T \big( \dot{\mu}\, k(t) + \dot{k}\, \mu(t) \big)\,dt = \int_0^T \frac{d}{dt}\big( \mu(t) k(t) \big)\,dt = k(T)\mu(T) - k(0)\mu(0),$$
where the first equality follows from the product rule ($\frac{d}{dt}(\mu k) = \dot{\mu}k + \dot{k}\mu$) and the second from the Fundamental
Theorem of Calculus. Hence
$$\int_0^T \dot{k}\, \mu(t)\,dt = k(T)\mu(T) - k(0)\mu(0) - \int_0^T \dot{\mu}\, k(t)\,dt.$$

We can substitute this into equation 10, giving
$$\mathcal{L} = \int_0^T \Big[ \underbrace{v\big(k(t), c(t), t\big) + \mu(t) g\big(k(t), c(t), t\big)}_{\equiv H(k,c,t,\mu)} \Big]\,dt + \int_0^T \dot{\mu}\, k(t)\,dt + k(0)\mu(0) - k(T)\mu(T) + \gamma k(T) e^{-T \bar{r}(T)}.$$
With the Hamiltonian defined as above, this is
$$= \int_0^T \Big[ H(k, c, t, \mu) + \dot{\mu}\, k(t) \Big]\,dt + k(0)\mu(0) - k(T)\mu(T) + \gamma k(T) e^{-T \bar{r}(T)}.$$

Finding the optimal paths


In general, maximizing over the choice of functions is difficult. Fortunately, Pontryagin proposed a method
that allows us to recast the problem in terms of a single variable, allowing us to use tools we already know to
solve the optimization problem.
Consider a path $c^*(t)$ (which pins down $k^*(t)$ per $\dot{k} = g\big(k(t), c(t), t\big)$ given $k(0)$) that solves the maximization
problem. That means that for any other feasible path, the objective function must be lower. We can express
all other feasible consumption paths as the sum of the optimal path $c^*$ and some "perturbation function":
$$c(t) = c^*(t) + \varepsilon p_c(t).$$
Similarly, we can define $p_k(\cdot)$ satisfying
$$k(t) = k^*(t) + \varepsilon p_k(t).$$
Checking the optimality of $c^*$ is equivalent to confirming that for any perturbation function $p_c(\cdot)$, the
objective function is (weakly) maximized at $\varepsilon = 0$. This requires that $d\mathcal{L}/d\varepsilon = 0$ for all functions $p_c(\cdot)$ and
$p_k(\cdot)$. Recasting our Lagrangian in terms of $\varepsilon$ gives
$$\mathcal{L}(\varepsilon, \ldots) = \int_0^T \Big[ H\big(k(\cdot,\varepsilon), c(\cdot,\varepsilon), t, \mu\big) + \dot{\mu}\, k(t,\varepsilon) \Big]\,dt + k(0,\varepsilon)\mu(0) - k(T,\varepsilon)\mu(T) + \gamma k(T,\varepsilon) e^{-T \bar{r}(T)}$$
$$\frac{d\mathcal{L}}{d\varepsilon} = \int_0^T \left[ \frac{dH}{d\varepsilon} + \dot{\mu} \frac{\partial k}{\partial \varepsilon} \right] dt + \frac{\partial k(T,\varepsilon)}{\partial \varepsilon} \Big( \gamma e^{-T \bar{r}(T)} - \mu(T) \Big)$$
Now, notice that the chain rule and definition of the perturbation functions gives
$$\frac{dH}{d\varepsilon} = \frac{\partial H}{\partial c} \cdot \frac{\partial c}{\partial \varepsilon} + \frac{\partial H}{\partial k} \cdot \frac{\partial k}{\partial \varepsilon} = \frac{\partial H}{\partial c} p_c(t) + \frac{\partial H}{\partial k} p_k(t)$$
and
$$\frac{\partial k(T,\varepsilon)}{\partial \varepsilon} = p_k(T).$$
Substituting in gives
$$\frac{d\mathcal{L}}{d\varepsilon} = \int_0^T \Big[ \underbrace{\frac{\partial H}{\partial c}}_{\equiv A} p_c(t) + \underbrace{\Big( \frac{\partial H}{\partial k} + \dot{\mu} \Big)}_{\equiv B} p_k(t) \Big]\,dt + p_k(T) \underbrace{\Big( \gamma e^{-T \bar{r}(T)} - \mu(T) \Big)}_{\equiv C}.$$

Optimality requires that this be zero when evaluated at $\varepsilon = 0$ for all $p_c$ and $p_k$. For this to hold, one
can show that this must hold component-by-component. (In general, it is not the case that $A + B + C = 0$
implies $A$, $B$, and $C$ are all zero; it is true here, however.) That is, we need
$$\frac{\partial H}{\partial c} = 0$$
$$\frac{\partial H}{\partial k} = -\dot{\mu}$$
$$\gamma e^{-T \bar{r}(T)} = \mu(T).$$
Collectively, these are known as "The Maximum Principle." The first two play the role of first-order conditions,
and the third is the no-Ponzi condition.¹⁴

15.2 Infinite time


When time is infinite, the problem becomes
$$V(0) = \max_{c \colon \mathbb{R}_+ \to \mathbb{R}} \int_{\mathbb{R}_+} v\big(k(t), c(t), t\big)\,dt \quad \text{s.t.}$$
$$\dot{k} \le g\big(k(t), c(t), t\big), \ \forall t;$$
$$k(0) = k_0 > 0 \text{ given; and}$$
$$\text{a no-Ponzi condition.}$$

The no-Ponzi condition is now ensured by a TVC: $\lim_{t\to\infty} k(t)\exp\big({-t\bar{r}(t)}\big) \ge 0$. This means that assets can
be negative everywhere, but that they cannot grow more quickly than the interest rate.
There is some discussion in Barro and Sala-i-Martin as to whether this TVC is really needed. They report
that in some circumstances, you may be able to solve the problem with a relaxed TVC. If you are interested,
consult the book.

15.3 The Hamiltonian “cookbook”


15.3.1 One control, one state variable
1. Construct the Hamiltonian. Define
$$H(k, c, t, \mu) \equiv \underbrace{v\big(k(t), c(t), t\big)}_{\text{felicity}} + \underbrace{\mu(t)}_{\text{multiplier}} \cdot \underbrace{g\big(k(t), c(t), t\big)}_{\text{RHS of transition eqn.}}.$$
¹⁴ Note that $\mu(t)$ measures the shadow price of capital at time $t$ (in utils). The condition that $\gamma \exp\big({-T\bar{r}(T)}\big) = \mu(T)$ means
that the (shadow) "cost of the no-Ponzi condition"—given by its multiplier $\gamma$ discounted back to the present—is equal to the
shadow cost of the capital stock available at the end of time. If, for some reason, the no-Ponzi condition did not bind, $\gamma = 0$. But
then the agent could use a bit more capital in the last period and still not violate the no-Ponzi condition. For this to be optimal,
the value of capital in the last period, $\mu(T)$, must be 0 as well. If the no-Ponzi condition binds, then capital must be costly.

2. Take FOC for control variable. Set $\frac{\partial H}{\partial c} = 0$:
$$\frac{\partial H(k, c, t, \mu)}{\partial c(t)} = \frac{\partial v\big(k(t), c(t), t\big)}{\partial c(t)} + \mu(t) \frac{\partial g\big(k(t), c(t), t\big)}{\partial c(t)} = 0$$
for all $t$.
3. Take FOC for state variable. Set $\frac{\partial H}{\partial k} = -\dot{\mu}$:
$$\frac{\partial H(k, c, t, \mu)}{\partial k(t)} = \frac{\partial v\big(k(t), c(t), t\big)}{\partial k(t)} + \mu(t) \frac{\partial g\big(k(t), c(t), t\big)}{\partial k(t)} = -\dot{\mu}(t)$$
for all $t$.
4. Identify TVC. If $T < \infty$, then the TVC is
$$\mu(T) k(T) = 0;$$
that is, either capital equals 0 in the last period (the constraint binds) or the constraint does not bind
and $\mu(T) = 0$.
If time is infinite, the TVC is
$$\lim_{t\to\infty} \mu(t) k(t) = 0.$$

15.3.2 Multiple control or state variables
Now, let there be $n$ control variables and $m$ state variables. That is, we choose $c_1(\cdot), c_2(\cdot), \ldots, c_n(\cdot)$ to
maximize
$$V(0) = \max_{\vec{c}(\cdot)} \int_0^T \underbrace{v\big(k_1(t), \ldots, k_m(t), c_1(t), \ldots, c_n(t), t\big)}_{\equiv v(\vec{k}(t), \vec{c}(t), t)}\,dt$$
s.t.
$$\dot{k}_1(t) \le g_1\big(k_1(t), \ldots, k_m(t), c_1(t), \ldots, c_n(t), t\big), \ \forall t;$$
$$\vdots$$
$$\dot{k}_m(t) \le g_m\big(k_1(t), \ldots, k_m(t), c_1(t), \ldots, c_n(t), t\big), \ \forall t; \text{ and}$$
$$k_1(0), \ldots, k_m(0) > 0 \text{ given.}$$
1. Construct the Hamiltonian. Define
$$H(\vec{k}, \vec{c}, t, \mu) \equiv v\big(\vec{k}(t), \vec{c}(t), t\big) + \sum_{j=1}^m \mu_j(t)\, g_j\big(\vec{k}(t), \vec{c}(t), t\big).$$

2. Take FOCs for control variables. Set $\frac{\partial H}{\partial c_i} = 0$:
$$\frac{\partial H(\vec{k}, \vec{c}, t, \mu)}{\partial c_i(t)} = \frac{\partial v\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial c_i(t)} + \sum_{j=1}^m \mu_j(t) \frac{\partial g_j\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial c_i(t)} = 0$$
for $i \in \{1, \ldots, n\}$ and all $t$.

3. Take FOCs for state variables. Set $\frac{\partial H}{\partial k_i} = -\dot{\mu}_i$:
$$\frac{\partial H(\vec{k}, \vec{c}, t, \mu)}{\partial k_i(t)} = \frac{\partial v\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial k_i(t)} + \sum_{j=1}^m \mu_j(t) \frac{\partial g_j\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial k_i(t)} = -\dot{\mu}_i(t)$$
for $i \in \{1, \ldots, m\}$ and all $t$.
4. Identify TVCs. If $T < \infty$, then the TVCs are
$$\mu_i(T) k_i(T) = 0$$
for $i \in \{1, \ldots, m\}$. If time is infinite, then the TVCs take the form
$$\lim_{t\to\infty} \mu_i(t) k_i(t) = 0.$$

15.4 Current-value Hamiltonians


So far we have used what is known as the “present-value Hamiltonian.” There is another formulation of the
problem called the “current-value Hamiltonian,” which is equivalent for the models that we consider in this
class. I am not aware of a particular advantage of this approach, but you might want to know that it exists
in case you come across a Hamiltonian that seems to have been created differently than you expect.
Consider a model with objectives of the form
$$\int_0^T \underbrace{e^{-\rho t} u\big(k(t), c(t)\big)}_{\equiv v(k(t), c(t), t)}\,dt.$$
The present-value Hamiltonian is
$$H(k, c, t, \mu) = v\big(k(t), c(t), t\big) + \mu(t) g\big(k(t), c(t), t\big) = e^{-\rho t} u\big(k(t), c(t)\big) + \mu(t) g\big(k(t), c(t), t\big).$$

But sometimes it is common instead to proceed as follows. Consider multiplying by $e^{\rho t}$ to get the current-value
Hamiltonian:
$$\widetilde{H}(k, c, t, \mu) \equiv e^{\rho t} H(k, c, t, \mu) = u\big(k(t), c(t)\big) + \underbrace{e^{\rho t} \mu(t)}_{\equiv \lambda(t)} g\big(k(t), c(t), t\big).$$
$\lambda(t)$ is the current-value shadow price: it gives the value of a unit of capital at time $t$ measured in time-$t$ utils
(i.e., felicities), rather than in time-0 utils.
The Maximum Principle tells us that at an optimum, we have an FOC in the choice variable
$$\frac{\partial H}{\partial c(t)} = e^{-\rho t} \frac{\partial \widetilde{H}}{\partial c(t)} = 0 \quad \Longleftrightarrow \quad \frac{\partial \widetilde{H}}{\partial c(t)} = 0.$$
The FOC in the state variable takes the form
$$\frac{\partial H}{\partial k(t)} = e^{-\rho t} \frac{\partial \widetilde{H}}{\partial k(t)} = -\dot{\mu}(t),$$
which, since $\mu(t) = e^{-\rho t}\lambda(t)$ implies $-\dot{\mu}(t) = \rho\lambda(t)e^{-\rho t} - \dot{\lambda}(t)e^{-\rho t}$, gives
$$\frac{\partial \widetilde{H}}{\partial k(t)} = \rho\lambda(t) - \dot{\lambda}(t).$$

Finally, the TVC is
$$\mu(T) k(T) = e^{-\rho T} \lambda(T) k(T) = 0$$
or
$$\lim_{t\to\infty} \mu(t) k(t) = \lim_{t\to\infty} e^{-\rho t} \lambda(t) k(t) = 0.$$

16 Log-linearization
Recall that we have thus far linearized systems like $g \colon \mathbb{R}^n \to \mathbb{R}$ using the first-order Taylor approximation
about the steady state $x_*$:
$$g(x) \approx g(x_*) + g_1'(x_*)(x_1 - x_{*1}) + g_2'(x_*)(x_2 - x_{*2}) + \cdots + g_n'(x_*)(x_n - x_{*n}) = g(x_*) + \nabla g(x_*) \cdot (x - x_*).$$
Linearizing gives an approximation that is linear in $x - x_*$. That is, a one-unit change in $x$ causes the
approximated value of $g(x)$ to change by a constant $g'(x_*)$. Log-linearization instead gives an approximation for
$g(\cdot)$ that is linear in $\hat{x} \equiv \frac{x - x_*}{x_*}$, or $x$'s percentage deviation from steady state.
Why would we like to do this? Consider trying to describe the economies of Palo Alto and the United
States using a single model. Given the difference in the economy’s scales, it makes more sense to draw
conclusions about how each would respond to, say, a 5% budget surplus—which might in some sense affect
Palo Alto and the U.S. similarly—than to draw conclusions about how each would respond to a billion dollar
budget surplus.
We sometimes think in terms of percent movements (“stocks went down 2%”) rather than absolute
movements (“the Dow dropped by 400 points”). One advantage of the former approach is that it allows us to
express our conclusions in unitless measures; they are therefore robust to unit conversion. This advantage

also accrues to log-linearization: if $x$ is measured in Euros, then so is $x - x_*$, while $\hat{x} \equiv \frac{x - x_*}{x_*}$ is a unitless
measure.

16.1 Why are approximations in terms of $\hat{x} \equiv \frac{x - x_*}{x_*}$ called "log-linear"?
Consider a first-order Taylor approximation of the (natural) log function, the most important Taylor
approximation in economics:¹⁵
$$\log(x) \approx \log(x_*) + \frac{1}{x_*}(x - x_*)$$
$$\log(x) - \log(x_*) \approx \frac{x - x_*}{x_*} \equiv \hat{x}.$$
Thus just as a standard Taylor approximation is linear in (x − x∗ ), the log-linearization is linear in
x̂ ≈ log(x) − log(x∗ ); that is, it is linear in logarithms.
15 You should memorize it and get used to recognizing it. In particular, you should get used to recognizing when people treat

this approximation as if it holds exactly; typically this occurs when the log first-difference of a time series (log(yt ) − log(yt−1 ))
is treated as equal to the series’ growth rate (yt /yt−1 − 1).
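A quick numerical illustration of how this approximation behaves as the deviation grows (the point $x_* = 5$ is an arbitrary choice):

```python
import math

# log(x) - log(x*) ≈ (x - x*)/x* is very accurate for small percentage
# deviations and deteriorates as they grow; x* is an arbitrary choice.
x_star = 5.0
for pct in (0.01, 0.05, 0.20):
    x = x_star * (1 + pct)
    exact = math.log(x) - math.log(x_star)
    print(f"{pct:.2f}: exact log deviation {exact:.5f} vs. approximation {pct:.2f}")
```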

16.2 Log-linearization: first approach
To get a log-linearization, there are many techniques. One is to start with the standard linearized (i.e., Taylor
approximated) version and "build up" $\hat{x} \equiv \frac{x - x_*}{x_*}$ and $\widehat{f(x)} \equiv \frac{f(x) - f(x_*)}{f(x_*)}$:¹⁶
$$f(x) \approx f(x_*) + f'(x_*)(x - x_*)$$
$$f(x) - f(x_*) \approx f'(x_*)(x - x_*)$$
$$\frac{f(x) - f(x_*)}{f(x_*)} \approx \frac{f'(x_*)(x - x_*)}{f(x_*)} = \frac{f'(x_*) x_*}{f(x_*)} \cdot \frac{x - x_*}{x_*}$$
$$\widehat{f(x)} \approx \frac{f'(x_*) x_*}{f(x_*)} \hat{x}$$

Here, we have an expression that describes what happens to the deviation of $f(x)$ from its steady state in
percent terms as $x$ deviates from its steady state in percent terms. A 1% increase in $x$ from the steady state
causes $f(x)$ to increase by about $\frac{f'(x_*) x_*}{f(x_*)}$ percent.
You might recognize that this last expression has another name: the elasticity of $f(x)$ with respect to $x$.
Perhaps it is easier to see when written as
$$\frac{f'(x_*) x_*}{f(x_*)} = \left[ \frac{\partial f}{\partial x} \cdot \frac{x}{f(x)} \right]_{x = x_*}.$$

It is important to get some practice with quickly log-linearizing simple functions. Several examples follow:
• A Cobb-Douglas production function
$$y_t = k_t^\alpha n_t^{1-\alpha}$$
$$y_t \approx y_* + \alpha \frac{k_*^\alpha n_*^{1-\alpha}}{k_*}(k_t - k_*) + (1-\alpha) \frac{k_*^\alpha n_*^{1-\alpha}}{n_*}(n_t - n_*)$$
$$y_t - y_* \approx \alpha \frac{y_*}{k_*}(k_t - k_*) + (1-\alpha) \frac{y_*}{n_*}(n_t - n_*)$$
$$\frac{y_t - y_*}{y_*} \approx \alpha \frac{k_t - k_*}{k_*} + (1-\alpha) \frac{n_t - n_*}{n_*}$$
$$\hat{y}_t \approx \alpha \hat{k}_t + (1-\alpha) \hat{n}_t.$$
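A numerical check of this log-linearization (the steady-state values and deviations below are hypothetical):

```python
# Compare the exact percentage deviation of y = k^alpha n^(1-alpha) with the
# log-linear approximation y-hat ≈ alpha k-hat + (1-alpha) n-hat.
alpha = 0.33
k_star, n_star = 10.0, 1.0
y_star = k_star**alpha * n_star**(1 - alpha)

k_hat, n_hat = 0.02, -0.01                 # a 2% and a -1% deviation
k = k_star * (1 + k_hat)
n = n_star * (1 + n_hat)

y_hat_exact = (k**alpha * n**(1 - alpha) - y_star) / y_star
y_hat_approx = alpha * k_hat + (1 - alpha) * n_hat
print(abs(y_hat_exact - y_hat_approx) < 1e-3)  # approximation error is second order
```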

• An intereuler
$$\frac{1}{c_t} = \beta \frac{1}{c_{t+1}} \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha}$$
where $z_{t+1}$ is fixed. We want to log-linearize and find $\hat{c}_{t+1}$ as a function of $\hat{c}_t$, $\hat{k}_{t+1}$, and $\hat{n}_{t+1}$.
¹⁶ Our notation here is that $\hat{y} \equiv \frac{y - y_*}{y_*} \approx \log(y) - \log(y_*)$ for any $y$. Sometimes we will instead use $\hat{y} \equiv \log(y) - \log(y_*) \approx \frac{y - y_*}{y_*}$.
Both are standard in Nir's class, but not necessarily generally; in particular, some authors use capitalization to distinguish a
percentage/log deviation from steady state (or trend, or whatever else is being linearized around).

Rearranging, we see that
$$c_{t+1} = \beta c_t \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha}$$
$$\approx c_* + \frac{\partial\big( \beta c_t \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha} \big)}{\partial c_t}\bigg|_* (c_t - c_*) + \frac{\partial\big( \beta c_t \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha} \big)}{\partial k_{t+1}}\bigg|_* (k_{t+1} - k_*) + \frac{\partial\big( \beta c_t \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha} \big)}{\partial n_{t+1}}\bigg|_* (n_{t+1} - n_*)$$
$$\approx c_* + \frac{c_*}{c_*}(c_t - c_*) + (\alpha - 1)\frac{c_*}{k_*}(k_{t+1} - k_*) + (1-\alpha)\frac{c_*}{n_*}(n_{t+1} - n_*)$$
(using the fact that $\beta \alpha z_* k_*^{\alpha-1} n_*^{1-\alpha} = 1$ at steady state), so
$$\frac{c_{t+1} - c_*}{c_*} \approx \frac{c_t - c_*}{c_*} + (\alpha - 1)\frac{k_{t+1} - k_*}{k_*} + (1-\alpha)\frac{n_{t+1} - n_*}{n_*}$$
$$\hat{c}_{t+1} \approx \hat{c}_t + (\alpha - 1)\hat{k}_{t+1} + (1-\alpha)\hat{n}_{t+1}.$$
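This result can also be checked numerically; the steady-state values below are hypothetical, with $z_*$ chosen so that the intereuler holds at the steady state:

```python
# Compare the exact intereuler c_{t+1} = beta c_t alpha z k_{t+1}^(alpha-1) n_{t+1}^(1-alpha)
# with the log-linear approximation. Steady-state values are hypothetical; z*
# is chosen so that beta*alpha*z*k*^(alpha-1)*n*^(1-alpha) = 1, i.e. the
# intereuler holds with c_{t+1} = c_t at the steady state.
beta, alpha = 0.96, 0.33
k_star, n_star, c_star = 8.0, 1.0, 1.0
z_star = 1 / (beta * alpha * k_star**(alpha - 1) * n_star**(1 - alpha))

c_hat_t, k_hat, n_hat = 0.01, 0.02, -0.01
c_t = c_star * (1 + c_hat_t)
k_t1 = k_star * (1 + k_hat)
n_t1 = n_star * (1 + n_hat)

c_t1 = beta * c_t * alpha * z_star * k_t1**(alpha - 1) * n_t1**(1 - alpha)
c_hat_exact = (c_t1 - c_star) / c_star
c_hat_approx = c_hat_t + (alpha - 1) * k_hat + (1 - alpha) * n_hat
print(abs(c_hat_exact - c_hat_approx) < 1e-3)
```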

• A power function
$$x_t^n \approx x_*^n + n x_*^{n-1}(x_t - x_*) = x_*^n + n x_*^n \frac{x_t - x_*}{x_*} = x_*^n + n x_*^n \hat{x}_t.$$

• Another function
$$e^{x_t + y_t} \approx e^{x_* + y_*} + e^{x_* + y_*}(x_t - x_*) + e^{x_* + y_*}(y_t - y_*) \approx e^{x_* + y_*}\big(1 + x_* \hat{x}_t + y_* \hat{y}_t\big).$$

• A sum of functions
$$x_t + y_t \approx x_* + y_* + (x_t - x_*) + (y_t - y_*) = x_* + y_* + x_* \hat{x}_t + y_* \hat{y}_t.$$

16.3 Log-linearization: second approach


This way may or may not be faster; it depends on your preferences and the problem at hand. It is easiest to
explain this method by way of example. The idea is that we will re-express the function in terms of logs of
the variables, and then take a linear approximation in the logs; then we will have log-linearized.
For simplicity of notation, assume that for any variable yt , we define Yt ≡ log(yt ). Thus we also have
ŷt ≈ Yt − Y∗ .
Several examples of this technique follow:

• A Cobb-Douglas production function, $y_t = k_t^\alpha n_t^{1-\alpha}$. Let us start with the left-hand side:
$$y_t = e^{\log(y_t)} = e^{Y_t} \approx e^{Y_*} + e^{Y_*}(Y_t - Y_*) = y_* + y_* \hat{y}_t.$$
We now proceed with the right-hand side:
$$k_t^\alpha n_t^{1-\alpha} = e^{\alpha \log(k_t) + (1-\alpha)\log(n_t)} = e^{\alpha K_t + (1-\alpha)N_t} \approx e^{\alpha K_* + (1-\alpha)N_*} + \alpha e^{\alpha K_* + (1-\alpha)N_*}(K_t - K_*) + (1-\alpha) e^{\alpha K_* + (1-\alpha)N_*}(N_t - N_*) \approx y_* + \alpha y_* \hat{k}_t + (1-\alpha) y_* \hat{n}_t.$$
Equating the two sides, we see that
$$y_* + y_* \hat{y}_t \approx y_* + \alpha y_* \hat{k}_t + (1-\alpha) y_* \hat{n}_t$$
$$\hat{y}_t \approx \alpha \hat{k}_t + (1-\alpha) \hat{n}_t.$$

• An intereuler
$$\frac{1}{c_t} = \beta \frac{1}{c_{t+1}} \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha}$$
where $z_{t+1}$ is fixed. We want to find $\hat{c}_{t+1}$ as a function of $\hat{c}_t$, $\hat{k}_{t+1}$, and $\hat{n}_{t+1}$. Rearranging
yields that
$$c_{t+1} = \beta c_t \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha} = \beta \alpha e^{C_t + Z_* + (\alpha-1)K_{t+1} + (1-\alpha)N_{t+1}}.$$
Log-linearizing both the left- and right-hand sides gives
$$c_* + c_* \hat{c}_{t+1} \approx \beta \alpha e^{C_* + Z_* + (\alpha-1)K_* + (1-\alpha)N_*} \Big[ 1 + (C_t - C_*) + (\alpha - 1)(K_{t+1} - K_*) + (1-\alpha)(N_{t+1} - N_*) \Big] \approx c_* + c_* \hat{c}_t + c_* (\alpha - 1)\hat{k}_{t+1} + c_* (1-\alpha)\hat{n}_{t+1}$$
(using the fact that $\beta \alpha e^{Z_* + (\alpha-1)K_* + (1-\alpha)N_*} = 1$ at steady state), so
$$\hat{c}_{t+1} \approx \hat{c}_t + (\alpha - 1)\hat{k}_{t+1} + (1-\alpha)\hat{n}_{t+1}.$$

17 Log-linearizing the NCGM in continuous time


We seek to maximize
$$\int_0^\infty e^{-\rho t} U(c_t)\,dt$$

40
such that
$$\dot{k}_t = w_t n_t + r_t k_t - \delta k_t - c_t. \tag{11}$$
We set up the Hamiltonian as
$$H_t \equiv e^{-\rho t} U(c_t) + \mu_t (w_t n_t + r_t k_t - \delta k_t - c_t).$$

Note that there is no disutility associated with working, so hours will be chosen to be $n = 1$ (there is no FOC
in $n_t$). Per the Maximum Principle, the correct FOCs are:
$$\frac{\partial H}{\partial c_t} = 0 \quad \Longleftrightarrow \quad e^{-\rho t} U'(c_t) = \mu_t; \text{ and} \tag{12}$$
$$\frac{\partial H}{\partial k_t} = -\frac{d}{dt}\mu_t \quad \Longleftrightarrow \quad \mu_t (r_t - \delta) = -\dot{\mu}_t; \tag{13}$$
(along with a TVC). Equation 11 is a dynamic condition on capital, and equation 13 is a dynamic condition
on the shadow price of capital, $\mu$. Combining equation 12 (which implies $\dot{\mu}_t = e^{-\rho t}\big( U''(c_t)\dot{c}_t - \rho U'(c_t) \big)$) with
equation 13 yields an analogous condition on consumption:
$$\dot{c}_t = \frac{U'(c_t)}{U''(c_t)} (\rho + \delta - r_t). \tag{14}$$
This is the analogue of the intereulers we get in discrete-time models.


At this point we impose CRRA utility, hence
$$U(c) = \frac{c^{1-\sigma}}{1-\sigma} \implies U'(c) = c^{-\sigma} \text{ and } U''(c) = -\sigma c^{-\sigma-1},$$
and Cobb-Douglas production (recall that $n = 1$), hence
$$F(k, n) = k^\alpha n^{1-\alpha} \implies r = \alpha k^{\alpha-1} \text{ and } w = (1-\alpha)k^\alpha.$$

Thus equation 11 becomes
$$\dot{k}_t = (1-\alpha)k_t^\alpha + \alpha k_t^{\alpha-1} k_t - \delta k_t - c_t = k_t^\alpha - \delta k_t - c_t$$
$$\frac{\dot{k}_t}{k_t} = k_t^{\alpha-1} - \delta - \frac{c_t}{k_t}.$$
From now on, we will use the notation that "hatted" variables represent the relative deviation from steady
state ($\hat{x} \equiv \log(x) - \log(x_*) \approx (x - x_*)/x_*$). Note that $\dot{k}_t / k_t = \frac{d}{dt}(\log k_t) = \frac{d}{dt}(\log k_t - \log k_*) \equiv \dot{\hat{k}}_t$. This gives
$$\dot{\hat{k}}_t = k_t^{\alpha-1} - \delta - \frac{c_t}{k_t}. \tag{15}$$
Similarly, equation 14 becomes
$$\dot{c}_t = \tfrac{1}{\sigma} c_t \big( \alpha k_t^{\alpha-1} - \rho - \delta \big)$$
$$\dot{\hat{c}}_t = \tfrac{1}{\sigma} \big( \alpha k_t^{\alpha-1} - \rho - \delta \big). \tag{16}$$

We know that at steady state, equation 16 equals zero, so
$$k_*^{\alpha-1} = \frac{\rho + \delta}{\alpha} \tag{17}$$
and similarly for equation 15,
$$\frac{c_*}{k_*} = k_*^{\alpha-1} - \delta = \frac{\rho + \delta}{\alpha} - \delta = \frac{\rho + \delta(1-\alpha)}{\alpha}. \tag{18}$$
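As a check, the steady state implied by equations 17 and 18 should set the right-hand sides of equations 15 and 16 to zero. A minimal Python sketch (parameter values hypothetical):

```python
# Check that the continuous-time steady state implied by (17)-(18) sets both
# the k-dot equation (15) and the c-dot equation (16) to zero.
rho, delta, alpha, sigma = 0.04, 0.08, 0.33, 2.0

k_star = ((rho + delta) / alpha) ** (1 / (alpha - 1))        # from (17)
c_star = (k_star**(alpha - 1) - delta) * k_star              # from (18)

kdot_over_k = k_star**(alpha - 1) - delta - c_star / k_star  # RHS of (15)
cdot = (1 / sigma) * (alpha * k_star**(alpha - 1) - rho - delta)  # RHS of (16)
print(abs(kdot_over_k) < 1e-9, abs(cdot) < 1e-9)
```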
Log-linearizing equation 15 (noting that $\dot{\hat{k}}_* = 0$) gives
$$\dot{\hat{k}}_t \approx (\alpha - 1)k_*^{\alpha-1}\hat{k}_t - \frac{c_*}{k_*}\hat{c}_t + \frac{c_*}{k_*}\hat{k}_t.$$
Substituting in using the steady state results from equations 17 and 18,
$$= \left[ (\alpha - 1)\frac{\rho + \delta}{\alpha} + \frac{\rho + \delta(1-\alpha)}{\alpha} \right] \hat{k}_t - \frac{\rho + \delta(1-\alpha)}{\alpha}\hat{c}_t = \rho \hat{k}_t - \frac{\rho + \delta(1-\alpha)}{\alpha}\hat{c}_t. \tag{19}$$

Log-linearizing equation 16 (noting that $\dot{\hat{c}}_* = 0$) gives
$$\dot{\hat{c}}_t \approx \tfrac{1}{\sigma}\alpha(\alpha - 1)k_*^{\alpha-1}\hat{k}_t.$$
Substituting in using the steady state results from equation 17,
$$= \frac{(\alpha - 1)(\rho + \delta)}{\sigma}\hat{k}_t. \tag{20}$$
Combining equations 19 and 20 in matrix notation, we have the dynamic system
$$\begin{bmatrix} \dot{\hat{k}}_t \\ \dot{\hat{c}}_t \end{bmatrix} = \begin{bmatrix} \rho & -\frac{\rho + \delta(1-\alpha)}{\alpha} \\ \frac{(\alpha-1)(\rho+\delta)}{\sigma} & 0 \end{bmatrix} \begin{bmatrix} \hat{k}_t \\ \hat{c}_t \end{bmatrix}.$$
If we consider this system as $\dot{x}_t = A x_t$, we could "decouple" the system using the eigendecomposition
$A = P \Lambda P^{-1}$. Thus the system becomes $\frac{d}{dt}\big(P^{-1} x_t\big) = \Lambda P^{-1} x_t$. The solution is $x_t = P e^{\Lambda t} P^{-1} x_0$.¹⁷

18 Optimal taxation
18.1 The Ramsey model
In our models so far, we have only had two types of agents: households and firms. When we have considered
government spending, we have always specified it entirely in terms of an exogenous stream {gt }t that is
taken from households. Because the revenue requirement was exogenous, and because the government only
had a single instrument by which to collect it—lump-sum, or "head," taxes—there was no flexibility in the
government's behavior and no reason to treat it as an agent in the model.
If we relax either or both of these restrictions—allowing the government to choose the level of revenue
collected each period and/or the means of collecting it—we move to a class of models called “optimal taxation”
or “Ramsey” models. How does the government make its taxation choices in these models? We will generally
¹⁷ To see this, first note that the system is $\frac{d}{dt}\tilde{x}(t) = \Lambda \tilde{x}(t)$, where we define $\tilde{x}(t) \equiv P^{-1} x(t)$. This gives separable differential
equations in each element $i \in \{1, 2\}$ of $\tilde{x}$ of the form $d\tilde{x}_i(t)/\tilde{x}_i(t) = \Lambda_i\,dt \implies \int d\tilde{x}_i(t)/\tilde{x}_i(t) = \int \Lambda_i\,dt \implies \log\big(\tilde{x}_i(t)\big) = t\Lambda_i + \log\big(\tilde{x}_i(0)\big)$, where the constant of integration is pinned down by the initial condition. Finally, this implies $\tilde{x}_i(t) = \exp(t\Lambda_i)\cdot\tilde{x}_i(0)$,
or $x(t) = P \exp(t\Lambda) P^{-1} x(0)$.

assume that the government's objective is to maximize household welfare, noting that the government knows
households will maximize their own welfare subject to tax policy. A stylized way of describing this is as
follows:
$$\max_{\tau \text{ s.t. } \ldots} \left[ \max_c U(c, \tau) \right] \tag{21}$$

where τ are the government’s tax policies, c are the representative household’s choices, and there is some
constraint on the outer maximization (i.e., the government’s) of the form g(τ, c∗ (τ )) ≥ 0, designed to capture
the fact that the government must raise enough money to achieve some exogenous goal.18 A solution gives a
“Ramsey equilibrium”: allocations, prices, and taxes such that
1. Households maximize utility subject to their budget constraints, taking prices and taxes as given,
2. Government maximizes households’ utility while financing government expenditures (i.e., meeting its
budget constraint or constraints),
3. Markets clear, and
4. The economy’s resource constraint is satisfied.
Consider an example: an economy with a representative household and the government. In this economy
there is no capital; households can produce a perishable consumption good with technology f (n) = n, and
they can invest in government bonds. Further suppose that the government must raise (exogenous) gt each
period, which it can do through an income tax or through government debt.
We can write the household problem as

$$\max_{\{c_t, n_t, b_{t+1}\}} \sum_{t=0}^{\infty} \beta^t u(c_t, n_t)$$
$$\text{s.t. } c_t + q_t b_{t+1} = n_t(1 - \tau_t) + b_t, \ \forall t; \quad b_0 \text{ given.}$$

The household’s resource uses are consumption and purchase of bonds; its sources are production (f (nt ) = nt )
net of taxes (τt nt ) and bond coupons. The government’s resource sources are bond sales and tax revenue,
and its uses are bond repayment and government spending; thus the government budget constraints are

qt bt+1 + τt nt = bt + gt , ∀t. (22)

Finally, it is useful to write the economy’s resource constraint (which must hold by the combination of
household and government budget constraints):

ct + gt = nt . (23)

So what does the government do? Following the approach suggested by equation 21, it could
1. Find the household optimum as a function of tax rates $\vec{\tau} \equiv \{\tau_t\}_t$ and bond prices $\vec{q} \equiv \{q_t\}_t$. Setting up
and solving the household problem gives the following intra- and intereulers: for all $t$,
$$(1 - \tau_t) \cdot u_c(c_t, n_t) = -u_n(c_t, n_t), \text{ and} \tag{24}$$
$$q_t \cdot u_c(c_t, n_t) = \beta u_c(c_{t+1}, n_{t+1}). \tag{25}$$
These help pin down the optimal $\big\{ c_t^*(\vec{\tau}, \vec{q}), n_t^*(\vec{\tau}, \vec{q}) \big\}_t$; the household budget constraints then allow us
to find the bond holdings $\big\{ b_{t+1}^*(\vec{\tau}, \vec{q}) \big\}_t$.
18 Note that τ captures all tax policies (in each period, for each instrument available to the government), c captures all

household choices (e.g., consumption, hours, assets, and/or capital), and the government’s budget constraint could potentially
be imposing period-by-period revenue requirements.

2. Choose $\vec{\tau}$ and $\vec{q}$ to maximize
$$U^*(\vec{\tau}, \vec{q}) \equiv U\Big( \big\{ c_t^*(\vec{\tau}, \vec{q}), n_t^*(\vec{\tau}, \vec{q}) \big\}_t \Big),$$
subject to the government's budget constraints,
$$q_t \cdot b_{t+1}^*(\vec{\tau}, \vec{q}) + \tau_t \cdot n_t^*(\vec{\tau}, \vec{q}) = b_t^*(\vec{\tau}, \vec{q}) + g_t.$$

18.2 The Primal approach

The solution procedure described above typically turns out to be a very difficult problem to solve. An alternative technique, called the "primal approach," is usually much easier. In the primal approach, the government optimizes not over taxes, but rather over allocations, subject to a constraint that the allocations it chooses are optimal for the household under some tax regime. Roughly, this is equivalent to
\[
\max_{c \text{ s.t. } \ldots} U(c),
\]
where the government is constrained to choose a c that is the household's optimal choice (c = c^*(τ)) for some τ satisfying the government's budget constraint g(τ, c^*(τ)) ≥ 0. This is called the "implementability constraint."
Proceeding with our example from above, which allocations {ct , nt , bt+1 }t are consistent with household
optimization for some tax rate and bond prices satisfying the government’s budget constraint? The intereuler
(equation 25) allows us to pin down bond prices in terms of allocations:
\[
q_t = \beta \frac{u_c(c_{t+1}, n_{t+1})}{u_c(c_t, n_t)}.
\]
With the government budget constraint (equation 22), this lets us pin down the tax rates in terms of
allocations.
\[
\tau_t = \frac{b_t + g_t - q_t b_{t+1}}{n_t}
       = \frac{b_t + g_t - \beta \frac{u_c(c_{t+1}, n_{t+1})}{u_c(c_t, n_t)} b_{t+1}}{n_t}. \tag{26}
\]
Plugging this into the intraeuler (equation 24) gives
\[
1 - \frac{b_t + g_t - \beta \frac{u_c(c_{t+1}, n_{t+1})}{u_c(c_t, n_t)} b_{t+1}}{n_t}
  = -\frac{u_n(c_t, n_t)}{u_c(c_t, n_t)},
\]
where either c or n (here c) can be eliminated using the economy's resource constraint (equation 23):
\[
1 - \frac{b_t + g_t - \beta \frac{u_c(n_{t+1} - g_{t+1}, n_{t+1})}{u_c(n_t - g_t, n_t)} b_{t+1}}{n_t}
  = -\frac{u_n(n_t - g_t, n_t)}{u_c(n_t - g_t, n_t)}.
\]
Thus we can write the government's optimal taxation problem as
\[
\max_{\{n_t, b_{t+1}\}} \sum_{t=0}^{\infty} \beta^t u(n_t - g_t, n_t)
\]
\[
\text{s.t.}\quad \underbrace{1 - \frac{b_t + g_t - \beta \frac{u_c(n_{t+1} - g_{t+1}, n_{t+1})}{u_c(n_t - g_t, n_t)} b_{t+1}}{n_t} + \frac{u_n(n_t - g_t, n_t)}{u_c(n_t - g_t, n_t)}}_{\equiv\, \eta(n_t, g_t, b_t, n_{t+1}, g_{t+1}, b_{t+1})} = 0.
\]
Although it looks ugly, this “primal” problem is actually straightforward to solve!19
In summary, the primal approach uses the “solution” to the household problem, the government’s budget
constraint, and the economy’s resource constraint to eliminate taxes (and prices) from the government’s
problem. Instead, the government faces an implementability constraint (here, η(nt , gt , bt , nt+1 , gt+1 , bt+1 ) = 0
for all t). Maximizing over allocations should then be relatively easy, and the implementability constraint
ensures that the allocations are consistent with some taxes that satisfy the government’s budget constraint.
The last step is to solve for the taxes; here we would use equation 26.

19 Introducing uncertainty
Let S be the set of events that could occur in any given period. For now, we will assume that S has finitely
many elements, but this is mostly just for notational convenience; the intuition and most results go through if
the shock space is countably infinite or continuous. Note that we are also assuming discrete time; considering
uncertainty with continuous time makes things significantly more complex, and we will not do so this quarter.
In the simplest possible example, suppose that we flip a coin in each period t ≥ 1. Letting st denote the
realization of the event at time t, we have
\[
s_t \in S = \{H, T\}, \quad \forall t \in \{1, 2, \ldots\}.
\]
Since s_t is a random variable for each t, we have a stochastic process:

Definition 18. A stochastic process is a sequence {s_t}_t of random variables, ordered by an integer t (which we will think of as time).

Suppose we want to characterize the history of this stochastic process through period t; we denote this object s^t. For example, after three periods, we may have observed s^3 = (H, H, T). Generally,
\[
s^t \equiv (s_1, s_2, \ldots, s_t) \in S \times \cdots \times S = S^t, \quad \forall t.
\]
This notation can be confusing; a good mnemonic is to use the notational consistency in "s^t ∈ S^t" to help remember that superscripts represent histories.
Any stochastic variable in a model—whether exogenous or endogenous—can only depend on the uncertainty that has already been revealed. That is, if some variable c_t is "determined" at time t, that means that knowing the history s^t must give us enough information to pin down c_t. We often use the notation c_t(s^t) to indicate explicitly that c_t depends on the resolution of uncertainty in the first t periods.
For example, suppose that we make the following bet: I flip a coin each of two days; you pay me $5 today,
and I give you back $11 tomorrow if my coin flips came up the same both days.20 Your incomes in the periods
are given by the random variables y_1(s_1) and y_2(s^2), where
\[
\begin{aligned}
y_1(H) &= -5, & y_2(H,H) &= 11, \\
y_1(T) &= -5; & y_2(H,T) &= 0, \\
       &      & y_2(T,H) &= 0, \\
       &      & y_2(T,T) &= 11.
\end{aligned}
\]
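The bet can be evaluated by brute force over histories, using Pr(s^t) = 2^{-t} for fair flips; a minimal Python sketch (the enumeration helper is our own, not anything from the text):

```python
from itertools import product

def histories(t, space=("H", "T")):
    """All histories s^t in S^t for the coin-flip example."""
    return list(product(space, repeat=t))

def y2(h):
    """Period-2 income: $11 if both flips match, else $0."""
    return 11 if h[0] == h[1] else 0

# Unconditional expectation: sum over s^2 of Pr(s^2) * y2(s^2), with
# Pr(s^2) = 2^(-2) = 1/4 for each of the four histories.
E_y2 = sum(0.5 ** len(h) * y2(h) for h in histories(2))  # = 5.5
```

The expected repayment ($5.50) exceeds the $5 stake, so a risk-neutral, patient agent would accept; per footnote 20, risk aversion or discounting could still make you decline.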
19 If you would like a good exercise that allows you to take this example further, consider the following quasilinear felicity function: u(c, n) = c − a(n) for a strictly convex function a(·). Show that the interest rate on government bonds is 1/q_t = β^{-1},
and that no matter the (deterministic) sequence of {gt }t , there is an optimum with tax rates and labor constant over time. Note
that these results are specific to this utility function, and require that the government be able to credibly commit in advance to
a specific tax plan {τt }t .
20 Suppose I toss a fair coin (and will not run off with your money). Would you take this bet? Your answer will depend on your risk aversion and your discount rate.
19.1 Probability
We denote the (unconditional) probability of observing any particular history s^t ∈ S^t in periods 1, . . . , t by Pr(s^t). (The notation π(·) is also common.) For Pr(·) to be a well-defined probability measure, we require that Pr(s^t) ≥ 0 for all s^t ∈ S^t, and that
\[
\sum_{s^t \in S^t} \Pr(s^t) = 1.
\]

In our (fair) coin-tossing example, we have Pr(s^t) = 2^{-t} for all t and s^t ∈ S^t.
We also consider the probability of observing a particular history s^t conditional on having already observed history s^τ for τ ≤ t. We denote this probability Pr(s^t | s^τ). Returning to coin tossing,
\[
\Pr(s^t \mid s^\tau) =
\begin{cases}
2^{-(t-\tau)}, & \text{if } (s_1, s_2, \ldots, s_\tau) = s^\tau; \\
0, & \text{otherwise}
\end{cases}
\]
for all t, τ ≤ t, s^t ∈ S^t, and s^τ ∈ S^τ. Often, we will only consider the first case, where s^t remains possible given s^τ (i.e., s^τ comprises the first τ elements of s^t).


It will often be convenient to consider an additional period, t = 0, in which there is no uncertainty. To keep our notation consistent, we use the symbols s_0 and s^0 to represent the (non-stochastic) state at t = 0, and extend the definitions of our probability functions as follows:
\[
\Pr(s^0) \equiv 1; \qquad \Pr(s^t \mid s^0) \equiv \Pr(s^t), \quad \forall t \ge 0.
\]

Using this second extension, we can specify the full probability structure with just the conditional probability
functions Pr(·|·).21

19.2 Utility functions


Consider an agent whose preferences are time-separable, have exponential discounting, and admit a
von Neumann-Morgenstern representation.22 Her preferences can be represented by
\[
U(c) = \sum_t \sum_{s^t \in S^t} \beta^t u\bigl(c_t(s^t)\bigr) \Pr(s^t)
     = \operatorname{E}\Bigl[\sum_t \beta^t u(c_t)\Bigr].
\]
We rely here on the linearity of the expectation operator; we will often do so.

20 Markov chains
Although the general structure laid out above will be useful, we will often be willing to impose more structure on the probabilities. The most common structure we will assume is that {s_t}_t forms a Markov chain.
21 It is a bit cumbersome to have period zero be "different" from all other periods, and we will not always do so. However, it offers two advantages. The first is that having period zero be non-stochastic allows s^t ∈ S^t, rather than s^t ∈ S^{t+1}. More importantly, having a non-stochastic state allows us to write unconditional probabilities as conditional probabilities, since we can reference/condition on a history (s^0) that is entirely uninformative.
22 Preferences admit a von Neumann-Morgenstern representation if and only if they satisfy:
1. Continuity: For any x, x′, x″ with x ⪰ x′ ⪰ x″, there exists α ∈ [0, 1] such that αx + (1 − α)x″ ∼ x′;
2. Independence: For any x, x′, x″ and α ∈ [0, 1], we have x ⪰ x′ ⟺ αx + (1 − α)x″ ⪰ αx′ + (1 − α)x″; and
3. A "sure thing principle"
(in addition to the usual completeness and transitivity). Extensive discussion of these points is conducted in Economics 202.
Definition 19. A stochastic process {s_t}_t satisfies the Markov property (or is a Markov chain) if for all t_1 < t_2 < · · · < t_n,
\[
\Pr(s_{t_n} = s \mid s_{t_{n-1}}, s_{t_{n-2}}, \ldots, s_{t_1}) = \Pr(s_{t_n} = s \mid s_{t_{n-1}}).
\]
Although the notation looks odious, the intuition is not bad. A Markov process is one where if one has
information about several realizations (stn−1 , . . . , st1 ), only the latest realization (stn−1 ) is useful in helping
predict the future.
Note that by the definition of conditional probability (and the trivial fact that Pr(s^t | s^{t−1}) = Pr(s_t | s^{t−1}), since s^t = (s^{t−1}, s_t)),
\[
\begin{aligned}
\Pr(s^t) &= \Pr(s_t \mid s^{t-1}) \cdot \Pr(s^{t-1}) \\
         &= \Pr(s_t \mid s^{t-1}) \cdot \Pr(s_{t-1} \mid s^{t-2}) \cdot \Pr(s^{t-2}) \\
         &\;\;\vdots \\
         &= \Pr(s_t \mid s^{t-1}) \cdot \Pr(s_{t-1} \mid s^{t-2}) \cdots \Pr(s_1 \mid s^0).
\end{aligned}
\]
Fortunately, the Markov property implies that Pr(s_t | s^{t−1}) = Pr(s_t | s_{t−1}), so for a Markov chain
\[
\Pr(s^t) = \Pr(s_t \mid s_{t-1}) \cdot \Pr(s_{t-1} \mid s_{t-2}) \cdots \Pr(s_1 \mid s_0)
         = \prod_{j=0}^{t-1} \Pr(s_{j+1} \mid s_j).
\]

Similarly, for a Markov chain,
\[
\Pr(s^t \mid s^\tau) = \prod_{j=\tau}^{t-1} \Pr(s_{j+1} \mid s_j)
\]
for all t, τ ≤ t, s^t ∈ S^t, and s^τ ∈ S^τ with s^τ = (s_1, s_2, . . . , s_τ). Thus we can specify the entire probability structure of a Markov chain if we know all of its "transition probabilities" Pr(s_{t+1} | s_t). In general, these transition probabilities may depend on the time t. However, this may not be the case, in which case our Markov chain is time-invariant.
Definition 20. A time-invariant Markov chain is a stochastic process satisfying the Markov property and for which
\[
\Pr(s_{t+1} = j \mid s_t = i) = \Pr(s_{t+2} = j \mid s_{t+1} = i)
\]
for all t and (i, j) ∈ S². This implies that the transition probabilities are constant over time (by induction).
For a time-invariant Markov chain, we can summarize these transition probabilities in a transition matrix. Suppose without loss of generality that the shock space S = {1, 2, . . . , m}. Then we define the transition matrix P by
\[
P_{ij} = \Pr(s_{t+1} = j \mid s_t = i)
\]
for every i and j (and any t: by time-invariance, the choice does not matter). Every row of P must add to one (\(\sum_j P_{ij} = 1\) for all i), which means that P can be called a "stochastic matrix."
Note that an iid process (i.e., one for which the realization is distributed independently and identically
across periods) is a time-invariant Markov chain, and will have a transition matrix where every row is
identical.
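A transition matrix is easy to experiment with numerically. Below is a minimal Python sketch (the two-state P is an illustrative assumption) that draws a path from a time-invariant Markov chain by inverse-CDF sampling each row:

```python
import random

def simulate_chain(P, s0, T, seed=0):
    """Draw a length-T path from a time-invariant Markov chain.

    P is a list of rows; row i is the distribution of s_{t+1} given
    s_t = i. States are labeled 0, ..., m-1; s0 is the initial state."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(T):
        u, cum = rng.random(), 0.0
        nxt = len(P[path[-1]]) - 1  # fall through to the last state
        for j, p in enumerate(P[path[-1]]):
            cum += p
            if u < cum:
                nxt = j
                break
        path.append(nxt)
    return path

# An illustrative two-state chain that tends to stay where it is.
P = [[0.9, 0.1],
     [0.2, 0.8]]
path = simulate_chain(P, s0=0, T=1000)
```

An iid process would be the special case in which every row of P is the same distribution.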

20.1 Unconditional distributions


Suppose we have a time-invariant Markov chain and—contravening our earlier notation—also allow s_0 to be stochastic. In particular, let π_0 be a vector whose ith element is Pr(s_0 = i). Clearly, \(\sum_i (\pi_0)_i = 1\).
Can we say anything about an analogous vector π_1 characterizing the (unconditional) probability distribution of s_1 ∈ S? That is, we seek a vector with elements given by (π_1)_i = Pr(s_1 = i). By the law of total probability,
\[
(\pi_1)_i = \Pr(s_1 = i) = \sum_j \underbrace{\Pr(s_1 = i \mid s_0 = j)}_{=P_{ji}} \cdot \underbrace{\Pr(s_0 = j)}_{=(\pi_0)_j}.
\]
This looks like the matrix multiplication algorithm, and indeed is equivalent to stating that π_1 = P′π_0. Similarly, for any t,
\[
\pi_{t+1} = P' \pi_t \quad \text{or equivalently} \quad \pi_{t+1}' = \pi_t' P, \tag{27}
\]
and induction gives that
\[
\pi_t = (P')^t \pi_0 \qquad \pi_t' = \pi_0' P^t.
\]
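The law of motion in equation 27 can be iterated directly; a minimal Python sketch (the two-state P is an illustrative assumption, and its stationary distribution works out to [2/3, 1/3]′):

```python
def step(pi, P):
    """One application of the law of motion pi_{t+1}' = pi_t' P."""
    m = len(P)
    return [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]

P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [1.0, 0.0]        # put all mass on state 0 at t = 0
for _ in range(500):   # iterate pi_{t+1} = P' pi_t
    pi = step(pi, P)
# pi now approximates the stationary distribution [2/3, 1/3]
```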

20.2 Conditional distributions


Suppose that we are in state i today (i.e., event i occurred); what is the probability that we will be in state j tomorrow? This is P_{ij}. But what is the probability we will be in state j in two days? By the law of total probability,
\[
\Pr(s_{t+2} = j \mid s_t = i) = \sum_k \Pr(s_{t+2} = j \mid s_{t+1} = k, s_t = i) \cdot \Pr(s_{t+1} = k \mid s_t = i).
\]
The first probability on the right-hand side can be simplified by the Markov property:
\[
\Pr(s_{t+2} = j \mid s_t = i) = \sum_k \underbrace{\Pr(s_{t+2} = j \mid s_{t+1} = k)}_{=P_{kj}} \cdot \underbrace{\Pr(s_{t+1} = k \mid s_t = i)}_{=P_{ik}} = (P^2)_{ij}.
\]
It is not hard to see (or show) that more generally, for τ ≤ t,
\[
\Pr(s_t = j \mid s_\tau = i) = (P^{t-\tau})_{ij}.
\]

20.3 Stationary distributions


An unconditional distribution Pr(s_t) is said to be stationary if it gives the same (unconditional) distribution across states in the following period; by induction, this ensures that the distribution will be the same in all subsequent periods for a time-invariant Markov chain. Considering the vector representation of an unconditional distribution ((π_t)_i = Pr(s_t = i)), π_t is a stationary distribution (or invariant distribution) if π_{t+1} = π_t. Per the law of motion we stated in equation 27, this requires π_t = π satisfying
\[
P'\pi = \pi \quad \Longleftrightarrow \quad (P' - I)\pi = 0.
\]
This should look familiar; we saw something very similar in equation 9. It means that a stationary distribution
is any eigenvector associated with a unitary eigenvalue of P 0 , normalized so that the sum of the elements of
the eigenvector is one. The fact that P is a stochastic matrix (i.e., one whose rows add to one) ensures that
it will have at least one unitary eigenvalue (although it may have more than one).
You should be able intuitively to identify the stationary distribution(s) for an iid process. What about the process with transition matrix P = I? Or
\[
P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}? \tag{28}
\]
20.4 Ergodic distributions
Consider starting a Markov chain with initial distribution π_0, and "running the process forward" arbitrarily far. It seems the distribution across states should be given by π_∞(π_0) ≡ lim_{t→∞} π_t = lim_{t→∞} (P′)^t π_0. We can call this the "limiting distribution," but should check first whether this limit is even well defined! It turns out that it may or may not be. The clearest illustrations come from considering P = I (in which case the limiting distribution is the initial distribution), and the process with transition matrix as given in equation 28 (which has no limiting distribution unless π_0 = [1/2, 1/2]′).
Note that any limiting distribution must be stationary.
We will put aside the question of whether a Markov chain has a limiting distribution; suppose for now
that it does. In fact, for some Markov chains, the limiting distribution not only exists, but does not depend
on the initial distribution.
Definition 21. A (time-invariant) Markov chain is said to be asymptotically stationary with a unique
invariant distribution if all initial distributions yield the same limiting distribution; i.e., π∞ (π0 ) = π∞
for all π0 . This limiting distribution, π∞ , is called the ergodic distribution of the Markov chain, and it is
the only stationary distribution of the Markov chain.

We state without proof several important results about asymptotically stationary Markov chains with
unique invariant distributions.
Theorem 22. Let P be a stochastic matrix with P_{ij} > 0 for all i and j. Then P is asymptotically stationary, and has a unique invariant distribution.
Theorem 23. Let P be a stochastic matrix with (P^m)_{ij} > 0 for all i and j for some m ≥ 1. Then P is asymptotically stationary, and has a unique invariant distribution.
This means that as long as there is a strictly positive probability of moving from any particular state today to any particular state in one or more steps,23 an ergodic distribution exists. Thus if we can find an m for which (P^m)_{ij} > 0 for all i and j, we can

1. Note that P must therefore have an ergodic distribution, and that it must be the unique stationary
distribution of P , and
2. Solve (P 0 − I)π∞ = 0 for this unique distribution.
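The two steps above can be sketched numerically; the helper below is our own (a minimal sketch, checking the mixing condition only up to a finite m_max and solving the normalized linear system by least squares):

```python
import numpy as np

def ergodic_distribution(P, m_max=50):
    """If (P^m)_{ij} > 0 for all i, j for some m <= m_max, the chain is
    asymptotically stationary; solve (P' - I) pi = 0 with sum(pi) = 1
    for its unique invariant (ergodic) distribution."""
    P = np.asarray(P, dtype=float)
    n = len(P)
    Pm = np.eye(n)
    for _ in range(m_max):
        Pm = Pm @ P
        if np.all(Pm > 0):
            break
    else:
        raise ValueError("no m <= m_max with (P^m)_ij > 0; an ergodic "
                         "distribution may not exist")
    # Stack the normalization row onto (P' - I) and solve by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = ergodic_distribution(P)   # approximately [2/3, 1/3]
```

For P = I, or the two-state switching matrix in equation 28, the mixing check fails for every m and the helper raises: P = I has many stationary distributions, and the switching chain has a unique stationary distribution ([1/2, 1/2]′) but no limiting one.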

21 Risk-sharing properties of competitive markets


Now that we are considering stochastic models, we can ask how agents react in the face of risk. We should not
be surprised that when risk-averse agents have access to assets that allow them to insure against uncertainty,
they use them. It turns out that when markets are “complete”—that is, agents can trade contingent
securities for every state of the world—agents will “perfectly insure,” subjecting themselves to no individual
(idiosyncratic) risk. There may still be aggregate risk: if the economy is closed and a hurricane strikes
everyone, the resource constraint ensures that everyone eats less. However, as long as the aggregate resources
available in the economy are non-stochastic (which will often be ensured by a law of large numbers), perfect
insurance allows everyone to avoid uncertainty in consumption.
Consider an economy with agents who have identical preferences represented by
\[
U\bigl(\{c_t^i(s^t)\}_{t=0}^{\infty}\bigr) = \operatorname{E}_0\Bigl[\sum_{t=0}^{\infty} \beta^t u\bigl(c_t^i(s^t)\bigr)\Bigr].
\]
23 This is an example of a "mixing condition." Analogous conditions exist with infinite state spaces, although they are harder to write.
Suppose that agent i receives a stochastic endowment {y_t^i(s^t)}_{t=0}^∞ of the (perishable) consumption good. The only assets available to agents are a complete set of state-contingent securities traded at time 0, which (in equilibrium) are priced at q_t(s^t).24 Agent i's problem is therefore to
\[
\max_{\{c_t^i(s^t)\}} \sum_{t=0}^{\infty} \sum_{s^t \in S^t} \beta^t u\bigl(c_t^i(s^t)\bigr) \Pr(s^t)
\quad \text{s.t.} \quad
\sum_{t=0}^{\infty} \sum_{s^t \in S^t} q_t(s^t)\bigl[y_t^i(s^t) - c_t^i(s^t)\bigr] \ge 0.
\]

Setting up the Lagrangian
\[
\mathcal{L} \equiv \sum_{t=0}^{\infty} \sum_{s^t \in S^t} \Bigl[ \beta^t u\bigl(c_t^i(s^t)\bigr) \Pr(s^t) + \mu^i q_t(s^t)\bigl[y_t^i(s^t) - c_t^i(s^t)\bigr] \Bigr]
\]
gives first-order conditions of the form
\[
\mu^i q_t(s^t) = \beta^t u'\bigl(c_t^i(s^t)\bigr) \Pr(s^t).
\]

Dividing the FOCs of two agents i and j, we have
\[
\frac{\mu^i}{\mu^j} = \frac{u'\bigl(c_t^i(s^t)\bigr)}{u'\bigl(c_t^j(s^t)\bigr)}
\quad \Longrightarrow \quad
(u')^{-1}\Bigl( \frac{\mu^i}{\mu^j}\, u'\bigl(c_t^j(s^t)\bigr) \Bigr) = c_t^i(s^t).
\]
Summing across agents i, and noting the economy's resource constraint \(\sum_i c_t^i(s^t) = \sum_i y_t^i(s^t)\), gives
\[
\sum_i (u')^{-1}\Bigl( \frac{\mu^i}{\mu^j}\, u'\bigl(c_t^j(s^t)\bigr) \Bigr) = \sum_i c_t^i(s^t) = \underbrace{\sum_i y_t^i(s^t)}_{\equiv\, \bar y(s^t)}. \tag{29}
\]
Although ugly, this equality tells us something important: consumption of each agent c_t^j(s^t) depends on the state s^t only through the realization of the aggregate endowment ȳ(s^t).
If we are prepared to impose a functional form on u(·), we can go a bit further. Suppose for example that
\[
u(c) = \frac{c^{1-\sigma} - 1}{1-\sigma}, \qquad
u'(c) = c^{-\sigma}, \qquad
(u')^{-1}(x) = x^{-1/\sigma}.
\]
Then equation 29 becomes
\[
\sum_i \Bigl( \frac{\mu^i}{\mu^j} \Bigr)^{-1/\sigma} c_t^j(s^t) = \bar y(s^t)
\quad \Longrightarrow \quad
c_t^j(s^t) = \bar y(s^t) \cdot \frac{(\mu^j)^{-1/\sigma}}{\sum_i (\mu^i)^{-1/\sigma}}.
\]
With complete markets and power utility, each agent's consumption is a constant fraction of the economy's aggregate endowment.
24 Although we solve the Arrow-Debreu problem, identical results obtain with complete sequential markets; there, agents trade each period in contracts that pay off in the subsequent period depending on what state occurs.
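The constant-fraction result is easy to verify numerically; a minimal Python sketch (the multiplier values μ^i and the endowment realizations are illustrative assumptions, not taken from the text):

```python
# Consumption shares implied by
#   c_t^j(s^t) = ybar(s^t) * mu_j^(-1/sigma) / sum_i mu_i^(-1/sigma),
# for power utility with u'(c) = c^(-sigma).
sigma = 2.0
mu = [1.0, 2.0, 4.0]   # illustrative budget-constraint multipliers

weights = [m ** (-1.0 / sigma) for m in mu]
shares = [w / sum(weights) for w in weights]   # state-independent fractions

# In every state, consumptions exhaust the aggregate endowment exactly:
for ybar in (10.0, 7.0, 13.0):
    c = [share * ybar for share in shares]
    assert abs(sum(c) - ybar) < 1e-9
```

Agents with tighter budget constraints (higher μ) receive smaller shares, but every agent's consumption moves one-for-one with ȳ(s^t): there is aggregate risk but no idiosyncratic risk.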
22 Perfect and imperfect insurance practice question
This question comes from the Economics 211 midterm examination in 2007. It is based on a model of
Doireann Fitzgerald’s from 2006.

Question
There is just one period. Suppose there are two countries: A and B. Country A receives an endowment of
yA (s) units of the consumption good as a function of the state of the world s. Country B receives yB (s). Let
π(s) > 0 denote the probabilities of the state of the world s.
The representative agent in each country has a utility function given by
\[
\sum_s \pi(s) \log c(s),
\]

where c(s) denotes consumption by the country in state s.


Suppose that the consumption good can be transported at no cost from one country to another.

1. Set up the Pareto problem, letting λ be the planner’s weight on Country B. Show that in a Pareto
optimal allocation, each country consumes a constant fraction of the total endowment yA (s) + yB (s).
2. Suppose that before the state of the world is realized, the countries can trade claims on consumption in a
complete asset market. Assume that initially, each country owns the claim on its stochastic endowment.
Solve for a competitive equilibrium and show that it is Pareto optimal.

Suppose now that the consumption good is costly to ship across countries. In particular, for a unit of
consumption good to arrive in Country B from Country A, 1 + t units of consumption have to be shipped
from Country A (and similarly from Country B to A), where t > 0. We can think of t as capturing a piece of
the consumption good that melts away due to transportation costs.25 Suppose also that there are only two
states of nature s ∈ {s1 , s2 } and that the endowments are
\[
y_A(s) = \begin{cases} 1, & \text{for } s = s_1 \\ 0, & \text{for } s = s_2 \end{cases}
\]
and y_B(s) = 1 − y_A(s). So y_A(s) + y_B(s) is constant. Let π(s_1) = 1/2.


3. Set up the Pareto problem (let λ be the planner’s weight on Country B), and write down the first-order
conditions. Show that in a Pareto allocation, the consumption of a country will now depend on its
endowment realization, that is, Country A consumption depends on yA (s) and not only on yA (s) + yB (s).
How does t affect the sensitivity of consumption to income?
4. What does part 3 imply for tests of complete markets across countries?

Solution
1. The Pareto problem is to
\[
\max_{\{c_A(s), c_B(s)\}_{s \in S}} (1-\lambda) \sum_{s \in S} \pi(s) \log c_A(s) + \lambda \sum_{s \in S} \pi(s) \log c_B(s)
\quad \text{s.t.} \quad
c_A(s) + c_B(s) \le \underbrace{y_A(s) + y_B(s)}_{\equiv\, \bar y(s)}, \ \forall s \in S.
\]

25 Indeed, this form of transportation cost is often called an "iceberg cost," although melting is only one of the analogies that has been used to explain the name.
Setting up the Lagrangian
\[
\mathcal{L} \equiv \sum_{s \in S} \pi(s) \Bigl[ (1-\lambda) \log c_A(s) + \lambda \log \underbrace{\bigl(\bar y(s) - c_A(s)\bigr)}_{=c_B(s)} \Bigr]
\]
gives first-order conditions of the form
\[
\frac{1-\lambda}{c_A(s)} = \frac{\lambda}{c_B(s)}.
\]
Plugging c_B(s) = c_A(s) · λ/(1−λ) into the resource constraint gives
\[
c_A(s) = (1-\lambda) \cdot \bar y(s), \qquad
c_B(s) = \lambda \cdot \bar y(s).
\]

2. Let the price of a claim on consumption in state s be q(s). An equilibrium is prices {q(s)}s∈S and
allocations {cA (s), cB (s)}s∈S such that the markets clear (cA (s) + cB (s) = yA (s) + yB (s) for all s ∈ S),
and {ci (s)}s∈S solves the problem of country i ∈ {A, B} given {q(s)}s∈S .
The problem in country i is to
\[
\max_{\{c_i(s)\}_{s \in S}} \sum_{s \in S} \pi(s) \log c_i(s)
\quad \text{s.t.} \quad
\sum_{s \in S} q(s) \bigl[ y_i(s) - c_i(s) \bigr] \ge 0. \tag{30}
\]

Setting up the Lagrangian
\[
\mathcal{L} \equiv \sum_{s \in S} \Bigl[ \pi(s) \log c_i(s) + \mu_i q(s) \bigl( y_i(s) - c_i(s) \bigr) \Bigr]
\]
gives first-order conditions of the form
\[
\frac{\pi(s)}{c_i(s)} = \mu_i q(s). \tag{31}
\]
Dividing A and B's first-order conditions gives
\[
\frac{c_A(s)}{c_B(s)} = \frac{\mu_B}{\mu_A}.
\]
Plugging c_B(s) = c_A(s) · μ_A/μ_B into the resource constraint gives
\[
c_A(s) = \frac{\mu_B}{\mu_A + \mu_B}\, \bar y(s), \qquad
c_B(s) = \frac{\mu_A}{\mu_A + \mu_B}\, \bar y(s).
\]
As these are constant fractions of the total endowment, these allocations are Pareto optimal.
Substituting into A's first-order condition (equation 31) gives
\[
\pi(s) = \frac{\mu_A \mu_B}{\mu_A + \mu_B}\, \bar y(s) q(s)
\quad \Longrightarrow \quad
q(s) = \frac{\mu_A + \mu_B}{\mu_A \mu_B}\, \pi(s)\, \bar y(s)^{-1}.
\]
(This is an example of a more general result shown in class: q(s^t | s_0) ∝ β^t u′(c(s^t)) Pr(s^t).) We are allowed to normalize one price, which we can do by requiring that (μ_A + μ_B)/(μ_A μ_B) = 1. This gives us equilibrium prices
\[
q(s) = \frac{\pi(s)}{\bar y(s)}.
\]
Substituting into A's (binding) budget constraint (equation 30):
\[
\sum_{s \in S} \underbrace{\frac{\pi(s)}{\bar y(s)}}_{=q(s)} \Bigl[ y_A(s) - \underbrace{\frac{\mu_B}{\mu_A + \mu_B}\, \bar y(s)}_{=c_A(s)} \Bigr] = 0
\quad \Longrightarrow \quad
\sum_{s \in S} \pi(s) \frac{y_A(s)}{\bar y(s)} = \frac{\mu_B}{\mu_A + \mu_B}.
\]
This allows us to pin down consumption:
\[
c_A(s) = \Bigl[ \sum_{s' \in S} \pi(s') \frac{y_A(s')}{\bar y(s')} \Bigr] \bar y(s), \qquad
c_B(s) = \Bigl[ \sum_{s' \in S} \pi(s') \frac{y_B(s')}{\bar y(s')} \Bigr] \bar y(s).
\]

3. The Pareto problem is to
\[
\max_{\{c_A(s), c_B(s)\}_{s \in S}} (1-\lambda) \sum_{s \in S} \tfrac{1}{2} \log c_A(s) + \lambda \sum_{s \in S} \tfrac{1}{2} \log c_B(s)
\quad \text{s.t.}
\]
\[
c_A(s_1) + (1+t)c_B(s_1) \le 1, \qquad
(1+t)c_A(s_2) + c_B(s_2) \le 1.
\]
Setting up the Lagrangian
\[
2\mathcal{L} \equiv (1-\lambda) \log c_A(s_1) + \lambda \log \underbrace{\frac{1 - c_A(s_1)}{1+t}}_{=c_B(s_1)} + (1-\lambda) \log c_A(s_2) + \lambda \log \underbrace{\bigl(1 - (1+t)c_A(s_2)\bigr)}_{=c_B(s_2)}
\]

gives first-order conditions
\[
\frac{1-\lambda}{c_A(s_1)} = \frac{\lambda}{(1+t)c_B(s_1)}
\quad \Longrightarrow \quad
(1+t)c_B(s_1) = \frac{\lambda}{1-\lambda}\, c_A(s_1);
\]
\[
\frac{1-\lambda}{c_A(s_2)} = \frac{(1+t)\lambda}{c_B(s_2)}
\quad \Longrightarrow \quad
(1+t)c_A(s_2) = \frac{1-\lambda}{\lambda}\, c_B(s_2).
\]
Substituting into the resource constraints gives
\[
c_A(s_1) = 1-\lambda, \qquad c_B(s_1) = \frac{\lambda}{1+t};
\]
\[
c_A(s_2) = \frac{1-\lambda}{1+t}, \qquad c_B(s_2) = \lambda.
\]
Thus cA (s1 ) > cA (s2 ) and cB (s1 ) < cB (s2 ), even though the aggregate endowment is constant across
the two states. Hence consumption depends on one’s own endowment realization.
4. Suppose that the First Welfare Theorem applies, so that equilibrium is Pareto optimal. Then by part 3,
we should not expect to find independence of a country’s consumption from its own endowment, even if
markets are complete. Hence the fact that empirically, consumption does vary with endowment is not
by itself sufficient evidence to reject the complete markets hypothesis.
23 Asset pricing with complete markets
For our purposes, the term "asset" refers to a contractually-guaranteed right to delivery of consumption goods, with the amount of delivery conditional on the history of the world s^t. The notation can be a little bit tricky, but there are three main ways that we denote the prices of assets:
1. q^τ(s^t): This is the price of an asset that delivers one unit of consumption at history s^t, where the price is paid in history-s^τ consumption goods (for τ ≤ t, and assuming that s^τ_1 = s^t_1, . . . , s^τ_τ = s^t_τ). This notation captures the price of Arrow-Debreu securities, q^0(s^t); these are actually sufficient to pin down all q^τ(s^t) according to
\[
q^\tau(s^t) = \frac{q^0(s^t)}{q^0(s^\tau)}.
\]
The Arrow-Debreu prices are pinned down (up to a normalization) by any agent's consumption, since the first-order conditions of the consumer problem
\[
\max_{\{\{c_t(s^t)\}_{s^t \in S^t}\}_t} \operatorname{E}\Bigl[\sum_t \beta^t u\bigl(c_t(s^t)\bigr)\Bigr]
\quad \text{s.t.} \quad
\sum_t \sum_{s^t \in S^t} q^0(s^t) c_t(s^t) \le B
\]
are of the form
\[
q^0(s^t) = \lambda^{-1} \beta^t u'\bigl(c_t(s^t)\bigr) \pi(s^t).
\]
Thus
\[
q^\tau(s^t) = \frac{q^0(s^t)}{q^0(s^\tau)}
            = \frac{\lambda^{-1} \beta^t u'\bigl(c_t(s^t)\bigr) \pi(s^t)}{\lambda^{-1} \beta^\tau u'\bigl(c_\tau(s^\tau)\bigr) \pi(s^\tau)}
            = \beta^{t-\tau}\, \frac{u'\bigl(c_t(s^t)\bigr)}{u'\bigl(c_\tau(s^\tau)\bigr)}\, \pi(s^t \mid s^\tau). \tag{32}
\]

2. p^0(s^τ): This is the price of an asset that delivers d(s^t) units of consumption at every history s^t for t ≥ τ if history s^τ is achieved, where the price is paid in time-zero consumption goods. This is a "redundant" asset; i.e., it could be created with a suitable combination of Arrow-Debreu securities, which determines the asset's price:
\[
p^0(s^\tau) = \sum_{t \ge \tau} \sum_{s^t | s^\tau} q^0(s^t) d(s^t).
\]
In the simplest case, where s^τ = s^0 ≡ s_0, the asset delivers d(s^t) at every s^t; the price (paid in time-zero consumption goods) is
\[
p^0(s^0) = \sum_t \sum_{s^t} q^0(s^t) d(s^t).
\]

3. p^τ(s^τ): This is the price of an asset that delivers d(s^t) units of consumption at every history s^t for t ≥ τ if history s^τ is achieved, where the price is paid in history-s^τ consumption goods. (This is sometimes called the price of the "tail asset.") To convert from a price measured in time-zero consumption goods, we must divide by the time-zero price of history-s^τ consumption goods:
\[
p^\tau(s^\tau) = \frac{p^0(s^\tau)}{q^0(s^\tau)}
             = \sum_{t \ge \tau} \sum_{s^t | s^\tau} \underbrace{\frac{q^0(s^t)}{q^0(s^\tau)}}_{=q^\tau(s^t)}\, d(s^t). \tag{33}
\]
Suppose that d(·) is such that d(s^t) = 0 for all t ≠ τ + 1; that is, the asset can only pay off in the period after it is "purchased." By equations 32 and 33,
\[
\begin{aligned}
p^\tau(s^\tau) &= \sum_{s_{\tau+1}} q^\tau(s^{\tau+1}) d(s^{\tau+1}) \\
&= \sum_{s_{\tau+1}} \beta\, \frac{u'\bigl(c_{\tau+1}(s^{\tau+1})\bigr)}{u'\bigl(c_\tau(s^\tau)\bigr)}\, \pi(s^{\tau+1} \mid s^\tau)\, d(s^{\tau+1}) \\
&= \operatorname{E}_\tau\Bigl[ \underbrace{\beta\, \frac{u'\bigl(c_{\tau+1}(s^{\tau+1})\bigr)}{u'\bigl(c_\tau(s^\tau)\bigr)}}_{\equiv\, m_{\tau+1}(s^{\tau+1})}\, d(s^{\tau+1}) \Bigr],
\end{aligned}
\]
where m_{τ+1} is called the stochastic discount factor (note that it sort of tells us how much the consumer discounts consumption payouts in s^{τ+1} when valuing them in units of history-s^τ consumption). This also gives an expression that functions as a stochastic intereuler: defining the (stochastic) return on the asset as R_{τ+1}(s^{τ+1}) ≡ d(s^{τ+1})/p^τ(s^τ),
\[
1 = \operatorname{E}_\tau\bigl[ m_{\tau+1} R_{\tau+1} \bigr].
\]
We can also apply the law of iterated expectations to ensure that an analogous result holds with unconditional expectations:
\[
1 = \operatorname{E}\bigl[ m_{\tau+1} R_{\tau+1} \bigr].
\]
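The pricing logic can be checked with a toy two-state example (all numbers below are illustrative assumptions): build the SDF from power utility, price the payoff as p = E[m d], then confirm the stochastic intereuler holds by construction.

```python
# Price a one-period payoff d(s') with the stochastic discount factor
# m(s') = beta * u'(c(s')) / u'(c), for power utility u'(c) = c^(-sigma),
# and verify the stochastic intereuler 1 = E[m R].
beta, sigma = 0.96, 2.0
prob = [0.5, 0.5]         # two equally likely next-period states
c_now = 1.0
c_next = [1.1, 0.9]       # consumption in each next-period state
d = [1.2, 0.8]            # the asset's state-contingent payoff

m = [beta * (c / c_now) ** (-sigma) for c in c_next]        # SDF by state
price = sum(p * mi * di for p, mi, di in zip(prob, m, d))   # p = E[m d]
R = [di / price for di in d]                                # gross returns
euler = sum(p * mi * Ri for p, mi, Ri in zip(prob, m, R))   # = 1 exactly
```

Note that the SDF is larger in the low-consumption state, so payoffs concentrated there are worth more per expected unit.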

24 Introducing incomplete markets


Thus far, we have discussed models where agents can trade contingent claims for consumption at every
possible history of the world. There are important insights to be gleaned from such models, but they also
make strong predictions about perfect insurance that clearly don’t hold up in the real world. We therefore
start to consider economies where only a limited set of assets trade.
There are a number of reasons we might expect markets to be incomplete. One possibility is that the set of future states is so large and complex that either bounded rationality or transaction costs interfere with the issuance of a full set of contingent contracts.
Another potential problem is that people may not actually keep their promises. (We touched on a related
problem in our discussion of time-inconsistency in optimal taxation.) An agent may commit to pay, and then
fail to. She might commit to reveal some private information, but lack a means to prove that her revelation
is honest. Or she might commit to take some private action, but lack a means to prove that she is actually
taking it.
In the simplest models we discuss, we will exogenously impose a particular form of incompleteness on markets. Typically, we will only allow consumers to trade risk-free securities. Although we won't explicitly discuss why we make these impositions, you should typically have stories about transaction costs, bounded rationality, or commitment issues (including private information and hidden actions) in mind.
Incompleteness of markets can introduce significant complications to our analysis. For example,
• There may no longer be an equivalence between sequential and date-zero trading. We need to solve the
economy as it actually exists.
• Representative agents may no longer exist.
• When developing a recursive formulation for agents' problems, the state space is typically much more complicated. Perfect insurance under complete markets means that we often need only include, for example, aggregate asset holdings and today's shock realization. With incomplete markets, asset holdings vary across agents, so we need to track the full distribution.

25 Econometrics of incomplete markets practice question


This question comes from the Economics 211 midterm examination in 2007.

Question
Suppose an agent consumes two goods every period: bananas and newspapers. Let the per-period utility
function given consumption of bananas cb and consumption of newspapers cn be given by u(cb , cn ). Let us
assume that this utility satisfies all standard properties (strictly concave, strictly increasing, and differentiable).
The agent lives for two periods. Let p1 and p2 (s) denote the price of newspapers in units of bananas in
periods 1 and 2, respectively, where s represents a stochastic state of the world that realizes in period 2. The
agent receives an endowment in period 1 equal to y1 units of bananas, and receives y2 (s) units of bananas in
period 2. The probabilities of states of the world are denoted by π(s) > 0. The agent maximizes expected
discounted utility, where the discount factor is given by β.
Suppose the agent can save in a riskless bond that returns R units of bananas in period 2 irrespective of
the state of the world. Suppose that the agent cannot borrow. Assume for the questions that follow that the
solution to the agent problem is interior—i.e., the borrowing constraint does not bind.
Suppose that utility is separable between bananas and newspapers and takes the following form:
\[
u(c_b, c_n) = \frac{c_b^{1-\gamma_b}}{1-\gamma_b} + \frac{c_n^{1-\gamma_n}}{1-\gamma_n}.
\]
1. Show that the following equation holds:
\[
\beta \operatorname{E}\Bigl[ R \Bigl( \frac{c_{b2}}{c_{b1}} \Bigr)^{-\rho} \Bigr] = 1 \tag{34}
\]
for some ρ, where c_{b1} and c_{b2} are consumption of bananas in periods 1 and 2, respectively.

Suppose that an econometrician observes the interest rate on the risk-free bond, but only has information
about the number of bananas consumed by the agent in periods 1 and 2. She does not observe the agent’s
endowment, his consumption of newspapers, nor the price of newspapers (p1 and p2 ).
2. Assuming that the econometrician observes several of these agents at different points in time (with
possibly different risk-free interest rates), can she estimate γb ? What about γn ? Ignoring non-linearities,
write down the regression that could be used.
Suppose for the rest of the question that the utility is non-separable and takes the following form:
\[
u(c_b, c_n) = \frac{(c_b^\alpha c_n^{1-\alpha})^{1-\gamma}}{1-\gamma} \tag{35}
\]
for some α ∈ (0, 1).

3. Show that the econometrician in part 2 can estimate γ as long as p2 (s) = p1 (the price of newspapers is
constant) by estimating the same Euler equation as in part 2. Can she estimate α?

56
Solution
1. The consumer’s problem is to
X 
max u(cb1 , cn1 ) + β π(s)u u(cb2 (s), cn2 (s)) s.t.
cb1 ,cn1 ,{cb2 (s),cn2 (s)}s∈S ,a
s∈S
cb1 + p1 cn1 + a ≤ y1 ;
cb2 (s) + p2 (s)cn2 (s) ≤ y2 (s) + Ra, ∀s;
a ≥ 0.

Setting up the Lagrangian (assuming that a ≥ 0 does not bind, and substituting in for cb1 and cb2 (s)
using the budget conditions)
X 
L ≡ u(y1 − p1 cn1 − a, cn1 ) + β π(s)u y2 (s) + Ra − p2 (s)cn2 (s), cn2 (s)
s∈S

gives first-order conditions of the form

\begin{align*}
p_1\, u_1(c_{b1}, c_{n1}) &= u_2(c_{b1}, c_{n1}) \\
p_2(s)\, u_1\bigl(c_{b2}(s), c_{n2}(s)\bigr) &= u_2\bigl(c_{b2}(s), c_{n2}(s)\bigr) \\
u_1(c_{b1}, c_{n1}) &= \beta R\, E\bigl[u_1(c_{b2}, c_{n2})\bigr].
\end{align*}

Noting that the functional form of u gives $u_1(c_b, c_n) = c_b^{-\gamma_b}$, the banana intereuler becomes
\begin{align*}
c_{b1}^{-\gamma_b} &= \beta R\, E\bigl[c_{b2}^{-\gamma_b}\bigr] \\
1 &= \beta E\left[ R \left( \frac{c_{b2}}{c_{b1}} \right)^{-\gamma_b} \right],
\end{align*}
which is equation 34 with ρ = γb.

2. Define ε by
\[ \beta R \left( \frac{c_{b2}}{c_{b1}} \right)^{-\gamma_b} \equiv e^{\varepsilon}; \]
therefore, $E[e^{\varepsilon}] = 1$. Taking logs,
\begin{align*}
\log \beta + \log R - \gamma_b \log\left( \frac{c_{b2}}{c_{b1}} \right) &= \varepsilon, \\
\log R &= -\log \beta + \gamma_b \log\left( \frac{c_{b2}}{c_{b1}} \right) + \varepsilon.
\end{align*}
Thus the econometrician can estimate γ̂b by an OLS regression of the log interest rate on log banana consumption growth, since $e^{E[\varepsilon]} \approx E[e^{\varepsilon}] = 1 \implies E[\varepsilon] \approx 0$.
The intereuler for newspaper consumption is
\[ 1 = \beta E\left[ R\, \frac{p_1}{p_2} \left( \frac{c_{n2}}{c_{n1}} \right)^{-\gamma_n} \right]. \]
As p1, p2 (s), and endowments are unobserved, the econometrician cannot recover cn2/cn1. Hence, she
cannot estimate γn. (Note that I do not know if we can actually prove that γn is not estimable.)
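This regression is easy to simulate. The following Python sketch is not part of the notes; all parameter values and the error scale are illustrative. It generates observations satisfying the logged Euler equation and recovers γb (the slope) and β (from the intercept) by OLS:

```python
import math
import random

random.seed(0)
beta, gamma_b = 0.96, 2.0   # true (illustrative) parameter values

# Simulate observations of (log consumption growth, log R) that satisfy
# log R = -log(beta) + gamma_b * dlog_c + eps, with E[e^eps] = 1.
n = 5000
sigma_e = 0.1
obs = []
for _ in range(n):
    dlog_c = random.gauss(0.02, 0.05)             # log banana consumption growth
    eps = random.gauss(-sigma_e ** 2 / 2, sigma_e)  # mean chosen so E[e^eps] = 1
    log_R = -math.log(beta) + gamma_b * dlog_c + eps
    obs.append((dlog_c, log_R))

# OLS slope = cov(x, y) / var(x); the intercept recovers -log(beta).
xbar = sum(x for x, _ in obs) / n
ybar = sum(y for _, y in obs) / n
sxx = sum((x - xbar) ** 2 for x, _ in obs)
sxy = sum((x - xbar) * (y - ybar) for x, y in obs)
gamma_hat = sxy / sxx
beta_hat = math.exp(-(ybar - gamma_hat * xbar))

print(gamma_hat, beta_hat)  # estimates close to 2.0 and 0.96
```

With many observations the slope estimate is close to the true γb; the recovered β is slightly biased because E[ε] ≈ −σε²/2 rather than exactly zero, which is the approximation noted above.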

3. Using the new functional form of u, the banana intereuler becomes
\begin{align*}
\alpha c_{b1}^{\alpha(1-\gamma)-1} c_{n1}^{(1-\alpha)(1-\gamma)} &= \beta R\, E\left[ \alpha c_{b2}^{\alpha(1-\gamma)-1} c_{n2}^{(1-\alpha)(1-\gamma)} \right] \\
1 &= \beta R\, E\left[ \left( \frac{c_{b2}}{c_{b1}} \right)^{\alpha(1-\gamma)-1} \left( \frac{c_{n2}}{c_{n1}} \right)^{(1-\alpha)(1-\gamma)} \right].
\end{align*}
Since period-utility is Cobb-Douglas, expenditure on newspapers is a constant fraction of expenditure
on bananas. This implies $c_{b2}/c_{b1} = (p_2 c_{n2})/(p_1 c_{n1})$, hence
\[ 1 = \beta R\, E\left[ \left( \frac{c_{b2}}{c_{b1}} \right)^{-\gamma} \left( \frac{p_1}{p_2} \right)^{(1-\alpha)(1-\gamma)} \right]. \]

Hence, if p1 = p2 , γ can be estimated exactly as in part 2. However, the econometrician cannot estimate
α from this intereuler. (As above, I do not know if we can actually prove that α is not estimable.)
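The exponent bookkeeping in this derivation is easy to fumble, so here is a quick numerical sanity check (mine, not from the notes) of the two facts used: the Cobb-Douglas intratemporal condition, which makes newspaper expenditure a fixed multiple (1 − α)/α of banana expenditure, and the collapse of the cb exponents to −γ:

```python
import random

random.seed(1)
for _ in range(5):
    alpha = random.uniform(0.1, 0.9)
    gamma = random.uniform(0.5, 5.0)
    p, cb = random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)

    # Intratemporal FOC p*u1 = u2, with u = (cb^a * cn^(1-a))^(1-g) / (1-g),
    # pins down cn so that p*cn/cb = (1 - alpha)/alpha:
    cn = (1 - alpha) / alpha * cb / p
    X = cb ** alpha * cn ** (1 - alpha)
    u1 = alpha * X ** (1 - gamma) / cb
    u2 = (1 - alpha) * X ** (1 - gamma) / cn
    assert abs(p * u1 - u2) < 1e-9

    # The cb exponents in the intereuler sum to -gamma:
    combined = (alpha * (1 - gamma) - 1) + (1 - alpha) * (1 - gamma)
    assert abs(combined - (-gamma)) < 1e-12

print("checks passed")
```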

26 Hall’s martingale hypothesis with durables practice question


This question comes from an old Economics 211 problem set. It is not solved in these notes, but is a good
practice problem for you to work through!

Question
Hall (1978) showed that consumption should follow an AR(1) process, and that no other variable known at
time t should influence the expected consumption at time t + 1. Mankiw (1982) generalized Hall’s results for
the case of durable consumption as follows. Suppose consumers have the following preferences over durable
goods and non-durables:
\[ \sum_{t=0}^{\infty} \sum_{s^t \in S^t} \pi(s^t)\, \beta^t\, u\bigl(K(s^t)\bigr), \]
where
\[ K(s^t) = (1-\delta)K(s^{t-1}) + c(s^t), \]
K represents the stock of the durable good, and c the acquisition of new durables. Consumers have access to
a riskless bond. Labor income is the only source of uncertainty, and the gross interest rate is constant and
equal to R. The evolution of assets is given by

A(st ) = RA(st−1 ) + y(st ) − c(st ).

1. Show that the first-order condition for optimality implies (ignoring the possibility of a binding borrowing
constraint) that
\[ u'\bigl(K(s^t)\bigr) = \beta R \sum_{s^{t+1}} \pi(s^{t+1} \mid s^t)\, u'\bigl(K(s^{t+1})\bigr). \]

2. If u is quadratic, show that

\[ c(s^{t+1}) = \gamma_1 + \bigl[ \alpha - (1-\delta) \bigr] K(s^t) + \varepsilon(s^{t+1}), \]
where $\alpha = (\beta R)^{-1}$, $E[\varepsilon(s^{t+1}) \mid s^t] = 0$, and γ1 is a constant.

3. If u is quadratic, also show that
\[ \alpha c(s^t) = \gamma_2 + \bigl[ \alpha - (1-\delta) \bigr] K(s^t) + (1-\delta)\varepsilon(s^t), \]
and hence
\[ c(s^{t+1}) = \gamma_3 + \alpha c(s^t) + \varepsilon(s^{t+1}) - (1-\delta)\varepsilon(s^t), \]
where γ2 and γ3 are constants.
Acquisition of durables therefore follows an ARMA(1, 1) process.
4. Argue that the same result will obtain if non-durable consumption goods are introduced into the model,
with the agent’s utility flow given by a separable function
u(K, z) = uK (K) + uz (z),
where K is the stock of durables and z is the consumption of non-durables.
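Although the problem is left unsolved here, the ARMA(1,1) result in part 3 is easy to see by simulation: for an ARMA(1,1) process, successive autocorrelations decay at the AR rate after the first lag (ρ₂/ρ₁ = α), while ρ₁ itself is pulled down by the MA term. The sketch below is not part of the problem set; the parameter values are illustrative, and α < 1 is imposed so the simulated process is stationary (in the model α = (βR)⁻¹):

```python
import random

random.seed(2)
alpha, delta, gamma3 = 0.9, 0.2, 0.1   # illustrative values, not calibrated
T = 100_000

# Simulate c(t+1) = gamma3 + alpha*c(t) + eps(t+1) - (1 - delta)*eps(t).
c_prev, eps_prev = gamma3 / (1 - alpha), 0.0
c = []
for _ in range(T):
    eps = random.gauss(0.0, 1.0)
    c_next = gamma3 + alpha * c_prev + eps - (1 - delta) * eps_prev
    c.append(c_next)
    c_prev, eps_prev = c_next, eps

def autocorr(x, k):
    # Sample autocorrelation of x at lag k.
    n, m = len(x), sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    den = sum((v - m) ** 2 for v in x)
    return num / den

rho1, rho2 = autocorr(c, 1), autocorr(c, 2)
print(rho1, rho2 / rho1)  # the ratio is close to alpha, the ARMA(1,1) signature
```

Note that ρ₁ is far below α here, which is exactly what distinguishes the durables process from Hall's AR(1) benchmark.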

27 General equilibrium in incomplete markets


27.1 A constant absolute risk aversion example
Consider an economy populated by a large number of ex ante identical agents. That is, each faces the same
stochastic distribution of shocks, but is affected by her shock’s idiosyncratic realization. Each agent has
constant absolute risk aversion; her preferences are represented by
\[ U\bigl( \{c(s^t)\}_t \bigr) = E\left[ \sum_t \beta^t u\bigl(c(s^t)\bigr) \right] = -\frac{1}{\gamma} E\left[ \sum_t \beta^t e^{-\gamma c(s^t)} \right], \]

where st is the history of the consumer’s shocks (not the history of the world).
Each consumer receives an i.i.d., normally-distributed stochastic endowment each period:
\[ y(s^t) \sim N(\bar{y}, \sigma^2). \]
Note that by a law of large numbers, there is no aggregate uncertainty. Since there is no "worst" possible
shock, we cannot apply a natural borrowing constraint (it would be −∞); we do need to prevent over-borrowing,
but will do so through an (unstated) no-Ponzi condition. Furthermore, we must allow consumption to be
negative.
The only markets available are for risk-free bonds. The consumer problem is therefore to
\[ \max_{\{a(s^t)\}_t} \; E\left[ \sum_t \beta^t u\bigl( R a(s^{t-1}) + y(s^t) - a(s^t) \bigr) \right]. \]

Given the stationarity of the problem, we can also write this as a functional equation:
\[ V(x) = \max_a \bigl[ u(x - a) + \beta E V(Ra + y') \bigr], \]

where x is the “cash on hand” after the realization of the stochastic shock.
Earlier in the term, we discussed several techniques for solving problems like this one. We will proceed by
“guessing and verifying;” fortunately, we will start with some very good guesses:
\begin{align*}
V(x) &= -\frac{1}{\gamma} e^{-\hat{A}x - \hat{B}}, \\
c(x) &= Ax + B, \text{ and} \\
a(x) &= (1-A)x - B.
\end{align*}

If these guesses are correct, the value function becomes
\begin{align*}
\underbrace{-\frac{1}{\gamma} e^{-\hat{A}x - \hat{B}}}_{=V(x)}
  &= \underbrace{-\frac{1}{\gamma} e^{-\gamma Ax - \gamma B}}_{=u(c)}
   + \beta E\Bigl[ \underbrace{-\tfrac{1}{\gamma}\, e^{-\hat{A}R(1-A)x + \hat{A}RB - \hat{A}y' - \hat{B}}}_{=V(Ra+y')} \Bigr] \\
  &= -\frac{1}{\gamma} e^{-\gamma Ax - \gamma B} - \frac{\beta}{\gamma}\, e^{-\hat{A}R(1-A)x + \hat{A}RB - \hat{B}}\, E\bigl[ e^{-\hat{A}y'} \bigr].
\end{align*}
Since $-\hat{A}y' \sim N(-\hat{A}\bar{y},\, \hat{A}^2\sigma^2)$, we have $E[e^{-\hat{A}y'}] = e^{-\hat{A}\bar{y} + \hat{A}^2\sigma^2/2}$, so
\begin{align*}
-\frac{1}{\gamma} e^{-\hat{A}x - \hat{B}} &= -\frac{1}{\gamma} e^{-\gamma Ax - \gamma B} - \frac{\beta}{\gamma}\, e^{-\hat{A}R(1-A)x + \hat{A}RB - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2} \\
e^{-\hat{A}x - \hat{B}} &= e^{-\gamma Ax - \gamma B} + \beta e^{-\hat{A}R(1-A)x + \hat{A}RB - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2} \\
e^{-\hat{B}} &= e^{-(\gamma A - \hat{A})x - \gamma B} + \beta e^{(\hat{A} - \hat{A}R + \hat{A}AR)x + \hat{A}RB - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2}.
\end{align*}

Since the left-hand side does not depend on x, the right-hand side also must not;26 this implies γA − Â = 0
and Â − ÂR + ÂAR = 0, or
\begin{align*}
A &= \frac{R-1}{R}, \\
\hat{A} &= \gamma\, \frac{R-1}{R}.
\end{align*}

Thus if we can pin down R, we know A and Â.27


The envelope condition is that
\begin{align*}
V'(x) &= u'\bigl(c(x)\bigr) \\
\frac{\hat{A}}{\gamma}\, e^{-\hat{A}x - \hat{B}} &= e^{-\gamma Ax - \gamma B}.
\end{align*}
Taking logs and noting that the $\hat{A}x$ and $\gamma Ax$ terms cancel (since $\hat{A} = \gamma A$),
\begin{align*}
\log A - \hat{B} &= -\gamma B \\
\log A + \gamma B &= \hat{B}. \tag{36}
\end{align*}

Thus if we can pin down R and B, we know A, Â, and B̂.


Finally, we have the first-order condition:
\begin{align*}
u'\bigl(c(x)\bigr) &= \beta R\, E\bigl[ V'\bigl( Ra(x) + y' \bigr) \bigr] \\
e^{-\gamma Ax - \gamma B} &= \beta R\, E\Bigl[ \frac{\hat{A}}{\gamma}\, e^{-\hat{A}(Ra(x)+y') - \hat{B}} \Bigr] \\
&= \beta R A\, e^{-\hat{A}Ra(x) - \hat{B}}\, E\bigl[ e^{-\hat{A}y'} \bigr] \\
&= \beta R A\, e^{-\hat{A}Ra(x) - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2}.
\end{align*}
Plugging in $a(x) = (1-A)x - B$ gives
\[ e^{-\gamma Ax - \gamma B} = \beta R A\, e^{-\hat{A}R(1-A)x + \hat{A}RB - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2}. \]
26 Here we act as if the fact that the sum of the right-hand-side terms does not depend on x means that neither term depends
on x.
27 If we want to confirm that our functional form guesses are correct, we should return—after pinning down B and B̂—to
confirm that $e^{-\hat{B}} = e^{-\gamma B} + \beta e^{\hat{A}RB - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2}$. We will not do this.

Taking logarithms and cancelling terms using equation 36,
\begin{align*}
-\gamma Ax - \gamma B &= \log(\beta R) + \log A - \hat{A}R(1-A)x + \hat{A}RB - \hat{B} - \hat{A}\bar{y} + \hat{A}^2\sigma^2/2 \\
-\gamma\, \frac{R-1}{R}\, x &= \log(\beta R) - \gamma\, \frac{R-1}{R}\, x + \gamma(R-1)B - \gamma\, \frac{R-1}{R}\, \bar{y} + \gamma^2\, \frac{(R-1)^2}{R^2} \cdot \frac{\sigma^2}{2} \\
B &= -\frac{\log(\beta R)}{\gamma(R-1)} + \frac{\bar{y}}{R} - \gamma\, \frac{R-1}{R^2} \cdot \frac{\sigma^2}{2}.
\end{align*}

Thus we can write A, Â, B, and B̂ as functions of R. Plugging what we know into the consumption equation
gives
\[ c(x) = Ax + B = \underbrace{\frac{R-1}{R}\, x + \frac{\bar{y}}{R}}_{\text{Permanent income}} - \underbrace{\frac{\gamma(R-1)\sigma^2}{2R^2}}_{\text{Precautionary savings}} - \underbrace{\frac{\log(\beta R)}{\gamma(R-1)}}_{\text{Slope (?)}}. \]

Using this expression and a market clearing condition, we could (in theory) pin down the equilibrium interest
rate. One way to do this is to consider the evolution of the agents' cash-in-hand:
\begin{align*}
x' &= Ra(x) + y' \\
   &= R\bigl[ (1-A)x - B \bigr] + y' \\
   &= x - RB + y' \\
   &= x + \frac{R \log(\beta R)}{\gamma(R-1)} + \frac{\gamma(R-1)\sigma^2}{2R} + (y' - \bar{y}).
\end{align*}

This is a random walk with innovations (y′ − ȳ) and a drift:
\[ E[x'] = x + \underbrace{ \frac{R \log(\beta R)}{\gamma(R-1)} + \frac{\gamma(R-1)\sigma^2}{2R} }_{\text{Drift}}. \]

Note that if βR = 1, the drift is positive: cash-in-hand (and hence assets) diverges in expectation to positive
infinity. In fact, assets must diverge in expectation (to positive or negative infinity) unless the drift term
equals zero, or
\[ \beta = \frac{1}{R^*} \exp\left( -\gamma^2 \left( \frac{R^* - 1}{R^*} \right)^2 \frac{\sigma^2}{2} \right). \]
The right-hand side is decreasing in R∗, so no drift implies a unique value of R∗ with βR∗ < 1. It is
straightforward to consider comparative statics of this equilibrium interest rate in terms of β, γ, and σ².
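The no-drift condition pins down R∗ only implicitly, but it is easy to solve numerically. The sketch below is mine, not from the notes; the helper names and parameter values are illustrative. It finds R∗ by bisection on (1, 1/β), where the condition changes sign, and confirms one comparative static: more income risk lowers the equilibrium rate.

```python
import math

def excess(R, beta, gamma, sigma2):
    # RHS of the no-drift condition minus beta:
    # positive for R just above 1, negative at R = 1/beta.
    return math.exp(-gamma ** 2 * ((R - 1) / R) ** 2 * sigma2 / 2) / R - beta

def R_star(beta, gamma, sigma2):
    # Bisection on (1, 1/beta); the RHS is decreasing in R, so the root is unique.
    lo, hi = 1.0 + 1e-12, 1.0 / beta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid, beta, gamma, sigma2) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = 0.96
R = R_star(beta, gamma=2.0, sigma2=0.01)
print(R, R < 1 / beta)   # the equilibrium rate satisfies beta * R* < 1

# Comparative static: more income risk (higher sigma2) lowers R*.
print(R_star(beta, 2.0, 0.04) < R)
```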
One final note: this economy does not have a steady-state. The distribution of wealth follows a random
walk, which means that its unconditional variance increases without bound over time. We can think of this
as the gap between the richest and poorest agents—who, recall, are ex ante identical—growing endlessly.
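The lack of a steady state is easy to see in a small simulation (mine, not from the notes; parameter values are illustrative). At R = R∗ the drift is zero, so each agent's cash-in-hand follows the driftless random walk x′ = x + (y′ − ȳ), and the cross-sectional variance of wealth grows roughly linearly in t:

```python
import random
import statistics

random.seed(3)
sigma = 1.0                # illustrative endowment standard deviation
N, T = 2000, 50            # number of agents, number of periods

x = [0.0] * N              # ex ante identical agents start with equal wealth
var_path = []
for t in range(T):
    # At R = R* the drift is zero: x' = x + (y' - ybar) for every agent.
    x = [xi + random.gauss(0.0, sigma) for xi in x]
    var_path.append(statistics.pvariance(x))

# Cross-sectional variance grows roughly like t * sigma^2: no stationary
# wealth distribution, and the rich-poor gap widens without bound.
print(var_path[9], var_path[49])
```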

27.2 Aiyagari (1994) model


Aiyagari’s model is similar in several respects to the CARA example given above. The following differences
are important:

• The distribution of shock realizations has bounded support. That means we can define a ymin and ymax
with Pr(y ∉ [ymin, ymax]) = 0.28 Because there is a "worst" shock, we can define a natural borrowing
constraint: an agent can never borrow so much that she cannot repay her debt, even if she receives
the worst possible shock from now on; i.e., $a(s^t) \ge -\phi$, where
\[ \phi \equiv \frac{y_{\min}}{R - 1}. \]
28 We will further insist that ymin and ymax be the highest and lowest values, respectively, for which Pr(y ∉ [ymin, ymax]) = 0.
This can be written as requiring that for all ε > 0, Pr(y ∈ [ymin, ymin + ε]) > 0 and Pr(y ∈ [ymax − ε, ymax]) > 0.

• The agents’ coefficients of relative risk aversion must be bounded above. In our earlier example
consumers had CARA, which implied increasing relative risk aversion. The result was that as they got
richer, they saved an increasing fraction of their wealth. We were able to keep average asset holdings
bounded, but only for one particular value of R∗ < 1/β ; by bounding the coefficient of relative risk
aversion, we can keep (expected) assets from growing arbitrarily for all R < 1/β . (This also relies on the
existence of an upper bound on possible shocks.)
The model can be more convenient to analyze when values are renormalized as follows in terms of the
borrowing constraint:

                               Original             Renormalized
    Assets                     a                    â ≡ a + φ
    Cash-in-hand               x                    z ≡ x + φ
    Cash-in-hand evolution     x′ = Ra + y′         z′ = Râ + ỹ′
    Endowment                  y                    ỹ ≡ y − (R − 1)φ
    Consumption                c = x − a            c = z − â

Although the endowment renormalization looks a bit strange, it is exactly what is needed for z to evolve
like x: z′ = x′ + φ = Ra + y′ + φ = R(â − φ) + y′ + φ = Râ + y′ − (R − 1)φ = Râ + ỹ′.
The representative consumer's problem is characterized by the following functional equation:
\[ V(z) = \max_{\hat{a} \ge 0} \Bigl[ u(z - \hat{a}) + \beta E V( \underbrace{R\hat{a} + \tilde{y}'}_{=z'} ) \Bigr], \]

which implies the first-order condition
\[ u'\bigl(c(z)\bigr) \ge \beta R\, E\bigl[ V'\bigl( R\hat{a}(z) + \tilde{y}' \bigr) \bigr]. \]

If the borrowing constraint does not bind, this takes the form of our standard intereuler. When the borrowing
constraint does bind, the consumer would like to consume more (i.e., lower the left-hand side through c) and
save less (i.e., raise the right-hand side through â). We define (with apologies for the unfortunate notation
choice) ẑ as the level of cash-in-hand at which the borrowing constraint barely binds.
If z ≤ ẑ, the agent borrows up to the max, so â(z) = 0 and c(z) = z. If z ≥ ẑ, the FOC holds with
equality:
\[ u'(z - \hat{a}) = \beta R\, E\bigl[ V'( R\hat{a} + \tilde{y}' ) \bigr]. \]
As we consider going from z up to some z̄ > z, â would need to increase by the same amount to keep the
left-hand side constant. But this would lower the right-hand side somewhat; thus â must increase less than
one-for-one with z.
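These qualitative properties (â(z) = 0 below ẑ, and savings rising less than one-for-one in z above it) show up in even a crude numerical solution of the functional equation. The value function iteration sketch below is mine, not from the notes; the CRRA utility, two-point shock, and grids are all illustrative choices, and the grid upper bound mildly truncates continuation values near the top:

```python
import bisect

beta, R, gamma = 0.9, 1.02, 2.0            # illustrative; note beta * R < 1
y_vals, y_probs = [0.7, 1.3], [0.5, 0.5]   # bounded-support (renormalized) endowment

def u(c):
    return c ** (1 - gamma) / (1 - gamma)  # CRRA period utility

n = 80
grid = [0.01 + i * (20.0 - 0.01) / (n - 1) for i in range(n)]  # cash-on-hand z
a_choices = [0.0] + grid                                       # savings a-hat >= 0

def interp(zq, vals):
    # Piecewise-linear interpolation of vals (defined on grid) at zq, clamped to the grid.
    zq = min(max(zq, grid[0]), grid[-1])
    j = min(max(bisect.bisect_right(grid, zq), 1), n - 1)
    w = (zq - grid[j - 1]) / (grid[j] - grid[j - 1])
    return (1 - w) * vals[j - 1] + w * vals[j]

V = [u(z) / (1 - beta) for z in grid]      # initial guess: consume z forever
policy = [0.0] * n
for _ in range(200):
    newV = []
    for i, z in enumerate(grid):
        best, best_a = -float("inf"), 0.0
        for a in a_choices:                # grid search over feasible savings
            if a >= z:
                break
            ev = sum(p * interp(R * a + y, V) for y, p in zip(y_vals, y_probs))
            val = u(z - a) + beta * ev
            if val > best:
                best, best_a = val, a
        newV.append(best)
        policy[i] = best_a
    if max(abs(a - b) for a, b in zip(newV, V)) < 1e-5:
        V = newV
        break
    V = newV

# The constraint binds at low cash-on-hand (savings are zero), and above z-hat
# savings increase by less than the increase in z.
print(policy[0], policy[20], policy[60])
```

On this grid policy[0] is zero (the constraint binds at the bottom), while over the interior of the grid savings rise by less than cash-on-hand does.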

28 Where we have been: a brief reminder


Broadly speaking, the material we have covered in the second half of the term (excluding our initial discussion
of optimal taxation) is focused on the introduction of uncertainty to our models. There are several dimensions
along which our discussions have varied:

• Complete vs. incomplete markets. Under complete markets, agents can buy and sell a complete set
of contingent securities. We typically think of these securities as trading in a time-zero (Arrow-Debreu)
market, but this is mainly a mathematical convenience—we showed that there is an “equivalence”
between the allocations and prices that obtain at equilibrium in this market and in a sequence of
period-by-period markets. When agents have access to complete markets, they have the ability to
hedge all idiosyncratic shocks.29 We showed that they take advantage of this ability, so that individual
consumption can only depend on the history of the world through the history of aggregate shocks.
• Partial equilibrium vs. general equilibrium. We have considered two types of economies. The
first is small “open” economies, where trade takes place with foreigners; in these markets, asset prices
are exogenously set by the global market, and the openness of the market means economy-wide resource
constraints don’t have as much bite as in closed economies. In closed economies, the price of assets (and
notably, therefore, interest rates) are set to clear the asset markets given the economy-wide resource
constraint. That is, prices arise endogenously, through general equilibrium.
• Idiosyncratic risk only vs. aggregate risk. As discussed, when complete markets are available,
there is a sense in which only aggregate shocks matter. However, when markets are incomplete, both
idiosyncratic and aggregate shocks are important. We started by considering incomplete markets with
only idiosyncratic risk, and then introduced additional aggregate risk.

29 Agents may also be able to hedge aggregate risk if the economy is open or there is a storage technology.

