Section All
Contents
1 Welcome
10 Introducing competitive equilibrium
12 Pareto Optimality
12.1 A simple Pareto problem
12.2 A simple social planner problem
15 Continuous-time optimization
15.1 Finite-horizon
15.2 Infinite time
15.3 The Hamiltonian “cookbook”
15.3.1 One control, one state variable
15.3.2 Multiple control or state variables
15.4 Current-value Hamiltonians
16 Log-linearization
16.1 Why are approximations in terms of x̂ ≡ (x − x∗)/x∗ called “log-linear”?
16.2 Log-linearization: first approach
16.3 Log-linearization: second approach
18 Optimal taxation
18.1 The Ramsey model
18.2 The Primal approach
19 Introducing uncertainty
19.1 Probability
19.2 Utility functions
20 Markov chains
20.1 Unconditional distributions
20.2 Conditional distributions
20.3 Stationary distributions
20.4 Ergodic distributions
25 Econometrics of incomplete markets practice question
1 Welcome
• email: [email protected]
• homepage: http://www.lukestein.com
• Office Hours: Wednesdays, 3:15–5:05 in Landau Economics room 350
• Sections: Fridays, 12:15–2:05 in building 240 room 110
5. Steady state vs. explicit dynamics. For explicit dynamics, use linearization (Taylor approximation) and “diagonalization” (eigenvalue decomposition) or phase diagrams.
6. One “agent” vs. many. Either a true single agent (partial equilibrium problems) or a social planner; or many agents interacting through markets, with competitive equilibrium either in period-by-period (i.e., sequential) markets or in one big market at t = 0 (Arrow-Debreu).
3 Some thoughts about utility functions
Consider our canonical utility function (for problems with discrete time):
U({c_t}_{t=0}^T) = ∑_{t=0}^T β^t u(c_t),
where T may be infinite. The function u(·) is called the felicity or instantaneous utility function. This
utility function has several important properties, including:
Time separability The period utility at time t only depends on consumption at time t. For example, there
is no habit persistence.
Exponential discounting β constant and β < 1 mean the agent values consumption today more than
consumption tomorrow, with a constant “strength of preference” for consumption sooner vs. later.
Stationarity The felicity function is time invariant. Thus when T = ∞, the utility function evaluated over
future consumption looks the same from any point in the future.
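As a quick numerical illustration, here is a minimal sketch of this utility function, assuming a CRRA felicity function (the particular u(·), β, and consumption stream are invented for illustration, not taken from the text):

```python
import math

def crra(c, sigma):
    # CRRA felicity u(c) = c^(1 - sigma) / (1 - sigma), with log utility at sigma = 1
    return math.log(c) if sigma == 1 else c ** (1 - sigma) / (1 - sigma)

def lifetime_utility(consumption, beta, sigma):
    # Time-separable, exponentially discounted utility: U = sum_t beta^t u(c_t)
    return sum(beta ** t * crra(c, sigma) for t, c in enumerate(consumption))

# A constant stream illustrates exponential discounting: later periods
# contribute geometrically less to lifetime utility.
U = lifetime_utility([1.0, 1.0, 1.0], beta=0.95, sigma=2)
print(U)  # u(1) = -1 at sigma = 2, so U = -(1 + 0.95 + 0.95^2) = -2.8525
```

Time separability shows up directly in the code: the period-t term depends only on c_t.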
4.1 Taylor approximation
Consider a differentiable real-valued function on (some subset of) Euclidean space, f : Rn → R. The function
can be approximated in the region around some arbitrary point y ∈ Rn by its tangent hyperplane.
If f : R → R, this approximation takes the form f(x) ≈ f(y) + f′(y)(x − y).
2. f′′(x) ≤ 0;
3. x ≥ y if and only if f′(x) ≤ f′(y), which can also be stated compactly as (f′(y) − f′(x))/(y − x) ≤ 0 or (y − x)(f′(y) − f′(x)) ≤ 0; and
Also, for all concave functions (not just differentiable ones), a local maximum is a global maximum. And
a concave function must be continuous on the interior of its domain (although it need not be continuous on
the boundaries).
As an exercise, use the first-order and envelope conditions for the functional equation form of the NCGM (given in equation 8) to derive its intereuler. The derivation should really include an argument that the optimal k′ is interior to the feasible set.
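One way the requested derivation can go (a sketch only; the interiority argument is omitted): take the FOC of the functional equation in k′, use the envelope condition for V′(k), and roll the envelope condition forward one period:

```latex
\begin{align*}
\text{FOC:}\quad & u'\bigl(f(k) - k'\bigr) = \beta V'(k') \\
\text{Envelope:}\quad & V'(k) = u'\bigl(f(k) - k'\bigr)\, f'(k) \\
\text{Combining (in dated notation):}\quad & u'(c_t) = \beta f'(k_{t+1})\, u'(c_{t+1}).
\end{align*}
```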
A more complete envelope theorem for constrained optimization is:
Theorem 1. Consider a constrained optimization problem v(θ) = maxx f (x, θ) such that g1 (x, θ) ≥ 0, . . . ,
gK (x, θ) ≥ 0.
Comparative statics on the value function are given by

∂v/∂θ_i = ∂f/∂θ_i |_{x*} + ∑_{k=1}^K λ_k ∂g_k/∂θ_i |_{x*} = ∂L/∂θ_i |_{x*}

(for Lagrangian L(x, θ, λ) ≡ f(x, θ) + ∑_k λ_k g_k(x, θ)) for all θ such that the set of binding constraints does not change in an open neighborhood.
Roughly, this states that the derivative of the value function is the derivative of the Lagrangian.
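A quick numerical sanity check of this statement, on a hypothetical one-constraint problem (the particular f, g, and parameter values below are invented for illustration):

```python
import math

# Example problem: v(theta) = max_x ln(x)  s.t.  g(x, theta) = theta - x >= 0.
# Since ln is increasing, the constraint binds and x*(theta) = theta.
def v(theta):
    return math.log(theta)

theta0, h = 2.0, 1e-6
dv_numeric = (v(theta0 + h) - v(theta0 - h)) / (2 * h)  # central difference

# Envelope theorem: dv/dtheta = dL/dtheta at x*. Here df/dtheta = 0,
# dg/dtheta = 1, and the FOC 1/x* = lambda gives lambda = 1/theta0.
dv_envelope = 0 + (1 / theta0) * 1

print(dv_numeric, dv_envelope)  # both approximately 0.5
```

The derivative of the value function matches the derivative of the Lagrangian, without re-solving for how x* moves with θ.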
Proof. The proof is given for a single constraint (but is similar for K constraints): v(θ) = max_x f(x, θ) such that g(x, θ) ≥ 0.
The Lagrangian L(x, θ) ≡ f(x, θ) + λg(x, θ) gives FOC

∂f/∂x |_* + λ ∂g/∂x |_* = 0  ⟺  ∂f/∂x |_* = −λ ∂g/∂x |_*,   (1)

where the notation ·|_* means “evaluated at (x*(θ), θ) for some θ.”
If g(x*(θ), θ) = 0, take the derivative in θ of this equality condition to get

∂g/∂x |_* ∂x*/∂θ + ∂g/∂θ |_* = 0  ⟺  ∂g/∂θ |_* = −∂g/∂x |_* ∂x*/∂θ.   (2)
Note that ∂L/∂θ = ∂f/∂θ + λ ∂g/∂θ. Evaluating at (x*(θ), θ) gives

∂L/∂θ |_* = ∂f/∂θ |_* + λ ∂g/∂θ |_*.

If λ = 0, this gives that ∂L/∂θ |_* = ∂f/∂θ |_*; if λ > 0, complementary slackness ensures g(x*(θ), θ) = 0, so we can apply equation 2. In either case, we get that

∂L/∂θ |_* = ∂f/∂θ |_* − λ ∂g/∂x |_* ∂x*/∂θ.   (3)
Applying the chain rule to v(θ) = f(x*(θ), θ) and evaluating at (x*(θ), θ) gives

∂v/∂θ |_* = ∂f/∂x |_* ∂x*/∂θ + ∂f/∂θ |_*
         = −λ ∂g/∂x |_* ∂x*/∂θ + ∂f/∂θ |_*   (by equation 1)
         = ∂L/∂θ |_*   (by equation 3).
The two-period saving problem’s intereuler, u′(c_1) = βR u′(c_2), can be interpreted as follows: the left-hand side is the marginal benefit of consuming an extra unit today, and the right-hand side is the marginal cost of consuming an extra unit today, comprising
1. R: Conversion from a unit of consumption today to units of consumption tomorrow, and
2. u′(c_2): Conversion from units of consumption tomorrow to felicits tomorrow.
c_t, k_{t+1} ≥ 0;
k_0 given.
It can also be written many other ways; e.g., in terms of investment:
max_{{(c_t, i_t)}_{t=0}^∞} ∑_{t=0}^∞ β^t u(c_t)
s.t. ∀t, c_t + i_t ≤ F(k_t),
     k_{t+1} = (1 − δ)k_t + i_t,
     c_t, k_{t+1} ≥ 0;
     k_0 given.   (5)
There is an art to choosing the simplest formulation. We can turn some inequality constraints into equality
constraints with simple arguments (e.g., monotone utility means no consumption will be “thrown away”),
and Inada conditions (together with a no-free-lunch production function) can ensure that non-negativity
constraints will never bind. Further, in deterministic problems, it is usually best to let the choice space be
the state space; here, that means eliminating consumption and investment:
max_{{k_{t+1}}_{t=0}^∞} ∑_{t=0}^∞ β^t u(f(k_t) − k_{t+1})
s.t. k_0 given.   (6)
have an interpretation almost identical to that of the two-period saving problem, with f′(k_{t+1}) playing the role of R: the marginal rate of transformation between c_{t+1} and c_t.
So do these intereulers give us a solution to the original problem? No. They give a second-order difference equation (in k_t) with only one additional condition: the initial condition that k_0 is given. It turns out we need another condition, called the transversality condition (TVC), which is sufficient, together with the intereulers, for a solution (see SL Theorem 4.15):

lim_{t→∞} β^t f′(k_t) u′(c_t) k_t = 0.

The intuition for the TVC is that if you consume too little and save too much, k_t and u′(c_t) will grow so fast as to overwhelm the shrinking β^t and f′(k_t). The intuition for the terms is:
1. k_t: Amount of capital,
2. f′(k_t): Conversion from amount of capital to amount of consumption (at the marginal, not average, product rate),
3. u′(c_t): Conversion from amount of consumption to felicits at time t,
4. β^t: Conversion from felicits at time t to utils.
A final observation about the steady state of the NCGM. At a steady state, c_t = c_{t+1} = c* and k_t = k_{t+1} = k*, so the intereuler reduces to f′(k*) = 1/β.
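For a concrete (hypothetical) parameterization with f(k) = k^α + (1 − δ)k, this steady-state condition can be solved in closed form; the parameter values below are purely illustrative:

```python
alpha, beta, delta = 0.3, 0.95, 0.1  # illustrative parameter values

# With f(k) = k^alpha + (1 - delta) k, we have f'(k) = alpha k^(alpha - 1) + 1 - delta,
# so the steady-state condition f'(k*) = 1/beta solves to:
k_star = (alpha / (1 / beta - 1 + delta)) ** (1 / (1 - alpha))

# Verify the steady-state intereuler:
f_prime = alpha * k_star ** (alpha - 1) + 1 - delta
print(k_star, f_prime, 1 / beta)  # f'(k*) matches 1/beta
```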
Question
Consider an economy with a measure one number of identical households. Preferences are given by:
∑_{t=0}^∞ β^t C_t^{1−σ}/(1 − σ).
Households have L units of labor, which they supply inelastically to firms that produce a consumption
good using technology:
C_t = (u_t K_t)^α L^{1−α},
where Kt is physical capital and ut ∈ [0, 1] is the fraction of the capital stock used to produce consumption
goods.
Investment goods are produced according to:
It = A(1 − ut )Kt ,
where 1 − ut represents the share of the capital stock used to produce capital goods.
The stock of capital evolves according to:
Kt+1 = (1 − δ)Kt + It .
1. Write the problem of a Social Planner and find the Euler equation.
2. For the balanced growth path of this economy, find expressions for the growth rate of Kt , Ct and It in
terms of the balanced growth path level of ut = u∗ .
Solution
1. Write the problem of a Social Planner and find the Euler equation.
As we have discussed, there are several ways to write the same problem. Although it is tempting to
substitute to get a problem written only in terms of capital, it turns out that here (and in many other
problems with lots of choice variables), the notation can wind up getting very nasty. I started down
that route, but returned to including more Lagrange multipliers instead. The social planner maximizes
max_{{k_{t+1}, c_t, u_t}_t} ∑_{t=0}^∞ β^t c_t^{1−σ}/(1 − σ)   s.t.
c_t = u_t^α k_t^α L^{1−α}, ∀t;
k_{t+1} = (1 − δ + A − Au_t)k_t, ∀t;
u_t ∈ [0, 1], ∀t.
With an appeal to an Inada condition on u(·) and the fact that ut = 0 =⇒ ct = 0, we will not worry
about ut ≥ 0 binding. Without argument, we will also ignore ut ≤ 1; this could actually be a problem.
The Lagrangian is

L = ∑_{t=0}^∞ [ β^t c_t^{1−σ}/(1 − σ) + λ_t (u_t^α k_t^α L^{1−α} − c_t) + μ_t ((1 − δ + A − Au_t)k_t − k_{t+1}) ].
The FOC with respect to c_t gives λ_t = β^t c_t^{−σ}. The FOC with respect to u_t gives

λ_t α u_t^{α−1} k_t^α L^{1−α} = μ_t A k_t  ⟹  μ_t = β^t α c_t^{1−σ} / (u_t A k_t),

using the fact that u_t^{α−1} k_t^α L^{1−α} = c_t/u_t. The FOC with respect to k_{t+1} gives

μ_t = λ_{t+1} α u_{t+1}^α k_{t+1}^{α−1} L^{1−α} + μ_{t+1} (1 − δ + A − Au_{t+1}),

where u_{t+1}^α k_{t+1}^{α−1} L^{1−α} = c_{t+1}/k_{t+1}. Substituting in for the multipliers,

β^t α c_t^{1−σ} / (u_t A k_t) = β^{t+1} α c_{t+1}^{1−σ} / k_{t+1} + β^{t+1} α c_{t+1}^{1−σ} / (u_{t+1} A k_{t+1}) · (1 − δ + A − Au_{t+1})

c_t^{1−σ} / (u_t k_t) = βA u_{t+1} c_{t+1}^{1−σ} / (u_{t+1} k_{t+1}) + β c_{t+1}^{1−σ} / (u_{t+1} k_{t+1}) · (1 − δ + A − Au_{t+1})

(c_t / c_{t+1})^{1−σ} = β(1 − δ + A) · u_t k_t / (u_{t+1} k_{t+1}).

Note there are other possible simplifications of this intereuler, based on the fact that c_t/c_{t+1} = (u_t k_t / (u_{t+1} k_{t+1}))^α.
2. For the balanced growth path of this economy, find expressions for the growth rate of Kt , Ct and It in
terms of the balanced growth path level of ut = u∗ .
Note that the production technology for the investment good is i_t ∝ A k_t, which violates Inada conditions:

lim_{k→∞} ∂i_t/∂k_t = A(1 − u_t) ≠ 0.
This can give us endogenous growth, since decreasing marginal returns never kick in.
Along the balanced growth path, the law of motion for capital gives the growth rate of capital:

γ_k ≡ k_{t+1}/k_t = 1 − δ + A(1 − u*).

The investment goods production technology gives us the growth rate of investment:

γ_i ≡ i_{t+1}/i_t = A(1 − u*)k_{t+1} / (A(1 − u*)k_t) = γ_k.

The consumption goods production technology gives us the growth rate of consumption:

γ_c ≡ c_{t+1}/c_t = (u* k_{t+1})^α L^{1−α} / ((u* k_t)^α L^{1−α}) = (k_{t+1}/k_t)^α = γ_k^α.
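These three growth rates are easy to compute for given parameter values (the numbers below are invented for illustration):

```python
# Illustrative parameter values; u_star is the balanced-growth-path level u*.
A, delta, alpha, u_star = 0.25, 0.05, 0.3, 0.6

gamma_k = 1 - delta + A * (1 - u_star)  # from k_{t+1} = (1 - d + A - A u*) k_t
gamma_i = gamma_k                        # i_t = A (1 - u*) k_t grows with k_t
gamma_c = gamma_k ** alpha               # c_t = (u* k_t)^alpha L^(1 - alpha)
print(gamma_k, gamma_i, gamma_c)
```

Because decreasing returns never set in for the investment good, γ_k can exceed one forever: endogenous growth.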
7 Introduction to dynamic programming
We consider “transforming” a sequence problem of the form given in equations 4–6 into a functional equation
V(k) = max_{c,k′} [u(c) + βV(k′)]
As in the SP, we can recast the choice space to be the state space (also arguing that the inequality constraints
do not bind):
V(k) = max_{k′∈(0,f(k))} [u(f(k) − k′) + βV(k′)].   (8)
While the solution to the SP (as written in equation 4) is a joint sequence {(c_t, k_{t+1})}_{t=0}^∞, the solution to the FE is a value function v(k) and a policy function k′ = g(k). Therefore in order to find a solution to the FE problem we need to understand a few things, namely:
• In what space do functions “live”?
• If there are multiple solutions to the FE, which one(s) solve(s) the SP?
Why do we want to solve the FE? We are shifting the problem from studying an infinite sequence to
a function. Is this really easier? Analytically the answer is not clear; there are properties of the solution
that are proved more easily using the SP formulation than the FE one. However, there is at least one clear
advantage to a dynamic programming approach: many problems of interest to macroeconomists cannot be
solved analytically. When we need to find numerical solutions, the FE formulation makes things easier.2
V : R → R. The dimensionality of the former object is lower (countably infinite vs. uncountably so) but the function can much
more easily be approximated. We do this by somehow restricting its domain, and then determining its value on a discrete grid of
points.
3 Actually there are two operations: left scalar multiplication and right scalar multiplication (with an additional requirement
1. α · x ∈ X (closure under scalar multiplication),
2. α · x = x · α (commutativity),
3. (α · β) · x = α · (β · x) (associativity),
4. 1 · x = x (identity existence), and
5. 0 · x = 0⃗.
Distributive laws
1. α · (x + y) = α · x + α · y, and
2. (α + β) · x = α · x + β · x.
Examples of real vector spaces include:
• Euclidean spaces (R^n),
• The set of real-valued functions f : [a, b] → R,
• The set of continuous (real-to-real) functions f : [a, b] → R, and
• The set of continuous, bounded (real-to-real) functions f : [a, b] → R; we refer to this set as C[a, b].
Definition 3. A normed vector space is a vector space S, together with a norm ‖·‖ : S → R, such that, for all x, y ∈ S and α ∈ R:
1. ‖x‖ = 0 if and only if x = 0⃗,
2. ‖α · x‖ = |α| · ‖x‖, and
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.
As an exercise, you can prove that these assumptions ensure that ‖x‖ ≥ 0.
Examples of normed vector spaces include:
• On R, the absolute value ‖x‖ = |x|;
• On R^n, the Euclidean norm ‖x‖ = √(∑_{i=1}^n x_i²);
• On R^n, the Manhattan norm ‖x‖ = ∑_{i=1}^n |x_i|;
• On R^n, the sup norm ‖x‖ = sup_{i∈{1,...,n}} |x_i| (a proof that this is a norm is given in the appendix to this section); and
• On C[a, b], the sup norm ‖f‖ = sup_{t∈[a,b]} |f(t)|.
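The three R^n norms can be compared directly on a sample vector (the vector is arbitrary, chosen only to make the three numbers differ):

```python
import math

x = [3.0, -4.0, 1.0]

euclidean = math.sqrt(sum(xi ** 2 for xi in x))  # square root of the sum of squares
manhattan = sum(abs(xi) for xi in x)             # sum of absolute values
sup_norm = max(abs(xi) for xi in x)              # largest absolute entry

print(euclidean, manhattan, sup_norm)  # sqrt(26), 8.0, 4.0
```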
If we have a normed vector space, we can also define a metric space, which has a “metric” (also known as a “distance”) function ρ : S × S → R defined by ρ(x, y) = ‖x − y‖.
It may be useful, for each of the example norms given above, to sketch a ball: the locus of points in S less than distance r away from some element x ∈ S.
Although every normed vector space can give rise to a metric space, not every metric space can be generated in this way. The general definition of a metric space follows; it is not hard to prove that the properties of a norm imply that ρ(x, y) = ‖x − y‖ satisfies the properties required of a metric. (A proof is given in the appendix to this section.)
Definition 4. A metric space is a set S, together with a metric ρ : S × S → R, such that, for all x, y, z ∈ S:
1. ρ(x, y) = 0 if and only if x = y (which implies, with the later axioms, that the distance between two
distinct elements is strictly positive),
2. ρ(x, y) = ρ(y, x) (symmetry), and
3. ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (triangle inequality).
7.2 Convergence in metric spaces
Definition 5. Let (S, ρ) be a metric space and {x_n}_{n=0}^∞ be a sequence in S. We say that {x_n} converges to x ∈ S, or that the sequence has limit x, if for every ε > 0 there exists N(ε) such that ρ(x_n, x) < ε for all n > N(ε).
That is, a sequence is convergent if its terms get closer and closer to some point x, to the extent that,
given an arbitrary small number ε > 0, we can always find some positive integer N (ε) such that all terms of
the sequence beyond N (ε) will be no further than ε from x.
Checking directly whether a sequence converges requires knowing its limit. Most of the time, we don’t know the limit. The definition of Cauchy sequences will help us with that.
Definition 6. A sequence {x_n}_{n=0}^∞ in a metric space (S, ρ) is a Cauchy sequence if for every ε > 0 there exists N(ε) such that ρ(x_n, x_m) < ε for all n, m > N(ε).
It’s clear that any converging sequence is also a Cauchy sequence, but the converse is not true: not every Cauchy sequence converges.4 However, there is a particular class of metric spaces, called complete metric spaces, in which this converse does hold.
Definition 7. A metric space (S, ρ) is complete if every Cauchy sequence contained in S converges to some
point in S.
Checking completeness is very difficult. We will take it as given that the real line R with the metric
ρ(x, y) = |x − y| is a complete metric space.
Theorem 8. Let X ⊆ R^n, and let C(X) be the set of bounded, continuous functions f : X → R with the sup norm, ‖f‖ = sup_{x∈X} |f(x)|. Then C(X) is a complete metric space.
A proof is given in the appendix to this section.
Consider the sequence generated by iterating a function f : S → S from an initial point s_0, i.e., s_{n+1} = f(s_n). For example:
• S = R, ρ(x, y) = |x − y|, and f(s) = s²: the sequence converges to 1 for s_0 = ±1, converges to 0 for s_0 ∈ (−1, 1), and diverges otherwise;
• S = R₊, ρ(x, y) = |x − y|, and f(s) = √s: the sequence converges for any s_0 (to 0 if s_0 = 0 and to 1 otherwise).
4 Consider the metric space S = (0, 1] with the metric derived from the absolute value norm, ρ(x, y) = |x − y|. (This is a metric space, even though S does not form a vector space with regular addition and multiplication.) The Cauchy sequence 1, 1/2, 1/3, . . . lies in S but does not converge in S.
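The two examples above can be replicated numerically by iterating s_{n+1} = f(s_n) (a throwaway sketch; the iteration count is arbitrary):

```python
import math

def iterate(f, s0, n=60):
    # Generate s_{n+1} = f(s_n) starting from s0 and return s_n after n steps.
    s = s0
    for _ in range(n):
        s = f(s)
    return s

# f(s) = sqrt(s) on R_+: converges to 1 for any s0 > 0 (and to 0 for s0 = 0).
print(iterate(math.sqrt, 9.0), iterate(math.sqrt, 0.25))

# f(s) = s^2: converges to 0 for s0 in (-1, 1), diverges for |s0| > 1.
print(iterate(lambda s: s * s, 0.5))
```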
Drawing pictures is useful here: graph f (·) and the 45-degree line on the same axes.5
Here’s the important thing: suppose that for some s0 the sequence converges to s. Then as long as f
is continuous, s is a fixed point of f ; i.e., f (s) = s. Believe it or not, this will help us solve functional
equations.
Recall the FE of equation 8:
V(k) = max_{k′∈(0,f(k))} [u(f(k) − k′) + βV(k′)]   for all k.
Just as our functions f (·) above mapped real numbers to real numbers, we consider an operator T (·) that
maps functions to functions. In particular, let
T(v) ≡ max_{k′∈(0,f(k))} [u(f(k) − k′) + βv(k′)]

for any function v. This may take a bit to get your head around, but note that while T maps functions to functions, each T(v) : R → R is itself a function.
Given this definition of T , we can write our functional equation compactly as V (k) = [T (V )](k) for all k, or
even more simply as V = T (V ).
That means that V, the solution to our functional equation, is a fixed point of the operator T. How can we
find such a thing? Our earlier intuition suggests the following strategy:
1. Show the operator T is continuous (we will use this later);
2. Pick some initial function v0 ;
3. Consider the sequence {v_n}_{n=0}^∞ defined by v_{n+1} = T(v_n) and the initial condition;
4. Show that this sequence converges under some distance over function spaces (we use that induced by
the sup norm);
5. Find the thing V to which the sequence converges;
6. Note that continuity of T implies V is a fixed point of T and hence solves the functional equation.
How do we do this? If we can show that T is a contraction (perhaps using Blackwell’s Theorem) then we
are assured continuity, that the resulting sequence is Cauchy (and hence converges by the completeness of
the metric space), and that it converges to the unique fixed point of T from any starting v0 . We will often
have to use numerical methods to actually find the convergence point V , however.
Proof. We must show that only identical elements have zero distance, that the distance function is symmetric,
and that it satisfies the triangle inequality.
1. Let S be the vector space; for x, y ∈ S let z = (x − y) ∈ S (we can derive this from the axioms on vector spaces). Then ρ(x, y) = ‖x − y‖ = ‖z‖ ≥ 0 by property (1) of the norm. Notice also that we can say that z = 0⃗ if and only if x = y, which implies that ρ(x, y) = 0 if and only if x = y.
S ⊆ R (make sure to confirm that f : S → S) using intuition and/or the definition of a contraction.
2. ρ(x, y) = ‖x − y‖ = ‖(−1) · (y − x)‖ = |−1| · ‖y − x‖ = ρ(y, x) (symmetry).
3. We want to show that ρ(x, z) ≤ ρ(x, y) + ρ(y, z) is true with our metric, i.e. that ‖x − z‖ ≤ ‖x − y‖ + ‖y − z‖.
Again let t = x − y ∈ S and w = y − z ∈ S. Then ‖x − z‖ = ‖x − y + y − z‖ = ‖t + w‖ ≤ ‖t‖ + ‖w‖ = ‖x − y‖ + ‖y − z‖.
Theorem 10. The sup norm ‖x‖ ≡ sup_{i∈{1,...,n}} |x_i| over R^n is a norm.
Proof. We must show that only the zero element has zero norm, that scalar multiplication can be taken outside the norm, and that it satisfies a triangle inequality.
1. ‖x‖ = sup_i {|x_i|, i = 1, ..., n} = |x_j| ≥ 0 for some 1 ≤ j ≤ n.
2. ‖α · x‖ = sup_i {|α · x_i|, i = 1, ..., n} = sup_i {|α| · |x_i|, i = 1, ..., n} = |α| · sup_i {|x_i|, i = 1, ..., n} = |α| · ‖x‖.
3. ‖x + y‖ = sup_i {|x_i + y_i|, i = 1, ..., n}. Call the components at the maximizing index x∗ and y∗. Then sup_i {|x_i + y_i|, i = 1, ..., n} = |x∗ + y∗| ≤ |x∗| + |y∗| ≤ sup_i {|x_i|, i = 1, ..., n} + sup_i {|y_i|, i = 1, ..., n} = ‖x‖ + ‖y‖.
Theorem 11. Let X ⊆ R^n, and let C(X) be the set of bounded continuous functions f : X → R with the sup norm, ‖f‖ = sup_{x∈X} |f(x)|. Then C(X) is a complete metric space.
Proof. We take as given that C(X) with the sup norm metric is a metric space. Hence we must show that if {f_n} is a Cauchy sequence, there exists f ∈ C(X) such that
for any ε > 0, there exists N(ε) such that ‖f_n − f‖ ≤ ε for all n ≥ N(ε).
1. find f ;
2. show that {fn } converges to f in the sup norm;
3. show that f ∈ C(X), i.e., f is continuous and bounded.
So let’s start:
1. We want to find the candidate function f . In what follows, we will indicate a general element of X as x
and a particular element as x0 .
Consider a Cauchy sequence {f_n}. Fix x_0 ∈ X; then {f_n(x_0)} defines a sequence of real numbers. Let’s focus on this sequence of real numbers (notice that now we are talking about something different than the Cauchy sequence of functions {f_n}): by the definition of sup and of sup norm we can say that

|f_n(x_0) − f_m(x_0)| ≤ sup_{x∈X} |f_n(x) − f_m(x)| = ‖f_n − f_m‖.

But we know that the sequence {f_n} is Cauchy by hypothesis (this time the sequence of functions), hence ‖f_n − f_m‖ ≤ ε for n, m ≥ N(ε). But then

|f_n(x_0) − f_m(x_0)| ≤ ε.

Thus the sequence of real numbers {f_n(x_0)} is also a Cauchy sequence, and since R is a complete metric space, it will converge to some limit point f(x_0). Therefore we now have our candidate function f : X → R.
2. We want to show that the sequence of functions {f_n} converges to f in the sup norm, i.e. that ‖f_n − f‖ → 0 as n → ∞.
Fix an arbitrary ε > 0 and choose N(ε) so that n, m ≥ N(ε) implies ‖f_n − f_m‖ ≤ ε/2 (we know that we can do this since {f_n} is Cauchy).
Now, for any fixed arbitrary x_0 ∈ X and all m ≥ n ≥ N(ε),

|f_n(x_0) − f(x_0)| ≤ |f_n(x_0) − f_m(x_0)| + |f_m(x_0) − f(x_0)| ≤ ε/2 + |f_m(x_0) − f(x_0)|.

Since {f_m(x_0)} converges to f(x_0) (this is how we constructed f(x_0)), we can choose m for each fixed x_0 ∈ X so that |f_m(x_0) − f(x_0)| ≤ ε/2. Hence we have

|f_n(x_0) − f(x_0)| ≤ ε.

But since the choice of x_0 was arbitrary, this will hold for all x ∈ X, in particular for sup_{x∈X} |f_n(x) − f(x)| = ‖f_n − f‖. And since also the choice of ε was arbitrary, we have obtained that for any ε > 0, ‖f_n − f‖ ≤ ε for all n ≥ N(ε).
3. Now we want to show that f ∈ C(X), i.e. that f is bounded and continuous.
f is bounded by construction. To prove that f is continuous, we need to show that for every ε > 0 and every x ∈ X, there exists δ > 0 such that |f(x) − f(y)| ≤ ε if ‖x − y‖_E < δ, where ‖x − y‖_E = √(∑_{i=1}^n |x_i − y_i|²) is the Euclidean norm on R^n.
Fix arbitrary ε > 0 and x_0 ∈ X. Then choose k such that ‖f − f_k‖ < ε/3 (we can do this since {f_n} converges to f in the sup norm). Then notice that f_k is continuous (by hypothesis the sequence is in C(X), hence it is continuous). Therefore there exists δ > 0 such that ‖x_0 − y‖_E < δ implies |f_k(x_0) − f_k(y)| < ε/3, and hence

|f(x_0) − f(y)| ≤ |f(x_0) − f_k(x_0)| + |f_k(x_0) − f_k(y)| + |f_k(y) − f(y)|   (by the triangle inequality)
≤ 2 · ‖f − f_k‖ + |f_k(x_0) − f_k(y)|   (by definition of sup and sup norm)
< ε   (by continuity of f_k).

But since the choice of ε and x_0 was arbitrary, this will hold generally, so we have proved our statement.
The two algorithms you are expected to “know” for Economics 210 are “guess and verify” and value (function)
iteration. They are discussed below, along with a third algorithm: policy (function) iteration.
8.1 “Guess and verify”
This is an analytical algorithm; the two following algorithms will be numerical.
1. Develop an initial “guess” for the value function, V0 (·), with an appropriately parameterized functional
form.
2. Iterate over V_i using

V_{i+1}(k) = max_{k′∈Γ(k)} [u(k, k′) + βV_i(k′)]
until Vi and Vi+1 are sufficiently close to each other (typically measured by their sup norm distance).
3. Conclude that Vi+1 is an approximate solution to the FE.
For some u and Γ, we may be able to use the contraction mapping theorem to ensure that this process
converges for any initial guess, V0 .
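A bare-bones numerical version of this loop, on a discrete grid and assuming log felicity with f(k) = k^α (the parameter values, grid, and tolerance are all illustrative choices, not from the text):

```python
import math

alpha, beta = 0.3, 0.9
grid = [0.05 + 0.01 * i for i in range(30)]  # discrete grid K for both k and k'

def f(k):
    return k ** alpha

V = [0.0] * len(grid)  # initial guess V0 = 0
for _ in range(1000):  # iterate V_{i+1}(k) = max_{k'} u(f(k) - k') + beta V_i(k')
    V_new = []
    for k in grid:
        best = -float("inf")
        for j, k2 in enumerate(grid):
            c = f(k) - k2
            if c > 0:  # feasibility: k' in (0, f(k))
                best = max(best, math.log(c) + beta * V[j])
        V_new.append(best)
    done = max(abs(a - b) for a, b in zip(V_new, V)) < 1e-10  # sup-norm distance
    V = V_new
    if done:
        break

print(V[0], V[-1])  # the converged V is increasing in k
```

With log felicity and Cobb-Douglas f, the exact policy is known to be k′ = αβk^α, so the grid argmax can be checked against it up to the grid spacing.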
1. Develop an initial “guess” for the policy function, g0 (·). If the space is finite and discrete, this guess
can be represented by a vector of g0 evaluated at all the values k can take on.
2. Iterate over gi as follows:
• Form V_i from g_i by

V_i(k) = u(k, g_i(k)) + β u(g_i(k), g_i(g_i(k))) + β² u(g_i(g_i(k)), g_i(g_i(g_i(k)))) + ⋯
       = ∑_{t=0}^∞ β^t u(g_i^t(k), g_i^{t+1}(k)).

One way to implement this is to approximate V_i using the first T terms of this sum for some large T.6
6 The policy function can be expressed as a transition matrix G_i (containing all zeros, except for a single one in each row), in which case V_i = ∑_t β^t G_i^t u for an appropriately formed vector u.
• Form g_{i+1} from V_i by

g_{i+1}(k) = argmax_{k′∈Γ(k)} [u(k, k′) + βV_i(k′)].
Continue iterating until gi and gi+1 are sufficiently close to each other (typically measured by their sup
norm distance).
3. Conclude that g_{i+1} (together with the associated V_i) is an approximate solution to the FE.
As with value iteration, for some u and Γ we may be able to use the contraction mapping theorem to ensure that this process converges for any initial guess, g_0. This may seem like a more complex algorithm than value iteration, but it can in fact be easier to implement.
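A discrete-grid sketch of policy iteration, under the same illustrative assumptions as before (log felicity, f(k) = k^α; the evaluation step truncates the payoff series at T terms):

```python
import math

alpha, beta, T = 0.3, 0.9, 300
grid = [0.05 + 0.01 * i for i in range(30)]

def u(k, k2):  # period payoff u(k, k') = log(f(k) - k'); -inf if infeasible
    c = k ** alpha - k2
    return math.log(c) if c > 0 else -float("inf")

g = [0] * len(grid)  # initial policy g0: always move to grid[0]
for _ in range(50):
    # Evaluate V_i(k) = sum_{t < T} beta^t u(g^t(k), g^{t+1}(k)) under policy g_i.
    V = []
    for i in range(len(grid)):
        total, j = 0.0, i
        for t in range(T):
            total += beta ** t * u(grid[j], grid[g[j]])
            j = g[j]
        V.append(total)
    # Improve: g_{i+1}(k) = argmax_{k'} u(k, k') + beta V_i(k').
    g_new = [max(range(len(grid)), key=lambda j: u(grid[i], grid[j]) + beta * V[j])
             for i in range(len(grid))]
    if g_new == g:  # policy unchanged: converged
        break
    g = g_new

print([grid[j] for j in g])  # approximates the exact policy k' = alpha*beta*k^alpha
```

In practice the policy typically stops changing after only a handful of outer iterations, which is the sense in which policy iteration can be cheaper than value iteration.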
for each k ∈ K by exhaustively considering all the k′ ∈ Γ(k) ∩ K to find the maximizer. A more sophisticated approach still only considers k ∈ K, but would allow k′ to take on any value. This suggests attempting to
iterate with

V_{i+1}(k) ≟ max_{k′∈Γ(k)} [u(k, k′) + βV_i(k′)].
This proposed approach raises two questions. First, exhaustive search of a continuous choice set is not feasible;
how do we solve the maximization when k 0 can take on a continuum of values? Fortunately, numerical
computing tools (e.g., Matlab, Scilab, SciPy) offer many built-in optimization algorithms. Secondly, how
can we even evaluate the maximand for k′ ∉ K when V_i was only defined for values in the grid? There are a number of ways of doing this; the simplest (which also has attractive theoretical properties) is simply to linearly interpolate between adjacent elements of K. Thus the algorithm, which is called fitted value iteration, actually iterates using

V_{i+1}(k) = max_{k′∈Γ(k)} [u(k, k′) + βV̄_i(k′)],

where V̄_i means the function that linearly interpolates V_i between grid points. There is also a related algorithm called fitted policy iteration.
Typically, the requisite recursive formulation will come from the intereuler(s).
Once we have characterized a steady state, we may
• Evaluate comparative statics of the steady state with regard to exogenous parameters,
• Draw conclusions about dynamics following small deviations from the steady state, or
for arbitrary u and f. One way of making this system more tractable is to consider instead a linear approximation of this difference equation:
kt+2 ≈ α0 + α1 kt+1 + α2 kt
for some appropriately chosen α0 , α1 , α2 ∈ R. We generally conduct such approximations about the steady
state (which is usually the only point on g(·) we can find analytically).
There will be more talk of approximating dynamic systems soon. When we get there, several mathematical
tools will be essential.
If a system evolves according to x_t = g(x_{t−1}), we get a “linearized” system that evolves according to x_t ≈ g(x*) + g′(x*)(x_{t−1} − x*). If x* is the system’s steady state, x* = g(x*); at this point, the approximation is perfect.
The slope g′(x*) tells us about the speed of convergence of the linearized system: if g′(x*) = 0, then the approximation tells us that x_t ≈ g(x*) = x* for all x_{t−1}; convergence is instantaneous. In contrast, if g′(x*) = 1, then the approximation tells us that x_t ≈ g(x*) + x_{t−1} − x* = x_{t−1} for all x_{t−1}, and there is no convergence towards x* whatsoever.
How do we find eigenvectors and eigenvalues? Eigenvector p is associated with eigenvalue λ if
W p = λp
W p − λp = 0
(W − λI)p = 0 (9)
where I is the identity matrix. To solve for the eigenvalues, note that this equation can be satisfied if and
only if W − λI is singular (i.e., non-invertible), or
det[W − λI] = 0.
This is called the characteristic equation; the left-hand side is a polynomial in λ whose order is the dimension
of W . Unfortunately, for polynomials of degree exceeding four, there is no general solution in radicals (per
the Abel-Ruffini Theorem). Fortunately, there are other ways to calculate eigenvalues and you are unlikely to
need to calculate eigenvalues by hand for a matrix larger than 2 × 2.8 If you don’t remember the quadratic
formula or how to find the determinant of a 2 × 2 matrix, now would be a good time to remind yourself.
After solving for the eigenvalues, you can find the eigenvectors using W p = λp. Of course, p will not be
pinned down entirely: if p is an eigenvector, then αp is also an eigenvector for any α ≠ 0. In practice, people sometimes set the first element of each eigenvector equal to 1 (as long as the eigenvector does not have a 0 in that entry) and solve for the rest of the vector.9
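For example, with NumPy (the 2 × 2 matrix below is invented for illustration; note np.linalg.eig returns unit-length column eigenvectors rather than normalizing the first entry to 1):

```python
import numpy as np

W = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# By hand: det(W - lam*I) = (2 - lam)^2 - 1 = 0, so lam = 1 or lam = 3.
eigvals, eigvecs = np.linalg.eig(W)
print(np.sort(eigvals))  # [1. 3.]

# Each column of eigvecs satisfies W p = lam p:
for lam, p in zip(eigvals, eigvecs.T):
    print(np.allclose(W @ p, lam * p))  # True
```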
Theorem 12 (Eigen Decomposition Theorem). Consider a square matrix W ; denote its k distinct eigenvectors
p1 , . . . , pk and the associated eigenvalues λ1 , . . . , λk . Let P be the matrix containing the eigenvectors as
columns and Λ be the diagonal matrix with the eigenvalues on the diagonal:

P ≡ [p_1 p_2 ⋯ p_k],   Λ ≡ diag(λ_1, λ_2, . . . , λ_k).

If P is a square matrix, W can be decomposed into

W = P ΛP^{−1}.
Proof.

P Λ = [λ_1 p_1 ⋯ λ_k p_k] = [W p_1 ⋯ W p_k] = W P,

where the first and third equalities follow from the matrix multiplication algorithm, and the second from the definition of an eigenvector and its associated eigenvalue. Postmultiplying by P^{−1} completes the proof.
We will use this result extensively to analyze dynamic systems characterized by linear difference equations
(where the linearity typically arises through linear approximation around the steady state). Here’s how:
suppose that a dynamic system evolves according to
[k_{t+2}  k_{t+1}]′ = W [k_{t+1}  k_t]′,

where we define x_{t+1} ≡ [k_{t+2}  k_{t+1}]′ and x_t ≡ [k_{t+1}  k_t]′.
We therefore have that xt = W t x0 . Off the bat, we might not have much to say about the matrix W raised
to a high power. However, using the eigen decomposition of W gives
x_t = (P ΛP^{−1})^t x_0 = P Λ(P^{−1}P)Λ(P^{−1}P) ⋯ ΛP^{−1} x_0 = P Λ^t P^{−1} x_0.
If all the eigenvalues of W have magnitude less than one, then the x_t = W^t x_0 = P Λ^t P^{−1} x_0 form a convergent process no matter what x_0 = [k_1 k_0]′ we start from; systems like this are sometimes called “sinks.” However, if W has an eigenvalue with magnitude greater than one (an “exploding” eigenvalue), the system only converges if x_0 is such that the element of P^{−1}x_0 corresponding to the exploding eigenvalue is zero. Such a system is said to be a “source” or have a “saddle path,” depending on whether all or only some eigenvalues are explosive. More on all this is still to come.
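A small numerical illustration of the saddle-path case (the matrix W below is invented, chosen to have one stable eigenvalue 0.25 and one explosive eigenvalue 1.5):

```python
import numpy as np

W = np.array([[1.0, 0.75],
              [0.5, 0.75]])  # eigenvalues 0.25 and 1.5
eigvals, P = np.linalg.eig(W)

# Check the decomposition W^t = P Lambda^t P^{-1} at t = 5:
t = 5
assert np.allclose(P @ np.diag(eigvals ** t) @ np.linalg.inv(P),
                   np.linalg.matrix_power(W, t))

# Convergence requires zero weight on the explosive eigenvalue: the
# eigenvector (1, -1) associated with 0.25 spans the saddle path.
x_saddle, x_other = np.array([1.0, -1.0]), np.array([1.0, 0.0])
for _ in range(40):
    x_saddle, x_other = W @ x_saddle, W @ x_other
print(x_saddle)  # shrinks toward zero
print(x_other)   # has exploded
```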
In investigating a multi-agent model, we will look for competitive equilibrium. Since the meaning of
a competitive equilibrium can depend on the economic environment, typically the first thing we do is to
define one. A canonical definition is that a competitive equilibrium comprises prices and quantities (or
allocations) such that
1. All agents maximize given the prices (and any other parameters they cannot control) they face,
2. Markets clear (i.e., prices paid are prices received and quantities sold are quantities bought), and
3. Physical resource constraints are satisfied.
11.1 Asset-free economy
This economy has no assets. It therefore does not matter whether we consider a social planner problem or
a competitive equilibrium: there are no prices, and the economy’s resource constraint is equivalent to the
representative agent’s budget constraint. The problem of the economy is therefore to
$$\max_{\{c_t\}_t} \sum_t \beta^t u(c_t) \quad \text{s.t.} \quad c_t = y, \;\forall t.$$
The solution is simply $c_t = y$ in every period; written recursively, the value of this program satisfies
$$V = u(y) + \beta V,$$
i.e., $V = u(y)/(1-\beta)$.
To solve the model, we set up a Lagrangian for the representative agent’s problem,
$$\mathcal{L} = \sum_t \beta^t u\big(\underbrace{y + R_t a_t - a_{t+1}}_{c_t}\big),$$
and use the FOCs to generate the intereuler: $u'(c_t) = \beta R_{t+1} u'(c_{t+1})$ for all $t$.
We can also form a functional equation as long as the interest rates are constant ($R_t = R$ for all $t$):
$$V(a) = \max_{a'} \; u(y + Ra - a') + \beta V(a').$$
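A minimal value-function-iteration sketch of this functional equation on a discrete asset grid; the parameter values and grid bounds below are illustrative assumptions, not from the notes:

```python
import numpy as np

# Value-function iteration for V(a) = max_{a'} u(y + R*a - a') + beta*V(a'),
# with log utility. Parameters and grid are illustrative choices.
beta, y = 0.95, 1.0
R = 1.0 / beta                                   # constant interest rate
grid = np.linspace(0.0, 2.0, 201)                # asset grid

c = y + R * grid[:, None] - grid[None, :]        # consumption for each (a, a')
util = np.where(c > 1e-12, np.log(np.maximum(c, 1e-12)), -np.inf)

V = np.zeros(len(grid))
for _ in range(2000):
    V_new = (util + beta * V[None, :]).max(axis=1)   # Bellman operator
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# More assets can never hurt: V should be (weakly) increasing in a.
assert np.all(np.diff(V) >= -1e-9)
```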
In general equilibrium, a competitive equilibrium comprises prices $\{R_t\}_t$ and allocations $\{c_t, a_{t+1}\}_t$ such that:
• Agents maximize: $\{c_t, a_{t+1}\}_t$ solve
$$\max_{\{c_t, a_{t+1}\}_t} \sum_t \beta^t u(c_t) \quad \text{s.t.} \quad c_t + a_{t+1} = y + R_t a_t, \;\forall t; \quad a_0 = 0;$$
and an appropriate TVC.
This gives the same intereuler as in the partial-equilibrium assets market: $u'(c_t) = \beta R_{t+1} u'(c_{t+1})$ for all $t$. Note that this will always be the case; individual agents are price-takers, so there is no difference (yet) between GE and PE.
• Markets clear: at = 0 for all t.
• The resource constraint holds: ct = y for all t.
By either market clearing and the agents’ budget constraint, or by the economy’s resource constraint, we have that $u'(c_{t+1}) = u'(c_t)$ for all $t$. Thus the intereuler pins down the equilibrium interest rate $R_t = \beta^{-1}$ for all $t \ge 1$.
Note that the allocations are the same as the social planner’s solution under general equilibrium asset
markets. This is a direct implication of the First Welfare Theorem.
The Lagrangian
$$\mathcal{L} = \sum_t \beta^t u(c_t) + \lambda \sum_t p_t (y - c_t)$$
gives FOCs $\beta^t u'(c_t) = \lambda p_t$ for all $t$.
• Markets clear: y − ct = 0 for all t.
• The resource constraint holds: ct = y for all t.
By either market clearing or by the economy’s resource constraint, we have that $u'(c_t) = u'(y)$ for all $t$. Thus the intereuler implies that $p_t = \beta^t \rho$ for all $t$, where $\rho \equiv u'(y)/\lambda$. (This just gives our one allowable normalization in prices.)
What are equilibrium interest rates? They are $R_t = p_{t-1}/p_t = \beta^{-1}$, just as we would expect. Again, the allocations are the same as the social planner’s solution under general equilibrium asset markets.
12 Pareto Optimality
Definition 13. An allocation is Pareto optimal if it is:
1. Feasible: the sum of consumptions is less than or equal to the total endowment; and
2. Pareto: it is not possible to make any person better off without making at least one other person worse
off.
The sections below consider two ways of modelling an economy: as a “Pareto problem,” and as a social
planner problem. We argue that the solutions are the same, and further note that the First Welfare Theorem
ensures that any competitive equilibrium can also be found as the solution to a Pareto or social planner
problem. We will not actually have too much more to say in this class about the Fundamental Welfare
Theorems,10 but there is further treatment in the general equilibrium section of Economics 202/202n.
Keep in mind that these results (as usual) rely on the concavity of the utility function.
12.1 A simple Pareto problem

One way to find Pareto optimal allocations is to maximize one agent’s utility subject to delivering promised utility levels to the others:
$$\max_{\{c_i\}_{i=1}^I} \; u_1(c_1) \quad \text{s.t.} \quad u_i(c_i) \ge u_i^*, \;\forall i \ge 2; \qquad c_i \ge 0, \;\forall i \ge 1; \qquad \sum_{i=1}^I c_i \le Y.$$
The first set of constraints can be thought of as “promise-keeping” constraints: the Pareto optimizer seeks
to maximize the utility of agent one conditional on promises that she has made to deliver utility of at least
u∗i to each other agent i ≥ 2.
At the optimum, we will clearly have $u_i(c_i^*) = u_i^*$ for $i \ge 2$; that is, $c_i^*$ is the minimum level of consumption that $i$ needs to achieve utility $u_i^*$. By varying
$\{c_i^*\}_{i\ge 2}$ and solving the above optimization, we can identify the full set of Pareto optimal outcomes. Note that under these assumptions, since utility is increasing, it must always be the case that $\sum_{i=1}^I c_i^* = Y$, so that knowing $\{c_i^*\}_{i\ge 2}$ allows us to back out the optimal $c_1^* = Y - \sum_{i\ge 2} c_i^*$.
Setting up a Lagrangian (and ignoring non-negativity constraints as usual) gives
$$\mathcal{L} \equiv u_1(c_1) + \sum_{i=2}^I \theta_i \big( u_i(c_i) - u_i^* \big) + \omega \Big( Y - \sum_{i=1}^I c_i \Big).$$
Note that the problem is well behaved: the choice set is convex, and the objective function is concave and
differentiable. Thus the FOCs characterize the solution. Taking FOCs,
$$u_1'(c_1^*) = \omega, \qquad \theta_i u_i'(c_i^*) = \omega, \;\forall i \ge 2.$$
10 The Second Welfare Theorem ensures that any solution to a Pareto or social planner problem can be “supported” as a
competitive equilibrium.
Thus
$$\frac{u_1'(c_1^*)}{u_i'(c_i^*)} = \theta_i \quad \text{for all } i \ge 2.$$
12.2 A simple social planner problem

Now consider instead the social planner problem
$$\max_{\{c_i\}_{i=1}^I} \sum_{i=1}^I \lambda_i u_i(c_i) \quad \text{s.t.} \quad c_i \ge 0, \;\forall i; \qquad \sum_{i=1}^I c_i \le Y,$$
where the λs are the weights—exogenous to the problem—that the social planner places on each agent. Note that only the ratios of the λs matter (we could double the value of each and leave the problem unchanged), so without loss of generality we can normalize $\lambda_1 = 1$.
Setting up a Lagrangian (and ignoring non-negativity constraints as usual) gives
$$\mathcal{L} \equiv \sum_{i=1}^I \lambda_i u_i(c_i) + \gamma \Big( Y - \sum_{i=1}^I c_i \Big).$$
As in the Pareto problem, this is well behaved (the choice set is convex, and the objective function is concave
and differentiable) so the FOCs characterize the solution:
$$\frac{u_1'(c_1^*)}{u_i'(c_i^*)} = \lambda_i$$
for all $i$ (using the normalization $\lambda_1 = 1$).
Thus we can achieve any Pareto optimal allocation in the social planner problem by choosing appropriate weights: $\lambda_i = \theta_i$. There is an equivalence between the two problems.
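For concreteness, here is a small numerical sketch with log utility, where the planner FOC $\lambda_i u_i'(c_i) = \gamma$ implies $c_i = \lambda_i Y / \sum_j \lambda_j$; the endowment and weights below are made up:

```python
import numpy as np

# Planner problem with log utility: max sum_i lam_i*log(c_i) s.t. sum_i c_i <= Y.
# The FOC lam_i * (1/c_i) = gamma gives c_i = lam_i * Y / sum(lam).
Y = 10.0
lam = np.array([1.0, 2.0, 0.5])          # lam[0] normalized to one

c_star = lam * Y / lam.sum()
assert np.isclose(c_star.sum(), Y)       # resource constraint binds

# FOC ratio check: with log utility, u_1'(c_1)/u_i'(c_i) = c_i/c_1 = lam_i.
assert np.allclose(c_star / c_star[0], lam / lam[0])
print(c_star)
```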
Question
Consider the neoclassical growth model with endogenous hours and government spending. That is, the
representative agent maximizes
$$\max \sum_{t=0}^\infty \beta^t U(C_t, N_t) \quad \text{s.t.} \quad C_t + K_{t+1} - (1-\delta)K_t + g_t = W_t N_t + R_t K_t, \qquad Y_t = F(K_t, N_t).$$
The functions U and F are strictly increasing in each argument, strictly concave, differentiable and they
satisfy the Inada conditions. We also assume that the corresponding TVC holds.
1. Assume that
$$U(C_t, N_t) = \log(C_t) - \frac{N_t^{1+\chi}}{1+\chi}$$
and that the production function is Cobb-Douglas, $F(K_t, N_t) = K_t^\alpha N_t^{1-\alpha}$.
(a) Derive the FOCs of the hh and the firm and explain each of the four equations you get (two for
the hh and two for the firm).
(b) Define a competitive equilibrium.
(c) Is this allocation Pareto-optimal? Give a short intuitive argument.
(d) Describe the steady state of the economy. Hint: you will not be able to find closed form solutions
for the endogenous variables. Use the intereuler condition to pin down the capital-labor ratio,
which you can define as X. Use X to simplify the intraeuler condition and the resource constraint
of the economy.
(e) Consider now a permanent increase in gt . In the new steady state (i.e., ignoring transition
dynamics), what is the effect of this change on the allocations in the economy (N , K, C, and Y )?
2. Now, assume that the momentary utility function is given by
$$U(C_t, N_t) = \log\!\left( C_t - \frac{N_t^{1+\chi}}{1+\chi} \right).$$
(a) Derive the FOCs for this economy (two equations for the hh; the firm problem did not change).
(b) Consider again a permanent increase in gt with this new utility function. In the new steady state
(i.e., ignoring transition dynamics), what is the effect of this change on the allocations in the
economy (N , K, C, and Y )?
(c) Compare the effect of changing g for the two cases with different utility functions. Provide intuition
for your result.
Solution
The first thing to do is figure out what’s going on in this economy. Common questions include,
1. Who owns capital?
2. What are consumers’ sources and uses of income?
3. What are firms’ expenses?
4. What happens to firms’ profit (i.e., who owns firms)?
5. In what units are prices measured?
6. What is the economy’s resource constraint?
1. Households own the capital stock (they receive rental income $R_t K_t$ and choose $K_{t+1}$).
2. Sources: wages, rental income of capital, undepreciated capital stock. Uses: consumption, future capital stock, tax.
3. Wages and capital rental.
4. It’s not clear who owns firms—perhaps someone outside the economy? However, it will turn out that
the form of the production function ensures there are no profits. (This is not a coincidence; we will
often see this.)
5. Consumption goods. An equivalent way to state this is that the price of consumption is normalized to
one.
6. Ct + gt + Kt+1 = Yt + (1 − δ)Kt .
We now proceed with a solution. Note that the discussion below goes into significantly more depth of explanation than would be expected on an examination. If you were short on time, you could still do much of this problem quickly. You should be able to take the FOCs (1a), define a competitive equilibrium (1b), have a quick discussion of Pareto optimality (1c), define a steady state and evaluate the FOCs at a steady state (1d), take the new FOCs (2a), and mention/define income effects (2c).
1. (a) You should take the FOCs correctly. The rest of the problem relies on them, so you need to get
them correct. As Nir wrote in his solution set,
“Make sure not to lose points on the FOCs! This problem was certainly not easy in the
later parts, but very standard in its general setup. You should not lose any points and/or
time on questions like part 1a, 1b and 2a. Those alone gave you half the points for this
question.”
As for “explaining” them, most of the explanations are not interesting (“you need to balance
marginal this against marginal that, discounted by the interest rate and β”). The intuition should
be straightforward for each FOC, and you might consider spending most of your time on the
math rather than writing out long explanations of the FOCs. In fact, on Nir’s solutions, he didn’t
explain the FOCs at all. Do write something (something correct!), but you need not write more
than a sentence or two here.
Firm Problem We have to set up the firms’ problem, which is not given explicitly in the
problem; this is the first of several things that we will just have to use our judgment about.
Looking at the households’ budget constraints suggests that
• Households own capital, which they rent to firms at a price of Rt ,
• Firms hire workers at a wage of Wt , and
• The “units” in which these prices are measured are consumption units.
Thus a firm’s revenue is the quantity it produces, $Y_t$ (since the sale price is one), and its cost is $R_t K_t + W_t N_t$. Thus it solves
$$\max_{K_t, N_t} \; K_t^\alpha N_t^{1-\alpha} - R_t K_t - W_t N_t.$$
We have not included any non-negativity constraints, justified with an appeal to F’s satisfaction of the Inada conditions. The FOCs are below:
$$\text{[firm, labor]} \quad W_t = (1-\alpha) K_t^\alpha N_t^{-\alpha}.$$
The firm should be willing to hire more workers if the wage is less than the worker’s marginal product ($W_t < (1-\alpha)K_t^\alpha N_t^{-\alpha}$) and should want to fire workers otherwise. Hence in equilibrium, $W_t = (1-\alpha)K_t^\alpha N_t^{-\alpha}$.
$$\text{[firm, capital]} \quad R_t = \alpha K_t^{\alpha-1} N_t^{1-\alpha}.$$
The firm should be willing to rent more capital if the rental price is less than the marginal product of capital and should want to get rid of capital (“rent it to the market”) otherwise. Hence in equilibrium, $R_t = \alpha K_t^{\alpha-1} N_t^{1-\alpha}$.
Household Problem The household chooses $\{K_{t+1}, N_t\}_t$ to maximize
$$\sum_{t=0}^\infty \beta^t \left[ \log\big( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} \big) - \frac{N_t^{1+\chi}}{1+\chi} \right]$$
given $K_0$ (and again ignoring non-negativity) or, expanding around a given point,
$$\cdots + \beta^t \left[ \log\big( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} \big) - \frac{N_t^{1+\chi}}{1+\chi} \right] + \beta^{t+1} \left[ \log\big( \underbrace{W_{t+1} N_{t+1} + R_{t+1} K_{t+1} - K_{t+2} - g_{t+1} + (1-\delta)K_{t+1}}_{C_{t+1}} \big) - \frac{N_{t+1}^{1+\chi}}{1+\chi} \right] + \cdots.$$
Taking the FOC with respect to $K_{t+1}$ gives
$$\text{[hh, capital]} \quad \frac{1}{C_t} = \beta \big( R_{t+1} + (1-\delta) \big) \frac{1}{C_{t+1}}.$$
This is the standard intertemporal Euler equation (intereuler). The LHS is the marginal utility of consumption today; the RHS is the marginal utility of consumption tomorrow, discounted by β and multiplied by the effective interest rate, $R_{t+1} + (1-\delta)$, which can be thoughtned of as the marginal rate of transformation between consumption today and consumption tomorrow. The FOC with respect to $N_t$ gives the intratemporal condition
$$\text{[hh, labor]} \quad C_t N_t^\chi = W_t,$$
equating the marginal disutility of an hour of work to the wage weighted by the marginal utility of consumption.
(b) A competitive equilibrium comprises prices $\{W_t, R_t\}_{t=0}^\infty$ and allocations $\{C_t, N_t^s, K_{t+1}^s, N_t^d, K_t^d\}_{t=0}^\infty$ such that
i. Given prices $\{W_t, R_t\}_{t=0}^\infty$, allocations $\{C_t, N_t^s, K_{t+1}^s\}_{t=0}^\infty$ solve the household problem;
ii. Given prices $\{W_t, R_t\}_{t=0}^\infty$, allocations $\{N_t^d, K_t^d\}_{t=0}^\infty$ solve the firm problem;
iii. Markets clear: $N_t^s = N_t^d \equiv N_t$ and $K_t^s = K_t^d \equiv K_t$ for all $t$; and
iv. The resource constraint is satisfied: $C_t + g_t + K_{t+1} = Y_t + (1-\delta)K_t$ for all $t$.
(c) There are no externalities and no market failures, so a CE is Pareto Optimal by the First
Fundamental Welfare Theorem. We could also formally show that the FOC of the social planner
is the same, but since Nir asked for a “short intuitive” answer, this is beyond the scope of the
question. Note that reducing government spending is not a feasible Pareto Improvement, as g is an
exogenous parameter here. You lost points if you said that reducing g was a Pareto Improvement.
That argument, that reducing g makes everyone better off, would be analogous to saying that we
can have a Pareto Improvement by lowering the depreciation rate δ.
(d) Recall the intereuler
$$\frac{1}{C_t} = \beta \big( R_{t+1} + (1-\delta) \big) \frac{1}{C_{t+1}}.$$
In the steady state, $C_t = C_{t+1}$, so
$$\frac{1}{\beta} - (1-\delta) = R_* = \alpha \left( \frac{K_*}{N_*} \right)^{\alpha-1}$$
(where the second equality comes from the firm’s FOC with respect to capital), or
$$X_* \equiv \frac{K_*}{N_*} = \left( \frac{1 - \beta(1-\delta)}{\alpha\beta} \right)^{\frac{1}{\alpha-1}}.$$
Thus the capital/labor ratio X is pinned down by β, δ, and α; this pins down $R_* = \alpha X_*^{\alpha-1}$.
From the intraeuler and the firm’s FOC with respect to labor, we have that $C_* N_*^\chi = W_* = (1-\alpha)X_*^\alpha$. Combining this with the economy’s resource constraint ($C_* + g_* + \delta K_* = Y_*$, where $K_* = X_* N_*$ and $Y_* = X_*^\alpha N_*$) gives
$$g_* = (X_*^\alpha - \delta X_*) N_* - (1-\alpha) X_*^\alpha N_*^{-\chi}.$$
If $g_*$ increases,
• Labor $N_*$ increases as well since the right-hand side of the equation above is increasing in $N_*$ (check this!¹¹),
¹¹ There are two terms on the right-hand side. Notice the second term ($-(1-\alpha)X_*^\alpha N_*^{-\chi}$) is increasing in $N_*$, and is negative. Since the left-hand side of the equation is positive, the right-hand side must be as well, so its first term ($(X_*^\alpha - \delta X_*)N_*$) must be positive, and therefore increasing in $N_*$.
• Capital $K_* = X_* N_*$ increases (by the same proportion as labor),
• Output $Y_* = K_*^\alpha N_*^{1-\alpha}$ increases (by the same proportion as labor and capital), and
• Consumption $C_* \propto N_*^{-\chi}$ decreases.
We can think of this as the (long-run) effect of an ongoing lump-sum tax.
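These comparative statics are easy to verify numerically. The sketch below solves the steady-state condition $g_* = (X_*^\alpha - \delta X_*)N_* - (1-\alpha)X_*^\alpha N_*^{-\chi}$ for $N_*$ by bisection; the parameter values are illustrative assumptions, not from the problem:

```python
import numpy as np

# Steady state of part 1 under illustrative parameters. We solve
#   g = (X^alpha - delta*X)*N - (1 - alpha)*X^alpha*N^(-chi)
# for N and check the comparative statics in the bullets above.
alpha, beta, delta, chi = 0.3, 0.96, 0.1, 1.0
X = ((1 - beta * (1 - delta)) / (alpha * beta)) ** (1 / (alpha - 1))

def excess(N, g):
    return (X**alpha - delta * X) * N - (1 - alpha) * X**alpha * N**(-chi) - g

def solve_N(g, lo=1e-6, hi=50.0):
    for _ in range(200):                 # bisection; excess is increasing in N
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid, g) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

N_lo, N_hi = solve_N(0.2), solve_N(0.4)
C = lambda N: (1 - alpha) * X**alpha * N**(-chi)   # from the intraeuler
assert N_hi > N_lo                  # higher g raises steady-state labor...
assert C(N_hi) < C(N_lo)            # ...and lowers steady-state consumption
```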
2. (a) Here, the household’s problem is
$$\max_{\{K_{t+1}, N_t\}_t} \sum_{t=0}^\infty \beta^t \log\left( \underbrace{W_t N_t + R_t K_t - K_{t+1} - g_t + (1-\delta)K_t}_{C_t} - \frac{N_t^{1+\chi}}{1+\chi} \right),$$
and the FOCs are
$$\text{[hh, capital]} \quad \frac{1}{C_t - \frac{N_t^{1+\chi}}{1+\chi}} = \beta \big( R_{t+1} + (1-\delta) \big) \frac{1}{C_{t+1} - \frac{N_{t+1}^{1+\chi}}{1+\chi}}$$
and
$$\text{[hh, labor]} \quad N_t^\chi = W_t.$$
(b) The intereuler here gives the same steady-state result as with the other utility function: $\beta^{-1} - (1-\delta) = R_*$. Since the firm’s FOCs are the same, we get exactly the same capital/labor ratio:
$$X_* \equiv \frac{K_*}{N_*} = \left( \frac{1 - \beta(1-\delta)}{\alpha\beta} \right)^{\frac{1}{\alpha-1}}.$$
From the intraeuler and the firm’s FOC with respect to labor, we have that
$$N_*^\chi = W_* = (1-\alpha) X_*^\alpha.$$
If g∗ increases,
• Labor $N_*$ remains fixed, since $X_*$ does,
• Capital $K_* = X_* N_*$ remains fixed,
• Output $Y_* = K_*^\alpha N_*^{1-\alpha}$ remains fixed, and
• Consumption $C_* = K_*^\alpha N_*^{1-\alpha} - \delta K_* - g_*$ (from the economy’s resource constraint) falls by the same amount that government expenditures increase.
(c) Can we say something about the fall in C∗ in the two models? Yes! Since output increased in the
economy with income effects (the first economy) but remained constant in the second, the same
change in g∗ will result in a smaller consumption decline in the first model.12 In the model with
income effects, the influence of the lump sum tax on consumption is partly offset by the increasing
supply of capital and labor. Hence for every dollar the government taxes, consumption falls by less
¹² To be more precise, we should consider not output, but $Y_* - \delta K_*$. However, since $Y_*$ and $K_*$ increase by the same fraction, the difference must increase as long as it was positive to begin with.
than one dollar in the first example. In the second example, there are no income effects. Having
the government tax a household one dollar causes consumption to fall by one dollar.
Note that the first model has income effects in the sense that a change in consumption changes the within-period problem, illustrated by the intraeuler $C_t N_t^\chi = W_t$. Increasing $C_t$ must make $N_t$ fall; hence hours depend on income and there is an income effect.
In the second model, in contrast, increasing consumption does not change the household’s within-period problem, per the intraeuler $N_t^\chi = W_t$. Given a saving rate, the household must each period maximize
$$\log\left( C_t - \frac{N_t^{1+\chi}}{1+\chi} \right),$$
which is the same as simply maximizing $C_t - \frac{N_t^{1+\chi}}{1+\chi}$. Since the within-period decision is quasi-linear in consumption, there are no income effects.
Why are there no profits? The production function is homogeneous of degree one (constant returns to scale), so $f(\alpha k, \alpha n) = \alpha f(k, n)$ for all $\alpha > 0$. Differentiating both sides with respect to α gives
$$f(k, n) = \frac{\partial f}{\partial k}(\alpha k, \alpha n) \cdot k + \frac{\partial f}{\partial n}(\alpha k, \alpha n) \cdot n.$$
Since this must hold for all $\alpha > 0$, $k$, and $n$, we can plug in $\alpha = 1$, $k = k_*$, and $n = n_*$:
$$f(k_*, n_*) = \frac{\partial f}{\partial k}(k_*, n_*) \cdot k_* + \frac{\partial f}{\partial n}(k_*, n_*) \cdot n_*.$$
By the first-order conditions for the maximization problem, $\frac{\partial f}{\partial k}(k_*, n_*) = r$ and $\frac{\partial f}{\partial n}(k_*, n_*) = w$, so
$$f(k_*, n_*) = r k_* + w n_*;$$
revenue exactly covers factor payments, and profits are zero.
A production function demonstrates CRS if and only if it is homogeneous of degree one, where a function f is said to be homogeneous of degree k if
$$f(\alpha x) = \alpha^k f(x),$$
where x may be multidimensional. Using techniques very similar to those above, we could prove
$$\nabla f(x) \cdot x = k f(x)$$
for all x. In particular, a CRS (degree-one homogeneous) function satisfies $f(\lambda p) = \lambda f(p)$.
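A quick numerical check of Euler’s theorem for a degree-one homogeneous function, using a Cobb-Douglas example with an illustrative α:

```python
# Numerical check of Euler's theorem for a homogeneous-of-degree-one (CRS)
# function: grad f(x) . x = f(x). Cobb-Douglas with alpha = 0.3 is illustrative.
alpha = 0.3
f = lambda k, n: k**alpha * n**(1 - alpha)

k, n, h = 2.0, 3.0, 1e-6
f_k = (f(k + h, n) - f(k - h, n)) / (2 * h)   # central finite differences
f_n = (f(k, n + h) - f(k, n - h)) / (2 * h)

assert abs(f_k * k + f_n * n - f(k, n)) < 1e-6           # Euler: f_k*k + f_n*n = f
assert abs(f(2.5 * k, 2.5 * n) - 2.5 * f(k, n)) < 1e-12  # degree-one homogeneity
```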
15 Continuous-time optimization
These notes comprise (what I hope is) a slightly clearer version of the lecture notes’ treatment of the same subject.
The key notation we will use in the following is that “dotted” variables represent derivatives with respect
to time (ẋ ≡ ∂x/∂t).
15.1 Finite-horizon
Consider the following maximization problem:
$$V(0) = \max_{c:\,[0,T]\to\mathbb{R}} \int_0^T v\big(k(t), c(t), t\big)\, dt \quad \text{s.t.}$$
$$\dot{k} \le g\big(k(t), c(t), t\big), \;\forall t; \qquad k(0) = k_0 > 0 \text{ given; and a no-Ponzi condition.}$$
Note that capital is a state variable and consumption is a control. The first constraint is the “transition equation” or “equation of motion” for the state variable k. The no-Ponzi condition ensures that capital at the end of time, $k(T)$, cannot be “too negative.”¹³
First, imagine setting up a Lagrangian
$$\mathcal{L} = \int_0^T v\big(k(t), c(t), t\big)\, dt + \int_0^T \mu(t) \Big[ g\big(k(t), c(t), t\big) - \dot{k} \Big]\, dt + \gamma k(T) e^{-T\bar{r}(T)}. \tag{10}$$
¹³ In this problem, the no-Ponzi condition takes the form $k(T)\exp\big(-T\bar{r}(T)\big) \ge 0$, where $\bar{r}(t)$ is the average interest rate from time 0 to t. This implies that assets at time T are weakly positive.
When we have a countable number of constraints, we multiply each by a Lagrange multiplier and then add
these; here the transition equation gives a continuum of constraints, each of which we multiply by a Lagrange
multiplier µ(t) and then integrate across these.
If we try to solve this Lagrangian using typical methods, we face difficulty due to the presence of k̇; thus
we attempt to eliminate it. Note that
$$\int_0^T \big( \dot{\mu} k(t) + \dot{k}\mu(t) \big)\, dt = \int_0^T \frac{d}{dt}\big( \mu(t) k(t) \big)\, dt = k(T)\mu(T) - k(0)\mu(0),$$
where the first equality follows from the product rule ($\frac{d}{dt}(\mu k) = \dot{\mu}k + \dot{k}\mu$) and the second from the Fundamental Theorem of Calculus. Hence
$$\int_0^T \dot{k}\mu(t)\, dt = k(T)\mu(T) - k(0)\mu(0) - \int_0^T \dot{\mu} k(t)\, dt.$$
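The identity is easy to verify numerically with arbitrary smooth functions (the choices of $k(t)$ and $\mu(t)$ below are purely illustrative):

```python
import numpy as np

# Numerical check of the integration-by-parts identity:
# int_0^T k'(t) mu(t) dt = k(T)mu(T) - k(0)mu(0) - int_0^T mu'(t) k(t) dt,
# using k(t) = sin(t) + 1 and mu(t) = exp(-t/2) as illustrative functions.
T = 2.0
t = np.linspace(0.0, T, 200001)
k, k_dot = np.sin(t) + 1.0, np.cos(t)
mu, mu_dot = np.exp(-0.5 * t), -0.5 * np.exp(-0.5 * t)

trap = lambda f: np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))  # trapezoid rule
lhs = trap(k_dot * mu)
rhs = k[-1] * mu[-1] - k[0] * mu[0] - trap(mu_dot * k)
assert abs(lhs - rhs) < 1e-9
```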
and
$$\frac{\partial k(T, \varepsilon)}{\partial \varepsilon} = p_k(T).$$
Substituting in gives
$$\frac{d\mathcal{L}}{d\varepsilon} = \int_0^T \Big[ \underbrace{\frac{\partial H}{\partial c}\, p_c(t)}_{\equiv A} + \underbrace{\Big( \frac{\partial H}{\partial k} + \dot{\mu} \Big) p_k(t)}_{\equiv B} \Big]\, dt + \underbrace{p_k(T)\Big( \gamma e^{-T\bar{r}(T)} - \mu(T) \Big)}_{\equiv C}.$$
Optimality requires that this be zero when evaluated at $\varepsilon = 0$ for all perturbations $p_c$ and $p_k$, which forces each piece to vanish separately. (In general, it is not the case that $A + B + C = 0$ implies A, B, and C are all zero; it is true here, however, because $p_c$ and $p_k$ are arbitrary.) That is, we need
$$\frac{\partial H}{\partial c} = 0, \qquad \frac{\partial H}{\partial k} = -\dot{\mu}, \qquad \gamma e^{-T\bar{r}(T)} = \mu(T).$$
Collectively, these are known as “The Maximum Principle.” The first two play the role of first-order conditions, and the third is the no-Ponzi condition.¹⁴
¹⁴ Note that µ(t) measures the shadow price of capital at time t (in utils). The condition that $\gamma\exp\big(-T\bar{r}(T)\big) = \mu(T)$ means that the (shadow) “cost of the no-Ponzi condition”—given by its multiplier γ discounted back to the present—is equal to the shadow cost of the capital stock available at the end of time. If, for some reason, the no-Ponzi condition did not bind, γ = 0. But then the agent could use a bit more capital in the last period and still not violate the no-Ponzi condition. For this to be optimal, the value of capital in the last period, µ(T), must be 0 as well. If the no-Ponzi condition binds, then capital must be costly.
1. Form the Hamiltonian:
$$H(k, c, t, \mu) \equiv v\big(k(t), c(t), t\big) + \mu(t)\, g\big(k(t), c(t), t\big).$$
2. Take FOC for the control variable. Set $\frac{\partial H}{\partial c} = 0$:
$$\frac{\partial H(k, c, t, \mu)}{\partial c(t)} = \frac{\partial v\big(k(t), c(t), t\big)}{\partial c(t)} + \mu(t)\, \frac{\partial g\big(k(t), c(t), t\big)}{\partial c(t)} = 0$$
for all $t$.
3. Take FOC for the state variable. Set $\frac{\partial H}{\partial k} = -\dot{\mu}$:
$$\frac{\partial H(k, c, t, \mu)}{\partial k(t)} = \frac{\partial v\big(k(t), c(t), t\big)}{\partial k(t)} + \mu(t)\, \frac{\partial g\big(k(t), c(t), t\big)}{\partial k(t)} = -\dot{\mu}(t)$$
for all $t$.
4. Identify TVC. If T < ∞, then the TVC is
µ(T )k(T ) = 0;
that is, either capital equals 0 in the last period (the constraint binds) or the constraint does not bind
and µ(T ) = 0.
If time is infinite, the TVC is
$$\lim_{t\to\infty} \mu(t) k(t) = 0.$$
With m state variables $k_1, \ldots, k_m$ and n control variables $c_1, \ldots, c_n$, the problem becomes
$$\max \int_0^T v\big(k_1(t), \ldots, k_m(t), c_1(t), \ldots, c_n(t), t\big)\, dt \quad \text{s.t.}$$
$$\dot{k}_1(t) \le g_1\big(k_1(t), \ldots, k_m(t), c_1(t), \ldots, c_n(t), t\big), \;\forall t;$$
$$\vdots$$
$$\dot{k}_m(t) \le g_m\big(k_1(t), \ldots, k_m(t), c_1(t), \ldots, c_n(t), t\big), \;\forall t; \text{ and}$$
$$k_1(0), \ldots, k_m(0) > 0 \text{ given.}$$
1. Form the Hamiltonian:
$$H(\vec{k}, \vec{c}, t, \vec{\mu}) \equiv v\big(\vec{k}(t), \vec{c}(t), t\big) + \sum_{j=1}^m \mu_j(t)\, g_j\big(\vec{k}(t), \vec{c}(t), t\big).$$
2. Take FOCs for the control variables. Set $\frac{\partial H}{\partial c_i} = 0$:
$$\frac{\partial H(\vec{k}, \vec{c}, t, \vec{\mu})}{\partial c_i(t)} = \frac{\partial v\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial c_i(t)} + \sum_{j=1}^m \mu_j(t)\, \frac{\partial g_j\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial c_i(t)} = 0$$
for all $t$ and each $i \in \{1, \ldots, n\}$.
3. Take FOCs for the state variables. Set $\frac{\partial H}{\partial k_i} = -\dot{\mu}_i$:
$$\frac{\partial H(\vec{k}, \vec{c}, t, \vec{\mu})}{\partial k_i(t)} = \frac{\partial v\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial k_i(t)} + \sum_{j=1}^m \mu_j(t)\, \frac{\partial g_j\big(\vec{k}(t), \vec{c}(t), t\big)}{\partial k_i(t)} = -\dot{\mu}_i(t)$$
for all $t$ and each $i \in \{1, \ldots, m\}$.
4. Identify TVCs. If $T < \infty$, they are
$$\mu_i(T) k_i(T) = 0$$
for $i \in \{1, \ldots, m\}$. If time is infinite, then the TVCs take the form
$$\lim_{t\to\infty} \mu_i(t) k_i(t) = 0.$$
15.4 Current-value Hamiltonians

When the objective is time-discounted, $v\big(k(t), c(t), t\big) = e^{-\rho t} u\big(k(t), c(t)\big)$, it is common instead to proceed as follows. Consider multiplying the Hamiltonian by $e^{\rho t}$ to get the current-value Hamiltonian:
$$\widetilde{H}(k, c, t, \mu) \equiv e^{\rho t} H(k, c, t, \mu) = u\big(k(t), c(t)\big) + \underbrace{e^{\rho t}\mu(t)}_{\equiv \lambda(t)}\, g\big(k(t), c(t), t\big).$$
$\lambda(t)$ is the current-value shadow price: it gives the value of a unit of capital at time t measured in time-t utils (i.e., felicities), rather than in time-0 utils.
The Maximum Principle tells us that at an optimum, we have an FOC in the choice variable,
$$\frac{\partial H}{\partial c(t)} = e^{-\rho t}\, \frac{\partial \widetilde{H}}{\partial c(t)} = 0 \iff \frac{\partial \widetilde{H}}{\partial c(t)} = 0,$$
and a condition on the state variable,
$$\frac{\partial H}{\partial k(t)} = e^{-\rho t}\, \frac{\partial \widetilde{H}}{\partial k(t)} = -\dot{\mu}(t),$$
which, since $\mu(t) = e^{-\rho t}\lambda(t)$ implies $-\dot{\mu}(t) = \rho\lambda(t)e^{-\rho t} - \dot{\lambda}(t)e^{-\rho t}$, becomes
$$\frac{\partial \widetilde{H}}{\partial k(t)} = \rho\lambda(t) - \dot{\lambda}(t).$$
16 Log-linearization
Recall that we have thus far linearized functions $g: \mathbb{R}^n \to \mathbb{R}$ using the first-order Taylor approximation about the steady state $x_*$:
$$g(x) \approx g(x_*) + g_1'(x_*)(x_1 - x_{1*}) + g_2'(x_*)(x_2 - x_{2*}) + \cdots + g_n'(x_*)(x_n - x_{n*}) = g(x_*) + \nabla g(x_*) \cdot (x - x_*).$$
Linearizing gives an approximation that is linear in $x - x_*$. That is, a one-unit change in x causes the approximated value of $g(x)$ to change by a constant $g'(x_*)$. Log-linearization instead gives an approximation for $g(\cdot)$ that is linear in $\hat{x} \equiv \frac{x - x_*}{x_*}$, or x’s percentage deviation from steady state.
Why would we like to do this? Consider trying to describe the economies of Palo Alto and the United
States using a single model. Given the difference in the economy’s scales, it makes more sense to draw
conclusions about how each would respond to, say, a 5% budget surplus—which might in some sense affect
Palo Alto and the U.S. similarly—than to draw conclusions about how each would respond to a billion dollar
budget surplus.
We sometimes think in terms of percent movements (“stocks went down 2%”) rather than absolute
movements (“the Dow dropped by 400 points”). One advantage of the former approach is that it allows us to
express our conclusions in unitless measures; they are therefore robust to unit conversion. This advantage also accrues to log-linearization: if x is measured in Euros, then so is $x - x_*$, while $\hat{x} \equiv \frac{x - x_*}{x_*}$ is a unitless measure.
16.1 Why are approximations in terms of $\hat{x} \equiv \frac{x - x_*}{x_*}$ called “log-linear”?
Consider a first-order Taylor approximation of the (natural) log function, the most important Taylor
approximation in economics:15
$$\log(x) \approx \log(x_*) + \frac{1}{x_*}(x - x_*),$$
i.e.,
$$\log(x) - \log(x_*) \approx \frac{x - x_*}{x_*} \equiv \hat{x}.$$
Thus just as a standard Taylor approximation is linear in (x − x∗ ), the log-linearization is linear in
x̂ ≈ log(x) − log(x∗ ); that is, it is linear in logarithms.
15 You should memorize it and get used to recognizing it. In particular, you should get used to recognizing when people treat
this approximation as if it holds exactly; typically this occurs when the log first-difference of a time series (log(yt ) − log(yt−1 ))
is treated as equal to the series’ growth rate (yt /yt−1 − 1).
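The quality of this approximation is easy to see numerically (the steady-state value and deviations below are illustrative):

```python
import math

# How good is log(x) - log(x_star) ≈ (x - x_star)/x_star? It is excellent for
# small percent deviations and drifts for large ones.
x_star = 2.0
for pct in (0.01, 0.05, 0.20):
    x = x_star * (1 + pct)
    exact = math.log(x) - math.log(x_star)   # log deviation
    approx = (x - x_star) / x_star           # percent deviation, x-hat
    print(f"{pct:.0%}: exact={exact:.5f}, approx={approx:.5f}")
```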
16.2 Log-linearization: first approach
To get a log-linearization, there are many techniques. One is to start with the standard linearized (i.e., Taylor-approximated) version and “build up” $\hat{x} \equiv \frac{x - x_*}{x_*}$ and $\hat{f}(x) \equiv \frac{f(x) - f(x_*)}{f(x_*)}$:¹⁶
$$f(x) \approx f(x_*) + f'(x_*)(x - x_*) \quad\Longrightarrow\quad \hat{f}(x) \equiv \frac{f(x) - f(x_*)}{f(x_*)} \approx \frac{f'(x_*)\, x_*}{f(x_*)}\, \hat{x}.$$
Here, we have an expression that describes what happens to the deviation of $f(x)$ from its steady state in percent terms as x deviates from its steady state in percent terms. A 1% increase in x from the steady state causes $f(x)$ to increase by about $\frac{f'(x_*)x_*}{f(x_*)}$ percent.
You might recognize that this last expression has another name: the elasticity of f (x) with respect to x.
Perhaps it is easier to see when written as
$$\frac{f'(x_*)\, x_*}{f(x_*)} = \left. \frac{\partial f}{\partial x} \cdot \frac{x}{f(x)} \right|_{x = x_*}.$$
It is important to get some practice with quickly log-linearizing simple functions. Several examples follow:
• A Cobb-Douglas production function, $y_t = k_t^\alpha n_t^{1-\alpha}$:
$$y_t \approx y_* + \alpha\, \frac{k_*^\alpha n_*^{1-\alpha}}{k_*}(k_t - k_*) + (1-\alpha)\, \frac{k_*^\alpha n_*^{1-\alpha}}{n_*}(n_t - n_*)$$
$$y_t - y_* \approx \alpha\, \frac{y_*}{k_*}(k_t - k_*) + (1-\alpha)\, \frac{y_*}{n_*}(n_t - n_*)$$
$$\frac{y_t - y_*}{y_*} \approx \alpha\, \frac{k_t - k_*}{k_*} + (1-\alpha)\, \frac{n_t - n_*}{n_*}$$
$$\hat{y}_t \approx \alpha \hat{k}_t + (1-\alpha) \hat{n}_t.$$
• An intereuler,
$$\frac{1}{c_t} = \beta\, \frac{1}{c_{t+1}}\, \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha},$$
where $z_{t+1}$ is fixed. We want to log-linearize and find $\hat{c}_{t+1}$ as a function of $\hat{c}_t$, $\hat{k}_{t+1}$, and $\hat{n}_{t+1}$.
¹⁶ Our notation here is that $\hat{y} \equiv \frac{y - y_*}{y_*} \approx \log(y) - \log(y_*)$ for any y. Sometimes we will instead use $\hat{y} \equiv \log(y) - \log(y_*) \approx \frac{y - y_*}{y_*}$. Both are standard in Nir’s class, but not necessarily generally; in particular, some authors use capitalization to distinguish a percentage/log deviation from steady state (or trend, or whatever else is being linearized around).
Rearranging, we see that
$$c_{t+1} = \beta c_t\, \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha}$$
$$\approx c_* + \left. \frac{\partial(\cdot)}{\partial c_t} \right|_* (c_t - c_*) + \left. \frac{\partial(\cdot)}{\partial k_{t+1}} \right|_* (k_{t+1} - k_*) + \left. \frac{\partial(\cdot)}{\partial n_{t+1}} \right|_* (n_{t+1} - n_*)$$
$$= c_* + \frac{c_*}{c_*}(c_t - c_*) + (\alpha-1)\, \frac{c_*}{k_*}(k_{t+1} - k_*) + (1-\alpha)\, \frac{c_*}{n_*}(n_{t+1} - n_*),$$
using the steady-state fact that $\beta\alpha z_* k_*^{\alpha-1} n_*^{1-\alpha} = 1$. Dividing through by $c_*$,
$$\frac{c_{t+1} - c_*}{c_*} \approx \frac{c_t - c_*}{c_*} + (\alpha-1)\, \frac{k_{t+1} - k_*}{k_*} + (1-\alpha)\, \frac{n_{t+1} - n_*}{n_*}$$
$$\hat{c}_{t+1} \approx \hat{c}_t + (\alpha-1)\hat{k}_{t+1} + (1-\alpha)\hat{n}_{t+1}.$$
• The exponential,
$$e^{x_t} \approx e^{x_*} + e^{x_*}(x_t - x_*) = e^{x_*}(1 + x_* \hat{x}_t).$$
• Another function,
$$e^{x_t + y_t} \approx e^{x_* + y_*} + e^{x_* + y_*}(x_t - x_*) + e^{x_* + y_*}(y_t - y_*)$$
$$= e^{x_* + y_*} + x_* e^{x_* + y_*}\, \frac{x_t - x_*}{x_*} + y_* e^{x_* + y_*}\, \frac{y_t - y_*}{y_*}$$
$$= e^{x_* + y_*}(1 + x_* \hat{x}_t + y_* \hat{y}_t).$$
• A sum of functions,
$$x_t + y_t \approx x_* + y_* + (x_t - x_*) + (y_t - y_*) = x_* + y_* + x_* \hat{x}_t + y_* \hat{y}_t.$$
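A numerical check of the Cobb-Douglas result $\hat{y}_t \approx \alpha\hat{k}_t + (1-\alpha)\hat{n}_t$ for small deviations; the steady-state values are illustrative:

```python
# Check y_hat ≈ alpha*k_hat + (1-alpha)*n_hat for y = k^alpha * n^(1-alpha)
# near an arbitrary (illustrative) steady state.
alpha = 0.3
k_star, n_star = 4.0, 1.0
y_star = k_star**alpha * n_star**(1 - alpha)

k_hat, n_hat = 0.01, -0.005               # small percent deviations
k, n = k_star * (1 + k_hat), n_star * (1 + n_hat)
y = k**alpha * n**(1 - alpha)

y_hat_exact = (y - y_star) / y_star
y_hat_approx = alpha * k_hat + (1 - alpha) * n_hat
assert abs(y_hat_exact - y_hat_approx) < 1e-4
```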
16.3 Log-linearization: second approach

A second approach is to rewrite each variable as the exponential of its logarithm (capital letters denote logs, e.g., $Y_t \equiv \log(y_t)$) and take a Taylor approximation in the logs.
• A Cobb-Douglas production function, $y_t = k_t^\alpha n_t^{1-\alpha}$. Let us start with the left-hand side:
$$y_t = e^{\log(y_t)} = e^{Y_t} \approx e^{Y_*} + \left. \frac{\partial e^{Y_t}}{\partial Y_t} \right|_{Y_t = Y_*} (Y_t - Y_*) = e^{\log(y_*)} + e^{\log(y_*)}(Y_t - Y_*) = y_* + y_*\, \hat{y}_t.$$
Turning to the right-hand side,
$$k_t^\alpha n_t^{1-\alpha} = e^{\alpha\log(k_t) + (1-\alpha)\log(n_t)} = e^{\alpha K_t + (1-\alpha)N_t}$$
$$\approx e^{\alpha K_* + (1-\alpha)N_*} + \left. \frac{\partial e^{\alpha K_t + (1-\alpha)N_t}}{\partial K_t} \right|_{K_t = K_*} (K_t - K_*) + \left. \frac{\partial e^{\alpha K_t + (1-\alpha)N_t}}{\partial N_t} \right|_{N_t = N_*} (N_t - N_*)$$
$$= y_* + \alpha y_*\, \hat{k}_t + (1-\alpha) y_*\, \hat{n}_t.$$
Equating the two sides and dividing by $y_*$ recovers $\hat{y}_t \approx \alpha\hat{k}_t + (1-\alpha)\hat{n}_t$, as before.
• An intereuler,
$$\frac{1}{c_t} = \beta\, \frac{1}{c_{t+1}}\, \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha},$$
where $z_{t+1}$ is fixed. We want to find $\hat{c}_{t+1}$ as a function of $\hat{c}_t$, $\hat{k}_{t+1}$, and $\hat{n}_{t+1}$. Rearranging yields that
$$c_{t+1} = \beta c_t\, \alpha z_{t+1} k_{t+1}^{\alpha-1} n_{t+1}^{1-\alpha}.$$
such that
$$\dot{k}_t = w_t n_t + r_t k_t - \delta k_t - c_t. \tag{11}$$
We set up the Hamiltonian as
$$H = e^{-\rho t}\, U(c_t) + \mu_t \big( w_t n_t + r_t k_t - \delta k_t - c_t \big).$$
Note that there is no disutility associated with working, so hours will be chosen to be $n_t = 1$ (there is no FOC in $n_t$). Per the Maximum Principle, the correct FOCs are:
$$\frac{\partial H}{\partial c_t} = 0 \iff e^{-\rho t}\, U'(c_t) = \mu_t; \text{ and} \tag{12}$$
$$\frac{\partial H}{\partial k_t} = -\frac{d}{dt}\mu_t \iff \mu_t(r_t - \delta) = -\dot{\mu}_t; \tag{13}$$
(along with a TVC). Equation 11 is a dynamic condition on capital, and equation 13 is a dynamic condition on the shadow price of capital, µ. Combining equation 12 (which implies $\dot{\mu}_t = e^{-\rho t}\big( U''(c_t)\dot{c}_t - \rho U'(c_t) \big)$) with equation 13 gives
$$\dot{c}_t = \frac{U'(c_t)}{U''(c_t)}\, (\rho + \delta - r_t). \tag{14}$$
Assume CRRA utility,
$$U(c) = \frac{c^{1-\sigma}}{1-\sigma} \implies U'(c) = c^{-\sigma} \text{ and } U''(c) = -\sigma c^{-\sigma-1},$$
and Cobb-Douglas production (recall that $n = 1$), hence $r_t = \alpha k_t^{\alpha-1}$ and $w_t = (1-\alpha)k_t^\alpha$.
From now on, we will use the notation that “hatted” variables represent the relative deviation from steady state ($\hat{x} \equiv \log(x) - \log(x_*) \approx (x - x_*)/x_*$). Note that $\dot{k}_t/k_t = \frac{d}{dt}(\log k_t) = \frac{d}{dt}(\log k_t - \log k_*) \equiv \dot{\hat{k}}_t$. This gives
$$\dot{\hat{k}}_t = k_t^{\alpha-1} - \delta - \frac{c_t}{k_t}. \tag{15}$$
Similarly, equation 14 becomes
$$\dot{c}_t = \tfrac{1}{\sigma}\, c_t \big( \alpha k_t^{\alpha-1} - \rho - \delta \big) \quad\Longrightarrow\quad \dot{\hat{c}}_t = \tfrac{1}{\sigma}\big( \alpha k_t^{\alpha-1} - \rho - \delta \big). \tag{16}$$
Setting $\dot{\hat{c}}_t = 0$ in equation 16 gives the steady-state marginal product of capital,
$$\alpha k_*^{\alpha-1} = \rho + \delta, \tag{17}$$
and similarly for equation 15,
$$\frac{c_*}{k_*} = k_*^{\alpha-1} - \delta = \frac{\rho + \delta}{\alpha} - \delta = \frac{\rho + \delta(1-\alpha)}{\alpha}. \tag{18}$$
Log-linearizing equation 15 (noting that $\dot{\hat{k}}_* = 0$) gives
$$\dot{\hat{k}}_t \approx (\alpha-1) k_*^{\alpha-1}\, \hat{k}_t - \frac{c_*}{k_*}\hat{c}_t + \frac{c_*}{k_*}\hat{k}_t.$$
Substituting in using the steady state results from equations 17 and 18,
$$\dot{\hat{k}}_t \approx \left[ (\alpha-1)\frac{\rho+\delta}{\alpha} + \frac{\rho + \delta(1-\alpha)}{\alpha} \right] \hat{k}_t - \frac{\rho + \delta(1-\alpha)}{\alpha}\hat{c}_t = \rho\hat{k}_t - \frac{\rho + \delta(1-\alpha)}{\alpha}\hat{c}_t. \tag{19}$$
Similarly, log-linearizing equation 16 gives
$$\dot{\hat{c}}_t \approx \tfrac{1}{\sigma}\, \alpha(\alpha-1) k_*^{\alpha-1}\, \hat{k}_t = \frac{(\alpha-1)(\rho+\delta)}{\sigma}\, \hat{k}_t. \tag{20}$$
Combining equations 19 and 20 in matrix notation, we have the dynamic system
$$\begin{bmatrix} \dot{\hat{k}}_t \\ \dot{\hat{c}}_t \end{bmatrix} = \begin{bmatrix} \rho & -\frac{\rho + \delta(1-\alpha)}{\alpha} \\ \frac{(\alpha-1)(\rho+\delta)}{\sigma} & 0 \end{bmatrix} \begin{bmatrix} \hat{k}_t \\ \hat{c}_t \end{bmatrix}.$$
If we consider this system as $\dot{x}_t = A x_t$, we could “decouple” the system using the eigendecomposition $A = P\Lambda P^{-1}$: the system becomes $\frac{d}{dt}\big( P^{-1}x_t \big) = \Lambda\big( P^{-1}x_t \big)$, whose solution is $x_t = P e^{\Lambda t} P^{-1} x_0$.¹⁷
18 Optimal taxation
18.1 The Ramsey model
In our models so far, we have only had two types of agents: households and firms. When we have considered
government spending, we have always specified it entirely in terms of an exogenous stream $\{g_t\}_t$ that is taken from households. Because the revenue requirement was exogenous, and because the government only had a single instrument by which to collect it—lump-sum (or “head”) taxes—there was no flexibility in the government’s behavior and no reason to treat it as an agent in the model.
If we relax either or both of these restrictions—allowing the government to choose the level of revenue
collected each period and/or the means of collecting it—we move to a class of models called “optimal taxation”
or “Ramsey” models. How does the government make its taxation choices in these models? We will generally
¹⁷ To see this, first note that the system is $\frac{d}{dt}\tilde{x}(t) = \Lambda\tilde{x}(t)$, where we define $\tilde{x}(t) \equiv P^{-1}x(t)$. This gives separable differential equations in each element $i \in \{1, 2\}$ of $\tilde{x}$ of the form $d\tilde{x}_i(t)/\tilde{x}_i(t) = \Lambda_i\, dt \implies \int d\tilde{x}_i(t)/\tilde{x}_i(t) = \int \Lambda_i\, dt \implies \log\tilde{x}_i(t) = t\Lambda_i + \log\tilde{x}_i(0)$, where the constant of integration is pinned down by the initial condition. Finally, this implies $\tilde{x}_i(t) = \exp(t\Lambda_i) \cdot \tilde{x}_i(0)$; stacking these equations and premultiplying by P gives $x(t) = P\exp(\Lambda t)P^{-1}x(0)$.
assume that the government’s objective is to maximize household welfare, noting that the government knows
households will maximize their own welfare subject to tax policy. A stylized way of describing this is as
follows:
$$\max_{\tau \text{ s.t. } \ldots}\; \max_c\; U(c, \tau) \tag{21}$$
where τ are the government’s tax policies, c are the representative household’s choices, and there is some
constraint on the outer maximization (i.e., the government’s) of the form g(τ, c∗ (τ )) ≥ 0, designed to capture
the fact that the government must raise enough money to achieve some exogenous goal.¹⁸ A solution gives a “Ramsey equilibrium”: allocations, prices, and taxes such that
1. Households maximize utility subject to their budget constraints, taking prices and taxes as given,
2. Government maximizes households’ utility while financing government expenditures (i.e., meeting its
budget constraint or constraints),
3. Markets clear, and
4. The economy’s resource constraint is satisfied.
Consider an example: an economy with a representative household and the government. In this economy
there is no capital; households can produce a perishable consumption good with technology f (n) = n, and
they can invest in government bonds. Further suppose that the government must raise (exogenous) gt each
period, which it can do through an income tax or through government debt.
We can write the household problem as
$$\max_{\{c_t, n_t, b_{t+1}\}} \sum_{t=0}^\infty \beta^t u(c_t, n_t) \quad \text{s.t.} \quad c_t + q_t b_{t+1} = n_t(1 - \tau_t) + b_t, \;\forall t; \qquad b_0 \text{ given.}$$
The household’s resource uses are consumption and purchase of bonds; its sources are production ($f(n_t) = n_t$) net of taxes ($\tau_t n_t$) and bond coupons. The government’s resource sources are bond sales and tax revenue, and its uses are bond repayment and government spending; thus the government budget constraints are
$$q_t b_{t+1} + \tau_t n_t = b_t + g_t, \;\forall t. \tag{22}$$
Finally, it is useful to write the economy’s resource constraint (which must hold by the combination of
household and government budget constraints):
ct + gt = nt . (23)
So what does the government do? Following the approach suggested by equation 21, it could
1. Find the household optimum as a function of tax rates $\vec{\tau} \equiv \{\tau_t\}_t$ and bond prices $\vec{q} \equiv \{q_t\}_t$. Setting up and solving the household problem gives the following intra- and intereulers: for all $t$,
$$1 - \tau_t = -\frac{u_n(c_t, n_t)}{u_c(c_t, n_t)}, \tag{24}$$
$$q_t\, u_c(c_t, n_t) = \beta\, u_c(c_{t+1}, n_{t+1}). \tag{25}$$
These help pin down the optimal $\{c_t^*(\vec{\tau}, \vec{q}), n_t^*(\vec{\tau}, \vec{q})\}_t$; the household budget constraints then allow us to find the bond holdings $\{b_{t+1}^*(\vec{\tau}, \vec{q})\}_t$.
18 Note that τ captures all tax policies (in each period, for each instrument available to the government), c captures all
household choices (e.g., consumption, hours, assets, and/or capital), and the government’s budget constraint could potentially
be imposing period-by-period revenue requirements.
2. Choose τ⃗ and q⃗ to maximize

U*(τ⃗, q⃗) ≡ U({c*_t(τ⃗, q⃗), n*_t(τ⃗, q⃗)}_t),

subject to its budget constraint.

18.2 The Primal approach
The primal approach instead has the government choose allocations directly:

max_c U(c)  s.t. ...

where the government is constrained to choose a c that is the household’s optimal choice (c = c*(τ)) for some τ satisfying the government’s budget constraint g(τ, c*(τ)) ≥ 0. This is called the “implementability constraint.”
Proceeding with our example from above, which allocations {ct , nt , bt+1 }t are consistent with household
optimization for some tax rate and bond prices satisfying the government’s budget constraint? The intereuler
(equation 25) allows us to pin down bond prices in terms of allocations:
q_t = β u_c(c_{t+1}, n_{t+1}) / u_c(c_t, n_t).
With the government budget constraint (equation 22), this lets us pin down the tax rates in terms of
allocations.
τ_t = (b_t + g_t − q_t b_{t+1}) / n_t
    = (b_t + g_t − β [u_c(c_{t+1}, n_{t+1})/u_c(c_t, n_t)] b_{t+1}) / n_t. (26)
Plugging this into the intraeuler (equation 24) gives
1 − (b_t + g_t − β [u_c(c_{t+1}, n_{t+1})/u_c(c_t, n_t)] b_{t+1}) / n_t = −u_n(c_t, n_t)/u_c(c_t, n_t),
where either c or n (here c) can be eliminated using the economy’s resource constraint (equation 23):
1 − (b_t + g_t − β [u_c(n_{t+1} − g_{t+1}, n_{t+1})/u_c(n_t − g_t, n_t)] b_{t+1}) / n_t = −u_n(n_t − g_t, n_t)/u_c(n_t − g_t, n_t).
Thus we can write the government’s optimal taxation problem as
max_{{n_t, b_{t+1}}} ∑_{t=0}^∞ β^t u(n_t − g_t, n_t)

s.t. 1 − (b_t + g_t − β [u_c(n_{t+1} − g_{t+1}, n_{t+1})/u_c(n_t − g_t, n_t)] b_{t+1}) / n_t + u_n(n_t − g_t, n_t)/u_c(n_t − g_t, n_t) = 0, ∀t,

where the left-hand side of the constraint defines η(n_t, g_t, b_t, n_{t+1}, g_{t+1}, b_{t+1}).
Although it looks ugly, this “primal” problem is actually straightforward to solve!19
In summary, the primal approach uses the “solution” to the household problem, the government’s budget
constraint, and the economy’s resource constraint to eliminate taxes (and prices) from the government’s
problem. Instead, the government faces an implementability constraint (here, η(nt , gt , bt , nt+1 , gt+1 , bt+1 ) = 0
for all t). Maximizing over allocations should then be relatively easy, and the implementability constraint
ensures that the allocations are consistent with some taxes that satisfy the government’s budget constraint.
The last step is to solve for the taxes; here we would use equation 26.
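This last step can be sketched numerically. The snippet below is an illustrative sketch only: the felicity function (CRRA in consumption), parameter values, and candidate sequences for g, n, and b are all assumptions, not part of the model above. It backs out bond prices from the intereuler (equation 25) and tax rates from equation 26, then checks that the implied taxes satisfy the household budget constraint.

```python
import numpy as np

# Assumed CRRA-in-consumption felicity; sigma and beta are illustrative
sigma, beta = 2.0, 0.96

def u_c(c):
    # marginal utility of consumption under the assumed felicity
    return c ** (-sigma)

T = 5
g = np.array([0.2, 0.2, 0.3, 0.2, 0.2])             # exogenous spending g_t
n = np.array([0.8, 0.8, 0.85, 0.8, 0.8])            # labor n_t
b = np.array([0.10, 0.12, 0.10, 0.11, 0.10, 0.10])  # bonds b_0, ..., b_T
c = n - g                                           # resource constraint (23)

# Intereuler (25): q_t = beta * u_c(c_{t+1}) / u_c(c_t)
q = beta * u_c(c[1:]) / u_c(c[:-1])

# Equation 26: tau_t = (b_t + g_t - q_t b_{t+1}) / n_t
tau = (b[:-2] + g[:-1] - q * b[1:-1]) / n[:-1]

# The implied taxes satisfy the household budget constraint by construction
lhs = c[:-1] + q * b[1:-1]             # c_t + q_t b_{t+1}
rhs = n[:-1] * (1 - tau) + b[:-2]      # n_t (1 - tau_t) + b_t
assert np.allclose(lhs, rhs)
```

The final assertion holds identically because equation 26 was itself derived from the budget constraints and the resource constraint.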
19 Introducing uncertainty
Let S be the set of events that could occur in any given period. For now, we will assume that S has finitely
many elements, but this is mostly just for notational convenience; the intuition and most results go through if
the shock space is countably infinite or continuous. Note that we are also assuming discrete time; considering
uncertainty with continuous time makes things significantly more complex, and we will not do so this quarter.
In the simplest possible example, suppose that we flip a coin in each period t ≥ 1. Letting s_t denote the realization of the event at time t, we have

s_t ∈ S = {H, T}, ∀t ∈ {1, 2, . . . }.

Suppose we want to characterize the history of this stochastic process through period t; we denote this object s^t. For example, after three periods, we may have observed s^3 = (H, H, T). Generally,

s^t ≡ (s_1, s_2, . . . , s_t) ∈ S × · · · × S = S^t, ∀t.

This notation can be confusing; a good mnemonic is to use the notational consistency in “s^t ∈ S^t” to help remember that superscripts represent histories.
Any stochastic variable in a model—whether exogenous or endogenous—can only depend on the uncertainty that has already been revealed. That is, if some variable c_t is “determined” at time t, that means that knowing the history s^t must give us enough information to pin down c_t. We often use the notation c_t(s^t) to indicate explicitly that c_t depends on the resolution of uncertainty in the first t periods.
For example, suppose that we make the following bet: I flip a coin each of two days; you pay me $5 today,
and I give you back $11 tomorrow if my coin flips came up the same both days.20 Your incomes in the periods
are given by the random variables y_1(s^1) and y_2(s^2), where

y_1(H) = −5,    y_2(H, H) = 11,
y_1(T) = −5;    y_2(H, T) = 0,
                y_2(T, H) = 0,
                y_2(T, T) = 11.
19 If you would like a good exercise that allows you to take this example further, consider the following quasilinear felicity function: u(c, n) = c − a(n) for a strictly convex function a(·). Show that the interest rate on government bonds is 1/q_t = β^{−1}, and that no matter the (deterministic) sequence of {g_t}_t, there is an optimum with tax rates and labor constant over time. Note that these results are specific to this utility function, and require that the government be able to credibly commit in advance to a specific tax plan {τ_t}_t.
20 Suppose I toss a fair coin (and will not run off with your money). Would you take this bet? Your answer will depend on
19.1 Probability
We denote the (unconditional) probability of observing any particular history s^t ∈ S^t in periods 1, . . . , t by Pr(s^t). (The notation π(·) is also common.) For Pr(·) to be a well-defined probability measure, we require that Pr(s^t) ≥ 0 for all s^t ∈ S^t, and that

∑_{s^t ∈ S^t} Pr(s^t) = 1.

In our (fair) coin-tossing example, we have Pr(s^t) = 2^{−t} for all t and s^t ∈ S^t.
We also consider the probability of observing a particular history s^t conditional on having already observed history s^τ for τ ≤ t. We denote this probability Pr(s^t | s^τ). Returning to coin tossing,

Pr(s^t | s^τ) = 2^{−(t−τ)}, if (s^t_1, s^t_2, . . . , s^t_τ) = s^τ;
Pr(s^t | s^τ) = 0, otherwise.

Often, we will only allow the first case, where s^t remains possible given s^τ (i.e., s^τ comprises the first τ elements of s^t). It is also convenient to treat period zero as non-stochastic, adopting the conventions

Pr(s^0) ≡ 1;
Pr(s^t | s^0) ≡ Pr(s^t), ∀t ≥ 0.

Using this second extension, we can specify the full probability structure with just the conditional probability functions Pr(·|·).21
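As a concrete check of this structure, the sketch below enumerates coin-flip histories and verifies that the unconditional and conditional probabilities above behave as probability measures. The helper names (`histories`, `pr`, `pr_cond`) are ours, introduced for illustration.

```python
from itertools import product

S = ["H", "T"]

def histories(t):
    """All histories s^t in S^t for a fair coin."""
    return list(product(S, repeat=t))

def pr(st):
    """Unconditional probability of a history: 2**(-t)."""
    return 0.5 ** len(st)

def pr_cond(st, s_tau):
    """Pr(s^t | s^tau): 2**-(t-tau) if s_tau is a prefix of s^t, else 0."""
    tau = len(s_tau)
    if st[:tau] != tuple(s_tau):
        return 0.0
    return 0.5 ** (len(st) - tau)

# Unconditional probabilities over S^3 sum to one
assert abs(sum(pr(s) for s in histories(3)) - 1.0) < 1e-12

# Conditional probabilities given s^1 = (H,) also sum to one over S^3
assert abs(sum(pr_cond(s, ("H",)) for s in histories(3)) - 1.0) < 1e-12

# The period-zero conventions: Pr(s^0) = 1 and Pr(s^t | s^0) = Pr(s^t)
assert pr(()) == 1.0
assert pr_cond(("H", "T"), ()) == pr(("H", "T"))
```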
We rely here on the linearity of the expectation operator; we will often do so.
20 Markov chains
Although the general structure laid out above will be useful, we will often be willing to impose more structure
on the probability structure. The most common structure we will assume is that {st }t forms a Markov chain.
21 It is a bit cumbersome to have period zero be “different” from all other periods, and we will not always do so. However,
it offers two advantages. The first is that having period zero be non-stochastic allows st ∈ S t , rather than st ∈ S t+1 . More
importantly, having a non-stochastic state allows us to write unconditional probabilities as conditional probabilities, since we
can reference/condition on a history (s0 ) that is entirely uninformative.
22 Preferences admit a von Neumann-Morgenstern representation if and only if they satisfy:
1. Continuity: For any x, x0 , x00 with x % x0 % x00 , there exists α ∈ [0, 1] such that αx + (1 − α)x00 ∼ x0 ;
2. Independence: For any x, x0 , x00 and α ∈ [0, 1], we have x % x0 ⇐⇒ αx + (1 − α)x00 % αx0 + (1 − α)x00 ; and
3. A “sure thing principle”
(in addition to the usual completeness and transitivity). Extensive discussion of these points is conducted in Economics 202.
Definition 19. A stochastic process {s_t}_t satisfies the Markov property (or is a Markov chain) if for all t_1 < t_2 < · · · < t_n,

Pr(s_{t_n} = s | s_{t_{n−1}}, s_{t_{n−2}}, . . . , s_{t_1}) = Pr(s_{t_n} = s | s_{t_{n−1}}).

Although the notation looks odious, the intuition is not bad. A Markov process is one where, if one has information about several realizations (s_{t_{n−1}}, . . . , s_{t_1}), only the latest realization (s_{t_{n−1}}) is useful in helping predict the future.
Note that by the definition of conditional probability (and the fact that Pr(s^t | s^{t−1}) = Pr(s_t | s^{t−1})),

Pr(s^t | s^τ) = ∏_{j=τ}^{t−1} Pr(s_{j+1} | s^j).

Fortunately, the Markov property implies that Pr(s_{j+1} | s^j) = Pr(s_{j+1} | s_j), so for a Markov chain

Pr(s^t | s^τ) = ∏_{j=τ}^{t−1} Pr(s_{j+1} | s_j)

for all t, τ ≤ t, s^t ∈ S^t, and s^τ ∈ S^τ with s^τ = (s^t_1, s^t_2, . . . , s^t_τ). Thus we can specify the entire probability
structure of a Markov chain if we know all of its “transition probabilities” Pr(s_{t+1} | s_t). In general, these transition probabilities may depend on the time t; when they do not, our Markov chain is time-invariant.
Definition 20. A time-invariant Markov chain is a stochastic process satisfying the Markov property and for which

Pr(s_{t+1} = j | s_t = i) = Pr(s_{t+2} = j | s_{t+1} = i)

for all t and (i, j) ∈ S². This implies that the transition probabilities are constant over time (by induction).
For a time-invariant Markov chain, we can summarize these transition probabilities in a transition matrix.
Suppose without loss of generality that the shock space S = {1, 2, . . . , m}. Then we define the transition
matrix P by
P_ij = Pr(s_{t+1} = j | s_t = i)

for every i and j (and any t: by time-invariance, the choice does not matter). Every row of P must add to one (∑_j P_ij = 1 for all i), which means that P can be called a “stochastic matrix.”
Note that an iid process (i.e., one for which the realization is distributed independently and identically
across periods) is a time-invariant Markov chain, and will have a transition matrix where every row is
identical.
20.1 Unconditional distributions
Let π_0 denote the vector characterizing the (unconditional) probability distribution of the initial state, with elements π_{0,i} = Pr(s_0 = i). Can we say anything about an analogous vector π_1 characterizing the (unconditional) probability distribution of s_1 ∈ S? That is, we seek a vector with elements given by π_{1,i} = Pr(s_1 = i). By the law of total probability,

π_{1,i} = Pr(s_1 = i) = ∑_j Pr(s_1 = i | s_0 = j) · Pr(s_0 = j) = ∑_j P_{ji} π_{0,j}.

This looks like the matrix multiplication algorithm, and indeed is equivalent to stating that π_1 = P′π_0. Similarly, for any t,

π_{t+1} = P′π_t,   or equivalently   π′_{t+1} = π′_t P; (27)
π_t = (P′)^t π_0,  or equivalently   π′_t = π′_0 P^t.
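Equation 27 is easy to exercise numerically. The three-state transition matrix below is a made-up example; the sketch iterates the distribution forward and also checks the two-step transition probabilities discussed next.

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to one)
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
assert np.allclose(P.sum(axis=1), 1.0)

pi0 = np.array([1.0, 0.0, 0.0])   # start in state 1 for sure

# pi_{t+1} = P' pi_t, equivalently pi_t = (P')^t pi_0  (equation 27)
pi1 = P.T @ pi0
pi2 = P.T @ pi1
assert np.allclose(pi2, np.linalg.matrix_power(P.T, 2) @ pi0)

# Two-step transitions: Pr(s_{t+2} = j | s_t = i) = (P^2)_{ij}
P2 = P @ P
i, j = 0, 2
total = sum(P[i, k] * P[k, j] for k in range(3))
assert abs(P2[i, j] - total) < 1e-12
```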
20.2 Conditional distributions
We can similarly express conditional distributions using the transition matrix. Consider, for example, the probability of moving from state i today to state j two periods from now. By the law of total probability,

Pr(s_{t+2} = j | s_t = i) = ∑_k Pr(s_{t+2} = j | s_{t+1} = k, s_t = i) · Pr(s_{t+1} = k | s_t = i).

The first probability on the right-hand side can be simplified by the Markov property:

Pr(s_{t+2} = j | s_t = i) = ∑_k Pr(s_{t+2} = j | s_{t+1} = k) · Pr(s_{t+1} = k | s_t = i) = ∑_k P_{kj} P_{ik} = (P²)_ij.
20.4 Ergodic distributions
Consider starting a Markov chain with initial distribution π_0 and “running the process forward” arbitrarily long. It seems the distribution across states should be given by π_∞(π_0) ≡ lim_{t→∞} π_t = lim_{t→∞} (P′)^t π_0. We can call this the “limiting distribution,” but should first check whether this limit is even well defined! It turns out that it may or may not be. The clearest illustrations come from considering P = I (in which case the limiting distribution is the initial distribution), and the process with transition matrix as given in equation 28 (which has no limiting distribution unless π_0 = [1/2, 1/2]′).
Note that any limiting distribution must be stationary.
We will put aside the question of whether a Markov chain has a limiting distribution; suppose for now
that it does. In fact, for some Markov chains, the limiting distribution not only exists, but does not depend
on the initial distribution.
Definition 21. A (time-invariant) Markov chain is said to be asymptotically stationary with a unique
invariant distribution if all initial distributions yield the same limiting distribution; i.e., π∞ (π0 ) = π∞
for all π0 . This limiting distribution, π∞ , is called the ergodic distribution of the Markov chain, and it is
the only stationary distribution of the Markov chain.
We state without proof several important results about asymptotically stationary Markov chains with
unique invariant distributions.
Theorem 22. Let P be a stochastic matrix with Pij > 0 for all i and j. Then P is asymptotically stationary,
and has a unique invariant distribution.
Theorem 23. Let P be a stochastic matrix with (P m )ij > 0 for all i and j for some m ≥ 1. Then P is
asymptotically stationary, and has a unique invariant distribution.
This means that as long as there is a strictly positive probability of moving from any particular state
today to any particular state in one or more steps,23 then an ergodic distribution exists. Thus if we can find
an m for which (P m )ij > 0 for all i and j, we can
1. Note that P must therefore have an ergodic distribution, and that it must be the unique stationary
distribution of P , and
2. Solve (P 0 − I)π∞ = 0 for this unique distribution.
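The two steps above can be sketched numerically. Here we find the unique stationary distribution by solving (P′ − I)π_∞ = 0 together with the normalization ∑_i π_{∞,i} = 1, and confirm convergence by iterating; the example matrix is ours, chosen so that P_ij > 0 for all i, j (so Theorem 22 applies).

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])   # all entries strictly positive

# Solve (P' - I) pi = 0 with sum(pi) = 1 by stacking the normalization
# onto the (singular) linear system and using least squares
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi_inf, *_ = np.linalg.lstsq(A, b, rcond=None)

# pi_inf is stationary: P' pi_inf = pi_inf
assert np.allclose(P.T @ pi_inf, pi_inf)

# Any initial distribution converges to pi_inf (asymptotic stationarity)
pi = np.array([1.0, 0.0])
for _ in range(200):
    pi = P.T @ pi
assert np.allclose(pi, pi_inf, atol=1e-8)
```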
Suppose that agent i receives a stochastic endowment {y^i_t(s^t)}_{t=0}^∞ of the (perishable) consumption good. The only assets available to agents are a complete set of state-contingent securities traded at time 0, which (in equilibrium) are priced at q_t(s^t).24 Agent i’s problem is therefore to

max_{{c^i_t(s^t)}} ∑_{t=0}^∞ ∑_{s^t ∈ S^t} β^t u(c^i_t(s^t)) Pr(s^t)

s.t. ∑_{t=0}^∞ ∑_{s^t ∈ S^t} q_t(s^t) [ y^i_t(s^t) − c^i_t(s^t) ] ≥ 0.
Taking first-order conditions for two agents i and j and dividing gives u′(c^i_t(s^t)) / u′(c^j_t(s^t)) = μ_i/μ_j for every t and s^t, where μ_i denotes the multiplier on agent i’s budget constraint. Although ugly, this equality tells us something important: consumption of each agent c^j_t(s^t) depends on the state s^t only through the realization of the aggregate endowment ȳ(s^t).
If we are prepared to impose a functional form on u(·), we can go a bit further. Suppose for example that

u(c) = (c^{1−σ} − 1)/(1 − σ),

so that

u′(c) = c^{−σ},
(u′)^{−1}(x) = x^{−1/σ}.
each period in contracts that pay off in the subsequent period depending on what state occurs.
22 Perfect and imperfect insurance practice question
This question comes from the Economics 211 midterm examination in 2007. It is based on a model of
Doireann Fitzgerald’s from 2006.
Question
There is just one period. Suppose there are two countries: A and B. Country A receives an endowment of
yA (s) units of the consumption good as a function of the state of the world s. Country B receives yB (s). Let
π(s) > 0 denote the probabilities of the state of the world s.
The representative agent in each country has a utility function given by

∑_s π(s) log c(s).
1. Set up the Pareto problem, letting λ be the planner’s weight on Country B. Show that in a Pareto
optimal allocation, each country consumes a constant fraction of the total endowment yA (s) + yB (s).
2. Suppose that before the state of the world is realized, the countries can trade claims on consumption in a
complete asset market. Assume that initially, each country owns the claim on its stochastic endowment.
Solve for a competitive equilibrium and show that it is Pareto optimal.
Suppose now that the consumption good is costly to ship across countries. In particular, for a unit of
consumption good to arrive in Country B from Country A, 1 + t units of consumption have to be shipped
from Country A (and similarly from Country B to A), where t > 0. We can think of t as capturing a piece of
the consumption good that melts away due to transportation costs.25 Suppose also that there are only two
states of nature s ∈ {s1 , s2 } and that the endowments are
y_A(s) = 1 for s = s_1;  y_A(s) = 0 for s = s_2.
Solution
1. The Pareto problem is to

max_{{c_A(s), c_B(s)}_{s∈S}} (1 − λ) ∑_{s∈S} π(s) log c_A(s) + λ ∑_{s∈S} π(s) log c_B(s)

s.t. c_A(s) + c_B(s) ≤ y_A(s) + y_B(s) ≡ ȳ(s), ∀s ∈ S.
25 Indeed, this form of transportation costs are often called “iceberg costs,” although melting is only one of the analogies that
Setting up the Lagrangian, substituting c_B(s) = ȳ(s) − c_A(s) from the resource constraint,

ℒ ≡ ∑_{s∈S} π(s) [ (1 − λ) log c_A(s) + λ log(ȳ(s) − c_A(s)) ].

The first-order condition for c_A(s) is (1 − λ)/c_A(s) = λ/(ȳ(s) − c_A(s)), which gives

c_A(s) = (1 − λ) · ȳ(s),
c_B(s) = λ · ȳ(s).
2. Let the price of a claim on consumption in state s be q(s). An equilibrium is prices {q(s)}s∈S and
allocations {cA (s), cB (s)}s∈S such that the markets clear (cA (s) + cB (s) = yA (s) + yB (s) for all s ∈ S),
and {ci (s)}s∈S solves the problem of country i ∈ {A, B} given {q(s)}s∈S .
The problem in country i is to

max_{{c_i(s)}_{s∈S}} ∑_{s∈S} π(s) log c_i(s)

s.t. ∑_{s∈S} q(s) [ y_i(s) − c_i(s) ] ≥ 0. (30)

Attaching multiplier μ_i to the budget constraint, the first-order condition for c_i(s) is

π(s)/c_i(s) = μ_i q(s). (31)

Dividing Country A’s first-order condition by Country B’s gives

c_A(s)/c_B(s) = μ_B/μ_A.
(This is an example of a more general result shown in class: q(s^t | s^0) ∝ β^t u′(c(s^t)) Pr(s^t).) We are allowed to normalize one price, which we can do by requiring that (μ_A + μ_B)/(μ_A μ_B) = 1. This gives us equilibrium prices

q(s) = π(s)/ȳ(s).
Substituting q(s) = π(s)/ȳ(s) and c_A(s) = [μ_B/(μ_A + μ_B)] · ȳ(s) into A’s (binding) budget constraint (equation 30):

∑_{s∈S} [π(s)/ȳ(s)] · ( y_A(s) − [μ_B/(μ_A + μ_B)] · ȳ(s) ) = 0
∑_{s∈S} π(s) y_A(s)/ȳ(s) = [μ_B/(μ_A + μ_B)] ∑_{s∈S} π(s) ȳ(s)/ȳ(s) = μ_B/(μ_A + μ_B).
This allows us to pin down consumption:

c_A(s) = [ ∑_{s′∈S} π(s′) y_A(s′)/ȳ(s′) ] · ȳ(s),
c_B(s) = [ ∑_{s′∈S} π(s′) y_B(s′)/ȳ(s′) ] · ȳ(s).
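A quick numerical check of this equilibrium: the two-state endowments and probabilities below are hypothetical, but the sketch computes the prices q(s) = π(s)/ȳ(s) and the consumption formulas above, then verifies market clearing, budget balance, and the constant-fraction property from the Pareto problem.

```python
import numpy as np

# Hypothetical two-state example
pi = np.array([0.5, 0.5])           # state probabilities
yA = np.array([1.0, 0.0])
yB = np.array([0.5, 1.5])
ybar = yA + yB                      # aggregate endowment

# Equilibrium prices under the normalization (mu_A + mu_B)/(mu_A mu_B) = 1
q = pi / ybar

# c_i(s) = [sum_s' pi(s') y_i(s')/ybar(s')] * ybar(s)
shareA = np.sum(pi * yA / ybar)
shareB = np.sum(pi * yB / ybar)
cA = shareA * ybar
cB = shareB * ybar

# Markets clear and each country's budget holds with equality
assert np.allclose(cA + cB, ybar)
assert abs(np.sum(q * (yA - cA))) < 1e-12
assert abs(np.sum(q * (yB - cB))) < 1e-12

# Each country consumes a constant fraction of the aggregate endowment
assert np.allclose(cA / ybar, shareA)
```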
23 Asset pricing with complete markets
For our purposes, the term “asset” refers to a contractually-guaranteed right to delivery of consumption
goods, with the amount of delivery conditional on the history of the world st . The notation can be a little bit
tricky, but there are three main ways that we denote the prices of assets:
1. q^τ(s^t): This is the price of an asset that delivers one unit of consumption at history s^t, where the price is paid in history-s^τ consumption goods (for τ ≤ t, and assuming that s^τ_1 = s^t_1, . . . , s^τ_τ = s^t_τ). This notation captures the price of Arrow-Debreu securities, q^0(s^t); these are actually sufficient to pin down all q^τ(s^t) according to

q^τ(s^t) = q^0(s^t) / q^0(s^τ).
The Arrow-Debreu prices are pinned down (up to a normalization) by any agent’s consumption, since the first-order conditions of the consumer problem

max_{{c_t(s^t)}_{s^t ∈ S^t, t}} E ∑_t β^t u(c_t(s^t))   s.t.   ∑_t ∑_{s^t ∈ S^t} q^0(s^t) c_t(s^t) ≤ B

give q^0(s^t) = λ^{−1} β^t u′(c_t(s^t)) π(s^t), where λ is the multiplier on the budget constraint. Thus

q^τ(s^t) = q^0(s^t)/q^0(s^τ) = [λ^{−1} β^t u′(c_t(s^t)) π(s^t)] / [λ^{−1} β^τ u′(c_τ(s^τ)) π(s^τ)]
         = β^{t−τ} [u′(c_t(s^t)) / u′(c_τ(s^τ))] π(s^t | s^τ). (32)
2. p^0(s^τ): This is the price of an asset that delivers d(s^t) units of consumption at every history s^t for t ≥ τ if history s^τ is achieved, where the price is paid in time-zero consumption goods. This is a “redundant” asset; i.e., it could be created with a suitable combination of Arrow-Debreu securities, which determines the asset’s price:

p^0(s^τ) = ∑_{t≥τ} ∑_{s^t|s^τ} q^0(s^t) d(s^t).

In the simplest case, where s^τ = s^0, the asset delivers d(s^t) at every s^t; the price (paid in time-zero consumption goods) is

p^0(s^0) = ∑_t ∑_{s^t} q^0(s^t) d(s^t).
3. p^τ(s^τ): This is the price of an asset that delivers d(s^t) units of consumption at every history s^t for t ≥ τ if history s^τ is achieved, where the price is paid in history-s^τ consumption goods. (This is sometimes called the price of the “tail asset.”) To convert from a price measured in time-zero consumption goods, we must divide by the time-zero price of history-s^τ consumption goods:

p^τ(s^τ) = p^0(s^τ)/q^0(s^τ) = ∑_{t≥τ} ∑_{s^t|s^τ} [ q^0(s^t)/q^0(s^τ) ] d(s^t) = ∑_{t≥τ} ∑_{s^t|s^τ} q^τ(s^t) d(s^t). (33)
Suppose that d(·) is such that d(s^t) = 0 for all t ≠ τ + 1; that is, the asset can only pay off in the period after it is “purchased.” By equations 32 and 33,

p^τ(s^τ) = ∑_{s_{τ+1}} q^τ(s^{τ+1}) d(s^{τ+1})
         = ∑_{s_{τ+1}} β [u′(c_{τ+1}(s^{τ+1})) / u′(c_τ(s^τ))] π(s_{τ+1} | s^τ) d(s^{τ+1})
         = E_τ[ m_{τ+1}(s^{τ+1}) d(s^{τ+1}) ],

where m_{τ+1}(s^{τ+1}) ≡ β u′(c_{τ+1}(s^{τ+1})) / u′(c_τ(s^τ)) is called the stochastic discount factor (note that it sort of tells us how much the consumer discounts consumption payouts in s^{τ+1} when valuing them in units of history-s^τ consumption). This also gives an expression that functions as a stochastic intereuler: defining the (stochastic) return on the asset as R_{τ+1}(s^{τ+1}) ≡ d(s^{τ+1})/p^τ(s^τ),

1 = E_τ[m_{τ+1} R_{τ+1}].

We can also apply the law of iterated expectations to show that an analogous result holds with unconditional expectations:

1 = E[m_{τ+1} R_{τ+1}].
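The stochastic discount factor machinery fits in a few lines of code. The CRRA parameters, consumption values, and payoff below are illustrative assumptions; the final assertion is exactly the stochastic intereuler 1 = E_τ[m_{τ+1} R_{τ+1}].

```python
import numpy as np

# Hypothetical CRRA consumer facing two states next period
beta, sigma = 0.96, 2.0
pi = np.array([0.5, 0.5])          # Pr(s_{tau+1} | s^tau)
c_now = 1.0
c_next = np.array([1.1, 0.9])
d = np.array([1.0, 1.0])           # a one-period riskless payoff

def u_prime(c):
    return c ** (-sigma)

# Stochastic discount factor m_{tau+1} = beta u'(c_{tau+1}) / u'(c_tau)
m = beta * u_prime(c_next) / u_prime(c_now)

# Price the asset: p = E_tau[m d]
p = np.sum(pi * m * d)

# Returns R_{tau+1} = d / p satisfy the stochastic intereuler
R = d / p
assert abs(np.sum(pi * m * R) - 1.0) < 1e-12
```

The identity holds by construction here; its empirical content comes from applying it to observed returns and consumption.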
• When developing a recursive formulation for agents’ problems, the state space is typically much more complicated under incomplete markets. Perfect insurance under complete markets means that we often need only include, for example, aggregate asset holdings and today’s shock realization; with incomplete markets, asset holdings vary across agents, so we need to track the full distribution.
Question
Suppose an agent consumes two goods every period: bananas and newspapers. Let the per-period utility
function given consumption of bananas cb and consumption of newspapers cn be given by u(cb , cn ). Let us
assume that this utility satisfies all standard properties (strictly concave, strictly increasing, and differentiable).
The agent lives for two periods. Let p1 and p2 (s) denote the price of newspapers in units of bananas in
periods 1 and 2, respectively, where s represents a stochastic state of the world that realizes in period 2. The
agent receives an endowment in period 1 equal to y1 units of bananas, and receives y2 (s) units of bananas in
period 2. The probabilities of states of the world are denoted by π(s) > 0. The agent maximizes expected
discounted utility, where the discount factor is given by β.
Suppose the agent can save in a riskless bond that returns R units of bananas in period 2 irrespective of
the state of the world. Suppose that the agent cannot borrow. Assume for the questions that follow that the
solution to the agent problem is interior—i.e., the borrowing constraint does not bind.
Suppose that utility is separable between bananas and newspapers and takes the following form:
u(c_b, c_n) = c_b^{1−γ_b}/(1 − γ_b) + c_n^{1−γ_n}/(1 − γ_n).
1. Show that the following equation holds:
βE[ (c_{b2}/c_{b1})^{−ρ} R ] = 1 (34)

for some ρ, where c_{b1} and c_{b2} are consumption of bananas in periods 1 and 2, respectively.
Suppose that an econometrician observes the interest rate on the risk-free bond, but only has information
about the number of bananas consumed by the agent in periods 1 and 2. She does not observe the agent’s
endowment, his consumption of newspapers, nor the price of newspapers (p1 and p2 ).
2. Assuming that the econometrician observes several of these agents at different points in time (with
possibly different risk-free interest rates), can she estimate γb ? What about γn ? Ignoring non-linearities,
write down the regression that could be used.
Suppose for the rest of the question that the utility is non-separable and takes the following form:

u(c_b, c_n) = (c_b^α c_n^{1−α})^{1−γ} / (1 − γ).

3. Show that the econometrician in part 2 can estimate γ as long as p_2(s) = p_1 (the price of newspapers is constant) by estimating the same Euler equation as in part 2. Can she estimate α?
Solution
1. The consumer’s problem is to

max_{c_{b1}, c_{n1}, {c_{b2}(s), c_{n2}(s)}_{s∈S}, a}  u(c_{b1}, c_{n1}) + β ∑_{s∈S} π(s) u(c_{b2}(s), c_{n2}(s))

s.t. c_{b1} + p_1 c_{n1} + a ≤ y_1;
     c_{b2}(s) + p_2(s) c_{n2}(s) ≤ y_2(s) + Ra, ∀s;
     a ≥ 0.

Setting up the Lagrangian (assuming that a ≥ 0 does not bind, and substituting in for c_{b1} and c_{b2}(s) using the budget constraints),

ℒ ≡ u(y_1 − p_1 c_{n1} − a, c_{n1}) + β ∑_{s∈S} π(s) u(y_2(s) + Ra − p_2(s) c_{n2}(s), c_{n2}(s)).

The first-order condition with respect to a gives

c_{b1}^{−γ_b} = βR E[c_{b2}^{−γ_b}]
1 = βE[ (c_{b2}/c_{b1})^{−γ_b} R ],

which is equation 34 with ρ = γ_b.
2. Define ε by

βR (c_{b2}/c_{b1})^{−γ_b} ≡ e^ε;

taking logs gives the regression equation

log(c_{b2}/c_{b1}) = (1/γ_b) log β + (1/γ_b) log R − (1/γ_b) ε.

Thus the econometrician can estimate γ̂_b by regressing log consumption growth on log R using OLS, since e^{E[ε]} ≈ E[e^ε] = 1 ⟹ E[ε] ≈ 0.
The intereuler for newspaper consumption is

1 = βE[ (p_1/p_2) (c_{n2}/c_{n1})^{−γ_n} R ].

As p_1, p_2(s), and endowments are unobserved, the econometrician cannot recover c_{n2}/c_{n1}. Hence, she cannot estimate γ_n. (Note that I do not know if we can actually prove that γ_n is not estimable.)
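To see why the banana Euler equation identifies γ_b, one can simulate data satisfying the log Euler equation and run the OLS regression of consumption growth on the log interest rate; the slope identifies 1/γ_b. All parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma_b, beta = 3.0, 0.96
N = 50_000

# Observed gross risk-free rates, varying across observations
logR = np.log(rng.uniform(1.01, 1.10, size=N))

# Euler equation residual: log(beta) + log(R) - gamma_b * dlogc = eps
eps = rng.normal(0.0, 0.02, size=N)
dlogc = (np.log(beta) + logR - eps) / gamma_b

# OLS regression of consumption growth on log R: slope = 1/gamma_b
X = np.column_stack([np.ones(N), logR])
coef, *_ = np.linalg.lstsq(X, dlogc, rcond=None)
gamma_hat = 1.0 / coef[1]

assert abs(gamma_hat - gamma_b) < 0.3
```

Here logR is independent of the residual by construction, so OLS is consistent; with real data, endogeneity of the interest rate would be a concern.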
3. Using the new functional form of u, the banana intereuler becomes

α c_{b1}^{α(1−γ)−1} c_{n1}^{(1−α)(1−γ)} = βR E[ α c_{b2}^{α(1−γ)−1} c_{n2}^{(1−α)(1−γ)} ]
1 = βR E[ (c_{b2}/c_{b1})^{α(1−γ)−1} (c_{n2}/c_{n1})^{(1−α)(1−γ)} ].

If p_1 = p_2, the intratemporal conditions imply c_{n2}/c_{n1} = c_{b2}/c_{b1}, so the exponents sum to α(1−γ) − 1 + (1−α)(1−γ) = −γ and the intereuler reduces to the one in part 2. Hence, if p_1 = p_2, γ can be estimated exactly as in part 2. However, the econometrician cannot estimate α from this intereuler. (As above, I do not know if we can actually prove that α is not estimable.)
Question
Hall (1978) showed that consumption should follow an AR(1) process, and that no other variable known at
time t should influence the expected consumption at time t + 1. Mankiw (1982) generalized Hall’s results for
the case of durable consumption as follows. Suppose consumers have the following preferences over durable
goods and non-durables:
∑_{t=0}^∞ ∑_{s^t ∈ S^t} π(s^t) β^t u(K(s^t)),

where

K(s^t) = (1 − δ)K(s^{t−1}) + c(s^t),
K represents the stock of the durable good, and c the acquisition of new durables. Consumers have access to
a riskless bond. Labor income is the only source of uncertainty, and the gross interest rate is constant and
equal to R. The evolution of assets is given by
1. Show that the first-order condition for optimality implies (ignoring the possibility of a binding borrowing
constraint) that

u′(K(s^t)) = βR ∑_{s_{t+1}} π(s_{t+1} | s^t) u′(K(s^{t+1})).
3. If u is quadratic, also show that

α c(s^t) = γ_2 + [α − (1 − δ)] K(s^t) + (1 − δ) ε(s^t),

and hence
where st is the history of the consumer’s shocks (not the history of the world).
Each consumer receives an i.i.d., normally-distributed stochastic endowment each period:

y(s_t) ∼ N(ȳ, σ²).

Note that by a law of large numbers, there is no aggregate uncertainty. Since there is no “worst” possible shock, we cannot apply a natural borrowing constraint (the natural limit would be −∞); we do need to prevent over-borrowing, but will do so through an (unstated) no-Ponzi condition. Furthermore, we must allow consumption to be negative.
The only markets available are for risk-free bonds. The consumer problem is therefore to

max_{{a(s^t)}_t} E[ ∑_t β^t u( R a(s^{t−1}) + y(s_t) − a(s^t) ) ].
Given the stationarity of the problem, we can also write this as a functional equation:

V(x) = max_a { u(x − a) + βE[V(Ra + y′)] },

where x is the “cash on hand” after the realization of the stochastic shock.
Earlier in the term, we discussed several techniques for solving problems like this one. We will proceed by “guessing and verifying”; fortunately, we will start with some very good guesses:

V(x) = −(1/γ) e^{−Âx−B̂},
c(x) = Ax + B, and
a(x) = (1 − A)x − B.
If these guesses are correct, the value function becomes

−(1/γ) e^{−Âx−B̂} = −(1/γ) e^{−γAx−γB} + βE[ −(1/γ) e^{−Â(1−A)Rx+ÂBR−Ây′−B̂} ]

(the left-hand side is V(x), and the two right-hand-side terms are u(c(x)) and βE[V(Ra(x) + y′)])

= −(1/γ) e^{−γAx−γB} − (β/γ) e^{−Â(1−A)Rx+ÂBR−B̂} E[e^{−Ây′}].

Since −Ây′ ∼ N(−Âȳ, Â²σ²), we have E[e^{−Ây′}] = e^{−Âȳ+Â²σ²/2}, so

−(1/γ) e^{−Âx−B̂} = −(1/γ) e^{−γAx−γB} − (β/γ) e^{−Â(1−A)Rx+ÂBR−B̂−Âȳ+Â²σ²/2}
e^{−Âx−B̂} = e^{−γAx−γB} + β e^{−Â(1−A)Rx+ÂBR−B̂−Âȳ+Â²σ²/2}
e^{−B̂} = e^{−(γA−Â)x−γB} + β e^{(Â−ÂR+ÂAR)x+ÂBR−B̂−Âȳ+Â²σ²/2}.

Since the left-hand side does not depend on x, the right-hand side also must not;26 this implies γA − Â = 0 and Â − ÂR + ÂAR = 0, or

A = (R − 1)/R,
Â = γ(R − 1)/R.
Next we combine the envelope condition with the first-order condition.27 The envelope condition V′(x) = u′(c(x)) gives

(Â/γ) e^{−Âx−B̂} = e^{−γAx−γB};

taking logarithms and using Â = γA,

log A − Âx − B̂ = −Âx − γB,  i.e.,  log A − B̂ = −γB. (36)

The first-order condition u′(c(x)) = βR E[V′(Ra(x) + y′)] becomes (evaluating the expectation as above)

e^{−γAx−γB} = βRA e^{−ÂR(1−A)x+ÂRB−B̂−Âȳ+Â²σ²/2}.
26 Here we act as if the fact that the sum of the right-hand-side terms does not depend on x means that neither term depends
on x.
27 If we want to confirm that our functional form guesses are correct, we should return—after pinning down B and B̂—to confirm that e^{−B̂} = e^{−γB} + β e^{ÂBR−B̂−Âȳ+Â²σ²/2}. We will not do this.
Taking logarithms and cancelling terms using equation 36,

−γAx − γB = log(βR) + log A − ÂR(1−A)x + ÂRB − B̂ − Âȳ + Â²σ²/2
−γ[(R−1)/R] x = log(βR) − γ[(R−1)/R] x + γ(R−1)B − γ[(R−1)/R] ȳ + γ²[(R−1)²/R²] · (σ²/2)
B = −log(βR)/(γ(R−1)) + ȳ/R − γ[(R−1)/R²] · (σ²/2).
Thus we can write A, Â, B, and B̂ as functions of R. Plugging what we know into the consumption equation gives

c(x) = Ax + B = [(R−1)/R] x + ȳ/R − γ(R−1)σ²/(2R²) − log(βR)/(γ(R−1)),

where the first two terms capture permanent income, the third is a precautionary-savings adjustment, and the last reflects the slope induced by βR.
Using this expression and a market clearing condition, we could (in theory) pin down the equilibrium interest
rate. One way to do this is to consider the evolution of the agents’ cash-in-hand:

x′ = Ra(x) + y′
   = R[(1 − A)x − B] + y′
   = x − RB + y′
   = x + R·log(βR)/(γ(R−1)) + γ(R−1)σ²/(2R) + (y′ − ȳ).
Note that if βR = 1, the drift is positive: cash-in-hand (and hence assets) diverges in expectation to positive infinity. In fact, assets must diverge in expectation (to positive or negative infinity) unless the drift term equals zero, or

β = (1/R*) · exp( −γ² [(R* − 1)/R*]² · σ²/2 ).

The right-hand side is decreasing in R*, so the no-drift condition implies a unique value of R*, with βR* < 1. It is straightforward to consider comparative statics of this equilibrium interest rate in terms of β, γ, and σ².
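Since the no-drift condition pins down R* only implicitly, it can be solved numerically. The sketch below bisects on (1, 1/β), using the fact that the right-hand side is decreasing in R*; the parameter values are illustrative.

```python
import math

beta, gamma, sigma2 = 0.96, 2.0, 0.1

def f(R):
    """RHS of the no-drift condition minus beta; decreasing in R."""
    return (1.0 / R) * math.exp(-gamma**2 * ((R - 1) / R) ** 2 * sigma2 / 2) - beta

# f(1) = 1 - beta > 0 and f(1/beta) < 0, so a root lies in (1, 1/beta)
lo, hi = 1.0, 1.0 / beta
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if f(mid) > 0:
        lo = mid
    else:
        hi = mid
R_star = 0.5 * (lo + hi)

# The equilibrium rate satisfies 1 < R_star and beta * R_star < 1
assert 1.0 < R_star < 1.0 / beta
assert abs(f(R_star)) < 1e-10
```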
One final note: this economy does not have a steady-state. The distribution of wealth follows a random
walk, which means that its unconditional variance increases without bound over time. We can think of this
as the gap between the richest and poorest agents—who, recall, are ex ante identical—growing endlessly.
• The distribution of shock realizations has bounded support. That means we can define y_min and y_max with Pr(y ∉ [y_min, y_max]) = 0.28 Because there is a “worst” shock, we can define a natural borrowing
28 We will further insist that y_min and y_max be the highest and lowest values, respectively, for which Pr(y ∉ [y_min, y_max]) = 0. This can be written as requiring that for all ε > 0, Pr(y ∈ [y_min, y_min + ε]) > 0 and Pr(y ∈ [y_max − ε, y_max]) > 0.
constraint: an agent can never borrow so much that she cannot repay her debt, even if she receives the worst possible shock from now on; i.e., a(s^t) ≥ −φ, where

φ ≡ y_min/(R − 1).
• The agents’ coefficients of relative risk aversion must be bounded above. In our earlier example
consumers had CARA, which implied increasing relative risk aversion. The result was that as they got
richer, they saved an increasing fraction of their wealth. We were able to keep average asset holdings
bounded, but only for one particular value of R∗ < 1/β ; by bounding the coefficient of relative risk
aversion, we can keep (expected) assets from growing arbitrarily for all R < 1/β . (This also relies on the
existence of an upper bound on possible shocks.)
The model can be more convenient to analyze when values are renormalized as follows in terms of the
borrowing constraint:
Original Renormalized
Assets a â ≡ a + φ
Cash-in-hand x z ≡x+φ
Cash-in-hand evolution x0 = Ra + y 0 z 0 = Râ + ỹ 0
Endowment y ỹ ≡ y − (R − 1)φ
Consumption c=x−a c = z − â
Although the endowment renormalization looks a bit strange, it takes a form implied by our definitions of â
and z, along with a desire for z to evolve similarly to how x does.
The representative consumer’s problem is characterized by the following functional equation:

V(z) = max_{0 ≤ â ≤ z} { u(z − â) + βE[V(Râ + ỹ′)] },

with first-order condition

u′(z − â) ≥ βR E[V′(Râ + ỹ′)],  with equality whenever â > 0.

If the borrowing constraint does not bind, this takes the form of our standard intereuler. When the borrowing constraint does bind, the consumer would like to consume more (i.e., lower the left-hand side through c) and save less (i.e., raise the right-hand side through â). We define (with apologies for the unfortunate notation choice) ẑ as the level of cash-in-hand at which the borrowing constraint barely binds.
If z ≤ ẑ, the agent borrows up to the maximum, so â(z) = 0 and c(z) = z. If z ≥ ẑ, the FOC holds with equality:

u′(z − â) = βR E[V′(Râ + ỹ′)].

As we consider going from z up to z̄ > z, â would need to increase by the same amount to keep the left-hand side constant. But this would lower the right-hand side somewhat; thus â must increase less than one-for-one with z.
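These properties of the policy function can be verified numerically. The sketch below solves the renormalized problem by value-function iteration on a grid; all parameters, grids, and the discretized endowment process are illustrative assumptions. It checks that â(z) is zero at low cash-on-hand (the constraint binds), is nondecreasing, and rises less than one-for-one with z overall.

```python
import numpy as np

# Illustrative parameters and a three-point distribution for ~y'
beta, R, sigma = 0.95, 1.02, 2.0
y_grid = np.array([0.7, 1.0, 1.3])
y_prob = np.array([0.25, 0.5, 0.25])

def u(c):
    return c ** (1 - sigma) / (1 - sigma)   # CRRA felicity

z_grid = np.linspace(0.05, 6.0, 400)        # cash-on-hand z
a_grid = np.linspace(0.0, 5.0, 400)         # savings a_hat >= 0

V = np.zeros(len(z_grid))
for _ in range(600):
    # E V(R a_hat + ~y') at each savings choice, interpolating V on z_grid
    EV = sum(p * np.interp(R * a_grid + y, z_grid, V)
             for y, p in zip(y_grid, y_prob))
    c = z_grid[:, None] - a_grid[None, :]   # consumption implied by (z, a_hat)
    val = np.where(c > 1e-10,
                   u(np.maximum(c, 1e-10)) + beta * EV[None, :],
                   -1e12)                    # infeasible choices get -inf-like value
    V_new = val.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

a_pol = a_grid[val.argmax(axis=1)]           # savings policy a_hat(z)

assert a_pol[0] == 0.0                       # constraint binds at low z
assert np.all(np.diff(a_pol) >= -1e-12)      # policy is nondecreasing
assert a_pol[-1] - a_pol[0] < z_grid[-1] - z_grid[0]  # less than one-for-one overall
```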
• Complete vs. incomplete markets. Under complete markets, agents can buy and sell a complete set
of contingent securities. We typically think of these securities as trading in a time-zero (Arrow-Debreu)
market, but this is mainly a mathematical convenience—we showed that there is an “equivalence”
between the allocations and prices that obtain at equilibrium in this market and in a sequence of
period-by-period markets. When agents have access to complete markets, they have the ability to
hedge all idiosyncratic shocks.29 We showed that they take advantage of this ability, so that individual
consumption can only depend on the history of the world through the history of aggregate shocks.
• Partial equilibrium vs. general equilibrium. We have considered two types of economies. The
first is small “open” economies, where trade takes place with foreigners; in these markets, asset prices
are exogenously set by the global market, and the openness of the market means economy-wide resource
constraints don’t have as much bite as in closed economies. In closed economies, the price of assets (and
notably, therefore, interest rates) are set to clear the asset markets given the economy-wide resource
constraint. That is, prices arise endogenously, through general equilibrium.
• Idiosyncratic risk only vs. aggregate risk. As discussed, when complete markets are available there is a sense in which only aggregate shocks matter. However, when markets are incomplete, both idiosyncratic and aggregate shocks are important. We started by considering incomplete markets with only idiosyncratic risk, and then introduced additional aggregate risk.
29 Agents may also be able to hedge aggregate risk if the economy is open or there is a storage technology.