Dynamic Programming
Lecture Notes∗
Klaus Neusser†
∗ These notes are based on the books of Sargent (1987) and Stokey and Lucas (1989).
† Department of Economics, University of Bern, Schanzeneckstrasse 1, P.O. Box 8573, CH-3001 Berne, Switzerland. email: [email protected]
Contents

Notation and Symbols

1 Introduction
2 Basic concepts
4 Examples
   4.1 Intertemporal job search
   4.2 The Cake Eating Problem
   4.3 The Neoclassical Growth Model
   4.4 The Linear Regulator Problem
List of Theorems

5.1 Theorem (Solution of Bellman Equation)
5.2 Theorem (Properties of the Solution)
5.3 Theorem (Further Properties of the Solution)
5.4 Theorem (Solution found by Iteration)
5.5 Theorem (Differentiability of Solution)
6.1 Theorem (Solution of the Stochastic Bellman Equation)
6.2 Theorem (Property of Solution: Stochastic Case)
6.3 Theorem (Further Properties of Solution: Stochastic Case)
6.4 Theorem (Solution found by Iteration: Stochastic Case)
6.5 Theorem (Differentiability of Solution: Stochastic Case)
B.2 Theorem (Contraction Mapping Theorem)
B.3 Theorem (Blackwell's sufficient condition)
B.4 Theorem (Brouwer's Fixed Point Theorem)
Notation and Symbols
X state space
U control
µ strategy
π policy
V value function
1 Introduction
The foundation of macroeconomic theory on microeconomic principles has
been one of the most important developments in economics. By now it is
standard to view the decision maker (households, firms, state) as operating
in a complex stochastic environment. In particular, agents are conceived
as players in a dynamic stochastic game. Typically, it is assumed that the
players understand the rules of the game and can foresee the consequences of
their actions on themselves and on others. The agents are thus understood
to choose strategies which maximize their objective function.
From a modeling point of view, it is therefore necessary to specify exactly
the information at the disposal of agents, the technology available to them
and the restrictions which constrain their actions. The decisions typically depend on expectations about the future. These expectations influence the actions of the agents today, thereby determining the agents' future possibilities. This intertemporal interaction is visualized in figure 1. It is also important to emphasize that this strategic intertemporal interaction is typical of economic systems and differentiates them from physical systems.
2 Basic concepts
In order to understand the issues involved in Dynamic Programming, it is in-
structive to start with the simple example of inventory management. Denote
the stock of inventory at the beginning of period t by Xt , then the manager
has to decide on how much to order to replenish the stock. The order Ut is
considered to be the control variable. In each period the inventory is reduced
by satisfying a stochastic demand Zt . It is assumed that the manager does
not know the realized value of demand at the time he makes the decision.
The situation is depicted in figure 2.
In this problem the variable which characterizes the state of the inventory,
in our case Xt , is called the state variable of the system. The state variable
or shortly the state must lie in some set called the state space denoted by
1
[Figure 1: The intertemporal interaction of decisions — observed variables (data) and unobserved disturbances today and tomorrow, linked through expectations about the future.]
[Figure 2: The inventory problem — the order $U_t$ (with cost $cU_t$) and the stochastic demand $Z_t$ carry the inventory $X_t$ at the beginning of period t into $X_{t+1} = X_t + U_t - Z_t$ at the beginning of period t+1, generating the period cost $cU_t + h(X_{t+1})$.]
X. The control variable, or control for short, takes values in some set C. As the demand $Z_t$ is assumed to be independently and identically distributed, the state is just the inventory carried over from last period. Because next period's inventory is just given by the accounting identity
$$X_{t+1} = X_t + U_t - Z_t, \tag{2.1}$$
the control could as well be $X_{t+1}$. In a more general context, demand follows a stochastic process $\{Z_t\}$ governed by some transition function Q where tomorrow's demand depends on the demand realized today and possibly on the control $U_t$. In this more general setup, the state is given by $(X_t, Z_t)$.
In each period, the manager faces some costs. In our example the costs
are twofold. On the one hand inventory holdings are costly. This cost is
denoted by h(Xt+1 ). The inventory costs may also account for shortage cost
for unfilled order if Xt+1 < 0. On the other hand, each order produces some
cost c Ut . In each period total costs amount to:
$$c\,U_t + h(X_{t+1}) \tag{2.2}$$
The transition equation (2.1) and the cost or utility function (period valuation) are the main ingredients of the inventory problem. The objective of the manager is to minimize expected discounted costs:
$$J(x_0) = \mathrm{E}_0 \sum_{t=0}^{T-1} \beta^t\,(c\,U_t + h(X_{t+1})) = \mathrm{E}_0\left[\sum_{t=0}^{T-1} g_t(X_t, U_t, Z_t) + g_T(X_T)\right] \longrightarrow \min_{U_t}, \qquad 0 < \beta < 1 \tag{2.3}$$
Rather than fixing the whole sequence of orders in advance, it is better to decide upon the order at time t after knowing the state $X_t$. We are therefore confronted with a sequential decision problem. The gathering of information, here observing the state $X_t$, becomes essential. This way of viewing the decision problem implies that we are actually not interested in setting numerical values for $U_t$ in each period, but in a strategy, rule, reaction function, or policy function $\mu_t$ which assigns to each possible state $X_t$ an action $\mu_t(X_t) = U_t$.¹
The control $U_t$ must lie in some subset $\Gamma(X_t) \subseteq C$, the control constraint set at $X_t$. Γ assigns to every state $X_t$ a set $\Gamma(X_t) = [0, B - X_t]$ and is thus a set-valued function or correspondence from X into the subsets of C. As noted in the footnote before, in a general stochastic context Γ may depend on $X_t$ and $Z_t$. It is typically assumed that $\Gamma(x) \neq \emptyset$ for all $x \in X$, which implies that there is always a feasible choice to make.
Let M = {µ : X → C s.t. µ(x) ∈ Γ(x)}. We call π = (µ0 , µ1 , . . . , µT −1 ) a
feasible policy if µt ∈ M . The set of all feasible policies is called the policy
space and is denoted by Π. If π = (µ, µ, . . . , µ), π is called a stationary
policy. In our example X = [0, B], C = [0, B], and Γ(Xt ) = [0, B − Xt ].
With this notation we can rewrite our decision problem as
$$J_\pi(X_0) = \mathrm{E}_0\left[\sum_{t=0}^{T-1} g_t(X_t, \mu_t(X_t), Z_t) + g_T(X_T)\right] \longrightarrow \min_{\pi \in \Pi} \tag{2.5}$$
1. The decision maker observes the state of the system (i.e. Xt ) and
applies his decision rule Ut = µt (Xt ).
5. Start again in 1.
The principle of optimality then states that the truncated policy $(\mu^*_\tau, \mu^*_{\tau+1}, \ldots, \mu^*_{T-1})$
is also optimal for the above subproblem. The intuition is simple: if the trun-
cated policy were not optimal for the subproblem, the decision maker would
be able to reduce cost further by switching to an optimal policy for the sub-
problem once Xτ has been reached. This idea can be exploited by solving
the decision problem by backward induction. For this purpose consider the
decision problem in period T − 1 with given state XT −1 . Clearly the decision
maker chooses $U^*_{T-1} = \mu_{T-1}(X_{T-1}) \in \Gamma(X_{T-1})$ in order to minimize
$$\mathrm{E}\Big[g_{T-1}(X_{T-1}, U_{T-1}, Z_{T-1}) + g_T\big(\underbrace{f_{T-1}(X_{T-1}, U_{T-1}, Z_{T-1})}_{=X_T}\big)\Big].$$
Moving back one period, the decision maker in period $T-2$ minimizes
$$\mathrm{E}\big[g_{T-2}(X_{T-2}, U_{T-2}, Z_{T-2}) + J_{T-1}(X_{T-1})\big],$$
where $X_{t+1}$ can be substituted by $X_{t+1} = f_t(X_t, \mu_t(X_t), Z_t)$. From this reasoning we can derive the following proposition.
Proposition 3.1. If $J^*(X_0)$ is the optimal cost, then $J^*(X_0) = J_0(X_0)$. Moreover, the functions $J_t$ can be computed by the backward recursion
$$J_t(X_t) = \min_{U_t \in \Gamma(X_t)} \mathrm{E}\big[g_t(X_t, U_t, Z_t) + J_{t+1}(f_t(X_t, U_t, Z_t))\big], \qquad t = 0, 1, \ldots, T-1,$$
starting from $J_T(X_T) = g_T(X_T)$. Furthermore, if $U^*_t = \mu^*_t(X_t) \in \Gamma(X_t)$ minimizes the right hand side above for each $X_t$, then the policy $\pi^* = (\mu^*_0, \mu^*_1, \ldots, \mu^*_{T-1})$ is optimal.
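To make the backward recursion in Proposition 3.1 concrete, here is a minimal sketch that solves a small discretized version of the inventory problem by backward induction. All specifics below — grid size, cost parameters, the demand distribution, and the lost-sales treatment of excess demand — are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Backward induction for a discretized inventory problem (cf. Proposition 3.1).
# Assumed ingredients: X = {0, ..., B}, Gamma(x) = {0, ..., B - x}, period cost
# c*U_t + h(X_{t+1}), and i.i.d. demand; excess demand is lost (X stays >= 0).
B, T, beta = 10, 12, 0.95
c = 1.0                                   # per-unit ordering cost
demand = np.array([0, 1, 2, 3])           # support of Z_t
prob = np.array([0.1, 0.4, 0.3, 0.2])     # P(Z_t = z)

def h(x):
    return 0.5 * x                        # holding cost h(X_{t+1})

J = np.zeros(B + 1)                       # terminal condition J_T = g_T = 0
policy = np.zeros((T, B + 1), dtype=int)
for t in reversed(range(T)):
    J_new = np.empty(B + 1)
    for x in range(B + 1):
        costs = []
        for u in range(B - x + 1):        # feasible orders in Gamma(x)
            x_next = np.maximum(x + u - demand, 0)
            costs.append(c * u + prob @ (h(x_next) + beta * J[x_next]))
        J_new[x] = min(costs)
        policy[t, x] = int(np.argmin(costs))
    J = J_new

print("J_0:", np.round(J, 2))             # optimal expected cost from each state
print("mu_0:", policy[0])                 # optimal order rule in period 0
```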
• $J_T(X_T) = 0$.
⟹
$$V_{t+1}(X_t) = \max_{U_t = \mu(X_t) \in \Gamma(X_t)} \left\{ U(X_t, \mu(X_t)) + \beta\,\mathrm{E}_t\, V_t\big(f(X_t, \mu(X_t), Z_t)\big) \right\}$$
If this limit exists, it must satisfy the following functional equation, called the Bellman equation:
$$V(X) = \max_{\mu(X) \in \Gamma(X)} \left\{ U(X, \mu(X)) + \beta\,\mathrm{E}\, V(X') \right\}$$
where a prime denotes next period’s value. The expectation E is conditional
on the information available to the agent. It must be emphasized that in
the Bellman equation the unknown is not a number as in standard algebraic
equations, but a function. In this context the following mathematical issues
arise:
1. Does the limit exist? Is the limit independent from the initial functional
V0 ?
3. Does there exist a time invariant policy function µ? What are the
properties of such a function?
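These questions are taken up formally in Sections 5 and 6. As a quick numerical illustration of the first and third question, the sketch below iterates the Bellman operator of a toy deterministic problem from two very different starting functions V_0; both iterations converge to the same limit and imply the same time-invariant policy. The grid, return function, and feasibility correspondence are illustrative assumptions, not part of the text.

```python
import numpy as np

# Iterate (TV)(x) = max_{y in Gamma(x)} { U(x, y) + beta * V(y) } on a grid and
# check that the limit does not depend on the initial guess V_0.
beta = 0.9
grid = np.linspace(0.1, 2.0, 60)                         # state space
U = lambda x, y: np.log(np.maximum(x + 0.5 - y, 1e-12))  # toy period return

def bellman(V):
    Q = U(grid[:, None], grid[None, :]) + beta * V[None, :]
    Q[grid[None, :] > grid[:, None] + 0.5] = -np.inf     # Gamma(x) = [0, x + 0.5]
    return Q.max(axis=1), Q.argmax(axis=1)

limits = []
for V0 in (np.zeros(grid.size), 100.0 * np.ones(grid.size)):
    V, pol = V0, None
    for _ in range(400):
        V, pol = bellman(V)
    limits.append((V, pol))

print(np.allclose(limits[0][0], limits[1][0]))     # True: same value function
print(np.array_equal(limits[0][1], limits[1][1]))  # True: same policy
```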
4 Examples
4.1 Intertemporal job search
Consider the following simplified intertemporal job search model. Suppose
that a worker, if unemployed, receives in each period a job offer which
promises to pay w forever. If he accepts the offer he receives w in all subse-
quent periods. Assuming that the worker lives forever, the value of the job
offer in period t is
$$\sum_{\tau=t}^{\infty} \beta^{\tau-t}\, w = \frac{w}{1-\beta}.$$
If he rejects the offer, he receives an unemployment compensation c and
the chance to receive a new wage offer next period. Wage offers are drawn
from a known probability distribution given by $F(w') = \mathrm{P}[w \le w']$ with
F (0) = 0 and F (B) = 1 for some B < ∞. Denoting the value of a wage offer
by V (w) and assuming that wage offers are independent draws, the value of
waiting one more period therefore is
$$c + \beta\int_0^\infty V(w')\,dF(w').$$
Thus the value of a wage offer must satisfy the following functional equation:
$$V(w) = \max\left\{\frac{w}{1-\beta},\; c + \beta\int_0^\infty V(w')\,dF(w')\right\} \tag{4.1}$$
From figure 3, we see that the solution must have the reservation wage property:
$$V(w) = \begin{cases} \dfrac{W}{1-\beta} = c + \beta\displaystyle\int_0^\infty V(w')\,dF(w'), & w \le W; \\[1ex] \dfrac{w}{1-\beta}, & w \ge W, \end{cases} \tag{4.2}$$
where W is called the reservation wage. It is determined through the following equation:
$$\frac{W}{1-\beta} = c + \beta\int_0^\infty V(w')\,dF(w') \;\Longrightarrow\; W - c = \frac{\beta}{1-\beta}\int_W^\infty (w' - W)\,dF(w') \tag{4.3}$$
$$\Longrightarrow\; W - c = \beta\int_W^\infty \big[V(w') - V(W)\big]\,dF(w') \tag{4.4}$$
where the left hand side represents the cost of searching one more time having a wage offer W at hand and where the right hand side is the expected benefit of searching one more time in terms of the expected present value associated with drawing an offer $w' > W$.
Manipulating the above equation then leads to an alternative characterization of the reservation wage:
$$W - c = \beta(\mathrm{E}w - c) + \beta\int_0^W F(w')\,dw' \tag{4.5}$$
[Figure 3: The value function of the job search model — the constant $c + \beta\int V(w')\,dF(w')$ for rejected offers and the line $w/(1-\beta)$ for accepted offers intersect at the reservation wage W.]
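Numerically, the functional equation (4.1) can be solved by iterating on V over a discretized wage grid and then reading off the reservation wage from property (4.2). In the sketch below the offer distribution is taken to be uniform on [0, B]; this and all parameter values are illustrative assumptions.

```python
import numpy as np

# Solve V(w) = max{ w/(1-beta), c + beta * E[V(w')] } by value iteration and
# recover the reservation wage W where the two branches intersect.
beta, c, B = 0.95, 1.0, 10.0
w = np.linspace(0.0, B, 1001)
f = np.full(w.size, 1.0 / w.size)       # discrete uniform approximation of dF

V = w / (1.0 - beta)                    # any bounded starting guess works
for _ in range(2000):
    wait = c + beta * (V @ f)           # value of rejecting the current offer
    V = np.maximum(w / (1.0 - beta), wait)

W = (1.0 - beta) * wait                 # W/(1-beta) = wait at the cutoff (4.2)
print(f"reservation wage W = {W:.3f}")
```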
The search model can be used to get a simple equilibrium model of un-
employment known as the bathtub model. In each period, the worker faces a
given probability α ∈ (0, 1) of surviving into the next period. Leaving the
remaining parts of the problem unchanged, the worker’s Bellman equation
becomes:
$$V(w) = \max\left\{\frac{w}{1-\alpha\beta},\; c + \alpha\beta\int V(w')\,dF(w')\right\}. \tag{4.6}$$
This is essentially the same equation with only the discount factor changing.
Let the implied reservation wage be w̄. Assume that in each period there is a
constant fraction 1 − α of new born workers. They replace an equal number
of newly departed workers. If all new workers start out being unemployed,
the unemployment rate Ut obeys the law of motion:
Ut = (1 − α) + αF (w̄)Ut−1 . (4.7)
The right hand side is the sum of the fraction of new born workers and
the fraction of surviving workers who remained unemployed at the end of
last period (i.e. those who rejected offers because they were less than the
reservation wage w̄). The steady state unemployment rate $U^*$ is
$$U^* = \frac{1-\alpha}{1-\alpha F(\bar w)}. \tag{4.8}$$
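A two-line simulation confirms that the law of motion (4.7) drives the unemployment rate to the steady state (4.8); the survival probability α and the value of F(w̄) below are assumed numbers chosen only for illustration.

```python
# Iterate U_t = (1 - alpha) + alpha * F(w_bar) * U_{t-1} and compare with (4.8).
alpha, F_wbar = 0.98, 0.2               # assumed survival rate and F(w_bar)
U = 1.0                                 # start from full unemployment
for _ in range(300):
    U = (1 - alpha) + alpha * F_wbar * U
print(U, (1 - alpha) / (1 - alpha * F_wbar))   # both approximately 0.0249
```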
Next we show that it is never optimal for the worker to quit. For this purpose consider the following three options:

(A1): accept the wage and keep it forever: $\dfrac{w}{1-\alpha\beta}$.

(A2): accept the wage, keep it for t periods, and then quit:
$$\frac{w - (\alpha\beta)^t w}{1-\alpha\beta} + (\alpha\beta)^t\left[c + \alpha\beta\int V(w')\,dF(w')\right] = \frac{w}{1-\alpha\beta} - (\alpha\beta)^t\,\frac{w - \bar w}{1-\alpha\beta}.$$

(A3): reject the wage: $c + \alpha\beta\int V(w')\,dF(w') = \dfrac{\bar w}{1-\alpha\beta}$.

If $w < \bar w$, then A1 < A2 < A3, and if $w > \bar w$, then A1 > A2 > A3. The three alternatives yield the same lifetime utility if $w = \bar w$. Thus, A2 is never optimal.
4.2 The Cake Eating Problem

The cake has an initial size of $k_0 > 0$. The transition equation for the size of the cake clearly is:
$$k_{t+1} = k_t - c_t.$$
Recalling that the Euler equation implies $c_t = \beta^t c_0$, the size of the cake in each period can be computed as follows:
$$k_1 = k_0 - c_0$$
$$k_2 = k_1 - c_1 = k_0 - c_0 - \beta c_0$$
$$k_3 = k_2 - c_2 = k_0 - c_0 - \beta c_0 - \beta^2 c_0 = k_0 - (1+\beta+\beta^2)\,c_0 = k_0 - \frac{1-\beta^3}{1-\beta}\,c_0$$
$$\vdots$$
$$k_{T+1} = k_0 - \frac{1-\beta^{T+1}}{1-\beta}\,c_0$$
From this derivation we see that the Euler equation does not uniquely determine the path of $\{k_t\}$. Only if we add the transversality condition that the cake must be completely eaten in period T, i.e. $k_{T+1} = 0$, can we solve the problem uniquely:
$$c_0 = \frac{1-\beta}{1-\beta^{T+1}}\,k_0.$$
For $T \to \infty$, we get $c_0 = (1-\beta)k_0$, which implies $c_t = (1-\beta)k_t$.
The value of the cake when $T \to \infty$ is thus given by
$$V(k_0) = V(c_0, c_1, \ldots) = \sum_{t=0}^\infty \beta^t \ln c_t = \sum_{t=0}^\infty \beta^t \ln(\beta^t c_0) = \ln\beta\sum_{t=0}^\infty t\beta^t + \ln c_0 \sum_{t=0}^\infty \beta^t = \frac{\beta\ln\beta}{(1-\beta)^2} + \frac{\ln c_0}{1-\beta}$$
$$= \frac{\beta\ln\beta}{(1-\beta)^2} + \frac{\ln((1-\beta)k_0)}{1-\beta} = \frac{\beta\ln\beta}{(1-\beta)^2} + \frac{\ln(1-\beta)}{1-\beta} + \frac{1}{1-\beta}\,\ln k_0$$
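A quick numerical check of this closed form: simulate the policy $c_t = (1-\beta)k_t$, sum the discounted log utilities, and compare with the formula. β and $k_0$ below are arbitrary choices.

```python
import math

beta, k0 = 0.9, 2.0
k, total = k0, 0.0
for t in range(3000):                 # truncation error is negligible here
    c = (1 - beta) * k                # optimal consumption rule
    total += beta**t * math.log(c)
    k -= c                            # transition k_{t+1} = k_t - c_t

closed = (beta * math.log(beta) / (1 - beta)**2
          + math.log(1 - beta) / (1 - beta)
          + math.log(k0) / (1 - beta))
print(total, closed)                  # the two values agree
```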
4.3 The Neoclassical Growth Model

Production possibilities are described by a production function
$$F: \mathbb{R}_+ \times \mathbb{R}_+ \longrightarrow \mathbb{R}_+, \qquad (k, n) \longmapsto y = F(k, n)$$
which maps capital k and labor n into output y. Output can be consumed or invested:
$$c_t + i_t \le y_t = F(k_t, n_t).$$
If output is invested, the capital stock in the next period changes according to the transition equation:
$$k_{t+1} = (1-\delta)k_t + i_t$$
where the depreciation rate $\delta \in [0, 1]$. Thus we have the following resource constraint:
$$c_t + k_{t+1} - (1-\delta)k_t \le F(k_t, n_t)$$
• There is always a positive value for output because $U' > 0$. This implies that the resource constraint must hold with equality. Thus choosing $k_{t+1}$ automatically implies a value for $c_t$ given $k_t$. We can therefore choose $k_{t+1}$ instead of $c_t$ as the control variable.
With these modifications the above maximization problem can be rewritten as
$$\sum_{t=0}^\infty \beta^t\, U(f(k_t) - k_{t+1}) \longrightarrow \max_{\{k_{t+1}\}}$$
³ The other specification involves a quadratic objective function coupled with linear constraints (see Section 4.4).
Most importantly, we can start the iteration with an arbitrary function V0 .
Take, for example, the function V0 (k) = 0. Then the Bellman equation
simplifies to
$$V_1(k) = \sup_{k' \in \Gamma(k)} \{\,U(f(k) - k')\,\}.$$
The structure of the solutions $V_0, V_1, V_2, V_3$, and so on, leads to the conjecture that, taking j to infinity, the value function is (log-)linear, $V(k) = v_0 + v_1 \ln k$, where $v_1$ is given by:
$$v_1 = \lim_{j\to\infty} v_1^{(j)} = \frac{\alpha}{1-\alpha\beta}$$
This implies
$$E + F\ln k = \ln\frac{A}{1+F\beta} + \alpha\ln k + \beta E + F\beta\,\ln\frac{F\beta A}{1+F\beta} + F\alpha\beta\ln k.$$
Equating the coefficients of $\ln k$ leads to the following equation for F:
$$F = \alpha + F\alpha\beta \;\Longrightarrow\; F = \frac{\alpha}{1-\alpha\beta}$$
Inserting this result into the objective function leads to:
$$(1-\beta)E = \ln(1-\alpha\beta) + \ln A + \frac{\alpha\beta}{1-\alpha\beta}\ln(A\alpha\beta) + \left[\alpha\left(1 + \frac{\alpha\beta}{1-\alpha\beta}\right) - \frac{\alpha}{1-\alpha\beta}\right]\ln k$$
$$= \ln(1-\alpha\beta) + \ln A + \frac{\alpha\beta}{1-\alpha\beta}\ln(A\alpha\beta)$$
$$\Longrightarrow\quad E = \frac{1}{1-\beta}\left[\ln(1-\alpha\beta) + \ln A + \frac{\alpha\beta}{1-\alpha\beta}\ln(A\alpha\beta)\right]$$
Note that the first order condition delivers a first order linear difference equation in $\ln k_t$:
$$k_{t+1} = \frac{F\beta}{1+F\beta}\,A k_t^\alpha = \alpha\beta A k_t^\alpha$$
or, in logs,
$$\ln k_{t+1} = \ln(A\alpha\beta) + \alpha \ln k_t.$$
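Since $0 < \alpha < 1$, this difference equation in logs is stable, so the capital stock converges to the steady state $k^* = (A\alpha\beta)^{1/(1-\alpha)}$ from any positive starting value. A short simulation with illustrative parameter values:

```python
A, alpha, beta = 5.0, 0.3, 0.95
k = 0.1                                   # arbitrary positive starting capital
for t in range(30):
    k = alpha * beta * A * k**alpha       # policy k_{t+1} = A*alpha*beta*k_t^alpha
k_star = (A * alpha * beta) ** (1.0 / (1.0 - alpha))
print(k, k_star)                          # k has essentially reached k*
```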
The first order condition and the envelope theorem yield
$$U'(f(k) - k') = \beta V'(k'), \qquad U'(f(k) - k') = \beta\, U'(f(k') - k'')\, f'(k').$$
With log utility and $f(k) = Ak^\alpha$ this Euler equation reads
$$\frac{1}{Ak_t^\alpha - k_{t+1}} = \beta\,\frac{1}{Ak_{t+1}^\alpha - k_{t+2}}\,A\alpha\, k_{t+1}^{\alpha-1}$$
$$\frac{k_{t+1}}{Ak_t^\alpha - k_{t+1}} = \frac{A\alpha\beta\, k_{t+1}^\alpha}{Ak_{t+1}^\alpha - k_{t+2}}$$
$$\frac{1}{(Ak_t^\alpha/k_{t+1}) - 1} = \frac{\alpha\beta}{1 - (k_{t+2}/(Ak_{t+1}^\alpha))}.$$
[Figure 4: The difference equation for the savings rate, $y_{t+1} = (1+\alpha\beta) - \alpha\beta/y_t$ with $\alpha\beta = 0.3$, plotted against the 45°-line; the two intersections are the steady states at $y = \alpha\beta$ and $y = 1$.]
If we set $y_{t+1} = \dfrac{k_{t+2}}{A k_{t+1}^\alpha}$, we get the following non-linear difference equation:
$$y_{t+1} = (1+\alpha\beta) - \frac{\alpha\beta}{y_t}.$$
This equation admits two steady states: $\alpha\beta$ and 1. The second steady state cannot be optimal because it implies that there is no consumption. The steady state $\alpha\beta$, in turn, is unstable: paths starting above it converge to 1, while paths starting below it eventually become negative and thus infeasible. The only admissible solution is therefore $y_t = \alpha\beta$ for all t, which implies $k_{t+1} = A\alpha\beta\, k_t^\alpha$. The situation is depicted in figure 4.
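The argument is easy to verify numerically: starting even marginally above αβ, iterating the savings-rate map drifts away and settles at the other steady state y = 1 (zero consumption); starting marginally below, the path eventually turns negative and hence infeasible. The value αβ = 0.3 matches figure 4.

```python
ab = 0.3                                  # alpha * beta, as in figure 4
for y0 in (ab + 1e-6, ab - 1e-6):
    y = y0
    for t in range(80):
        y = (1 + ab) - ab / y             # the savings-rate difference equation
        if y < 0:
            break                         # path has become infeasible
    print(y0, "->", y)                    # above ab -> about 1; below ab -> negative
```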
4.4 The Linear Regulator Problem

Optimal Linear Regulator Problem:
$$V(x_0) = -\sum_{t=0}^\infty \beta^t\,(x_t' R x_t + u_t' Q u_t) \longrightarrow \max_{u_t}, \qquad 0 < \beta < 1, \tag{4.10}$$
subject to the law of motion $x_{t+1} = Ax_t + Bu_t$, where $x_t$ denotes the n-dimensional state vector and $u_t$ the k-vector of controls. R is a positive semidefinite symmetric $n \times n$ matrix and Q is a positive definite symmetric $k \times k$ matrix. A and B are $n \times n$, respectively $n \times k$, matrices. Note that the problem has been simplified by allowing no interaction between $x_t$ and $u_t$.
The Bellman equation can thus be written as
$$V(x_t) = \max_{u_t}\left\{-(x_t' R x_t + u_t' Q u_t) + \beta V(x_{t+1})\right\} \quad\text{s.t.}\quad x_{t+1} = Ax_t + Bu_t,$$
where $V(x_t)$ denotes the value of x in period t. We may solve this equation by guessing that $V(x) = -x'Px$ for some positive semidefinite symmetric matrix P. Using this guess and the law of motion the Bellman equation becomes:
$$-x'Px = \max_u\left\{-(x'Rx + u'Qu) - \beta\,(Ax + Bu)'P(Ax + Bu)\right\}.$$
The first order condition of the maximization problem on the right hand side is
$$(Q + \beta B'PB)\,u = -\beta B'PA\,x.$$
Thus, we get the feedback rule:
$$u = -Fx \qquad\text{with}\qquad F = \beta\,(Q + \beta B'PB)^{-1}B'PA.$$
Inserting this rule into the Bellman equation and rearranging terms leads to the algebraic matrix Riccati equation
$$P = R + \beta A'PA - \beta^2 A'PB\,(Q + \beta B'PB)^{-1}B'PA,$$
which can be solved by iterating on
$$P_{j+1} = R + \beta A'P_j A - \beta^2 A'P_j B\,(Q + \beta B'P_j B)^{-1}B'P_j A$$
starting from $P_0 = 0$. A sufficient condition for the iteration to converge is that the eigenvalues of A are strictly smaller than one in absolute value.
If the optimal rule is inserted into the law of motion, we obtain the closed-loop solution:
$$x_{t+1} = (A - BF)\,x_t$$
As a simple scalar illustration, suppose the Riccati equation reduces to (for instance with $R = Q = A = B = 1$ and $\beta = 1$)
$$P = 1 + P - \frac{P^2}{1+P} \;\Longrightarrow\; P = \frac{1+\sqrt 5}{2} \approx 1.618.$$
The negative solution can be disregarded because we are looking for positive solutions. This implies that
$$u = -Fx = -\frac{P}{1+P}\,x.$$
Inserting this in the law of motion for $x_t$ gives the closed-loop solution:
$$x_{t+1} = \left(1 - \frac{P}{1+P}\right)x_t = \frac{1}{1+P}\,x_t.$$
5 Principles of Dynamic Programming: the
deterministic case
As we have seen, many dynamic economic problems can be cast in either of
the two following forms: a sequence problem (SP) or a functional (Bellman)
equation (FE).
$$\text{(SP)}\qquad \sup_{\{x_{t+1}\}_{t=0}^\infty}\; \sum_{t=0}^\infty \beta^t F(x_t, x_{t+1}) \quad\text{s.t.}\quad x_{t+1} \in \Gamma(x_t),\; x_0 \text{ given}$$
$$\text{(FE)}\qquad V(x) = \sup_{y \in \Gamma(x)}\left\{F(x, y) + \beta V(y)\right\}$$
For a feasible plan $\pi = (x_0, x_1, \ldots) \in \Pi(x_0)$ define $U_T(\pi) = \sum_{t=0}^{T} \beta^t F(x_t, x_{t+1})$. This is just the partial sum of discounted returns for a feasible plan π. Assumption 5.2 allows us to define a function $U: \Pi(x_0) \to \overline{\mathbb{R}}$ by
$$U(\pi) = \lim_{T\to\infty} U_T(\pi)$$
where the range $\overline{\mathbb{R}}$ may now include ∞ and −∞. By assumption 5.1, $\Pi(x_0)$ is not empty and the objective function in (SP) is well defined. This allows us to define the supremum function $V^*: X \to \overline{\mathbb{R}}$ by
$$V^*(x_0) = \sup_{\pi \in \Pi(x_0)} U(\pi).$$
This function is well defined and is the unique function which satisfies:
(i) If $|V^*(x_0)| < \infty$, then $V^*(x_0) \ge U(\pi)$ for all $\pi \in \Pi(x_0)$.
(ii) If $|V^*(x_0)| < \infty$, then for every $\epsilon > 0$ there exists a plan $\pi \in \Pi(x_0)$ with $U(\pi) \ge V^*(x_0) - \epsilon$.
(iii) If $V^*(x_0) = \infty$, there exists a sequence of plans $\{\pi_k\}$ such that $U(\pi_k)$ converges to ∞.
(iv) If $V^*(x_0) = -\infty$, there exists a sequence of plans $\{\pi_k\}$ such that $U(\pi_k)$ converges to −∞.
Remark 5.1. Proposition 5.2 implies that the solution to (FE) is unique.
Note that all functions in B(X) satisfy the transversality condition. The idea of using this operator view is the following. Suppose $V_0$ has some property. Then we ask whether $\mathrm{T}V_0$ also has this property. If this is true, $\mathrm{T}^t V_0$ also has this property. But then $V = \lim_{t\to\infty} \mathrm{T}^t V_0$ will share this property as well because of uniform convergence. In order to make this idea workable, we have to place additional assumptions on our decision problem.
Theorem 5.2 (Properties of the Solution). Under assumptions 5.3, 5.4, 5.5
and 5.6, the solution V to (FE) is strictly increasing.
Theorem 5.3 (Further Properties of the Solution). Under assumptions 5.3, 5.4, 5.7, and 5.8, if V satisfies (FE) and the policy correspondence G is well defined, then V is strictly concave and G is a continuous (single-valued) function.
The subscript i denotes the derivative with respect to the i-th element of $x_0 \in \mathbb{R}^m$.
6 Principles of Dynamic Programming: the
stochastic case
In the following we will consider stochastic dynamic programming problems
of the following type:
$$V(x, z) = \sup_{y \in \Gamma(x,z)}\left\{F(x, y, z) + \beta\int_Z V(y, z')\,Q(z, dz')\right\}$$
The major difference is that the agent(s) have to take stochastic shocks into
account. This implies that the supremum has to be taken with respect to
F plus the expected value of discounted future V . The shocks are governed
by a transition function Q. Under suitable assumptions on the shocks, the
required mathematical properties of the value function V are preserved under
integration. Thus, the results for the deterministic model carry over virtually
without change.
The state space is now the product space of the measurable spaces (X , X)
and (Z, Z), i.e. (S, S) = (X × Z, X × Z), describing the possible values of
the endogenous and exogenous state variables. Q is a transition function on
(Z, Z). The correspondence Γ describing the feasibility constraints is now
a correspondence from S into X . The graph of Γ is denoted by A, i.e.
A = {(x, y, z) ∈ X × X × Z : y ∈ Γ(x, z)}. F is again a one period return
function on A and β is the subjective discount factor with 0 < β < 1.
(ii) Z is a compact and convex subset of Rm with Borel subsets Z and the
transition function Q on (Z, Z) has the Feller property.
Theorem 6.1 (Solution of the Stochastic Bellman Equation). Let (X, X), (Z, Z), Q, Γ, F, and β satisfy assumptions 6.1, 6.2, 6.3, and 6.4. Define the operator T on B(S) by
$$(\mathrm{T}f)(x, z) = \sup_{y \in \Gamma(x,z)}\left\{F(x, y, z) + \beta\int_Z f(y, z')\,Q(z, dz')\right\}.$$
Then T maps B(S) into itself and has a unique fixed point $V \in B(S)$.
Theorem 6.2 (Property of Solution: Stochastic Case). Let (X , X), (Z, Z),
Q, Γ, F , and β satisfy assumptions 6.1, 6.2, 6.3, 6.4, 6.5, and 6.6. If V
denotes the fixed point of T, then, ∀z ∈ Z, V (., z) : X → R is strictly
increasing.
Theorem 6.3 (Further Properties of Solution: Stochastic Case). Let (X , X),
(Z, Z), Q, Γ, F , and β satisfy assumptions 6.1, 6.2, 6.3, 6.4, 6.7, and 6.8.
If V denotes the fixed point of T and G the policy correspondence, then,
∀z ∈ Z, V (., z) : X → R is strictly concave and G(., z) : X → X is a
continuous (single-valued) function.
Theorem 6.4 (Solution found by Iteration: Stochastic Case). Under the same assumptions, let V denote the fixed point of T and g the associated policy function, and define
$$V_n = \mathrm{T}V_{n-1}, \qquad n = 1, 2, \ldots$$
and
$$g_n(x, z) = \arg\max_{y \in \Gamma(x,z)}\left\{F(x, y, z) + \beta\int_Z V_n(y, z')\,Q(z, dz')\right\}.$$
Then $\{g_n\}$ converges pointwise to g. If X and Z are both compact, then the convergence is uniform.
The subscript i denotes the derivative with respect to the i-th element.
7 The Lucas Tree Model
This presentation follows closely the exposition given in Sargent (1987, chap-
ter 3). For details we refer to the original paper by Lucas (1978). Imagine an
economy composed of a large number of agents with identical preferences and
endowments. Each agent owns exactly one tree which is perfectly durable.
In each period, the tree yields a fruit or dividend dt . The fruit is the only
consumption good and is nonstorable. Denote by pt the price of the tree.
Then each agent is assumed to maximize
$$\mathrm{E}_0 \sum_{t=0}^\infty \beta^t\, U(c_t), \qquad 0 < \beta < 1,$$
subject to the budget constraint
$$c_t + p_t s_{t+1} \le (p_t + d_t)\,s_t,$$
where $s_t$ denotes the agent's share holdings at the beginning of period t. Denoting the return by $R_t = (p_{t+1} + d_{t+1})/p_t$, the Euler equation for this maximization problem is
$$U'(c_t) = \beta\,\mathrm{E}_t\big[U'(c_{t+1})\,R_t\big]. \tag{7.1}$$
Because all agents are identical and there is no satiation, every agent just
consumes this period’s dividends. Thus, we have in equilibrium
ct = dt .
$$p_t = \beta\,\mathrm{E}_t\left[\frac{U'(d_{t+1})}{U'(d_t)}\,(p_{t+1} + d_{t+1})\right] \tag{7.2}$$
Iterating this equation forward in time and using the law of iterated expectations, we get the solution
$$p_t = \mathrm{E}_t \sum_{j=1}^\infty \beta^j\left[\prod_{i=0}^{j-1}\frac{U'(d_{t+i+1})}{U'(d_{t+i})}\right] d_{t+j}$$
which simplifies to
$$p_t = \mathrm{E}_t \sum_{j=1}^\infty \beta^j\,\frac{U'(d_{t+j})}{U'(d_t)}\,d_{t+j}. \tag{7.3}$$
The share price is the expected discounted stream of dividends, but with
time-varying and stochastic discount factors.
An interesting formula is obtained by taking U (c) = ln c. In this case,
the pricing formula (7.3) simplifies to
$$p_t = \mathrm{E}_t \sum_{j=1}^\infty \beta^j\, d_t = \frac{\beta}{1-\beta}\,d_t.$$
This is a simple asset-pricing function which maps the state of the economy
at time t, dt , into the price of an asset at time t.
In order for the conditional expectation in equation (7.1) to be well-
defined, it is necessary to impute to the representative agent a view about
the law of motion over time of dt and pt . The specification of an actual law
of motion for dt , which agents are supposed to know, and a perceived pricing
function that maps the history of dt into pt implies that a law of motion for
pt has been perceived. Given that Et in equation (7.1) is computed using
the perceived pricing function, equation (7.1) maps the perceived pricing
function into an actual pricing function. The notion of a rational expectation
equilibrium is that the actual pricing function equals the perceived pricing
function. In the following, we will exploit this notion in more detail.
Suppose that dividends evolve according to a Markov process with time-invariant transition probability distribution function F(x', x) defined as
$$F(x', x) = \mathrm{P}[x_{t+1} \le x' \mid x_t = x],$$
where $x_t$ denotes the dividend in period t.
Lifetime utility is again given by
$$\mathrm{E}_0 \sum_{t=0}^\infty \beta^t\, U(c_t), \qquad 0 < \beta < 1.$$
Prices are assumed to be a time-invariant function of the state: $p_t = h(x_t)$.
This law of motion together with the evolution of xt defines the perceived
law of motion for the tree prices. The Bellman equation can then be written
as
$$V(s(h(x)+x)) = \max_{s'}\left\{U(s(h(x)+x) - h(x)s') + \beta\int V(s'(h(x')+x'))\,dF(x', x)\right\}. \tag{7.4}$$
Taking the first order condition with respect to s' and simplifying leads to
$$h(x)\,U'(s(h(x)+x) - h(x)s') = \beta\int U'(s'(h(x')+x') - h(x')s'')\,(h(x')+x')\,dF(x', x).$$
Defining w(x) by $w(x) = h(x)\,U'(s(h(x)+x) - h(x)s')$, we obtain
$$w(x) = \beta\int w(x')\,dF(x', x) + \beta\int x'\,U'(s'(h(x')+x') - h(x')s'')\,dF(x', x)$$
In equilibrium $s = s' = s'' = 1$ because all agents and all trees are alike, so that consumption $c(x) = s(h(x)+x) - h(x)s' = x$. Thus, we obtain the functional equation in the unknown function w(x):
$$w(x) = \beta\int w(x')\,dF(x', x) + \beta\int x'\,U'(x')\,dF(x', x) \tag{7.5}$$
where $w(x) = h(x)U'(x)$. Thus, once w(x) has been determined, the pricing function h(x) can be recovered from $w(x) = h(x)U'(x)$ as $U'(x)$ is known.
Denote by g(x) the function $g(x) = \beta\int x'\,U'(x')\,dF(x', x)$; then we can define the operator T by
$$(\mathrm{T}w)(x) = \beta\int w(x')\,dF(x', x) + g(x).$$
Thus, T is monotone. Furthermore, for any constant $c \in \mathbb{R}$ we have
$$(\mathrm{T}(w+c))(x) = \beta\int (w+c)(x')\,dF(x', x) + g(x) = \beta\int w(x')\,dF(x', x) + \beta c\int dF(x', x) + g(x) = (\mathrm{T}w)(x) + \beta c.$$
T therefore satisfies Blackwell's sufficient conditions and is a contraction, so the iteration $w^{(j+1)} = \mathrm{T}w^{(j)}$ converges to the unique solution of (7.5),
or, in terms of h(x),
$$h^{(j+1)}(x)\,U'(x) = \beta\int h^{(j)}(x')\,U'(x')\,dF(x', x) + g(x). \tag{7.6}$$
Thus, the operator T maps a perceived pricing function h(j) into an actual
pricing function h(j+1) . A rational expectation equilibrium is then nothing
but a fixed point of this mapping so that the actual pricing function equals
the perceived one.
As an example, take $U(x) = \ln x$. With this utility function $g(x) = \beta\int x'(1/x')\,dF(x', x) = \beta$. Guess that the solution is of the form $h(x) = ax$. This implies that
$$ax\,\frac 1x = \beta\int ax'\,\frac 1{x'}\,dF(x', x) + \beta = a\beta + \beta$$
which results in
$$a = \frac{\beta}{1-\beta} \qquad\text{giving the solution}\qquad h(x) = \frac{\beta}{1-\beta}\,x.$$
The Lucas tree model highlights a general principle for the construction of asset-pricing models:

(i) Determine the equilibrium consumption allocation of the economy without the asset in question.

(ii) Open up a specific market for an asset which represents a specific claim on future consumption. Assume no trade restrictions and derive the corresponding Euler equation.

(iii) Equate the consumption in the Euler equation to the general equilibrium values found in (i). Then derive the associated asset price.
Following Mehra and Prescott (1985), suppose that the gross growth rate of dividends, $x_{t+1} = d_{t+1}/d_t$, follows an n-state Markov chain with values $\sigma_1, \ldots, \sigma_n$ and transition probabilities $P_{ij} = \mathrm{P}[x_{t+1} = \sigma_j \mid x_t = \sigma_i]$, and that marginal utility is of the constant relative risk aversion form $U'(c) = c^{-\alpha}$. The state is given by $(d_t, x_t)$ and $d_{t+j} = d_t x_{t+1} \cdots x_{t+j}$. This implies that the price of the equity in state (d, i), p(d, i), is homogeneous of degree one in d, so that $p(d, i) = w_i d$. The Euler
equation (7.2) becomes
$$p(d, i) = \beta\sum_{j=1}^n P_{ij}\,(\sigma_j d)^{-\alpha}\,\big[p(\sigma_j d, j) + \sigma_j d\big]\,d^\alpha,$$
which simplifies to
$$w_i = \beta\sum_{j=1}^n P_{ij}\,\sigma_j^{1-\alpha}\,[w_j + 1].$$
In matrix notation, with $\Sigma = \mathrm{diag}(\sigma_1^{1-\alpha}, \ldots, \sigma_n^{1-\alpha})$ and $i = (1, \ldots, 1)'$, this reads
$$w = \beta P\Sigma w + \beta P\Sigma i$$
with solution
$$w = \beta\,(I_n - \beta P\Sigma)^{-1} P\Sigma i.$$
Average returns are then computed with respect to the stationary distribution of the chain, where $\pi = (\pi_1, \ldots, \pi_n)$ is the stationary distribution for $P = (P_{ij})$, i.e. π solves $\pi = \pi P$.
The risk-free asset pays one unit of consumption irrespective of the state. Thus, the price of this asset, $p^f(d, i)$, in state (d, i) is
$$p^f(d, i) = \beta\sum_{j=1}^n P_{ij}\,(\sigma_j d)^{-\alpha}\,d^\alpha = \beta\sum_{j=1}^n P_{ij}\,\sigma_j^{-\alpha}.$$
In their paper Mehra and Prescott (1985) calibrate this model to U.S. data over the period 1889 to 1978, setting n = 2. They find
$$P = \begin{pmatrix} 0.43 & 0.57 \\ 0.57 & 0.43 \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} \sigma_1 \\ \sigma_2 \end{pmatrix} = \begin{pmatrix} 1.054 \\ 0.982 \end{pmatrix}.$$
They then compute the combinations of the average risk-free rate and the
average risk-premium Re − Rf for values of the risk aversion parameter α
ranging from small positive numbers to a maximum of 10 and for the discount
factor β ranging from 0.925 to 0.9999. This delineates an admissible region
reproduced in Figure 5. The empirical averages over this period are 6.18 percent for the risk premium and 0.80 percent for the risk-free rate. These values lie far outside the admissible region: the model is clearly incompatible with the data. This result turned out to be very robust and has since been called the equity premium puzzle. The puzzle is still not completely resolved and remains an active area of research (see the survey by Mehra and Prescott, 2003).
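The computation behind this comparison can be sketched as follows. Given the homogeneity $p(d, i) = w_i d$, the conditional expected equity return in state i is $\sum_j P_{ij}\sigma_j(w_j+1)/w_i - 1$, the risk-free rate is $1/p^f(d, i) - 1$, and averages are taken with respect to the stationary distribution π. The code below is a minimal sketch; the values of α and β are one admissible choice from the ranges mentioned above, not a preferred calibration.

```python
import numpy as np

P = np.array([[0.43, 0.57], [0.57, 0.43]])
sigma = np.array([1.054, 0.982])
alpha, beta = 2.0, 0.99                        # one admissible parameter choice

Sigma = np.diag(sigma ** (1 - alpha))
w = beta * np.linalg.solve(np.eye(2) - beta * P @ Sigma,
                           P @ Sigma @ np.ones(2))  # w = beta(I - beta*P*Sigma)^-1 P*Sigma*i

Re = (P @ (sigma * (w + 1))) / w - 1           # conditional expected equity returns
Rf = 1 / (beta * (P @ sigma ** (-alpha))) - 1  # risk-free rates from p^f(d, i)

pi = np.array([0.5, 0.5])                      # stationary distribution of this symmetric P
print("average risk-free rate:", pi @ Rf)
print("average equity premium:", pi @ (Re - Rf))
```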
[Figure 5: The admissible region of combinations of the average risk-free rate and the average risk premium $R^e - R^f$ generated by the model.]
f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y) for all x, y ∈ X and 0 < θ < 1.
The function f is said to be strictly quasi-concave iff
B Fixed Point Theorems
Definition B.1. Let (X , d) be a metric space, then an operator T : X → X
is a contraction mapping (with modulus β) if there exists a β ∈ (0, 1) such
that
d(Tx, Ty) ≤ βd(x, y) for all x, y ∈ X .
Proof. For any $\epsilon > 0$ choose $\delta = \epsilon$; then for all x, y with $d(x, y) < \delta$ we have $d(\mathrm{T}x, \mathrm{T}y) \le \beta d(x, y) < d(x, y) < \delta = \epsilon$. Thus, T is uniformly continuous and therefore also continuous.
[Figure: A contraction mapping T on [a, b] — the graph of Tx crosses the 45°-line at the unique fixed point x*.]
Remark B.1. Let X be a closed interval [a, b] and $d(x, y) = |x - y|$. Then $\mathrm{T}: X \to X$ is a contraction mapping if for some $\beta \in (0, 1)$,
$$\frac{|\mathrm{T}x - \mathrm{T}y|}{|x - y|} \le \beta < 1, \qquad\text{for all } x, y \in X \text{ with } x \ne y.$$
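As a concrete example (not from the text), take X = [1, 2] with $d(x, y) = |x - y|$ and $\mathrm{T}x = (x + 2/x)/2$. Then $|\mathrm{T}x - \mathrm{T}y|/|x - y| \le 1/2$ on X, so T is a contraction with modulus β = 1/2, and iteration converges to the unique fixed point $x^* = \sqrt 2$:

```python
import math

x = 2.0                          # any starting point in [1, 2]
for n in range(8):
    x = 0.5 * (x + 2.0 / x)      # the contraction T
print(x, math.sqrt(2))           # both 1.41421356...
```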
(i) (monotonicity): f, g ∈ B(X) with f(x) ≤ g(x) for all x ∈ X implies (Tf)(x) ≤ (Tg)(x);
(ii) (discounting): there exists β ∈ (0, 1) such that for every constant c ≥ 0, [T(f + c)](x) ≤ (Tf)(x) + βc for all f ∈ B(X) and x ∈ X.
Combining the two inequalities, we obtain $\|\mathrm{T}f - \mathrm{T}g\| \le \beta\|f - g\|$ as required.
[Figure: Brouwer's fixed point theorem on [a, b] — the graph of a continuous map T from [a, b] into itself must cross the 45°-line at some fixed point x*.]
C Derivation of Equations Characterizing the
Reservation Wage
In this appendix we derive equations (4.3), (4.4), and (4.5). From equation (4.2) we get
$$\frac{W}{1-\beta} = c + \beta\int_0^W \frac{W}{1-\beta}\,dF(w') + \beta\int_W^\infty \frac{w'}{1-\beta}\,dF(w').$$
Writing the left hand side as
$$\frac{W}{1-\beta} = \int_0^W \frac{W}{1-\beta}\,dF(w') + \int_W^\infty \frac{W}{1-\beta}\,dF(w')$$
and equating the two expressions gives
$$\int_0^W W\,dF(w') - c = \frac{1}{1-\beta}\int_W^\infty (\beta w' - W)\,dF(w').$$
Adding $\int_W^\infty W\,dF(w')$ on both sides leads to
$$W - c = \frac{\beta}{1-\beta}\int_W^\infty (w' - W)\,dF(w') = \beta\int_W^\infty \big[V(w') - V(W)\big]\,dF(w'),$$
which are equations (4.3) and (4.4). Integrating by parts, using $\mathrm{E}w = \int_0^B [1 - F(w')]\,dw'$, and rearranging finally yields equation (4.5):
$$W - c = \beta(\mathrm{E}w - c) + \beta\int_0^W F(w')\,dw'.$$
D Mean Preserving Spreads
Consider a class of distribution functions $F_r(x)$ indexed by r with support included in [0, B], 0 < B < ∞, and having the same mean. Note that if a random variable X is distributed according to $F_r$ then its mean EX is given by $\mathrm{E}X = \int_0^B [1 - F_r(x)]\,dx$. Therefore, if $F_r$ and $F_s$ have the same mean, then
$$\int_0^B \big[F_r(x) - F_s(x)\big]\,dx = 0.$$
Two distributions $F_r$ and $F_s$ are said to satisfy the single-crossing property if there exists $\bar x$ such that
$$F_s(x) \ge F_r(x) \text{ for } x \le \bar x \qquad\text{and}\qquad F_s(x) \le F_r(x) \text{ for } x \ge \bar x.$$
If the two distributions satisfy the above two properties, we can regard Fs as
being obtained from Fr by shifting probability mass towards the tail of the
distribution keeping the mean constant. The two properties imply
$$\int_0^y \big[F_s(x) - F_r(x)\big]\,dx \ge 0, \qquad 0 \le y \le B.$$
[Figure: Two distribution functions $F_r(x)$ and $F_s(x)$ on [0, B] with the same mean, satisfying the single-crossing property.]
The Riemann integral is defined as
$$\int_a^b f(x)\,dx = \lim_{n\to\infty}\sum_{i=1}^n f(\xi_i)\,(x_i - x_{i-1})$$
where f is a real and bounded function defined on some interval [a, b]. The $x_0, x_1, \ldots, x_n$ partition the interval [a, b] such that $a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b$. The $\xi_i$ are arbitrary numbers from the interval $[x_{i-1}, x_i]$. Note that the limit does not exist for every function; thus, there are functions which are not Riemann integrable. Monotone or continuous functions are Riemann integrable.
This definition can be generalized to the so-called Riemann-Stieltjes integral by replacing $(x_i - x_{i-1})$ by $(g(x_i) - g(x_{i-1}))$ where g is some real bounded function on [a, b]:
$$\int_a^b f(x)\,dg(x) = \lim_{n\to\infty}\sum_{i=1}^n f(\xi_i)\,\big(g(x_i) - g(x_{i-1})\big).$$
The Riemann-Stieltjes integral exists if f is continuous and g is a function of bounded variation.⁷ For g(x) = x, we get the Riemann integral above. Moreover, if g is continuously differentiable with derivative g',
$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,g'(x)\,dx.$$
⁷ A function g on [a, b] is called a function of bounded variation if there exists a constant M > 0 such that for all partitions $x_0, x_1, \ldots, x_n$ of the interval [a, b], $\sum_{i=1}^n |g(x_i) - g(x_{i-1})| \le M$. Monotone functions and continuously differentiable functions are of bounded variation.
References

F. Kydland and E. C. Prescott. Rules rather than discretion: The inconsistency of optimal plans. Journal of Political Economy, 85:473–492, 1977.

R. E. Lucas. Asset prices in an exchange economy. Econometrica, 46:1429–1445, 1978.

R. Mehra and E. C. Prescott. The equity premium: A puzzle. Journal of Monetary Economics, 15:145–161, 1985.

R. Mehra and E. C. Prescott. The equity premium in retrospect. In G. M. Constantinides, M. Harris, and R. M. Stulz, editors, Handbook of the Economics of Finance. Elsevier, Amsterdam, 2003.

T. J. Sargent. Dynamic Macroeconomic Theory. Harvard University Press, Cambridge, Massachusetts, 1987.

N. L. Stokey and R. E. Lucas Jr. Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Massachusetts, 1989.