
UNIVERSITY OF COPENHAGEN — DEPARTMENT OF ECONOMICS

Lecture 4: The Bellman Operator

Dynamic Programming

Jeppe Druedahl
Department of Economics

15th of February 2016


Infinite horizon, t → ∞

• We know:
V^0(M_t) = whatever
V^1(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V^0(M_{t+1}) }
V^2(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V^1(M_{t+1}) }
V^3(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V^2(M_{t+1}) }
...
lim_{n→∞} V^n(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V^{n−1}(M_{t+1}) } ?

where M_{t+1} = Γ(M_t, C_t)
• Does the limit exist?

Operator notation

• Write the Bellman equation in the following general form:

V^n(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V^{n−1}(Γ(M_t, C_t)) }   for all M_t ∈ M

• Alternatively, in operator form

V^n(M_t) = J(V^{n−1})(M_t)   for all M_t ∈ M

• A fixed point is a function V such that

V(M_t) = J(V)(M_t)   for all M_t ∈ M

• Is there always a fixed point, and is it unique?
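As a concrete illustration of the operator view (my own toy example, not from the slides), the sketch below writes J for a problem with two discrete states and two discrete choices, so that a value function is just a vector of length two; iterating J from an arbitrary starting point converges to its fixed point.

```python
import numpy as np

# Toy problem (illustrative numbers only): two states, two choices.
beta = 0.9
u = np.array([[1.0, 0.5],      # u[m, c]: flow utility in state m from choice c
              [0.0, 2.0]])
Gamma = np.array([[0, 1],      # Gamma[m, c]: index of next period's state
                  [1, 0]])

def J(V):
    """Bellman operator: (J V)(m) = max_c { u(m, c) + beta * V(Gamma(m, c)) }."""
    return np.max(u + beta * V[Gamma], axis=1)

# Iterate J from an arbitrary V0; the sequence converges to the unique fixed point.
V = np.zeros(2)
for _ in range(500):
    V = J(V)

print(V, np.max(np.abs(J(V) - V)))   # residual |J(V) - V| is (numerically) zero
```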

Contraction mapping requirement

• Let F(M) be the space of bounded continuous functions

Theorem


Assume that u(M_t, C_t) is real-valued, continuous and bounded, that 0 < β < 1, and that the constraint set C(M_t) is non-empty, compact-valued and continuous. Then J has a unique fixed point V ∈ F(M), and for all V_0 ∈ F(M)

‖J^n(V_0) − V‖ ≤ β^n ‖V_0 − V‖,   n = 0, 1, 2, 3, ...

• Full proof: Lucas and Stokey (1989), theorem 4.6


• Main idea: Apply Blackwell’s contraction mapping theorem
requiring that J is
1 Monotone
2 Discounted

Monotone (requirement 1)

V(M_t) ≥ Q(M_t), ∀ M_t ∈ M  ⇒  J(V)(M_t) ≥ J(Q)(M_t), ∀ M_t ∈ M

• Define the maximizers

C*_V(M_t) ≡ arg max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V(Γ(M_t, C_t)) }
C*_Q(M_t) ≡ arg max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β Q(Γ(M_t, C_t)) }

• Insert into J(V)(M_t):

J(V)(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V(Γ(M_t, C_t)) }
          = u(M_t, C*_V(M_t)) + β V(Γ(M_t, C*_V(M_t)))
          ≥ u(M_t, C*_Q(M_t)) + β V(Γ(M_t, C*_Q(M_t)))
          ≥ u(M_t, C*_Q(M_t)) + β Q(Γ(M_t, C*_Q(M_t)))
          = J(Q)(M_t)

  (the first inequality holds because C*_V(M_t) is the maximizer, the second because V ≥ Q)
Discounted (requirement 2)

∃ γ ∈ (0, 1) :  J(V + k)(M_t) ≤ J(V)(M_t) + γk   for any constant k ≥ 0 and all M_t ∈ M

• We have

J(V + k)(M_t) = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β (V(Γ(M_t, C_t)) + k) }
              = max_{C_t ∈ C(M_t)} { u(M_t, C_t) + β V(Γ(M_t, C_t)) } + βk
              = J(V)(M_t) + βk
              ≤ J(V)(M_t) + γk   for γ = β ∈ (0, 1)

• What could break down here?

Summarize

1 The uniqueness of the value function can be proven
2 Iteration on the value function can be proven to converge at a rate of β
3 Further properties:
  1 Monotonicity in states expanding the choice set
  2 Concavity if the choice set is convex and u is concave
  3 Differentiability (e.g. Benveniste and Scheinkman (1979), Clausen and Strub (2016))
4 A unique policy function typically requires that the choice set is convex and u is strictly concave
5 Boyd's Weighted Contraction Mapping Theorem can be used if returns are unbounded (see Carroll (2012))

Value function iteration (VFI)

Algorithm 11: Find the fixed point V
input : tol. = ∞
output: V[•], C*[•]
1  V[•] = 0 all m ∈ M
2  while ? do
3      V−[•] = V[•]
4      V[•], C*[•] = find_V(V[•])
5      δ = max(|V−[:] − V[:]|)
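A minimal Python sketch of Algorithm 11 (my own illustration, not from the slides), applied to a simple cake-eating problem; the blanked-out condition in "while ? do" is filled in with the usual convergence check δ > tol.

```python
import numpy as np

# Cake-eating illustration: the state M is the remaining cake on an integer grid,
# the choice set is C(M) = {1, ..., M}, the transition is M' = M - C, and u = log(C).
beta, tol = 0.95, 1e-8
grid = np.arange(1, 101)                        # grid for M

def find_V(V):
    """Apply the Bellman operator once; return the new values and the maximizers."""
    V_new = np.empty(len(grid))
    C_star = np.empty(len(grid), dtype=int)
    for i, M in enumerate(grid):
        C = np.arange(1, M + 1)                 # feasible choices in state M
        M_next = M - C                          # transition Gamma(M, C)
        cont = np.where(M_next > 0, V[np.maximum(M_next - 1, 0)], 0.0)
        values = np.log(C) + beta * cont
        j = np.argmax(values)
        V_new[i], C_star[i] = values[j], C[j]
    return V_new, C_star

V = np.zeros(len(grid))                         # 1: initial guess V[.] = 0
delta = np.inf
while delta > tol:                              # 2: iterate until convergence
    V_minus = V                                 # 3: store the previous iterate
    V, C_star = find_V(V)                       # 4: update V and the policy
    delta = np.max(np.abs(V_minus - V))         # 5: sup-norm change

print(delta, C_star[:10])                       # converged value change and policy
```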

Policy iteration

• Think of step n in VFI, where for all M_t ∈ M we set

V^n(M_t) = u(M_t, C^{*n}(M_t)) + β E_t[ V^{n−1}(Γ(M_t, C^{*n}(M_t))) ]
C^{*n}(M_t) = arg max_{C_t} { u(M_t, C_t) + β E_t[ V^{n−1}(Γ(M_t, C_t)) ] }

• Alternative: Simulate forward for k periods using C^{*n}(•) as the decision rule, and update by

V^n(M_t) = ∑_{j=0}^{k} β^j u(M_{t+j}, C^{*n}(M_{t+j})) + β^{k+1} V^{n−1}(M_{t+k+1})

  where M_{t+j+1} = Γ(M_{t+j}, C^{*n}(M_{t+j}))


• Better convergence? Yes, in terms of speed. No, in terms of the basin of attraction
• If everything is discrete, the simulation can be replaced by the inversion of a matrix! [Bertel will show you] (see the sketch below)
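A sketch of the fully discrete case (toy numbers of my own, not Bertel's example): with finite states and choices, evaluating a fixed policy π means solving the linear system V = u_π + β P_π V, i.e. V = (I − β P_π)^{-1} u_π, which replaces the k-period simulation; alternating this with a greedy improvement step is policy iteration.

```python
import numpy as np

beta = 0.95
n_states, n_choices = 5, 3
rng = np.random.default_rng(0)
u = rng.normal(size=(n_states, n_choices))                        # u[m, c]
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_choices))  # P[m, c, m']

policy = np.zeros(n_states, dtype=int)         # start from an arbitrary policy
for _ in range(50):
    # 1) policy evaluation by matrix inversion: V = (I - beta*P_pi)^{-1} u_pi
    u_pi = u[np.arange(n_states), policy]
    P_pi = P[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - beta * P_pi, u_pi)
    # 2) policy improvement: greedy choice against the evaluated V
    new_policy = np.argmax(u + beta * P @ V, axis=1)
    if np.array_equal(new_policy, policy):     # stop when the policy is stable
        break
    policy = new_policy

print(policy, V)
```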
Guess and verify

• Consider the neoclassical growth model

V(K_t) = max_{C_t} { log C_t + β V(K_{t+1}) }
s.t.
K_{t+1} = A K_t^α − C_t
• Assume that V(K_t) = a + b log K_t, such that

a + b log K_t = max_{K_{t+1}} { log(A K_t^α − K_{t+1}) + β (a + b log K_{t+1}) }

• The FOC then is

1/(A K_t^α − K_{t+1}) = βb / K_{t+1}   ⇔   K_{t+1} = (βb/(1 + βb)) A K_t^α
• Insert the FOC and solve for a and b (both independent of K_t)

a + b log K_t = log( A K_t^α − (βb/(1 + βb)) A K_t^α ) + β ( a + b log( (βb/(1 + βb)) A K_t^α ) )
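Completing the last step (a standard result, added here as a check rather than taken from the slide): collecting the terms in log K_t on both sides gives

b = α + αβ b   ⇔   b = α / (1 − αβ)

so that βb/(1 + βb) = αβ, and the policy function is

K_{t+1} = αβ A K_t^α,   C_t = (1 − αβ) A K_t^α

while the constant terms give

a = [ log((1 − αβ) A) + (αβ/(1 − αβ)) log(αβ A) ] / (1 − β)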

Projection methods

• Guess and verify is only possible for very special models
• Value and policy functions might, however, be well approximated by parametric functions (typically polynomials, Weierstrass theorem)
• Solve for the parameters numerically instead of solving the maximization problems (relying on the first-order conditions instead); a sketch follows below
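A minimal collocation sketch in Python (my own illustration, with made-up parameter values), using the growth model from the guess-and-verify slide: guess that log C is a polynomial in log K and choose the coefficients so that the Euler equation 1/C_t = β α A K_{t+1}^{α−1} / C_{t+1} holds exactly at a few nodes, instead of solving the maximization problem itself.

```python
import numpy as np
from scipy.optimize import root

alpha, beta, A = 0.3, 0.95, 1.0                # illustrative parameter values
K_nodes = np.linspace(0.05, 0.5, 3)            # 3 collocation nodes, 3 coefficients

def C_approx(theta, K):
    """Parametric guess: log C is a polynomial in log K with coefficients theta."""
    return np.exp(np.polyval(theta, np.log(K)))

def euler_residuals(theta):
    C = C_approx(theta, K_nodes)
    K_next = np.maximum(A * K_nodes**alpha - C, 1e-10)   # law of motion, kept positive
    C_next = C_approx(theta, K_next)
    return 1.0 / C - beta * alpha * A * K_next**(alpha - 1.0) / C_next

theta0 = np.array([0.0, alpha, np.log(0.5 * A)])   # start from "consume half of output"
sol = root(euler_residuals, theta0)

# Compare with the closed form C = (1 - alpha*beta) * A * K^alpha from guess and verify
K_test = np.linspace(0.05, 0.5, 100)
err = np.max(np.abs(C_approx(sol.x, K_test) - (1 - alpha * beta) * A * K_test**alpha))
print(sol.success, err)
```

Because the true policy here is itself log-linear, the fitted polynomial reproduces the closed form almost exactly; in richer models one would typically use more nodes and, e.g., Chebyshev polynomials.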

The Bellman equation

• The model:
  1 A household gets utility from consumption and disutility from labor
  2 The household's income depends on whether it works or not
  3 The household accumulates human capital by working
  4 It can save in an account with an interest rate of r
• Task: Write up the Bellman equation on the whiteboard for your choice of utility function, wage process and human capital accumulation equation (one possible specification is sketched below)
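One possible specification (only an example of the kind of answer the task asks for; the CRRA utility, the fixed wage w, and the parameters ρ, φ, δ are my own choices):

V(M_t, H_t) = max_{C_t, L_t} { C_t^{1−ρ}/(1−ρ) − φ L_t + β V(M_{t+1}, H_{t+1}) }

s.t.
M_{t+1} = (1 + r)(M_t − C_t) + w H_t L_t
H_{t+1} = (1 − δ) H_t + L_t
L_t ∈ {0, 1},   0 < C_t ≤ M_t

Here M_t is beginning-of-period assets and H_t is human capital; labor income w H_t L_t arrives at the end of the period, and a stochastic wage process would add an expectation E_t in front of the continuation value.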

Until next

• Ensure that you understand:
  • Algorithm 11
  • How to set up a Bellman equation
• Go to PadLet and ask or answer a question
  (https://padlet.com/jeppe druedahl/dynamic programming)
• Think about: What is the problem with having, respectively:
  1 Multiple states
  2 Multiple choices
  3 Multiple shocks

