
Chapter 3.

Dynamic Programming

This chapter introduces basic ideas and methods of dynamic programming. 1 It sets
out the basic elements of a recursive optimization problem, describes the functional
equation (the Bellman equation), presents three methods for solving the Bellman
equation, and gives the Benveniste-Scheinkman formula for the derivative of the optimal value function. Let’s dive in.

3.1. Sequential problems


Let β ∈ (0, 1) be a discount factor. We want to choose an infinite sequence of “controls” {u_t}_{t=0}^∞ to maximize

∑_{t=0}^∞ β^t r(x_t, u_t),     (3.1.1)

subject to x_{t+1} = g(x_t, u_t), with x_0 given. We assume that r(x_t, u_t) is a concave
function and that the set {(x_{t+1}, x_t) : x_{t+1} ≤ g(x_t, u_t), u_t ∈ R^k} is convex and
compact. Dynamic programming seeks a time-invariant policy function h mapping
the state x_t into the control u_t, such that the sequence {u_s}_{s=0}^∞ generated by
iterating the two functions
u_t = h(x_t),
x_{t+1} = g(x_t, u_t),     (3.1.2)
starting from initial condition x0 at t = 0 solves the original problem. A solution in
the form of equations (3.1.2 ) is said to be recursive. To find the policy function h we
need to know another function V (x) that expresses the optimal value of the original
problem, starting from an arbitrary initial condition x ∈ X. This is called the value function.

1 This chapter is written in the hope of getting the reader to start using the methods
quickly. We hope to promote demand for further and more rigorous study of the
subject. In particular see Bertsekas (1976), Bertsekas and Shreve (1978), Stokey and
Lucas (with Prescott) (1989), Bellman (1957), and Chow (1981). This chapter covers
much of the same material as Sargent (1987b, chapter 1).


In particular, define




V(x_0) = max_{{u_s}_{s=0}^∞} ∑_{t=0}^∞ β^t r(x_t, u_t),     (3.1.3)

where again the maximization is subject to x_{t+1} = g(x_t, u_t), with x_0 given. Of course, we cannot possibly expect to know V(x_0) until after we have solved the problem, but let’s proceed on faith. If we knew V(x_0), then the policy function h
could be computed by solving for each x ∈ X the problem

max_u {r(x, u) + βV(x̃)},     (3.1.4)

where the maximization is subject to x̃ = g(x, u) with x given, and x̃ denotes the
state next period. Thus, we have exchanged the original problem of finding an infinite
sequence of controls that maximizes expression (3.1.1 ) for the problem of finding the
optimal value function V (x) and a function h that solves the continuum of maximum
problems (3.1.4 )—one maximum problem for each value of x. This exchange doesn’t
look like progress, but we shall see that it often is.
Our task has become jointly to solve for V (x), h(x), which are linked by the
Bellman equation
V(x) = max_u {r(x, u) + βV[g(x, u)]}.     (3.1.5)

The maximizer of the right side of equation (3.1.5 ) is a policy function h(x) that
satisfies
V (x) = r [x, h (x)] + βV {g [x, h (x)]}. (3.1.6)
Equation (3.1.5) or (3.1.6) is a functional equation to be solved for the pair of unknown functions V(x), h(x).
Methods for solving the Bellman equation are based on mathematical structures
that vary in their details depending on the precise nature of the functions r and g . 2
2 There are alternative sets of conditions that make the maximization (3.1.4 ) well
behaved. One set of conditions is as follows: (1) r is concave and bounded, and
(2) the constraint set generated by g is convex and compact, that is, the set of
{(xt+1 , xt ) : xt+1 ≤ g(xt , ut )} for admissible ut is convex and compact. See Stokey,
Lucas, and Prescott (1989), and Bertsekas (1976) for further details of convergence
results. See Benveniste and Scheinkman (1979) and Stokey, Lucas, and Prescott
(1989) for the results on differentiability of the value function. In an appendix on
functional analysis, chapter A, we describe the mathematics for one standard set of
assumptions about (r, g). In chapter 5, we describe it for another set of assumptions
about (r, g).

All of these structures contain versions of the following four findings. Under various
particular assumptions about r and g , it turns out that
1. The functional equation (3.1.5 ) has a unique strictly concave solution.
2. This solution is approached in the limit as j → ∞ by iterations on

V_{j+1}(x) = max_u {r(x, u) + βV_j(x̃)},     (3.1.7)

subject to x̃ = g(x, u), x given, starting from any bounded and continuous initial
V0 .
3. There is a unique and time invariant optimal policy of the form ut = h(xt ),
where h is chosen to maximize the right side of (3.1.5 ). 3
4. Off corners, the limiting value function V is differentiable with

V′(x) = (∂r/∂x)[x, h(x)] + β (∂g/∂x)[x, h(x)] V′{g[x, h(x)]}.     (3.1.8)

This is a version of a formula of Benveniste and Scheinkman (1979). We often encounter settings in which the transition law can be formulated so that the state x does not appear in it, so that ∂g/∂x = 0, which makes equation (3.1.8) become

V′(x) = (∂r/∂x)[x, h(x)].     (3.1.9)

At this point, we describe three broad computational strategies that apply in various contexts.

3 The time invariance of the policy function u_t = h(x_t) is very convenient econometrically, because we can impose a single decision rule for all periods. This lets us pool data across periods to estimate the free parameters of the return and transition functions that underlie the decision rule.

3.1.1. Three computational methods


There are three main types of computational methods for solving dynamic programs. All aim to solve the functional equation (3.1.5).

Value function iteration. The first method proceeds by constructing a sequence of value functions and associated policy functions. The sequence is created by iterating on the following equation, starting from V_0 = 0, and continuing until V_j has converged: 4

V_{j+1}(x) = max_u {r(x, u) + βV_j(x̃)},     (3.1.10)

subject to x̃ = g(x, u), x given. 5 This method is called value function iteration or iterating on the Bellman equation.
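To make the iteration concrete, here is a minimal Python sketch of value function iteration (3.1.10) on discretized state and control grids. It is not part of the text: the return function r, the transition g, the grids, and the linear interpolation of V_j between grid points are all placeholder choices, and r and g are assumed to accept a vector of controls.

```python
import numpy as np

def value_function_iteration(r, g, x_grid, u_grid, beta, tol=1e-8, max_iter=1000):
    """Iterate V_{j+1}(x) = max_u { r(x, u) + beta * V_j(g(x, u)) }, starting from V_0 = 0."""
    V = np.zeros(len(x_grid))                 # V_0 = 0
    policy = np.empty(len(x_grid))
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for i, x in enumerate(x_grid):
            # evaluate r(x, u) + beta * V_j(x~) for every control on the grid,
            # approximating V_j off the grid by linear interpolation
            x_next = np.clip(g(x, u_grid), x_grid[0], x_grid[-1])
            values = r(x, u_grid) + beta * np.interp(x_next, x_grid, V)
            best = np.argmax(values)
            V_new[i], policy[i] = values[best], u_grid[best]
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, policy
```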

Guess and verify. A second method involves guessing and verifying a solution
V to equation (3.1.5 ). This method relies on the uniqueness of the solution to the
equation, but because it relies on luck in making a good guess, it is not generally
available.

Howard’s improvement algorithm. A third method, known as policy function iteration or Howard’s improvement algorithm, consists of the following steps:

1. Pick a feasible policy, u = h_0(x), and compute the value associated with operating forever with that policy:

   V_{h_j}(x) = ∑_{t=0}^∞ β^t r[x_t, h_j(x_t)],

   where x_{t+1} = g[x_t, h_j(x_t)], with j = 0.


2. Generate a new policy u = h_{j+1}(x) that solves the two-period problem

   max_u {r(x, u) + βV_{h_j}[g(x, u)]},

   for each x.
3. Iterate over j to convergence on steps 1 and 2.
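The following Python sketch, again not from the text, implements these three steps on finite grids. Working with a finite state grid (and snapping g(x, u) to the nearest grid point, an assumption made purely for this sketch) turns step 1 into solving the linear system (I − βP_h)V_h = r_h rather than summing an infinite series.

```python
import numpy as np

def howard_improvement(r, g, x_grid, u_grid, beta, max_iter=100):
    """Policy iteration: evaluate a policy exactly on a grid, then improve it."""
    n = len(x_grid)
    snap = lambda x: int(np.abs(x_grid - x).argmin())   # nearest grid point
    policy = np.full(n, u_grid[0])                       # step 0: an arbitrary feasible policy h_0
    for _ in range(max_iter):
        # Step 1: policy evaluation.  V_h solves (I - beta * P_h) V_h = r_h.
        P = np.zeros((n, n))
        r_h = np.empty(n)
        for i, (x, u) in enumerate(zip(x_grid, policy)):
            P[i, snap(g(x, u))] = 1.0
            r_h[i] = r(x, u)
        V_h = np.linalg.solve(np.eye(n) - beta * P, r_h)

        # Step 2: policy improvement via the two-period problem at each x.
        new_policy = np.array([
            u_grid[int(np.argmax([r(x, u) + beta * V_h[snap(g(x, u))] for u in u_grid]))]
            for x in x_grid
        ])

        # Step 3: stop once the policy no longer changes.
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return V_h, policy
```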

4 See the appendix on functional analysis for what it means for a sequence of
functions to converge.
5 A proof of the uniform convergence of iterations on equation (3.1.10 ) is contained
in the appendix on functional analysis, chapter A.

In the appendix on functional analysis, chapter A, we describe some conditions under which the improvement algorithm converges to the solution of Bellman’s equation. The method often converges faster than does value function iteration (e.g., see exercise 3.1 at the end of this chapter). 6 The policy improvement algorithm is also a building block for the methods for studying government policy to be described in chapter 22.
Each of these methods has its uses. Each is “easier said than done,” because it is
typically impossible analytically to compute even one iteration on equation (3.1.10 ).
This fact thrusts us into the domain of computational methods for approximating
solutions: pencil and paper are insufficient. The following chapter describes some
computational methods that can be used for problems that cannot be solved by hand.
Here we shall describe the first of two special types of problems for which analytical
solutions can be obtained. It involves Cobb-Douglas constraints and logarithmic
preferences. Later in chapter 5, we shall describe a specification with linear constraints
and quadratic preferences. For that special case, many analytic results are available.
These two classes have been important in economics as sources of examples and as
inspirations for approximations.

3.1.2. Cobb-Douglas transition, logarithmic preferences


Brock and Mirman (1972) used the following optimal growth example. 7 A planner chooses sequences {c_t, k_{t+1}}_{t=0}^∞ to maximize

∑_{t=0}^∞ β^t ln(c_t)

subject to a given value for k0 and a transition law

k_{t+1} + c_t = A k_t^α,     (3.1.11)

where A > 0, α ∈ (0, 1), β ∈ (0, 1).

6 The quickness of the policy improvement algorithm is linked to its being an implementation of Newton’s method, which converges quadratically while iteration on the Bellman equation converges at a linear rate. See chapter 4 and the appendix on functional analysis, chapter A.
7 See also Levhari and Srinivasan (1969).

This problem can be solved “by hand,” using any of our three methods. We begin with iteration on the Bellman equation. Start with v_0(k) = 0, and solve the one-period problem: choose c to maximize ln(c) subject to c + k̃ = Ak^α. The solution is evidently to set c = Ak^α, k̃ = 0, which produces an optimized value v_1(k) = ln A + α ln k. At the second step, we find c = [1/(1 + βα)] Ak^α, k̃ = [βα/(1 + βα)] Ak^α, and v_2(k) = ln[A/(1 + αβ)] + β ln A + αβ ln[αβA/(1 + αβ)] + α(1 + αβ) ln k. Continuing, and using the algebra of geometric series, gives the limiting policy functions c = (1 − βα)Ak^α, k̃ = βαAk^α, and the value function

v(k) = (1 − β)^{−1} {ln[A(1 − βα)] + [βα/(1 − βα)] ln(Aβα)} + [α/(1 − βα)] ln k.
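As a check on this algebra, the short Python script below (with illustrative parameter values that are not from the text) iterates the Bellman equation on value functions of the form v_j(k) = E_j + F_j ln k. The coefficient recursion in the loop follows from the same first-order condition used above, and the iterates converge to the limiting E and F in the value function just derived.

```python
import numpy as np

# Illustrative (not from the text) parameter values.
A, alpha, beta = 1.0, 0.33, 0.95

# Coefficients after the first iteration: v_1(k) = ln A + alpha * ln k.
E, F = np.log(A), alpha
for _ in range(1000):
    # One Bellman iteration maps (E_j, F_j) into (E_{j+1}, F_{j+1}):
    #   E_{j+1} = beta*E_j + ln[A/(1+beta*F_j)] + beta*F_j*ln[beta*F_j*A/(1+beta*F_j)]
    #   F_{j+1} = alpha*(1 + beta*F_j)
    E = beta * E + np.log(A / (1 + beta * F)) + beta * F * np.log(beta * F * A / (1 + beta * F))
    F = alpha * (1 + beta * F)

# Limits derived in the text.
F_star = alpha / (1 - alpha * beta)
E_star = (np.log(A * (1 - alpha * beta))
          + (beta * alpha / (1 - alpha * beta)) * np.log(A * beta * alpha)) / (1 - beta)
print(np.allclose([E, F], [E_star, F_star]))          # True
```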
Here is how the guess-and-verify method applies to this problem. Since we already
know the answer, we’ll guess a function of the correct form, but leave its coefficients
undetermined. 8 Thus, we make the guess

v (k) = E + F ln k, (3.1.12)

where E and F are undetermined constants. The left and right sides of equation
(3.1.12 ) must agree for all values of k . For this guess, the first-order necessary
condition for the maximum problem on the right side of equation (3.1.10 ) implies the
following formula for the optimal policy k̃ = h(k), where k̃ is next period’s value and
k is this period’s value of the capital stock:
k̃ = [βF/(1 + βF)] Ak^α.     (3.1.13)
Substitute equation (3.1.13 ) into the Bellman equation and equate the result to the
right side of equation (3.1.12 ). Solving the resulting equation for E and F gives
F = α/(1 − αβ) and E = (1 − β)^{−1}[ln A(1 − αβ) + (βα/(1 − αβ)) ln Aβα]. It follows that

k̃ = βαA k^α.     (3.1.14)

Note that the term F = α/(1 − αβ) can be interpreted as a geometric sum α[1 + αβ + (αβ)^2 + · · ·].
Equation (3.1.14) shows that the optimal policy is to have capital move according to the difference equation k_{t+1} = Aβα k_t^α, or ln k_{t+1} = ln Aβα + α ln k_t. That α is less than 1 implies that k_t converges as t approaches infinity for any positive initial value k_0. The stationary point is given by the solution of k_∞ = Aβα k_∞^α, or k_∞ = (Aβα)^{1/(1−α)}.
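A few lines of Python, with purely illustrative parameter values, confirm this convergence numerically:

```python
import numpy as np

# Illustrative (not from the text) parameter values.
A, alpha, beta = 1.0, 0.33, 0.95
k_star = (A * beta * alpha) ** (1 / (1 - alpha))     # stationary point of k_{t+1} = A*beta*alpha*k_t**alpha

for k0 in (0.01, 0.5, 5.0):                          # several positive starting values
    k = k0
    for _ in range(200):                             # iterate the optimal policy forward
        k = A * beta * alpha * k ** alpha
    print(k0, np.isclose(k, k_star))                 # converges to k_star in every case
```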

8 This is called the method of undetermined coefficients.



3.1.3. Euler equations


In many problems, there is no unique way of defining states and controls, and several alternative definitions lead to the same solution of the problem. Sometimes the states and controls can be defined in such a way that x_t does not appear in the transition equation, so that ∂g_t/∂x_t ≡ 0. In this case, the first-order condition for the problem on the right side of the Bellman equation in conjunction with the Benveniste-Scheinkman formula implies

(∂r_t/∂u_t)(x_t, u_t) + β (∂g_t/∂u_t)(u_t) · (∂r_{t+1}/∂x_{t+1})(x_{t+1}, u_{t+1}) = 0,     x_{t+1} = g_t(u_t).
The first equation is called an Euler equation. Under circumstances in which the
second equation can be inverted to yield ut as a function of xt+1 , using the second
equation to eliminate ut from the first equation produces a second-order difference
equation in xt , since eliminating ut+1 brings in xt+2 .

3.1.4. A sample Euler equation


As an example of an Euler equation, consider the Ramsey problem of choosing {c_t, k_{t+1}}_{t=0}^∞ to maximize ∑_{t=0}^∞ β^t u(c_t) subject to c_t + k_{t+1} = f(k_t), where k_0 is given and the one-period utility function satisfies u′(c) > 0, u′′(c) < 0, lim_{c_t ↓ 0} u′(c_t) = +∞; and where f′(k) > 0, f′′(k) < 0. Let the state be k and the control be k̃, where k̃ denotes next period’s value of k. Substitute c = f(k) − k̃ into the utility function and express the Bellman equation as

v(k) = max_{k̃} {u[f(k) − k̃] + βv(k̃)}.     (3.1.15)

Application of the Benveniste-Scheinkman formula gives

v′(k) = u′[f(k) − k̃] f′(k).     (3.1.16)
Notice that the first-order condition for the maximum problem on the right side of equation (3.1.15) is −u′[f(k) − k̃] + βv′(k̃) = 0, which, using equation (3.1.16), gives

u′[f(k) − k̃] = βu′[f(k̃) − k̂] f′(k̃),     (3.1.17)

where k̂ denotes the “two-period-ahead” value of k. Equation (3.1.17) can be expressed as

1 = β [u′(c_{t+1})/u′(c_t)] f′(k_{t+1}),
an Euler equation that is exploited extensively in the theories of finance, growth, and
real business cycles.
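For a concrete check, the closed-form policy from section 3.1.2 (log utility and Cobb-Douglas f) should satisfy this Euler equation exactly. A short Python verification, with parameter values chosen purely for illustration:

```python
import numpy as np

# Illustrative (not from the text) parameter values for u(c) = ln c, f(k) = A*k**alpha.
A, alpha, beta = 1.0, 0.33, 0.95

f = lambda k: A * k ** alpha                         # production function
f_prime = lambda k: alpha * A * k ** (alpha - 1)     # its derivative
u_prime = lambda c: 1.0 / c                          # marginal utility of log preferences

k = 0.2                                              # any positive capital stock
k_next = beta * alpha * A * k ** alpha               # closed-form policy from section 3.1.2
k_next2 = beta * alpha * A * k_next ** alpha
c, c_next = f(k) - k_next, f(k_next) - k_next2       # implied consumption levels

euler = beta * u_prime(c_next) / u_prime(c) * f_prime(k_next)
print(np.isclose(euler, 1.0))                        # True: the Euler equation holds
```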

3.2. Stochastic control problems


We now consider a modification of problem (3.1.1 ) to permit uncertainty. Essentially,
we add some well-placed shocks to the previous non-stochastic problem. So long as the
shocks are either independently and identically distributed or Markov, straightforward
modifications of the method for handling the nonstochastic problem will work.
Thus, we modify the transition equation and consider the problem of maximizing

E_0 ∑_{t=0}^∞ β^t r(x_t, u_t),     0 < β < 1,     (3.2.1)

subject to

x_{t+1} = g(x_t, u_t, ε_{t+1}),     (3.2.2)

with x_0 known and given at t = 0, where ε_t is a sequence of independently and identically distributed random variables with cumulative probability distribution function prob{ε_t ≤ e} = F(e) for all t; E_t(y) denotes the mathematical expectation of a random variable y, given information known at t. At time t, x_t is assumed to be known, but x_{t+j}, j ≥ 1, is not known at t. That is, ε_{t+1} is realized at (t + 1), after u_t has been chosen at t. In problem (3.2.1)–(3.2.2), uncertainty is injected by assuming that x_t follows a random difference equation.
Problem (3.2.1 )–(3.2.2 ) continues to have a recursive structure, stemming jointly
from the additive separability of the objective function (3.2.1 ) in pairs (xt , ut ) and
from the difference equation characterization of the transition law (3.2.2). In particular, controls dated t affect returns r(x_s, u_s) for s ≥ t but not earlier. This feature
implies that dynamic programming methods remain appropriate.
The problem is to maximize expression (3.2.1 ) subject to equation (3.2.2 ) by
choice of a “policy” or “contingency plan” ut = h(xt ). The Bellman equation (3.1.5 )
becomes
V(x) = max_u {r(x, u) + βE[V[g(x, u, ε)] | x]},     (3.2.3)

where E{V[g(x, u, ε)] | x} = ∫ V[g(x, u, ε)] dF(ε) and where V(x) is the optimal value of the problem starting from x at t = 0. The solution V(x) of equation (3.2.3) can
be computed by iterating on

V_{j+1}(x) = max_u {r(x, u) + βE[V_j[g(x, u, ε)] | x]},     (3.2.4)

starting from any bounded continuous initial V_0. Under various particular regularity conditions, there obtain versions of the same four properties listed earlier. 9
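As in the deterministic case, a grid-based sketch of the iteration (3.2.4) needs only one extra ingredient: an approximation to the conditional expectation, here taken as an average over a finite set of shock draws. Everything in the Python sketch below (the Monte Carlo approximation of E, the interpolation, the assumption that g accepts a vector of shocks) is an illustrative choice, not part of the text.

```python
import numpy as np

def stochastic_vfi(r, g, x_grid, u_grid, eps_draws, beta, tol=1e-8, max_iter=1000):
    """Iterate V_{j+1}(x) = max_u { r(x,u) + beta * E[V_j(g(x,u,eps)) | x] },
    approximating the expectation by an average over the supplied shock draws."""
    V = np.zeros(len(x_grid))                      # V_0 = 0
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for i, x in enumerate(x_grid):
            values = np.empty(len(u_grid))
            for j, u in enumerate(u_grid):
                x_next = np.clip(g(x, u, eps_draws), x_grid[0], x_grid[-1])
                values[j] = r(x, u) + beta * np.mean(np.interp(x_next, x_grid, V))
            V_new[i] = values.max()
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new
```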
The first-order necessary condition for the problem on the right side of equation (3.2.3) is

(∂r/∂u)(x, u) + βE[(∂g/∂u)(x, u, ε) V′[g(x, u, ε)] | x] = 0,
which we obtained simply by differentiating the right side of equation (3.2.3), passing the differentiation operation under the E (an integration) operator. Off corners, the value function satisfies

V′(x) = (∂r/∂x)[x, h(x)] + βE{(∂g/∂x)[x, h(x), ε] V′(g[x, h(x), ε]) | x}.

In the special case in which ∂g/∂x ≡ 0, the formula for V′(x) becomes

V′(x) = (∂r/∂x)[x, h(x)].
Substituting this formula into the first-order necessary condition for the problem gives the stochastic Euler equation

(∂r/∂u)(x, u) + βE[(∂g/∂u)(x, u, ε) (∂r/∂x)(x̃, ũ) | x] = 0,

where tildes over x and u denote next-period values.

9 See Stokey and Lucas (with Prescott) (1989), or the framework presented in the
appendix on functional analysis, chapter A.

3.3. Concluding remarks


This chapter has put forward basic tools and findings: the Bellman equation and several approaches to solving it; the Euler equation; and the Benveniste-Scheinkman formula. To appreciate and believe in the power of these tools requires more words and more practice than we have yet supplied. In the next several chapters, we put the basic tools to work in different contexts with particular specifications of return and transition equations designed to render the Bellman equation susceptible to further analysis and computation.

Exercise

Exercise 3.1 Howard’s policy iteration algorithm


Consider the Brock-Mirman problem: to maximize

E_0 ∑_{t=0}^∞ β^t ln c_t,

subject to c_t + k_{t+1} ≤ A k_t^α θ_t, k_0 given, A > 0, 1 > α > 0, where {θ_t} is an i.i.d. sequence with ln θ_t distributed according to a normal distribution with mean zero and variance σ².
Consider the following algorithm. Guess at a policy of the form k_{t+1} = h_0(A k_t^α θ_t) for any constant h_0 ∈ (0, 1). Then form

J_0(k_0, θ_0) = E_0 ∑_{t=0}^∞ β^t ln(A k_t^α θ_t − h_0 A k_t^α θ_t).

Next choose a new policy h_1 by maximizing

ln(A k^α θ − k′) + βE J_0(k′, θ′),

where k′ = h_1 A k^α θ. Then form




J_1(k_0, θ_0) = E_0 ∑_{t=0}^∞ β^t ln(A k_t^α θ_t − h_1 A k_t^α θ_t).

Continue iterating on this scheme until successive h_j have converged.


Show that, for the present example, this algorithm converges to the optimal
policy function in one step.
