MA 668 2024 Lecture 23
3 Next, we aim to show that the inequality above can be reversed. Take an
arbitrary admissible control u ∈ A and consider what is known as an
ϵ-optimal control, denoted by v^ϵ ∈ A and defined as a control which
performs at least as well as H(t, x) − ϵ, but of course no better than
H(t, x), i.e., a control such that

H(t, x) ≥ H^{v^ϵ}(t, x) ≥ H(t, x) − ϵ.    (2)
The Dynamic Programming Principle (Contd ...)
1 Such a control exists, assuming that the value function is continuous in
the space of controls.
2 Consider next the following modification of the ϵ-optimal control:

ṽ^ϵ = u_t 1_{t ≤ τ} + v^ϵ 1_{t > τ}.    (3)
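The modified control follows u up to the stopping time τ and then switches to the ϵ-optimal control. A short derivation of the resulting lower bound (a sketch, combining admissibility of ṽ^ϵ, the tower property of conditional expectation, and the lower bound in (2)):

```latex
H(t,x) \;\ge\; H^{\tilde v^{\epsilon}}(t,x)
      \;=\; \mathbb{E}_{t,x}\!\left[\int_t^{\tau} F(s, X^{u}_s, u_s)\,ds
            + H^{v^{\epsilon}}(\tau, X^{u}_{\tau})\right]
      \;\ge\; \mathbb{E}_{t,x}\!\left[\int_t^{\tau} F(s, X^{u}_s, u_s)\,ds
            + H(\tau, X^{u}_{\tau})\right] - \epsilon .
```

Since ϵ > 0 is arbitrary, the −ϵ term can be dropped, and the resulting bound can then be optimised over u ∈ A.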
2 Moreover, since the above holds true for every u ∈ A, we have that:

H(t, x) ≥ sup_{u ∈ A} E_{t,x} [ H(τ, X^u_τ) + ∫_t^τ F(s, X^u_s, u_s) ds ].    (4)
3 The upper bound (1) and lower bound (4) form the dynamic programming
inequalities. Putting them together, we obtain the following Theorem.
Theorem (Dynamic Programming Principle for Diffusions)
The value function satisfies the DPP:

H(t, x) = sup_{u ∈ A} E_{t,x} [ H(τ, X^u_τ) + ∫_t^τ F(s, X^u_s, u_s) ds ],    (5)

for every stopping time τ ∈ [t, T].
1 This equation is really a family of equations, one for each stopping time
τ, tying the value function to its expected future value plus the
accumulated running reward/penalty.
2 Since it is a whole family of equations, an even more powerful equation can
be obtained by looking at its infinitesimal version, the so-called dynamic
programming equation (DPE).
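The content of (5) can be checked concretely in discrete time, where backward induction computes the value function exactly. Below is a minimal sketch on a hypothetical toy problem (the dynamics, rewards, and all names are illustrative, not from the lecture): the value at time 0 coincides with optimising the running reward up to an intermediate time and adding the value function there, a two-step analogue of (5).

```python
# Toy discrete-time control problem (hypothetical): X_{k+1} = X_k + u_k with
# u_k in {-1, 0, 1}, running reward F(k, x, u) = -u^2, terminal reward
# G(x) = -x^2, horizon T = 3.

T = 3
STATES = range(-5, 6)
CONTROLS = (-1, 0, 1)

def F(k, x, u):          # running reward: penalise control effort
    return -u * u

def G(x):                # terminal reward: penalise distance from 0
    return -x * x

# Backward induction: H[k][x] = max_u [ F(k, x, u) + H[k+1][x + u] ]
H = {T: {x: G(x) for x in STATES}}
for k in range(T - 1, -1, -1):
    H[k] = {}
    for x in STATES:
        H[k][x] = max(F(k, x, u) + H[k + 1].get(x + u, float("-inf"))
                      for u in CONTROLS)

# DPP check with intermediate time tau = 2: from (0, x), the value equals the
# sup over the first two controls of the accumulated running reward plus the
# value function H[2] at the reached state -- a two-step analogue of (5).
def dpp_two_step(x):
    best = float("-inf")
    for u0 in CONTROLS:
        for u1 in CONTROLS:
            x1, x2 = x + u0, x + u0 + u1
            if x1 in H[1] and x2 in H[2]:
                best = max(best, F(0, x, u0) + F(1, x1, u1) + H[2][x2])
    return best

print(all(H[0][x] == dpp_two_step(x) for x in range(-3, 4)))  # prints True
```

Exact integer arithmetic makes the two sides agree exactly here; in continuous time, (5) states the same identity in expectation over the controlled diffusion.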
DPE/HJB Equation
The DPE is an infinitesimal version of the dynamic programming principle
(DPP). There are two key ideas involved:
Idea 1
Setting the stopping time τ in the DPP to be the minimum of:
(a) the first time the process X^u exits a ball of radius ϵ around its
starting point, AND
(b) a fixed (small) time h,
all while keeping it bounded by T.
This is illustrated in Figure 5.2 and can be stated precisely as:
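The stopping time in Idea 1 can also be sketched in simulation. The following minimal Monte Carlo sketch (hypothetical parameters; an Euler scheme with a constant drift u stands in for the controlled diffusion) samples τ as the minimum of the first exit time of the ϵ-ball, the fixed small time t + h, and the horizon T:

```python
import random

def sample_tau(t=0.0, x=0.0, eps=0.5, h=0.2, T=1.0, u=0.1, dt=1e-3, rng=None):
    """Sample tau = min(first exit time of the eps-ball around x, t + h, T)."""
    rng = rng or random.Random(0)
    s, xs = t, x
    bound = min(t + h, T)             # cap by the small fixed time h and by T
    while s < bound:
        xs += u * dt + dt ** 0.5 * rng.gauss(0.0, 1.0)  # Euler step of dX = u dt + dW
        s += dt
        if abs(xs - x) >= eps:        # first exit of the eps-ball around the start
            return min(s, bound)
    return bound                      # no exit before the cap

print(0.0 < sample_tau() <= 0.2)     # tau is always capped by min(t + h, T)
```

By construction τ is bounded: with a very large ϵ the ball is never left and τ equals min(t + h, T), which is exactly the role of the cap in Idea 1.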