
MA668: Algorithmic and High Frequency Trading

Lecture 23

Prof. Siddhartha Pratim Chakrabarty


Department of Mathematics
Indian Institute of Technology Guwahati
The Dynamic Programming Principle (Contd ...)
1 Note that on the right-hand side of the above, the arbitrary control $u$ only acts over the interval $[t, \tau]$, while the optimal one is implicitly incorporated in the value function $H(\tau, X^u_\tau)$, but starting at the point to which the arbitrary control $u$ caused the process $X$ to flow, namely $X^u_\tau$.
2 Taking the supremum over admissible strategies on the left-hand side, so that the left-hand side also reduces to the value function, we have that:
$$H(t,x) \;\le\; \sup_{u\in\mathcal{A}} \mathbb{E}_{t,x}\!\left[ H(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right]. \tag{1}$$

3 Next, we aim to show that the inequality above can be reversed. Take an arbitrary admissible control $u \in \mathcal{A}$ and consider what is known as an $\epsilon$-optimal control, denoted by $v^\epsilon \in \mathcal{A}$ and defined as a control which performs at least as well as $H(t,x) - \epsilon$, but of course no better than $H(t,x)$, i.e., a control such that
$$H(t,x) \;\ge\; H^{v^\epsilon}(t,x) \;\ge\; H(t,x) - \epsilon. \tag{2}$$
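A toy illustration (not from the lecture) of why one works with $\epsilon$-optimal controls: the supremum defining the value function need not be attained by any admissible control. For instance, if the admissible controls were constants $u \in [0,1)$ and the performance criterion were simply $H^u(t,x) = u$, then
$$H(t,x) = \sup_{u\in[0,1)} H^u(t,x) = 1$$
is not achieved by any control, yet $v^\epsilon := 1-\epsilon$ satisfies $H^{v^\epsilon}(t,x) = 1-\epsilon \ge H(t,x) - \epsilon$, so (2) holds for every $\epsilon \in (0,1)$.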
The Dynamic Programming Principle (Contd ...)
1 Such a control exists, assuming that the value function is continuous in
the space of controls.
2 Consider next the following modification of the $\epsilon$-optimal control, which follows the arbitrary control $u$ up to the stopping time $\tau$ and the $\epsilon$-optimal control thereafter:
$$\tilde{v}^\epsilon = u_t\,\mathbb{1}_{t\le\tau} + v^\epsilon\,\mathbb{1}_{t>\tau}, \tag{3}$$
i.e., the modification is $\epsilon$-optimal after the stopping time $\tau$, but potentially sub-optimal on the interval $[t, \tau]$. Then we have:
$$\begin{aligned}
H(t,x) \;\ge\; H^{\tilde{v}^\epsilon}(t,x)
&= \mathbb{E}_{t,x}\!\left[ H^{\tilde{v}^\epsilon}\big(\tau, X^{\tilde{v}^\epsilon}_\tau\big) + \int_t^\tau F\big(s, X^{\tilde{v}^\epsilon}_s, \tilde{v}^\epsilon_s\big)\,ds \right] \\
&= \mathbb{E}_{t,x}\!\left[ H^{v^\epsilon}(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right] \qquad \text{(using (3))} \\
&\ge \mathbb{E}_{t,x}\!\left[ H(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right] - \epsilon. \qquad \text{(by (2))}
\end{aligned}$$
The Dynamic Programming Principle (Contd ...)
1 Taking the limit as $\epsilon \downarrow 0$, we have
$$H(t,x) \;\ge\; \mathbb{E}_{t,x}\!\left[ H(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right].$$

2 Moreover, since the above holds true for every $u \in \mathcal{A}$, we have that:
$$H(t,x) \;\ge\; \sup_{u\in\mathcal{A}} \mathbb{E}_{t,x}\!\left[ H(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right]. \tag{4}$$

3 The upper bound (1) and lower bound (4) form the dynamic programming
inequalities. Putting them together, we obtain the following Theorem.
The Dynamic Programming Principle (Contd ...)

Theorem

Dynamic Programming Principle for Diffusions. The value function satisfies the DPP:
$$H(t,x) \;=\; \sup_{u\in\mathcal{A}} \mathbb{E}_{t,x}\!\left[ H(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right], \tag{5}$$
for all $(t,x) \in [0,T] \times \mathbb{R}^n$ and all stopping times $\tau \le T$.

1 This equation is really a family of equations, one for each stopping time $\tau \le T$, tying the value function to its expected future value plus the running reward/penalty.
2 Since it is a whole family of equations, an even more powerful equation can be found by looking at its infinitesimal version, the so-called DPE. (A small numerical sketch of the recursion in (5) follows below.)
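To make the recursive structure of (5) concrete, here is a minimal backward-induction sketch, not from the lecture: the dynamics $dX_t = u_t\,dt + \sigma\,dW_t$, the running reward $F(t,x,u) = -(x^2 + \kappa u^2)$, the terminal value $-x^2$, the finite control grid and all parameter values are illustrative assumptions. With deterministic stopping times $\tau = t + \Delta t$ and a two-point approximation of the diffusion increment, (5) becomes a simple backward recursion on a grid.

```python
import numpy as np

# Minimal backward-induction sketch of the DPP (illustrative assumptions only):
#   dynamics       dX_t = u_t dt + sigma dW_t   (two-point approximation of the increment),
#   running reward F(t, x, u) = -(x**2 + kappa * u**2),
#   terminal value H(T, x) = -x**2,
#   controls restricted to a small finite grid.
T, N = 1.0, 50                      # horizon and number of time steps
dt = T / N
sigma, kappa = 0.5, 0.1
x_grid = np.linspace(-3.0, 3.0, 121)
u_grid = np.linspace(-2.0, 2.0, 21)

H = -x_grid**2                      # terminal condition H(T, x)

for _ in range(N):                  # N backward steps from T down to 0
    H_new = np.empty_like(H)
    for i, x in enumerate(x_grid):
        best = -np.inf
        for u in u_grid:
            # two-point approximation of X one time step ahead
            x_up = x + u * dt + sigma * np.sqrt(dt)
            x_dn = x + u * dt - sigma * np.sqrt(dt)
            cont = 0.5 * (np.interp(x_up, x_grid, H) + np.interp(x_dn, x_grid, H))
            # DPP with tau = t + dt: running reward over [t, t+dt] plus expected future value
            best = max(best, -(x**2 + kappa * u**2) * dt + cont)
        H_new[i] = best
    H = H_new

print("Approximate value H(0, 0):", H[x_grid.size // 2])
```

The inner maximisation over `u_grid` mirrors the supremum over admissible controls in (5), and the interpolation step plays the role of the conditional expectation $\mathbb{E}_{t,x}[H(\tau, X^u_\tau)]$.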
DPE/HJB Equation
The DPE is an infinitesimal version of the dynamic programming principle
(DPP). There are two key ideas involved:

Idea 1
Setting the stopping time $\tau$ in the DPP to be the minimum between:
(a) the time it takes for the process $X^u$ to exit a ball of size $\epsilon$ around its starting point, AND
(b) a fixed (small) time $h$,
all while keeping it bounded by $T$. This can be viewed in Figure 5.2 and can be stated precisely as:
$$\tau = T \wedge \inf\left\{ s > t : \big(s - t,\, |X^u_s - x|\big) \notin [0,h) \times [0,\epsilon) \right\}.$$

Notice that as $h \downarrow 0$, $\tau \downarrow t$ a.s., and that $\tau = t + h$ whenever $h$ is sufficiently small, since as the time span $h$ shrinks it becomes less and less likely that $X$ will exit the ball first. (The short simulation sketch below illustrates this.)
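A small simulation sketch of that claim (purely illustrative: it assumes the controlled process behaves locally like a standard Brownian motion started at $x$, and the values of $\epsilon$, $h$ and the discretisation are arbitrary choices): as $h$ shrinks, the fraction of paths leaving the $\epsilon$-ball before time $t + h$ vanishes, so $\tau = t + h$ on almost every path.

```python
import numpy as np

# Illustrative simulation (assumes the process behaves locally like a standard
# Brownian motion started at x; eps, h and the discretisation are arbitrary choices).
rng = np.random.default_rng(0)
eps, n_paths, n_steps = 0.1, 10_000, 200

for h in [0.01, 0.005, 0.001, 0.0005]:
    dt = h / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    increments = np.cumsum(dW, axis=1)          # X_s - x on the window [t, t+h]
    exited = np.abs(increments).max(axis=1) >= eps
    print(f"h = {h:7.4f}:  fraction of paths exiting the eps-ball before t+h = {exited.mean():.4f}")
```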
[Figure 5.2: image not included]


Idea 2
1 Writing the value function (for an arbitrary admissible control $u$) at the stopping time $\tau$ in terms of the value function at $t$ using Itô's lemma. Specifically, assuming enough regularity of the value function, we can write:
$$H(\tau, X^u_\tau) = H(t,x) + \int_t^\tau (\partial_s + \mathcal{L}^u_s)\,H(s, X^u_s)\,ds + \int_t^\tau D_x H(s, X^u_s)'\,\sigma^u_s\,dW_s, \tag{6}$$
where $\sigma^u_t := \sigma(t, X^u_t, u_t)$, $\mathcal{L}^u_t$ represents the infinitesimal generator of $X^u_t$, and $D_x H(\cdot)$ denotes the vector of partial derivatives with components $[D_x H(\cdot)]_i = \partial_{x_i} H(\cdot)$.
2 For example, in the one-dimensional case:
$$\mathcal{L}^u_t = \mu^u_t\,\partial_x + \tfrac{1}{2}\,(\sigma^u_t)^2\,\partial_{xx} = \mu(t,x,u)\,\partial_x + \tfrac{1}{2}\,\sigma^2(t,x,u)\,\partial_{xx}.$$
(A small numerical sanity check of this generator is sketched below.)
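As a sanity check on this notation (a hedged sketch, not part of the lecture: the dynamics $dX_t = u\,dt + \sigma\,dW_t$ with a constant control $u$, and the test function $H(t,x) = e^{-t}x^2$, are illustrative assumptions), taking expectations in (6) suggests that $\big(\mathbb{E}_{t,x}[H(t+h, X_{t+h})] - H(t,x)\big)/h \to (\partial_t + \mathcal{L}^u)H(t,x) = e^{-t}(-x^2 + 2ux + \sigma^2)$ as $h \downarrow 0$. Since $X_{t+h}$ is Gaussian here, the left-hand side is available in closed form:

```python
import numpy as np

# Hedged check of (d_t + L^u)H for the assumed dynamics dX = u dt + sigma dW
# (constant control u) and the illustrative test function H(t, x) = exp(-t) * x**2.
t, x, u, sigma = 0.0, 1.0, 0.3, 0.5   # illustrative values

def expected_H(h):
    # X_{t+h} ~ Normal(x + u*h, sigma**2 * h), so E[X_{t+h}**2] = mean**2 + var.
    mean, var = x + u * h, sigma**2 * h
    return np.exp(-(t + h)) * (mean**2 + var)

H_tx = np.exp(-t) * x**2
exact = np.exp(-t) * (-x**2 + 2 * u * x + sigma**2)   # (d_t + L^u)H(t, x)
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    rate = (expected_H(h) - H_tx) / h
    print(f"h = {h:6.0e}:  (E[H] - H)/h = {rate:+.6f}   generator value = {exact:+.6f}")
```

For the assumed values the finite-$h$ quotient converges to the generator value (here $-0.15$) as $h$ shrinks, which is exactly the infinitesimal statement that the DPE exploits.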
DPE/HJB Equation (Contd ...)
1 As before, we derive the DPE in two stages by obtaining two inequalities.
First, taking v ∈ A to be constant over the interval [t, τ ], applying the
lower bound and substituting (6) into the right-hand side implies that:
 

$$\begin{aligned}
H(t,x) &\ge \sup_{u\in\mathcal{A}} \mathbb{E}_{t,x}\!\left[ H(\tau, X^u_\tau) + \int_t^\tau F(s, X^u_s, u_s)\,ds \right] \\
&\ge \mathbb{E}_{t,x}\!\left[ H(\tau, X^v_\tau) + \int_t^\tau F(s, X^v_s, v_s)\,ds \right] \\
&= \mathbb{E}_{t,x}\!\left[ H(t,x) + \int_t^\tau (\partial_s + \mathcal{L}^v_s)\,H(s, X^v_s)\,ds + \int_t^\tau D_x H(s, X^v_s)'\,\sigma^v_s\,dW_s + \int_t^\tau F(s, X^v_s, v_s)\,ds \right].
\end{aligned}$$
DPE/HJB Equation (Contd ...)
1 The integrand in the stochastic integral above, $D_x H(s, X^v_s)'\,\sigma^v_s$, is bounded on the interval $[t, \tau]$, since we have ensured that $|X^v_s - x| \le \epsilon$ on that interval.
2 Hence, this stochastic integral is the increment of a martingale and we can be assured that its expectation is zero.
3 Therefore:
$$H(t,x) \;\ge\; \mathbb{E}_{t,x}\!\left[ H(t,x) + \int_t^\tau \big\{ (\partial_s + \mathcal{L}^v_s)\,H(s, X^v_s) + F(s, X^v_s, v) \big\}\,ds \right],$$
and recall that $\tau = t + h$ for $h$ sufficiently small.


DPE/HJB Equation (Contd ...)
1 Moving the $H(t,x)$ on the left-hand side over to the right-hand side, dividing by $h$ and taking the limit as $h \downarrow 0$ yields (the resulting one-sided DPE is summarised after the justification below):
$$\begin{aligned}
0 &\ge \lim_{h\downarrow 0} \frac{1}{h}\,\mathbb{E}_{t,x}\!\left[ \int_t^\tau \big\{ (\partial_s + \mathcal{L}^v_s)\,H(s, X^v_s) + F(s, X^v_s, v) \big\}\,ds \right] \\
&= (\partial_t + \mathcal{L}^v_t)\,H(t,x) + F(t,x,v).
\end{aligned}$$

2 The second line follows from:
(i) as $h \downarrow 0$, $\tau = t + h$ a.s., since the process will not hit the barrier of $\epsilon$ in extremely short periods of time,
(ii) the condition that $|X^v_\tau - x| \le \epsilon$, which implies that if the process does hit the barrier it remains bounded,
(iii) the Mean-Value Theorem, which allows us to write $\displaystyle\lim_{h\downarrow 0} \frac{1}{h}\int_t^{t+h} \omega_s\,ds = \omega_t$, and
(iv) the fact that the process starts at $X^v_t = x$.
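Since the last inequality holds for every constant admissible control $v$, taking the supremum over $v$ gives one side of the DPE (a hedged summary of where the argument is heading; it is not stated explicitly on these slides, and the reverse inequality, which completes the HJB equation, is derived subsequently):
$$0 \;\ge\; \partial_t H(t,x) + \sup_{v} \big\{ \mathcal{L}^v_t H(t,x) + F(t,x,v) \big\}.$$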
