
A short introduction to
stochastic differential equations

Tobias Jahnke
Karlsruher Institut für Technologie
Fakultät für Mathematik
Institut für Angewandte und Numerische Mathematik
[email protected]

© Tobias Jahnke, Karlsruhe 2023/24

Version: October 19, 2023



1 Motivation
Financial markets trade investments into stocks of a company, commodities (e.g. oil, gold),
bonds, or derivatives. Bonds evolve in a predictable way, but stocks and commodities do
not. They are risky assets, because their value is affected by randomness. Our goal is to
model the price of such a risky asset.
Many processes in science, technology, and engineering can be described very accurately
with ordinary differential equations (ODEs)
dX/dt = f(t, X).
Here, X = X(t) is the time-dependent solution, and f = f (t, X) is a given function
which depends on what is supposed to be modeled. But the solution of an ODE has no
randomness – it is a purely deterministic object. If we know its value X(t0) at a given
time t0, then by solving the ODE we can compute its value X(t) at any later time t ≥ t0.
This is certainly not true for stocks. In order to model risky assets, we have to put some
randomness into the dynamics. A first and rather naïve approach to do so is to add a
term which generates “random noise”:
dX/dt = f(t, X) + g(t, X)Z(t).        (1)

Here f(t, X) is the right-hand side of an ordinary differential equation, while the term
g(t, X)Z(t) generates the “random noise”.

The next questions are obviously how to choose g(t, X), and how to define Z(t) in a
mathematically sound way. But even if these problems can be solved, Equation (1) is
dubious. The solution of an ODE is, by definition, a differentiable function, whereas the
chart of a stock typically looks like this:

[Figure: a typical stock price chart]

It can be expected that such a function is continuous, but not differentiable. This raises
the question whether (1) makes sense at all. So what is the right way to define a stochastic
differential equation?

The goal of these notes is to give an informal introduction to stochastic differential
equations (SDEs) of Itô type. They are based on the Itô integral, which can be thought of as
an extension of the classical Riemann-Stieltjes integral to cases where both the integrand
and the integrator can be stochastic processes. Another important tool is the Itô-Doeblin
formula, which is a stochastic counterpart of the classical chain rule.

The exposition in these notes is not mathematically rigorous. I have tried to be as
precise as possible, but also to avoid technicalities whenever possible. Many proofs are
omitted. For a comprehensive discussion of Itô SDEs the reader is referred to the books
[Ste01, Shr04, Øks03, KP99, BK04].

2 Stochastic processes and filtrations


Let (Ω, F, P) be a probability space¹: Ω ≠ ∅ is a set, F is a σ-algebra (or σ-field) on
Ω, and P : F −→ [0, 1] is a probability measure. A probability space is complete if F
contains all subsets G of Ω with P-outer measure zero, i.e. with

P∗(G) := inf{P(F) : F ∈ F and G ⊂ F} = 0.

Any probability space can be completed. Hence, we can assume that every probability
space in these notes is complete.

Definition 2.1 (Stochastic process) Let T be an ordered set (e.g. T = [0, ∞) or
T = N). A stochastic process is a family X = {Xt : t ∈ T} of random variables

Xt : Ω −→ Rd.

Equivalent notations are X(t, ω), X(t), Xt(ω), Xt, ... Below, we will often write Xt
instead of {Xt : t ∈ T}. For a fixed ω ∈ Ω, the function t ↦ Xt(ω) is called a realization
(or path or trajectory) of X.

The path of a stochastic process is associated to some ω ∈ Ω. As time evolves, more
information about ω becomes available.

Example (cf. chapter 2 in [Shr04]). If we toss a coin three times, then the possible
results are:
ω1    ω2    ω3    ω4    ω5    ω6    ω7    ω8
HHH   HHT   HTH   HTT   THH   THT   TTH   TTT

(H = heads, T = tails).
• Before the first toss, we only know that ω ∈ Ω = {ω1 , . . . , ω8 }.
• After the first toss, we know if the final result will belong to
{HHH, HHT, HTH, HTT} or to {THH, THT, TTH, TTT}.
These sets are “resolved by the information”. Hence, we know in which of the sets
{ω1, ω2, ω3, ω4}, {ω5, ω6, ω7, ω8}
ω lies.
¹See “2.2.2 What is (Ω, F, P) anyway?” in the book [CT04] for a nice discussion of this concept.

• After the second toss, the sets

{HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT}

are resolved, and we know in which of the sets

{ω1, ω2}, {ω3, ω4}, {ω5, ω6}, {ω7, ω8}

ω lies.
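To see the resolved sets at work, the following small Python sketch (an illustration of my own, not part of the original notes) enumerates the atoms of the corresponding σ-algebra after k = 0, 1, 2, 3 tosses:

from itertools import product

Omega = ["".join(p) for p in product("HT", repeat=3)]  # ω1, ..., ω8

def resolved_sets(k):
    """Atoms after k tosses: outcomes grouped by their first k results."""
    atoms = {}
    for w in Omega:
        atoms.setdefault(w[:k], []).append(w)
    return list(atoms.values())

for k in range(4):
    print(k, resolved_sets(k))
# k = 0: one set of 8 outcomes; k = 1: two sets of 4;
# k = 2: four sets of 2; k = 3: eight singletons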
This motivates the following definition.

Definition 2.2 (Filtration)


• A filtration is a family {Ft : t ≥ 0} of sub-σ-algebras of F such that Fs ⊆ Ft for
all t ≥ s ≥ 0.
Interpretation: A filtration models the fact that more and more information about
the realization of a process is known as time evolves.
• If {Xt : t ≥ 0} is a family of random variables and Xt is Ft -measurable, then
{Xt : t ≥ 0} is adapted to (or nonanticipating with respect to) {Ft : t ≥ 0}.
Interpretation: At time t we know for each set S ∈ Ft if ω ∈ S or not. The value
of Xt is revealed at time t.
• For every s ∈ [0, t] let σ{Xs} be the σ-algebra generated by Xs, i.e. the collection of
all sets

Xs⁻¹(B) for all B ∈ B,

where B denotes the Borel σ-algebra. By definition, σ{Xs} is the smallest σ-algebra
for which Xs is measurable.

3 The Wiener process


Robert Brown 1827, Louis Bachelier 1900, Albert Einstein 1905, Norbert Wiener 1923

Definition 3.1 (Normal distribution) A random variable X : Ω −→ Rd with d ∈ N is
normal if it has a multivariate normal (Gaussian) distribution with mean µ ∈ Rd
and a symmetric, positive definite covariance matrix Σ ∈ Rd×d, i.e.

P(X ∈ B) = ( (2π)^d det(Σ) )^{−1/2} ∫_B exp( −(x − µ)ᵀ Σ⁻¹ (x − µ)/2 ) dx

for all Borel sets B ⊂ Rd. Notation: X ∼ N(µ, Σ).

Remarks:

1. If X ∼ N(µ, Σ), then E(X) = µ and Σ = (σij) with σij = E[(Xi − µi)(Xj − µj)].

2. Standard normal distribution ⇔ µ = 0, Σ = I (identity matrix).

3. If X ∼ N(µ, Σ) and Y = v + T X for some v ∈ Rd and a regular matrix T ∈ Rd×d, then

   Y ∼ N(v + Tµ, T Σ Tᵀ).        (2)

4. Warning: In one dimension, the covariance matrix is simply a number, namely the
variance. Unfortunately, the variance is usually denoted by σ² instead of σ in the
literature, which is somewhat confusing.
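Property (2) is easy to check by sampling. A minimal NumPy sketch (the concrete numbers are arbitrary choices of mine) compares the empirical mean and covariance of Y = v + TX with v + Tµ and TΣTᵀ:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
v = np.array([0.5, 0.5])
T = np.array([[1.0, 1.0],
              [0.0, 2.0]])          # regular (invertible) matrix

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = v + X @ T.T                     # Y = v + T X, applied to each sample

print(Y.mean(axis=0), v + T @ mu)    # empirical vs. exact mean
print(np.cov(Y.T), T @ Sigma @ T.T)  # empirical vs. exact covariance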

Definition 3.2 (Wiener process, Brownian motion)


(a) A stochastic process {Wt : t ∈ [0, T)} is called a standard Brownian motion or
standard Wiener process if it has the following properties:

1. W0 = 0 (with probability one).

2. Independent increments: For all 0 ≤ t1 < t2 < ... < tn < T the random variables

   Wt2 − Wt1, Wt3 − Wt2, ..., Wtn − Wtn−1

   are independent.

3. Wt − Ws ∼ N(0, t − s) for any 0 ≤ s < t < T.

4. There is an Ω̃ ⊂ Ω with P(Ω̃) = 1 such that t ↦ Wt(ω) is continuous for all ω ∈ Ω̃.

(b) If Wt^(1), ..., Wt^(d) are independent one-dimensional Wiener processes, then
Wt = (Wt^(1), ..., Wt^(d)) is called a d-dimensional Wiener process, and

   Wt − Ws ∼ N(0, (t − s)I).

The existence of Brownian motion was first proved in a mathematically rigorous way by
Norbert Wiener in 1923.
The Wiener process will serve as the “source of randomness” in our model of the financial
market.

Notation: Wt = Wt(ω) = W(t, ω) = W(t)

Numerical simulation of a Wiener process (d = 1). In order to get an idea of what
a Wiener process is, we sketch how a realization of a Wiener process can be simulated
numerically. All one has to do is to choose a step size τ > 0, set tn = nτ and W̃0 = 0,
and to repeat the following steps:

for n = 0, 1, 2, 3, ...
    generate a random number Zn ∼ N(0, 1)
    W̃n+1 = W̃n + √τ · Zn
end for

For τ −→ 0 the interpolation of W̃0, W̃1, W̃2, ... approximates a path of the Wiener process
(W̃n ≈ Wtn = Wnτ), since the increments √τ · Zn ∼ N(0, τ) have exactly the distribution
required by Definition 3.2.
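A minimal NumPy implementation of this recursion might look as follows (function and variable names are my own choice; note the factor √τ in the increments):

import numpy as np

def simulate_wiener_path(T=1.0, N=1000, seed=0):
    """Simulate one realization of a standard Wiener process on [0, T]
    via W_{n+1} = W_n + sqrt(tau) * Z_n with Z_n ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    tau = T / N                                  # step size
    t = np.linspace(0.0, T, N + 1)               # grid t_n = n * tau
    dW = np.sqrt(tau) * rng.standard_normal(N)   # increments ~ N(0, tau)
    W = np.concatenate(([0.0], np.cumsum(dW)))   # W_0 = 0
    return t, W

t, W = simulate_wiener_path()

Linear interpolation of the returned points then yields the approximate path.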

How smooth is a path of a Wiener process? For simplicity, we consider only the case
d = 1.

Hölder continuity and non-differentiability


Definition 3.3 (Hölder continuity) A function f : (a, b) −→ R is Hölder continuous
of order α for some α ∈ [0, 1] if there is a constant C such that

|f(t) − f(s)| ≤ C|t − s|^α for all s, t ∈ (a, b).

If α = 0, then f is bounded.
If α > 0, then f is uniformly continuous.
If α = 1, then f is Lipschitz continuous.

A path of the Wiener process on a bounded interval is

• Hölder continuous of order α ∈ [0, ½) with probability one, but

• not Hölder continuous of order α ≥ ½ with probability one.

A path of the Wiener process is nowhere differentiable with probability one.

Proofs: [Ste01], chapter 5

Unbounded total variation


Definition 3.4 (Total variation) Let [a, b] be an interval and let

P = {t0, t1, ..., tN(P)},    a = t0 < t1 < ... < tN(P) = b,    N(P) ∈ N,

be a partition of this interval. Let 𝒫 be the set of all such partitions. The total variation
of a function f : [a, b] −→ R is

TVa,b(f) = sup_{P ∈ 𝒫} ∑_{n=1}^{N(P)} |f(tn) − f(tn−1)|.        (3)

If f is differentiable and f′ is integrable, then it can be shown that

TVa,b(f) = ∫_a^b |f′(t)| dt.

Conversely, if a function f has bounded total variation, then its derivative exists for
almost all t ∈ [a, b].

Consequence: A path of the Wiener process has unbounded total variation with
probability one.

Quadratic variation

The quadratic variation of a function f : (a, b) −→ R is

QVa,b(f) = lim_{N→∞, |P_N|→0} ∑_{n=1}^{N} ( f(tn) − f(tn−1) )²,    |P_N| = max_{n=1,...,N} |tn − tn−1|.

If f is continuously differentiable, then one can show that

QVa,b(f) = 0.

For a path t ↦ Wt(ω) with t ∈ [0, T], however, it can be shown that

lim_{N→∞, |P_N|→0} ‖ ∑_{n=1}^{N} ( Wtn(ω) − Wtn−1(ω) )² − T ‖_{L²(dP)} = 0,

where

‖X‖_{L²(dP)} = √(E(X²)) = ( ∫_Ω X²(ω) dP(ω) )^{1/2}.

By choosing a suitable subsequence, it can be concluded that QV0,t(s ↦ Ws(ω)) = t with
probability one.
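The contrast between unbounded total variation and finite quadratic variation is easy to observe numerically. The following sketch (a rough experiment of mine, reusing the simulation idea from Section 3) computes both quantities for one sampled path:

import numpy as np

rng = np.random.default_rng(1)
T, N = 1.0, 10**6
tau = T / N
dW = np.sqrt(tau) * rng.standard_normal(N)   # increments of one Wiener path

qv = np.sum(dW**2)        # quadratic variation: close to T = 1
tv = np.sum(np.abs(dW))   # total variation: grows like sqrt(2*N*T/pi)
print(qv, tv)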

Filtration of the Wiener process


The natural filtration of the Wiener process on [0, T ] is given by

{Ft : t ∈ [0, T ]}, Ft = σ{Ws , s ∈ [0, t]}

(cf. Definition 2.2). Ft contains all information (but not more) which can be obtained by
observing Ws on the interval [0, t]. For technical reasons, however, it is more advantageous
to use an augmented filtration called the standard Brownian filtration. See pp. 50-51
in [Ste01] for details.

4 Construction of the Itô integral (steps 1 and 2)


References: [KP99, Øks03, Shr04, Ste01]

Now we return to the naïve ansatz (1) for defining an SDE. We assume that f : R×R → R
and g : R×R → R are sufficiently regular functions. Although we do not yet know how
to choose the random noise Z(t), we formally apply the explicit Euler method:
Choose t ≥ 0 and N ∈ N, let τ = t/N, tn = nτ and define approximations Xn ≈ X(tn)
by

Xn+1 = Xn + τ f(tn, Xn) + τ g(tn, Xn)Z(tn)    (n = 0, 1, 2, ...)



with initial data X0 = X(0). In the special case f(t, X) = 0, g(t, X) = 1 and X(0) = 0,
we want Xn = W(tn) to be the Wiener process, i.e. we postulate that

W(tn+1) = W(tn) + τ Z(tn).

This yields

Xn+1 = Xn + τ f(tn, Xn) + g(tn, Xn)( W(tn+1) − W(tn) )

and after N steps

XN = X0 + τ ∑_{n=0}^{N−1} f(tn, Xn) + ∑_{n=0}^{N−1} g(tn, Xn)( W(tn+1) − W(tn) ).        (4)

Now we keep t fixed and let N −→ ∞, which means that τ = t/N −→ 0. Then, (4)
should somehow converge to

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s).        (5)

Problem: We cannot define the last term in (5) as a pathwise Riemann-Stieltjes integral
(cf. Appendix B). When N −→ ∞, the sum

∑_{n=0}^{N−1} g(tn, Xn(ω))( W(tn+1, ω) − W(tn, ω) )

diverges with probability one, because a path of the Wiener process has unbounded total
variation with probability one.

New goal: Define the integral

It[u](ω) = ∫_0^t u(s, ω) dWs(ω)

in a “reasonable” way for the following class of functions.

Definition 4.1 Let (Ω, F, P) be a probability space, and let {Ft : t ∈ [0, T]} be the
standard Brownian filtration. Then, we define H²[0, T] to be the class of functions

u = u(t, ω),    u : [0, T] × Ω −→ R,

with the following properties:



• (t, ω) ↦ u(t, ω) is (B × F)-measurable.

• u is adapted to {Ft : t ∈ [0, T]}, i.e. u(t, ·) is Ft-measurable.

• E( ∫_0^T u²(t, ·) dt ) < ∞.

Remark: B × F is the σ-algebra generated by all sets of the form B × F with B ∈ B
and F ∈ F. The product measure satisfies (µ × P)(B × F) = µ(B)P(F).

Step 1: Itô integral for elementary functions


Definition 4.2 (Elementary functions) A function φ ∈ H²[0, T] is called elementary
if it is a stochastic step function of the form

φ(t, ω) = a0(ω) 1_{[0,0]}(t) + ∑_{n=0}^{N−1} an(ω) 1_{(tn,tn+1]}(t)
        = a0(ω) 1_{[0,t1]}(t) + ∑_{n=1}^{N−1} an(ω) 1_{(tn,tn+1]}(t)

with a partition 0 = t0 < t1 < ... < tN−1 < tN = T. The random variables an must be
Ftn-measurable with E(an²) < ∞. Here and below,

1_{[c,d]}(t) = 1 if t ∈ [c, d], and 0 else,        (6)

is the indicator function of an interval [c, d].


For 0 ≤ c < d ≤ T , the only reasonable way to define the Itô integral of an indicator
function 1(c,d] is
ZT Zd
IT [1(c,d] ](ω) = 1(c,d] (s) dW (s, ω) = dW (s, ω) = W (d, ω) − W (c, ω).
0 c

Hence, by linearity, we define the Itô integral of an elementary function by


N
X −1

IT [φ](ω) = an (ω) W (tn+1 , ω) − W (tn , ω) .
n=0

Lemma 4.3 (Itô isometry for elementary functions) For all elementary functions
we have

E( IT[φ]² ) = E( ∫_0^T φ²(t, ·) dt )

or equivalently

‖IT[φ]‖_{L²(dP)} = ‖φ‖_{L²(dt×dP)}

with

‖φ‖_{L²(dt×dP)} = ( ∫_Ω ∫_0^T φ²(t, ω) dt dP )^{1/2} = ( E( ∫_0^T φ²(t, ·) dt ) )^{1/2}.

Proof. Since

φ²(t, ω) = a0²(ω) 1_{[0,0]}(t) + ∑_{n=0}^{N−1} an²(ω) 1_{(tn,tn+1]}(t),

we obtain

E( ∫_0^T φ²(t, ·) dt ) = ∑_{n=0}^{N−1} E(an²)(tn+1 − tn)        (7)

for the right-hand side. If we let ∆Wn = W(tn+1) − W(tn), then

E( IT[φ]² ) = E( ( ∑_{n=0}^{N−1} an ∆Wn )² ) = ∑_{n=0}^{N−1} ∑_{m=0}^{N−1} E(am an ∆Wm ∆Wn).        (8)

• If n > m, then am an ∆Wm is Ftn-measurable. ∆Wn is independent of Ftn because
the Wiener process has independent increments. Hence, am an ∆Wm and ∆Wn are
independent. It follows that

E(an am ∆Wn ∆Wm) = E(an am ∆Wm) E(∆Wn) = 0 for n > m,

because ∆Wn ∼ N(0, tn+1 − tn) by definition.

• By the same argument, an² and ∆Wn² are independent. Hence, we obtain

E(an² ∆Wn²) = E(an²)(tn+1 − tn),

because E(∆Wn²) = V(∆Wn) = tn+1 − tn.

Hence, (8) simplifies to

E( IT[φ]² ) = ∑_{n=0}^{N−1} E(an²)(tn+1 − tn).        (9)

Comparing (7) and (9) yields the assertion.



Step 2: Itô integral on H²[0, T]

Lemma 4.4 For any u ∈ H²[0, T] there is a sequence (φk)k∈N of elementary functions
φk ∈ H²[0, T] such that

lim_{k→∞} ‖u − φk‖_{L²(dt×dP)} = 0.

Proof: Section 6.6 in [Ste01].

Let u ∈ H²[0, T] and let (φk)k∈N be elementary functions such that

u = lim_{k→∞} φk in L²(dt × dP)

as in Lemma 4.4. The linearity of IT[·] and Lemma 4.3 yield

‖IT[φj] − IT[φk]‖_{L²(dP)} = ‖IT[φj − φk]‖_{L²(dP)} = ‖φj − φk‖_{L²(dt×dP)} −→ 0

for j, k −→ ∞. Hence, (IT[φk])k is a Cauchy sequence in the Hilbert space L²(dP). Thus,
(IT[φk])k converges in L²(dP), and we can define

IT[u] = lim_{k→∞} IT[φk].

The choice of the sequence does not matter: If (ψk)k∈N is another sequence of elementary
functions with u = lim_{k→∞} ψk in L²(dt × dP), then by Lemma 4.3 we obtain for k −→ ∞

‖IT[φk] − IT[ψk]‖_{L²(dP)} = ‖IT[φk − ψk]‖_{L²(dP)}
                           = ‖φk − ψk‖_{L²(dt×dP)}
                           ≤ ‖φk − u‖_{L²(dt×dP)} + ‖u − ψk‖_{L²(dt×dP)} −→ 0.
Theorem 4.5 (Itô isometry) For all u ∈ H²[0, T] we have

‖IT[u]‖_{L²(dP)} = ‖u‖_{L²(dt×dP)}.

Proof: Let (φk)k∈N again be elementary functions such that u = lim_{k→∞} φk in L²(dt × dP);
cf. Lemma 4.4. Then

lim_{k→∞} ‖φk‖_{L²(dt×dP)} = ‖u‖_{L²(dt×dP)},

because the reverse triangle inequality yields

| ‖φk‖_{L²(dt×dP)} − ‖u‖_{L²(dt×dP)} | ≤ ‖φk − u‖_{L²(dt×dP)} → 0.

By the same argument, we obtain

lim_{k→∞} ‖IT[φk]‖_{L²(dP)} = ‖IT[u]‖_{L²(dP)}.

Now the assertion follows from Lemma 4.3 by taking the limit.
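As a plausibility check, the Itô isometry can be tested by Monte Carlo simulation. The sketch below (my own experiment; the integrand u(t, ω) = Wt(ω) is just an example) approximates IT[u] by left-endpoint sums over many sampled paths and compares E(IT[u]²) with E(∫_0^T u²(t, ·) dt); both should be close to T²/2:

import numpy as np

rng = np.random.default_rng(2)
T, N, M = 1.0, 500, 20_000            # horizon, time steps, sample paths
tau = T / N

dW = np.sqrt(tau) * rng.standard_normal((M, N))
W = np.hstack((np.zeros((M, 1)), np.cumsum(dW, axis=1)))

I = np.sum(W[:, :-1] * dW, axis=1)    # left-endpoint (Ito) sums for u = W

lhs = np.mean(I**2)                               # ~ E(I_T[u]^2)
rhs = np.mean(np.sum(W[:, :-1]**2, axis=1) * tau) # ~ E(int_0^T W_t^2 dt)
print(lhs, rhs)                                   # both close to T^2/2 = 0.5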

5 Martingales

Definition 5.1 (conditional expectation) Let X be an integrable random variable,
and let G be a sub-σ-algebra of F. Then, Y is a conditional expectation of X with
respect to G if Y is G-measurable and if

E(X 1_A) = E(Y 1_A) for all A ∈ G,

or equivalently

∫_A X(ω) dP(ω) = ∫_A Y(ω) dP(ω) for all A ∈ G.

In this case, we write Y = E(X | G).

“This definition is not easy to love. Fortunately, love is not required.”
J.M. Steele in [Ste01], p. 45.

Interpretation. E(X | G) is a random variable on (Ω, G, P) and hence on (Ω, F, P), too.
Roughly speaking, E(X | G) is the best approximation of X detectable by the events in
G. The more G is refined, the better E(X | G) approximates X.

Examples.

1. If G = {Ω, ∅}, then E(X | G) = E(X) = ∫_Ω X(ω) dP(ω).

2. If G = F, then E(X | G) = X.

3. If F ∈ F with P(F) > 0 and

   G = {∅, F, Ω \ F, Ω},

then it can be shown that

   E(X | G)(ω) = (1/P(F)) ∫_F X dP           if ω ∈ F,
   E(X | G)(ω) = (1/P(Ω \ F)) ∫_{Ω\F} X dP   if ω ∈ Ω \ F.

4. If X is independent of G, then E(X | G) = E(X).

Proof: Exercise.
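Example 3 can be illustrated numerically: on a large sample, E(X | G) is the piecewise-constant random variable that averages X over F and over Ω \ F. A small sketch (entirely my own construction, with an arbitrary choice of X and F):

import numpy as np

rng = np.random.default_rng(5)
omega = rng.standard_normal(10**6)   # stand-in for sampling ω ∈ Ω
X = omega**2 + omega                 # some integrable random variable
F = omega > 0                        # event generating G = {∅, F, Ω\F, Ω}

# E(X | G): average of X over F on F, average over the complement elsewhere.
Y = np.where(F, X[F].mean(), X[~F].mean())

# Defining property E(X 1_A) = E(Y 1_A), checked for A = F and A = Ω\F:
print((X * F).mean(), (Y * F).mean())
print((X * ~F).mean(), (Y * ~F).mean())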

Lemma 5.2 (Properties of the conditional expectation) For all integrable random
variables X and Y and all sub-σ-algebras G ⊂ F, the conditional expectation has the
following properties:

• Linearity: E(X + Y | G) = E(X | G) + E(Y | G)

• Positivity: If X ≥ 0, then E(X | G) ≥ 0.

• Tower property: If H ⊂ G ⊂ F are sub-σ-algebras, then

  E( E(X | G) | H ) = E(X | H).

• E( E(X | G) ) = E(X).

• Factorization property: If Y is G-measurable and |XY| and |Y| are integrable, then

  E(XY | G) = Y E(X | G).

Proof: Exercise.
Definition 5.3 (martingale) Let Xt be a stochastic process which is adapted to a
filtration {Ft : t ≥ 0} of F. If

1. E(|Xt|) < ∞ for all 0 ≤ t < ∞, and

2. E(Xt | Fs) = Xs for all 0 ≤ s ≤ t < ∞,

then Xt is called a martingale. A martingale Xt is called continuous if there is a set
Ω0 ⊂ Ω with P(Ω0) = 1 such that the path t ↦ Xt(ω) is continuous for all ω ∈ Ω0.

Interpretation: A martingale models a fair game. Observing the game up to time s
does not give any advantage for future times.

Examples. It can be shown that each of the following processes is a continuous martingale
with respect to the standard Brownian filtration:

Wt,    Wt² − t,    exp( αWt − α²t/2 )    with α ∈ R.

Proof: Exercise.
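A quick sanity check of these examples (my own experiment, not part of the original notes): the martingale property E(Xt | Fs) = Xs implies in particular that t ↦ E(Xt) is constant, which is only a necessary condition but easy to test by sampling:

import numpy as np

rng = np.random.default_rng(6)
alpha, s, t, M = 0.8, 0.5, 1.0, 10**6

Ws = np.sqrt(s) * rng.standard_normal(M)           # W_s ~ N(0, s)
Wt = Ws + np.sqrt(t - s) * rng.standard_normal(M)  # independent increment

for X in (lambda W, u: W,
          lambda W, u: W**2 - u,
          lambda W, u: np.exp(alpha * W - 0.5 * alpha**2 * u)):
    print(X(Ws, s).mean(), X(Wt, t).mean())        # ~ equal in each row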

6 Construction of the Itô integral (steps 3 and 4)


Step 3: The Itô integral as a process

So far we have defined the Itô integral IT[u](ω) over the interval [0, T] for fixed T. For
applications in mathematical finance, however, we want to consider {It[u](ω) : t ∈ [0, T]}
as a stochastic process.

If u(s, ω) ∈ H²[0, T], then 1_{[0,t]}(s) u(s, ω) ∈ H²[0, T]. Can we define It[u](ω) by
IT[1_{[0,t]} u](ω)?

Problem: The integral IT[1_{[0,t]} u](ω) is only defined in L²(dP). Hence, the value
IT[1_{[0,t]} u](ω) is arbitrary on sets Z ∈ Zt := {Z ∈ Ft : P(Z) = 0}. Since the set [0, T]
is uncountable², the union

∪_{t∈[0,T]} Zt

(i.e. the set where the process is not well-defined) could be “very large”! Fortunately,
this can be fixed:

²We only know that countable unions of null sets have measure zero, but this is not true for
uncountable unions.

Theorem 6.1 For any u ∈ H²[0, T] there is a process {Xt : t ∈ [0, T]} that is a continuous
martingale with respect to the standard Brownian filtration Ft such that

P( {ω ∈ Ω : Xt(ω) = IT[1_{[0,t]} u](ω)} ) = 1

for each t ∈ [0, T].

A proof can be found in [Ste01], Theorem 6.2, pages 83-84.

Step 4: The Itô integral on L²loc[0, T]

So far we have defined the Itô integral for functions u ∈ H²[0, T]; cf. Definition 4.1. Such
functions must satisfy

E( ∫_0^T u²(t, ·) dt ) < ∞,        (10)

and this condition is sometimes too restrictive.

Example: If y(x) = exp(x⁴), then u(t, ω) = y(Wt(ω)) ∉ H²[0, T], because

E( ∫_0^T u²(t, ·) dt ) = ∫_Ω ∫_0^T exp( 2Wt⁴(ω) ) dt dω
                       = ∫_{−∞}^{∞} ∫_0^T exp(2x⁴) (2πt)^{−1/2} exp( −x²/(2t) ) dt dx = ∞,

since exp(2x⁴) grows faster in x than the Gaussian factor decays.

With some more work, the Itô integral can be extended to the class L²loc[0, T], i.e. to all
functions

u = u(t, ω),    u : [0, T] × Ω −→ R,

with the following properties:

• (t, ω) ↦ u(t, ω) is (B × F)-measurable.

• u is adapted to {Ft : t ∈ [0, T]}.

• P( ∫_0^T u²(t, ω) dt < ∞ ) = 1.

The first two conditions are the same as for H²[0, T], but the third condition is weaker
than (10). If y : R −→ R is continuous, then u(t, ω) = y(W(t, ω)) ∈ L²loc[0, T], because
t ↦ y(W(t, ω)) is continuous with probability one and hence bounded on [0, T] with
probability one.

Details: Chapter 7 in [Ste01].

Notation

The process X constructed above is called the Itô integral (Itô Kiyoshi, 1944) of
u ∈ L²loc[0, T] and is denoted by

X(t, ω) = ∫_0^t u(s, ω) dW(s, ω).

The Itô integral over an arbitrary interval [a, b] ⊂ [0, T] is defined by

∫_a^b u(s, ω) dW(s, ω) = ∫_0^b u(s, ω) dW(s, ω) − ∫_0^a u(s, ω) dW(s, ω).

Alternative notations:

∫_a^b u(s, ω) dW(s, ω) = ∫_a^b u(s, ω) dWs(ω) = ∫_a^b us(ω) dWs(ω) = ∫_a^b us dWs

Properties of the Itô integral

Lemma 6.2 Let c ∈ R and u, v ∈ L²loc[0, T]. The Itô integral on [a, b] ⊂ [0, T] has the
following properties:

1. Linearity:

   ∫_a^b ( c u(s, ω) + v(s, ω) ) dWs(ω) = c ∫_a^b u(s, ω) dWs(ω) + ∫_a^b v(s, ω) dWs(ω)

   with probability one.

2. E( ∫_a^b u(s, ·) dWs ) = 0.

3. ∫_a^t u(s, ω) dWs(ω) is Ft-measurable for t ≥ a.

4. Itô isometry on [a, b]:

   E( ( ∫_a^b u(s, ·) dWs )² ) = E( ∫_a^b u²(s, ·) ds )

   (cf. Theorem 4.5).

5. Martingale property: The Itô integral

   X(t, ω) = ∫_0^t u(s, ω) dW(s, ω)

   of a function u ∈ H²[0, T] is a continuous martingale with respect to the standard
   Brownian filtration; cf. Theorem 6.1. If u ∈ L²loc[0, T], then the Itô integral is only
   a local martingale; cf. Proposition 7.7 in [Ste01].

The first four properties can be shown by considering elementary functions and passing
to the limit.

7 Stochastic differential equations and the Itô-Doeblin formula

Definition 7.1 (SDE) A stochastic differential equation (SDE) is an equation of
the form

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s).        (11)

The solution X(t) of (11) is called an Itô process.

The last term is an Itô integral, with W(t) denoting the Wiener process. The functions
f : R×R −→ R and g : R×R −→ R are called drift and diffusion coefficients, respectively.
These functions are typically given, while X(t) = X(t, ω) is unknown.

This equation is actually not a differential equation, but an integral equation! Often
people write

dXt = f(t, Xt) dt + g(t, Xt) dWt

as a shorthand notation for (11). Some people even “divide by dt” in order to make the
equation look like a differential equation, but this is more than audacious since “dWt/dt”
does not make sense.

Two special cases:

• If g(t, X(t)) ≡ 0, then (11) is reduced to

  X(t) = X(0) + ∫_0^t f(s, X(s)) ds.

  If X(t) is differentiable, this is equivalent to the ordinary differential equation

  dX(t)/dt = f(t, X(t))

  with initial data X(0).

• For f(t, X(t)) ≡ 0, g(t, X(t)) ≡ 1 and X(0) = 0, (11) turns into

  X(t) = 0 + ∫_0^t 0 ds + ∫_0^t 1 dW(s) = W(t) − W(0) = W(t).

Computing Riemann integrals via the basic definition is usually very tedious. The fun-
damental theorem of calculus provides an alternative which is more convenient in most
cases. For Itô integrals, the situation is similar: The approximation via elementary func-
tions which is used to define the Itô integral is rarely used to compute the integral. What
is the counterpart of the fundamental theorem of calculus for the Itô integral?

Theorem 7.2 (Itô-Doeblin formula) Let Xt be the solution of the SDE

dXt = f(t, Xt) dt + g(t, Xt) dWt

and let F(t, x) be a function with continuous partial derivatives ∂tF = ∂F/∂t,
∂xF = ∂F/∂x, and ∂x²F = ∂²F/∂x². Then, we have for Yt := F(t, Xt) that

dYt = ∂tF dt + ∂xF dXt + ½ (∂x²F) g² dt
    = ( ∂tF + (∂xF) f + ½ (∂x²F) g² ) dt + (∂xF) g dWt        (12)

with f = f(t, Xt), g = g(t, Xt), ∂xF = ∂xF(t, Xt), and so on.

Notation. Evaluations of the derivatives of F are to be understood in the sense of, e.g.,

∂xF(s, Xs) := ∂xF(t, x) |_{(t,x)=(s,Xs)}

and so on.

Remarks:

1. If y(t) is a smooth deterministic function, then according to the chain rule the
derivative of t ↦ F(t, y(t)) is

   d/dt F(t, y(t)) = ∂tF(t, y(t)) + ∂xF(t, y(t)) · dy(t)/dt,

and in shorthand notation

   dF = ∂tF dt + ∂xF dy.

The Itô-Doeblin formula can be considered as a stochastic version of the chain rule,
but the term ½ (∂x²F) · g² dt is surprising since such a term does not appear in the
deterministic chain rule.

2. Let f(t, Xt) = 0, g(t, Xt) = 1, Xt = Wt and suppose that F(t, x) = F(x) does not
depend on t. Then, the Itô-Doeblin formula yields for Yt := F(Wt) that

   dYt = F′(Wt) dWt + ½ F″(Wt) dt,

which is the shorthand notation for

   F(Wt) = F(W0) + ∫_0^t F′(Ws) dWs + ½ ∫_0^t F″(Ws) ds.

This can be seen as a counterpart of the fundamental theorem of calculus. Again,
the last term is surprising, because for a suitable deterministic function v(t) = vt
we obtain

   F(vt) = F(v0) + ∫_0^t F′(vs) dvs.

Sketch of the proof of Theorem 7.2.

(i) Equation (12) is the shorthand notation for

Yt = Y0 + ∫_0^t ( ∂tF(s, Xs) + ∂xF(s, Xs) · f(s, Xs) + ½ ∂x²F(s, Xs) · g²(s, Xs) ) ds
        + ∫_0^t ∂xF(s, Xs) · g(s, Xs) dWs.

Assume that F is twice continuously differentiable with bounded partial derivatives.
(Otherwise F can be approximated by such functions with uniform convergence on
compact subsets of [0, ∞) × R.) Moreover, assume that (t, ω) ↦ f(t, Xt(ω)) and
(t, ω) ↦ g(t, Xt(ω)) are elementary functions. (Otherwise approximate by elementary
functions.) Hence, there is a partition 0 = t0 < t1 < ... < tN = t such that

f(t, Xt(ω)) = f(0, X0(ω)) 1_{[0,t1]}(t) + ∑_{n=1}^{N−1} f(tn, Xtn(ω)) 1_{(tn,tn+1]}(t),

and the same equation holds with f replaced by g.


(ii) For the rest of the proof, we will use the shorthand notation

f^(n) := f(tn, Xtn),    F^(n) := F(tn, Xtn),
g^(n) := g(tn, Xtn),    ∂tF^(n) := ∂tF(tn, Xtn),

and so on, and

∆tn = tn+1 − tn,    ∆Xn = Xtn+1 − Xtn,    ∆Wn = Wtn+1 − Wtn.

Since f and g are elementary functions, we have

Xtn = X0 + ∫_0^{tn} f(s, Xs) ds + ∫_0^{tn} g(s, Xs) dWs
    = X0 + ∑_{k=0}^{n−1} f^(k) ∆tk + ∑_{k=0}^{n−1} g^(k) ∆Wk,

and hence

∆Xn = Xtn+1 − Xtn = f^(n) ∆tn + g^(n) ∆Wn.
(iii) Now Yt can be expressed by the telescoping sum

Yt = YtN = Y0 + ∑_{n=0}^{N−1} ( Ytn+1 − Ytn ) = Y0 + ∑_{n=0}^{N−1} ( F^(n+1) − F^(n) ).

Applying Taylor’s theorem yields

F^(n+1) − F^(n) = ∂tF^(n) · ∆tn + ∂xF^(n) · ∆Xn + ½ ∂t²F^(n) · (∆tn)² + ∂t∂xF^(n) · ∆tn ∆Xn
                + ½ ∂x²F^(n) · (∆Xn)² + Rn(∆tn, ∆Xn)

with a remainder term Rn. This identity is inserted into the telescoping sum.

(iv) Consider the limit N −→ ∞, ∆tn −→ 0 with respect to ‖·‖_{L²(dP)}. For the first two
terms, this yields

lim_{N→∞} ∑_{n=0}^{N−1} ∂tF^(n) · ∆tn = lim_{N→∞} ∑_{n=0}^{N−1} ∂tF(tn, Xtn) · ∆tn = ∫_0^t ∂tF(s, Xs) ds

and

lim_{N→∞} ∑_{n=0}^{N−1} ∂xF^(n) · ∆Xn
    = lim_{N→∞} ∑_{n=0}^{N−1} ∂xF^(n) · f^(n) ∆tn + lim_{N→∞} ∑_{n=0}^{N−1} ∂xF^(n) · g^(n) ∆Wn
    = ∫_0^t ∂xF(s, Xs) · f(s, Xs) ds + ∫_0^t ∂xF(s, Xs) · g(s, Xs) dWs.

(v) Next, we investigate the “∂x²F^(n) term”. Since

(∆Xn)² = ( f^(n) ∆tn + g^(n) ∆Wn )²,

we have

½ ∑_{n=0}^{N−1} ∂x²F^(n) · (∆Xn)² = ½ ∑_{n=0}^{N−1} ∂x²F^(n) · (f^(n))² (∆tn)²        (13)
                                  + ∑_{n=0}^{N−1} ∂x²F^(n) · f^(n) g^(n) ∆tn ∆Wn        (14)
                                  + ½ ∑_{n=0}^{N−1} ∂x²F^(n) · (g^(n))² (∆Wn)².        (15)

For the right-hand side of (13), we obtain

‖ ∑_{n=0}^{N−1} ∂x²F^(n) · (f^(n))² (∆tn)² ‖²_{L²(dP)} = E( ( ∑_{n=0}^{N−1} ∂x²F^(n) · (f^(n))² (∆tn)² )² ) −→ 0.

With the abbreviation α^(n) := ∂x²F^(n) · f^(n) g^(n) we obtain for the right-hand side of
(14) that

‖ ∑_{n=0}^{N−1} α^(n) ∆tn ∆Wn ‖²_{L²(dP)} = E( ( ∑_{n=0}^{N−1} α^(n) ∆tn ∆Wn )² )
    = ∑_{n=0}^{N−1} ∑_{m=0}^{N−1} E( α^(n) α^(m) ∆Wn ∆Wm ) ∆tn ∆tm.

For n < m we have

E( α^(n) α^(m) ∆Wn ∆Wm ) = E( α^(n) α^(m) ∆Wn ) E(∆Wm) = 0,

and similarly for m < n. Hence, only the terms with n = m have to be considered,
which yields

‖ ∑_{n=0}^{N−1} α^(n) ∆tn ∆Wn ‖²_{L²(dP)} = ∑_{n=0}^{N−1} E( (α^(n))² ) (∆tn)² E( (∆Wn)² ) −→ 0,

because E( (∆Wn)² ) = ∆tn.

The third term (15), however, has a non-zero limit: We show that

lim_{N→∞} ½ ∑_{n=0}^{N−1} ∂x²F^(n) · (g^(n))² (∆Wn)² = ½ ∫_0^t ∂x²F(s, Xs) · g²(s, Xs) ds,

which yields the strange additional term in the Itô-Doeblin formula. With the
abbreviation β^(n) = ½ ∂x²F^(n) · (g^(n))² we have

‖ ∑_{n=0}^{N−1} β^(n) ( (∆Wn)² − ∆tn ) ‖²_{L²(dP)}
    = E( ( ∑_{n=0}^{N−1} β^(n) ( (∆Wn)² − ∆tn ) )² )
    = E( ∑_{n=0}^{N−1} ∑_{m=0}^{N−1} β^(n) β^(m) ( (∆Wn)² − ∆tn )( (∆Wm)² − ∆tm ) ).

For n < m we have

E( β^(n) β^(m) ( (∆Wn)² − ∆tn )( (∆Wm)² − ∆tm ) )
    = E( β^(n) β^(m) ( (∆Wn)² − ∆tn ) ) E( (∆Wm)² − ∆tm ) = 0,

and vice versa for n > m. Hence, only the terms with n = m have to be considered,
and we obtain

‖ ∑_{n=0}^{N−1} β^(n) ( (∆Wn)² − ∆tn ) ‖²_{L²(dP)} = E( ∑_{n=0}^{N−1} (β^(n))² ( (∆Wn)² − ∆tn )² )
    = ∑_{n=0}^{N−1} E( (β^(n))² ) E( ( (∆Wn)² − ∆tn )² ) → 0,

because it can be shown that E( ( (∆Wn)² − ∆tn )² ) = 2(∆tn)².

(vi) With essentially the same arguments, it can be shown that

lim_{N→∞} ½ ∑_{n=0}^{N−1} ∂t²F^(n) · (∆tn)² = 0,

lim_{N→∞} ∑_{n=0}^{N−1} ∂t∂xF^(n) · ∆tn ∆Xn = 0,

and that the remainder term from the Taylor expansion can be neglected when the
limit is taken.

Example 1. Consider the integral

∫_0^t Ws dWs.

Xt := Wt solves the SDE with f(t, Xt) ≡ 0 and g(t, Xt) ≡ 1. For

F(t, x) = x²,    Yt = F(t, Xt) = Xt² = Wt²,

the Itô-Doeblin formula

dYt = ( ∂tF + (∂xF) f + ½ (∂x²F) g² ) dt + (∂xF) g dWt

yields

d(Wt²) = 0 + 0 + ½ · 2 · 1² dt + 2Wt · 1 dWt = dt + 2Wt dWt

=⇒ Wt dWt = ½ d(Wt²) − ½ dt.

This means that

∫_0^t Ws dWs = ½ ∫_0^t d(Ws²) − ½ ∫_0^t ds = ½ Wt² − ½ t.
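The identity can be verified numerically by comparing left-endpoint Itô sums with the closed form (a small sketch of mine, reusing the Wiener simulation from Section 3):

import numpy as np

rng = np.random.default_rng(3)
t_end, N = 1.0, 10**5
tau = t_end / N

dW = np.sqrt(tau) * rng.standard_normal(N)
W = np.concatenate(([0.0], np.cumsum(dW)))

ito_sum = np.sum(W[:-1] * dW)               # approximates int_0^t W_s dW_s
closed_form = 0.5 * W[-1]**2 - 0.5 * t_end  # (1/2) W_t^2 - (1/2) t
print(ito_sum, closed_form)                 # agree up to discretization error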

Example 2. The solution of the SDE

dYt = µYt dt + σYt dWt

with constants µ, σ ∈ R and deterministic initial value Y0 ∈ R is given by

Yt = exp( (µ − σ²/2) t + σWt ) Y0.

This process is called a geometric Brownian motion and is often used in mathematical
finance to model stock prices (see below).

The proof is left as an exercise.
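To connect this with the explicit Euler recursion (4): applied to this SDE it gives the Euler-Maruyama scheme, which can be compared with the exact solution along the same Wiener path. A sketch under my own choice of parameters:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, Y0 = 0.1, 0.3, 1.0
T, N = 1.0, 1000
tau = T / N

dW = np.sqrt(tau) * rng.standard_normal(N)

# Explicit Euler (Euler-Maruyama): Y_{n+1} = Y_n + mu*Y_n*tau + sigma*Y_n*dW_n
Y = np.empty(N + 1)
Y[0] = Y0
for n in range(N):
    Y[n + 1] = Y[n] + mu * Y[n] * tau + sigma * Y[n] * dW[n]

# Exact solution Y_t = Y0 * exp((mu - sigma^2/2) t + sigma W_t), same path
t = np.linspace(0.0, T, N + 1)
W = np.concatenate(([0.0], np.cumsum(dW)))
exact = Y0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)
print(Y[-1], exact[-1])   # close for small tau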

Ordinary differential equations can have multiple solutions with the same initial value,
and solutions do not necessarily exist for all times. Hence, we cannot expect that every
SDE has a unique solution. As in the ODE case, however, existence and uniqueness can
be shown under certain assumptions concerning the coefficients f and g:
Theorem 7.3 (existence and uniqueness)
Let f : R+ × R −→ R and g : R+ × R −→ R be functions with the following properties:

• Lipschitz condition: There is a constant L ≥ 0 such that

  |f(t, x) − f(t, y)| ≤ L|x − y|,    |g(t, x) − g(t, y)| ≤ L|x − y|        (16)

  for all x, y ∈ R and t ≥ 0.

• Linear growth condition: There is a constant K ≥ 0 such that

  |f(t, x)|² ≤ K(1 + |x|²),    |g(t, x)|² ≤ K(1 + |x|²)        (17)

  for all x ∈ R and t ≥ 0.

Then, the SDE

dX(t) = f(t, X(t)) dt + g(t, X(t)) dW(t),    t ∈ [0, T],

with deterministic initial value X(0) = X0 has a continuous adapted solution with

sup_{t∈[0,T]} E( X²(t) ) < ∞.

If both X(t) and X̃(t) are such solutions, then

P( X(t) = X̃(t) for all t ∈ [0, T] ) = 1.

Proof: Theorem 9.1 in [Ste01] or Theorem 4.5.3 in [KP99].

Remark: The assumptions can be weakened.

8 Extension to higher dimensions


In order to model options on several underlying assets (e.g. basket options), we have to
consider vector-valued Itô integrals and SDEs. A d-dimensional SDE takes the form

Xj(t) = Xj(0) + ∫_0^t fj(s, X(s)) ds + ∑_{k=1}^{m} ∫_0^t gjk(s, X(s)) dWk(s)        (18)

(j = 1, ..., d) for d, m ∈ N and suitable functions

fj : R × Rd −→ R,    gjk : R × Rd −→ R.

W1(s), ..., Wm(s) are one-dimensional scalar Wiener processes which are pairwise
independent. (18) is equivalent to

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s)        (19)

with vectors

W(t) = ( W1(t), ..., Wm(t) )ᵀ ∈ Rm,
f(t, x) = ( f1(t, x), ..., fd(t, x) )ᵀ ∈ Rd,

and the matrix

g(t, x) = ( gjk(t, x) )_{j=1,...,d; k=1,...,m} ∈ Rd×m.

Theorem 8.1 (Multi-dimensional Itô-Doeblin formula) Let Xt be the solution of
the SDE (19) and let F : [0, ∞) × Rd −→ Rn be a function with continuous partial
derivatives ∂tF, ∂xjF, and ∂xj∂xkF. Then, the process Y(t) := F(t, Xt) satisfies

dYℓ(t) = ∂tFℓ(t, Xt) dt
       + ∑_{i=1}^{d} ∂xiFℓ(t, Xt) · fi(t, Xt) dt
       + ½ ∑_{i=1}^{d} ∑_{j=1}^{d} ( ∑_{k=1}^{m} gik(t, Xt) gjk(t, Xt) ) ∂xi∂xjFℓ(t, Xt) dt
       + ∑_{i=1}^{d} ∑_{k=1}^{m} ∂xiFℓ(t, Xt) · gik(t, Xt) dWk

or equivalently

dYℓ = ( ∂tFℓ + fᵀ∇Fℓ + ½ tr( gᵀ(∇²Fℓ)g ) ) dt + (∇Fℓ)ᵀ g dW(t),

where ∇Fℓ is the gradient and ∇²Fℓ is the Hessian of Fℓ, and where tr(A) = ∑_{j=1}^{m} ajj is
the trace of a matrix A = (aij)i,j ∈ Rm×m.

Proof: Similar to the case d = m = 1.



Final remark: Itô vs. Stratonovich. The Itô integral is not the only stochastic
integral; the Stratonovich integral is a famous alternative. The Stratonovich integral
has the advantage that the ordinary chain rule remains valid, i.e. the additional term in
the Itô-Doeblin formula does not appear when the Stratonovich integral is used. However,
a Stratonovich integral is not a martingale, whereas an Itô integral is, and this is the reason
why typically the Itô integral is used to model risky assets in financial markets. Stratonovich
integrals can be transformed into Itô integrals and vice versa; see 3.1, 3.3 in [Øks03] and
3.5, 4.9 in [KP99]. If the Itô integral in (11) or (18) is replaced by the Stratonovich
integral, the properties of the SDE change. In order to keep the two concepts apart, one
therefore speaks of “Itô SDEs” and “Stratonovich SDEs”.

References
[BK04] N. H. Bingham and Rüdiger Kiesel. Risk-neutral valuation. Pricing and hedging
of financial derivatives. Springer Finance. Springer, London, 4th ed. edition, 2004.

[CT04] Rama Cont and Peter Tankov. Financial modelling with jump processes. CRC
Financial Mathematics Series. Chapman & Hall, Boca Raton, FL, 2004.

[KP99] Peter E. Kloeden and Eckhard Platen. Numerical solution of stochastic differential
equations. Number 23 in Applications of Mathematics. Springer, Berlin, corr. 3rd
printing edition, 1999.

[Øks03] Bernt Øksendal. Stochastic differential equations. An introduction with applications.
Universitext. Springer, Berlin, 6th ed. edition, 2003.
[Shr04] Steven E. Shreve. Stochastic calculus for finance. II: Continuous-time models.
Springer Finance. Springer, 2004.
[Ste01] J. Michael Steele. Stochastic calculus and financial applications. Number 45 in
Applications of Mathematics. Springer, New York, NY, 2001.

A Some definitions from probability theory


Definition A.1 (Probability space) The triple (Ω, F, P) is called a probability space
if the following holds:

1. Ω ≠ ∅ is a set, and F is a σ-algebra (or σ-field) on Ω, i.e. a family of subsets of
Ω with the following properties:

• ∅ ∈ F

• If F ∈ F, then Ω \ F ∈ F

• If Fi ∈ F for all i ∈ N, then ∪_{i=1}^{∞} Fi ∈ F

The pair (Ω, F) is called a measurable space.

2. P : F −→ [0, 1] is a probability measure, i.e.

• P(∅) = 0 and P(Ω) = 1

• If Fi ∈ F for all i ∈ N are pairwise disjoint (i.e. Fi ∩ Fj = ∅ for i ≠ j), then

  P( ∪_{i=1}^{∞} Fi ) = ∑_{i=1}^{∞} P(Fi).

Definition A.2 (Borel σ-algebra) If U is a family of subsets of Ω, then the σ-algebra
generated by U is

FU = ∩ {F : F is a σ-algebra on Ω and U ⊂ F}.

If U is the collection of all open subsets of a topological space Ω (e.g. Ω = Rd), then
B = FU is called the Borel σ-algebra on Ω. The elements B ∈ B are called Borel sets.

For the rest of this section, (Ω, F, P) is a probability space.

Definition A.3 (Measurable functions, random variables)

• A function X : Ω −→ Rd is called F-measurable if

  X⁻¹(B) := {ω ∈ Ω : X(ω) ∈ B} ∈ F

  for all Borel sets B ∈ B. If (Ω, F, P) is a probability space, then every F-measurable
  function is called a random variable.

• If X : Ω −→ Rd is any function, then the σ-algebra generated by X is the
  collection of all subsets

  X⁻¹(B) for all B ∈ B.

  Notation: F^X = σ{X}. F^X is the smallest σ-algebra for which X is measurable.

Definition A.4 (Independence)

• Two sets A ⊂ Ω and B ⊂ Ω are called independent if

  P(A ∩ B) = P(A)P(B).

• For n ∈ N let G1, ..., Gn ⊂ F be a collection of sub-σ-algebras of F. G1, ..., Gn are
  independent if

  P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2) · · · P(An) for all Ai ∈ Gi.

• Random variables X1, ..., Xn are called independent if

  P( ∩_{i=1}^{n} Xi⁻¹(Ai) ) = ∏_{i=1}^{n} P( Xi⁻¹(Ai) )

  for all A1, ..., An ∈ B. Equivalent: The random variables X1, ..., Xn are independent
  if the σ-algebras generated by X1, ..., Xn are independent.

• If X and Y are independent random variables with E(|XY|) < ∞, then
  E(XY) = E(X)E(Y).

• A random variable X is independent of a sub-σ-algebra G ⊂ F if the σ-algebra
  generated by X is independent of G.

B The Riemann-Stieltjes integral


Let f : [a, b] → R be bounded and let w : [a, b] → R be increasing, i.e. w(t) ≥ w(s) for all
t ≥ s. For a partition a = t0 < t1 < ... < tN = b we define the lower and upper sums

S̲N := ∑_{n=0}^{N−1} inf{f(t) : t ∈ [tn, tn+1]} ( w(tn+1) − w(tn) ),

S̄N := ∑_{n=0}^{N−1} sup{f(t) : t ∈ [tn, tn+1]} ( w(tn+1) − w(tn) ).

If S̲N and S̄N converge to the same value as the partition is refined, then the Riemann-
Stieltjes integral is defined by

∫_a^b f(t) dw(t) := lim_{N→∞} S̲N = lim_{N→∞} S̄N.

For w(t) = t, this is the standard Riemann integral.

If w : [a, b] → R is not increasing but has bounded variation, then there are increasing
functions w1 : [a, b] → R and w2 : [a, b] → R such that w(t) = w1(t) − w2(t), and the
Riemann-Stieltjes integral can be defined by

∫_a^b f(t) dw(t) := ∫_a^b f(t) dw1(t) − ∫_a^b f(t) dw2(t).
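For illustration, lower and upper sums are straightforward to evaluate numerically. The sketch below (my own, with endpoint values standing in for inf/sup, which converges to the same limit for continuous f on fine partitions) approximates ∫_0^1 t d(t²) = 2/3:

import numpy as np

def riemann_stieltjes(f, w, a, b, N=10**5):
    """Approximate int_a^b f dw for increasing w via lower/upper sums;
    inf/sup over each subinterval is replaced by the endpoint min/max."""
    t = np.linspace(a, b, N + 1)
    dw = np.diff(w(t))
    ft = f(t)
    lower = np.sum(np.minimum(ft[:-1], ft[1:]) * dw)
    upper = np.sum(np.maximum(ft[:-1], ft[1:]) * dw)
    return lower, upper

# int_0^1 t d(t^2) = int_0^1 2t^2 dt = 2/3
print(riemann_stieltjes(lambda t: t, lambda t: t**2, 0.0, 1.0))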
