A Short Introduction to Stochastic Differential Equations
Tobias Jahnke
Karlsruher Institut für Technologie
Fakultät für Mathematik
Institut für Angewandte und Numerische Mathematik
[email protected]
1 Motivation
Financial markets trade stocks of companies, commodities (e.g. oil, gold), bonds, and derivatives. Bonds evolve in a predictable way, but stocks and commodities do not. They are risky assets, because their value is affected by randomness. Our goal is to model the price of such a risky asset.
Many processes in science, technology, and engineering can be described very accurately
with ordinary differential equations (ODEs)
\[
\frac{dX}{dt} = f(t, X).
\]
Here, X = X(t) is the time-dependent solution, and f = f (t, X) is a given function
which depends on what is supposed to be modeled. But the solution of an ODE has no
randomness – it is a purely deterministic object. If we know its value X(t_0) at a given time t_0, then by solving the ODE we can compute its value X(t) at any later time t ≥ t_0.
This is certainly not true for stocks. In order to model risky assets, we have to put some
randomness into the dynamics. A first and rather naı̈ve approach to do so is to add a
term which generates “random noise”:
\[
\underbrace{\frac{dX}{dt} = f(t, X)}_{\text{ordinary differential equation}} + \underbrace{g(t, X)\, Z(t)}_{\text{random noise}}. \qquad (1)
\]
The next questions are obviously how to choose g(t, X), and how to define Z(t) in a
mathematically sound way. But even if these problems can be solved, Equation (1) is
dubious. The solution of an ODE is, by definition, a differentiable function, whereas the
chart of a stock typically looks like this:
[Figure: chart of a typical stock price.]
It can be expected that such a function is continuous, but not differentiable. This raises the question of whether (1) makes sense at all. So what is the right way to define a stochastic differential equation?
The goal of these notes is to give an informal introduction to stochastic differential equations (SDEs) of Itô type. They are based on the Itô integral, which can be thought of as
an extension of the classical Riemann-Stieltjes integral to cases where both the integrand
and the integrator can be stochastic processes. Another important tool is the Itô-Doeblin
formula, which is a stochastic counterpart of the classical chain rule.
Any probability space can be completed. Hence, we can assume that every probability
space in these notes is complete.
Example (cf. chapter 2 in [Shr04]). If we toss a coin three times, then the possible
results are:
ω1 ω2 ω3 ω4 ω5 ω6 ω7 ω8
HHH HHT HTH HTT THH THT TTH TTT
(H = heads, T = tails).
• Before the first toss, we only know that ω ∈ Ω = {ω1 , . . . , ω8 }.
• After the first toss, we know if the final result will belong to
{HHH, HHT, HTH, HTT} or to {THH, THT, TTH, TTT}.
These sets are “resolved by the information”. Hence, we know in which of the sets
{ω1, ω2, ω3, ω4}, {ω5, ω6, ω7, ω8}
ω is.
¹ See “2.2.2 What is (Ω, F, P) anyway?” in the book [CT04] for a nice discussion of this concept.
This motivates the following definition.
where B denotes the Borel σ-algebra. By definition, σ{X_s} is the smallest σ-algebra with respect to which X_s is measurable.
Remarks:
1. If X ∼ N (µ, Σ), then E(X) = µ and Σ = (σij ) with σij = E[(Xi − µi )(Xj − µj )].
2. Standard normal distribution ⇔ µ = 0, Σ = I (identity matrix).
3. If X ∼ N(µ, Σ) and Y = v + TX for some v ∈ R^d and a regular matrix T ∈ R^{d×d}, then
\[
Y \sim N\big(v + T\mu,\; T\Sigma T^T\big). \qquad (2)
\]
4. Warning: In one dimension, the covariance matrix is simply a number, namely the variance. Unfortunately, the variance is usually denoted by σ² instead of σ in the literature, which is somewhat confusing.
Wt − Ws ∼ N (0, (t − s)I).
The existence of Brownian motion was first proved in a mathematically rigorous way by
Norbert Wiener in 1923.
The Wiener process will serve as the “source of randomness” in our model of the financial
market.
For τ → 0 the interpolation of W̃_0, W̃_1, W̃_2, ... approximates a path of the Wiener process (W̃_n ≈ W_{nτ}).
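This random-walk approximation is easy to try out on a computer. The following sketch is my own illustration (not part of the notes); it assumes the standard construction W̃_{n+1} = W̃_n + √τ ξ_n with independent standard normal ξ_n, and all names and parameter values are arbitrary choices.

```python
import numpy as np

def simulate_wiener_path(T=1.0, N=1000, seed=0):
    """Approximate one path of a scalar Wiener process on [0, T].

    Assumes the random-walk construction W~_{n+1} = W~_n + sqrt(tau) * xi_n
    with i.i.d. standard normal xi_n; names and parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    tau = T / N
    increments = np.sqrt(tau) * rng.standard_normal(N)   # each ~ N(0, tau)
    W = np.concatenate(([0.0], np.cumsum(increments)))   # W(0) = 0
    t = np.linspace(0.0, T, N + 1)
    return t, W

t, W = simulate_wiener_path()
print(W[-1])   # one sample of W(T) ~ N(0, T)
```

Plotting t against W for a large N produces exactly the kind of continuous but jagged path described above.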
How smooth is a path of a Wiener process? For simplicity, we consider only the case
d = 1.
Conversely, if a function f has bounded total variation, then its derivative exists for almost all t ∈ [a, b].
Consequence: A path of the Wiener process has unbounded total variation with proba-
bility one.
Quadratic variation
The quadratic variation of a function f : (a, b) → R is
\[
QV_{a,b}(f) = \lim_{\substack{N \to \infty \\ |P_N| \to 0}} \sum_{n=1}^{N} \big( f(t_n) - f(t_{n-1}) \big)^2, \qquad |P_N| = \max_{n=1,\dots,N} |t_n - t_{n-1}|.
\]
If f is Lipschitz continuous (for example, continuously differentiable on [a, b]), then
\[
QV_{a,b}(f) = 0.
\]
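As a numerical illustration (my own, not from the notes), the sketch below evaluates the sum of squared increments on increasingly fine partitions of [0, 1]. For a smooth function the sum tends to zero, while for a sampled Brownian path it approaches the interval length, in line with the classical fact that QV_{0,t}(W) = t (in the L² sense).

```python
import numpy as np

def quadratic_variation(values):
    """Sum of squared increments of a function sampled on a partition."""
    return float(np.sum(np.diff(values) ** 2))

rng = np.random.default_rng(1)
for N in (100, 10_000, 1_000_000):
    t = np.linspace(0.0, 1.0, N + 1)
    smooth = np.sin(2 * np.pi * t)                        # differentiable path
    brownian = np.concatenate(
        ([0.0], np.cumsum(np.sqrt(1.0 / N) * rng.standard_normal(N)))
    )
    print(N, quadratic_variation(smooth), quadratic_variation(brownian))
# QV of the smooth path tends to 0, QV of the Brownian path tends to 1 = length of [0, 1].
```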
(cf. Definition 2.2). Ft contains all information (but not more) which can be obtained by
observing Ws on the interval [0, t]. For technical reasons, however, it is more advantageous
to use an augmented filtration called the standard Brownian filtration. See pp. 50-51
in [Ste01] for details.
Now we return to the naı̈ve ansatz (1) for defining an SDE. We assume that f : R×R → R
and g : R × R → R are sufficiently regular functions. Although we do not yet know how
to choose the random noise Z(t), we formally apply the explicit Euler method:
Choose t ≥ 0 and N ∈ N, let τ = t/N, t_n = nτ, and define approximations X_n ≈ X(t_n) by
\[
X_{n+1} = X_n + \tau f(t_n, X_n) + \tau g(t_n, X_n) Z(t_n)
\]
with initial data X_0 = X(0). In the special case f(t, X) = 0, g(t, X) = 1 and X(0) = 0, we want X_n = W(t_n) to be the Wiener process, i.e. we postulate that
\[
W(t_{n+1}) \overset{!}{=} W(t_n) + \tau Z(t_n).
\]
This yields
\[
X_{n+1} = X_n + \tau f(t_n, X_n) + g(t_n, X_n)\big( W(t_{n+1}) - W(t_n) \big). \qquad (4)
\]
Now we keep t fixed and let N −→ ∞, which means that τ = t/N −→ 0. Then, (4)
should somehow converge to
\[
X(t) = X(0) + \int_0^t f\big(s, X(s)\big)\, ds + \underbrace{\int_0^t g\big(s, X(s)\big)\, dW(s)}_{(?)}. \qquad (5)
\]
Problem: We cannot define (?) as a pathwise Riemann-Stieltjes integral (cf. Appendix B).
When N → ∞, the sum
\[
\sum_{n=0}^{N-1} g(t_n, X_n(\omega)) \big( W(t_{n+1}, \omega) - W(t_n, \omega) \big)
\]
diverges with probability one, because a path of the Wiener process has unbounded total variation with probability one.
Definition 4.1 Let (Ω, F, P) be a probability space, and let {Ft : t ∈ [0, T ]} be the
standard Brownian filtration. Then, we define H2 [0, T ] to be the class of functions
with a partition 0 = t0 < t1 < . . . < tN −1 < tN = T . The random variables an must be
Ftn -measurable with E(a2n ) < ∞. Here and below,
\[
\mathbf{1}_{[c,d]}(t) = \begin{cases} 1 & \text{if } t \in [c,d], \\ 0 & \text{else.} \end{cases} \qquad (6)
\]
Lemma 4.3 (Itô isometry for elementary functions) For all elementary functions we have
\[
E\big( I_T[\phi]^2 \big) = E\bigg( \int_0^T \phi^2(t, \cdot)\, dt \bigg)
\]
or equivalently
\[
\| I_T[\phi] \|_{L^2(dP)} = \| \phi \|_{L^2(dt \times dP)}
\]
with
\[
\| \phi \|_{L^2(dt \times dP)} = \bigg( \int_\Omega \int_0^T \phi^2(t, \omega)\, dt\, dP \bigg)^{1/2} = \bigg( E\bigg( \int_0^T \phi^2(t, \cdot)\, dt \bigg) \bigg)^{1/2}.
\]
Proof. Since
\[
\phi^2(t, \omega) = a_0^2(\omega)\, \mathbf{1}_{[0,0]}(t) + \sum_{n=0}^{N-1} a_n^2(\omega)\, \mathbf{1}_{(t_n, t_{n+1}]}(t)
\]
we obtain
\[
E\bigg( \int_0^T \phi^2(t, \cdot)\, dt \bigg) = \sum_{n=0}^{N-1} E(a_n^2)\, (t_{n+1} - t_n). \qquad (7)
\]
Let u ∈ H²[0, T] and let (φ_k)_{k∈N} be elementary functions such that
\[
u = \lim_{k \to \infty} \phi_k \quad \text{in } L^2(dt \times dP).
\]
Then, by Lemma 4.3,
\[
\| I_T[\phi_j] - I_T[\phi_k] \|_{L^2(dP)} = \| \phi_j - \phi_k \|_{L^2(dt \times dP)} \longrightarrow 0
\]
for j, k → ∞. Hence, (I_T[φ_k])_k is a Cauchy sequence in the Hilbert space L²(dP). Thus, (I_T[φ_k])_k converges in L²(dP), and we can define
\[
I_T[u] = \lim_{k \to \infty} I_T[\phi_k].
\]
The choice of the sequence does not matter: If (ψ_k)_{k∈N} is another sequence of elementary functions with u = lim_{k→∞} ψ_k in L²(dt × dP), then by Lemma 4.3 we obtain for k → ∞
\[
\| I_T[\phi_k] - I_T[\psi_k] \|_{L^2(dP)} = \| I_T[\phi_k - \psi_k] \|_{L^2(dP)} = \| \phi_k - \psi_k \|_{L^2(dt \times dP)} \le \| \phi_k - u \|_{L^2(dt \times dP)} + \| u - \psi_k \|_{L^2(dt \times dP)} \longrightarrow 0.
\]
Theorem 4.5 (Itô isometry) For all u ∈ H²[0, T] we have
\[
\| I_T[u] \|_{L^2(dP)} = \| u \|_{L^2(dt \times dP)}.
\]
Proof: Let (φ_k)_{k∈N} again be elementary functions such that u = lim_{k→∞} φ_k in L²(dt × dP); cf. Lemma 4.4. Then
\[
\lim_{k \to \infty} \| \phi_k \|_{L^2(dt \times dP)} = \| u \|_{L^2(dt \times dP)}
\]
and, since I_T[u] = lim_{k→∞} I_T[φ_k] in L²(dP),
\[
\lim_{k \to \infty} \| I_T[\phi_k] \|_{L^2(dP)} = \| I_T[u] \|_{L^2(dP)}.
\]
Now the assertion follows from Lemma 4.3 by taking the limit.
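The isometry can also be checked numerically for a concrete integrand. The sketch below is my own illustration (not part of the notes): it takes u(t, ω) = W_t(ω), approximates I_T[u] by left-point sums (the elementary-function approximation used above), and compares a Monte Carlo estimate of E(I_T[u]²) with E(∫₀ᵀ W_t² dt) = T²/2. All names and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, M = 1.0, 250, 20_000          # horizon, time steps, Monte Carlo samples
tau = T / N

dW = np.sqrt(tau) * rng.standard_normal((M, N))               # Brownian increments
W = np.concatenate((np.zeros((M, 1)), np.cumsum(dW, axis=1)), axis=1)

ito = np.sum(W[:, :-1] * dW, axis=1)      # left-point sums approximating I_T[u]

lhs = np.mean(ito ** 2)                                   # E( I_T[u]^2 )
rhs = np.mean(np.sum(W[:, :-1] ** 2, axis=1) * tau)       # E( int_0^T W_t^2 dt )
print(lhs, rhs, T ** 2 / 2)       # all three should be close to 0.5
```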
5 Martingales
Definition 5.1 (conditional expectation) Let X be an integrable random variable,
and let G be a sub-σ-algebra of F. Then, Y is a conditional expectation of X with
respect to G if Y is G-measurable and if
\[
\int_A Y \, dP = \int_A X \, dP \quad \text{for all } A \in \mathcal{G}.
\]
Notation: Y = E(X | G).
Interpretation. E(X | G) is a random variable on (Ω, G, P) and hence on (Ω, F, P), too.
Roughly speaking, E(X | G) is the best approximation of X detectable by the events in
G. The more G is refined, the better E(X | G) approximates X.
Examples.
1. If G = {Ω, ∅}, then E(X | G) = E(X) = ∫_Ω X(ω) dP(ω).
2. If G = F, then E(X | G) = X.
3. If F ∈ F with P(F ) > 0 and
G = {∅, F, Ω \ F, Ω}
Lemma 5.2 (Properties of the conditional expectation) For all integrable ran-
dom variables X and Y and all sub-σ-algebras G ⊂ F, the conditional expectation has the
following properties:
• Linearity: E(X + Y | G) = E(X | G) + E(Y | G)
• Positivity: If X ≥ 0, then E(X | G) ≥ 0.
• Tower property: If H ⊂ G ⊂ F are sub-σ-algebras, then
  E(E(X | G) | H) = E(X | H).
• E(E(X | G)) = E(X)
• Factorization property: If Y is G-measurable and |XY | and |Y | are integrable, then
E(XY | G) = Y E(X | G)
Proof: Exercise.
Definition 5.3 (martingale) Let Xt be a stochastic process which is adapted to a filtra-
tion {Ft : t ≥ 0} of F. If
1. E(|Xt |) < ∞ for all 0 ≤ t < ∞, and
2. E(Xt |Fs ) = Xs for all 0 ≤ s ≤ t < ∞,
then Xt is called a martingale. A martingale Xt is called continuous if there is a set
Ω0 ⊂ Ω with P(Ω0 ) = 1 such that the path t 7→ Xt (ω) is continuous for all ω ∈ Ω0 .
Interpretation: A martingale models a fair game. Observing the game up to time s
does not give any advantage for future times.
Examples. It can be shown that each of the following processes is a continuous martingale
with respect to the standard Brownian filtration:
\[
W_t, \qquad W_t^2 - t, \qquad \exp\Big( \alpha W_t - \frac{\alpha^2}{2}\, t \Big) \quad \text{with } \alpha \in \mathbb{R}.
\]
Proof: Exercise.
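A simple numerical sanity check (my own illustration, not part of the notes): a martingale has constant expectation, E(X_t) = E(X_0). The sketch below estimates E(W_t), E(W_t² − t), and E(exp(αW_t − α²t/2)) by Monte Carlo; they should stay close to 0, 0, and 1, respectively. This only tests a necessary consequence of the martingale property, not the conditional-expectation property itself.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, M = 0.8, 500_000     # arbitrary parameter and sample size

for t in (0.5, 1.0, 2.0):
    Wt = np.sqrt(t) * rng.standard_normal(M)       # W_t ~ N(0, t)
    print(
        t,
        np.mean(Wt),                                          # E(W_t)        ~ 0
        np.mean(Wt ** 2 - t),                                 # E(W_t^2 - t)  ~ 0
        np.mean(np.exp(alpha * Wt - alpha ** 2 * t / 2)),     # ~ 1
    )
```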
(i.e. the set where the process is not well-defined) could be “very large”! Fortunately,
this can be fixed:
² We only know that countable unions of null sets have measure zero, but this is not true for uncountable unions.
Theorem 6.1 For any u ∈ H2 [0, T ] there is a process {Xt : t ∈ [0, T ]} that is a continu-
ous martingale with respect to the standard Brownian filtration Ft such that
\[
P\big( \{ \omega \in \Omega : X_t(\omega) = I_T[\mathbf{1}_{[0,t]} u](\omega) \} \big) = 1.
\]
Notation
The process X constructed above is called the Itô integral (Itô Kiyoshi 1944) of u ∈
L2loc [0, T ] and is denoted by
\[
X(t, \omega) = \int_0^t u(s, \omega)\, dW(s, \omega).
\]
For 0 ≤ a ≤ b ≤ T one sets
\[
\int_a^b u(s, \omega)\, dW(s, \omega) = \int_0^b u(s, \omega)\, dW(s, \omega) - \int_0^a u(s, \omega)\, dW(s, \omega).
\]
Alternative notations:
\[
\int_a^b u(s, \omega)\, dW(s, \omega) = \int_a^b u(s, \omega)\, dW_s(\omega) = \int_a^b u_s(\omega)\, dW_s(\omega) = \int_a^b u_s\, dW_s.
\]
For c ∈ R:
\[
\int_a^b \big( c\, u(s, \omega) + v(s, \omega) \big)\, dW_s(\omega) = c \int_a^b u(s, \omega)\, dW_s(\omega) + \int_a^b v(s, \omega)\, dW_s(\omega).
\]
The first four properties can be shown by considering elementary functions and passing
to the limit.
The last term is an Itô integral, with W (t) denoting the Wiener process. The functions
f : R×R −→ R and g : R×R −→ R are called drift and diffusion coefficients, respectively.
These functions are typically given while X(t) = X(t, ω) is unknown.
This equation is actually not a differential equation, but an integral equation! Often people write
\[
dX(t) = f\big(t, X(t)\big)\, dt + g\big(t, X(t)\big)\, dW(t)
\]
as a shorthand notation for (11). Some people even “divide by dt” in order to make the equation look like a differential equation, but this is more than audacious since “dW_t/dt” does not make sense.
\[
X(t) = X(0) + \int_0^t f\big(s, X(s)\big)\, ds.
\]
\[
\frac{dX(t)}{dt} = f\big(t, X(t)\big).
\]
\[
X(t) = X(0) + \underbrace{\int_0^t \underbrace{f\big(s, X(s)\big)}_{=0}\, ds}_{=0} + \int_0^t \underbrace{g\big(s, X(s)\big)}_{=1}\, dW(s) = W(t) - W(0) = W(t).
\]
Computing Riemann integrals via the basic definition is usually very tedious. The fundamental theorem of calculus provides an alternative which is more convenient in most cases. For Itô integrals, the situation is similar: The approximation via elementary functions which is used to define the Itô integral is rarely used to compute the integral. What is the counterpart of the fundamental theorem of calculus for the Itô integral?
and let F(t, x) be a function with continuous partial derivatives $\partial_t F = \frac{\partial F}{\partial t}$, $\partial_x F = \frac{\partial F}{\partial x}$, and $\partial_x^2 F = \frac{\partial^2 F}{\partial x^2}$. Then, we have for Y_t := F(t, X_t) that
\[
dY_t = \partial_t F\, dt + \partial_x F\, dX_t + \frac{1}{2} (\partial_x^2 F)\, g^2\, dt
     = \Big( \partial_t F + (\partial_x F)\, f + \frac{1}{2} (\partial_x^2 F)\, g^2 \Big)\, dt + (\partial_x F)\, g\, dW_t. \qquad (12)
\]
Notation. Evaluations of the derivatives of F are to be understood in the sense of, e.g., ∂_x F = ∂_x F(t, X_t), and so on.
Remarks:
1. If y = y(t) is a differentiable function, then the classical chain rule yields
\[
\frac{d}{dt} F\big(t, y(t)\big) = \partial_t F\big(t, y(t)\big) + \partial_x F\big(t, y(t)\big) \cdot \frac{dy(t)}{dt},
\]
and in shorthand notation
\[
dF = \partial_t F\, dt + \partial_x F\, dy.
\]
The Itô-Doeblin formula can be considered as a stochastic version of the chain rule, but the term $\frac{1}{2}(\partial_x^2 F) \cdot g^2\, dt$ is surprising since such a term does not appear in the deterministic chain rule.
2. Let f (t, Xt ) = 0, g(t, Xt ) = 1, Xt = Wt and suppose that F (t, x) = F (x) does not
depend on t. Then, the Itô-Doeblin formula yields for Yt := F (Wt ) that
\[
dY_t = F'(W_t)\, dW_t + \frac{1}{2} F''(W_t)\, dt
\]
which is the shorthand notation for
\[
F(W_t) = F(W_0) + \int_0^t F'(W_s)\, dW_s + \frac{1}{2} \int_0^t F''(W_s)\, ds.
\]
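For F(x) = x² this identity reads W_t² = 2∫₀ᵗ W_s dW_s + t. The sketch below is my own illustration (variable names and step numbers are arbitrary): it approximates the Itô integral by left-point sums and checks the identity along one simulated path up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 1.0, 100_000
tau = T / N

dW = np.sqrt(tau) * rng.standard_normal(N)
W = np.concatenate(([0.0], np.cumsum(dW)))

ito_integral = np.sum(W[:-1] * dW)     # left-point approximation of int_0^T W_s dW_s

lhs = W[-1] ** 2                       # F(W_T) with F(x) = x^2 and F(W_0) = 0
rhs = 2.0 * ito_integral + T           # 2 * int_0^T W_s dW_s + int_0^T 1 ds
print(lhs, rhs)                        # agree up to the discretization error
```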
\[
Y_t = Y_0 + \int_0^t \Big( \partial_t F(s, X_s) + \partial_x F(s, X_s) \cdot f(s, X_s) + \frac{1}{2}\, \partial_x^2 F(s, X_s) \cdot g^2(s, X_s) \Big)\, ds + \int_0^t \partial_x F(s, X_s) \cdot g(s, X_s)\, dW_s.
\]
and hence
\[
\Delta X_n = X_{t_{n+1}} - X_{t_n} = f^{(n)} \Delta t_n + g^{(n)} \Delta W_n.
\]
(iii) Now Y_t can be expressed by the telescoping sum
\[
Y_t = Y_{t_N} = Y_0 + \sum_{n=0}^{N-1} \big( Y_{t_{n+1}} - Y_{t_n} \big) = Y_0 + \sum_{n=0}^{N-1} \big( F^{(n+1)} - F^{(n)} \big).
\]
(iv) Consider the limit N → ∞, Δt_n → 0 with respect to ‖·‖_{L²(dP)}. For the first two terms, this yields
\[
\lim_{N \to \infty} \sum_{n=0}^{N-1} \partial_t F^{(n)} \cdot \Delta t_n = \lim_{N \to \infty} \sum_{n=0}^{N-1} \partial_t F(t_n, X_{t_n}) \cdot \Delta t_n = \int_0^t \partial_t F(s, X_s)\, ds
\]
and
\[
\lim_{N \to \infty} \sum_{n=0}^{N-1} \partial_x F^{(n)} \cdot \Delta X_n
= \lim_{N \to \infty} \sum_{n=0}^{N-1} \partial_x F^{(n)} \cdot f^{(n)} \Delta t_n + \lim_{N \to \infty} \sum_{n=0}^{N-1} \partial_x F^{(n)} \cdot g^{(n)} \Delta W_n
= \int_0^t \partial_x F(s, X_s) \cdot f(s, X_s)\, ds + \int_0^t \partial_x F(s, X_s) \cdot g(s, X_s)\, dW_s.
\]
we have
\[
\frac{1}{2} \sum_{n=0}^{N-1} \partial_x^2 F^{(n)} \cdot (\Delta X_n)^2
= \frac{1}{2} \sum_{n=0}^{N-1} \partial_x^2 F^{(n)} \cdot \big( f^{(n)} \big)^2 (\Delta t_n)^2 \qquad (13)
\]
\[
\phantom{=} + \sum_{n=0}^{N-1} \partial_x^2 F^{(n)} \cdot f^{(n)} g^{(n)} \Delta t_n \Delta W_n \qquad (14)
\]
\[
\phantom{=} + \frac{1}{2} \sum_{n=0}^{N-1} \partial_x^2 F^{(n)} \cdot \big( g^{(n)} \big)^2 (\Delta W_n)^2. \qquad (15)
\]
With the abbreviation α^{(n)} := ∂_x² F^{(n)} · f^{(n)} g^{(n)} we obtain for the right-hand side of (14) that
\[
\bigg\| \sum_{n=0}^{N-1} \alpha^{(n)} \Delta t_n \Delta W_n \bigg\|_{L^2(dP)}^2
= E\Bigg( \bigg( \sum_{n=0}^{N-1} \alpha^{(n)} \Delta t_n \Delta W_n \bigg)^2 \Bigg)
= \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} E\big( \alpha^{(n)} \alpha^{(m)} \Delta W_n \Delta W_m \big)\, \Delta t_n \Delta t_m.
\]
and similar for m < n. Hence, only the terms with n = m have to be considered, which yields
\[
\bigg\| \sum_{n=0}^{N-1} \alpha^{(n)} \Delta t_n \Delta W_n \bigg\|_{L^2(dP)}^2
= \sum_{n=0}^{N-1} E\big( (\alpha^{(n)})^2 \big)\, (\Delta t_n)^2\, \underbrace{E\big( (\Delta W_n)^2 \big)}_{= \Delta t_n} \longrightarrow 0.
\]
The third term (15), however, has a non-zero limit: We show that
\[
\lim_{N \to \infty} \frac{1}{2} \sum_{n=0}^{N-1} \partial_x^2 F^{(n)} \cdot \big( g^{(n)} \big)^2 (\Delta W_n)^2 = \frac{1}{2} \int_0^t \partial_x^2 F(s, X_s) \cdot g^2(s, X_s)\, ds.
\]
and vice versa for n > m. Hence, only the terms with n = m have to be considered, and we obtain
\[
\bigg\| \sum_{n=0}^{N-1} \beta^{(n)} \big( (\Delta W_n)^2 - \Delta t_n \big) \bigg\|_{L^2(dP)}^2
= E\Bigg( \bigg[ \sum_{n=0}^{N-1} \beta^{(n)} \big( (\Delta W_n)^2 - \Delta t_n \big) \bigg]^2 \Bigg)
= \sum_{n=0}^{N-1} E\Big[ \big( \beta^{(n)} \big)^2 \Big]\, E\Big[ \big( (\Delta W_n)^2 - \Delta t_n \big)^2 \Big] \longrightarrow 0,
\]
because E\big[ ((\Delta W_n)^2 - \Delta t_n)^2 \big] = 2\Delta t_n^2: since ΔW_n ∼ N(0, Δt_n), we have E[(ΔW_n)^4] = 3Δt_n², and hence E[((ΔW_n)² − Δt_n)²] = 3Δt_n² − 2Δt_n² + Δt_n² = 2Δt_n².
and that the remainder term from the Taylor expansion can be neglected when the
limit is taken.
This process is called a geometric Brownian motion and is often used in mathematical
finance to model stock prices (see below).
The proof is left as an exercise.
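As an illustration (my own, not from the notes), the sketch below simulates a geometric Brownian motion. It assumes the standard form of the model, dX_t = μX_t dt + σX_t dW_t with the well-known closed-form solution X_t = X_0 exp((μ − σ²/2)t + σW_t); the parameter values and names are arbitrary choices.

```python
import numpy as np

def geometric_brownian_motion(x0=1.0, mu=0.05, sigma=0.2, T=1.0, N=250, seed=5):
    """Sample one path of X_t = x0 * exp((mu - sigma^2/2) t + sigma W_t).

    Standard geometric Brownian motion; the parameter values are arbitrary.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, T, N + 1)
    tau = T / N
    W = np.concatenate(([0.0], np.cumsum(np.sqrt(tau) * rng.standard_normal(N))))
    X = x0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * W)
    return t, X

t, X = geometric_brownian_motion()
print(X[0], X[-1])   # the path stays positive, as a stock-price model should
```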
Ordinary differential equations can have multiple solutions with the same initial value,
and solutions do not necessarily exist for all times. Hence, we cannot expect that every
SDE has a unique solution. As in the ODE case, however, existence and uniqueness can
be shown under certain assumptions concerning the coefficients f and g:
Theorem 7.3 (existence and uniqueness)
Let f : R+ × R −→ R and g : R+ × R −→ R be functions with the following properties:
• Lipschitz condition: There is a constant L ≥ 0 such that
|f (t, x) − f (t, y)| ≤ L|x − y|, |g(t, x) − g(t, y)| ≤ L|x − y| (16)
for all x, y ∈ R and t ≥ 0.
• Linear growth condition: There is a constant K ≥ 0 such that
\[
|f(t, x)|^2 \le K\big( 1 + |x|^2 \big), \qquad |g(t, x)|^2 \le K\big( 1 + |x|^2 \big) \qquad (17)
\]
for all x ∈ R and t ≥ 0.
Then, the SDE
\[
dX(t) = f\big(t, X(t)\big)\, dt + g\big(t, X(t)\big)\, dW(t), \qquad t \in [0, T]
\]
with deterministic initial value X(0) = X_0 has a continuous adapted solution and
\[
\sup_{t \in [0, T]} E\big( X^2(t) \big) < \infty.
\]
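Under such assumptions the SDE can be solved numerically with the discretization (4), usually called the Euler-Maruyama method. The sketch below is my own illustration, not part of the notes; it applies the scheme to the linear equation dX = −λX dt + σ dW, whose coefficients satisfy (16) and (17), with hypothetical parameters λ and σ.

```python
import numpy as np

def euler_maruyama(f, g, x0, T, N, seed=6):
    """Scheme (4): X_{n+1} = X_n + tau*f(t_n, X_n) + g(t_n, X_n)*(W(t_{n+1}) - W(t_n))."""
    rng = np.random.default_rng(seed)
    tau = T / N
    X = np.empty(N + 1)
    X[0] = x0
    for n in range(N):
        t_n = n * tau
        dW = np.sqrt(tau) * rng.standard_normal()   # W(t_{n+1}) - W(t_n) ~ N(0, tau)
        X[n + 1] = X[n] + tau * f(t_n, X[n]) + g(t_n, X[n]) * dW
    return X

lam, sigma = 2.0, 0.5                           # hypothetical parameters
X = euler_maruyama(f=lambda t, x: -lam * x,     # Lipschitz, linear growth
                   g=lambda t, x: sigma,
                   x0=1.0, T=1.0, N=1000)
print(X[-1])
```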
\[
X_j(t) = X_j(0) + \int_0^t f_j(s, X(s))\, ds + \sum_{k=1}^{m} \int_0^t g_{jk}(s, X(s))\, dW_k(s), \qquad j = 1, \dots, d \qquad (18)
\]
with f_j : R × R^d → R and g_{jk} : R × R^d → R. Here W_1(s), ..., W_m(s) are scalar Wiener processes which are pairwise independent. (18) is equivalent to
\[
X(t) = X(0) + \int_0^t f(s, X(s))\, ds + \int_0^t g(s, X(s))\, dW(s) \qquad (19)
\]
with vectors
\[
W(t) = \big( W_1(t), \dots, W_m(t) \big)^T \in \mathbb{R}^m, \qquad f(t, x) = \big( f_1(t, x), \dots, f_d(t, x) \big)^T \in \mathbb{R}^d
\]
and a matrix
\[
g(t, x) = \begin{pmatrix} g_{11}(t, x) & \cdots & g_{1m}(t, x) \\ \vdots & & \vdots \\ g_{d1}(t, x) & \cdots & g_{dm}(t, x) \end{pmatrix} \in \mathbb{R}^{d \times m}.
\]
or equivalently
\[
dY_\ell = \Big( \partial_t F_\ell + f^T \nabla F_\ell + \frac{1}{2} \operatorname{tr}\big( g^T (\nabla^2 F_\ell)\, g \big) \Big)\, dt + (\nabla F_\ell)^T g\, dW(t),
\]
where ∇F_ℓ is the gradient and ∇²F_ℓ is the Hessian of F_ℓ, and where $\operatorname{tr}(A) = \sum_{j=1}^{m} a_{jj}$ is the trace of a matrix A = (a_{ij})_{i,j} ∈ R^{m×m}.
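The componentwise form (18) translates directly into a vector-valued version of the discretization (4): in each step, X ∈ R^d is advanced by τ f(t_n, X_n) plus the matrix-vector product g(t_n, X_n) ΔW_n with ΔW_n ∈ R^m. A minimal sketch (my own illustration; the example coefficients with d = 2, m = 2 are arbitrary):

```python
import numpy as np

def euler_maruyama_system(f, g, x0, T, N, m, seed=7):
    """Vector step: X_{n+1} = X_n + tau*f(t_n, X_n) + g(t_n, X_n) @ dW_n, dW_n in R^m."""
    rng = np.random.default_rng(seed)
    tau = T / N
    X = np.array(x0, dtype=float)
    for n in range(N):
        t_n = n * tau
        dW = np.sqrt(tau) * rng.standard_normal(m)   # m independent increments
        X = X + tau * f(t_n, X) + g(t_n, X) @ dW
    return X

# Example with d = 2, m = 2; coefficients chosen only for illustration.
f = lambda t, x: np.array([-x[0], x[0] - x[1]])
g = lambda t, x: np.array([[0.3, 0.0],
                           [0.1, 0.2]])
print(euler_maruyama_system(f, g, x0=[1.0, 0.0], T=1.0, N=1000, m=2))
```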
Final remark: Itô vs. Stratonovich. The Itô integral is not the only stochastic
integral, and the Stratonovich integral is a famous alternative. The Stratonovich integral
has the advantage that the ordinary chain rule remains valid, i.e. the additional term in
the Itô-Doeblin formula does not appear when the Stratonovich integral is used. However,
a Stratonovich integral is in general not a martingale, whereas an Itô integral is, and this is the reason why the Itô integral is typically used to model risky assets in financial markets. Stratonovich integrals can, however, be transformed into Itô integrals and vice versa; see 3.1, 3.3 in [Øks03] and 3.5, 4.9 in [KP99]. If the Itô integral in (11) or (18) is replaced by the Stratonovich integral, the properties of the SDE change. In order to keep the two concepts apart, one speaks of “Itô SDEs” and “Stratonovich SDEs”.
References
[BK04] N. H. Bingham and Rüdiger Kiesel. Risk-neutral valuation. Pricing and hedging of financial derivatives. Springer Finance. Springer, London, 2nd ed. edition, 2004.
[CT04] Rama Cont and Peter Tankov. Financial modelling with jump processes. CRC
Financial Mathematics Series. Chapman & Hall, Boca Raton, FL, 2004.
[KP99] Peter E. Kloeden and Eckhard Platen. Numerical solution of stochastic differ-
ential equations. Number 23 in Applications of Mathematics. Springer, Berlin,
corr. 3rd printing edition, 1999.
[Øks03] Bernt Øksendal. Stochastic differential equations. An introduction with applica-
tions. Universitext. Springer, Berlin, 6th ed. edition, 2003.
[Shr04] Steven E. Shreve. Stochastic calculus for finance. II: Continuous-time models.
Springer Finance. Springer, 2004.
[Ste01] J. Michael Steele. Stochastic calculus and financial applications. Number 45 in
Applications of Mathematics. Springer, New York, NY, 2001.
X −1 (B) := {ω ∈ Ω : X(ω) ∈ B} ∈ F
for all Borel sets B ∈ B. If (Ω, F, P) is a probability space, then every F-measurable
function is called a random variable.
• If X : Ω → R^d is any function, then the σ-algebra generated by X is the collection of all subsets X^{-1}(B) with B ∈ B.
Notation: F^X = σ{X}.
F^X is the smallest σ-algebra with respect to which X is measurable.
P(A ∩ B) = P(A)P(B).
If the lower and upper sums \underline{S}_N and \overline{S}_N converge to the same value as the partition is refined, then the Riemann-Stieltjes integral is defined by
\[
\int_a^b f(t)\, dw(t) := \lim_{N \to \infty} \underline{S}_N = \lim_{N \to \infty} \overline{S}_N.
\]
\[
\int_a^b f(t)\, dw(t) := \int_a^b f(t)\, dw_1(t) - \int_a^b f(t)\, dw_2(t).
\]