Stochastic Calculus
Justin Salez
April 2, 2024
Contents
1 Preliminaries
1.1 Stochastic processes
1.2 Brownian motion
1.3 Martingales
1.4 Absolute and quadratic variation
1.5 Lévy’s characterization of Brownian motion
1.6 Local martingales
2 Stochastic integration
2.1 The Wiener isometry
2.2 The Wiener integral as a process
2.3 Progressive processes
2.4 The Itô isometry
2.5 The Itô integral as a process
2.6 Generalized Itô integral
3 Stochastic differentiation
3.1 Itô processes
3.2 Quadratic variation of an Itô process
3.3 Itô’s Formula
3.4 Exponential martingales
3.5 Girsanov’s Theorem
3.6 An application
Disclaimer: this course is a minimal and practical introduction to the theory of stochastic
calculus, with an emphasis on examples and applications rather than abstract subtleties.
Acknowledgment: Thanks are due to Josué Corujo and Damiano De Gaspari for having
reported many typos in a preliminary version of these notes.
Chapter 1
Preliminaries
Mean and covariance. A stochastic process X is called square-integrable if its coordinates are in L2 (Ω, F , P), i.e. E[ Xt2 ] < ∞ for all t ∈ T. By Cauchy-Schwarz, this ensures the well-definedness of the mean m X : T → R and covariance γX : T2 → R, given by
m X (t) := E[ Xt ],   γX (s, t) := Cov( Xs , Xt ).    (1.1)
Recall for future reference that the function γX is always symmetric in its two arguments, and
positive semi-definite: for all n ∈ N, all (t1 , . . . , tn ) ∈ Tn , and all (λ1 , . . . , λn ) ∈ Rn , we have
∑_{j,k=1}^{n} λ j λk γX ( t j , tk ) = Var( ∑_{j=1}^{n} λ j Xt j ) ≥ 0.    (1.2)
Perhaps surprisingly, the two simple functions m X and γX capture a considerable amount of
structural information about the process, and play a major role in many practical aspects of
signal processing and forecasting. While they are far from characterizing the law of a general
square-integrable process, they do characterize it in the important case of Gaussian processes.
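To make this concrete, here is a minimal numerical sketch (not part of the notes): for a Gaussian process, the pair (m X , γX ) determines every finite-dimensional law, so we can sample the marginals directly from them. The covariance γ(s, t) = s ∧ t used below is the Brownian one; all parameter choices are illustrative.

```python
import numpy as np

# Sample the finite-dimensional marginals of a Gaussian process directly
# from its mean function and covariance function.

def sample_gaussian_process(mean_fn, cov_fn, times, n_paths, rng):
    """Draw n_paths samples of (X_t) at the given times."""
    m = np.array([mean_fn(t) for t in times])
    C = np.array([[cov_fn(s, t) for t in times] for s in times])
    return rng.multivariate_normal(m, C, size=n_paths)

rng = np.random.default_rng(0)
times = [0.5, 1.0, 2.0]
paths = sample_gaussian_process(lambda t: 0.0, min, times, 200_000, rng)
emp_cov = np.cov(paths, rowvar=False)   # empirical covariance matrix
```

With 200 000 samples, the empirical covariance should reproduce γ(s, t) = s ∧ t up to Monte Carlo error.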
1.1. Stochastic processes
Gaussian processes. A stochastic process X = ( Xt )t∈T is called Gaussian if every finite linear combination of its coordinates is a Gaussian random variable. More explicitly, for every n ∈ N, every (t1 , . . . , tn ) ∈ Tn , and every (λ1 , . . . , λn ) ∈ Rn , the scalar random variable
Z := λ1 Xt1 + · · · + λn Xtn    (1.3)
is Gaussian.
Independence. Two stochastic processes ( Xs )s∈S and (Yt )t∈T are independent if the random
vectors ( Xs1 , . . . , Xsn ) and (Yt1 , . . . , Ytn ) are independent for every n ∈ N and every choice of the
indices (s1 , . . . , sn ) ∈ Sn and (t1 , . . . , tn ) ∈ Tn . In general, this may be quite hard to check, but
a huge simplification occurs when the two processes ( Xs )s∈S and (Yt )t∈T are jointly Gaussian,
meaning that the concatenated process (( Xs )s∈S , (Yt )t∈T ) is Gaussian. Indeed, the random vector
( Xs1 , . . . , Xsn , Yt1 , . . . , Ytn ) is then Gaussian, so its distribution is entirely determined by its mean
and covariance. In particular, the independence between ( Xs1 , . . . , Xsn ) and (Yt1 , . . . , Ytn ) reduces to the corresponding covariances being 0. Thus, two jointly Gaussian processes X = ( Xs )s∈S and Y = (Yt )t∈T are independent if and only if they are decorrelated, in the sense that
Cov( Xs , Yt ) = 0 for all (s, t) ∈ S × T.
This simplification extends to more than two processes in the obvious way.
Indistinguishability. We will say that two processes X = ( Xt )t∈T and Y = (Yt )t∈T are indistinguishable if the random variables X and Y coincide a.-s., i.e. if the set
{ω ∈ Ω : ∃t ∈ T : Xt ̸= Yt }
is negligible. A weaker requirement is that Y be a modification of X, meaning that
∀t ∈ T, P( Xt = Yt ) = 1.    (1.6)
However, the two notions coincide when T is countable, or when T = R and X, Y are (right-)
continuous. Note that (1.6) implies, in particular, that X and Y have the same law.
1.2. Brownian motion
(i) B0 = 0 almost-surely;
(ii) Bt − Bs ∼ N (0, t − s) for all 0 ≤ s ≤ t;
(iii) Bt2 − Bt1 , . . . , Btn − Btn−1 are independent for any n ∈ N and any 0 ≤ t1 ≤ . . . ≤ tn .
Conversely, any process satisfying these three properties has the law of a Brownian motion.
Proof. Since B is a Gaussian process with m B = 0 and γB (s, t) = s ∧ t, we have Bt ∼ N (0, t) for
all t ≥ 0. Taking t = 0 yields the first claim, and we now turn to the second. The fact that Bt − Bs
is a Gaussian random variable is clear, since B is a Gaussian process. Thus, it only remains to
compute its mean and variance: by linearity of expectations and bilinearity of covariances,
E[ Bt − Bs ] = m B (t) − m B (s) = 0
Var( Bt − Bs ) = γB (t, t) + γB (s, s) − 2γB (t, s) = t − s.
Finally, the random vector ( Bt2 − Bt1 , . . . , Btn − Btn−1 ) is Gaussian, because any linear combina-
tion of its coordinates is also a linear combination of coordinates of the Gaussian process B.
Consequently, independence reduces to decorrelation. Now, for 1 ≤ j < k ≤ n, we have
Cov Bt j − Bt j−1 , Btk − Btk−1 = γ B ( t j , t k ) + γ B ( t j −1 , t k −1 ) − γ B ( t j , t k −1 ) − γ B ( t j −1 , t k )
= t j + t j −1 − t j − t j −1
= 0,
where the second line uses the fact that t j−1 ≤ t j ≤ tk−1 ≤ tk . This establishes (iii). The converse
is a good exercise, which we leave to the reader.
We now list three elementary but important invariance properties, which confirm the robust-
ness and canonical nature of Brownian motion.
Proposition 1.2 (Invariance). Let B = ( Bt )t≥0 be a Brownian motion. Then, in each of the following
cases, the process W = (Wt )t≥0 is also a Brownian motion.
(ii) Wt := Bat / √a for any fixed a > 0 (invariance by scaling);
(iii) Wt := t B1/t 1(t>0) (invariance by time inversion).
Proof. In each case, W is a Gaussian process because any linear combination of its coordinates is
also a linear combination of coordinates of the Gaussian process B. Moreover, direct computa-
tions reveal that W has the same mean and covariance as B. From this, we can already conclude
that W is distributed as a Brownian motion, but we still have to check the almost-sure continuity
of t 7→ Wt . The latter is clear in cases (i) and (ii), by composition of continuous functions. The
same argument works in (iii), except at t = 0. Thus, it only remains to check that Wt → 0
almost-surely as t → 0. Here is a short but subtle argument: if a function x : R+ → R is known
to be continuous on (0, ∞), then its convergence to 0 at 0+ can be expressed as x ∈ E, where
E := ⋂_{n=1}^{∞} ⋃_{k=1}^{∞} ⋂_{t∈[0,1/k]∩Q} { | xt | ≤ 1/n }.
Clearly, this set is in the product σ−field. Since W and B have the same law, we can safely
conclude that P(W ∈ E) = P( B ∈ E). But P( B ∈ E) = 1, by the trajectorial continuity of B.
Remark 1.3 (SLLN for the Brownian motion). The invariance (iii) has an interesting consequence: being a Brownian motion, the process t ↦ t B1/t 1(t>0) must tend to 0 almost-surely as t → 0, which means that
Bt / t → 0 almost-surely as t → ∞.    (1.8)
This classical fact is known as the strong law of large numbers for the Brownian motion.
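A quick simulation illustrating (1.8), using a simple Gaussian-increment discretization of B on [0, T] (the horizon and mesh below are arbitrary choices, not from the notes):

```python
import numpy as np

# Discretize B on [0, T] by summing independent N(0, dt) increments and
# track the ratio |B_t| / t, which should be small for large t.

rng = np.random.default_rng(1)
T, n = 10_000.0, 1_000_000
dt = T / n
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))  # B at dt, 2*dt, ..., T
t = dt * np.arange(1, n + 1)
ratio = np.abs(B / t)
```

Since B_T ∼ N (0, T), the final ratio is typically of order 1/√T = 0.01 here, while at small times it is large.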
Proposition 1.3 (Markov property for the Brownian motion). Let ( Bt )t≥0 be a Brownian motion,
and let a ≥ 0 be fixed. Then, the Brownian motion ( Bt+a − Ba )t≥0 is independent of ( Bt )t∈[0,a] .
Proof. The processes ( Bt )t∈[0,a] and ( Bt+a − Ba )t≥0 are jointly Gaussian, because their coordinates
are linear combinations of coordinates of the same Gaussian process B. Thus, the claimed
independence reduces to the decorrelation property
Cov( Bs , Bt+a − Ba ) = s ∧ (t + a) − s ∧ a = s − s = 0, for all s ∈ [0, a] and t ≥ 0.
1.3 Martingales
From now onward, we turn our probability space (Ω, F , P) into a filtered probability space
(Ω, F , (Ft )t≥0 , P), by equipping it with a filtration (Ft )t≥0 . In other words, each Ft ⊆ F is a
σ −field on Ω and Fs ⊆ Ft for each 0 ≤ s ≤ t. The intuition is that Ft represents the information
that is available by time t about the various stochastic processes under consideration. For this
interpretation to be valid, we shall restrict our attention to processes X = ( Xt )t≥0 that satisfy
∀t ≥ 0, Xt is Ft − measurable. (1.10)
Such processes are said to be adapted. A simple way to ensure that a given process X is adapted
is to choose its natural filtration F X = (FtX )t≥0 , defined by
FtX := σ ( Xs : s ≤ t) . (1.11)
Of course, any larger filtration (in the coordinate-wise sense) will also work.
Definition (Martingale). A martingale is a stochastic process M = ( Mt )t≥0 such that
(i) M is adapted;
(ii) Mt is integrable for each t ≥ 0;
(iii) E[ Mt | Fs ] = Ms for all 0 ≤ s ≤ t.
Remark 1.4 (Constant mean). In particular, the mean of a martingale is a constant function, i.e.
∀t ≥ 0, E[ Mt ] = E[ M0 ]. (1.12)
Property (iii) is, however, a much deeper property: as we will see, it implies that (1.12) actually remains valid when the deterministic time t is replaced by any “sufficiently reasonable” random time T.
Example 1.1 (Some important martingales). Let B = ( Bt )t≥0 be a Brownian motion. Then, in each of
the following cases, the process ( Mt )t≥0 is a martingale with respect to the filtration F B .
(i) Mt := Bt ;
(ii) Mt := Bt2 − t ;
(iii) Mt := exp( θBt − θ2 t/2 ), for any fixed θ ∈ R.
Proof. In each case, the process M is adapted because we have Mt = f t ( Bt ) for some measurable
(in fact, continuous) function f t : R → R. The integrability is standard, since Bt ∼ N (0, t).
Finally, for 0 ≤ s ≤ t, we may write Bt = Bs + ( Bt − Bs ) and use the fact that Bs is Fs −measurable
while Bt − Bs is independent of Fs (this is the Markov property for B) to obtain
E[ Bt | Fs ] = Bs + E[ Bt − Bs ] = Bs
E[ Bt2 | Fs ] = Bs2 + E[( Bt − Bs )2 ] + 2Bs E[ Bt − Bs ] = Bs2 + (t − s)
E[ e^{θBt} | Fs ] = e^{θBs} E[ e^{θ(Bt − Bs)} ] = e^{θBs + θ2 (t−s)/2} .
Rearranging these identities readily gives the desired martingale property in each case.
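The constant-mean property (1.12) of these three martingales is easy to check by Monte Carlo. The sketch below (my parameters, not from the notes) uses the exact law Bt ∼ N (0, t):

```python
import numpy as np

# Monte Carlo check of Example 1.1: each martingale should satisfy
# E[M_t] = E[M_0], i.e. the three means below should be near 0, 0 and 1.

rng = np.random.default_rng(2)
t, theta, n = 2.0, 0.5, 1_000_000
B_t = rng.normal(0.0, np.sqrt(t), size=n)      # exact law of B_t

m1 = B_t.mean()                                 # E[B_t]
m2 = (B_t**2 - t).mean()                        # E[B_t^2 - t]
m3 = np.exp(theta * B_t - theta**2 * t / 2).mean()
```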
Remark 1.5. The same argument works for any filtration (Ft )t≥0 such that B is adapted to (Ft )t≥0 and
Bt − Bs is independent of Fs for every 0 ≤ s ≤ t. We then speak of a (Ft )t≥0 −Brownian motion.
Remark 1.6 (A general formula). The above computations are special cases of a useful general formula
for conditional expectations: if G ⊆ F is any σ−field, and if X and Y are two random variables, the first
being G−measurable and the second being independent of G , then
E [ f ( X, Y ) | G] = F ( X ), where F ( x ) := E [ f ( x, Y )] ,
for any measurable f such that E[| f ( X, Y )|] < ∞. In particular, if B is a (Ft )t≥0 −Brownian motion
and 0 ≤ s ≤ t, we may apply this to X = Bs , Y = Bt − Bs , G = Fs and f ( x, y) = φ( x + y) to obtain
E[ φ( Bt ) | Fs ] = (1/√(2π)) ∫_R e^{−z²/2} φ( Bs + z √(t − s) ) dz.    (1.13)
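Formula (1.13) can be checked numerically for a hand-picked test function (φ = cos is my choice, not from the notes), since in that case the conditional expectation has the closed form cos( Bs ) e^{−(t−s)/2}:

```python
import numpy as np

# Evaluate the Gaussian integral of (1.13) by direct quadrature and compare
# with the closed form E[cos(B_t) | F_s] = cos(B_s) * exp(-(t - s)/2).

def cond_exp(phi, x, s, t, n=200_001):
    """Right-hand side of (1.13) at B_s = x, by Riemann quadrature in z."""
    z = np.linspace(-10.0, 10.0, n)
    dz = z[1] - z[0]
    integrand = np.exp(-z**2 / 2) * phi(x + z * np.sqrt(t - s))
    return integrand.sum() * dz / np.sqrt(2 * np.pi)

x, s, t = 0.7, 1.0, 3.0
lhs = cond_exp(np.cos, x, s, t)
rhs = np.cos(x) * np.exp(-(t - s) / 2)
```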
As we will now see, the true strength of martingales lies in the fact that the time t in the
mean conservation identity (1.12) can, under appropriate conditions, be taken to be random.
Definition 1.3 (Stopping time). A stopping time is a [0, ∞]−valued random variable T such that
∀t ≥ 0, { T ≤ t} ∈ Ft . (1.14)
The intuition is that, at any given time, one should be able to determine whether the random
time T has already occurred or not, just by looking at the information available so far (and not
in the future). For example, the first time that a (Ft )t≥0 −Brownian motion reaches the value 1 is
a stopping time, but the last time that a Brownian motion reaches the value 0 in the time interval
[0, 1] is not. In practice, all stopping times that we shall encounter will be of the following form.
Proposition 1.4 (A useful criterion). Suppose that A ⊆ R is a closed set and that X = ( Xt )t≥0 is an adapted, continuous process. Then, the hitting time of A by X, defined as
TA ( X ) := inf{ t ≥ 0 : Xt ∈ A },    (1.15)
is a stopping time.
Proof. Using the continuity of X and the fact that A is closed, one can easily check that
{ TA ( X ) ≤ t } = ⋂_{k=1}^{∞} ⋃_{s∈[0,t]∩Q} { dist( Xs , A) ≤ 1/k }.    (1.16)
Since X is adapted, each event on the right-hand side belongs to Ft , and the claim follows.
Exercise 1.1 (Stopping times). Show that if S, T are stopping times, then so are S ∧ T, S ∨ T, S + T.
Theorem 1.1 (Doob’s optional stopping Theorem). If ( Mt )t≥0 is a continuous martingale and T a
stopping time, then the stopped process M T := ( Mt∧T )t≥0 is a (continuous) martingale. In particular,
∀t ≥ 0, E [ MT ∧t ] = E[ M0 ]. (1.17)
If ( Mt∧T )t≥0 is uniformly integrable and T < ∞ a.-s., then one may take t → ∞ to obtain E[ MT ] = E[ M0 ].
Here is an example to illustrate the practical interest of Doob’s optional stopping Theorem.
Example 1.2 (Exit time from an interval). Fix two constants a, b > 0. How long will it take, on
average, for a Brownian motion B to exit the interval I = (− a, b) ? The variable of interest T is a
stopping time, because it is the hitting time of the closed set I c by the continuous and adapted process B.
Applying Doob’s optional stopping Theorem to the continuous martingale ( Bt2 − t)t≥0 , we deduce that
E[ T ∧ t ] = E[ (BT∧t )2 ],
for all t ≥ 0. We now send t → ∞. The left-hand side tends to E[ T ] by monotone convergence. Since
the right-hand side is bounded by ( a ∨ b)2 independently of t, we already see that E[ T ] ≤ ( a ∨ b)2 . In
particular, T is a.-s. finite, and the domination BT2 ∧t ≤ ( a ∨ b)2 now allows us to obtain the equality
E[ T ] = E[ BT2 ] = pb2 + (1 − p) a2 ,
where p = P( BT = b). The second equality relies on the observation that BT takes values in the
two-element set {− a, b}, by continuity of B. To compute p, we now apply Doob’s optional stopping
theorem to the martingale ( Bt )t≥0 . We already know that T < ∞ a.-s., and that | BT ∧t | ≤ a ∨ b, so we
may safely conclude that 0 = E[ BT ] = pb − (1 − p) a, i.e. p = a/( a + b). In conclusion, the answer is
E[ T ] = ab²/(a + b) + a²b/(a + b) = ab.
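As a sanity check (my adaptation, not in the notes), the same optional-stopping argument applies verbatim to the discrete analogue of this example, a simple symmetric random walk with integer barriers −a and b, giving the exact values E[ T ] = ab and P(exit at b) = a/( a + b):

```python
import numpy as np

# Simulate exits of a simple symmetric random walk from (-a, b) and compare
# the empirical mean exit time and exit probability with ab and a/(a+b).

rng = np.random.default_rng(3)
a, b, n_paths = 3, 5, 20_000

exit_times = np.empty(n_paths)
exit_at_b = np.empty(n_paths, dtype=bool)
for i in range(n_paths):
    pos, steps = 0, 0
    while -a < pos < b:
        pos += 2 * int(rng.integers(0, 2)) - 1   # +/-1 with probability 1/2
        steps += 1
    exit_times[i] = steps
    exit_at_b[i] = (pos == b)
```

Here the predicted values are E[ T ] = 15 and P(exit at b) = 3/8 = 0.375.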
More generally, Doob’s optional stopping theorem remains true for sub-martingales or super-
martingales (defined by relaxing the equality E[ Mt |Fs ] = Ms into the inequality E[ Mt |Fs ] ≥ Ms
or E[ Mt |Fs ] ≤ Ms , respectively). Such processes arise naturally when applying a convex or
concave function to a martingale (by the conditional Jensen inequality). We end this section by
mentioning a uniform refinement of Chebyshev’s inequality in the case of martingales.
Theorem 1.2 (Doob’s maximal inequality). If M is a square-integrable continuous martingale, then
∀ a, t ≥ 0,   P( sup_{s∈[0,t]} | Ms | ≥ a ) ≤ E[ Mt2 ] / a².    (1.18)
Here is a nice and useful application of Doob’s maximal inequality.
Proposition 1.5 (Limits of continuous martingales). Let ( Mn )n≥1 be continuous, square-integrable
martingales, and suppose that for each t ≥ 0 the limit Mt := limn→∞ Mtn exists in L2 . Then, the process
M = ( Mt )t≥0 (has a modification which) is a continuous square-integrable martingale.
Proof. The only real difficulty is continuity. By Doob’s maximal inequality applied to the
continuous square-integrable martingale Mn − Mm , we have for fixed t ≥ 0 and k ∈ N,
P( sup_{s∈[0,t]} | M^n_s − M^m_s | ≥ 1/k² ) ≤ k⁴ E[ ( M^n_t − M^m_t )² ].    (1.19)
Since ( Mtn )n≥1 converges in L2 , the right-hand side can be made arbitrarily small by choosing
m ∧ n large. Consequently, there is an increasing sequence ( Nk )k≥1 such that
∀k ∈ N,   P( sup_{s∈[0,t]} | M^{N_{k+1}}_s − M^{N_k}_s | ≥ 1/k² ) ≤ 1/k².    (1.20)
By the Borel-Cantelli lemma, we deduce that almost-surely,
∑_{k=1}^{∞} sup_{s∈[0,t]} | M^{N_{k+1}}_s − M^{N_k}_s | < ∞.    (1.21)
This ensures that almost-surely, the sequence ( M Nk )k≥1 is convergent in the space of continuous
functions equipped with the topology of uniform convergence on every compact set. But the
limit is necessarily a version of M, because for each t ≥ 0, we have Mtn → Mt in L2 .
1.4. Absolute and quadratic variation
Absolute variation. For a function f : R+ → R and 0 ≤ s ≤ t, the absolute variation of f on [s, t] is defined as
V( f , s, t) := sup { ∑_{k=1}^{n} | f (tk ) − f (tk−1 )| : n ∈ N, s = t0 ≤ t1 ≤ . . . ≤ tn = t }.    (1.22)
The function f has finite variation if V( f , s, t) < ∞ for every 0 ≤ s ≤ t. This is the case for most of the functions that we usually manipulate: for example, we invite the reader to check that
(i) If f is continuously differentiable, then V( f , s, t) = ∫_s^t | f ′(u)| du < ∞;
(ii) If f is non-decreasing, then V( f , s, t) = f (t) − f (s);
(iii) V( f + g, s, t) ≤ V( f , s, t) + V( g, s, t).
In particular, (ii) and (iii) imply that the difference of two non-decreasing functions has finite
variation. In fact, any function of finite variation is of this form.
Proposition 1.6 (Characterization of finite variation). A function f : R+ → R has finite variation if
and only if it can be written as f = f 1 − f 2 , where f 1 , f 2 : R+ → R are non-decreasing.
Proof. The ’if’ part is trivial. Conversely, if f has finite variation, then it is immediate to check
that the functions f 1 : t 7→ V ( f , 0, t) and f 2 : t 7→ V ( f , 0, t) − f (t) are non-decreasing.
It is now time to give an example of a function that fails to have finite variation.
Example 1.3 (Variation of the Brownian motion). Let B be a Brownian motion. Fix 0 ≤ s ≤ t and
consider the subdivision (t0 , . . . , tn ) of [s, t] into n intervals of equal length, i.e. tk = s + nk (t − s). Then,
∑_{k=1}^{n} | Btk − Btk−1 | =^d √((t − s)/n) ( |ξ1 | + · · · + |ξn | ),    (1.23)
where (ξ k )k≥0 are i.i.d. with law N (0, 1). Now, the right-hand side diverges as n → ∞ by the strong
law of large numbers, implying that P(V ( B, s, t) = +∞) = 1. By taking s, t ∈ Q+ and noting that
V ( f , s, t) ≤ V ( f , s′ , t′ ) whenever [s, t] ⊆ [s′ , t′ ], we conclude that
P (∀s, t ≥ 0, V ( B, s, t) = +∞) = 1.
Thus, Brownian motion oscillates much more than the typical functions that we usually manipulate.
The computation appearing at (1.23) strongly suggests looking at squared increments when
measuring the variations of Brownian motion. Indeed, the quadratic version of (1.23) is
∑_{k=1}^{n} | Btk − Btk−1 |² =^d ((t − s)/n) ( |ξ1 |² + · · · + |ξn |² ),    (1.24)
and the right-hand side now tends to t − s instead of +∞, by the strong law of large numbers.
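The contrast between (1.23) and (1.24) is easy to see on a single simulated path (all parameters below are arbitrary): on refining subdivisions of [0, 1], the absolute variation blows up while the sum of squared increments concentrates around t − s = 1.

```python
import numpy as np

# Simulate one Brownian path on a fine grid and compute absolute and
# quadratic variation sums on coarser and coarser sub-grids.

rng = np.random.default_rng(4)
n_max = 2**20
dB = rng.normal(0.0, np.sqrt(1.0 / n_max), size=n_max)
B = np.concatenate(([0.0], np.cumsum(dB)))     # path on a mesh-1/n_max grid

abs_var, quad_var = {}, {}
for n in (2**8, 2**12, 2**16, 2**20):
    incs = np.diff(B[:: n_max // n])           # increments on the 1/n grid
    abs_var[n] = np.abs(incs).sum()            # grows like sqrt(2n/pi)
    quad_var[n] = (incs**2).sum()              # stays close to 1
```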
This idea of considering quadratic variation when the function of interest has infinite absolute
variation turns out to work way beyond the specific example of Brownian motion, as shown in
the following fundamental result.
exists in L¹ and does not depend on the subdivisions (tnk )0≤k≤n of [0, t], as long as the mesh max_{1≤k≤n} |tnk − tnk−1 | tends to 0 as n → ∞. Moreover, the process ⟨ M⟩ = (⟨ M⟩t )t≥0 has the following properties:
(i) ⟨ M ⟩0 = 0 ;
(ii) t ↦ ⟨ M⟩t is non-decreasing ;
(iii) t ↦ ⟨ M⟩t is continuous ;
(iv) M² − ⟨ M⟩ is a martingale.
Proof. Let us here admit existence and continuity, and focus on the important martingale
property (iv), whose proof is instructive (Properties (i) and (ii) are trivial). Fix 0 ≤ s ≤ t, and
consider a subdivision (tnk ) of [s, t] with max0≤k≤n |tnk − tnk−1 | → 0 as n → ∞. We may then write
E[ Mt² − Ms² | Fs ] = ∑_{k=1}^{n} E[ M²_{tnk} − M²_{tnk−1} | Fs ] = ∑_{k=1}^{n} E[ ( Mtnk − Mtnk−1 )² | Fs ],
where the second equality uses the orthogonality of martingale increments. Letting n → ∞ and using the L¹ convergence of ∑_{k=1}^{n} ( Mtnk − Mtnk−1 )² to ⟨ M⟩t − ⟨ M⟩s , we conclude from the above computation that
E[ Mt² − Ms² | Fs ] = E[ ⟨ M⟩t − ⟨ M⟩s | Fs ].
Since Ms and ⟨ M⟩s are Fs -measurable, this proves the desired martingale property.
Example 1.4 (Brownian case). In the case of a Brownian motion B = ( Bt )t≥0 , the computation (1.24) shows that ⟨ B⟩t = t, which does indeed satisfy Properties (i)-(iv) above.
Remark 1.7 (Quadratic covariation). If M, N are two continuous square-integrable martingales, then
we may define their quadratic covariation by the polarization formula:
⟨ M, N ⟩ := (1/2) ( ⟨ M + N ⟩ − ⟨ M⟩ − ⟨ N ⟩ ).
The above result implies that ⟨ M, N ⟩ is continuous, that MN − ⟨ M, N ⟩ is a martingale, and that
∑_{k=1}^{n} ( Mtnk − Mtnk−1 )( Ntnk − Ntnk−1 ) → ⟨ M, N ⟩t in L¹ as n → ∞,
for any subdivisions (tnk )0≤k≤n of [0, t] with max_{1≤k≤n} |tnk − tnk−1 | → 0 as n → ∞.
Remark 1.8 (Absolute vs quadratic variation). If f has finite variation and g is continuous, then
| ∑_{k=1}^{n} ( f (tk ) − f (tk−1 )) ( g(tk ) − g(tk−1 )) | ≤ V( f , 0, t) max_{u,v∈[0,t], |u−v|≤∆} | g(u) − g(v)|,
where ∆ = max0≤k≤n |tnk − tnk−1 |. By uniform continuity of g on compact sets (Heine’s theorem), the
right-hand side tends to 0 as ∆ → 0, i.e. ⟨ f , g⟩ = 0. Taking f = g shows that a continuous process
with finite variation must have zero quadratic variation. When applied to martingales, this implies the
following result, which considerably extends our observation about the roughness of Brownian motion.
1.5. Lévy’s characterization of Brownian motion
Corollary 1.1 (No interesting martingale has finite variation). If M = ( Mt )t≥0 is a continuous
square-integrable martingale which has finite variations a.-s., then M is a.-s. constant in time:
P(∀t ≥ 0, Mt = M0 ) = 1. (1.25)
Proof. Fix t ≥ 0. By the above remark, we have ⟨ M ⟩t = 0. On the other hand, the orthogonality of martingale increments and property (iv) above yield
E[ ( Mt − M0 )² ] = E[ Mt² − M0² ] = E[ ⟨ M⟩t ] = 0,
which shows that P( Mt = M0 ) = 1. This is true for any fixed t ≥ 0, so we may take t ∈ Q+ and
invoke the continuity of M to conclude that P(∀t ≥ 0, Mt = M0 ) = 1, as desired.
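Remark 1.8 above can also be illustrated numerically (a sketch; the smooth function f is my arbitrary choice): the cross-variation sums of a finite-variation f against a Brownian path vanish as the mesh of the subdivision goes to 0.

```python
import numpy as np

# Compute the cross-variation sums of Remark 1.8 for a smooth f against a
# simulated Brownian path, on a coarse and on a fine subdivision of [0, 1].

rng = np.random.default_rng(5)
n_max = 2**20
dB = rng.normal(0.0, np.sqrt(1.0 / n_max), size=n_max)
B = np.concatenate(([0.0], np.cumsum(dB)))

f = lambda t: np.sin(3.0 * t)                  # smooth, finite variation

cross = {}
for n in (2**6, 2**18):
    t = np.linspace(0.0, 1.0, n + 1)
    cross[n] = np.sum(np.diff(f(t)) * np.diff(B[:: n_max // n]))
```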
Remark 1.9 (Truncation). The result actually holds without the square-integrability assumption. Indeed,
stopping preserves both the finite variation and the martingale properties, so the conclusion applies to
M Tn , where Tn = inf{t ≥ 0 : | Mt | ≥ n}, and can then be transferred to M by sending n → ∞.
Remark 1.10 (Uniqueness). The quadratic variation ⟨ M⟩ = (⟨ M⟩t )t≥0 defined at (1.30) is the only
process satisfying the properties (i ) − (iv) in Theorem 1.3. Indeed, if A = ( At )t≥0 is another process
with these properties, then ⟨ M ⟩ − A = ( M2 − A) − ( M2 − ⟨ M ⟩) is a continuous martingale (as
the difference of two continuous martingales), and has finite variation a.-s. (as the difference of two
non-decreasing processes). Thus, it must be a.-s. constantly equal to its initial value, which is zero.
Theorem (Lévy’s characterization). Let M = ( Mt )t≥0 be a continuous square-integrable martingale with M0 = 0 and ⟨ M⟩t = t for all t ≥ 0. Then, M is a Brownian motion. The key step of the proof consists in establishing that
E[ e^{iθ(Mt − Ms)} | Fs ] = e^{−θ²(t−s)/2} ,    (1.26)
for any 0 ≤ s ≤ t and θ ∈ R. This formula shows that Mt − Ms has law N (0, t − s) and is
independent of Fs , as desired. To prove (1.26), we Taylor-expand F: for 0 ≤ t ≤ t′ and x, x ′ ∈ R,
F(t′, x′) − F(t, x) = (t′ − t) ( ∂F/∂t (t, x) + o(1) )
+ (x′ − x) ∂F/∂x (t, x) + ((x′ − x)²/2) ( ∂²F/∂x² (t, x) + o(1) ),
1.6. Local martingales
where the o (1) term is uniformly bounded and can be made arbitrary small by choosing
| x ′ − x | + |t′ − t| small (this uses the boundedness assumption on the derivatives of F). We now
choose ( x, x ′ ) = ( Mt , Mt′ ) and take conditional expectation w.r.t. Ft . Using E[ Mt′ − Mt |
Ft ] = 0 and E[( Mt′ − Mt )2 | Ft ] = (t′ − t), we easily obtain
E[ F(t′, Mt′ ) − F(t, Mt ) − (t′ − t) ( ∂F/∂t + (1/2) ∂²F/∂x² )(t, Mt ) | Ft ] = (t′ − t) o(1).
Finally, fix 0 ≤ s ≤ t, set tnk := s + k(t − s)/n, and apply the above identity to t = tnk−1 and t′ = tnk . By the
tower property of conditional expectation, we may replace Ft by Fs . Summing the resulting
identity over 1 ≤ k ≤ n yields
" #
n 2F
∂F 1 ∂
E F (t, Mt ) − F (s, Ms ) − ∑ (tnk − tnk−1 ) + 2
(tnk−1 , Mtnk−1 ) | Fs = ( t − s ) o (1),
k =1
∂t 2 ∂x
Definition (Local martingale). An adapted process M = ( Mt )t≥0 is called a local martingale if there exists a sequence ( Tn )n≥1 of stopping times such that:
(i) Almost-surely, Tn ↑ ∞ as n ↑ ∞.
(ii) For every n ∈ N, the stopped process M Tn := ( Mt∧Tn )t≥0 is a martingale.
Such a sequence ( Tn )n≥1 is called a localizing sequence for M.
Of course, any continuous martingale is a local martingale (take Tn = +∞), but the converse
is far from true. In fact, a local martingale need not even be integrable! However, any local
martingale which is uniformly dominated is a true martingale, as we now show.
Proposition 1.7 (Uniform domination). For a local martingale M to be a martingale, it suffices that
" #
∀t ≥ 0, E sup | Ms | < ∞. (1.27)
s∈[0,t]
Proof. As any local martingale, M is adapted: it is the pointwise limit of the sequence of adapted
processes ( M Tn )n≥1 , where ( Tn )n≥1 is a localizing sequence. Moreover, the above domination
ensures that M is integrable. Finally, fix 0 ≤ s ≤ t. For all n ∈ N, we know that
E[ MTn ∧t | Fs ] = MTn ∧s .
To conclude that E [ Mt | Fs ] = Ms , we now take n → ∞: the random variables MTn ∧t and MTn ∧s
tend to Mt and Ms a.-s., because Tn ↑ ∞. Moreover, the domination | MTn ∧t | ≤ Z with Z :=
sups∈[0,t] | Ms | ∈ L1 allows us to safely interchange the limit and conditional expectation.
Local martingales are easy to work with, because we can always localize them to obtain
true martingales (for which we have a well-developed theory), and then transfer the desired
conclusion by taking a limit. As a consequence, many of the results that we have mentioned
about martingales extend easily to local martingales. Here are a few important examples, which
we really invite the reader to prove.
Proposition 1.8 (Doob’s optional stopping theorem for local martingales). If M is a continuous
local martingale and T a stopping time, then the stopped process M T = ( Mt∧T )t≥0 is a local martingale.
Proof. Let ( Tn )n≥1 be a localizing sequence for M, and fix n ∈ N. Since M Tn is a (continuous)
martingale and T a stopping time, the non-local version of Doob’s optional stopping Theorem
ensures that M Tn ∧T is a martingale. Thus, ( Tn )n≥1 is also a localizing sequence for M T .
Remark 1.11 (A smart localizing sequence). If M is a continuous local martingale with M0 = 0, then
Tn := inf{ t ≥ 0 : | Mt | ≥ n }    (1.29)
is a stopping time for any n ∈ N (hitting time of a closed set by a continuous adapted process), so the
local version of Doob’s optional stopping Theorem ensures that M Tn is a local martingale. But M Tn is
[−n, n]−valued by construction, so the uniform domination (1.27) trivially holds, showing that M Tn
is in fact a martingale. Finally, Tn → +∞ a.-s. as n → ∞, because sups∈[0,t] | Ms | < ∞ a.-s.. In
conclusion, the sequence ( Tn )n≥1 defined by (1.29) is always a localizing sequence for M. It has the
additional advantage that the stopped martingale M Tn is bounded for every n ∈ N, which can be useful.
Proposition 1.9 (Addition of local martingales). Continuous local martingales form a vector space.
Proof. Let M and M̃ be two local martingales, with localizing sequences ( Tn )n≥1 and ( T̃n )n≥1 . Fix n ∈ N. Since M Tn and M̃ T̃n are continuous martingales, Doob’s optional stopping Theorem ensures that the stopped processes M Tn ∧T̃n and M̃ Tn ∧T̃n are martingales. Thus, so is λM Tn ∧T̃n + µ M̃ Tn ∧T̃n , for any λ, µ ∈ R. But this shows that ( Tn ∧ T̃n )n≥1 is a localizing sequence for λM + µ M̃, thereby completing the proof (note that Tn ∧ T̃n → ∞ because Tn , T̃n → ∞).
Proposition 1.10 (No interesting local martingale has finite variation). If M is a continuous local
martingale which has finite variation a.-s., then P(∀t ≥ 0, Mt = M0 ) = 1.
Proof. Assume without loss of generality that M0 = 0. The smart localizing sequence (1.29)
makes the stopped process M Tn a square-integrable martingale. Moreover, we have V ( M Tn , 0, t) =
V ( M, 0, t ∧ Tn ) < ∞ for all t ≥ 0. Thus, the non-local version of the result ensures that M Tn is
a.-s. constant in time, and letting n → ∞ shows that M is a.-s. constant in time, as desired.
Proposition 1.11 (Quadratic variation). Let M be a continuous local martingale. Then, the limit
⟨ M ⟩t := lim_{n→∞} ∑_{k=1}^{n} | Mtnk − Mtnk−1 |²
exists in probability for each t ≥ 0, and does not depend on the subdivisions (tnk )0≤k≤n of [0, t], as long as
max0≤k≤n |tnk − tnk−1 | → 0 as n → ∞. Moreover, ⟨ M⟩ is the unique process (up to modification) so that
(i) ⟨ M ⟩0 = 0 ;
(ii) t ↦ ⟨ M⟩t is continuous and non-decreasing ;
(iii) M² − ⟨ M⟩ is a continuous local martingale.
Exercise 1.2 (Square-integrable local martingales). Fix t ≥ 0. Show that a continuous local
martingale M = ( Ms )s∈[0,t] is a square-integrable martingale if and only if M0 ∈ L2 and ⟨ M ⟩t ∈ L1 .
Exercise 1.3 (Local martingales are unbounded). Let M be a continuous local martingale such that
a.-s., ⟨ M ⟩∞ = ∞. Prove that a.-s., lim supt→∞ Mt = +∞ and lim inft→∞ Mt = −∞.
Chapter 2
Stochastic integration
We have now arrived at the main theoretical challenge of this introductory course: giving a
proper meaning to a stochastic integral of the form
Z t
It = Xu dYu , (2.1)
0
where X = ( Xt )t≥0 and Y = (Yt )t≥0 are stochastic processes. A natural idea is of course to define
this integral as a limit of Riemann sums, just as one would do if X and Y were deterministic:
It := lim_{n→∞} ∑_{k=1}^{n} Xtnk−1 ( Ytnk − Ytnk−1 ),    (2.2)
where (tnk )0≤k≤n is a subdivision of [0, t] such that max_{1≤k≤n} |tnk − tnk−1 | → 0 as n → ∞. Unfortunately, the almost-sure convergence of these Riemann sums requires the process Y to have finite variation, thereby excluding Brownian motion as well as any interesting martingale.
The solution found by Itô consists in compensating roughness by randomness: with a bit
of work, it will be shown that the above limit does in fact exist when taken in the L2 sense,
for a wide class of stochastic processes X, Y which includes Brownian motion. The general
construction is rather delicate, but will eventually provide us with an extremely robust theory
of stochastic integration and differentiation. As a warm-up, let us first restrict our attention to
the special case where X is deterministic and Y is a Brownian motion. In this very comfortable
setting, the integral It is known as a Wiener integral, and it enjoys remarkable properties.
2.1. The Wiener isometry
which establishes uniqueness. To prove existence, one would like to use (2.4) as a definition, but
there are two potential problems: it is not clear that the limit exists, and even if it does, it might
a priori depend on the particular sequence ( xn )n≥1 chosen to approximate x. Fortunately, both
issues are solved by the fact that I is a partial isometry. Indeed, for all n, m ∈ N, we have
∥ I ( xn ) − I ( xm )∥ = ∥ I ( xn − xm )∥ = ∥ xn − xm ∥ −→ 0 as n ∧ m → ∞,    (2.5)
because ( xn )n≥1 is convergent. Thus, ( I ( xn ))n≥1 is a Cauchy sequence, hence the limit (2.4)
exists. Moreover, the latter does not depend on the chosen approximation ( xn )n≥1 . Indeed, if
(yn )n≥1 is another sequence in V which converges to x, then ∥ I ( xn ) − I (yn )∥ = ∥ xn − yn ∥ ≤ ∥ xn − x ∥ + ∥ x − yn ∥ → 0, so both approximations have the same limit.
Thus, the formula (2.4) defines a continuous extension, and the latter is automatically linear and
norm-preserving, because these properties depend continuously on their arguments.
Of course, one should have I ( f ) = Bt in the basic case f = 1(0,t] . Also, as any reasonable
integral, f 7→ I ( f ) should be linear. Together, these two requirements impose that
f = ∑_{k=1}^{n} ak 1(tk−1 ,tk ] =⇒ I ( f ) = ∑_{k=1}^{n} ak ( Btk − Btk−1 ),    (2.8)
in which case, by independence of the Brownian increments,
E[ ( I ( f ))² ] = ∑_{k=1}^{n} a²k (tk − tk−1 ) = ∫_0^∞ f ²(u) du.
Thus, our map I is a partial isometry on the subspace E ⊆ L2 (R+ ) of all step functions:
E := { ∑_{k=1}^{n} ak 1(tk−1 ,tk ] : n ∈ N, ( a1 , . . . , an ) ∈ Rn , 0 = t0 ≤ t1 ≤ . . . ≤ tn }.    (2.9)
It turns out that this set is large enough, in the precise sense that it is dense in L2 (R+ ).
Lemma 2.1 (Approximation by step functions). Any function f ∈ L2 (R+ ) is the limit in L2 (R+ ) of
the sequence of step functions ( Pn f )n≥1 , where
Pn f := ∑_{k=1}^{n²} ( n ∫_{(k−1)/n}^{k/n} f (u) du ) 1(k/n, (k+1)/n] .    (2.10)
Moreover, when f ∈ Cc0 (R+ ), we can replace the (·) term by f (k/n).
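Lemma 2.1 can be checked numerically (a sketch; the test function f (u) = e^{−u} is my choice): following (2.10), the value of Pn f on (k/n, (k + 1)/n] is the average of f over ((k − 1)/n, k/n], and ∥ f − Pn f ∥ decreases as n grows.

```python
import numpy as np

# Evaluate the step-function approximation P_n f of (2.10) on a fine grid
# and measure the L^2 distance to f for increasing n.

def Pn(f_vals, u, n):
    """Evaluate P_n f at the grid points u, given f_vals = f(u)."""
    out = np.zeros_like(f_vals)
    for k in range(1, n * n + 1):
        prev = (u > (k - 1) / n) & (u <= k / n)    # averaging block
        cur = (u > k / n) & (u <= (k + 1) / n)     # where the value lives
        if prev.any():
            out[cur] = f_vals[prev].mean()
    return out

u = np.linspace(0.0, 12.0, 120_001)[1:]            # fine grid on (0, 12]
du = u[1] - u[0]
f_vals = np.exp(-u)

errs = {}
for n in (2, 4, 8, 16):
    errs[n] = np.sqrt(np.sum((f_vals - Pn(f_vals, u, n)) ** 2) * du)
```

The decrease is slow here because (2.10) leaves Pn f equal to 0 on (0, 1/n], but it is enough for the density statement of the lemma.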
Thus, the isometry extension theorem applies, leading to the following result.
2.2. The Wiener integral as a process
Theorem 2.2 (Wiener isometry). Let B = ( Bt )t≥0 be a Brownian motion on (Ω, F , P). Then, there
exists a unique linear and continuous map I : L2 (R+ ) → L2 (Ω, F , P) such that for all t ≥ 0,
I (1(0,t] ) = Bt . (2.11)
Moreover, I is an isometry, in the sense that for all f ∈ L2 (R+ ),
∥ I ( f )∥ L2 (Ω) = ∥ f ∥ L2 (R+ ) . (2.12)
The map I is called the Wiener isometry, and denoted I ( f ) = ∫_0^∞ f (u) dBu .
Remark 2.1 (Explicit formula). Let us make several important remarks about this result.
1. By construction, for any f ∈ L2 (R+ ), we have the explicit formula
∫_0^∞ f (t) dBt = lim_{n→∞} ∑_{k=1}^{n²} an,k ( f ) ( B(k+1)/n − Bk/n ),    (2.13)
where an,k ( f ) = n ∫_{(k−1)/n}^{k/n} f (u) du, and where the limit is taken in the L² sense. Moreover, in the particular case where f ∈ Cc0 (R+ ), we can take the simpler choice an,k ( f ) = f (k/n).
2. As any distributional limit of a sequence of (centered) Gaussian random variables, the Wiener
integral is a (centered) Gaussian random variable, with variance given by (2.12):
Z ∞ Z ∞
∀ f ∈ L (R+ ),
2
f (u) dBu ∼ N 0, 2
f (u) du . (2.14)
0 0
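Formula (2.13) also gives a practical way to simulate the Wiener integral. The following Python sketch (an illustration, not part of the notes) approximates ∫_0^1 cos(u) dB_u by the Riemann-type sums above, and checks the Gaussian law (2.14): zero mean and variance ∫_0^1 cos²(u) du = 1/2 + sin(2)/4.

```python
import numpy as np

# Approximate I(f) = int_0^1 cos(u) dB_u by sum_k f(k/n) (B_{(k+1)/n} - B_{k/n}),
# as in (2.13) with the simpler choice a_{n,k}(f) = f(k/n), valid since cos is continuous.
rng = np.random.default_rng(0)
n, n_samples = 1_000, 200_000
dt = 1.0 / n
grid = np.arange(n) * dt                                  # left endpoints k/n
f = np.cos(grid)
dB = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n))    # independent Brownian increments
samples = (f * dB).sum(axis=1)                            # one Wiener integral per row

target_var = 0.5 + np.sin(2.0) / 4.0                      # int_0^1 cos^2(u) du
print(samples.mean(), samples.var())                      # ~0 and ~0.73
```

The sample variance matches the L² norm of f, as the isometry (2.12) predicts.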
But M^f has an even more remarkable property, which by itself justifies the interest of stochastic integration. In the following result, the underlying filtration (F_t)_{t≥0} can be taken to be the natural filtration of B, or any filtration for which B is an (F_t)_{t≥0}-Brownian motion.
Proof. The square-integrability is clear, by construction of the Wiener integral. Now, for any fixed 0 ≤ s ≤ t, the function f 1_{(s,t]} is the L²-limit of a sequence of step functions supported on (s,t]. In view of our construction of the Wiener integral, this implies that

    M_t^f − M_s^f = ∫_s^t f(u) dB_u ∈ Vect( B_u − B_s : u ∈ [s,t] ).   (2.21)

Since B is an (F_t)_{t≥0}-Brownian motion, it follows that M_t^f is F_t-measurable and that M_t^f − M_s^f is independent of F_s. In particular, we have

    E[ M_t^f − M_s^f | F_s ] = E[ M_t^f − M_s^f ] = 0;
    E[ (M_t^f)² − (M_s^f)² | F_s ] = E[ (M_t^f − M_s^f)² | F_s ] = E[ (M_t^f − M_s^f)² ] = ∫_s^t f²(u) du,

for each t ≥ 0. In view of Proposition 1.5, we thus only have to establish the continuity of M^f when f is a step function. By linearity, we may further assume that f = 1_{(a,b]} with 0 ≤ a ≤ b. But then the result is trivial, since M_t^f = B_{b∧t} − B_{a∧t}.
Exercise 2.1. Determine the law of the process X in the following two cases:

    X_t := (1 − t) ∫_0^t 1/(1 − u) dB_u,   t ∈ (0,1);

    X_t := e^{−t} ( X_0 + ∫_0^t e^u dB_u ),   t ∈ R_+,   with X_0 ∼ N(0, 1/2) independent of B.
2.3. Progressive processes
Remark 2.2 (Progressive σ-field). It is easy to check that the set P defined by

    P := { A ⊆ R_+ × Ω : ∀t ≥ 0, A ∩ ([0,t] × Ω) ∈ B([0,t]) ⊗ F_t }

is a σ-field, and that a process ϕ is progressive if and only if the map (t,ω) ↦ ϕ_t(ω) is P-measurable.

The following processes are progressive:

(i) any process of the form ϕ_t(ω) = X(ω) 1_{(a,b]}(t), where 0 ≤ a ≤ b, and where X is F_a-measurable;
(ii) any process of the form ϕ_t(ω) = 1_{[0,T(ω)]}(t), where T is a stopping time;
(iii) any process of the form ϕ_t(ω) = F(ϕ_t¹(ω), …, ϕ_tⁿ(ω)), where F : Rⁿ → R is a measurable function and (ϕ¹, …, ϕⁿ) are progressive processes (so sums, products, etc.);
(iv) any pointwise limit ϕ = lim_{n→∞} ϕⁿ of a sequence (ϕⁿ)_{n≥1} of progressive processes;
(v) any continuous adapted process.

Proof. For (i), we write, for any Borel set B ∈ B(R) which does not contain 0 (otherwise, take B^c),

    { (s,ω) ∈ [0,t] × Ω : ϕ_s(ω) ∈ B } = ( (a,b] ∩ [0,t] ) × X^{−1}(B),

which is either empty (if t ≤ a), or of the form I × A with I ∈ B([0,t]) and A ∈ F_t (if t > a). For (ii), we note that ϕ is {0,1}-valued, and that

    { (s,ω) ∈ [0,t] × Ω : ϕ_s(ω) = 0 } = { (s,ω) ∈ [0,t] × Ω : T(ω) < s } = ∪_{q ∈ [0,t] ∩ Q} ( (q,t] × {T < q} ) ∈ B([0,t]) ⊗ F_t.

For (iii) and (iv), we simply use the fact that limits and compositions of measurable functions are measurable. Finally, (v) follows from (i), (iii) and (iv) once we observe that any continuous adapted process ϕ is the pointwise limit of the sequence (ϕⁿ)_{n≥1}, where

    ϕ_tⁿ(ω) := ϕ_0(ω) 1(t = 0) + ∑_{k=0}^{n²} ϕ_{k/n}(ω) 1_{(k/n,(k+1)/n]}(t).   (2.24)
2.4. The Itô isometry
By Remark 2.2, M²(R_+) := L²(R_+ × Ω, P, dt ⊗ P(dω)) is a Hilbert space, with scalar product ⟨ψ,ϕ⟩_{M²} := E[∫_0^∞ ψ_u ϕ_u du]. This space contains every elementary random step function

    ϕ_u(ω) = X(ω) 1_{(s,t]}(u),   (2.26)

where 0 ≤ s ≤ t and X ∈ L²(Ω, F_s, P). For such a basic process, it makes sense to define

    ∫_0^∞ ϕ_u dB_u := X(ω) ( B_t − B_s ).   (2.27)

As in the Wiener case, this definition extends uniquely to the whole Hilbert space:
Theorem 2.4 (Itô integral). There exists a unique continuous and linear map I : M²(R_+) → L²(Ω) such that I(ϕ) = X(B_t − B_s) whenever ϕ is as in (2.26). Moreover, I is an isometry, i.e.

    ∀ψ,ϕ ∈ M²(R_+),   E[ I(ψ) I(ϕ) ] = E[ ∫_0^∞ ψ_u ϕ_u du ].   (2.28)

We call I the Itô integral, and write I(ϕ) = ∫_0^∞ ϕ_u dB_u.

Proof. For a random step function

    ϕ_t(ω) = ∑_{k=0}^{n−1} X_k(ω) 1_{(t_k,t_{k+1}]}(t),   (2.29)

with n ∈ N, 0 ≤ t_0 ≤ … ≤ t_n, and X_k ∈ L²(Ω, F_{t_k}, P) for each 0 ≤ k < n, we are forced to set

    I(ϕ) := ∑_{k=0}^{n−1} X_k ( B_{t_{k+1}} − B_{t_k} ).

Note that I(ϕ) ∈ L²(Ω, F, P). Moreover, for 0 ≤ j < k < n, we have

    E[ X_j (B_{t_{j+1}} − B_{t_j}) X_k (B_{t_{k+1}} − B_{t_k}) ] = E[ X_j (B_{t_{j+1}} − B_{t_j}) X_k ] E[ B_{t_{k+1}} − B_{t_k} ] = 0,

because X_j, X_k, (B_{t_{j+1}} − B_{t_j}) are F_{t_k}-measurable, while B_{t_{k+1}} − B_{t_k} is independent of F_{t_k}. Thus,

    E[ |I(ϕ)|² ] = ∑_{k=0}^{n−1} E[ X_k² (B_{t_{k+1}} − B_{t_k})² ] = ∑_{k=0}^{n−1} E[ X_k² ] (t_{k+1} − t_k) = E[ ∫_0^∞ ϕ_t² dt ].

This identity shows that our linear map I, so far defined on random step functions, is an isometry. To conclude, it thus only remains to show that random step functions are dense in M²(R_+). For this, we again use the approximation operators (P_n)_{n≥1} from Lemma 2.1, i.e.

    (P_n ϕ)_t = ∑_{k=1}^{n²} ( n ∫_{(k−1)/n}^{k/n} ϕ_u du ) 1_{(k/n,(k+1)/n]}(t).   (2.30)
2.5. The Itô integral as a process
Note that P_n ϕ is a random step function for any ϕ ∈ M²(R_+) and n ∈ N, because the random variable n ∫_{(k−1)/n}^{k/n} ϕ_s ds is F_{k/n}-measurable (this is where the progressivity of ϕ is used), with

    E[ ( n ∫_{(k−1)/n}^{k/n} ϕ_u du )² ] ≤ E[ n ∫_{(k−1)/n}^{k/n} ϕ_u² du ] < +∞.   (2.31)

Moreover, Lemma 2.1 ensures that ∥P_n ϕ − ϕ∥²_{L²(R_+)} tends a.s. to 0 as n → ∞ (the random function u ↦ ϕ_u is a.s. in L²(R_+), because ∥ϕ∥_{M²} < ∞), and we have the domination

    ∥P_n ϕ − ϕ∥²_{L²(R_+)} ≤ ( ∥P_n ϕ∥_{L²(R_+)} + ∥ϕ∥_{L²(R_+)} )² ≤ 4 ∥ϕ∥²_{L²(R_+)}.   (2.33)

By dominated convergence, we conclude that ∥P_n ϕ − ϕ∥_{M²} → 0, as desired.
Remark 2.3 (Important comments). Here are a few elementary but important observations.

2. In the deterministic case ϕ_t(ω) = f(t) with f ∈ L²(R_+), we recover the Wiener integral.

4. For any ϕ, ψ ∈ M²(R_+), the isometry formula also reads (by polarization)

    Cov( ∫_0^∞ ϕ_u dB_u, ∫_0^∞ ψ_u dB_u ) = E[ ∫_0^∞ ϕ_u ψ_u du ].   (2.36)

5. Even in the elementary case (2.26), the random variable ∫_0^∞ ϕ_u dB_u has no reason to be Gaussian!

As in the Wiener case, we may turn the Itô integral into a process, by setting

    ∫_s^t ϕ_u dB_u := I( ϕ 1_{(s,t]} ),   (2.37)

for all 0 ≤ s ≤ t. Note that the right-hand side makes sense as soon as ϕ is progressive with

    ∀t ≥ 0,   E[ ∫_0^t ϕ_u² du ] < ∞.   (2.38)

The space of such processes is much larger than M²(R_+), and will be denoted by M². The interest of stochastic integration is essentially contained in the following fundamental result.
Theorem 2.5 (Itô martingale). For any ϕ ∈ M², the process M^ϕ = (M_t^ϕ)_{t≥0} defined by

    M_t^ϕ := ∫_0^t ϕ_u dB_u   (2.39)

is a continuous square-integrable martingale, with quadratic variation ⟨M^ϕ⟩_t = ∫_0^t ϕ_u² du.

Proof. Let us first consider the case of an elementary random step function ϕ_t(ω) = X(ω) 1_{(a,b]}(t), with 0 ≤ a ≤ b and X ∈ L²(Ω, F_a, P). By definition, we then have for t ≥ 0,

    M_t^ϕ = X ( B_{b∧t} − B_{a∧t} ) = { X (B_{b∧t} − B_a) if a ≤ t;  0 else }.   (2.41)

The continuity of M^ϕ is clear from the first expression, and the adaptedness and square-integrability easily follow from the second expression. Moreover, for 0 ≤ s ≤ t, we have

    M_t^ϕ − M_s^ϕ = { X (B_{b∧t} − B_{s∨a}) if s ≤ b and a ≤ t;  0 else }.   (2.42)

In either case, we easily find

    E[ M_t^ϕ − M_s^ϕ | F_{s∨a} ] = 0,   (2.43)
    E[ (M_t^ϕ − M_s^ϕ)² | F_{s∨a} ] = X² ∫_s^t 1_{(a,b]}(u) du = ∫_s^t ϕ_u² du.   (2.44)

A similar computation shows that the relation

    E[ (M_t^ϕ − M_s^ϕ)(M_t^{ϕ̃} − M_s^{ϕ̃}) | F_s ] = E[ ∫_s^t ϕ_u ϕ̃_u du | F_s ]

holds whenever the elementary random step functions ϕ, ϕ̃ are equal or have disjoint supports. Now, if ϕ is an arbitrary random step function, then ϕ is a linear combination of elementary random step functions with disjoint supports, so the above computations show that M^ϕ is a square-integrable martingale with ⟨M^ϕ⟩_t = ∫_0^t ϕ_u² du. Finally, this extends to any ϕ ∈ M² by passing to the limit along ϕⁿ = P_n ϕ, because (ω,s) ↦ ϕ_sⁿ(ω) 1_{(0,t]}(s) converges to (ω,s) ↦ ϕ_s(ω) 1_{(0,t]}(s) in M²(R_+).
Remark 2.4 (Quadratic covariation). By polarization, we have for all ϕ, ψ ∈ M² and all t ≥ 0,

    ⟨M^ϕ, M^ψ⟩_t = ∫_0^t ϕ_u ψ_u du.   (2.47)

Remark 2.5 (Two differences with the Wiener integral). Unlike the Wiener case, the random variable ∫_s^t ϕ_u dB_u has, in general, no reason to be Gaussian, and no reason to be independent of F_s!
2.6. Generalized Itô integral
We now extend the Itô integral to the space M²_LOC of progressive processes ϕ = (ϕ_t)_{t≥0} satisfying

    ∀t ≥ 0,   ∫_0^t ϕ_u² du < ∞

almost-surely. Note that this new space is much larger than M²: in particular, it contains every continuous adapted process! Now, fix ϕ ∈ M²_LOC and n ∈ N, and consider the stopping time

    T_n := inf{ t ≥ 0 : ∫_0^t ϕ_u² du ≥ n },   (2.49)

which is the hitting time of the closed set [n, ∞) by the continuous adapted process t ↦ ∫_0^t ϕ_u² du. Thanks to Proposition 2.2, the truncated process ϕⁿ := ϕ 1_{[0,T_n]} belongs to M², so the martingale Mⁿ := M^{ϕⁿ} is well-defined; moreover, these definitions are consistent as n varies, so that we may unambiguously set M_t := M_tⁿ for t ≤ T_n. The fact that M_{t∧T_n} = M_tⁿ shows that M is a continuous local martingale. Let us sum this up.

Theorem 2.6 (Generalized Itô integral). For any ϕ ∈ M²_LOC, the process M^ϕ = (M_t^ϕ)_{t≥0} defined by

    ∀t ≥ 0,   M_t^ϕ := ∫_0^t ϕ_u dB_u,

is a continuous local martingale, with quadratic variation ⟨M^ϕ⟩_t = ∫_0^t ϕ_u² du.
Proposition 2.3 (Stochastic dominated convergence). Fix t ≥ 0. In order to ensure the convergence

    ∫_0^t ϕ_uⁿ dB_u  →(P, n→∞)  ∫_0^t ϕ_u dB_u,   (2.52)

it suffices that the following two conditions hold:

(i) (convergence): for all u ∈ [0,t], ϕ_uⁿ → ϕ_u almost-surely as n → ∞;
(ii) (domination): for all u ∈ [0,t] and n ∈ N, |ϕ_uⁿ| ≤ Ψ_u a.s., with Ψ ∈ M²_LOC.

Proof. For k ∈ N, let T_k := inf{ t ≥ 0 : ∫_0^t Ψ_u² du ≥ k }. Then the isometry formula in M², followed by dominated convergence, yields

    E[ ( ∫_0^{T_k∧t} ϕ_uⁿ dB_u − ∫_0^{T_k∧t} ϕ_u dB_u )² ] = E[ ∫_0^{T_k∧t} (ϕ_uⁿ − ϕ_u)² du ]  →(n→∞)  0.   (2.53)

Now, for any ε > 0,

    P( | ∫_0^t ϕ_uⁿ dB_u − ∫_0^t ϕ_u dB_u | > ε ) ≤ P( T_k ≤ t ) + P( | ∫_0^{T_k∧t} ϕ_uⁿ dB_u − ∫_0^{T_k∧t} ϕ_u dB_u | > ε ).

The first term can be made arbitrarily small by choosing k large enough, because T_k ↑ ∞ a.s. The second term can then be made arbitrarily small by choosing n large enough, by (2.53).
Corollary 2.1 (Approximation of the generalized Itô integral). If ϕ is continuous and adapted, then

    ∑_{k=0}^{n−1} ϕ_{t_k^n} ( B_{t_{k+1}^n} − B_{t_k^n} )  →(P, n→∞)  ∫_0^t ϕ_u dB_u,

for every t ≥ 0 and any subdivision (t_k^n)_{0≤k≤n} of [0,t] with max_{0≤k<n} |t_{k+1}^n − t_k^n| → 0 as n → ∞.

Proof. Apply Proposition 2.3 with ϕ_tⁿ = ∑_{k=0}^{n−1} ϕ_{t_k^n} 1_{(t_k^n, t_{k+1}^n]}(t) and Ψ_t = sup_{u∈[0,t]} |ϕ_u|.
In the next chapter, we will compute stochastic integrals explicitly. Here is an example.

Example 2.1 (Brownian against Brownian). For all t ≥ 0, we have

    ∫_0^t B_u dB_u = (1/2) ( B_t² − t ).   (2.54)
Proof. Fix t ≥ 0. Recall that for any continuous adapted process ϕ, we have

    ∑_{k=0}^{n−1} ϕ_{t_k^n} ( B_{t_{k+1}^n} − B_{t_k^n} )  →(P, n→∞)  ∫_0^t ϕ_u dB_u,

and that, by the quadratic variation of Brownian motion,

    ∑_{k=0}^{n−1} ( B_{t_{k+1}^n} − B_{t_k^n} )²  →(P, n→∞)  t.

Adding up those two lines (the first with ϕ = 2B), and observing that 2a(b − a) + (b − a)² = b² − a², we arrive at

    ∑_{k=0}^{n−1} ( B²_{t_{k+1}^n} − B²_{t_k^n} )  →(P, n→∞)  2 ∫_0^t B_u dB_u + t.

But the left-hand side is a telescopic sum, which equals B_t² independently of n. Thus,

    B_t² = 2 ∫_0^t B_u dB_u + t,

almost-surely, as desired.
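Identity (2.54) is easy to observe on a simulated path. The Python sketch below (an illustration, not part of the notes) compares the non-anticipating Riemann sums of Corollary 2.1 with the closed form (B_t² − t)/2 along one discretized trajectory.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 200_000, 2.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)   # increments of one Brownian path
B = np.cumsum(dB)                           # B at right endpoints; B[-1] is B_t
riemann = ((B - dB) * dB).sum()             # sum_k B_{t_k} (B_{t_{k+1}} - B_{t_k})
closed_form = 0.5 * (B[-1] ** 2 - t)        # (B_t^2 - t)/2, from (2.54)
```

The discrepancy between the two quantities is exactly (t − ∑_k (∆B_k)²)/2, which vanishes as the mesh goes to 0, mirroring the role of the quadratic variation in the proof.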
Chapter 3

Stochastic differentiation

In the previous chapter, we learned how to integrate a progressive process ϕ = (ϕ_t)_{t≥0} against Brownian motion, resulting in the stochastic integral

    t ↦ ∫_0^t ϕ_u dB_u.   (3.1)

This process is well-defined as soon as ϕ ∈ M²_LOC, and it is always a continuous local martingale. On the other hand, we of course also know how to integrate a stochastic process ψ = (ψ_t)_{t≥0} against the Lebesgue measure, resulting in the classical integral

    t ↦ ∫_0^t ψ_u du.   (3.2)

This process is well-defined, adapted, and continuous as soon as ψ belongs to the space M¹_LOC of progressive processes satisfying almost-surely

    ∀t ≥ 0,   ∫_0^t |ψ_u| du < ∞.   (3.3)
Note that M²_LOC ⊆ M¹_LOC (by the Cauchy-Schwarz inequality), and that both spaces contain, in particular, every continuous and adapted process. Also keep in mind that those two integrals are very different: the process (3.1) is always a local martingale, while (3.2) has a.s. finite variation! We will now combine those two kinds of processes to construct the main object of stochastic calculus.

Definition 3.1 (Itô process). An Itô process is a stochastic process X = (X_t)_{t≥0} of the form

    ∀t ≥ 0,   X_t = X_0 + ∫_0^t ϕ_u dB_u + ∫_0^t ψ_u du,   (3.4)

with X_0 ∈ F_0, ψ ∈ M¹_LOC and ϕ ∈ M²_LOC. The two integrals are called the martingale term and the drift term, respectively. Instead of (3.4), we will often use the more convenient differential notation

    dX_t = ϕ_t dB_t + ψ_t dt.

Remark 3.1 (Linearity). Itô processes form a vector space: if X, Y are Itô processes, and λ, µ ∈ R, then Z = λX + µY is of course an Itô process, and the martingale terms and drift terms behave linearly:

    dZ_t = ( λϕ_t^X + µϕ_t^Y ) dB_t + ( λψ_t^X + µψ_t^Y ) dt,

with the obvious notation for the martingale and drift terms of X and Y.
3.1. Itô processes
Note that an Itô process is always continuous and adapted. Giving names to the two parts ϕ and ψ of the decomposition (3.4) suggests that they are unique. This is indeed the case.

Proposition 3.1 (Uniqueness of the drift and martingale terms). If X simultaneously satisfies

    dX_t = ϕ_t dB_t + ψ_t dt   and   dX_t = ϕ̃_t dB_t + ψ̃_t dt

for some ϕ, ϕ̃ ∈ M²_LOC and ψ, ψ̃ ∈ M¹_LOC, then ϕ, ϕ̃ are indistinguishable, and so are ψ, ψ̃.

Proof. By assumption, we have, for all t ≥ 0,

    ∫_0^t ( ϕ_u − ϕ̃_u ) dB_u = ∫_0^t ( ψ̃_u − ψ_u ) du.

Now, the left-hand side is a continuous local martingale, while the right-hand side has finite variation almost-surely. Thus, both sides are null a.s. In particular, the nullity of the left-hand side implies that of its quadratic variation, i.e. a.s.,

    ∀t ≥ 0,   ∫_0^t ( ϕ_u − ϕ̃_u )² du = 0.   (3.6)

It is a classical exercise on Lebesgue integrals that this forces the integrand to be null a.e.
Remark 3.2 (Itô martingales). If X is as in (3.4), then it follows from the previous chapters that X is a local martingale if and only if its drift term vanishes (ψ = 0 a.e., almost-surely), and that it is then a square-integrable martingale as soon as ϕ ∈ M². For this reason, determining the martingale term ϕ and the drift term ψ of an Itô process is essential.
Remark 3.3 (Integral against an Itô process). Let X be as in (3.4), and let Y be a continuous and adapted process. Then, clearly, Yϕ ∈ M²_LOC and Yψ ∈ M¹_LOC, so it makes sense to define for t ≥ 0,

    ∫_0^t Y_u dX_u := ∫_0^t Y_u ϕ_u dB_u + ∫_0^t Y_u ψ_u du.

By the dominated convergence theorem (and its stochastic version), we then have

    ∑_{k=0}^{n−1} Y_{t_k^n} ( X_{t_{k+1}^n} − X_{t_k^n} )  →(P, n→∞)  ∫_0^t Y_u dX_u,   (3.7)

along any subdivisions (t_k^n)_{0≤k≤n} of [0,t] with ∆_n := max_{0≤k<n} (t_{k+1}^n − t_k^n) → 0 as n → ∞.
Example 3.1 (Squared Brownian motion). Our Brownian motion B is of course an Itô process (take ϕ = 1 and ψ = 0). A less trivial example is B², for which the computation in Example 2.1 shows that

    d(B_t²) = 2 B_t dB_t + dt.

Note the presence of the quadratic variation term dt, compared to the classical formula dX_t² = 2X_t dX_t that one would have in the case of a continuously differentiable process t ↦ X_t. We will come back to it!
3.2. Quadratic variation of an Itô process
Let X be an Itô process, with dX_t = ϕ_t dB_t + ψ_t dt. Then for any subdivision (t_k^n)_{0≤k≤n} of [0,t] with ∆_n := max_{0≤k<n} (t_{k+1}^n − t_k^n) → 0 as n → ∞, we have

    ∑_{k=0}^{n−1} ( X_{t_{k+1}^n} − X_{t_k^n} )²  →(P, n→∞)  ∫_0^t ϕ_u² du.   (3.9)

We will naturally denote the right-hand side by ⟨X⟩_t, and call t ↦ ⟨X⟩_t the quadratic variation of X. More generally, if X̃ is another Itô process, with dX̃_t = ϕ̃_t dB_t + ψ̃_t dt, then

    ∑_{k=0}^{n−1} ( X_{t_{k+1}^n} − X_{t_k^n} )( X̃_{t_{k+1}^n} − X̃_{t_k^n} )  →(P, n→∞)  ⟨X, X̃⟩_t := ∫_0^t ϕ_u ϕ̃_u du.

We call t ↦ ⟨X, X̃⟩_t the quadratic covariation of X and X̃, and write d⟨X, X̃⟩_t = ϕ_t ϕ̃_t dt.
Proof. We only have to prove the first claim, since the second follows by polarization. Now, when ψ = 0, X is a continuous local martingale with quadratic variation t ↦ ∫_0^t ϕ_u² du, so the claim is Proposition 1.11. The general case then easily follows from the observation that

    | ∑_{k=0}^{n−1} ( Y_{t_{k+1}^n} − Y_{t_k^n} )( Z_{t_{k+1}^n} − Z_{t_k^n} ) | ≤ V(Y, 0, t) sup_{u,v∈[0,t], |u−v|≤∆_n} |Z_u − Z_v|  →(a.s., n→∞)  0,

valid whenever Y has finite variation on [0,t] and Z is continuous.
Proposition 3.2 (Stochastic integration by parts). If X, Y are Itô processes, then so is (X_t Y_t)_{t≥0}, and

    d(X_t Y_t) = X_t dY_t + Y_t dX_t + d⟨X, Y⟩_t.

Proof. Fix t ≥ 0. Consider subdivisions (t_k^n)_{0≤k≤n} of [0,t] with max |t_{k+1}^n − t_k^n| → 0. We have

    X_t Y_t − X_0 Y_0 = ∑_{k=0}^{n−1} ( X_{t_{k+1}^n} Y_{t_{k+1}^n} − X_{t_k^n} Y_{t_k^n} )
                      = ∑_{k=0}^{n−1} X_{t_k^n} ( Y_{t_{k+1}^n} − Y_{t_k^n} ) + ∑_{k=0}^{n−1} Y_{t_k^n} ( X_{t_{k+1}^n} − X_{t_k^n} ) + ∑_{k=0}^{n−1} ( X_{t_{k+1}^n} − X_{t_k^n} )( Y_{t_{k+1}^n} − Y_{t_k^n} ),

and as n → ∞, the three sums converge in probability to ∫_0^t X_u dY_u, ∫_0^t Y_u dX_u and ⟨X, Y⟩_t, respectively, by (3.7) and the definition of the quadratic covariation.

Remark 3.4 (Itô term). Here again, note the extra covariation term d⟨X, Y⟩_t, compared to the classical integration-by-parts formula d(X_t Y_t) = X_t dY_t + Y_t dX_t for continuously differentiable trajectories.
3.3. Itô’s Formula
in the elementary case where Y_u = 1_{(0,s]}(u), for any s ≥ 0. By linearity, this immediately extends to the case where Y is a random step function. By density, it further extends to the case where Y is any continuous and adapted process. In particular, we may take Y_u = F″(X_u), and this suffices to yield (3.13), since max_{0≤k<n} |F″(X_{U_k^n}) − F″(X_{t_k^n})| → 0 a.s. (by uniform continuity).
The Itô formula admits a multivariate extension, allowing one to combine several Itô processes.

Theorem 3.2 (Multivariate extension). Let F ∈ C²(R^d), and let X¹, …, X^d be Itô processes. Then, the process t ↦ F(X_t¹, …, X_t^d) is again an Itô process, with

    dF(X_t¹, …, X_t^d) = ∑_{i=1}^d ∂F/∂x_i (X_t¹, …, X_t^d) dX_t^i + (1/2) ∑_{i=1}^d ∑_{j=1}^d ∂²F/∂x_i∂x_j (X_t¹, …, X_t^d) d⟨X^i, X^j⟩_t.

Proof. The argument is the same as above, with the multivariate version of the Taylor expansion:

    F(y) = F(x) + ∑_{i=1}^d ∂F/∂x_i (x)(y_i − x_i) + (1/2) ∑_{i=1}^d ∑_{j=1}^d ∂²F/∂x_i∂x_j (z)(y_i − x_i)(y_j − x_j),

valid for any x, y ∈ R^d and some z ∈ Conv(x, y). More precisely, we here take x = (X¹_{t_k^n}, …, X^d_{t_k^n}) and y = (X¹_{t_{k+1}^n}, …, X^d_{t_{k+1}^n}), and then sum over 0 ≤ k ≤ n − 1, and finally let n → ∞.
Remark 3.6 (Special cases). Here are a few special cases of interest.

2. One can add a time dependency by letting one of the Itô processes be t ↦ t. For example,

    dF(t, X_t) = ∂F/∂x (t, X_t) dX_t + ∂F/∂t (t, X_t) dt + (1/2) ∂²F/∂x² (t, X_t) d⟨X⟩_t,

for any F ∈ C²(R²). Note that t ↦ t does not contribute to the last term (finite variation).
In view of Remark 3.2, Itô's Formula is extremely useful for finding martingales. Here is a typical exercise to familiarize oneself with this powerful technique.

Exercise 3.1 (Practicing with Itô's Formula). In each of the following cases, compute the stochastic differential of the process M = (M_t)_{t≥0}, and deduce that it is a martingale.

1. M_t := B_t² − t.
2. M_t := B_t³ − 3tB_t.
6. M_t := cos(θB_t) e^{θ²t/2}, with θ ∈ R.
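These exercises can also be sanity-checked by simulation. The sketch below (an illustration, not part of the notes) tests numerically that M_t = B_t³ − 3tB_t behaves like a martingale: its expectation stays at zero, and its increments are uncorrelated with the past, here probed against B_s.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, s, t = 400_000, 1.0, 2.0
B_s = rng.normal(0.0, np.sqrt(s), n_samples)
B_t = B_s + rng.normal(0.0, np.sqrt(t - s), n_samples)     # independent increment
M_s = B_s ** 3 - 3 * s * B_s
M_t = B_t ** 3 - 3 * t * B_t
mean_t = M_t.mean()                                        # should be ~ 0 = E[M_0]
orthogonality = ((M_t - M_s) * B_s).mean()                 # increment vs. F_s-measurable data
```

Of course, such checks only test necessary conditions; the actual proof goes through Itô's Formula as asked in the exercise.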
Exercise 3.2 (A typical exam problem). Let B be a Brownian motion and let F denote the cumulative distribution function of a standard Gaussian random variable. Consider the process

    M_t := F( B_t / √(1 − t) ),   t ∈ [0, 1).   (3.14)

5. How did we guess that the Gaussian cumulative distribution function was a good choice for F?
3.4. Exponential martingales
For any ϕ ∈ M²_LOC, the exponential process Z^ϕ = (Z_t^ϕ)_{t≥0} defined by

    Z_t^ϕ := exp( ∫_0^t ϕ_u dB_u − (1/2) ∫_0^t ϕ_u² du )   (3.15)

is a local martingale.

Proof. Applying Itô's formula with F = exp and X_t = ∫_0^t ϕ_u dB_u − (1/2) ∫_0^t ϕ_u² du yields

    dZ_t^ϕ = e^{X_t} dX_t + (1/2) e^{X_t} d⟨X⟩_t
           = e^{X_t} ( ϕ_t dB_t − (1/2) ϕ_t² dt ) + (1/2) e^{X_t} ϕ_t² dt
           = e^{X_t} ϕ_t dB_t.

Since Z_0^ϕ = 1, we obtain

    ∀t ≥ 0,   Z_t^ϕ = 1 + ∫_0^t Z_u^ϕ ϕ_u dB_u,

and the result follows from the general properties of the Itô integral.
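For the simplest choice ϕ ≡ θ constant, Z_t^ϕ = exp(θB_t − θ²t/2) and the martingale property gives E[Z_t^ϕ] = 1 for all t. A quick Monte Carlo sketch (an illustration, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, t, n_samples = 1.5, 1.0, 500_000
B_t = rng.normal(0.0, np.sqrt(t), n_samples)
Z_t = np.exp(theta * B_t - 0.5 * theta ** 2 * t)   # exponential martingale at time t
print(Z_t.mean())                                  # ~ 1.0
```

Note that the compensator −θ²t/2 is exactly what is needed to cancel the factor E[e^{θB_t}] = e^{θ²t/2}; without it, the mean would grow exponentially.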
For reasons that will become clear in the next section, it is very important to ensure that Z^ϕ is really a martingale, and not just a local martingale. This holds, for example, when ϕ is deterministic: the result was proved in the section on the Wiener integral, and can be recovered by checking that the process (Z_u^ϕ ϕ_u)_{u∈[0,t]} appearing in the stochastic differential of Z^ϕ is in M². The following criterion is much more general, but its proof is considerably more involved.

Theorem 3.3 (Novikov's Condition). Fix T ∈ R_+. For (Z_t^ϕ)_{t∈[0,T]} to be a martingale, it suffices that

    E[ exp( (1/2) ∫_0^T ϕ_u² du ) ] < ∞.   (3.16)

In the proof of this theorem, we will use the following elementary lemma.

Lemma 3.3 (Non-negative local martingales). If M = (M_t)_{t∈[0,T]} is a non-negative local martingale, then it is a super-martingale. Moreover, it is a martingale if and only if E[M_T] ≥ E[M_0].
Proof. Let (T_n)_{n≥1} be a localizing sequence, and let 0 ≤ s ≤ t ≤ T. For each n ∈ N, we have

    E[ M_{T_n∧t} | F_s ] = M_{T_n∧s}.

We now take n → ∞. Since T_n → ∞ a.s., the conditional version of Fatou's Lemma yields

    E[ M_t | F_s ] ≤ M_s,   (3.17)

which shows that M is a super-martingale. Now, suppose that E[M_T] ≥ E[M_0]. This forces the non-increasing map t ↦ E[M_t] to be constant on [0, T]. In particular, for any 0 ≤ s ≤ t ≤ T, the non-negative variable M_s − E[M_t | F_s] has zero mean, hence is null a.s.
3.5. Girsanov’s Theorem
Proof of Theorem 3.3. Fix 0 < ε < 1. It is straightforward to check that for all 0 ≤ t ≤ T,

    ( Z_t^{(1−ε)ϕ} )^{1/(1−ε²)} = ( Z_t^ϕ )^{1/(1+ε)} ( e^{(1/2) ∫_0^t ϕ_u² du} )^{ε/(1+ε)}.

Let T_n := inf{ t ≥ 0 : ∫_0^t ϕ_u² du + |∫_0^t ϕ_u dB_u| ≥ n }, so that for every a ∈ [0,1], the stopped process (Z^{aϕ}_{t∧T_n})_{t≥0} is a bounded local martingale, hence a martingale, and in particular E[Z^{aϕ}_{T∧T_n}] = 1. By Hölder's inequality (with conjugate exponents 1 + ε and (1+ε)/ε), the above identity at time T ∧ T_n yields

    E[ ( Z^{(1−ε)ϕ}_{T∧T_n} )^{1/(1−ε²)} ] ≤ E[ Z^ϕ_{T∧T_n} ]^{1/(1+ε)} E[ e^{(1/2) ∫_0^T ϕ_u² du} ]^{ε/(1+ε)}.   (3.18)

Since E[Z^ϕ_{T∧T_n}] = 1, the right-hand side is bounded by E[e^{(1/2) ∫_0^T ϕ_u² du}]^{ε/(1+ε)} independently of n. This means that the sequence (Z^{(1−ε)ϕ}_{T∧T_n})_{n≥1} is bounded in L^p with p = 1/(1−ε²) > 1, hence uniformly integrable. Thus,

    E[ Z_T^{(1−ε)ϕ} ] = lim_{n→∞} E[ Z^{(1−ε)ϕ}_{T∧T_n} ] = 1.   (3.19)

In particular, E[(Z_T^{(1−ε)ϕ})^p] ≥ 1 by Jensen's inequality, so using (3.18) with T instead of T ∧ T_n yields

    1 ≤ E[ Z_T^ϕ ]^{1/(1+ε)} E[ e^{(1/2) ∫_0^T ϕ_u² du} ]^{ε/(1+ε)}.

Letting ε → 0 gives E[Z_T^ϕ] ≥ 1 = E[Z_0^ϕ], and Lemma 3.3 concludes the proof.
Theorem 3.4 (Girsanov's Theorem). Fix ϕ ∈ M²_LOC and T ≥ 0, and suppose that the associated exponential local martingale (Z_t^ϕ)_{t∈[0,T]} is a martingale. Then, the formula

    ∀A ∈ F_T,   Q(A) := E[ Z_T^ϕ 1_A ],   (3.20)

defines a probability measure on (Ω, F_T), under which the process X = (X_t)_{t∈[0,T]} defined by

    X_t := B_t − ∫_0^t ϕ_u du,   (3.21)

is an (F_t)_{t∈[0,T]}-Brownian motion (restricted to the time horizon [0, T]).
Let us make a number of important comments before proceeding to the proof of this result.

1. In practice, the most efficient way to verify the assumption is to check Novikov's Criterion:

    E[ e^{(1/2) ∫_0^T ϕ_u² du} ] < +∞.   (3.22)

2. The statement that Q is a probability measure follows from the fact that Z_T^ϕ ≥ 0 and E[Z_T^ϕ] = 1.
3. By linearity and density, (3.20) implies that for any F_T-measurable non-negative variable Y,

    E^Q[Y] = E[ Y Z_T^ϕ ],   and   E[Y] = E^Q[ Y / Z_T^ϕ ],   (3.23)

where E^Q is the expectation under Q. This is useful for transferring computations between Q and P.

4. It follows from the martingale property that we also have E^Q[Y] = E[Y Z_t^ϕ] for any t ∈ [0, T] and any F_t-measurable non-negative variable Y.

5. The practical interest of Girsanov's Theorem is as follows: on our original probability space (Ω, F, P), computing expectations about X is rather complicated. Moving to (Ω, F, Q) turns X into a much simpler object, for which such computations become doable. One can then try to transfer the results back to (Ω, F, P), using Formula (3.23). Practical examples will follow...
7. The result admits the following T = ∞ version: suppose that the whole process Z^ϕ = (Z_t^ϕ)_{t≥0} is a martingale (this is the case, for example, when (3.22) holds for each t ≥ 0). Then, for each t ≥ 0, the formula (3.20) can be used to define a probability measure Q_t on (Ω, F_t). Moreover, for 0 ≤ s ≤ t, the restriction of Q_t to F_s coincides with Q_s, since

    ∀A ∈ F_s,   Q_t(A) = E[ Z_t^ϕ 1_A ] = E[ Z_s^ϕ 1_A ] = Q_s(A),

where the second equality uses the martingale property. Thus, (Q_t)_{t≥0} is a consistent family of probability measures, and the Kolmogorov extension theorem guarantees that these measures are all restrictions of a common probability measure Q_∞ defined on F_∞ := σ( ∪_{t≥0} F_t ). On the probability space (Ω, F_∞, Q_∞), the whole process X = (X_t)_{t≥0} is then a Brownian motion.
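Comment 5 is the basis of importance sampling. As a sketch under the simplest possible assumption ϕ ≡ µ constant (an illustration, not an example from the notes), the rare-event probability P(B_1 > 3) can be estimated by sampling under Q, where B_1 = X_1 + µ with X_1 standard Gaussian, and reweighting by 1/Z_1^ϕ = exp(−µB_1 + µ²/2), as in Formula (3.23):

```python
import math
import numpy as np

rng = np.random.default_rng(5)
mu, c, n_samples = 3.0, 3.0, 200_000
X1 = rng.normal(0.0, 1.0, n_samples)          # under Q, X is a Brownian motion: X_1 ~ N(0,1)
B1 = X1 + mu                                  # under Q, B_1 = X_1 + mu (drifted sample)
inv_Z = np.exp(-mu * B1 + 0.5 * mu ** 2)      # 1 / Z_1^phi with phi = mu, T = 1
estimate = (inv_Z * (B1 > c)).mean()          # E[1_{B_1 > c}] = E^Q[ 1_{B_1 > c} / Z_1^phi ]
exact = 0.5 * math.erfc(c / math.sqrt(2.0))   # P(N(0,1) > 3), about 1.35e-3
```

Sampling under Q makes the event {B_1 > 3} typical instead of rare, which drastically reduces the variance of the estimator compared to naive Monte Carlo under P.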
Proof. Let us first settle the special case where (ϕ_t)_{t∈[0,T]} satisfies

    ∫_0^T ϕ_u² du ≤ C,   (3.24)

for some deterministic C < ∞. In particular, for any θ ∈ R, the shifted process ϕ + θ satisfies Novikov's Criterion, so (Z_t^{ϕ+θ})_{t∈[0,T]} is a martingale. Thus, for any 0 ≤ s ≤ t ≤ T, we have

    E[ Z_t^{ϕ+θ} | F_s ] = Z_s^{ϕ+θ}.

Since Z_u^{ϕ+θ} = Z_u^ϕ e^{θX_u − θ²u/2} for all u ≥ 0, we may rewrite this as follows:

    E[ Z_t^ϕ e^{θ(X_t − X_s)} | F_s ] = e^{θ²(t−s)/2} Z_s^ϕ.

In view of Item 4 in the above remark, this may be further rewritten in terms of Q as follows: for all A ∈ F_s,

    E^Q[ e^{θ(X_t − X_s)} 1_A ] = e^{θ²(t−s)/2} Q(A).
Taking A = Ω shows that, under Q, X_t − X_s has distribution N(0, t − s), and the product form shows that X_t − X_s is independent of F_s. But this holds for any 0 ≤ s ≤ t ≤ T, and X is continuous by construction, so (X_t)_{t∈[0,T]} is indeed an (F_t)_{t∈[0,T]}-Brownian motion under Q. To address the general case, we of course introduce the truncated process ϕ_tⁿ := ϕ_t 1_{T_n ≥ t}, where

    T_n := inf{ t ≥ 0 : ∫_0^t ϕ_u² du ≥ n }.

Since ϕⁿ satisfies the condition (3.24) (with C = n), the first part of the proof implies that

    E[ Z^ϕ_{t∧T_n} e^{iθ(X_tⁿ − X_sⁿ)} 1_A ] = e^{−θ²(t−s)/2} E[ Z^ϕ_{s∧T_n} 1_A ],

for all 0 ≤ s ≤ t ≤ T, θ ∈ R and A ∈ F_s, where X_tⁿ := B_t − ∫_0^{t∧T_n} ϕ_u du. Now, as n → ∞, we have T_n ↑ +∞, hence X_tⁿ → X_t a.s. Moreover, by Scheffé's Lemma, the a.s. convergence Z^ϕ_{t∧T_n} → Z_t^ϕ also holds in L¹, for all t ∈ [0, T]. Thus, we may pass to the limit and obtain

    E[ Z_t^ϕ e^{iθ(X_t − X_s)} 1_A ] = e^{−θ²(t−s)/2} E[ Z_s^ϕ 1_A ],

and we conclude exactly as in the first part of the proof.
3.6 An application

Here is a good technical exercise to practice with Girsanov's Theorem.

Exercise 3.3 (Joint distribution of B_t², ∫_0^t B_s² ds). In order to understand the joint distribution of B_t² and ∫_0^t B_s² ds for fixed t ≥ 0, one would naturally like to compute the following Laplace transform:

    L_t(a, b) := E[ exp( −a B_t² − (b²/2) ∫_0^t B_u² du ) ]   (a, b, t ≥ 0).

1. Compute L_t(a, 0) for all a, t ≥ 0. We henceforth assume that b > 0.

2. Find ψ ∈ M¹_LOC so that the process Z defined below is a local martingale:

    Z_t := exp( −b ∫_0^t B_u dB_u − ∫_0^t ψ_u du ).

3. Express Z_t in terms of the random variables B_t and ∫_0^t B_u² du only, and deduce that

    L_t(a, b) = E[ Z_t exp( −(a − b/2) B_t² ) ] exp( −bt/2 ).
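Question 1 has a classical answer: for B_t ∼ N(0, t), a direct Gaussian computation gives L_t(a, 0) = E[e^{−aB_t²}] = (1 + 2at)^{−1/2}. The Python sketch below (an illustration, not part of the notes) confirms it by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(6)
a, t, n_samples = 0.7, 1.5, 400_000
B_t = rng.normal(0.0, np.sqrt(t), n_samples)
mc = np.exp(-a * B_t ** 2).mean()          # Monte Carlo estimate of L_t(a, 0)
exact = 1.0 / np.sqrt(1.0 + 2.0 * a * t)   # closed-form Gaussian integral
```

The same Monte Carlo approach can be used to test the formula of Question 3 once ψ has been identified.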
Chapter 4

Stochastic differential equations

4.1 Motivations
An ordinary differential equation (abbreviated as ODE) is an equation involving an unknown function x = (x_t)_{t≥0} and its derivative. Such equations are massively used to model physical processes whose evolution in any infinitesimal time-interval [t, t + dt] only depends on the considered time t and the current value x_t. In differential notation, they take the form

    dx_t = b(t, x_t) dt,   (4.1)

where the function b : R_+ × R → R describes the underlying dynamics. The classical Picard-Lindelöf theorem gives a simple sufficient condition on b for such an equation to be well-posed, in the sense that it admits a unique solution starting from each possible initial condition x_0.

Theorem 4.1 (Picard-Lindelöf). Let b : R_+ × R → R be a measurable function satisfying:

(i) (uniform spatial Lipschitz continuity): there exists a constant κ < ∞ such that

    ∀(t, x, y) ∈ R_+ × R²,   |b(t, x) − b(t, y)| ≤ κ |x − y|;

(ii) (local integrability in time): ∫_0^t |b(u, 0)| du < ∞ for each t ≥ 0.

Then, for each z ∈ R, there exists a unique measurable function x = (x_t)_{t≥0} satisfying

    ∀t ≥ 0,   x_t = z + ∫_0^t b(u, x_u) du.   (4.2)
In many interesting situations however, the dynamics is intrinsically chaotic and unpredictable: it is then natural to add a random external influence to the above evolution equation, typically driven by a Brownian motion B = (B_t)_{t≥0}. This naturally leads to the following stochastic analogue of (4.1):

    dX_t = b(t, X_t) dt + σ(t, X_t) dB_t,   (4.3)
4.2. Existence and uniqueness
where B = (B_t)_{t≥0} is a given (F_t)_{t≥0}-Brownian motion on our filtered space (Ω, F, (F_t)_{t≥0}, P), and b : R_+ × R → R and σ : R_+ × R → R are two given measurable functions. By a solution to the stochastic equation (4.3), we will mean a progressive process X = (X_t)_{t≥0} defined on (Ω, F, (F_t)_{t≥0}, P), satisfying (b(t, X_t))_{t≥0} ∈ M¹_LOC, (σ(t, X_t))_{t≥0} ∈ M²_LOC, and

    ∀t ≥ 0,   X_t = X_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dB_s.   (4.4)

Note that X is then necessarily an Itô process (in particular, it is continuous and adapted).

Theorem 4.2 (Existence and uniqueness). Let b, σ : R_+ × R → R be measurable functions such that

(i) (uniform Lipschitz continuity in space): there exists κ < ∞ such that for all (t, x, y) ∈ R_+ × R²,

    |b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ κ |x − y|;

(ii) (local square-integrability in time): ∫_0^t ( b²(u, 0) + σ²(u, 0) ) du < ∞ for each t ≥ 0.

Then, for each initial condition ζ ∈ L²(Ω, F_0, P), there exists a unique (up to indistinguishability) solution X to the SDE (4.3) satisfying X_0 = ζ. Moreover, we have X ∈ M².
As in the proof of the Picard-Lindelöf theorem, the uniqueness uses Gronwall's lemma:

Lemma 4.1 (Gronwall's Lemma). Let (x_t)_{t∈[0,T]} be a non-negative function in L¹([0,T]) satisfying

    ∀t ∈ [0,T],   x_t ≤ α + β ∫_0^t x_u du,

for some constants α, β ≥ 0. Then x_t ≤ α e^{βt} on [0,T]. Indeed, iterating the assumed inequality n times yields

    ∀t ∈ [0,T],   x_t ≤ α ∑_{k=0}^{n−1} (βt)^k / k! + κ (βt)^n / n!,

for some finite constant κ depending on x, and it remains to let n → ∞.
Summing these two estimates and using (u + v)² ≤ 2u² + 2v², we obtain that

    E[ (X_t − Y_t)² 1_{(t≤T_n)} ] ≤ 2κ²(t + 1) ∫_0^t E[ (X_u − Y_u)² 1_{(u≤T_n)} ] du.   (4.5)

By Gronwall's Lemma (applied with α = 0), it follows that

    ∀t ≥ 0,   E[ (X_t − Y_t)² 1_{(t≤T_n)} ] = 0.
Proof of existence in Theorem 4.2. Let us construct a sequence of approximate solutions (Xⁿ)_{n≥0} in M² by setting X⁰ ≡ 0 and then inductively, for each n ∈ N and each t ∈ R_+,

    X_t^{n+1} := ζ + ∫_0^t σ(u, X_uⁿ) dB_u + ∫_0^t b(u, X_uⁿ) du.   (4.6)

Let us first check that this makes sense. Clearly, X⁰ ≡ 0 ∈ M². Now, fix n ∈ N, and suppose we know that Xⁿ ∈ M². Then both integrands in (4.6) are in M², because our assumptions imply a linear-growth bound of the form b²(u, x) + σ²(u, x) ≤ C_T (1 + x²) on [0, T]. Moreover, an estimate similar to (4.5) shows, by induction on n, that for all t ∈ [0, T],

    E[ ( X_t^{n+1} − X_tⁿ )² ] ≤ M_T C_T^n t^{n−1} / (n−1)!,

where M_T := ∫_0^T E[(X_u¹ − X_u⁰)²] du. This is more than enough to guarantee that

    ∑_{n=0}^∞ ∥X^{n+1} − Xⁿ∥_{M²([0,T])} < ∞,   (4.7)

and hence that the sequence (Xⁿ)_{n≥0} is convergent in the Hilbert space M²([0,T]). But this is true for each T ≥ 0, so the limit is an element X ∈ M², and passing to the limit in (4.6) yields

    ∀t ≥ 0,   X_t = ζ + ∫_0^t σ(u, X_u) dB_u + ∫_0^t b(u, X_u) du,   (4.8)

i.e. X solves the SDE (4.3) with X_0 = ζ.
4.3. Practical examples
Remark 4.1 (Useful comments). There are several things to note about the theorem.

1. Condition (ii) is only used in the proof of existence: it is not needed for the uniqueness part.
2. Thanks to Condition (i), the measurability of b, σ only needs to be checked w.r.t. the time variable.
3. If Conditions (i) and (ii) are only satisfied on some restricted time horizon [0, T], then the above proof still yields existence and uniqueness of a restricted solution X = (X_t)_{t∈[0,T]}.
4. In particular, the conclusion of the theorem remains valid if the Lipschitz constant κ = κ_t appearing in Condition (i) is allowed to depend on time t, as long as sup_{t∈[0,T]} κ_t < ∞ for each T ≥ 0.
5. Condition (ii) trivially holds in the homogeneous case where the coefficients b(t, x), σ(t, x) do not depend on the time t, and more generally, when they depend continuously on t.
6. Our construction shows that for each t ≥ 0, X_t is σ(ζ, (B_s)_{0≤s≤t})-measurable. In other words,

    X_t = Ψ_t( ζ, (B_s)_{s∈[0,t]} ),   (4.9)

for some measurable Ψ_t : R × R^{[0,t]} → R which only depends on t and the coefficients b and σ.
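The Picard iteration (4.6) is a proof device; in practice, solutions of (4.3) are simulated with a time-discretization such as the Euler-Maruyama scheme, X_{t_{k+1}} ≈ X_{t_k} + b(t_k, X_{t_k}) ∆t + σ(t_k, X_{t_k}) ∆B_k. A minimal Python sketch (an illustration, not part of the notes):

```python
import math
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """One approximate sample path of dX = b(t, X) dt + sigma(t, X) dB on [0, T]."""
    dt = T / n
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        dB = rng.normal(0.0, math.sqrt(dt))   # Brownian increment over [k dt, (k+1) dt]
        X[k + 1] = X[k] + b(k * dt, X[k]) * dt + sigma(k * dt, X[k]) * dB
    return X

# Sanity check on the noiseless case sigma = 0, b(t, x) = -x: the scheme must track e^{-t}.
rng = np.random.default_rng(7)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.0, 1.0, 1.0, 10_000, rng)
```

Under the Lipschitz conditions of Theorem 4.2, this scheme is known to converge to the true solution as the mesh goes to 0.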
Exercise 4.1 (Dependence on the initial condition). Consider the homogeneous SDE

    dX_t = b(X_t) dt + σ(X_t) dB_t,

where b, σ are Lipschitz functions. Let X, Y be the solutions starting from X_0, Y_0 ∈ L²(Ω, F_0, P), and set

    ψ_t := ( b(X_t) − b(Y_t) ) / ( X_t − Y_t ) 1_{(X_t ≠ Y_t)},   and   ϕ_t := ( σ(X_t) − σ(Y_t) ) / ( X_t − Y_t ) 1_{(X_t ≠ Y_t)}.   (4.10)

2. Show that

    X_t − Y_t = (X_0 − Y_0) exp( ∫_0^t ( ψ_u − ϕ_u²/2 ) du + ∫_0^t ϕ_u dB_u ).   (4.11)

3. Prove the existence of a constant c ∈ (0, ∞) such that for all t ≥ 0 and all p ≥ 1,

    E[ (X_t − Y_t)^p ] ≤ E[ (X_0 − Y_0)^p ] e^{cp²t}.
Example 4.1 (Ornstein-Uhlenbeck process). Fix ζ ∈ L²(Ω, F_0, P), and consider the SDE

    dX_t = −b X_t dt + σ dB_t,   X_0 = ζ,   (4.12)

with b, σ ∈ (0, ∞). This is a homogeneous SDE with b(t, x) = −bx and σ(t, x) = σ. The above theorem ensures existence and uniqueness, for any initial condition ζ ∈ L²(Ω, F_0, P). In fact, the solution can be written explicitly:

    ∀t ≥ 0,   X_t = ζ e^{−bt} + σ ∫_0^t e^{−b(t−u)} dB_u.   (4.13)
From this formula, one easily deduces that

    X_t  →(d, t→∞)  N( 0, σ²/(2b) ),

independently of the choice of the initial condition ζ. Thus, the process X mixes: as time increases, the random variable X_t progressively forgets its initial distribution, and approaches the limit N(0, σ²/(2b)). This observation suggests to start directly from ζ ∼ N(0, σ²/(2b)). Recall that ζ is always assumed to be F_0-measurable, hence independent of B. The right-hand side of (4.13) then belongs to the Gaussian space Vect(ζ, B), so X is a Gaussian process. Its mean is clearly 0, and its covariance is easily computed:

    ∀s, t ≥ 0,   Cov(X_s, X_t) = ( σ²/(2b) ) e^{−b|t−s|}.   (4.14)

A continuous centered Gaussian process with this covariance is called an Ornstein-Uhlenbeck process. Since its covariance only depends on |t − s|, its distribution is invariant under time-translation:

    ∀a ≥ 0,   (X_{t+a})_{t≥0} =(d) (X_t)_{t≥0}.

This stationarity is a key property, which explains the importance of the Ornstein-Uhlenbeck process.
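The stationarity can be checked by simulation. The sketch below (an illustration, not part of the notes) starts from ζ ∼ N(0, σ²/(2b)), uses the exact Gaussian transition implied by (4.13), and verifies that the variance stays at σ²/(2b) while the covariance matches (4.14).

```python
import numpy as np

rng = np.random.default_rng(8)
b, sigma, n_samples = 2.0, 1.0, 300_000
stat_var = sigma ** 2 / (2 * b)                       # stationary variance sigma^2/(2b)
X0 = rng.normal(0.0, np.sqrt(stat_var), n_samples)    # start from the stationary law
s = 0.5
# exact transition from (4.13): X_s = e^{-bs} X_0 + independent Gaussian noise
noise_var = stat_var * (1.0 - np.exp(-2.0 * b * s))   # variance of sigma int_0^s e^{-b(s-u)} dB_u
Xs = np.exp(-b * s) * X0 + rng.normal(0.0, np.sqrt(noise_var), n_samples)
cov = (X0 * Xs).mean()                                # should be ~ stat_var * e^{-b s}, cf. (4.14)
```

Starting instead from a deterministic ζ far from 0 and letting s grow shows the mixing phenomenon: the law of X_s drifts toward N(0, σ²/(2b)).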
Example 4.2 (Geometric Brownian motion). Fix ζ ∈ L²(Ω, F_0, P), σ, µ ∈ R, and consider the SDE

    dX_t = X_t ( σ dB_t + µ dt ),   X_0 = ζ.

This homogeneous SDE with coefficients b(t, x) = µx and σ(t, x) = σx has a unique solution X = (X_t)_{t≥0}. In light of what the answer would be in the deterministic case σ = 0, it is natural to expect a solution of the form X_t = ζ e^{Y_t}, where Y is an Itô process. By Itô's formula,

    d( ζ e^{Y_t} ) = ζ e^{Y_t} ( dY_t + (1/2) d⟨Y⟩_t ).

Writing dY_t = ϕ_t dB_t + ψ_t dt and identifying, we see that ϕ_t = σ and ψ_t = µ − σ²/2, yielding finally

    X_t = ζ e^{ σB_t + (µ − σ²/2) t }.

This important process is known as the geometric Brownian motion.
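A consequence of the explicit formula, for deterministic ζ, is that E[X_t] = ζ e^{µt}: the Itô correction −σ²t/2 in the exponent exactly compensates E[e^{σB_t}] = e^{σ²t/2}. A short Monte Carlo sketch (an illustration, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(9)
zeta, sigma, mu, t, n_samples = 1.0, 0.4, 0.1, 1.0, 500_000
B_t = rng.normal(0.0, np.sqrt(t), n_samples)
X_t = zeta * np.exp(sigma * B_t + (mu - 0.5 * sigma ** 2) * t)   # explicit solution
expected = zeta * np.exp(mu * t)                                 # E[X_t] = zeta e^{mu t}
```

Dropping the −σ²/2 term in the exponent makes the empirical mean overshoot ζe^{µt} by the factor e^{σ²t/2}, which is an easy way to see that the correction is not optional.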
Example 4.3 (Black-Scholes process). Fix ζ ∈ L²(Ω, F_0, P) and two deterministic measurable bounded functions σ = (σ_t)_{t≥0} and µ = (µ_t)_{t≥0}. Consider the inhomogeneous SDE

    dX_t = X_t ( σ_t dB_t + µ_t dt ),   X_0 = ζ.

The coefficients b(t, x) = µ_t x and σ(t, x) = σ_t x satisfy the Lipschitz and square-integrability conditions, thanks to the boundedness of σ, µ. Thus, there is a unique solution X. As in the above example, it is natural to expect that X_t = ζ e^{Y_t}, where Y is an Itô process. Writing dY_t = ϕ_t dB_t + ψ_t dt, we have

    d( ζ e^{Y_t} ) = ζ e^{Y_t} ( ϕ_t dB_t + (1/2) ϕ_t² dt + ψ_t dt ).

Thus, it suffices to choose ϕ_t = σ_t and ψ_t = µ_t − σ_t²/2, yielding finally

    X_t = ζ exp( ∫_0^t σ_u dB_u + ∫_0^t ( µ_u − σ_u²/2 ) du ).

This natural generalization of the geometric Brownian motion is known as a Black-Scholes process.
4.4. Markov property for diffusions
Exercise 4.2 (Change of variable). Show that there is a unique Itô process X = (X_t)_{t≥0} satisfying

    dX_t = ( √(1 + X_t²) + X_t/2 ) dt + √(1 + X_t²) dB_t,   X_0 = x,

then determine it explicitly by means of the change of variable Y_t = argsh(X_t).
for some deterministic, measurable map Ψt : R × R[0,t] → R which only depends on the coef-
ficients b and σ. The following result shows that for any fixed time s ≥ 0, the shifted process
X̃ = (Xt+s)t≥0 solves an SDE with the same coefficients b and σ, but initialized with X̃0 = Xs and
driven by the shifted Brownian motion B̃ = (Bu+s − Bs)u≥0 (which is independent of Fs).
is clear in the elementary case where ϕt (ω ) = X (ω )1]u,v] (t) with X ∈ L2 (Ω, Fu , P). By linearity
and density, it then extends to any ϕ ∈ M2LOC . In particular, we can write
Xt+s = Xs + ∫_s^{t+s} b(Xu) du + ∫_s^{t+s} σ(Xu) dBu
     = Xs + ∫_0^t b(Xu+s) du + ∫_0^t σ(Xu+s) dB̃u,
driven by the Brownian motion B̃ on the filtered space (Ω, F, (F̃t)t≥0, P), where F̃t := Ft+s.
But this precisely means that X̃t = Ψt(Xs, (B̃u)u∈[0,t]) for all t ≥ 0.
A direct consequence of Theorem 4.3 (along with the general Remark 1.6 about conditional
expectation) is the following fundamental formula, which is known as the Markov property.
Given f ∈ L∞(R) and t ≥ 0, we define a new function Pt f : R → R by

∀ x ∈ R, (Pt f)(x) := E[f(Xt^x)], (4.17)

where X^x denotes the unique solution to the SDE (4.15) with initial condition ζ = x.
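The definition (4.17) suggests a direct Monte Carlo approximation of (Pt f)(x): simulate many independent copies of Xt^x and average f over them. A minimal sketch, assuming the Ornstein-Uhlenbeck coefficients b(x) = −x, σ(x) = 1 and the test function f = cos (all illustrative choices); for this diffusion Xt^x ~ N(x e^{−t}, (1 − e^{−2t})/2), so the exact value is available as a benchmark:

```python
import numpy as np

rng = np.random.default_rng(4)
x, t = 0.5, 1.0
n_steps, n_paths = 500, 100_000
dt = t / n_steps

# Euler scheme for dX = -X dt + dB: n_paths independent copies from X_0 = x
X = np.full(n_paths, x)
for _ in range(n_steps):
    X += -X * dt + rng.normal(0.0, np.sqrt(dt), size=n_paths)

Ptf_mc = np.cos(X).mean()   # Monte Carlo estimate of (P_t f)(x) for f = cos

# Benchmark: X_t^x ~ N(m, v) with m = x e^{-t}, v = (1 - e^{-2t})/2,
# and E[cos(Z)] = cos(m) e^{-v/2} for Z ~ N(m, v).
m, v = x * np.exp(-t), (1 - np.exp(-2 * t)) / 2
Ptf_exact = np.cos(m) * np.exp(-v / 2)
print(Ptf_mc, Ptf_exact)
```

The Monte Carlo estimate converges to (Pt f)(x) at rate 1/√n_paths, up to the weak bias of the Euler scheme.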
4.5. Generator of a diffusion
Lemma 4.2 (Properties of the semi-group). The family ( Pt )t≥0 enjoys the following properties:
Proof. The linearity of Pt readily follows from that of E[·]. Moreover, for any f ∈ L∞(R) and
t ≥ 0, the function Pt f : x ↦ E[f(Xt^x)] = E[(f ◦ Ψt)(x, B)] is measurable by Fubini’s theorem
(because f ◦ Ψt is bounded and measurable). Since ∥Pt f∥∞ ≤ ∥f∥∞, the first assertion is proved.
The fact that P0 = Id is clear since X0^x = x. To prove that Pt+s = Pt ◦ Ps for all s, t ≥ 0, we write
where the second identity is obtained by taking expectations in (4.18). Finally, if f is continuous
and bounded, then so is the random function t 7→ f ( Xtx ), hence also its expectation. The last
three assertions are consequences of the identity (4.11), and the details are left to the reader.
Pt = e^{tL}, (4.19)

where L = lim_{t→0} (Pt − P0)/t. Of course, at this level, those identities do not really make any sense
and the analogy is purely formal. Nevertheless, this motivates the following fruitful definition.
Definition 4.1 (Generator). The generator of the semi-group ( Pt )t≥0 is the linear operator L defined by
∀ x ∈ R, (L f)(x) := lim_{t→0} [ (Pt f)(x) − f(x) ] / t, (4.20)

for all f ∈ L∞(R) such that the limit exists. Those functions form a vector space denoted by Dom(L).
The interest of this definition is summed up in the following important result. We recall that
Cc²(R) denotes the vector space of twice continuously differentiable functions f : R → R with
compact support, and that this space is dense in L∞(R).
Theorem 4.4 (Generator of a diffusion). Every f ∈ Cc²(R) belongs to Dom(L), with

∀ x ∈ R, (L f)(x) = b(x) f′(x) + (1/2) σ²(x) f″(x). (4.21)

Moreover, for every t ≥ 0,

∀ x ∈ R, d/dt (Pt f)(x) = (Pt L f)(x) = (L Pt f)(x). (4.22)

Finally, the process

t ↦ f(Xt) − f(X0) − ∫_0^t (L f)(Xu) du (4.23)

is a martingale.
Proof. Fix f ∈ Cc²(R), and let R f := b f′ + (1/2) σ² f″ denote the right-hand side of (4.21).
By Itô’s formula,

d f(Xt) = f′(Xt) dXt + (1/2) f″(Xt) d⟨X⟩t
        = ( b(Xt) f′(Xt) + (1/2) σ²(Xt) f″(Xt) ) dt + f′(Xt) σ(Xt) dBt
        = (R f)(Xt) dt + f′(Xt) σ(Xt) dBt. (4.24)
Now, the fact that f ∈ Cc²(R) easily ensures that the functions u ↦ (R f)(Xu) and u ↦
σ(Xu) f′(Xu) are in M1 and M2, respectively. In particular, the right-hand side of (4.24) is
a square-integrable martingale. Taking expectations and using Fubini’s theorem, we deduce that
E[f(Xt)] = f(X0) + ∫_0^t E[(R f)(Xu)] du.
But for each fixed x ∈ R, the function u ↦ (Pu R f)(x) is continuous (Lemma 4.2), so we conclude
that t ↦ (Pt f)(x) is continuously differentiable on R+, with derivative

∂/∂t (Pt f)(x) = (Pt R f)(x). (4.26)
Since the left-hand side equals lim_{h→0} (1/h) [ (Ph Pt f)(x) − (Pt f)(x) ], we see that Pt f ∈ Dom(L) and
that LPt f = Pt R f. Finally, taking t = 0 shows that L f = R f, and the proof is complete.
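Formula (4.21) can be tested numerically whenever Pt f is available in closed form. For the Ornstein-Uhlenbeck choice b(x) = −x, σ(x) = 1 and f = cos (illustrative choices, not taken from the text), Xt^x ~ N(x e^{−t}, (1 − e^{−2t})/2), so the difference quotient (4.20) can be evaluated without any simulation and compared with b f′ + (1/2)σ² f″:

```python
import numpy as np

def Ptf(t, x):
    # Closed form of (P_t f)(x) for f = cos under dX = -X dt + dB:
    # X_t^x ~ N(m, v) with m = x e^{-t}, v = (1 - e^{-2t})/2,
    # and E[cos(Z)] = cos(m) e^{-v/2} for Z ~ N(m, v).
    m = x * np.exp(-t)
    v = (1 - np.exp(-2 * t)) / 2
    return np.cos(m) * np.exp(-v / 2)

x, t = 0.5, 1e-6
diff_quotient = (Ptf(t, x) - np.cos(x)) / t   # difference quotient (4.20)

# Formula (4.21) with b(x) = -x and sigma(x) = 1:
Lf = (-x) * (-np.sin(x)) + 0.5 * (-np.cos(x))
print(diff_quotient, Lf)
```

The two numbers agree up to a term of order t, as predicted by the theorem.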
4.6. Connection with partial differential equations
Remark 4.2 (Extension). Since X is square-integrable (Theorem 4.2), the definition of Pt f given at
(4.17) actually extends to all measurable functions f with quadratic growth (i.e. |f(x)| ≤ K(1 + x²) for
some constant K), and the general properties of ( Pt )t≥0 established in Lemma 4.2 remain valid for this
extended definition. In particular, our proof of Theorem 4.4 carries over to the case where f ∈ Cb2 (R),
meaning that f is twice continuously differentiable with f , f ′ , f ′′ being bounded.
Remark 4.3 (Martingales). The family of martingales given by (4.23) is of course extremely useful for
studying the process X. For example, in the case of Brownian motion, we obtain that
t ↦ f(Bt) − (1/2) ∫_0^t f″(Bu) du,
is a martingale for any f ∈ Cb2 (R), generalizing the two simple cases ( Bt )t≥0 and ( Bt2 − t)t≥0 .
Remark 4.4 (Fokker-Planck equation). Writing ht for the distribution of Xt , the equation (4.22) gives
d/dt ∫_R f(z) ht(dz) = ∫_R ( b(z) f′(z) + (σ²(z)/2) f″(z) ) ht(dz), (4.27)
for any f ∈ Cb2 . Integrating by parts in the sense of distributions, we obtain Fokker-Planck’s equation:
∂ht/∂t = L⋆ht, where L⋆h = (1/2)(σ²h)″ − (bh)′. (4.28)
In particular, the equation L⋆ h = 0 characterizes those distributions h which are stationary.
Exercise 4.3 (Stationary distribution). Check that the Gaussian distribution N(0, σ²/(2b)) is stationary for
the Langevin equation dXt = −bXt dt + σ dBt with b, σ > 0.
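A numerical sanity check of this exercise (the values b = 2 and σ = 1 are illustrative): start an Euler scheme for the Langevin equation from N(0, σ²/(2b)) and verify that the first two moments do not drift over time:

```python
import numpy as np

rng = np.random.default_rng(2)
b, sigma = 2.0, 1.0              # illustrative parameters
var_stat = sigma**2 / (2 * b)    # claimed stationary variance

n_paths, n_steps, dt = 100_000, 500, 0.002
X = rng.normal(0.0, np.sqrt(var_stat), size=n_paths)   # start from N(0, sigma^2/(2b))
for _ in range(n_steps):
    # Euler step for dX = -b X dt + sigma dB, applied to all paths at once
    X += -b * X * dt + sigma * rng.normal(0.0, np.sqrt(dt), size=n_paths)

print(X.mean(), X.var())  # should stay close to 0 and sigma^2/(2b)
```

After 500 steps the empirical mean and variance remain at their initial values, up to Monte Carlo noise and the O(dt) bias of the scheme.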
where b, σ : R → R are Lipschitz functions. Now, fix f ∈ L∞ (R), and consider the PDE
∂v/∂t (t, x) = b(x) ∂v/∂x (t, x) + (σ²(x)/2) ∂²v/∂x² (t, x),
v(0, x) = f(x), (4.30)
Theorem 4.5 (Connection). The evolutions (4.29) and (4.30) are linked as follows:
The interest of this connection between SDEs and PDEs is two-fold: on the one hand, one can
use tools from PDE theory to understand the distribution of Xtx . Conversely, the probabilistic
representation (4.31) offers a practical way to numerically solve the PDE (4.30), by simulation.
Here is an important extension, which incorporates a zero-order term into our PDE.
Theorem 4.6 (Feynman-Kac’s formula). Let v ∈ C 1,2 (R+ × R) be a bounded solution to the PDE
∂v/∂t (t, x) = −h(x) v(t, x) + b(x) ∂v/∂x (t, x) + (σ²(x)/2) ∂²v/∂x² (t, x),
v(0, x) = f(x), (4.32)
where f , h : R → R are measurable, with h non-negative. Then, we have the representation
∀ (t, x) ∈ R+ × R, v(t, x) = E[ f(Xt^x) e^{−∫_0^t h(Xu^x) du} ]. (4.33)
Proof. Fix T ≥ 0 and x ∈ R, and consider the stochastic process ( Mt )t∈[0,T ] defined by
Mt := Vt e^{−∫_0^t h(Xu^x) du}, with Vt := v(T − t, Xt^x).
As above, using Itô’s formula and the fact that v solves the PDE (4.32), we find, for t ∈ [0, T],
dVt = h(Xt^x) Vt dt + σ(Xt^x) (∂v/∂x)(T − t, Xt^x) dBt.
Consequently,
dMt = e^{−∫_0^t h(Xu^x) du} ( dVt − h(Xt^x) Vt dt )
    = e^{−∫_0^t h(Xu^x) du} σ(Xt^x) (∂v/∂x)(T − t, Xt^x) dBt.
Thus, ( Mt )t∈[0,T ] is a local martingale. Since it is bounded (v is bounded and h ≥ 0), it is in fact a
true martingale. In particular, E[ MT ] = E[ M0 ], i.e.
E[ f(XT^x) e^{−∫_0^T h(Xu^x) du} ] = v(T, x).
Since T ≥ 0 and x ∈ R are arbitrary, the claim is proved.
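The representation (4.33) turns into a one-line Monte Carlo scheme. A sketch for the special case b = 0, σ = 1, constant killing rate h ≡ c and f = cos (all illustrative choices): then X^x = x + B, and v(t, x) = e^{−(c+1/2)t} cos(x) is the bounded solution of (4.32), so the estimate can be checked exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
c, t, x, n_paths = 0.7, 1.0, 0.3, 500_000   # illustrative parameters

# With b = 0 and sigma = 1, X_t^x = x + B_t, so one Gaussian draw suffices.
B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)

# Since h is constant, the killing factor exp(-int_0^t h(X_u) du) = e^{-ct}.
mc = np.mean(np.cos(x + B_t) * np.exp(-c * t))

exact = np.exp(-(c + 0.5) * t) * np.cos(x)   # bounded solution of (4.32)
print(mc, exact)
```

For a genuinely space-dependent killing rate h, the integral ∫_0^t h(Xu) du would be approximated along each simulated path, e.g. by a Riemann sum over an Euler discretization.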