
Advanced Probability and Applications (Part II)

Olivier Lévêque, IC–LTHI, EPFL


(with special thanks to Simon Guilloud for the figures)

July 31, 2018

Contents
1 Conditional expectation (Week 9)
  1.1 Conditioning with respect to an event B ∈ F
  1.2 Conditioning with respect to a discrete random variable Y
  1.3 Conditioning with respect to a continuous random variable Y ?
  1.4 Conditioning with respect to a sub-σ-field G
  1.5 Conditioning with respect to a random variable Y

2 Martingales (Week 10)
  2.1 Basic definitions
  2.2 Stopping times
  2.3 Doob’s optional stopping theorem, version 1 (Week 11)
  2.4 The reflection principle
  2.5 Martingale transforms
  2.6 Doob’s decomposition theorem (Week 12)

3 Martingale convergence theorems
  3.1 Preliminary: Doob’s martingale
  3.2 The martingale convergence theorem: first version
  3.3 Consequences of the theorem
  3.4 Proof of the theorem (Week 13)
  3.5 The martingale convergence theorem: second version
  3.6 Generalization to sub- and supermartingales
  3.7 Azuma’s and McDiarmid’s inequalities

1 Conditional expectation Week 9
Let (Ω, F, P) be a probability space.

1.1 Conditioning with respect to an event B ∈ F

The conditional probability of an event A ∈ F given another event B ∈ F is defined as

P(A|B) = P(A ∩ B) / P(B),   provided that P(B) > 0

Notice that if A and B are independent, then P(A|B) = P(A); the conditioning does not affect the
probability. This fact remains true in more generality (see below).
In a similar manner, the conditional expectation of an integrable random variable X given B ∈ F is
defined as
E(X|B) = E(X 1_B) / P(B),   provided that P(B) > 0

1.2 Conditioning with respect to a discrete random variable Y

Let us assume that the random variable Y (is F-measurable and) takes values in a countable set C.

P(A|Y ) = ϕ(Y ), where ϕ(y) = P(A|{Y = y}), y∈C


E(X|Y ) = ψ(Y ), where ψ(y) = E(X|{Y = y}), y∈C

If X is also a discrete random variable with values in C, then

E(X|Y ) = ψ(Y ), where

ψ(y) = E(X 1_{Y=y}) / P({Y = y}) = Σ_{x∈C} x E(1_{{X=x}∩{Y=y}}) / P({Y = y}) = Σ_{x∈C} x P({X = x}|{Y = y})

Important remark. ϕ(y) and ψ(y) are functions, while ϕ(Y ) = P(A|Y ) and ψ(Y ) = E(X|Y ) are
random variables. They both are functions of the outcome of the random variable Y , that is, they are
σ(Y )-measurable random variables.
Example. Let X1 , X2 be two independent dice rolls and let us compute E(X1 + X2 |X2 ) = ψ(X2 ), where

ψ(y) = E(X1 + X2 |{X2 = y}) = E((X1 + X2) 1_{X2=y}) / P({X2 = y})
     = [E(X1 1_{X2=y}) + E(X2 1_{X2=y})] / P({X2 = y})  =(a)  [E(X1) E(1_{X2=y}) + E(y 1_{X2=y})] / P({X2 = y})
     = [E(X1) P({X2 = y}) + y P({X2 = y})] / P({X2 = y}) = E(X1) + y

where the independence assumption between X1 and X2 has been used in equality (a). So finally (as
one would expect), E(X1 + X2 |X2 ) = E(X1 ) + X2 , which can be explained intuitively as follows: the
expectation of X1 conditioned on X2 is nothing but the expectation of X1 , as the outcome of X2 provides
no information on the outcome of X1 (X1 and X2 being independent); on the other hand, the expectation
of X2 conditioned on X2 is exactly X2 , as the outcome of X2 is known.
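As a quick numerical sanity check (a NumPy sketch added here, not part of the original notes; sample size and seed are arbitrary), one can estimate ψ(y) = E(X1 + X2 | X2 = y) by averaging over simulated rolls and compare it with E(X1) + y = 3.5 + y:

import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x1 = rng.integers(1, 7, size=n)   # first die roll, uniform on {1,...,6}
x2 = rng.integers(1, 7, size=n)   # second die roll, independent of the first

# empirical psi(y) = E(X1 + X2 | X2 = y), to be compared with E(X1) + y = 3.5 + y
for y in range(1, 7):
    mask = (x2 == y)
    print(y, round((x1[mask] + x2[mask]).mean(), 3), 3.5 + y)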

1.3 Conditioning with respect to a continuous random variable Y ?

In this case, one faces the following problem: if Y is a continuous random variable, P({Y = y}) = 0 for
all y ∈ R. So a direct generalization of the above formulas to the continuous case is impossible at first
sight. A possible solution to this problem is to replace the event {Y = y} by {y ≤ Y < y + ε} and to
take the limit ε → 0 for the definition of conditional expectation. This actually works, but also leads to a
paradox in the multidimensional setting (known as Borel’s paradox). In addition, some random variables
are neither discrete, nor continuous. It turns out that the cleanest way to define conditional expectation
in the general case is through σ-fields.

1.4 Conditioning with respect to a sub-σ-field G

In order to define the conditional expectation in the general case, one needs the following proposition.
Proposition 1.1. Let (Ω, F, P) be a probability space, G be a sub-σ-field of F and X be an integrable
random variable on (Ω, F, P). There exists then an integrable random variable Z such that
(i) Z is G-measurable,
(ii) E(ZU ) = E(XU ) for any random variable U G-measurable and bounded.
Moreover, if Z1 , Z2 are two integrable random variables satisfying (i) and (ii), then Z1 = Z2 a.s.

Definition 1.2. The above random variable Z is called the conditional expectation of X given G and is
denoted as E(X|G). Because of the last part of the above proposition, it is defined up to a negligible set.
Definition 1.3. One further defines P(A|G) = E(1A |G) for A ∈ F.

Remark. Notice that as before, both P(A|G) and E(X|G) are (G-measurable) random variables.

Properties. The above definition does not give a computation rule for the conditional expectation; it is
only an existence theorem. The properties listed below will therefore be of help for computing conditional
expectations. The proofs of the first two are omitted, while the next five are left as (important!) exercises.
- Linearity. E(c X + Y |G) = c E(X|G) + E(Y |G) a.s.

- Monotonicity. If X ≥ Y a.s., then E(X|G) ≥ E(Y |G) a.s. (so if X ≥ 0 a.s., then E(X|G) ≥ 0 a.s.)

- E(E(X|G)) = E(X).

- If X is independent of G, then E(X|G) = E(X) a.s.

- If X is G-measurable, then E(X|G) = X a.s.

- If Y is G-measurable and bounded (or if Y is G-measurable and both X and Y are square-integrable;
what actually matters here is that the random variable XY is integrable), then E(XY |G) = E(X|G) Y a.s.

- If H is a sub-σ-field of G, then E(E(X|H)|G) = E(E(X|G)|H) = E(X|H) a.s. (in other words, the
smallest σ-field always “wins”; this property is also known as the “tower property” of conditional ex-
pectation)

Some of the above properties are illustrated below with an example.
Example. Let Ω = {1, . . . , 6}, F = P(Ω) and P({ω}) = 1/6 for ω = 1, . . . , 6 (the probability space of the
die roll). Let also X(ω) = ω be the outcome of the die roll and consider the two sub-σ-fields:

G = σ({1, 3}, {2}, {5}, {4, 6}) and H = σ({1, 3, 5}, {2, 4, 6})

Then E(X) = 3.5,

E(X|G)(ω) = 2 if ω ∈ {1, 3} or ω = 2,   5 if ω ∈ {4, 6} or ω = 5
E(X|H)(ω) = 3 if ω ∈ {1, 3, 5},   4 if ω ∈ {2, 4, 6}

So E(E(X|G)) = E(E(X|H)) = E(X). Moreover,

E(E(X|G)|H)(ω) = (1/3)(2 + 2 + 5) = 3 if ω ∈ {1, 3, 5},   (1/3)(2 + 5 + 5) = 4 if ω ∈ {2, 4, 6}

so E(E(X|G)|H)(ω) = E(X|H)(ω), and

E(E(X|H)|G)(ω) = 3 if ω ∈ {1, 3} or ω = 5,   4 if ω ∈ {4, 6} or ω = 2

so E(E(X|H)|G)(ω) = E(X|H)(ω) as well.
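The same example can be checked mechanically. The following sketch (not from the notes) computes conditional expectations on the six-point space by averaging over the atoms of each sub-σ-field, and verifies the tower property numerically; the helper cond_exp is of course ad hoc:

from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                      # the uniform six-point space of the die roll
X = {w: Fraction(w) for w in omega}             # X(omega) = omega

def cond_exp(f, atoms):
    # conditional expectation of f given the sigma-field generated by the partition 'atoms'
    out = {}
    for atom in atoms:
        avg = sum(f[w] for w in atom) / len(atom)   # uniform measure: plain average over the atom
        for w in atom:
            out[w] = avg
    return out

G = [{1, 3}, {2}, {5}, {4, 6}]
H = [{1, 3, 5}, {2, 4, 6}]

E_X_G = cond_exp(X, G)                          # 2 on {1,3} and {2}, 5 on {5} and {4,6}
E_X_H = cond_exp(X, H)                          # 3 on odd outcomes, 4 on even ones
print(sum(E_X_G.values()) / 6, sum(E_X_H.values()) / 6)   # both print 7/2, i.e. E(X) = 3.5
print(cond_exp(E_X_G, H) == E_X_H)              # True: tower property
print(cond_exp(E_X_H, G) == E_X_H)              # True: the smaller sigma-field H "wins"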

The proposition below (given here without proof) is an extension of some of the above properties.
Proposition 1.4. Let G be a sub-σ-field of F, X, Y be two random variables such that X is independent
of G and Y is G-measurable, and let ϕ : R2 → R be a Borel-measurable function such that E(|ϕ(X, Y )|) <
+∞. Then
E(ϕ(X, Y )|G) = ψ(Y ) a.s., where ψ(y) = E(ϕ(X, y))

This proposition has the following consequence: when computing the expectation of a function ϕ of two
independent random variables X and Y , one can always divide the computation in two steps by writing

E(ϕ(X, Y )) = E(E(ϕ(X, Y )|G)) = E(ψ(Y ))

where ψ(y) = E(ϕ(X, y)) (this is actually nothing but Fubini’s theorem).

Finally, the proposition below (given again without proof) shows that Jensen’s inequality also holds for
conditional expectation.
Proposition 1.5. Let X be a random variable, G be a sub-σ-field of F and ψ : R → R be Borel-
measurable, convex and such that E(|ψ(X)|) < +∞. Then

ψ(E(X|G)) ≤ E(ψ(X)|G) a.s.

In particular, |E(X|G)| ≤ E(|X||G) a.s.

1.5 Conditioning with respect to a random variable Y

Once the definition of conditional expectation with respect to a σ-field is set, it is natural to define it for
a generic random variable Y :

E(X|Y ) = E(X|σ(Y )) and P(A|Y ) = P(A|σ(Y ))

Remark. Since any σ(Y )-measurable random variable may be written as g(Y ), where g is a Borel-
measurable function, the definition of E(X|Y ) may be rephrased as follows.
Definition 1.6. E(X|Y ) = ψ(Y ), where ψ : R → R is the unique Borel-measurable function such that
E(ψ(Y ) g(Y )) = E(Xg(Y )) for any function g : R → R Borel-measurable and bounded.

In two particular cases, the function ψ can be made explicit, which allows for concrete computations.
- If X, Y are two discrete random variables with values in a countable set C, then

E(X|Y ) = ψ(Y ), where ψ(y) = Σ_{x∈C} x P({X = x}|{Y = y}),   y ∈ C

which matches the formula given in Section 1.2. The proof that it also matches the theoretical definition
of conditional expectation is left as an exercise.
- If X, Y are two jointly continuous random variables with joint pdf p_{X,Y}, then

E(X|Y ) = ψ(Y ), where ψ(y) = ∫_R x (p_{X,Y}(x, y) / p_Y(y)) dx,   y ∈ R

and p_Y is the marginal pdf of Y given by p_Y(y) = ∫_R p_{X,Y}(x, y) dx, assumed here to be strictly positive.
Let us check that the random variable ψ(Y ) is indeed the conditional expectation of X given Y according
to Definition 1.6: for any function g : R → R Borel-measurable and bounded, one has
E(ψ(Y ) g(Y )) = ∫_R ψ(y) g(y) p_Y(y) dy
             = ∫_R ( ∫_R x (p_{X,Y}(x, y) / p_Y(y)) dx ) g(y) p_Y(y) dy
             = ∫∫_{R²} x g(y) p_{X,Y}(x, y) dx dy = E(X g(Y ))
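For a concrete jointly continuous case (an example of my own choosing, not taken from the notes), let (X, Y) be standard bivariate normal with correlation ρ, for which the formula above gives ψ(y) = ρy. The sketch below checks the defining property E(ψ(Y) g(Y)) = E(X g(Y)) by Monte Carlo for one particular bounded test function g:

import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 10**6

# (X, Y) standard bivariate normal with correlation rho, for which E(X|Y) = rho * Y
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)

g = np.tanh               # an arbitrary bounded Borel-measurable test function
psi = rho * y             # candidate conditional expectation psi(Y) = rho * Y

print(np.mean(psi * g(y)))   # estimates E(psi(Y) g(Y))
print(np.mean(x * g(y)))     # estimates E(X g(Y)); the two agree up to Monte Carlo error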


Finally, the conditional expectation satisfies the following proposition when X is a square-integrable
random variable.
Proposition 1.7. Let X be a square-integrable random variable, G be a sub-σ-field of F and G be the lin-
ear subspace of square-integrable and G-measurable random variables. Then the conditional expectation
of X with respect to G is equal a.s. to the random variable Z satisfying

Z = argmin_{Y ∈ G} E((X − Y )²)

In other words, this is saying that Z is the orthogonal projection of X onto the linear subspace G of
square-integrable and G-measurable random variables (the scalar product considered here being ⟨X, Y⟩ =
E(XY )), as illustrated below.

In particular,
E((X − Z) U ) = 0, for any U ∈ G
which is nothing but a variant of condition (ii) in the definition of conditional expectation.

2 Martingales Week 10

2.1 Basic definitions

Let (Ω, F, P) be a probability space.


Definition 2.1. A filtration is a sequence (Fn , n ∈ N) of sub-σ-fields of F such that Fn ⊂ Fn+1 , ∀n ∈ N.

Example. Let Ω = [0, 1], F = B([0, 1]), Xn (ω) = nth decimal of ω, for n ≥ 1. Let also F0 = {∅, Ω},
Fn = σ(X1 , . . . , Xn ). Then Fn ⊂ Fn+1 , ∀n ∈ N.
Definitions 2.2. - A discrete-time process (Xn , n ∈ N) is said to be adapted to the filtration (Fn , n ∈ N)
if Xn is Fn -measurable ∀n ∈ N.
- The natural filtration of a process (Xn , n ∈ N) is defined as FnX = σ(X0 , . . . , Xn ), n ∈ N. It represents
the available amount of information about the process at time n.

Remark. A process is adapted to its natural filtration, by definition.


Let now (Fn , n ∈ N) be a given filtration.
Definition 2.3. A discrete-time process (Mn , n ∈ N) is a martingale with respect to (Fn , n ∈ N) if
(i) E(|Mn |) < +∞, ∀n ∈ N.
(ii) Mn is Fn -measurable, ∀n ∈ N (i.e., (Mn , n ∈ N) is adapted to (Fn , n ∈ N)).
(iii) E(Mn+1 |Fn ) = Mn a.s., ∀n ∈ N.

A martingale is therefore a fair game: the expectation of the process at time n + 1 given the information
at time n is equal to the value of the process at time n.
Remark. Conditions (ii) and (iii) are actually redundant, as (iii) implies (ii).
Properties. If (Mn , n ∈ N) is a martingale, then
- E(Mn+1 ) = E(Mn ) (= . . . = E(M0 )), ∀n ∈ N (by the first property of conditional expectation).
- E(Mn+1 − Mn |Fn ) = 0 a.s. (nearly by definition).
- E(Mn+m |Fn ) = Mn a.s., ∀n, m ∈ N.
This last property is important, as it says that the martingale property propagates over time. Here is a
short proof, which uses the tower property of conditional expectation:

E(Mn+m |Fn ) = E(E(Mn+m |Fn+m−1 )|Fn ) = E(Mn+m−1 |Fn ) = . . . = E(Mn+1 |Fn ) = Mn a.s.

Example: the simple symmetric random walk.


Let (Sn , n ∈ N) be the simple symmetric random walk : S0 = 0, Sn = X1 + . . . + Xn , where the Xn are
i.i.d. and P({X1 = +1}) = P({X1 = −1}) = 1/2.
Let us define the following filtration: F0 = {∅, Ω}, Fn = σ(X1 , . . . , Xn ), n ≥ 1. Then (Sn , n ∈ N) is a
martingale with respect to (Fn , n ∈ N). Indeed:
(i) E(|Sn |) ≤ E(|X1 |) + . . . + E(|Xn |) = 1 + . . . + 1 = n < +∞, ∀n ∈ N.
(ii) Sn = X1 + . . . + Xn is a function of (X1 , . . . , Xn ), i.e., is σ(X1 , . . . , Xn ) = Fn -measurable.
(iii) We have

E(Sn+1 |Fn ) = E(Sn + Xn+1 |Fn ) = E(Sn |Fn ) + E(Xn+1 |Fn )
             = Sn + E(Xn+1 ) = Sn + 0 = Sn a.s.

The first equality on the second line follows from the fact that Sn is Fn -measurable and that Xn+1 is
independent of Fn = σ(X1 , . . . , Xn ). 
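As a complementary empirical check of property (iii) (a simulation sketch, not part of the original notes), one can verify that averaging S_{n+1} over the simulated paths with a given value of S_n returns that value:

import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 200_000, 20

steps = rng.choice([-1, 1], size=(n_paths, n_steps))
S = steps.cumsum(axis=1)                     # S[:, k-1] holds S_k for each path

s_n, s_np1 = S[:, 9], S[:, 10]               # S_10 and S_11 on every path
for s in range(-4, 5, 2):
    mask = (s_n == s)
    print(s, round(s_np1[mask].mean(), 3))   # close to s, i.e. E(S_11 | S_10 = s) = s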
Here is an additional illustration of the martingale property of the simple symmetric random walk:

Remark. Even though one generally uses the same letter “M” for both martingales and Markov processes,
these are a priori completely different processes! A possible way to state the Markov property is to say
that
E(g(Mn+1 )|Fn ) = E(g(Mn+1 )|Mn ) a.s. for any g : R → R continuous and bounded
which is clearly different from the above stated martingale property. Beyond the use of the same letter
“M”, the confusion between the two notions comes also from the fact that the simple symmetric random
walk is usually taken as a paradigm example for both martingales and Markov processes.
Generalization. If the random variables Xn are i.i.d. and such that E(|X1 |) < +∞ and E(X1 ) = 0,
then (Sn , n ∈ N) is also a martingale (in particular, X1 ∼ N (0, 1) works).
Definition 2.4. Let (Fn , n ∈ N) be a filtration. A process (Mn , n ∈ N) is a submartingale (resp. a
supermartingale) with respect to (Fn , n ∈ N) if
(i) E(|Mn |) < +∞, ∀n ∈ N.
(ii) Mn is Fn -measurable, ∀n ∈ N.
(iii) E(Mn+1 |Fn ) ≥ Mn a.s., ∀n ∈ N (resp. E(Mn+1 |Fn ) ≤ Mn a.s., ∀n ∈ N).

Remarks. - Not every process is either a sub- or a supermartingale!


- The appellations sub- and supermartingale are counter-intuitive. They are due to historical reasons.
- Condition (ii) is now necessary in itself, as (iii) does not imply it.
- If (Mn , n ∈ N) is both a submartingale and a supermartingale, then it is a martingale.

Example: the simple asymmetric random walk.


- If P({X1 = +1}) = p = 1 − P({X1 = −1}) with p ≥ 1/2, then Sn = X1 + . . . + Xn is a submartingale.
- More generally, Sn = X1 + . . . + Xn is a submartingale if E(X1 ) ≥ 0.
Proposition 2.5. If (Mn , n ∈ N) is a martingale with respect to a filtration (Fn , n ∈ N) and ϕ : R → R
is a Borel-measurable and convex function such that E(|ϕ(Mn )|) < +∞, ∀n ∈ N, then (ϕ(Mn ), n ∈ N) is
a submartingale.

Proof. (i) E(|ϕ(Mn )|) < +∞ by assumption.


(ii) ϕ(Mn ) is Fn -measurable as Mn is (and ϕ is Borel-measurable).
(iii) E(ϕ(Mn+1 )|Fn ) ≥ ϕ(E(Mn+1 |Fn )) = ϕ(Mn ) a.s.
In (iii), the inequality follows from Jensen’s inequality (Proposition 1.5) and the equality from the fact
that M is a martingale.

Example. If (Mn , n ∈ N) is a square-integrable martingale (i.e., E(Mn2 ) < +∞, ∀n ∈ N), then the
process (Mn2 , n ∈ N) is a submartingale (as x 7→ x2 is convex).

2.2 Stopping times

Definitions 2.6. - A random time is a random variable T with values in N ∪ {+∞}. It is said to be finite
if T (ω) < +∞ for every ω ∈ Ω and bounded if there exists moreover an integer N such that T (ω) ≤ N
for every ω ∈ Ω (Notice that a finite random time is not necessarily bounded).
- Let (Xn , n ∈ N) be a stochastic process and assume T is finite. One then defines X_T(ω) = X_{T(ω)}(ω) =
Σ_{n∈N} Xn(ω) 1_{T=n}(ω).

- A stopping time with respect to a filtration (Fn , n ∈ N) is a random time T such that {T = n} ∈ Fn ,
∀n ∈ N.

Example. Let (Xn , n ∈ N) be a process adapted to (Fn , n ∈ N) and a > 0. Then Ta = inf{n ∈ N :
|Xn | ≥ a} is a stopping time with respect to (Fn , n ∈ N). Indeed:

{Ta = n} = {|Xk | < a, ∀ 0 ≤ k ≤ n − 1 and |Xn | ≥ a}
         = ( ∩_{k=0}^{n−1} {|Xk | < a} ) ∩ {|Xn | ≥ a} ∈ Fn ,   ∀n ∈ N

since each event {|Xk | < a} belongs to Fk ⊂ Fn .

Definition 2.7. Let T be a stopping time with respect to a filtration (Fn , n ∈ N). One defines the
information one possesses at time T as the following σ-field:

FT = {A ∈ F : A ∩ {T = n} ∈ Fn , ∀n ∈ N}

Facts.
- If T (ω) = N ∀ω ∈ Ω, then FT = FN . This is obvious from the definition.
- If T1 , T2 are stopping times such that T1 (ω) ≤ T2 (ω) ∀ω ∈ Ω, then FT1 ⊂ FT2 . Indeed, if T1 (ω) ≤
T2 (ω) ∀ω ∈ Ω and A ∈ FT1 , then for all n ∈ N, we have:

A ∩ {T2 = n} = ( A ∩ (∪_{k=0}^n {T1 = k}) ) ∩ {T2 = n} = ∪_{k=0}^n ( A ∩ {T1 = k} ∩ {T2 = n} ) ∈ Fn

since each A ∩ {T1 = k} belongs to Fk ⊂ Fn ,

so A ∈ FT2 . By the way, here is an example of stopping times T1 , T2 such that T1 (ω) ≤ T2 (ω) ∀ω ∈ Ω:
let 0 < a < b and consider T1 = inf{n ∈ N : |Xn | ≥ a} and T2 = inf{n ∈ N : |Xn | ≥ b}.
- A random variable Y is FT -measurable if and only if Y 1{T =n} is Fn -measurable, ∀n ∈ N. As a conse-
quence: if (Xn , n ∈ N) is adapted to (Fn , n ∈ N), then XT is FT -measurable.

2.3 Doob’s optional stopping theorem, version 1 Week 11

Let (Mn , n ∈ N) be a martingale with respect to (Fn , n ∈ N), N ∈ N be fixed and T1 , T2 be two
stopping times such that 0 ≤ T1 (ω) ≤ T2 (ω) ≤ N < +∞, ∀ω ∈ Ω. Then

E(MT2 |FT1 ) = MT1 a.s.

In particular, E(MT2 ) = E(MT1 ).


In particular, if T is a stopping time such that 0 ≤ T (ω) ≤ N < +∞, ∀ω ∈ Ω, then E(MT ) = E(M0 ).

Remarks. - The above theorem says that the martingale property holds even if one is given the option
to stop at any (bounded) stopping time.
- The theorem also holds for sub- and supermartingales (i.e., if M is a submartingale, then E(MT2 |FT1 ) ≥
MT1 a.s.).

Proof. - We first show that if T is a stopping time such that 0 ≤ T (ω) ≤ N, ∀ω ∈ Ω, then

E(MN |FT ) = MT (1)


Indeed, let Z = MT = Σ_{n=0}^N Mn 1_{T=n} . We check below that Z is the conditional expectation of MN
given FT :
(i) Z is FT -measurable: Z 1_{T=n} = Mn 1_{T=n} is Fn -measurable ∀n, so Z is FT -measurable.
(ii) E(ZU ) = E(MN U ), ∀U FT -measurable and bounded:

E(ZU ) = Σ_{n=0}^N E(Mn 1_{T=n} U ) = Σ_{n=0}^N E(E(MN |Fn ) 1_{T=n} U ) = Σ_{n=0}^N E(MN 1_{T=n} U ) = E(MN U )

where the second equality uses Mn = E(MN |Fn ) and the third the fact that 1_{T=n} U is Fn -measurable.

- Second, let us check that E(MT2 |FT1 ) = MT1 :

MT1 = E(MN |FT1 ) = E(E(MN |FT2 )|FT1 ) = E(MT2 |FT1 ) a.s.

where the first equality is (1) with T = T1 , the second uses FT1 ⊂ FT2 together with the tower property,
and the third is (1) with T = T2 .

This concludes the proof of the theorem.
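Here is a short simulation sketch (added, not in the notes) of the theorem with the simple symmetric random walk and the bounded stopping time T = min(Ta, N), where Ta is the hitting time of Section 2.2 applied to S; the parameters a and N are arbitrary:

import numpy as np

rng = np.random.default_rng(3)
n_paths, N, a = 100_000, 50, 3

steps = rng.choice([-1, 1], size=(n_paths, N))
S = np.concatenate([np.zeros((n_paths, 1), dtype=int), steps.cumsum(axis=1)], axis=1)

# bounded stopping time T = min(T_a, N) with T_a = inf{n : |S_n| >= a}
hit = np.abs(S) >= a
T = np.where(hit.any(axis=1), hit.argmax(axis=1), N)
S_T = S[np.arange(n_paths), T]

print(S_T.mean())   # close to E(S_0) = 0, as the theorem predicts for a bounded stopping time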

2.4 The reflection principle

Let (Sn , n ∈ N) be the simple symmetric random walk and

T = inf{n ≥ 1 : Sn = +1 or n = N }

As S is a martingale and T is a bounded stopping time (indeed, T (ω) ≤ N for every ω ∈ Ω), the optional
stopping theorem applies here, so it holds that E(ST ) = E(S0 ) = 0. But what is the distribution of the
random variable ST ? Intuitively, for N large, ST will be +1 with high probability, but in case it does not
reach this value, what is the average loss we should expect? More precisely, we are asking here for the value of

E(ST | max_{0≤n≤N} Sn ≤ 0) = E(SN | max_{0≤n≤N} Sn ≤ 0) = E(SN 1_{max_{0≤n≤N} Sn ≤ 0}) / P({max_{0≤n≤N} Sn ≤ 0})     (2)

Let us first compute the denominator in (2), assuming that N is even to simplify notations:

P(max_{0≤n≤N} Sn ≤ 0) = P({Sn ≤ 0, ∀ 0 ≤ n ≤ N }) = Σ_{k≥0} P({SN = −2k, Sn ≤ 0, ∀ 0 ≤ n ≤ N − 1})

noticing that SN can only take even values (because N itself is even) and that we are asking here that
SN ≤ 0. Let us now consider a fixed value of k ≥ 0. In order to compute the probability

P({SN = −2k, Sn ≤ 0, ∀0 ≤ n ≤ N − 1})

we should enumerate all paths that go from 0 to −2k while staying non-positive up to time N − 1,

but this is rather complicated combinatorics. In order to avoid such a computation, first observe that

P({SN = −2k, Sn ≤ 0, ∀ 0 ≤ n ≤ N − 1}) = P({SN = −2k}) − P({SN = −2k, ∃ 1 ≤ n ≤ N − 1 with Sn = +1})

A second important observation, which is at the heart of the reflection principle, is that to each path
going from 0 (at time 0) to −2k (at time N ) “via” +1 corresponds a mirror path that goes from 0 to
2k + 2, also “via” +1, as illustrated below:

so that in total:

P({SN = −2k, ∃ 1 ≤ n ≤ N − 1 with Sn = +1}) = P({SN = 2k + 2, ∃ 1 ≤ n ≤ N − 1 with Sn = +1})

A third observation is that for any k ≥ 0, there is no way to go from 0 to 2k + 2 without crossing the +1
line, so that
P({SN = 2k + 2, ∃ 1 ≤ n ≤ N − 1 with Sn = +1}) = P({SN = 2k + 2})
Finally, we obtain

P(max_{0≤n≤N} Sn ≤ 0) = Σ_{k≥0} (P({SN = −2k}) − P({SN = 2k + 2})) = Σ_{k≥0} (P({SN = 2k}) − P({SN = 2k + 2}))

by symmetry. But this is a telescoping sum and, for finite N , it has only finitely many non-zero terms
(since |SN | ≤ N ). At the end, we therefore obtain:

P(max_{0≤n≤N} Sn ≤ 0) = P({SN = 0})

which can be computed via simple combinatorics (writing here N = 2M ):

P({S2M = 0}) = (1/2^{2M}) (2M choose M) = (1/2^{2M}) (2M)! / (M !)²

which gives for large M , using Stirling’s formula M ! ≃ M^M e^{−M} √(2πM):

P({S2M = 0}) ≃ (1/2^{2M}) · (2M)^{2M} e^{−2M} √(4πM) / (M^M e^{−M} √(2πM))² = 1/√(πM)

This leads to the approximation for large N :

P(max_{0≤n≤N} Sn ≤ 0) ≃ √(2/(πN))

Finally, the optional stopping theorem spares us the direct computation of the numerator in (2), since

0 = E(ST ) = 1 · P(max_{0≤n≤N} Sn ≥ +1) + E(SN 1_{max_{0≤n≤N} Sn ≤ 0})

so

E(SN 1_{max_{0≤n≤N} Sn ≤ 0}) = −1 + P(max_{0≤n≤N} Sn ≤ 0) ≃ −1 + √(2/(πN))

for large N , and finally

E(ST | max_{0≤n≤N} Sn ≤ 0) ≃ (−1 + √(2/(πN))) / √(2/(πN)) = 1 − √(πN/2)

for large N . In conclusion, in case S does not reach the value +1 during the time interval {0, . . . , N }, we
should expect a loss of order −√N .
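The asymptotics above are easy to probe numerically. The sketch below (not from the notes) estimates P(max_{0≤n≤N} Sn ≤ 0) and the conditional loss by simulation and compares them with √(2/(πN)) and 1 − √(πN/2):

import numpy as np

rng = np.random.default_rng(4)
n_paths, N = 200_000, 400

steps = rng.choice([-1, 1], size=(n_paths, N))
S = steps.cumsum(axis=1)
never_reach_1 = S.max(axis=1) <= 0                # the walk never reaches +1 up to time N

p_hat = never_reach_1.mean()
print(p_hat, np.sqrt(2 / (np.pi * N)))            # P(max S_n <= 0) vs sqrt(2/(pi N)) ≈ 0.04

loss_hat = S[never_reach_1, -1].mean()            # on {max <= 0}, T = N and S_T = S_N
print(loss_hat, 1 - np.sqrt(np.pi * N / 2))       # E(S_T | max <= 0) vs 1 - sqrt(pi N / 2) ≈ -24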

2.5 Martingale transforms

Definition 2.8. A process (Hn , n ∈ N) is said to be predictable with respect to a filtration (Fn , n ∈ N)
if H0 = 0 and Hn is Fn−1 -measurable ∀n ≥ 1.

Remark. If a process is predictable, then it is adapted.


Let now (Fn , n ∈ N) be a filtration, (Hn , n ∈ N) be a predictable process with respect to (Fn , n ∈ N)
and (Mn , n ∈ N) be a martingale with respect to (Fn , n ∈ N).
Definition 2.9. The process G defined as

G0 = 0,   Gn = (H · M )n = Σ_{i=1}^n Hi (Mi − Mi−1 ),   n ≥ 1

is called the martingale transform of M through H.

Remark. This process is the discrete version of the stochastic integral. It represents the gain obtained
by applying the strategy H to the game M :
- Hi = amount bet on day i (Fi−1 -measurable).
- Mi − Mi−1 = increment of the process M on day i.
- Gn = cumulated gain after n days.
Proposition 2.10. If Hn is a bounded random variable for each n (i.e., |Hn (ω)| ≤ Kn ∀ω ∈ Ω), then
the process G is a martingale with respect to (Fn , n ∈ N).

In other words, one cannot win on a martingale!


Proof. (i) E(|Gn |) ≤ Σ_{i=1}^n E(|Hi | |Mi − Mi−1 |) ≤ Σ_{i=1}^n Ki (E(|Mi |) + E(|Mi−1 |)) < +∞.
(ii) Gn is Fn -measurable by construction.
(iii) E(Gn+1 |Fn ) = E(Gn + Hn+1 (Mn+1 − Mn )|Fn ) = Gn + Hn+1 E(Mn+1 − Mn |Fn ) = Gn + 0 = Gn .

Example: “the” martingale.
Let (Mn , n ∈ N) be the simple symmetric random walk (Mn = X1 + . . . + Xn ) and consider the following
strategy:

H0 = 0,   H1 = 1,   Hn+1 = 2Hn if X1 = . . . = Xn = −1,   and Hn+1 = 0 otherwise
Notice that all the Hn are bounded random variables. Then by the above proposition, the process G
defined as
G0 = 0,   Gn = Σ_{i=1}^n Hi (Mi − Mi−1 ) = Σ_{i=1}^n Hi Xi ,   n ≥ 1

is a martingale. So E(Gn ) = E(G0 ) = 0, ∀n ∈ N. Let now

T = inf{n ≥ 1 : Xn = +1}

T is a stopping time and it is easily seen that GT = +1. But then E(GT ) = 1 ≠ 0 = E(G0 )? Is there a
contradiction? Actually no. The optional stopping theorem does not apply here, because the time T is
unbounded: P(T = n) = 2^{−n} , ∀n ≥ 1, i.e., there does not exist N fixed such that T (ω) ≤ N , ∀ω ∈ Ω.
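A simulation sketch (added here, not in the notes) makes the point concrete: for a fixed horizon n, the empirical mean of Gn is 0, even though GT = +1 on every path on which T has already occurred; the rare all-minus paths carry the compensating loss 2^n − 1.

import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps = 1_000_000, 10

X = rng.choice([-1, 1], size=(n_paths, n_steps))

G = np.zeros(n_paths)                         # cumulated gain G_n
H = np.ones(n_paths)                          # current stake (H_1 = 1)
stopped = np.zeros(n_paths, dtype=bool)       # whether a +1 has already occurred
for i in range(n_steps):
    bet = np.where(stopped, 0.0, H)           # after the first +1 the strategy bets 0
    G += bet * X[:, i]
    H = np.where(X[:, i] == -1, 2 * H, H)     # double the stake after every loss
    stopped |= (X[:, i] == 1)

print(G.mean())            # ≈ 0: E(G_10) = 0 (the rare all-minus paths lose 2^10 - 1 = 1023)
print(G[stopped].mean())   # exactly 1: G_T = +1 on every path where T <= 10
print(stopped.mean())      # ≈ 1 - 2^{-10}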

2.6 Doob’s decomposition theorem Week 12

Theorem 2.11. Let (Xn , n ∈ N) be a submartingale with respect to a filtration (Fn , n ∈ N). Then there
exists a martingale (Mn , n ∈ N) with respect to (Fn , n ∈ N) and a process (An , n ∈ N) predictable with
respect to (Fn , n ∈ N) and increasing (i.e., An ≤ An+1 ∀n ∈ N) such that A0 = 0 and Xn = Mn + An ,
∀n ∈ N. Moreover, this decomposition of the process X is unique.

Proof. (main idea)


E(Xn+1 |Fn ) ≥ Xn , so a natural candidate for the process A is to set A0 = 0 and An+1 = An +
E(Xn+1 |Fn ) − Xn (≥ An ), which defines a predictable and increasing process. Then, setting M0 = X0 and
Mn+1 − Mn = Xn+1 − Xn − (An+1 − An ) = Xn+1 − E(Xn+1 |Fn ), the process M is indeed a martingale,
as E(Mn+1 − Mn |Fn ) = 0.
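As a concrete instance (a sketch under the construction just described, not in the original notes), take the submartingale Xn = Sn² for the simple symmetric random walk: the recursion gives An+1 − An = E(S²_{n+1}|Fn) − Sn² = 1, so An = n and Mn = Sn² − n is the martingale part.

import numpy as np

rng = np.random.default_rng(6)
n_paths, n_steps = 200_000, 30

S = rng.choice([-1, 1], size=(n_paths, n_steps)).cumsum(axis=1)

X = S**2                             # the submartingale X_n = S_n^2
A = np.arange(1, n_steps + 1)        # predictable increasing part: A_n = n (deterministic here)
M = X - A                            # martingale part M_n = S_n^2 - n

print(M.mean(axis=0)[:10].round(3))  # ≈ 0 for every n, consistent with E(M_n) = E(M_0) = 0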

3 Martingale convergence theorems

3.1 Preliminary: Doob’s martingale

Proposition 3.1. Let (Ω, F, P) be a probability space, (Fn , n ∈ N) be a filtration and X : Ω → R be


an F-measurable and integrable random variable. Then the process (Mn , n ∈ N) defined as

Mn = E(X|Fn ), n∈N

is a martingale with respect to (Fn , n ∈ N).

Proof. (i) E(|Mn |) = E(|E(X|Fn )|) ≤ E(E(|X| |Fn )) = E(|X|) < +∞, for all n ∈ N.
(ii) By the definition of conditional expectation, Mn = E(X|Fn ) is Fn -measurable, for all n ∈ N.
(iii) E(Mn+1 |Fn ) = E(E(X|Fn+1 )|Fn ) = E(X|Fn ) = Mn , for all n ∈ N.

Remarks. - This process describes the situation where one acquires more and more information about a
random variable. Think e.g. of the case where X is a number drawn uniformly at random between 0 and
1, and one reads this number from left to right: while reading, one obtains more and more information
about the number, as illustrated on the left-hand side of the figure below.

- On the right-hand side of the figure is another illustration of a Doob martingale: as time goes by, one
gets more and more information about where to locate oneself in the space Ω.
- Are Doob’s martingales a very particular type of martingales? No! As the following paragraph shows,
there are quite many such martingales!
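The digit-reading picture can be emulated numerically (a sketch, not from the notes, assuming the uniform measure of the remark above): with X uniform on [0, 1) and Fn generated by the first n decimal digits, Mn = E(X|Fn) is the n-digit truncation of X plus the conditional mean 0.5 · 10^{−n} of the unread tail, and it converges to X.

import numpy as np

rng = np.random.default_rng(7)
X = rng.random(5)                          # a few numbers drawn uniformly in [0, 1)

for n in range(1, 8):
    trunc = np.floor(X * 10**n) / 10**n    # the first n decimal digits of X
    M_n = trunc + 0.5 * 10.0**(-n)         # M_n = E(X | first n digits): truncation + mean of the tail
    print(n, np.abs(M_n - X).max())        # sup |M_n - X| <= 0.5 * 10^{-n} -> 0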

3.2 The martingale convergence theorem: first version

Theorem 3.2. Let (Mn , n ∈ N) be a square-integrable martingale (i.e., a martingale such that E(Mn²) <
+∞ for all n ∈ N) with respect to a filtration (Fn , n ∈ N). Under the additional assumption that

sup_{n∈N} E(Mn²) < +∞     (3)

there exists a limiting random variable M∞ such that

(i) Mn → M∞ almost surely as n → ∞.
(ii) lim_{n→∞} E((Mn − M∞ )²) = 0 (quadratic convergence).
(iii) Mn = E(M∞ |Fn ), for all n ∈ N (this last property is referred to as the martingale M being “closed
at infinity”).

Remarks. - Condition (3) is of course much stronger than just asking that E(Mn²) < +∞ for every n.
Think for example of the simple symmetric random walk Sn : E(Sn²) = n < +∞ for every n, but the
supremum is infinite.
- By conclusion (iii) in the theorem, any square-integrable martingale satisfying condition (3) is actually
a Doob martingale (take X = M∞ )!
- A priori, one could think that all the conclusions of the theorem hold true if one replaces all the squares
by absolute values in the above statement (such as e.g. replacing condition (3) by sup_{n∈N} E(|Mn |) < +∞,
etc.). This is wrong, and we will see interesting counter-examples later.
- A stronger condition than (3) (leading therefore to the same conclusion) is the following:

sup_{n∈N} sup_{ω∈Ω} |Mn (ω)| < +∞.     (4)

Martingales satisfying this stronger condition are called bounded martingales.


Example 3.3. Let M0 = x, where x ∈ [0, 1] is a fixed number, and let us define recursively:

Mn+1 = Mn²  with probability 1/2,   and   Mn+1 = 2Mn − Mn²  with probability 1/2

The process M is a bounded martingale. Indeed:

(i) By induction, if Mn ∈ [0, 1], then Mn+1 ∈ [0, 1], for every n ∈ N, so as M0 = x ∈ [0, 1], we obtain

sup_{n∈N} sup_{ω∈Ω} |Mn (ω)| ≤ 1 < +∞

(ii) E(Mn+1 |Fn ) = (1/2) Mn² + (1/2)(2Mn − Mn²) = Mn , for every n ∈ N.
By the theorem, there exists therefore a random variable M∞ such that the three conclusions of the
theorem hold. In addition, it can be shown by contradiction that M∞ takes values in the binary set
{0, 1} only, so that
x = E(M0 ) = E(M∞ ) = P(M∞ = 1)
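A simulation sketch of this example (not part of the notes; x, the horizon and the number of paths are arbitrary): the paths settle at 0 or 1, the empirical mean stays at x, and the fraction of paths ending near 1 is close to x as well.

import numpy as np

rng = np.random.default_rng(8)
x, n_paths, n_steps = 0.3, 100_000, 500

M = np.full(n_paths, x)
for _ in range(n_steps):
    coin = rng.random(n_paths) < 0.5
    M = np.where(coin, M**2, 2 * M - M**2)   # M_{n+1} = M_n^2 or 2 M_n - M_n^2, each w.p. 1/2

print(M.mean())           # ≈ 0.3 = x: E(M_n) = E(M_0) for every n
print((M > 0.5).mean())   # ≈ 0.3 as well: M_inf is {0,1}-valued with P(M_inf = 1) = x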

3.3 Consequences of the theorem

Before diving into the proof of the above important theorem, let us first explore a few of its interesting
consequences.
Optional stopping theorem, version 2. Let (Fn , n ∈ N) be a filtration, let (Mn , n ∈ N) be a square-
integrable martingale with respect to (Fn , n ∈ N) which satisfies condition (3) and let 0 ≤ T1 ≤ T2 ≤ +∞
be two stopping times with respect to (Fn , n ∈ N). Then

E(MT2 |FT1 ) = MT1 a.s. and E(MT2 ) = E(MT1 )

Proof. Simply replace N by ∞ in the proof of the first version and use the fact that M is a closed
martingale by the convergence theorem.

Stopped martingale. Let (Mn , n ∈ N) be a martingale and T be a stopping time with respect to a
filtration (Fn , n ∈ N), without any further assumption. Let us also define the stopped process

(MT ∧n , n ∈ N)

where T ∧ n = min{T, n} by definition. Then this stopped process is also a martingale with respect to
(Fn , n ∈ N) (we skip the proof here, which uses the first version of the optional stopping theorem).

Optional stopping theorem, version 3. Let (Mn , n ∈ N) be a martingale with respect to (Fn , n ∈ N)
such that there exists c > 0 with |Mn+1 (ω) − Mn (ω)| ≤ c for all ω ∈ Ω and n ∈ N (this assumption
ensures that the martingale does not make jumps of uncontrolled size: the simple symmetric random
walk Sn satisfies in particular this assumption). Let also a, b > 0 and

T = inf{n ∈ N : Mn ≤ −a or Mn ≥ b}

Observe that T is a stopping time with respect to (Fn , n ∈ N) and that −a − c ≤ MT ∧n (ω) ≤ b + c for
all ω ∈ Ω and n ∈ N. In particular,

sup_{n∈N} E(M²_{T∧n}) < +∞

so the stopped process (MT ∧n , n ∈ N) satisfies the assumptions of the first version of the martingale
convergence theorem. By the conclusion of this theorem, the stopped martingale (MT ∧n , n ∈ N) is
closed, i.e. it admits a limit MT ∧∞ = MT and

E(MT ) = E(MT ∧∞ ) = E(MT ∧0 ) = E(M0 )

Application. Let (Sn , n ∈ N) be the simple symmetric random walk (which satisfies the above assump-
tions with c = 1) and T be the above stopping time (with a, b positive integers). Then E(ST ) = E(S0 ) = 0.
Given that ST ∈ {−a, +b}, we obtain

0 = E(ST ) = (+b) P({ST = +b}) + (−a) P({ST = −a}) = bp − a(1 − p), where p = P({ST = +b})

From this, we deduce that P({ST = +b}) = p = a/(a + b).
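A quick simulation sketch of this classical “gambler's ruin” identity (added here, not in the notes), with the arbitrary choice a = 3, b = 7, so that a/(a + b) = 0.3:

import numpy as np

rng = np.random.default_rng(9)
a, b, n_paths = 3, 7, 100_000

s = np.zeros(n_paths, dtype=int)
active = np.ones(n_paths, dtype=bool)
while active.any():                           # T is a.s. finite, so this loop terminates
    s[active] += rng.choice([-1, 1], size=active.sum())
    active &= (-a < s) & (s < b)

print((s == b).mean(), a / (a + b))           # empirical P(S_T = +b) vs a/(a+b) = 0.3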

Remark. Note that the same reasoning does not hold if we replace the stopping time T by a stopping
time of the form
T 0 = inf{n ∈ N : Mn ≥ b}
There is indeed no guarantee in this case that the stopped martingale (MT 0 ∧n , n ∈ N) is bounded (from
below).

3.4 Proof of the theorem Week 13

A key ingredient for the proof: the maximal inequality. The following inequality, apart from
being useful for the proof of the martingale convergence theorem, is interesting in itself. Let (Mn , n ∈ N)
be a square-integrable martingale. Then for every N ∈ N and x > 0,
P(max_{0≤n≤N} |Mn | ≥ x) ≤ E(MN²) / x²
Remark. This inequality resembles Chebyshev’s inequality, but it is actually much stronger. In particu-
lar, note the remarkable fact that the deviation probability of the maximum of the martingale over the
whole time interval {0, . . . , N } is controlled by the second moment of the martingale at the final instant
N alone.

Proof. - First, let x > 0 and let Tx = inf{n ∈ N : |Mn | ≥ x}: Tx is a stopping time and note that

{Tx ≤ N } = {max_{0≤n≤N} |Mn | ≥ x}

So what we actually need to prove is that P({Tx ≤ N }) ≤ E(MN²)/x².

- Second, observe that as M is a martingale, M² is a submartingale. So by the optional stopping theorem,
we obtain

E(MN²) = E(E(MN² |F_{Tx∧N})) ≥ E(M²_{Tx∧N}) ≥ E(M²_{Tx∧N} 1_{Tx≤N})
       = E(M²_{Tx} 1_{Tx≤N}) ≥ E(x² 1_{Tx≤N}) = x² P({Tx ≤ N })

where the last inequality comes from the fact that |M_{Tx}| ≥ x on {Tx ≤ N }, by definition of Tx . This
proves the claim.
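A numerical sanity check of the maximal inequality for the simple symmetric random walk (a sketch, not in the notes): here E(S_N²) = N, so the bound reads P(max_{0≤n≤N} |Sn| ≥ x) ≤ N/x², which is valid although rather loose:

import numpy as np

rng = np.random.default_rng(10)
n_paths, N, x = 200_000, 100, 25

S = rng.choice([-1, 1], size=(n_paths, N)).cumsum(axis=1)
p_hat = (np.abs(S).max(axis=1) >= x).mean()

print(p_hat, N / x**2)    # empirical P(max |S_n| >= 25) vs the bound E(S_N^2)/x^2 = 100/625 = 0.16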

Proof of Theorem 3.2. - We first prove conclusion (i), namely that the sequence (Mn , n ∈ N) converges
almost surely to some limit. This proof is divided in two parts.
Part 1. We first show that for every ε > 0,

P(sup_{n∈N} |Mn+m − Mm | ≥ ε) → 0 as m → ∞     (5)

This is saying that for every ε > 0, the probability that the martingale M deviates by more than ε after
a given time m can be made arbitrarily small by taking m large enough. This essentially says that the
fluctuations of the martingale decay with time, i.e. that the martingale ultimately converges! Of course,
this is just an intuition and needs a formal proof, which will be done in the second part of the proof. For
now, let us focus on proving (5).
a) Let m ∈ N be fixed and define the process (Yn , n ∈ N) by Yn = Mn+m − Mm , for n ∈ N. Y is a
square-integrable martingale, so by the maximal inequality, we have for every N ∈ N and every ε > 0:

P(max_{0≤n≤N} |Yn | ≥ ε) ≤ E(YN²) / ε²

b) Let us now prove that

E(YN²) = E(M²_{m+N}) − E(Mm²).

This equality follows from the orthogonality of the increments of M . Here is a detailed proof:

E(YN²) = E((Mm+N − Mm )²) = E(M²_{m+N}) − 2 E(Mm+N Mm ) + E(Mm²)
       = E(M²_{m+N}) − 2 E(E(Mm+N Mm |Fm )) + E(Mm²)
       = E(M²_{m+N}) − 2 E(E(Mm+N |Fm ) Mm ) + E(Mm²)
       = E(M²_{m+N}) − 2 E(Mm²) + E(Mm²) = E(M²_{m+N}) − E(Mm²)

Gathering a) and b) together, we obtain for every m, N ∈ N and every ε > 0:

P(max_{0≤n≤N} |Mm+n − Mm | ≥ ε) ≤ (E(M²_{m+N}) − E(Mm²)) / ε².

c) Assumption (3) states that sup_{n∈N} E(Mn²) < +∞. As the sequence (E(Mn²), n ∈ N) is increasing
(since M² is a submartingale), this also says that the sequence has a limit: lim_{n→∞} E(Mn²) = K < +∞.
Therefore, for every m ∈ N and ε > 0, we obtain

P(sup_{n∈N} |Mm+n − Mm | ≥ ε) = lim_{N→∞} P(max_{0≤n≤N} |Mm+n − Mm | ≥ ε)
                              ≤ lim_{N→∞} (E(M²_{m+N}) − E(Mm²)) / ε² = (K − E(Mm²)) / ε²

Taking now m to infinity, we further obtain

P(sup_{n∈N} |Mm+n − Mm | ≥ ε) ≤ (K − E(Mm²)) / ε² → (K − K)/ε² = 0 as m → ∞

for every ε > 0. This proves (5) and concludes therefore the first part of the proof.
Part 2. Let C = {ω ∈ Ω : limn→∞ Mn (ω) exists}. In this second part, we prove that P(C) = 1, which
is conclusion (i).
Here is what we have proven so far. For m ∈ N and ε > 0, define Am (ε) = {supn∈N |Mm+n − Mm | ≥ ε}.
Then (5) says that for every fixed ε > 0, limm→∞ P(Am (ε)) = 0. We then have the following (long!)
series of equivalent statements:

∀ε > 0, lim_{m→∞} P(Am (ε)) = 0
⇐⇒ ∀ε > 0, P(∩_{m∈N} Am (ε)) = 0 ⇐⇒ ∀M ≥ 1, P(∩_{m∈N} Am (1/M )) = 0
⇐⇒ P(∪_{M≥1} ∩_{m∈N} Am (1/M )) = 0 ⇐⇒ P(∪_{ε>0} ∩_{m∈N} Am (ε)) = 0
⇐⇒ P({∃ε > 0 s.t. ∀m ∈ N, sup_{n∈N} |Mm+n − Mm | ≥ ε}) = 0
⇐⇒ P({∀ε > 0, ∃m ∈ N s.t. sup_{n∈N} |Mm+n − Mm | < ε}) = 1
⇐⇒ P({∀ε > 0, ∃m ∈ N s.t. |Mm+n − Mm | < ε, ∀n ∈ N}) = 1
⇐⇒ P({∀ε > 0, ∃m ∈ N s.t. |Mm+n − Mm+p | < ε, ∀n, p ∈ N}) = 1
⇐⇒ P({the sequence (Mn , n ∈ N) is a Cauchy sequence}) = 1 ⇐⇒ P(C) = 1

as every Cauchy sequence in R converges. This completes the proof of conclusion (i) in the theorem.
- In order to prove conclusion (ii) (quadratic convergence), let us recall that from what was shown above

E((Mn − Mm )²) = E(Mn²) − E(Mm²),   ∀n ≥ m ≥ 0

This, together with the fact that lim_{n→∞} E(Mn²) = K, implies that Mn is a Cauchy sequence in L²: it
therefore converges to some limit, as the space of square-integrable random variables is complete. Let us
call this limit M̃∞ . But does it hold that M̃∞ = M∞ , the a.s. limit of part (i)? Yes, as both quadratic
convergence and a.s. convergence imply convergence in probability, and we have seen in part I (Theorem
5.3) that if a sequence of random variables converges in probability to two possible limits, then these two
limits are equal almost surely.
- Conclusion (iii) then follows from the following reasoning. We need to prove that Mn = E(M∞ |Fn ) for
every (fixed) n ∈ N (where M∞ is the limit found in parts (i) and (ii)). To this end, let us go back to
the very definition of conditional expectation and simply check that
(i) Mn is Fn -measurable: this is by definition.
(ii) E(M∞ U ) = E(Mn U ) for every random variable U Fn -measurable and bounded. This follows from
the following observation (which uses the martingale property and the fact that U is Fn -measurable):

E(Mn U ) = E(MN U ),   ∀N ≥ n

This equality together with the Cauchy-Schwarz inequality imply that for every N ≥ n:

|E(M∞ U ) − E(Mn U )| = |E(M∞ U ) − E(MN U )| = |E((M∞ − MN ) U )| ≤ √(E((M∞ − MN )²)) √(E(U²)) → 0 as N → ∞

by quadratic convergence (conclusion (ii)). So we obtain that necessarily, E(M∞ U ) = E(Mn U ) (remember
that n is fixed here). This completes the proof of Theorem 3.2. □

3.5 The martingale convergence theorem: second version

Theorem 3.4. Let (Mn , n ∈ N) be a martingale such that

sup_{n∈N} E(|Mn |) < +∞     (6)

Then there exists a limiting random variable M∞ such that Mn → M∞ almost surely as n → ∞.

We shall not go through the proof of this second version of the martingale convergence theorem (which is
by the way sometimes called the first version in the literature!), whose order of difficulty resembles that of
the first one. Let us just make a few remarks and also exhibit an interesting example below.

Remarks. - Contrary to what one could perhaps expect, it does not necessarily hold in this case that
limn→∞ E(|Mn − M∞ |) = 0, nor that E(M∞ |Fn ) = Mn for every n ∈ N.
- By the Cauchy-Schwarz inequality, we see that condition (6) is weaker than condition (3).
- On the other hand, condition (6) is of course stronger than just asking E(|Mn |) < +∞ for all n ∈ N
(this last condition is by the way satisfied by every martingale, by definition). It is also stronger than
asking supn∈N E(Mn ) < +∞. Why? Simply because for every martingale, E(Mn ) = E(M0 ) for every
n ∈ N, so the supremum is always finite! The same does not hold when one adds absolute values: the
process (|Mn |, n ∈ N) is a submartingale, so the sequence (E(|Mn |), n ∈ N) is non-decreasing, possibly
growing to infinity.
- If M is a non-negative martingale, then |Mn | = Mn for every n ∈ N and by what was just said above,
condition (6) is satisfied! So non-negative martingales always converge to a limit almost surely! But they
might not be closed at infinity.
A puzzling example. Let (Sn , n ∈ N) be the simple symmetric random walk and (Mn , n ∈ N) be the
process defined as

Mn = exp(Sn − cn),   n ∈ N

where c = log((e + e^{−1})/2) > 0 is such that M is a martingale, with E(Mn ) = E(M0 ) = 1 for every n ∈ N.
On top of that, M is a positive martingale, so by the previous remark, there exists a random variable
M∞ such that Mn → M∞ almost surely as n → ∞. So far so good. Let us now consider some more puzzling
facts:
- A simple computation shows that sup_{n∈N} E(Mn²) = sup_{n∈N} E(exp(2Sn − 2cn)) = +∞, so we cannot
conclude that (ii) and (iii) in Theorem 3.2 hold. Actually, these conclusions do not hold, as we will see
below.
- What can the random variable M∞ be? It can be shown that Sn − cn → −∞ almost surely as n → ∞, from
which we deduce that Mn = exp(Sn − cn) → 0 almost surely, i.e. M∞ = 0!

- It is therefore impossible that E(M∞ |Fn ) = Mn , as the left-hand side is 0, while the right-hand side
is not. Likewise, quadratic convergence to 0 does not hold (this would mean that limn→∞ E(Mn2 ) = 0,
which does not hold).
- On the contrary, we just said above that Var(Mn ) = E(Mn2 ) − (E(Mn ))2 = E(Mn2 ) − 1 grows to infinity
as n goes to infinity. Still, Mn converges to 0 almost surely. If this sounds puzzling to you, be reassured
that you are not alone!
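A simulation sketch of this example (not from the notes) shows both facts at once: typical paths of Mn = exp(Sn − cn) collapse to 0, while E(Mn) = 1 for every n; note that for large n the empirical mean itself becomes meaningless, as the unit expectation is carried by paths far too rare to be sampled.

import numpy as np

rng = np.random.default_rng(11)
n_paths, n_steps = 50_000, 200
c = np.log((np.e + np.exp(-1)) / 2)              # c = log((e + e^{-1})/2), so that M is a martingale

S = rng.choice([-1, 1], size=(n_paths, n_steps)).cumsum(axis=1)
M = np.exp(S - c * np.arange(1, n_steps + 1))    # M_n = exp(S_n - c n)

for k in (5, 10, 50, 200):
    print(k, M[:, k - 1].mean(), np.median(M[:, k - 1]))
# E(M_k) = 1 for every k, and the empirical mean reflects this for small k; for large k even the
# empirical mean drops, because the expectation is then carried by paths far too rare to show up
# in the sample, while the median (a typical path) goes to 0: M_n -> 0 a.s. although E(M_n) = 1.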

3.6 Generalization to sub- and supermartingales


We state below the generalization of the two convergence theorems to sub- and supermartingales.
Theorem 3.5. (Generalization of Theorem 3.2)
Let (Mn , n ∈ N) be a square-integrable submartingale (resp., supermartingale) with respect to a filtration
(Fn , n ∈ N). Under the additional assumption that

sup_{n∈N} E(Mn²) < +∞     (7)

there exists a limiting random variable M∞ such that

(i) Mn → M∞ almost surely as n → ∞.
(ii) lim_{n→∞} E((Mn − M∞ )²) = 0 (quadratic convergence).
(iii) Mn ≤ E(M∞ |Fn ) (resp., Mn ≥ E(M∞ |Fn )), for all n ∈ N.
Theorem 3.6. (Generalization of Theorem 3.4)
Let (Mn , n ∈ N) be a submartingale (resp., supermartingale) such that

sup_{n∈N} E(Mn⁺) < +∞   (resp., sup_{n∈N} E(Mn⁻) < +∞)     (8)

where recall here that Mn⁺ = max(Mn , 0) and Mn⁻ = max(−Mn , 0). Then there exists a limiting random
variable M∞ such that Mn → M∞ almost surely as n → ∞.

As one can see, not much changes in the assumptions and conclusions of both theorems! Let us mention
some interesting consequences.
- From Theorem 3.5, it holds that if M is a sub- or a supermartingale satisfying condition (7), then
Mn converges both almost surely and quadratically to some limit M∞ . In the case where M is a (non-
trivial) martingale, we saw previously that the limit M∞ cannot be equal to 0, as this would lead to
a contradiction, because of the third part of the conclusion stating that Mn = E(M∞ |Fn ) = 0 for all
n. In the case of a sub- or supermartingale, this third part only says that Mn ≤ E(M∞ |Fn ) = 0 or
Mn ≥ E(M∞ |Fn ) = 0, which is not necessarily a contradiction.
- From Theorem 3.6, one deduces that any positive supermartingale admits an almost sure limit at in-
finity. But the same conclusion cannot be drawn for a positive submartingale (think simply of Mn = n:
this very particular positive submartingale does not converge). From the same theorem, one deduces also
that any negative submartingale admits an almost sure limit at infinity.

Finally, for illustration purposes, here are again below 3 well-known martingales: (Sn , n ∈ N), (Sn² −
n, n ∈ N) and (Mn = exp(Sn − cn), n ∈ N), just seen above:

We see again here that even though these 3 processes are all constant-mean processes, they do exhibit
very different behaviours!

3.7 Azuma’s and McDiarmid’s inequalities

Theorem 3.7. (Azuma’s inequality)
Let (Mn , n ∈ N) be a martingale such that |Mn (ω) − Mn−1 (ω)| ≤ 1 for every n ≥ 1 and ω ∈ Ω. Such a
martingale is said to have bounded differences. Assume also that M0 is constant. Then for every n ≥ 1
and t > 0, we have

P({|Mn − M0 | ≥ nt}) ≤ 2 exp(−nt²/2)

Remark. This statement resembles that of Hoeffding’s inequality! The difference here is that a martin-
gale is not necessarily a sum of i.i.d. random variables.

Proof. Let Xn = Mn − Mn−1 for n ≥ 1. Then, by the assumptions made, Mn − M0 = Σ_{j=1}^n Xj , with
|Xj (ω)| ≤ 1 for every j ≥ 1 and ω ∈ Ω, but as mentioned above, the Xj ’s are not necessarily i.i.d.: we
only know that E(Xj |Fj−1 ) = 0 for every j ≥ 1. We need to bound

P({|Σ_{j=1}^n Xj | ≥ nt}) = P({Σ_{j=1}^n Xj ≥ nt}) + P({Σ_{j=1}^n Xj ≤ −nt})

By Chebyshev’s inequality with ϕ(x) = e^{sx} and s > 0, we obtain

P({Σ_{j=1}^n Xj ≥ nt}) ≤ E(exp(s Σ_{j=1}^n Xj)) / exp(snt) = e^{−snt} E(E(exp(s Σ_{j=1}^n Xj) | Fn−1 ))
                       = e^{−snt} E(exp(s Σ_{j=1}^{n−1} Xj) E(exp(sXn ) | Fn−1 ))

As E(Xn |Fn−1 ) = 0 and |Xn (ω)| ≤ 1 for every ω ∈ Ω, we can apply the same lemma as in the proof of
Hoeffding’s inequality to conclude that

E(exp(sXn )|Fn−1 ) ≤ e^{s²/2}

So

P({Σ_{j=1}^n Xj ≥ nt}) ≤ e^{−snt} E(exp(s Σ_{j=1}^{n−1} Xj)) e^{s²/2}

and working backwards (repeating the same conditioning argument for j = n − 1, . . . , 1), we finally obtain
the upper bound

P({Σ_{j=1}^n Xj ≥ nt}) ≤ e^{−snt + ns²/2}

which is again minimum for s* = t and equal then to exp(−nt²/2). By symmetry, the same bound is
obtained for the other term:

P({Σ_{j=1}^n Xj ≤ −nt}) ≤ exp(−nt²/2)

which completes the proof.

Generalization. Exactly like Hoeffding’s inequality, Azuma’s inequality can be generalized as follows.
Let M be a martingale such that Mn (ω) − Mn−1 (ω) ∈ [an , bn ] for every n ≥ 1 and every ω ∈ Ω. Then

P({|Mn − M0 | ≥ nt}) ≤ 2 exp(−2n²t² / Σ_{j=1}^n (bj − aj )²)

Application 1. Consider the martingale transform of Section 2.5 defined as follows. Let (Xn , n ≥ 1)
be a sequence of i.i.d. random variables such that P({X1 = +1}) = P({X1 = −1}) = 1/2. Let F0 = {∅, Ω}
and Fn = σ(X1 , . . . , Xn ) for n ≥ 1. Let (Hn , n ∈ N) be a predictable process with respect to (Fn , n ∈ N)
such that |Hn (ω)| ≤ Kn for every n ∈ N and ω ∈ Ω. Let finally G0 = 0, Gn = Σ_{j=1}^n Hj Xj , n ≥ 1. Then

P({|Gn − G0 | ≥ nt}) ≤ 2 exp(−n²t² / (2 Σ_{j=1}^n Kj²))

In the case where Kn = K for every n ∈ N, this says that

P({|Gn − G0 | ≥ nt}) ≤ 2 exp(−nt² / (2K²))

We had obtained the same conclusion earlier for the random walk, but here, the increments of G are in
general far from being independent.
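A simulation sketch (not in the notes) comparing the empirical tail of such a martingale transform with the Azuma bound; the “momentum” strategy Hj = Xj−1 used below is an arbitrary choice of predictable process with K = 1:

import numpy as np

rng = np.random.default_rng(12)
n_paths, n, t = 100_000, 100, 0.2

X = rng.choice([-1, 1], size=(n_paths, n))
H = np.ones((n_paths, n))
H[:, 1:] = X[:, :-1]                # momentum strategy: H_j = X_{j-1}, predictable, |H_j| <= K = 1
G = (H * X).sum(axis=1)             # G_n = sum_{j=1}^n H_j X_j

p_hat = (np.abs(G) >= n * t).mean()
print(p_hat, 2 * np.exp(-n * t**2 / 2))   # empirical tail vs the Azuma bound 2 exp(-n t^2 / (2 K^2))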

Application 2: McDiarmid’s inequality. Let n ≥ 1 be fixed, let X1 , . . . , Xn be i.i.d. random variables
and let F : Rⁿ → R be a Borel-measurable function such that

|F (x1 , . . . , xj , . . . , xn ) − F (x1 , . . . , x′j , . . . , xn )| ≤ Kj ,   ∀x1 , . . . , xj , x′j , . . . , xn ∈ R, 1 ≤ j ≤ n

Then

P({|F (X1 , . . . , Xn ) − E(F (X1 , . . . , Xn ))| ≥ nt}) ≤ 2 exp(−n²t² / (2 Σ_{j=1}^n Kj²))

Proof. Define F0 = {∅, Ω}, Fj = σ(X1 , . . . , Xj ) and Mj = E(F (X1 , . . . , Xn )|Fj ) for j ∈ {0, . . . , n}. By
definition, M is a martingale and observe that

Mn = F (X1 , . . . , Xn )   and   M0 = E(F (X1 , . . . , Xn ))

Moreover,

|Mj − Mj−1 | = |E(F (X1 , . . . , Xn )|Fj ) − E(F (X1 , . . . , Xn )|Fj−1 )| = |g(X1 , . . . , Xj ) − h(X1 , . . . , Xj−1 )|

where g(x1 , . . . , xj ) = E(F (x1 , . . . , xj , Xj+1 , . . . , Xn )) and h(x1 , . . . , xj−1 ) = E(F (x1 , . . . , xj−1 , Xj , . . . , Xn )).
By the assumption made, we find that for every x1 , . . . , xj ∈ R,

|g(x1 , . . . , xj ) − h(x1 , . . . , xj−1 )| ≤ E(|F (x1 , . . . , xj , Xj+1 , . . . , Xn ) − F (x1 , . . . , xj−1 , Xj , . . . , Xn )|) ≤ Kj

so Azuma’s inequality applies. This completes the proof.
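As an illustration of how the inequality is used (an example of my own choosing, not in the notes), take F(x1, . . . , xn) = number of distinct values among n draws, uniform on a finite set: changing one coordinate changes F by at most 1, so Kj = 1 for every j, and the bound applies with Σ Kj² = n.

import numpy as np

rng = np.random.default_rng(13)
n_samples, n, m = 100_000, 200, 200

# F(x_1,...,x_n) = number of distinct values among n draws, uniform on {1,...,m}:
# changing one coordinate changes F by at most 1, so K_j = 1 for every j
X = rng.integers(0, m, size=(n_samples, n))
F = np.array([np.unique(row).size for row in X])

EF = m * (1 - (1 - 1 / m) ** n)            # exact value of E(F(X_1,...,X_n))
s = 25                                      # deviation threshold, s = nt with t = 0.125
p_hat = (np.abs(F - EF) >= s).mean()
bound = 2 * np.exp(-s**2 / (2 * n))         # 2 exp(-n^2 t^2 / (2 sum_j K_j^2)) with K_j = 1

print(p_hat, bound)   # the empirical tail is far below the bound, which is quite conservative here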
