
36710-36752 Advanced Probability Overview Spring 2020

10. Martingales

Instructor: Alessandro Rinaldo

Associated reading: Chapter 6 of Ash and Doléans-Dade; Sec 5.2, 5.3, 5.4 of Durrett.

Overview
Martingales are elegant and powerful tools for studying sequences of dependent random variables.
The idea originated in gambling, where a gambler can adjust each bet according to the previous
results. In a simple version, suppose a gambler bets 1 dollar on the first game. If he wins,
then he stops playing. Otherwise he doubles the bet until he first wins. If each game is an i.i.d.
coin toss with non-zero winning probability and the gambler has an infinite amount of money, then
he will win one dollar with probability one.
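
As a sanity check, here is a minimal simulation of this doubling strategy (a sketch only: the function name, the win probability p = 0.5, and the cap max_rounds are arbitrary illustrative choices):

```python
import random

def doubling_strategy(p=0.5, max_rounds=10_000):
    """Bet 1, double after each loss, stop at the first win.
    Returns the net gain, which is always +1 when a win occurs."""
    bet, spent = 1, 0
    for _ in range(max_rounds):
        spent += bet
        if random.random() < p:      # win an even-money bet: collect 2 * bet
            return 2 * bet - spent   # net gain is always exactly 1
        bet *= 2                     # loss: double the bet and try again
    raise RuntimeError("no win within max_rounds")

random.seed(0)
gains = [doubling_strategy() for _ in range(1000)]
print(set(gains))  # {1}: the gambler nets exactly one dollar every time
```

The catch, of course, is the unbounded capital required: the bet after k consecutive losses is 2^k.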

1 Martingales
Let (Ω, F, P ) be a probability space.

Definition 1 (Filtration and Martingales). Let F_1 ⊆ F_2 ⊆ ... be a sequence of sub-σ-fields of F. We call {F_n}_{n=1}^∞ a filtration. If X_n : Ω → IR is F_n-measurable for every n, we say that {X_n}_{n=1}^∞ is adapted to the filtration. If {X_n}_{n=1}^∞ is adapted to a filtration {F_n}_{n=1}^∞, and if E|X_n| < ∞ for all n and E(X_{n+1} | F_n) = X_n for all n, then we say that {X_n}_{n=1}^∞ is a martingale relative to the filtration. Alternatively, we say that {(X_n, F_n)}_{n=1}^∞ is a martingale. If X_n ≤ E(X_{n+1} | F_n) for all n, we say that {(X_n, F_n)}_{n=1}^∞ is a submartingale. If the inequality goes the other way, it is a supermartingale.

Proposition 2. A martingale is both a submartingale and a supermartingale. {X_n}_{n=1}^∞ is a submartingale if and only if {−X_n}_{n=1}^∞ is a supermartingale.

Example 3 (Sums of independent r.v.'s). Let {Y_n}_{n=1}^∞ be a sequence of independent random variables with finite mean. Let F_n = σ(Y_1, ..., Y_n) and X_n = Σ_{j=1}^n Y_j. If E(Y_n) = 0 for every n, then {(X_n, F_n)}_{n=1}^∞ is a martingale. If E(Y_n) ≥ 0 for every n, then we have a submartingale, and if E(Y_n) ≤ 0 for every n, we have a supermartingale.
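
To verify these claims, note that X_n is F_n-measurable and Y_{n+1} is independent of F_n, so

$$E(X_{n+1} \mid \mathcal{F}_n) = X_n + E(Y_{n+1} \mid \mathcal{F}_n) = X_n + E(Y_{n+1}),$$

which is equal to, at least, or at most X_n according to the sign of E(Y_{n+1}).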

Example 4 (R-N derivatives). Let (Ω, F, P) be a probability space. Let {F_n}_{n=1}^∞ be a filtration. Let ν be a finite measure on (Ω, F) such that for every n, ν has a density X_n with respect to P when both are restricted to (Ω, F_n). Then {X_n}_{n=1}^∞ is adapted to the filtration. To see that we have a martingale, we need to show that for every n and A ∈ F_n,

$$\int_A X_{n+1}(\omega)\,dP(\omega) = \int_A X_n(\omega)\,dP(\omega). \qquad (1)$$

Since F_n ⊆ F_{n+1}, each A ∈ F_n is also in F_{n+1}. Hence both sides of Equation (1) equal ν(A).

Example 5 (Likelihood ratio – simple case). As a more specific example of Example 4, let Ω = IR^∞ and let F_n = {B × IR^∞ : B ∈ B^n}. That is, F_n is the collection of cylinder sets corresponding to the first n coordinates (the σ-field generated by the first n coordinate projection functions). Let P be the joint distribution of an infinite sequence of iid standard normal random variables. Let ν be the joint distribution of an infinite sequence of iid exponential random variables with parameter 1. For each n, when we restrict both P and ν to F_n, ν has the density

$$X_n(\omega) = \begin{cases} (2\pi)^{n/2} \exp\left( \sum_{j=1}^n [\omega_j^2/2 - \omega_j] \right) & \text{for } \omega_1, \dots, \omega_n > 0, \\ 0 & \text{otherwise}, \end{cases}$$

with respect to P. It is easy to see that

$$E(X_{n+1} \mid \mathcal{F}_n) = X_n\, E\!\left( \sqrt{2\pi}\, \exp(\omega_{n+1}^2/2 - \omega_{n+1})\, I_{(0,\infty)}(\omega_{n+1}) \right) = X_n.$$
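
Indeed, under P the coordinate ω_{n+1} is standard normal and independent of F_n, so the last expectation is

$$\int_0^\infty \sqrt{2\pi}\, e^{z^2/2 - z}\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = \int_0^\infty e^{-z}\, dz = 1.$$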

Example 6 (Likelihood ratio – general case). Let (Ω, F, P) be a probability space. Let {Y_n}_{n=1}^∞ be a sequence of random variables and F_n = σ(Y_1, ..., Y_n). Suppose that, for each n, μ_{Y_1,...,Y_n} has a strictly positive density p_n with respect to Lebesgue measure λ^n. Let Q be another probability on (Ω, F) such that Q((Y_1, ..., Y_n)^{-1}(·)) has a density q_n with respect to λ^n for each n. Define

$$X_n = \frac{q_n(Y_1, \dots, Y_n)}{p_n(Y_1, \dots, Y_n)}.$$

It is easy to check that {(X_n, F_n)}_{n=1}^∞ is a martingale: this is Example 4 with ν = Q, since for each n, X_n is the density of Q with respect to P when both are restricted to (Ω, F_n).

Example 7 (Convex transformation). Let {(X_n, F_n)}_{n=1}^∞ be a martingale. Let φ : IR → IR be a convex function such that E[φ(X_n)] is finite for all n. Define Y_n = φ(X_n). Then {(Y_n, F_n)}_{n=1}^∞ is a submartingale, by the conditional version of Jensen's inequality.

Example 8 (Lévy martingale). Let {F_n}_{n=1}^∞ be a filtration and let X be a random variable with finite mean. Define X_n = E(X|F_n). By the law of total probability (the tower property), E(X_{n+1}|F_n) = E(E(X|F_{n+1})|F_n) = E(X|F_n) = X_n, so we have a martingale. Such a martingale is sometimes called a Lévy martingale.

Example 9 (Gambler's ruin). Consider Example 3 again. Think of Y_n as being the amount that a gambler wins per unit of currency bet on the nth play in a sequence of games. Let Y_0 denote the gambler's initial fortune, which we can assume is a known value, and let F_0 be the trivial σ-field. (We could let Y_0 be a random variable and let F_0 = σ(Y_0), but then we would also have to expand F_n to σ(Y_0, ..., Y_n).) Suppose that the gambler devises a system for determining how much W_n ≥ 0 to bet on the nth play. We assume that W_n is F_{n−1}-measurable for each n. This forces the gambler to choose the amount to bet before knowing what will happen. Now, define Z_n = Y_0 + Σ_{j=1}^n W_j Y_j. Since

$$E(W_{n+1} Y_{n+1} \mid \mathcal{F}_n) = W_{n+1} E(Y_{n+1}),$$

and W_{n+1} ≥ 0, we have that E(Z_{n+1}|F_n) = Z_n + W_{n+1}E(Y_{n+1}) is ≥, =, or ≤ Z_n depending on whether E(Y_{n+1}) is ≥, =, or ≤ 0, respectively. That is, {(Z_n, F_n)}_{n=1}^∞ is a submartingale, a martingale, or a supermartingale according as the partial-sum process {(X_n, F_n)}_{n=1}^∞ of Example 3 is a submartingale, a martingale, or a supermartingale. This result is often described by saying that gambling systems cannot change whether a game is favorable, fair, or unfavorable to a gambler.

Definition 10 (Previsibility). A sequence {W_n}_{n=1}^∞ of random variables such that W_n is F_{n−1}/B^1-measurable is called previsible. (If there is no F_0, then require that W_1 be constant.)

Example 11 (Martingale transform). Let {(X_n, F_n)}_{n=1}^∞ be a martingale, and let {W_n}_{n=1}^∞ be previsible. Define Z_1 = X_1 and Z_{n+1} = Z_n + W_{n+1}(X_{n+1} − X_n) for n ≥ 1. Then Z_n is F_n/B^1-measurable and

$$E(Z_{n+1} \mid \mathcal{F}_n) = Z_n + W_{n+1} E(X_{n+1} - X_n \mid \mathcal{F}_n) = Z_n,$$

for all n ≥ 1. This makes {(Z_n, F_n)}_{n=1}^∞ a martingale, called a martingale transform. Example 9 is an example of this.
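
As an illustration, here is a minimal sketch of a martingale transform built on the fair ±1 walk of Example 3, with the previsible strategy "bet 1 only immediately after a loss" (the strategy and all simulation sizes are arbitrary illustrative choices):

```python
import random

rng = random.Random(0)

def transform_path(n_steps):
    """One path of the transform Z_n = sum_{k<=n} W_k Y_k, where the Y_k are
    iid fair +/-1 increments and W_k is previsible: W_1 = 1 (a constant, as
    Definition 10 requires) and W_k = 1 exactly when the previous toss lost."""
    z, prev_loss = 0, False
    for k in range(n_steps):
        w = 1 if (k == 0 or prev_loss) else 0   # decided before seeing Y_k
        y = rng.choice([-1, 1])
        z += w * y
        prev_loss = (y == -1)
    return z

sample = [transform_path(20) for _ in range(100_000)]
print(sum(sample) / len(sample))   # approximately 0 = E(Z_1): still a fair game
```

Whatever previsible {W_n} is used, E(Z_n) = E(Z_1) for all n, which is the formal content of "gambling systems cannot change a fair game" from Example 9.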

Theorem 12 (Doob decomposition). {(X_n, F_n)}_{n=1}^∞ is a submartingale if and only if there is a martingale {(Z_n, F_n)}_{n=1}^∞ and a nondecreasing previsible process {A_n}_{n=1}^∞ with A_1 = 0 such that X_n = Z_n + A_n for all n. The decomposition is unique (a.s.).

Proof: For the "if" direction, notice that (using that A_{n+1} is F_n-measurable by previsibility)

$$E(Z_{n+1} + A_{n+1} \mid \mathcal{F}_n) = Z_n + A_{n+1} \geq Z_n + A_n = X_n.$$

For the "only if" direction, define A_1 = 0 and

$$A_n = \sum_{k=2}^n \left( E(X_k \mid \mathcal{F}_{k-1}) - X_{k-1} \right),$$

for n > 1. Also, define Z_n = X_n − A_n. Because E(X_k|F_{k−1}) ≥ X_{k−1} for all k > 1, we have A_n ≥ A_{n−1} for all n > 1, so {A_n}_{n=1}^∞ is nondecreasing. Also, E(X_k|F_{k−1}) is F_{n−1}/B^1-measurable for all 1 < k ≤ n, so {A_n}_{n=1}^∞ is previsible. Finally, notice that

$$\begin{aligned} E(Z_{n+1} \mid \mathcal{F}_n) &= E(X_{n+1} \mid \mathcal{F}_n) - A_{n+1} \\ &= E(X_{n+1} \mid \mathcal{F}_n) - \sum_{k=2}^{n+1} [E(X_k \mid \mathcal{F}_{k-1}) - X_{k-1}] \\ &= X_n - \sum_{k=2}^{n} [E(X_k \mid \mathcal{F}_{k-1}) - X_{k-1}] = Z_n, \end{aligned}$$

so {(Z_n, F_n)}_{n=1}^∞ is a martingale.
For uniqueness, suppose that X_n = Y_n + W_n is another decomposition, where Y_n is a martingale and W_n is previsible with W_1 = 0. Then write

$$\begin{aligned} \sum_{k=2}^n [E(X_k \mid \mathcal{F}_{k-1}) - X_{k-1}] &= \sum_{k=2}^n [E(Y_k + W_k \mid \mathcal{F}_{k-1}) - X_{k-1}] \\ &= \sum_{k=2}^n (Y_{k-1} + W_k - X_{k-1}) \\ &= \sum_{k=2}^n (W_k - W_{k-1}) = W_n. \end{aligned}$$

It follows that W_n = A_n a.s., hence Y_n = Z_n a.s.

The previsible process in Theorem 12 is called the compensator for the submartingale.
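
As a concrete illustration (a standard example, not from the notes): let S_n = Σ_{j=1}^n Y_j be the fair ±1 random walk of Example 3, and let X_n = S_n², a submartingale by Example 7. Since Y_k is independent of F_{k−1} with E(Y_k) = 0 and E(Y_k²) = 1,

$$E(X_k \mid \mathcal{F}_{k-1}) = S_{k-1}^2 + 2 S_{k-1} E(Y_k) + E(Y_k^2) = X_{k-1} + 1,$$

so the compensator is A_n = n − 1, and Z_n = S_n² − (n − 1) is a martingale.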

2 Stopping Times
Let (Ω, F, P) be a probability space, and let {F_n}_{n=1}^∞ be a filtration.

Definition 13 (Stopping times). A positive¹ extended-integer-valued random variable τ is called a stopping time with respect to the filtration if {τ = n} ∈ F_n for all finite n. A special σ-field, F_τ, is defined by

$$\mathcal{F}_\tau = \{A \in \mathcal{F} : A \cap \{\tau \leq k\} \in \mathcal{F}_k \text{ for all finite } k\}.$$

If {X_n}_{n=1}^∞ is adapted to the filtration and if τ < ∞ a.s., then X_τ is defined as X_{τ(ω)}(ω). (Define X_τ equal to some arbitrary random variable X_∞ on the event {τ = ∞}.)

¹ If your filtration starts at n = 0, you can allow stopping times to be nonnegative valued. Indeed, if your filtration starts at an arbitrary integer k, then a stopping time can take any value from k on up. There is a trivial extension of every filtration to one lower subscript. For example, if we start at n = 1, we can extend to n = 0 by defining F_0 = {Ω, ∅}. Every martingale can also be extended by defining X_0 = E(X_1). For the rest of the course, we will assume that the lowest possible value for a stopping time is 1.

Example 14. Let {X_n}_{n=1}^∞ be adapted to the filtration and let τ = k_0, a constant. Then {τ = n} is either Ω or ∅ and is in every F_n, so τ is a stopping time. Also,

$$A \cap \{\tau \leq k\} = \begin{cases} A & \text{if } k_0 \leq k, \\ \emptyset & \text{if } k_0 > k. \end{cases}$$

So A ∩ {τ ≤ k} ∈ F_k for all finite k if and only if A ∈ F_{k_0}. So F_τ = F_{k_0}.
Example 15 (First passage). Let {X_n}_{n=1}^∞ be adapted to the filtration. Let B be a Borel set and let τ = inf{n : X_n ∈ B}. As usual, inf ∅ = ∞. For each finite n,

$$\{\tau = n\} = \{X_n \in B\} \cap \bigcap_{k < n} \{X_k \in B^C\} \in \mathcal{F}_n.$$

So, τ is a stopping time.


We can show that τ and X_τ are both F_τ-measurable. For example, to show that X_τ is F_τ-measurable, we must show that, for all B ∈ B^1, X_τ^{-1}(B) ∈ F and, for all 1 ≤ k < ∞, {τ ≤ k} ∩ X_τ^{-1}(B) ∈ F_k. Now,

$$X_\tau^{-1}(B) = \left( \bigcup_{k=1}^\infty \{\tau = k\} \cap X_k^{-1}(B) \right) \cup \left( \{\tau = \infty\} \cap X_\infty^{-1}(B) \right) \in \mathcal{F}.$$

This shows that X_τ is F-measurable. Next, fix k and write

$$\{\tau \leq k\} \cap X_\tau^{-1}(B) = \bigcup_{j=1}^k \left[ X_j^{-1}(B) \cap \{\tau = j\} \right] \in \mathcal{F}_k.$$

This proves that X_τ is F_τ-measurable. Suppose that τ_1 and τ_2 are two stopping times such that τ_1 ≤ τ_2. Let A ∈ F_{τ_1}. Since A ∩ {τ_2 ≤ k} = A ∩ {τ_1 ≤ k} ∩ {τ_2 ≤ k} for every event A, it follows that A ∩ {τ_2 ≤ k} ∈ F_k and A ∈ F_{τ_2}. Hence F_{τ_1} ⊆ F_{τ_2}. As an example, let τ be an arbitrary stopping time (not necessarily finite a.s.) and define τ_k = min{k, τ} for finite k. Then τ_k is a finite stopping time with τ_k ≤ τ. Hence X_{τ_k} is F_{τ_k}-measurable for each k, and so X_{τ_k} is F_τ-measurable. Similarly, τ_k ≤ k, so that F_{τ_k} ⊆ F_k and X_{τ_k} is F_k-measurable.
Example 16 (Gambler's ruin). The gambler in Example 9 can try to build a stopping time into a gambling system. For example, let τ = min{n : Z_n ≥ Y_0 + x} for some integer x > 0. This would seem to guarantee winning at least x. There are two possible drawbacks. One is that there may be positive probability that τ = ∞. Even if τ < ∞ a.s., it might require unlimited resources to guarantee that we can survive until τ. For example, let Y_0 = 0 and let Y_n have equal probability of being 1 or −1 for all n. So, we stop as soon as we have won x more than we have lost. If we modify the problem so that we have only finite resources (say k units), then this becomes the classic gambler's ruin problem. The probability of achieving Z_n = x before Z_n = −k is k/(k + x), which goes to 1 as k → ∞. So, if we have unlimited resources, the probability is 1 that τ < ∞; otherwise, we may never achieve the goal. If the probability of winning each game is less than 1/2, then P(τ = ∞) > 0.

Suppose that we start with a martingale {(X_n, F_n)}_{n=1}^∞ and a stopping time τ. We can define

$$X_n^* = \begin{cases} X_n & \text{if } n \leq \tau, \\ X_\tau & \text{if } n > \tau, \end{cases} \qquad \text{i.e., } X_n^* = X_{\min\{\tau, n\}}.$$

We can call this the stopped martingale. It turns out that {X_n^*}_{n=1}^∞ is also a martingale relative to the filtration. First, note that X_{min{τ,n}} is F_n-measurable. Next, notice that

$$E(|X_n^*|) = \sum_{k=1}^{n-1} \int_{\{\tau = k\}} |X_k|\,dP + \int_{\{\tau \geq n\}} |X_n|\,dP \leq \sum_{k=1}^{n} E(|X_k|) < \infty.$$

Finally, let A ∈ F_n. Then A ∩ {τ > n} ∈ F_n, so

$$\int_{A \cap \{\tau > n\}} X_{n+1}\,dP = \int_{A \cap \{\tau > n\}} X_n\,dP,$$

because X_n = E(X_{n+1} | F_n). It now follows that

$$\int_A X_{n+1}^*\,dP = \int_{A \cap \{\tau > n\}} X_{n+1}\,dP + \int_{A \cap \{\tau \leq n\}} X_\tau\,dP = \int_{A \cap \{\tau > n\}} X_n\,dP + \int_{A \cap \{\tau \leq n\}} X_\tau\,dP = \int_A X_n^*\,dP.$$

It follows that X_n^* = E(X_{n+1}^* | F_n), so the stopped martingale is also a martingale. Notice that lim_{n→∞} X_n^* = X_τ a.s. if τ < ∞ a.s.

Because we can use a constant stopping time to stop a martingale, it follows that martingale theorems apply to finite sequences of random variables as well as infinite sequences.
Example 17. Consider the stopping time in Example 16 with x = 1. That is, τ is the first time that a gambler, betting on iid fair coin flips, wins 1 more than he/she has lost. This τ < ∞ a.s. It follows that lim_{n→∞} X_n^* = X_τ a.s. However, E(X_n^*) = 0 for all n, while E(X_τ) = 1 because X_τ = 1 a.s. (So the convergence is a.s. but not in L¹.)
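
A small simulation of this example (a sketch; paths are truncated at an arbitrary finite horizon, so a tiny fraction of runs never hit +1 within the horizon and are discarded):

```python
import random

rng = random.Random(1)

def first_passage(horizon=100_000):
    """Walk of fair +/-1 bets; tau = first time the walk hits +1.
    Returns (X_tau, tau), or (None, None) if tau exceeds the horizon."""
    x = 0
    for n in range(1, horizon + 1):
        x += rng.choice([-1, 1])
        if x == 1:
            return x, n
    return None, None

results = [first_passage() for _ in range(2000)]
hits = [x for x, n in results if x is not None]
print(len(hits) / len(results))   # nearly 1: tau < infinity a.s.
print(sum(hits) / len(hits))      # exactly 1: X_tau = 1 a.s., so E(X_tau) = 1
```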

3 Optional Sampling
Let {(X_n, F_n)}_{n=1}^∞ be a martingale. Consider a sequence of a.s. finite stopping times {τ_n}_{n=1}^∞ such that 1 ≤ τ_j ≤ τ_{j+1} for all j. Then we can construct {(X_{τ_n}, F_{τ_n})}_{n=1}^∞ and ask whether or not it is a martingale. In general, an unpleasant integrability condition is needed to prove this. We shall do a simplified case.

Theorem 18 (Optional sampling theorem). Let {(X_n, F_n)}_{n=1}^∞ be a (sub)martingale. Suppose that for each n there is a finite constant M_n such that τ_n ≤ M_n a.s. Then {(X_{τ_n}, F_{τ_n})}_{n=1}^∞ is a (sub)martingale.

The unpleasant integrability condition that can replace P (τn ≤ Mn ) = 1 is the following:
For every n,

• P (τn < ∞) = 1,
• E(|Xτn |) < ∞, and
• lim inf m→∞ E(|Xm |I(m,∞) (τn )) = 0.

Proof: [Theorem 18] Without loss of generality, assume that M_n ≤ M_{n+1} for every n. Since τ_n ≤ M_n,

$$E(|X_{\tau_n}|) = \sum_{k=1}^{M_n} \int_{\{\tau_n = k\}} |X_k|\,dP \leq \sum_{k=1}^{M_n} E(|X_k|) < \infty.$$

We already know that X_{τ_n} is F_{τ_n}-measurable. Let A ∈ F_{τ_n}. We need to show that

$$\int_A X_{\tau_{n+1}}\,dP \overset{(\geq)}{=} \int_A X_{\tau_n}\,dP,$$

where "(≥)=" means "≥" in the submartingale case and "=" in the martingale case. Write

$$\int_A [X_{\tau_{n+1}} - X_{\tau_n}]\,dP = \int_{A \cap \{\tau_{n+1} > \tau_n\}} [X_{\tau_{n+1}} - X_{\tau_n}]\,dP.$$

Next, for each ω ∈ {τ_{n+1} > τ_n}, write

$$X_{\tau_{n+1}}(\omega) - X_{\tau_n}(\omega) = \sum_{k :\, \tau_n(\omega) < k \leq \tau_{n+1}(\omega)} [X_k(\omega) - X_{k-1}(\omega)].$$

The smallest k such that τ_n < k is k = 2 (since τ_n ≥ 1). So,

$$\int_A [X_{\tau_{n+1}} - X_{\tau_n}]\,dP = \int_A \sum_{k=2}^{M_{n+1}} I_{\{\tau_n < k \leq \tau_{n+1}\}} (X_k - X_{k-1})\,dP.$$

Since A ∈ F_{τ_n} and {τ_n < k ≤ τ_{n+1}} = {τ_n ≤ k − 1} ∩ {τ_{n+1} ≤ k − 1}^C, it follows that

$$B_k = A \cap \{\tau_n < k \leq \tau_{n+1}\} \in \mathcal{F}_{k-1},$$

for each k. So

$$\int_A [X_{\tau_{n+1}} - X_{\tau_n}]\,dP = \sum_{k=2}^{M_{n+1}} \int_{B_k} (X_k - X_{k-1})\,dP \overset{(\geq)}{=} \sum_{k=2}^{M_{n+1}} \int_{B_k} [X_k - E(X_k \mid \mathcal{F}_{k-1})]\,dP = 0,$$

because X_{k−1} (≤)= E(X_k|F_{k−1}) and B_k ∈ F_{k−1}.

4 Martingale Convergence
The upcrossing lemma says that, on average, a submartingale cannot cross a fixed nondegenerate interval very many times. If the submartingale were to cross an interval infinitely often, then its lim sup and lim inf would have to be different and it couldn't converge.

Lemma 19 (Upcrossing lemma). Let {(X_k, F_k)}_{k=1}^n be a submartingale. Let r < q, and define V to be the number of times that the sequence X_1, ..., X_n crosses from below r to above q. Then

$$E(V) \leq \frac{1}{q - r}\left( E|X_n| + |r| \right). \qquad (2)$$

We will only give an outline of the proof of Lemma 19. Let Y_k = max{0, X_k − r}. Then V is the number of times that Y_k moves from 0 to above q − r, and {(Y_k, F_k)}_{k=1}^n is a submartingale. It is easy to see that V is at most the sum of the upcrossing increments divided by q − r. That is,

$$V \leq \frac{1}{q - r} \sum_{k=2}^n (Y_k - Y_{k-1}) I_{E_k},$$

where E_k is the event that the path is crossing up at time k. Notice that E_k ∈ F_{k−1} for all k. Hence, for each k ≥ 2,

$$E([Y_k - Y_{k-1}] I_{E_k}) = \int_{E_k} (Y_k - Y_{k-1})\,dP = \int_{E_k} [E(Y_k \mid \mathcal{F}_{k-1}) - Y_{k-1}]\,dP.$$

Because E(Y_k|F_{k−1}) − Y_{k−1} ≥ 0 a.s. by the submartingale property, we can expand the integral from E_k to all of Ω to get

$$E([Y_k - Y_{k-1}] I_{E_k}) \leq \int_\Omega [E(Y_k \mid \mathcal{F}_{k-1}) - Y_{k-1}]\,dP = E(Y_k - Y_{k-1}).$$

It follows that (q − r)E(V) ≤ E(Y_n) − E(Y_1) ≤ E(Y_n), because Y_1 ≥ 0. Since Y_n = max{0, X_n − r} ≤ |X_n| + |r|, we get E(Y_n) ≤ E(|X_n|) + |r|. The full proof is in the Appendix.
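
Here is a minimal upcrossing counter, checked against the bound (2) on paths of the fair ±1 walk, which is a martingale and hence a submartingale (the interval (r, q) = (−2, 2) and the simulation sizes are arbitrary choices):

```python
import random

rng = random.Random(2)

def upcrossings(path, r, q):
    """Count crossings from (at or) below r to above q along the path."""
    count, below = 0, False
    for x in path:
        if x <= r:
            below = True
        elif x > q and below:
            count += 1
            below = False
    return count

n, r, q = 200, -2.0, 2.0
paths = []
for _ in range(20_000):
    x, path = 0, []
    for _ in range(n):
        x += rng.choice([-1, 1])
        path.append(x)
    paths.append(path)

mean_V = sum(upcrossings(p, r, q) for p in paths) / len(paths)
bound = (sum(abs(p[-1]) for p in paths) / len(paths) + abs(r)) / (q - r)
print(mean_V, "<=", bound)   # estimated E(V) versus (E|X_n| + |r|)/(q - r)
```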

Theorem 20 (Martingale convergence theorem). Let {(X_n, F_n)}_{n=1}^∞ be a submartingale such that sup_n E|X_n| < ∞. Then X = lim_{n→∞} X_n exists a.s. and E|X| < ∞.

Proof: Let X^* = lim sup_{n→∞} X_n and X_* = lim inf_{n→∞} X_n. Let B = {ω : X_*(ω) < X^*(ω)}. We will prove that P(B) = 0. We can write

$$B = \bigcup_{r < q,\; r, q \text{ rational}} \{\omega : X^*(\omega) > q > r > X_*(\omega)\}.$$

Now, X^*(ω) > q > r > X_*(ω) if and only if the values of X_n(ω) cross from being below r to being above q infinitely often. For fixed r and q, we now prove that this has probability 0; hence P(B) = 0. Let V_n equal the number of times that X_1, ..., X_n cross from below r to above q. According to Lemma 19,

$$\sup_n E(V_n) \leq \frac{1}{q - r}\left( \sup_n E(|X_n|) + |r| \right) < \infty.$$

The number of times the values of {X_n(ω)}_{n=1}^∞ cross from below r to above q equals lim_{n→∞} V_n(ω). By the monotone convergence theorem,

$$\infty > \lim_n E(V_n) = E\left( \lim_{n \to \infty} V_n \right).$$

It follows that P({ω : lim_{n→∞} V_n(ω) = ∞}) = 0.

Since P(B) = 0, we have that X = lim_{n→∞} X_n exists a.s. Fatou's lemma says E(|X|) ≤ lim inf_{n→∞} E(|X_n|) ≤ sup_n E(|X_n|) < ∞.

Example 21 (Random walk). For the random walk martingale of Example 3, if the Y_n's are iid with finite variance σ² > 0, then X_n/√n converges in distribution, so X_n can't converge a.s. To check how the condition of Theorem 20 is violated, the Markov inequality says that

$$\frac{E(|X_n|)}{c\sqrt{n}} \geq P(|X_n| > c\sqrt{n}) \to 2\left[ 1 - \Phi\!\left( \frac{c}{\sigma} \right) \right],$$

for each positive c. So, eventually E(|X_n|) ≥ c√n [1 − Φ(c/σ)], and lim_{n→∞} E(|X_n|) = ∞. However, if Σ_{n=1}^∞ Var(Y_n) < ∞, then the condition of Theorem 20 holds. Indeed, the Basic L² Convergence Theorem already told us that the sum converges a.s.
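
A quick numerical look at this growth for ±1 steps (so σ = 1), where the normal approximation suggests E|X_n| ≈ √(2n/π) (the simulation sizes are arbitrary):

```python
import math
import random

rng = random.Random(3)

def mean_abs(n, reps=5_000):
    """Monte Carlo estimate of E|X_n| for the fair +/-1 random walk."""
    total = 0
    for _ in range(reps):
        total += abs(sum(rng.choice([-1, 1]) for _ in range(n)))
    return total / reps

for n in [10, 100, 1000]:
    print(n, mean_abs(n), math.sqrt(2 * n / math.pi))  # grows like sqrt(2n/pi)
```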

Example 22 (Lévy martingale). For the Lévy martingale of Example 8,

$$E(|X_n|) = E(|E[X \mid \mathcal{F}_n]|) \leq E\big[ E(|X| \mid \mathcal{F}_n) \big] = E(|X|) < \infty,$$

for all n, so the martingale converges. In Theorem 24, we can say even more about the limit.

We need the following uniform integrability result before we can identify the limit of a Lévy martingale. Recall that uniform integrability allows the exchange of limit and integral under a finite measure (Theorem 20 of Lecture Notes Set 3).

Lemma 23. Let {F_n}_{n=1}^∞ be a sequence of σ-fields. Let E(|X|) < ∞. Define X_n = E(X|F_n). Then {X_n}_{n=1}^∞ is a uniformly integrable sequence.

Proof: Since E(X|F_n) = E(X⁺|F_n) − E(X⁻|F_n), and the sum of uniformly integrable sequences is uniformly integrable, we will prove the result for nonnegative X. Let A_{c,n} = {X_n ≥ c} ∈ F_n. So

$$\int_{A_{c,n}} X_n(\omega)\,dP(\omega) = \int_{A_{c,n}} X(\omega)\,dP(\omega).$$

If we can find, for every ε > 0, a C such that ∫_{A_{c,n}} X(ω) dP(ω) < ε for all n and all c ≥ C, we are done. This is achieved using absolute continuity; the details are a homework problem.

Theorem 24 (Lévy's theorem). Let {F_n}_{n=1}^∞ be an increasing sequence of σ-fields. Let F_∞ be the smallest σ-field containing all of the F_n's. Let E(|X|) < ∞. Define X_n = E(X|F_n) and X_∞ = E(X|F_∞). Then lim_{n→∞} X_n = X_∞, a.s.

Proof: By Lemma 23, {X_n}_{n=1}^∞ is a uniformly integrable sequence. Let Y be the limit of the martingale guaranteed by Theorem 20. Since Y is a limit of functions of the X_n, it is measurable with respect to F_∞. It follows from uniform integrability that for every event A, lim_{n→∞} E(X_n I_A) = E(Y I_A). Next, note that, for every m and A ∈ F_m,

$$\int_A Y\,dP = \lim_{n \to \infty} \int_A E(X \mid \mathcal{F}_n)\,dP = \lim_{n \to \infty} \int_A X_n\,dP = \int_A X\,dP,$$

where the last equality follows from the fact that A ∈ F_n for all n ≥ m, so ∫_A X_n dP = ∫_A X dP because X_n = E(X|F_n). Since ∫_A Y dP = ∫_A X dP for all A ∈ F_m and all m, it holds for all A in the field F = ⋃_{n=1}^∞ F_n. Since |X| is integrable and F is a field, we can conclude that the equality holds for all A ∈ F_∞, the smallest σ-field containing F. The equality E(X I_A) = E(Y I_A) for all A ∈ F_∞, together with the fact that Y is F_∞-measurable, is precisely what it means to say that Y = E(X|F_∞) = X_∞.

Obviously there is an analogous result for supermartingales.


Lemma 25 (Nonnegative supermartingale). Let {(X_n, F_n)}_{n=1}^∞ be a nonnegative supermartingale. Then X_n converges a.s. to a random variable with finite mean.

Proof: Let Y_n = −X_n. Then {(Y_n, F_n)}_{n=1}^∞ is a submartingale, and

$$E(|Y_n|) = E(X_n) = E[E(X_n \mid \mathcal{F}_{n-1})] \leq E(X_{n-1}).$$

It follows that E(|Y_n|) ≤ E(X_1) < ∞ for all n, so Theorem 20 applies and Y_n converges a.s. Trivially, −Y_n = X_n also converges.

In Section 6 we shall see an important example where F_∞ ≠ F. Before this, we shall first introduce reversed martingales.

5 Reversed Martingales
Definition 26 (Reversed Martingales). For n = −1, −2, ..., let sub-σ-fields F_{n−1} ⊆ F_n be given, suppose that X_n is F_n-measurable, E(|X_n|) < ∞, and E(X_n|F_{n−1}) = X_{n−1}. Then {(X_n, F_n)}_{n=−1}^{−∞} is a reversed martingale.

An equivalent way to think about reversed martingales is through a decreasing sequence of σ-fields {F_n}_{n=1}^∞ such that F_{n+1} ⊆ F_n for n ≥ 1. The proofs of the next two theorems are similar to those of the corresponding theorems for forward martingales.
Theorem 27 (Reversed martingale convergence theorem). If {(X_n, F_n)}_{n=−1}^{−∞} is a reversed martingale, then X = lim_{n→−∞} X_n exists a.s. and E(X) = E(X_{−1}).

Proof: Just as in the proof of Theorem 20, we let V_n be the number of times that the finite sequence X_n, X_{n+1}, ..., X_{−1} crosses from below a rational r to above another rational q (for n < 0). Lemma 19 says that

$$E(V_n) \leq \frac{1}{q - r}\left( E(|X_{-1}|) + |r| \right) < \infty.$$

As in the proof of Theorem 20, it follows that X = lim_{n→−∞} X_n exists with probability 1. Since X_n = E(X_{−1}|F_n) for each n < −1, Lemma 23 says that {X_n} is uniformly integrable, so

$$E(X) = \lim_{n \to -\infty} E(X_n) = E(X_{-1}).$$

Notice that reversed martingales are all of the Lévy type. Not surprisingly, there is a version of Lévy's theorem (Theorem 24) for reversed martingales. We state it in terms of decreasing σ-fields.

Theorem 28 (Lévy theorem for reversed martingales). Let {F_n}_{n=1}^∞ be a decreasing sequence of σ-fields. Let F_∞ = ⋂_{n=1}^∞ F_n. Let E(|X|) < ∞. Define X_n = E(X|F_n) and X_∞ = E(X|F_∞). Then lim_{n→∞} X_n = X_∞ a.s.

Proof: It is easy to see that {(X_{−n}, F_{−n})}_{n=−1}^{−∞} is a reversed martingale and that E(|X_1|) < ∞. By Theorem 27, it follows that lim_{n→−∞} X_{−n} = Y exists and is finite a.s., i.e., lim_{n→∞} X_n = Y. To prove that Y = X_∞ a.s., note that X_∞ = E(X_1|F_∞) since F_∞ ⊆ F_1. So, we must show that Y = E(X_1|F_∞). Let A ∈ F_∞. Then

$$\int_A X_n(\omega)\,dP(\omega) = \int_A X_1(\omega)\,dP(\omega),$$

since A ∈ F_n and X_n = E(X_1|F_n). Once again, using Lemma 23, it follows that

$$\lim_{n \to \infty} \int_A X_n(\omega)\,dP(\omega) = \int_A Y(\omega)\,dP(\omega) = \int_A X_1(\omega)\,dP(\omega),$$

hence Y = E(X_1|F_∞).

Theorem 28 allows us to prove a strong law of large numbers that is even more general than
the usual version. The greater generality comes from the fact that it applies to sequences
that are not necessarily independent.

6 Exchangeability and de Finetti's Theorem
A sequence of random quantities {Xn }∞ n=1 is exchangeable if, for every n and all distinct
j1 , . . . , jn , the joint distribution of (Xj1 , . . . , Xjn ) is the same as the joint distribution of
(X1 , . . . , Xn ).

Example 29 (Conditionally iid random quantities). Let {X_n}_{n=1}^∞ be conditionally iid given a σ-field C. Then {X_n}_{n=1}^∞ is an exchangeable sequence. The result follows easily from the fact that

$$\mu_{X_{j_1}, \dots, X_{j_n} \mid \mathcal{C}} = \mu_{X_1, \dots, X_n \mid \mathcal{C}}, \quad \text{a.s.}$$

Example 30. Let {X_n}_{n=1}^∞ be Bernoulli random variables such that

$$P(X_1 = x_1, \dots, X_n = x_n) = \frac{1}{(n+1)\binom{n}{y}},$$

where y = Σ_{j=1}^n x_j. One can show that this specifies consistent joint distributions. One can also check that the X_n's are not independent:

$$P(X_1 = 1) = \frac{1}{2}, \qquad P(X_1 = 1, X_2 = 1) = \frac{1}{3} \neq \left( \frac{1}{2} \right)^2.$$
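
Both computations, and the claimed consistency of the joint distributions, can be verified directly from the formula; here is a small script that does so (illustrative only):

```python
from itertools import product
from math import comb

def p(xs):
    """P(X_1 = x_1, ..., X_n = x_n) = 1 / ((n+1) * C(n, y)), as in Example 30."""
    n, y = len(xs), sum(xs)
    return 1.0 / ((n + 1) * comb(n, y))

print(p((1,)))     # P(X_1 = 1) = 0.5
print(p((1, 1)))   # P(X_1 = 1, X_2 = 1) = 0.333... != 0.25: not independent

# Consistency: summing out the last coordinate recovers the shorter joint
for n in range(1, 6):
    for xs in product([0, 1], repeat=n):
        assert abs(p(xs) - p(xs + (0,)) - p(xs + (1,))) < 1e-12
```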

Theorem 31 (Strong law of large numbers). Let {X_n}_{n=1}^∞ be an exchangeable sequence of random variables with finite mean. Then lim_{n→∞} (1/n) Σ_{j=1}^n X_j exists a.s. and has mean equal to E(X_1). If the X_j's are independent, then the limit equals E(X_1) a.s.

Proof: Define Y_n = (1/n) Σ_{j=1}^n X_j and let F_n be the σ-field generated by all functions of (X_1, X_2, ...) that are invariant under permutations of the first n coordinates. (For example, Y_n is such a function.) Note that F_{n+1} ⊆ F_n, so this is a decreasing sequence. Let Z_n = E(X_1|F_n). Theorem 28 says that Z_n converges a.s. to E(X_1|F_∞), where F_∞ = ⋂_{n=1}^∞ F_n. We prove next that Z_n = Y_n, a.s. Since Y_n is F_n-measurable, we need only prove that, for all A ∈ F_n, E(I_A Y_n) = E(I_A X_1). Notice that I_A can be written as h(X_1, X_2, ...), a function of X_1, X_2, ... that is invariant under permutations of X_1, ..., X_n. By exchangeability, for all j = 1, ..., n, X_1 h(X_1, X_2, ...) has the same distribution as X_j h(X_j, X_2, ..., X_{j−1}, X_1, X_{j+1}, ...). But

$$h(X_j, X_2, \dots, X_{j-1}, X_1, X_{j+1}, \dots) = h(X_1, X_2, \dots) = I_A,$$

by permutation invariance of h. Hence, for all j = 1, ..., n, I_A X_j has the same distribution as I_A X_1. It follows that

$$E(I_A X_1) = \frac{1}{n} \sum_{j=1}^n E(I_A X_j) = E(I_A Y_n).$$

Clearly E(X_1|F_∞) has mean E(X_1). If the X_n's are independent, then the limit, being measurable with respect to the tail σ-field, must be a.s. constant, by the Kolmogorov 0-1 law. The constant must equal the mean of the random variable, which is E(X_1).

Example 32. In Example 30, we know that Y_n converges a.s., hence it converges in distribution. We can compute the distribution of Y_n exactly: P(Y_n = k/n) = 1/(n + 1) for k = 0, ..., n. Hence, Y_n converges in distribution to the uniform distribution on the interval [0, 1], which must be the distribution of the limit. The limit is not a.s. constant.
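
Consistent with Example 35 below, this example can be simulated as a uniform mixture of iid Bernoulli sequences, since ∫₀¹ θ^y (1 − θ)^{n−y} dθ = 1/[(n + 1)·C(n, y)]. The following sketch (with arbitrary simulation sizes) shows that the distribution of Y_n is close to uniform on [0, 1]:

```python
import random

rng = random.Random(4)

def y_bar(n):
    """Draw theta ~ Uniform[0,1], then return the mean of n iid Bernoulli(theta)."""
    theta = rng.random()
    return sum(rng.random() < theta for _ in range(n)) / n

sample = [y_bar(500) for _ in range(10_000)]
# Y_n should be roughly uniform on [0,1]: each decile gets about 10% of mass
for d in range(10):
    frac = sum(d / 10 <= y < (d + 1) / 10 for y in sample) / len(sample)
    print(f"[{d/10:.1f}, {(d+1)/10:.1f}): {frac:.3f}")
```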

There is a very useful theorem due to de Finetti about exchangeable random quantities that relies upon the strong law of large numbers. To state the theorem, we need to recall the concept of a "random probability measure," introduced in Example 40 of Lecture Notes Set 4. Let (X, B) be a Borel space, and let P be the set of all probability measures on (X, B). We can think of P as a subset of the function space [0, 1]^B, hence it has a product σ-field. Recall that the product σ-field is the smallest σ-field such that for all B ∈ B, the function f_B : P → [0, 1] defined by f_B(Q) = Q(B) is measurable. These are the coordinate projection functions.
Example 33 (Empirical probability measure). Let X_1, ..., X_n be random quantities taking values in X. For each B ∈ B, define P_n(ω)(B) = (1/n) Σ_{j=1}^n I_B(X_j(ω)). For each ω, P_n(ω)(·) is clearly a probability measure, so P_n : Ω → P is a function that we could prove is measurable, but that proof will not be given here. Theorem 31 says that P_n(ω)(B) converges to E(I_B(X_1)|F_∞)(ω) for all B and almost all ω. If we assume that the X_n's take values in a Borel space, then E(I_B(X_1)|F_∞) = Pr(X_1 ∈ B|F_∞) is part of an rcd (regular conditional distribution). This rcd plays an important role in de Finetti's theorem.

De Finetti's theorem says that a sequence of random quantities is exchangeable if and only if it is conditionally iid given a random probability measure, and that random probability measure is the limit of the empirical probability measures of X_1, ..., X_n. That is, Example 29 is essentially the only example of an exchangeable sequence. A simple proof can be found in Kingman (1978, Annals of Probability, Vol. 6, 183–197). A photocopy of the pages is attached at the end of these notes.

Theorem 34 (De Finetti's theorem). A sequence {X_n}_{n=1}^∞ of random quantities is exchangeable if and only if P_n (the empirical probability measure of X_1, ..., X_n) converges a.s. to a random probability measure P and the X_n's are conditionally iid with distribution Q given P = Q.
Example 35. In Example 30, the empirical probability measure is equivalent to Y_n = (1/n) Σ_{k=1}^n X_k, since Y_n is one minus the proportion of the observations less than or equal to 0. So P is equivalent to the limit of Y_n, the limit of the relative frequency of 1's in the sequence. Conditional on the limit of the relative frequency of 1's being x, the X_k's are iid with Bernoulli distribution with parameter x.

A Upcrossing lemma

Proof: [Upcrossing lemma] Let Y_k = max{0, X_k − r} for every k, so that {(Y_k, F_k)}_{k=1}^n is a submartingale. Note that a consecutive set of X_k(ω) cross from below r to above q if and only if the corresponding consecutive set of Y_k(ω) cross from 0 to above q − r. Let T_0(ω) = 0 and define T_m for m = 1, 2, ... as

$$\begin{aligned} T_m(\omega) &= \inf\{k \leq n : k > T_{m-1}(\omega),\, Y_k(\omega) = 0\}, && \text{if } m \text{ is odd}, \\ T_m(\omega) &= \inf\{k \leq n : k > T_{m-1}(\omega),\, Y_k(\omega) \geq q - r\}, && \text{if } m \text{ is even}, \\ T_m(\omega) &= n + 1, && \text{if the corresponding set above is empty}. \end{aligned}$$

Now V(ω) is one-half of the largest even m such that T_m(ω) ≤ n. Define, for k = 1, ..., n,

$$R_k(\omega) = \begin{cases} 1 & \text{if } T_m(\omega) < k \leq T_{m+1}(\omega) \text{ for } m \text{ odd}, \\ 0 & \text{otherwise}. \end{cases}$$

Then (q − r)V(ω) ≤ Σ_{k=1}^n R_k(ω)[Y_k(ω) − Y_{k−1}(ω)] = X̂(ω), where Y_0 ≡ 0 for convenience. First, note that for all m and k, {T_m(ω) ≤ k} ∈ F_k. Next, note that for every k,

$$\{\omega : R_k(\omega) = 1\} = \bigcup_{m \text{ odd}} \{T_m \leq k - 1\} \cap \{T_{m+1} \leq k - 1\}^C \in \mathcal{F}_{k-1}. \qquad (3)$$

Then

$$\begin{aligned} E(\hat X) &= \sum_{k=1}^n \int_{\{\omega : R_k(\omega) = 1\}} [Y_k(\omega) - Y_{k-1}(\omega)]\,dP(\omega) \\ &= \sum_{k=1}^n \int_{\{\omega : R_k(\omega) = 1\}} [E(Y_k \mid \mathcal{F}_{k-1})(\omega) - Y_{k-1}(\omega)]\,dP(\omega) \\ &\leq \sum_{k=1}^n \int_\Omega [E(Y_k \mid \mathcal{F}_{k-1})(\omega) - Y_{k-1}(\omega)]\,dP(\omega) \\ &= \sum_{k=1}^n [E(Y_k) - E(Y_{k-1})] = E(Y_n), \end{aligned}$$

where the second equality follows from Equation (3) and the inequality follows from the fact that {(Y_k, F_k)}_{k=1}^n is a submartingale. It follows that (q − r)E(V) ≤ E(Y_n). Since E(Y_n) ≤ |r| + E(|X_n|), it follows that Equation (2) holds.
