
Notes 3: Modes of convergence

Math 733-734: Theory of Probability
Lecturer: Sebastien Roch

References: [Wil91, Chapters 2.6-2.8], [Dur10, Sections 2.2, 2.3].

1 Modes of convergence
Let (Ω, F, P) be a probability space. We will encounter various modes of convergence for sequences of RVs on (Ω, F, P).

DEF 3.1 (Modes of convergence) Let {X_n}_n be a sequence of (not necessarily independent) RVs and let X be a RV. Then we have the following definitions.

• Convergence in probability: ∀ε > 0, P[|X_n − X| > ε] → 0 (as n → +∞), which we denote by X_n →_P X.

• Almost sure convergence: P[X_n → X] = 1.

• Convergence in L^p (p ≥ 1): E|X_n − X|^p → 0.


To better understand the relationship between these different modes of convergence, we will need Markov's inequality as well as the Borel-Cantelli lemmas. We first state these, then come back to applications of independent interest below.

1.1 Markov’s inequality


LEM 3.2 (Markov's inequality) Let Z ≥ 0 be a RV on (Ω, F, P). Then for all a > 0,

P[Z ≥ a] ≤ E[Z]/a.
Proof: We have

E[Z] ≥ E[Z 1_{Z≥a}] ≥ a E[1_{Z≥a}] = a P[Z ≥ a],

where the first inequality uses the nonnegativity of Z.


Recall that (assuming the first and second moments exist):

Var[X] = E[(X − E[X])^2] = E[X^2] − (E[X])^2.


LEM 3.3 (Chebyshev's inequality) Let X be a RV on (Ω, F, P) with Var[X] < +∞. Then for all a > 0,

P[|X − E[X]| > a] ≤ Var[X]/a^2.
Proof: Apply Markov's inequality to Z = (X − E[X])^2.
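Both inequalities are easy to check numerically. The following is a minimal NumPy sketch (the exponential choice of Z and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.exponential(1.0, size=1_000_000)  # Z >= 0 with E[Z] = 1, Var[Z] = 1

# Markov: P[Z >= a] <= E[Z]/a for a > 0.
for a in (1.0, 2.0, 5.0):
    print(f"a={a}: P[Z >= a] ~ {(z >= a).mean():.4f} <= E[Z]/a = {z.mean() / a:.4f}")

# Chebyshev: P[|Z - E[Z]| > a] <= Var[Z]/a^2.
for a in (1.0, 2.0, 3.0):
    lhs = (np.abs(z - z.mean()) > a).mean()
    print(f"a={a}: P[|Z - E[Z]| > a] ~ {lhs:.4f} <= Var[Z]/a^2 = {z.var() / a**2:.4f}")
```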
An immediate application of Chebyshev’s inequality is the following.

THM 3.4 Let (S_n)_n be a sequence of RVs with µ_n = E[S_n] and σ_n^2 = Var[S_n], and let (b_n)_n be a sequence of positive constants. If σ_n^2/b_n^2 → 0, then

(S_n − µ_n)/b_n →_P 0.

1.2 Borel-Cantelli lemmas


DEF 3.5 (Almost surely) Event A occurs almost surely (a.s.) if P[A] = 1.

DEF 3.6 (Infinitely often, eventually) Let (A_n)_n be a sequence of events. Then we define

{A_n i.o.} ≡ {ω : ω is in infinitely many A_n} ≡ lim sup_n A_n ≡ ∩_m ∪_{n=m}^{+∞} A_n.

Note that

1_{A_n i.o.} = lim sup_n 1_{A_n}.

Similarly,

{A_n ev.} ≡ {ω : ω is in A_n for all large n} ≡ lim inf_n A_n ≡ ∪_m ∩_{n=m}^{+∞} A_n.

Note that

1_{A_n ev.} = lim inf_n 1_{A_n}.

Also we have (A_n ev.)^c = (A_n^c i.o.).

LEM 3.7 (First Borel-Cantelli lemma (BC1)) Let (A_n)_n be as above. If

∑_n P[A_n] < +∞,

then

P[A_n i.o.] = 0.

Proof: This follows trivially from the monotone-convergence theorem (or Fubini's theorem). Indeed, let N = ∑_n 1_{A_n}. Then

E[N] = ∑_n P[A_n] < +∞,

and therefore N < +∞ a.s.

EX 3.8 Let X_1, X_2, . . . be independent with P[X_n = f_n] = p_n and P[X_n = 0] = 1 − p_n for nondecreasing f_n > 0 and nonincreasing p_n > 0. By (BC1), if ∑_n p_n < +∞ then X_n → 0 a.s.

The converse is true in general only for independent sequences.

LEM 3.9 (Second Borel-Cantelli lemma (BC2)) If the events (A_n)_n are independent, then ∑_n P[A_n] = +∞ implies P[A_n i.o.] = 1.

Proof: Take M < N < +∞. Then by independence (and the inequality 1 − x ≤ e^{−x})

P[∩_{n=M}^{N} A_n^c] = ∏_{n=M}^{N} (1 − P[A_n]) ≤ exp(−∑_{n=M}^{N} P[A_n]) → 0,

as N → +∞. So P[∪_{n=M}^{+∞} A_n] = 1 and further

P[∩_M ∪_{n=M}^{+∞} A_n] = 1,

by monotonicity.

EX 3.10 Let X_1, X_2, . . . be independent with P[X_n = f_n] = p_n and P[X_n = 0] = 1 − p_n for nondecreasing f_n > 0 and nonincreasing p_n > 0. By (BC1) and (BC2), X_n → 0 a.s. if and only if ∑_n p_n < +∞.
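The dichotomy in Examples 3.8 and 3.10 is easy to observe numerically. Here is a minimal NumPy sketch (the horizon N is an arbitrary choice): with p_n = 1/n^2 the events {X_n ≠ 0} die out early, while with p_n = 1/n they keep occurring throughout the window.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
n = np.arange(1, N + 1)

# A_n = {X_n = f_n} occurs independently with probability p_n.
for label, p in (("p_n = 1/n  ", 1.0 / n), ("p_n = 1/n^2", 1.0 / n**2)):
    hits = np.flatnonzero(rng.random(N) < p) + 1  # the n's with X_n != 0
    last = hits[-1] if hits.size else None
    print(f"{label}: {hits.size} occurrences up to N={N}, last at n={last}")
```

Of course, no finite simulation can certify that infinitely many events occur; the contrast with the summable case is only suggestive.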

1.3 Returning to convergence modes


We return to our example.
EX 3.11 Let X_1, X_2, . . . be independent with P[X_n = f_n] = p_n and P[X_n = 0] = 1 − p_n for nondecreasing f_n > 0 and nonincreasing p_n > 0. The cases f_n = 1, f_n = √n, and f_n = n^2 are interesting. In the first one, convergence in probability (which is equivalent to p_n → 0) and in L^r (1^r · p_n → 0) are identical, but a.s. convergence follows from a stronger condition (∑_n p_n < +∞). In the second one, convergence in L^1 (√n p_n → 0) can happen without convergence a.s. (∑_n p_n < +∞) or in L^2 (n p_n → 0). Take for instance p_n = 1/n. In the last one, convergence a.s. (∑_n p_n < +∞) can happen without convergence in L^1 (n^2 p_n → 0) or in L^2 (n^4 p_n → 0). Take for instance p_n = 1/n^2.
In general we have:
THM 3.12 (Implications)

• a.s. =⇒ in prob (Hint: Fatou's lemma)

• L^p =⇒ in prob (Hint: Markov's inequality)

• for r ≥ p ≥ 1, L^r =⇒ L^p (Hint: Jensen's inequality)

• in prob if and only if every subsequence contains a further subsequence that converges a.s. (Hint: (BC1) for the =⇒ direction)
Proof: We prove the first, the second, and one direction of the fourth one. For the first one, we need the following lemma.

LEM 3.13 (Reverse Fatou lemma) Let (S, Σ, µ) be a measure space. Let (f_n)_n ∈ (mΣ)^+ be such that there is g ∈ (mΣ)^+ with f_n ≤ g for all n and µ(g) < +∞. Then

µ(lim sup_n f_n) ≥ lim sup_n µ(f_n).

(This follows from applying (FATOU) to g − f_n.)


Using the previous lemma on 1_{|X_n − X| > ε} gives the result.
For the second claim, note that by Markov's inequality

P[|X_n − X| > ε] = P[|X_n − X|^p > ε^p] ≤ E|X_n − X|^p / ε^p.
One direction of the fourth claim follows from (BC1). Indeed, let (X_{n(m)})_m be a subsequence of (X_n)_n. Take ε_k ↓ 0 and let m_k be such that n(m_k) > n(m_{k−1}) and

P[|X_{n(m_k)} − X| > ε_k] ≤ 2^{−k},

which is summable. Therefore, by (BC1), P[|X_{n(m_k)} − X| > ε_k i.o.] = 0, i.e., X_{n(m_k)} → X a.s. For the other direction, see [Dur10].
As a consequence of the last implication we get the following.

THM 3.14 If f is continuous and X_n → X in prob, then f(X_n) → f(X) in probability.

Proof: For every subsequence (X_{n(m)})_m there is a further subsequence (X_{n(m_k)})_k which converges a.s., and hence f(X_{n(m_k)}) → f(X) a.s. By the subsequence criterion above, this implies that f(X_n) → f(X) in probability.
Our example and theorem show that a.s. convergence does not come from a topology (in particular, not from a metric). In contrast, it is possible to show that convergence in probability corresponds to the Ky Fan metric

α(X, Y) = inf{ε ≥ 0 : P[|X − Y| > ε] ≤ ε}.

See [Dur10].
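The Ky Fan metric can be estimated from paired samples. Below is a minimal sketch (the grid search, the sample size, and the Gaussian examples are arbitrary choices):

```python
import numpy as np

def ky_fan(x, y, grid=10_000):
    """Estimate alpha(X, Y) = inf{eps >= 0 : P[|X - Y| > eps] <= eps}
    from paired samples x, y by scanning a grid of candidate eps."""
    d = np.abs(x - y)
    for eps in np.linspace(0.0, d.max(), grid):
        if (d > eps).mean() <= eps:
            return float(eps)
    return float(d.max())

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
print(ky_fan(x, x + 0.01 * rng.normal(size=x.size)))  # small: Y is close to X
print(ky_fan(x, rng.normal(size=x.size)))             # order 1: independent copies
```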

1.4 Statement of laws of large numbers


Our first goal will be to prove the following.

THM 3.15 (Strong law of large numbers) Let X_1, X_2, . . . be IID with E|X_1| < +∞. (In fact, pairwise independence suffices.) Let S_n = ∑_{k≤n} X_k and µ = E[X_1]. Then

S_n/n → µ, a.s.

If instead E|X_1| = +∞ then

P[lim_n S_n/n exists in (−∞, +∞)] = 0.

and

THM 3.16 (Weak law of large numbers) Let (X_n)_n be IID and S_n = ∑_{k≤n} X_k. A necessary and sufficient condition for the existence of constants (µ_n)_n such that

S_n/n − µ_n →_P 0,

is

n P[|X_1| > n] → 0.

In that case, the choice

µ_n = E[X_1 1_{|X_1|≤n}]

works.
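Both behaviors in Theorem 3.15 are visible in simulation. A minimal sketch (the exponential and Cauchy distributions are arbitrary stand-ins for the finite- and infinite-mean cases):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
checkpoints = (100, 10_000, 1_000_000)

# E|X_1| < +infinity: S_n/n settles at mu = 1.
s = np.cumsum(rng.exponential(1.0, size=N))
print([round(s[k - 1] / k, 4) for k in checkpoints])

# E|X_1| = +infinity (Cauchy): S_n/n keeps fluctuating; no limit exists a.s.
s = np.cumsum(rng.standard_cauchy(size=N))
print([round(s[k - 1] / k, 4) for k in checkpoints])
```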

Before we give the proofs of these theorems, we discuss further applications of Markov's inequality and the Borel-Cantelli lemmas.

2 Further applications...
2.1 ...of Chebyshev’s inequality
Chebyshev's inequality and Theorem 3.4 can be used to derive limit laws in some cases where the sequences are not necessarily IID. We give several important examples from [Dur10].
EX 3.17 (Occupancy problem) Suppose we throw r balls into n bins independently and uniformly at random. Let N_n be the number of empty bins. If A_i is the event that the i-th bin is empty, we have

P[A_i] = (1 − 1/n)^r,

so that N_n = ∑_{k≤n} 1_{A_k} (not independent) and

E[N_n] = n (1 − 1/n)^r.

In particular, if r/n → ρ we have

E[N_n]/n → e^{−ρ}.
Because there is no independence, the variance calculation is trickier. Note that

E[N_n^2] = E[(∑_{m=1}^{n} 1_{A_m})^2] = ∑_{1≤m,m'≤n} P[A_m ∩ A_{m'}],

and

Var[N_n] = E[N_n^2] − (E[N_n])^2
= ∑_{1≤m,m'≤n} [P[A_m ∩ A_{m'}] − P[A_m] P[A_{m'}]]
= n(n − 1)[(1 − 2/n)^r − (1 − 1/n)^{2r}] + n[(1 − 1/n)^r − (1 − 1/n)^{2r}]
= o(n^2) + O(n),

where we divided the sum into the cases m ≠ m' and m = m'. Taking b_n = n in Theorem 3.4, we have

N_n/n →_P e^{−ρ}.
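A minimal simulation sketch of the occupancy limit (ρ = 1 and the values of n are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def empty_fraction(n, r):
    """Throw r balls into n bins uniformly at random; return N_n / n."""
    counts = np.bincount(rng.integers(0, n, size=r), minlength=n)
    return (counts == 0).mean()

rho = 1.0
for n in (100, 10_000, 1_000_000):
    print(f"n={n}: N_n/n = {empty_fraction(n, int(rho * n)):.4f}  (e^-rho = {np.exp(-rho):.4f})")
```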

EX 3.18 (Coupon collector's problem) Let X_1, X_2, . . . be IID uniform in [n] = {1, . . . , n}. We are interested in the time it takes to see every element of [n] at least once. Let

τ_k^n = inf{m : |{X_1, . . . , X_m}| = k}

be the first time we have collected k different items, with the convention τ_0^n = 0. Let T_n = τ_n^n. Define X_{n,k} = τ_k^n − τ_{k−1}^n and note that the X_{n,k}'s are independent (but not identically distributed) with geometric distribution with parameter 1 − (k − 1)/n. Recall that a geometric RV N with parameter p has law

P[N = i] = p(1 − p)^{i−1},

and moments

E[N] = 1/p, Var[N] = (1 − p)/p^2 ≤ 1/p^2.

Hence

E[T_n] = ∑_{k=1}^{n} (1 − (k − 1)/n)^{−1} = n ∑_{m=1}^{n} 1/m ∼ n log n,

and

Var[T_n] ≤ ∑_{k=1}^{n} (1 − (k − 1)/n)^{−2} = n^2 ∑_{m=1}^{n} 1/m^2 ≤ C n^2,

for some C > 0 not depending on n. Taking b_n = n log n in Theorem 3.4 gives

(T_n − n ∑_{m=1}^{n} m^{−1}) / (n log n) →_P 0,

or

T_n / (n log n) →_P 1.

The previous example involved a so-called triangular array {X_{n,k}}_{n≥1, 1≤k≤n}.
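Since the X_{n,k}'s are independent geometrics, T_n can be sampled directly from that representation. A minimal sketch (the values of n are arbitrary; NumPy's Generator.geometric counts trials to the first success, matching P[N = i] = p(1 − p)^{i−1}):

```python
import numpy as np

rng = np.random.default_rng(0)

def collection_time(n):
    """Sample T_n = sum of independent X_{n,k} ~ Geometric(1 - (k-1)/n)."""
    k = np.arange(1, n + 1)
    return rng.geometric(1.0 - (k - 1) / n).sum()

for n in (100, 10_000, 1_000_000):
    print(f"n={n}: T_n/(n log n) = {collection_time(n) / (n * np.log(n)):.3f}")
```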

EX 3.19 (Random permutations) Any permutation can be decomposed into cycles. E.g., if π = [3, 9, 6, 8, 2, 1, 5, 4, 7], then π = (136)(2975)(48). In fact, a uniform permutation can be generated by following a cycle until it closes, then starting over from the smallest unassigned element, and so on. Let X_{n,k} be the indicator that the k-th element in this construction closes a cycle. E.g., we have X_{9,3} = X_{9,7} = X_{9,9} = 1. The construction above implies that the X_{n,k}'s are independent and

P[X_{n,j} = 1] = 1/(n − j + 1).

That is because only one of the remaining elements closes the cycle. Letting S_n = ∑_{k≤n} X_{n,k} be the number of cycles in π, we have

E[S_n] = ∑_{j=1}^{n} 1/(n − j + 1) ∼ log n,

and

Var[S_n] = ∑_{j=1}^{n} Var[X_{n,j}] ≤ ∑_{j=1}^{n} E[X_{n,j}^2] = ∑_{j=1}^{n} E[X_{n,j}] = E[S_n].

Taking b_n = log n in Theorem 3.4 we have

S_n / log n →_P 1.
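The proof suggests a direct way to simulate S_n without building the permutation: sum independent Bernoulli(1/(n − j + 1)) indicators. A minimal sketch (values of n arbitrary; convergence is slow because Var[S_n] is itself of order log n):

```python
import numpy as np

rng = np.random.default_rng(0)

# S_n = number of cycles = sum over j of independent indicators X_{n,j}
# with P[X_{n,j} = 1] = 1/(n - j + 1).
for n in (100, 10_000, 1_000_000):
    j = np.arange(1, n + 1)
    s_n = (rng.random(n) < 1.0 / (n - j + 1)).sum()
    print(f"n={n}: S_n/log n = {s_n / np.log(n):.3f}")
```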

2.2 ...of (BC1)

EX 3.20 (Head runs) Let (X_n)_{n∈Z} be IID with P[X_n = 1] = P[X_n = −1] = 1/2. Let

ℓ_n = max{m ≥ 1 : X_{n−m+1} = · · · = X_n = 1}

(with ℓ_n = 0 if X_n = −1) and

L_n = max_{1≤m≤n} ℓ_m.

Note that P[ℓ_n = k] = (1/2)^{k+1} for all n, k. (The +1 in the exponent is for the first −1.) We will prove

L_n / log_2 n → 1, a.s.
For the lower bound, it suffices to divide the sequence into disjoint blocks to use independence. Take blocks of size [(1 − ε) log_2 n] + 1 so that a block is all-1 with probability at least

2^{−[(1−ε) log_2 n]−1} ≥ n^{−(1−ε)}/2.

For n large enough,

P[L_n ≤ (1 − ε) log_2 n] ≤ (1 − n^{−(1−ε)}/2)^{n/log_2 n} ≤ exp(−n^ε / (2 log_2 n)),

which is summable. By (BC1),

lim inf_n L_n / log_2 n ≥ 1 − ε, a.s.
The upper bound also follows from (BC1). Indeed, note that, for any ε > 0,

P[ℓ_n ≥ (1 + ε) log_2 n] = ∑_{k≥(1+ε) log_2 n} (1/2)^{k+1} ≤ n^{−(1+ε)},

so that

P[ℓ_n ≥ (1 + ε) log_2 n i.o.] = 0.

Hence there is N_ε (random) such that ℓ_n ≤ (1 + ε) log_2 n for all n ≥ N_ε, and note that the ℓ_n's with n < N_ε are finite a.s. as they have finite expectation. Therefore

lim sup_n L_n / log_2 n ≤ 1 + ε, a.s.

Since ε is arbitrary, we get the upper bound.
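A minimal simulation sketch of the head-run growth (0/1 coins stand in for the ±1 coins, which does not change L_n; the values of n are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def longest_run(bits):
    """Length of the longest run of 1s in a 0/1 array."""
    padded = np.concatenate(([0], bits, [0]))
    zero_positions = np.flatnonzero(padded == 0)
    return int(np.diff(zero_positions).max() - 1)

for n in (1_000, 100_000, 10_000_000):
    bits = rng.integers(0, 2, size=n)
    print(f"n={n}: L_n/log2(n) = {longest_run(bits) / np.log2(n):.3f}")
```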

2.3 ...of (BC2)

We will need a more refined version of (BC2).

THM 3.21 If A_1, A_2, . . . are pairwise independent and ∑_n P[A_n] = +∞, then

∑_{m=1}^{n} 1_{A_m} / ∑_{m=1}^{n} P[A_m] → 1, a.s.

Proof: Convergence in probability follows from Chebyshev's inequality. Let X_k = 1_{A_k} and S_n = ∑_{k≤n} X_k. Then by pairwise independence

Var[S_n] = ∑_{k≤n} Var[X_k] ≤ ∑_{k≤n} E[X_k^2] = ∑_{k≤n} E[X_k] = ∑_{k≤n} P[A_k] = E[S_n],

using X_k ∈ {0, 1}. Then

P[|S_n − E[S_n]| > δ E[S_n]] ≤ Var[S_n] / (δ^2 E[S_n]^2) ≤ 1 / (δ^2 E[S_n]) → 0,

by assumption. In particular,

S_n / E[S_n] →_P 1.
We use a standard trick to obtain almost sure convergence. The idea is to take subsequences, use (BC1), and sandwich the original sequence.
1. Take

n_k = inf{n : E[S_n] ≥ k^2},

and let T_k = S_{n_k}. Since E[X_n] ≤ 1 we have in particular k^2 ≤ E[T_k] ≤ k^2 + 1. Using Chebyshev again,

P[|T_k − E[T_k]| > δ E[T_k]] ≤ 1/(δ^2 k^2),

which is summable, so that, using (BC1) and the fact that δ is arbitrary,

T_k / E[T_k] → 1, a.s.

2. For n_k ≤ n < n_{k+1}, we have by monotonicity

T_k / E[T_{k+1}] ≤ S_n / E[S_n] ≤ T_{k+1} / E[T_k].

Finally, note that

(E[T_k] / E[T_{k+1}]) · (T_k / E[T_k]) ≤ S_n / E[S_n] ≤ (T_{k+1} / E[T_{k+1}]) · (E[T_{k+1}] / E[T_k]),

and

k^2 ≤ E[T_k] ≤ E[T_{k+1}] ≤ (k + 1)^2 + 1.

Since the ratio of the two extreme expectations goes to 1, both the lower and upper bounds converge to 1 a.s., and we are done.

We will see this argument again when we prove the strong law of large numbers.

EX 3.22 (Record values) Let X_1, X_2, . . . be a sequence of IID RVs with a continuous DF F corresponding to, say, an individual's times in a race. Let

A_k = {X_k > sup_{j<k} X_j},

that is, A_k is the event that time k sets a new record. Letting R_n = ∑_{m≤n} 1_{A_m}, we will prove that

R_n / log n → 1, a.s.

Because F is continuous, there are no atoms and P[X_j = X_k] = 0 for j ≠ k. Let Y_1^n > · · · > Y_n^n be the sequence X_1, . . . , X_n in decreasing order. By the IID assumption, the permutation π_n defined by π_n(i) = j if X_i = Y_j^n is uniform by symmetry. In particular,

P[A_n] = P[π_n(n) = 1] = 1/n.

Moreover, for any m_1 < m_2, note that on A_{m_2} the distribution of the relative ordering of the X_i's for i < m_2 is unchanged by symmetry, and therefore

P[A_{m_1} ∩ A_{m_2}] / P[A_{m_2}] = P[A_{m_1}] = 1/m_1.

We have proved that the A_k's are pairwise independent and that P[A_k] = 1/k. Now use the fact that

∑_{i=1}^{n} 1/i ∼ log n,

and the previous theorem. This proves the claim.
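A minimal simulation sketch of the record-count limit (uniform samples stand in for any continuous DF; convergence is slow since Var[R_n] is of order log n):

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (1_000, 100_000, 10_000_000):
    x = rng.random(n)                          # IID with continuous DF
    records = x == np.maximum.accumulate(x)    # True where a new record is set
    print(f"n={n}: R_n/log n = {records.sum() / np.log(n):.3f}")
```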

References

[Dur10] Rick Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2010.

[Wil91] David Williams. Probability with Martingales. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge, 1991.
