PROBABILITY AND STOCHASTIC PROCESS

5. Modes of Convergence
5.1. Some important inequalities. We start this section by proving a result known as Markov's inequality.
Lemma 5.1 (Markov inequality). If X is a non-negative random variable whose expected value exists, then for all a > 0,
P(X ≥ a) ≤ E(X)/a.
Proof. Observe that, since X is non-negative,
E[X] = E[X·1_{X≥a} + X·1_{X<a}] ≥ E[X·1_{X≥a}] ≥ a·P(X ≥ a).
Hence the result follows. □
Corollary 5.2. If X is a random variable such that E[|X|] < +∞, then for all a > 0,
P(|X| ≥ a) ≤ E(|X|)/a.
Example 5.1. A coin is weighted so that its probability of landing on heads is 20%. Suppose the coin is flipped 20 times. We want to find a bound for the probability that it lands on heads at least 16 times. Let X be the number of times the coin lands on heads. Then X ∼ B(20, 1/5). We use Markov's inequality to find the required bound:
P(X ≥ 16) ≤ E(X)/16 = 4/16 = 1/4.
The actual probability that this happens is
P(X ≥ 16) = Σ_{k=16}^{20} C(20, k) (0.2)^k (0.8)^{20−k} ≈ 1.38 × 10^{−8}.
Lemma 5.3 (Chebyshev's inequality). Let Y be an integrable random variable such that Var(Y) < +∞. Then for any ε > 0,
P(|Y − E(Y)| ≥ ε) ≤ Var(Y)/ε².
Proof. To get the result, take X = |Y − E(Y)|² and a = ε² in Markov's inequality. □
Example 5.2. Is there any random variable X for which
P(µ − 3σ ≤ X ≤ µ + 3σ) = 1/2,
where µ = E(X) and σ² = Var(X)?
Solution: Observe that
P(µ − 3σ ≤ X ≤ µ + 3σ) = P(|X − µ| ≤ 3σ) = 1 − P(|X − E(X)| > 3σ).
By Chebyshev's inequality, we get that
P(|X − E(X)| > 3σ) ≤ P(|X − E(X)| ≥ 3σ) ≤ σ²/(3σ)² = 1/9,
and hence
P(µ − 3σ ≤ X ≤ µ + 3σ) ≥ 1 − 1/9 = 8/9.
Since 1/2 < 8/9, there exists NO random variable X satisfying the given condition.

Although Chebyshev's inequality concerns the distance from the mean in either direction, it can still be used to bound how often a random variable takes large values, and it will usually give much better bounds than Markov's inequality. For example, consider Example 5.1. Markov's inequality gives a bound of 1/4. Using Chebyshev's inequality (Var(X) = 20 · (0.2) · (0.8) = 3.2), we see that
P(X ≥ 16) = P(X − 4 ≥ 12) ≤ P(|X − 4| ≥ 12) ≤ Var(X)/144 = 1/45.
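Both bounds, together with the exact binomial tail probability, can be checked numerically; the following short Python sketch (standard library only) reproduces the numbers above.

from math import comb

n, p = 20, 0.2
mean, var = n * p, n * p * (1 - p)                 # E(X) = 4, Var(X) = 3.2

# exact tail probability P(X >= 16) for X ~ B(20, 0.2)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(16, n + 1))

markov = mean / 16                                  # Markov bound E(X)/16
chebyshev = var / 12**2                             # Chebyshev bound Var(X)/12^2, since {X >= 16} ⊂ {|X - 4| >= 12}
print(f"exact ≈ {exact:.2e}, Markov = {markov}, Chebyshev ≈ {chebyshev:.4f}")
# prints roughly: exact ≈ 1.38e-08, Markov = 0.25, Chebyshev ≈ 0.0222
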
Lemma 5.4 (One-sided Chebyshev inequality). Let X be a random variable with mean 0 and variance σ² < +∞. Then for any a > 0,
P(X ≥ a) ≤ σ²/(σ² + a²).
Proof. For any b ≥ 0, we see that X ≥ a is equivalent to X + b ≥ a + b. Hence by Markov's inequality, we have
P(X ≥ a) = P(X + b ≥ a + b) ≤ P((X + b)² ≥ (a + b)²) ≤ E[(X + b)²]/(a + b)² = (σ² + b²)/(a + b)²
⟹ P(X ≥ a) ≤ min_{b≥0} (σ² + b²)/(a + b)² = σ²/(σ² + a²),
the minimum being attained at b = σ²/a. □
One can use the one-sided Chebyshev inequality to arrive at the following corollary.
Corollary 5.5. If E[X] = µ and Var(X) = σ², then for any a > 0,
P(X ≥ µ + a) ≤ σ²/(σ² + a²),
P(X ≤ µ − a) ≤ σ²/(σ² + a²).
Example 5.3. Let X be a Poisson random variable with mean 20. Show that the one-sided Chebyshev inequality gives a better upper bound on P(X ≥ 26) compared to the Markov and Chebyshev inequalities. Indeed, by Markov's inequality, we have
p = P(X ≥ 26) ≤ E[X]/26 = 10/13.
By Chebyshev's inequality, we get
p = P(X − 20 ≥ 6) ≤ P(|X − 20| ≥ 6) ≤ Var(X)/36 = 10/18.
The one-sided Chebyshev inequality gives
p = P(X − 20 ≥ 6) ≤ Var(X)/(Var(X) + 36) = 10/28.
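The three bounds can be compared with the exact Poisson tail numerically; a short Python sketch (standard library only):

from math import exp, factorial

lam = 20.0
# exact tail: P(X >= 26) = 1 - P(X <= 25) for X ~ Poisson(20)
exact = 1.0 - sum(exp(-lam) * lam**k / factorial(k) for k in range(26))

markov = lam / 26                       # E[X]/26 = 10/13
chebyshev = lam / 36                    # Var(X)/6^2 = 10/18
one_sided = lam / (lam + 36)            # Var(X)/(Var(X) + 6^2) = 10/28
print(f"exact ≈ {exact:.3f}, Markov ≈ {markov:.3f}, Chebyshev ≈ {chebyshev:.3f}, one-sided ≈ {one_sided:.3f}")
# the one-sided bound (≈ 0.357) is the closest to the exact value (≈ 0.11)
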
Theorem 5.6 (Weak Law of Large Numbers). Let {Xi} be a sequence of i.i.d. random variables with finite mean µ and variance σ². Then for any ε > 0,
P(|Sn/n − µ| > ε) ≤ σ²/(nε²),
where Sn = Σ_{i=1}^n Xi. In particular,
lim_{n→∞} P(|Sn/n − µ| > ε) = 0.
Proof. The first inequality follows from Chebyshev's inequality applied to Sn/n, which has mean µ and variance σ²/n. Letting n tend to infinity in the first inequality, we arrive at the second result. □

We shall discuss various modes of convergence for a given sequence of random variables {Xn }
defined on a given probability space (Ω, F, P).
Definition 5.1 (Convergence in probability). We say that {Xn } converges to a random
variable X, defined on the same probability space (Ω, F, P), in probability if for every ε > 0,
lim_{n→∞} P(|Xn − X| > ε) = 0.
We denote it by Xn →^P X.
Example 5.4. Let {Xn} be a sequence of random variables such that P(Xn = 0) = 1 − 1/n and P(Xn = n) = 1/n. Then Xn →^P 0. Indeed, for any ε > 0,
P(|Xn| > ε) = 1/n if ε < n, and P(|Xn| > ε) = 0 if ε ≥ n.
Hence, lim_{n→∞} P(|Xn| > ε) = 0.
Example 5.5. Let {Xn} be a sequence of i.i.d. random variables with P(Xn = 1) = 1/2 and P(Xn = −1) = 1/2. Then (1/n) Σ_{i=1}^n Xi converges to 0 in probability. Indeed, for any ε > 0, thanks to the weak law of large numbers, we have
P(|Sn/n − µ| > ε) ≤ Var(X1)/(nε²),
where µ = E(X1). Observe that µ = 0 and Var(X1) = 1. Hence
P(|(1/n) Σ_{i=1}^n Xi| > ε) ≤ 1/(nε²) → 0 as n → ∞.
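The bound in Example 5.5 can also be observed empirically; the following Python sketch (standard library only, with arbitrary seed and sample sizes) estimates P(|Sn/n| > ε) by simulation and compares it with 1/(nε²).

import random

random.seed(0)
eps, trials = 0.1, 2000

for n in (10, 100, 1000):
    exceed = sum(
        abs(sum(random.choice((-1, 1)) for _ in range(n))) / n > eps
        for _ in range(trials)
    )
    bound = 1 / (n * eps**2)            # WLLN bound: Var(X1)/(n eps^2) with Var(X1) = 1
    print(f"n = {n:>4}: empirical ≈ {exceed / trials:.3f}, bound = {bound:.3f}")
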
Theorem 5.7. Xn →^P X if and only if lim_{n→∞} E[ |Xn − X| / (1 + |Xn − X|) ] = 0.
Proof. Without loss of generality, take X = 0. Thus, we want to show that Xn →^P 0 if and only if lim_{n→∞} E[ |Xn| / (1 + |Xn|) ] = 0.
Suppose Xn →^P 0. Then given ε > 0, we have lim_{n→∞} P(|Xn| > ε) = 0. Now,
|Xn|/(1 + |Xn|) = (|Xn|/(1 + |Xn|)) 1_{|Xn|>ε} + (|Xn|/(1 + |Xn|)) 1_{|Xn|≤ε} ≤ 1_{|Xn|>ε} + ε
⟹ E[ |Xn|/(1 + |Xn|) ] ≤ P(|Xn| > ε) + ε ⟹ lim_{n→∞} E[ |Xn|/(1 + |Xn|) ] ≤ ε.
Since ε > 0 is arbitrary, we have lim_{n→∞} E[ |Xn|/(1 + |Xn|) ] = 0.
Conversely, let lim_{n→∞} E[ |Xn|/(1 + |Xn|) ] = 0. Observe that the function f(x) = x/(1 + x) is strictly increasing on [0, ∞). Thus,
(ε/(1 + ε)) 1_{|Xn|>ε} ≤ (|Xn|/(1 + |Xn|)) 1_{|Xn|>ε} ≤ |Xn|/(1 + |Xn|)
⟹ (ε/(1 + ε)) P(|Xn| > ε) ≤ E[ |Xn|/(1 + |Xn|) ]
⟹ lim_{n→∞} P(|Xn| > ε) ≤ ((1 + ε)/ε) lim_{n→∞} E[ |Xn|/(1 + |Xn|) ] = 0 ⟹ Xn →^P 0. □

Definition 5.2 (Convergence in r-th mean). Let X, {Xn} be random variables defined on a given probability space (Ω, F, P) such that for r ∈ N, E[|X|^r] < ∞ and E[|Xn|^r] < ∞ for all n. We say that {Xn} converges in the r-th mean to X, denoted by Xn →^r X, if the following holds:
lim_{n→∞} E[|Xn − X|^r] = 0.
Example 5.6. Let {Xn} be i.i.d. random variables with E[Xn] = µ and Var(Xn) = σ². Define Yn = (1/n) Σ_{i=1}^n Xi. Then Yn →^2 µ. Indeed,
E[|Yn − µ|²] = E[ |(Σ_{i=1}^n Xi − nµ)/n|² ] = (1/n²) E[|Sn − E(Sn)|²] = (1/n²) Var(Sn) = σ²/n,
where Sn = Σ_{i=1}^n Xi. Hence Yn →^2 µ.
Theorem 5.8. The following hold:
i) Xn →^r X ⟹ Xn →^P X for any r ≥ 1.
ii) Let f be a given continuous function. If Xn →^P X, then f(Xn) →^P f(X).
Proof. Proof of (i) follows from Markov's inequality. Indeed, for any given ε > 0,
P(|Xn − X| > ε) ≤ E[|Xn − X|^r]/ε^r
⟹ lim_{n→∞} P(|Xn − X| > ε) ≤ (1/ε^r) lim_{n→∞} E[|Xn − X|^r] = 0.
Proof of (ii): For any k > 0, we see that
{|f (Xn ) − f (X)| > ε} ⊂ {|f (Xn ) − f (X)| > ε, |X| ≤ k} ∪ {|X| > k} .
Since f is continuous, it is uniformly continuous on any bounded interval. Therefore, for any
given ε > 0, there exists δ > 0 such that |f (x) − f (y)| ≤ ε if |x − y| ≤ δ for x and y in [−k, k].
This means that
{|f (Xn ) − f (X)| > ε, |X| ≤ k} ⊂ {|Xn − X| > δ, |X| ≤ k} ⊂ {|Xn − X| > δ} .
Thus we have
{|f (Xn ) − f (X)| > ε} ⊂ {|Xn − X| > δ} ∪ {|X| > k}
=⇒ P(|f (Xn ) − f (X)| > ε) ≤ P(|Xn − X| > δ) + P(|X| > k).
Since Xn →^P X and lim_{k→∞} P(|X| > k) = 0, we obtain that lim_{n→∞} P(|f(Xn) − f(X)| > ε) = 0.
This completes the proof. □

In general, convergence in probability does not imply convergence in r-th mean. To see it,
consider the following example.
Example 5.7. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Let Xn = n·1_{(0,1/n)}. Then Xn →^P 0, but Xn does not converge to 0 in the r-th mean for any r ≥ 1. To show this, observe that
P(|Xn| > ε) ≤ 1/n ⟹ lim_{n→∞} P(|Xn| > ε) = 0, i.e., Xn →^P 0.
On the other hand, for r ≥ 1,
E[|Xn|^r] = ∫₀^{1/n} n^r dx = n^{r−1} ↛ 0 as n → ∞.
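Sampling from this construction shows the two behaviours side by side; in the sketch below (our own Python illustration, with arbitrary seed, ε and r) the exceedance probability shrinks like 1/n while the empirical second moment grows like n.

import random

random.seed(1)
samples, eps, r = 100_000, 0.5, 2

for n in (10, 100, 1000):
    xs = [n if random.random() < 1 / n else 0 for _ in range(samples)]   # Xn = n·1_(0,1/n)(U)
    p_exceed = sum(x > eps for x in xs) / samples                        # ≈ 1/n → 0
    rth_mean = sum(x**r for x in xs) / samples                           # ≈ n^(r-1), diverges for r = 2
    print(f"n = {n:>4}: P(|Xn| > {eps}) ≈ {p_exceed:.4f}, E[|Xn|^{r}] ≈ {rth_mean:.1f}")
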

Definition 5.3 (Almost sure convergence). Let X, {Xn } be random variables defined on
a given probability space (Ω, F, P). We say that {Xn } converges to X almost surely or with
probability 1 if the following holds:
P(lim_{n→∞} Xn = X) = 1.
We denote it by Xn →^{a.s.} X.
Example 5.8. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Define
Xn(ω) = 1 if ω ∈ (0, 1 − 1/n), and Xn(ω) = n otherwise.
It is easy to check that if ω = 0 or ω = 1, then lim_{n→∞} Xn(ω) = ∞. For any ω ∈ (0, 1), we can find n0 ∈ N such that ω ∈ (0, 1 − 1/n) for all n ≥ n0. As a consequence, Xn(ω) = 1 for all n ≥ n0. In other words, for ω ∈ (0, 1), lim_{n→∞} Xn(ω) = 1. Define X(ω) = 1 for all ω ∈ [0, 1]. Then
P(ω ∈ [0, 1] : {Xn(ω)} does not converge to X(ω)) = P({0, 1}) = 0 ⟹ Xn →^{a.s.} 1.
Sufficient condition for almost sure convergence: Let {An} be a sequence of events in F. Define
lim sup_n An = ∩_{n=1}^∞ (∪_{m≥n} Am) = lim_{n→∞} ∪_{m≥n} Am.
This can be interpreted probabilistically as
lim sup_n An = "An occurs infinitely often".
We denote this as
{An i.o.} = lim sup_n An.

Theorem 5.9 (Borel-Cantelli lemma). Let {An} be a sequence of events in (Ω, F, P).
i) If Σ_{n=1}^∞ P(An) < +∞, then P(An i.o.) = 0.
ii) If the An are mutually independent events, and if Σ_{n=1}^∞ P(An) = ∞, then P(An i.o.) = 1.
Remark 5.1. For mutually independent events An, since Σ_{n=1}^∞ P(An) is either finite or infinite, the event {An i.o.} has probability either 0 or 1. This is sometimes called a zero-one law.
As a consequence of the Borel-Cantelli lemma, we have the following proposition.
Proposition 5.10. Let {Xn} be a sequence of random variables defined on a probability space (Ω, F, P). If Σ_{n=1}^∞ P(|Xn| > ε) < +∞ for any ε > 0, then Xn →^{a.s.} 0.

Proof. Fix ε > 0. Let An = {|Xn| > ε}. Then Σ_{n=1}^∞ P(An) < +∞, and hence by the Borel-Cantelli lemma, P(An i.o.) = 0. Now
(lim sup_n An)^c = {ω : ∃ n0(ω) such that |Xn(ω)| ≤ ε for all n ≥ n0(ω)} =: Bε.
Thus, P(Bε) = 1. Let B = ∩_{r=1}^∞ B_{1/r} ⟹ B^c = ∪_{r=1}^∞ B_{1/r}^c. Moreover, since P(B_{1/r}) = 1 (take ε = 1/r), we have P(B_{1/r}^c) = 0. Observe that
{ω : lim_{n→∞} |Xn(ω)| = 0} = ∩_{r=1}^∞ B_{1/r}.
Again, P(B^c) ≤ Σ_{r=1}^∞ P(B_{1/r}^c) = 0, and hence P(B) = 1. In other words,
P({ω : lim_{n→∞} |Xn(ω)| = 0}) = 1, i.e., Xn →^{a.s.} 0. □


Example 5.9. Let {Xn} be a sequence of i.i.d. random variables such that P(Xn = 1) = 1/2 and P(Xn = −1) = 1/2. Let Sn = Σ_{i=1}^n Xi. Then (1/n²) S_{n²} →^{a.s.} 0. To show the result, we use Proposition 5.10. Note that
P((1/n²)|S_{n²}| > ε) ≤ E[|S_{n²}|²]/(n⁴ε²) ≤ 1/(n²ε²) ⟹ Σ_{n=1}^∞ P((1/n²)|S_{n²}| > ε) < ∞ ⟹ (1/n²) S_{n²} →^{a.s.} 0.
Let us consider the following example.


Example 5.10. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Define
Xn = 1_{[j/2^k, (j+1)/2^k]}, where n = 2^k + j, j ∈ {0, . . . , 2^k − 1}, k = ⌊log₂(n)⌋.
Note that, for each positive integer n, there exist uniquely determined integers j and k such that
n = 2^k + j, j ∈ {0, . . . , 2^k − 1}, k = ⌊log₂(n)⌋
(for n = 1, k = j = 0; for n = 5, k = 2, j = 1; and so on). Let An = {Xn > 0}. Then, clearly P(An) = 2^{−k} → 0. Consequently, Xn →^P 0, but Xn(ω) ↛ 0 for any ω ∈ Ω.
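To see this behaviour concretely, one can follow a single sample point; the Python sketch below (ω = 0.3 is an arbitrary choice of ours) lists indices n at which Xn(ω) = 1, which keep appearing even though P(Xn > 0) = 2^{−k} → 0.

from math import floor, log2

omega = 0.3                                  # a fixed point of Ω = [0, 1]
hits = []
for n in range(1, 2**12):
    k = floor(log2(n))
    j = n - 2**k                             # n = 2^k + j
    if j / 2**k <= omega <= (j + 1) / 2**k:  # Xn(omega) = 1 exactly on [j/2^k, (j+1)/2^k]
        hits.append(n)

print("last indices with Xn(omega) = 1:", hits[-5:])    # every dyadic block contributes at least one
print("P(Xn > 0) in the last block:", 2.0**(-11))       # = 1/2048, yet the hits never stop
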
Theorem 5.11. The following hold.
i) If Xn →^{a.s.} X, then Xn →^P X.
ii) If Xn →^P X, then there exists a subsequence X_{n_k} of Xn such that X_{n_k} →^{a.s.} X.
iii) If Xn →^{a.s.} X, then for any continuous function f, f(Xn) →^{a.s.} f(X).
Proof. Proof of i): For any ε > 0, define A_n^ε = {|Xn − X| > ε} and B_m^ε = ∪_{n=m}^∞ A_n^ε. Since Xn →^{a.s.} X, P(∩_m B_m^ε) = 0. Note that {B_m^ε} is a nested, decreasing sequence of events. Hence, from the continuity of the probability measure P, we have
lim_{m→∞} P(B_m^ε) = P(∩_m B_m^ε) = 0.
Since A_m^ε ⊂ B_m^ε, we have P(A_m^ε) ≤ P(B_m^ε). This implies that lim_{m→∞} P(A_m^ε) = 0. In other words, Xn →^P X.
Proof of ii): To prove ii), we will use the Borel-Cantelli lemma. Since Xn →^P X, we can choose a subsequence X_{n_k} such that P(|X_{n_k} − X| > 1/k) ≤ 1/2^k. Let A_k := {|X_{n_k} − X| > 1/k}. Then Σ_{k=1}^∞ P(A_k) < +∞. Hence, by the Borel-Cantelli lemma, P(A_k i.o.) = 0. This implies that
P(∪_{n=1}^∞ ∩_{m=n}^∞ A_m^c) = 1 ⟹ P({ω ∈ Ω : ∃ n0 : ∀ k ≥ n0, |X_{n_k}(ω) − X(ω)| ≤ 1/k}) = 1
⟹ X_{n_k} →^{a.s.} X.
Proof of iii): Let N = {ω : lim_{n→∞} Xn(ω) ≠ X(ω)}. Then P(N) = 0. If ω ∉ N, then by the continuity of f, we have
lim_{n→∞} f(Xn(ω)) = f(lim_{n→∞} Xn(ω)) = f(X(ω)).
This is true for any ω ∉ N and P(N) = 0. Hence f(Xn) →^{a.s.} f(X). □

Definition 5.4 (Convergence in distribution). Let X, X1, X2, . . . be real-valued random variables with distribution functions FX, FX1, FX2, . . ., respectively. We say that (Xn) converges to X in distribution, denoted by Xn →^d X, if
lim_{n→∞} FXn(x) = FX(x) for all continuity points x of FX.
Remark 5.2. In the above definition, the random variables X, {Xn} need not be defined on the same probability space.
Example 5.11. Let Xn = 1/n and X = 0. Then
FXn(x) = P(Xn ≤ x) = 1 if x ≥ 1/n and 0 otherwise, while FX(x) = 1 if x ≥ 0 and 0 if x < 0.
Observe that 0 is the only discontinuity point of FX, and lim_{n→∞} FXn(x) = FX(x) for x ≠ 0. Thus, Xn →^d 0.
Example 5.12. Let X be a real-valued random variable with distribution function F. Define Xn = X + 1/n. Then
FXn(x) = P(X + 1/n ≤ x) = F(x − 1/n)
⟹ lim_{n→∞} FXn(x) = F(x−) = F(x) for every continuity point x of F.
This implies that Xn →^d X.
Theorem 5.12. Xn →^P X implies that Xn →^d X.
Proof. Let ε > 0. Since FXn (t) = P(Xn ≤ t), we have
FXn (t) = P(Xn ≤ t, |Xn − X| > ε) + P (Xn ≤ t, |Xn − X| ≤ ε)
≤ P(|Xn − X| > ε) + P(Xn ≤ t, |Xn − X| ≤ ε)
≤ P(|Xn − X| > ε) + P(X ≤ t + ε)
≤ P(|Xn − X| > ε) + FX (t + ε),
and
FX (t − ε) = P(X ≤ t − ε) = P(X ≤ t − ε, |Xn − X| > ε) + P(X ≤ t − ε, |Xn − X| ≤ ε)
≤ P(|Xn − X| > ε) + P(X ≤ t − ε, |Xn − X| ≤ ε)
≤ P(|Xn − X| > ε) + P(Xn ≤ t)
≤ P(|Xn − X| > ε) + FXn (t) .
Thus, since lim_{n→∞} P(|Xn − X| > ε) = 0, we obtain from the above inequalities
FX(t − ε) ≤ lim inf_{n→∞} FXn(t) ≤ lim sup_{n→∞} FXn(t) ≤ FX(t + ε).
Let t be a continuity point of F. Then sending ε → 0 in the above inequality, we get
lim_{n→∞} FXn(t) = FX(t), i.e., Xn →^d X. □

The converse of this theorem is NOT true in general.
Example 5.13. Let X ∼ N(0, 1). Define Xn = −X for n = 1, 2, 3, . . .. Then Xn ∼ N(0, 1) and hence Xn →^d X. But
P(|Xn − X| > ε) = P(|2X| > ε) = P(|X| > ε/2) ≠ 0 for every n, so Xn does not converge to X in probability.

Theorem 5.13 (Continuity theorem). Let X, {Xn} be random variables with characteristic functions φX, {φXn}, respectively. Then the following are equivalent.
i) Xn →^d X.
ii) E(g(Xn)) → E(g(X)) for every bounded Lipschitz continuous function g.
iii) lim_{n→∞} φXn(t) = φX(t) for all t ∈ R.
Example 5.14. Let {Xn} be a sequence of Poisson random variables with parameter λn = n. Define Zn = (Xn − n)/√n. Then
Zn →^d Z, where L(Z) = N(0, 1).
Solution: To see this, we use Lévy's continuity theorem. Let ΦZn : R → C be the characteristic function of Zn. Then we have
ΦZn(u) = E[e^{iuZn}] = e^{−iu√n} E[e^{i(u/√n)Xn}] = e^{−iu√n} e^{n(e^{iu/√n} − 1)}.
Using the Taylor series expansion, we have
e^{iu/√n} − 1 = iu/√n − u²/(2n) − iu³/(6n^{3/2}) + . . .
and hence we get
ΦZn(u) = e^{−iu√n} e^{n(iu/√n − u²/(2n) − iu³/(6n^{3/2}) + ...)} = e^{−iu√n} e^{iu√n} e^{−u²/2} e^{h(u,n)/√n},
where h(u, n) stays bounded in n for each u, and hence lim_{n→∞} h(u, n)/√n = 0. Therefore, we have
lim_{n→∞} ΦZn(u) = lim_{n→∞} e^{−iu√n} e^{iu√n} e^{−u²/2} e^{h(u,n)/√n} = e^{−u²/2}.
Since e^{−u²/2} is the characteristic function of N(0, 1), we conclude that Zn →^d Z.
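The convergence of these characteristic functions can also be checked numerically; a small Python sketch using the standard library's cmath (the helper name phi_Zn is ours, and the values of u and n are arbitrary):

import cmath
from math import exp, sqrt

def phi_Zn(u, n):
    # characteristic function of Zn = (Xn - n)/sqrt(n) with Xn ~ Poisson(n)
    return cmath.exp(-1j * u * sqrt(n)) * cmath.exp(n * (cmath.exp(1j * u / sqrt(n)) - 1))

u = 1.0
target = exp(-u**2 / 2)                       # characteristic function of N(0, 1) at u
for n in (10, 100, 10_000):
    print(f"n = {n:>6}: |phi_Zn(u) - exp(-u^2/2)| = {abs(phi_Zn(u, n) - target):.5f}")
# the error decays on the order of 1/sqrt(n), consistent with the expansion above
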
Theorem 5.14 (Strong law of large numbers (SLLN)). Let {Xi} be a sequence of i.i.d. random variables with finite mean µ and variance σ². Then
Sn/n →^{a.s.} µ, where Sn = Σ_{i=1}^n Xi.
The special case assuming a finite fourth-order moment is referred to as Borel's SLLN. Before proving the theorem, let us consider some examples.
Example 5.15. 1) Let {Xn} be a sequence of i.i.d. random variables that are bounded, i.e., there exists C < ∞ such that P(|X1| ≤ C) = 1. Then Sn/n →^{a.s.} E(X1).
2) Let {Xn} be a sequence of i.i.d. Bernoulli(p) random variables. Then µ = p and σ² = p(1 − p). Hence by the SLLN, we have
lim_{n→∞} (1/n) Σ_{i=1}^n Xi = p with probability 1.
To prove the theorem, we need the following lemma.
Lemma 5.15. Let {Xi} be a sequence of random variables defined on a given probability space (Ω, F, P).
i) If the {Xn} are positive, then
E[Σ_{n=1}^∞ Xn] = Σ_{n=1}^∞ E[Xn]. (5.1)
ii) If Σ_{n=1}^∞ E[|Xn|] < ∞, then Σ_{n=1}^∞ Xn converges almost surely and (5.1) holds as well.

Proof of the Strong Law of Large Numbers: Without loss of generality we can assume that µ = 0 (otherwise, if µ ≠ 0, set X̃i = Xi − µ and work with X̃i). Set Yn = Sn/n. Observe that, thanks to independence,
E[Yn] = 0, E[Yn²] = (1/n²) Σ_{1≤j,k≤n} E(Xj Xk) = (1/n²) Σ_{j=1}^n E[Xj²] = σ²/n.
Thus, lim_{n→∞} E[Yn²] = 0, and hence along a subsequence, Yn converges to 0 almost surely. But we need to show that the original sequence converges to 0 with probability 1. To do so, we proceed as follows. Since E[Yn²] = σ²/n, we see that Σ_{n=1}^∞ E[Y_{n²}²] = Σ_{n=1}^∞ σ²/n² < +∞, and hence by Lemma 5.15, ii), Σ_{n=1}^∞ Y_{n²}² converges almost surely. Thus,
lim_{n→∞} Y_{n²} = 0 with probability 1. (5.2)
Let n ∈ N. Then there exists m(n) ∈ N such that (m(n))² ≤ n < (m(n) + 1)². Now
Yn − ((m(n))²/n) Y_{(m(n))²} = (1/n) Σ_{i=1}^n Xi − ((m(n))²/n) (1/(m(n))²) Σ_{i=1}^{(m(n))²} Xi = (1/n) Σ_{i=(m(n))²+1}^n Xi
⟹ E[ (Yn − ((m(n))²/n) Y_{(m(n))²})² ] = (1/n²) Σ_{i=(m(n))²+1}^n E[Xi²]
= ((n − (m(n))²)/n²) σ² ≤ ((2m(n) + 1)/n²) σ² (since n < (m(n) + 1)²)
≤ ((2√n + 1)/n²) σ² ≤ 3σ²/n^{3/2} (since m(n) ≤ √n)
⟹ Σ_{n=1}^∞ E[ (Yn − ((m(n))²/n) Y_{(m(n))²})² ] ≤ Σ_{n=1}^∞ 3σ²/n^{3/2} < +∞.
Thus, again by Lemma 5.15, ii), we conclude that
lim_{n→∞} ( Yn − ((m(n))²/n) Y_{(m(n))²} ) = 0 with probability 1. (5.3)
Observe that lim_{n→∞} (m(n))²/n = 1. Thus, in view of (5.2) and (5.3), we conclude that
lim_{n→∞} Yn = 0 with probability 1.
This completes the proof. □


Theorem 5.16 (Kolmogorov's strong law of large numbers). Let {Xn} be a sequence of i.i.d. random variables and µ ∈ R. Then
lim_{n→∞} Sn/n = µ a.s. if and only if E[Xn] = µ.
In this case, the convergence also holds in L¹.
Example 5.16 (Monte Carlo approximation). Let f be a measurable function in [0, 1] such
R1 R1
that 0 |f (x)| dx < ∞. Let α = 0 f (x) dx. In general we cannot obtain a closed form expression
60 A. K. MAJEE

for α and need to estimate it. Let {Uj } be a sequence of independent uniformly random variables
on [0, 1]. Then by Theorem 5.16,
n Z 1
1X
lim f (Uj ) = E[f (Uj )] = f (x) dx
n→∞ n 0
j=1
R1
a.s. and in L2 . Thus, to get an approximation of 0 f (x) dx, we need to simulate the uniform
random variables Uj (by using a random number generator).
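As a concrete illustration, the following Python sketch (standard library only; the choice f(x) = exp(x²), which has no elementary antiderivative, and the seed are ours) estimates ∫₀¹ e^{x²} dx by averaging f over simulated uniforms.

import random
from math import exp

random.seed(42)

def f(x):
    return exp(x * x)

for n in (100, 10_000, 1_000_000):
    estimate = sum(f(random.random()) for _ in range(n)) / n     # (1/n) * sum_j f(U_j)
    print(f"n = {n:>9}: estimate ≈ {estimate:.4f}")
# the estimates settle near 1.463, the value of the integral
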
Theorem 5.17 (Central limit theorem). Let {Xn} be a sequence of i.i.d. random variables with finite mean µ and variance σ², with 0 < σ² < +∞. Let Yn = (Sn − nµ)/(σ√n). Then Yn converges in distribution to Y, where L(Y) = N(0, 1).
Proof. Without loss of generality, we assume that µ = 0. Let Φ, ΦYn be the characteristic functions of Xj and Yn, respectively. Since the {Xj} are i.i.d., we have
ΦYn(u) = E[e^{iuYn}] = E[e^{iu Sn/(σ√n)}] = E[e^{iu Σ_{i=1}^n Xi/(σ√n)}]
= E[ Π_{i=1}^n e^{iu Xi/(σ√n)} ] = Π_{i=1}^n E[e^{iu Xi/(σ√n)}] = (Φ(u/(σ√n)))^n.
Since E[|Xj|²] < +∞, the function Φ has two continuous derivatives. In particular,
Φ′(u) = iE[Xj e^{iuXj}], Φ″(u) = −E[Xj² e^{iuXj}] ⟹ Φ′(0) = 0, Φ″(0) = −σ².
Expanding Φ in a Taylor expansion about u = 0, we have
Φ(u) = 1 − σ²u²/2 + h(u)u², where h(u) → 0 as u → 0.
Thus, we get
ΦYn(u) = e^{n log(Φ(u/(σ√n)))} = e^{n log(1 − u²/(2n) + (u²/(nσ²)) h(u/(σ√n)))}
⟹ lim_{n→∞} ΦYn(u) = e^{−u²/2} = ΦY(u) (by L'Hôpital's rule).
Hence by Lévy's continuity theorem, we conclude that Yn converges in distribution to Y with L(Y) = N(0, 1). □
Remark 5.3. If σ² = 0, then Xj = µ a.s. for all j, and hence Sn/n = µ a.s.
One can weaken slightly the hypotheses of Theorem 5.17. Indeed, we have the following central limit theorem.
Theorem 5.18. Let {Xn} be independent but not necessarily identically distributed. Let E[Xn] = 0 for all n, and let σn² = Var(Xn). Assume that
sup_n E[|Xn|^{2+ε}] < +∞ for some ε > 0, and Σ_{n=1}^∞ σn² = ∞.
Then Sn/√(Σ_{i=1}^n σi²) converges in distribution to Y with L(Y) = N(0, 1).
Example 5.17. Let {Xn} be a sequence of i.i.d. random variables such that P(Xn = 1) = 1/2 and P(Xn = 0) = 1/2. Then µ = 1/2 and σ² = 1/4. Hence by the central limit theorem, Yn = (2Sn − n)/√n converges in distribution to Y with L(Y) = N(0, 1).
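The statement of Example 5.17 is easy to check by simulation; the sketch below (standard library only, with arbitrary seed and sample sizes) compares the empirical distribution function of Yn = (2Sn − n)/√n with the standard normal one at a few points.

import random
from statistics import NormalDist

random.seed(7)
n, trials = 500, 5000

samples = []
for _ in range(trials):
    s = sum(random.getrandbits(1) for _ in range(n))     # Sn for Bernoulli(1/2) summands
    samples.append((2 * s - n) / n**0.5)                 # Yn = (2 Sn - n)/sqrt(n)

for x in (-1.0, 0.0, 1.0):
    empirical = sum(y <= x for y in samples) / trials
    print(f"P(Yn <= {x:+.0f}): empirical ≈ {empirical:.3f}, Phi({x:+.0f}) = {NormalDist().cdf(x):.3f}")
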

Example 5.18. Let X ∼ B(n, p) with p < 1/2. For any given 0 ≤ α ≤ 1, we want to find n such that P(X > n/2) ≤ 1 − α. We can think of X as a sum of n i.i.d. random variables Xi such that Xi ∼ B(1, p). Hence, by the central limit theorem, for large n,
P(X − np ≤ x√(np(1 − p))) ≈ Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.
Choose x such that np + x√(np(1 − p)) = n/2. This implies that x = (√n/2)(1 − 2p)/√(p(1 − p)). Thus,
P(X > n/2) = 1 − P(X ≤ n/2) ≈ 1 − (1/√(2π)) ∫_{−∞}^{(√n/2)(1−2p)/√(p(1−p))} e^{−u²/2} du.
Therefore, we need to choose n such that
α ≤ (1/√(2π)) ∫_{−∞}^{(√n/2)(1−2p)/√(p(1−p))} e^{−u²/2} du ⟹ √n ≥ (2√(p(1 − p))/(1 − 2p)) Φ^{−1}(α)
⟹ n ≥ (4p(1 − p)/(1 − 2p)²) (Φ^{−1}(α))².
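The final bound is easy to evaluate for concrete values of p and α; a small Python helper (the function name required_n is ours, and the formula assumes p < 1/2):

from math import ceil
from statistics import NormalDist

def required_n(p, alpha):
    # CLT-based bound n >= 4p(1-p)/(1-2p)^2 * (Phi^{-1}(alpha))^2, valid for p < 1/2
    z = NormalDist().inv_cdf(alpha)
    return ceil(4 * p * (1 - p) / (1 - 2 * p) ** 2 * z ** 2)

print(required_n(0.4, 0.95))     # p = 0.4, alpha = 0.95 gives n = 65
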
Example 5.19. Let {Xi} be a sequence of i.i.d. exp(1)-distributed random variables. Let X̄ = (Σ_{i=1}^n Xi)/n. How large should n be such that
P(0.9 ≤ X̄ ≤ 1.1) ≥ 0.95?
Since the Xi's are exp(1) distributed, µ = E[Xi] = 1 and σ² = Var(Xi) = 1. Let Y = Σ_{i=1}^n Xi. Then by the central limit theorem, (Y − n)/√n is approximately N(0, 1). Now
P(0.9 ≤ X̄ ≤ 1.1) = P((0.9)n ≤ Y ≤ (1.1)n) = P( ((0.9)n − n)/√n ≤ (Y − n)/√n ≤ ((1.1)n − n)/√n )
= P( −(0.1)√n ≤ (Y − n)/√n ≤ (0.1)√n ) = Φ((0.1)√n) − Φ(−(0.1)√n)
= 2Φ((0.1)√n) − 1 (since Φ(−x) = 1 − Φ(x)).
Hence we need to find n such that
2Φ((0.1)√n) − 1 ≥ 0.95 ⟹ Φ((0.1)√n) ≥ 0.975 ⟹ (0.1)√n ≥ Φ^{−1}(0.975) = 1.96
⟹ √n ≥ 19.6 ⟹ n ≥ 384.16 ⟹ n ≥ 385 (since n ∈ N).
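This choice of n can be verified by simulation; the short Python sketch below (using random.expovariate from the standard library, with an arbitrary seed) estimates P(0.9 ≤ X̄ ≤ 1.1) for n = 385.

import random

random.seed(3)
n, trials = 385, 20_000

hits = 0
for _ in range(trials):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n   # sample mean of n exp(1) variables
    if 0.9 <= xbar <= 1.1:
        hits += 1

print(f"estimated P(0.9 <= Xbar <= 1.1) with n = 385: {hits / trials:.3f}")   # close to 0.95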
