5. Modes of Convergence
5.1. Some important inequalities. We start this section by proving a result known as the Markov inequality.
Lemma 5.1 (Markov inequality). If X is a non-negative random variable whose expected value exists, then for all a > 0,
P(X ≥ a) ≤ E(X)/a.
Proof. Observe that, since X is non-negative,
E[X] = E[X 1_{X≥a} + X 1_{X<a}] ≥ E[X 1_{X≥a}] ≥ a P(X ≥ a).
Hence the result follows.
Corollary 5.2. If X is a random variable such that E[|X|] < +∞, then for all a > 0,
P(|X| ≥ a) ≤ E(|X|)/a.
Example 5.1. A coin is weighted so that its probability of landing on heads is 20%. Suppose the coin is flipped 20 times. We want to find a bound for the probability that it lands on heads at least 16 times. Let X be the number of times the coin lands on heads. Then X ∼ B(20, 1/5), so E(X) = 4. We use Markov's inequality to find the required bound:
P(X ≥ 16) ≤ E(X)/16 = 4/16 = 1/4.
The actual probability that this happens is
P(X ≥ 16) = Σ_{k=16}^{20} (20 choose k) (0.2)^k (0.8)^{20−k} ≈ 1.38 × 10^{−8}.
Lemma 5.3 (Chebyshev's inequality). Let Y be an integrable random variable such that Var(Y) < +∞. Then for any ε > 0,
P(|Y − E(Y)| ≥ ε) ≤ Var(Y)/ε².
Proof. To get the result, take X = |Y − E(Y)|² and a = ε² in Markov's inequality, noting that |Y − E(Y)| ≥ ε if and only if |Y − E(Y)|² ≥ ε².
Example 5.2. Is there any random variable X for which
P(µ − 3σ ≤ X ≤ µ + 3σ) = 1/2,
where µ = E(X) and σ² = Var(X)?
Solution: Observe that
P(µ − 3σ ≤ X ≤ µ + 3σ) = P(|X − µ| ≤ 3σ) = 1 − P(|X − E(X)| > 3σ).
By Chebyshev's inequality, we get that
P(|X − E(X)| > 3σ) ≤ P(|X − E(X)| ≥ 3σ) ≤ σ²/(9σ²) = 1/9,
and hence
P(µ − 3σ ≤ X ≤ µ + 3σ) ≥ 1 − 1/9 = 8/9.
Since 1/2 < 8/9, there exists NO random variable X satisfying the given condition.
Although Chebyshev's inequality bounds the distance from the mean in either direction, it can still be used to bound how often a random variable takes large values, and it usually gives much better bounds than Markov's inequality. For example, consider Example 5.1. Markov's inequality gives a bound of 1/4. Using Chebyshev's inequality, with E(X) = 4 and Var(X) = 20 · (1/5) · (4/5) = 16/5, we see that
P(X ≥ 16) = P(X − 4 ≥ 12) ≤ P(|X − 4| ≥ 12) ≤ Var(X)/144 = 1/45.
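As a quick numerical sanity check, here is a minimal Python sketch (standard library only) comparing the Markov bound, the Chebyshev bound, and the exact binomial tail from Example 5.1:

```python
from math import comb

n, p, k = 20, 0.2, 16
mean = n * p                    # E[X] = 4
var = n * p * (1 - p)           # Var(X) = 16/5

markov = mean / k                   # Markov: E[X]/16 = 1/4
chebyshev = var / (k - mean) ** 2   # Chebyshev: Var(X)/12^2 = 1/45
exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(f"Markov bound:     {markov:.6f}")
print(f"Chebyshev bound:  {chebyshev:.6f}")
print(f"Exact P(X >= 16): {exact:.3e}")   # roughly 1.38e-08
```

Both bounds are valid but very loose here; the exact tail is smaller by several orders of magnitude.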
Lemma 5.4 (One-sided Chebyshev inequality). Let X be a random variable with mean 0 and variance σ² < +∞. Then for any a > 0,
P(X ≥ a) ≤ σ²/(σ² + a²).
Proof. For any b ≥ 0, the event X ≥ a is the same as X + b ≥ a + b. Hence by Markov's inequality, we have
P(X ≥ a) = P(X + b ≥ a + b) ≤ P((X + b)² ≥ (a + b)²) ≤ E[(X + b)²]/(a + b)² = (σ² + b²)/(a + b)²
=⇒ P(X ≥ a) ≤ min_{b≥0} (σ² + b²)/(a + b)² = σ²/(σ² + a²),
the minimum being attained at b = σ²/a.
One can use the one-sided Chebyshev inequality to arrive at the following corollary.
Corollary 5.5. If E[X] = µ and Var(X) = σ², then for any a > 0,
P(X ≥ µ + a) ≤ σ²/(σ² + a²),
P(X ≤ µ − a) ≤ σ²/(σ² + a²).
Example 5.3. Let X be a Poisson random variable with mean 20. Show that the one-sided Chebyshev inequality gives a better upper bound on P(X ≥ 26) than the Markov and Chebyshev inequalities. Indeed, by Markov's inequality, we have
p = P(X ≥ 26) ≤ E[X]/26 = 10/13.
By Chebyshev's inequality, we get
p = P(X − 20 ≥ 6) ≤ P(|X − 20| ≥ 6) ≤ Var(X)/36 = 10/18.
The one-sided Chebyshev inequality gives
p = P(X − 20 ≥ 6) ≤ Var(X)/(Var(X) + 36) = 10/28.
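The three bounds can also be compared with the exact Poisson tail numerically; the sketch below (standard library only) does this for λ = 20 and threshold 26:

```python
from math import exp, factorial

lam, k = 20, 26
mean = var = lam                 # for a Poisson variable, E[X] = Var(X) = lambda

markov = mean / k                            # 10/13
chebyshev = var / (k - mean) ** 2            # 10/18
one_sided = var / (var + (k - mean) ** 2)    # 10/28
exact = 1 - sum(exp(-lam) * lam**j / factorial(j) for j in range(k))

print(f"Markov: {markov:.3f}  Chebyshev: {chebyshev:.3f}  "
      f"one-sided: {one_sided:.3f}  exact: {exact:.3f}")
```

The exact tail is roughly 0.11, so the one-sided bound is indeed the tightest of the three.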
Theorem 5.6 (Weak law of large numbers). Let {Xi} be a sequence of i.i.d. random variables with finite mean µ and variance σ². Then for any ε > 0,
P(|Sn/n − µ| > ε) ≤ σ²/(nε²),
where Sn = Σ_{i=1}^n Xi. In particular,
lim_{n→∞} P(|Sn/n − µ| > ε) = 0.
Proof. The first inequality follows from Chebyshev's inequality, since E[Sn/n] = µ and Var(Sn/n) = σ²/n. Letting n tend to infinity in the first inequality, we arrive at the second result.
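To see the 1/n decay concretely, the following simulation sketch (a hypothetical illustration with Xi uniform on [0, 1], so µ = 1/2 and σ² = 1/12) estimates P(|Sn/n − µ| > ε) for growing n and compares it with the bound σ²/(nε²):

```python
import random

def tail_prob(n, eps=0.05, trials=20_000, mu=0.5):
    """Monte Carlo estimate of P(|S_n/n - mu| > eps) for Uniform(0,1) summands."""
    hits = 0
    for _ in range(trials):
        s = sum(random.random() for _ in range(n))
        hits += abs(s / n - mu) > eps
    return hits / trials

var, eps = 1 / 12, 0.05
for n in (10, 100, 1000):
    print(f"n={n:5d}  empirical={tail_prob(n):.4f}  Chebyshev bound={var/(n*eps**2):.4f}")
```

For small n the bound exceeds 1 and is vacuous; as n grows, both the empirical probability and the bound shrink.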
We shall discuss various modes of convergence for a given sequence of random variables {Xn }
defined on a given probability space (Ω, F, P).
Definition 5.1 (Convergence in probability). We say that {Xn } converges to a random
variable X, defined on the same probability space (Ω, F, P), in probability if for every ε > 0,
lim_{n→∞} P(|Xn − X| > ε) = 0.
We denote it by Xn →^P X.
Example 5.4. Let {Xn} be a sequence of random variables such that P(Xn = 0) = 1 − 1/n and P(Xn = n) = 1/n. Then Xn →^P 0. Indeed, for any ε > 0,
P(|Xn| > ε) = 1/n if ε < n, and P(|Xn| > ε) = 0 if ε ≥ n.
Hence, lim_{n→∞} P(|Xn| > ε) = 0.
Example 5.5. Let {Xn} be a sequence of i.i.d. random variables with P(Xn = 1) = 1/2 and P(Xn = −1) = 1/2. Then (1/n) Σ_{i=1}^n Xi converges to 0 in probability. Indeed, for any ε > 0, thanks to the weak law of large numbers, we have
P(|Sn/n − µ| > ε) ≤ Var(X1)/(nε²),
where µ = E(X1). Observe that µ = 0 and Var(X1) = 1. Hence
P(|(1/n) Σ_{i=1}^n Xi| > ε) ≤ 1/(nε²) → 0 as n → ∞.
Theorem 5.7. Xn →^P X if and only if lim_{n→∞} E[|Xn − X|/(1 + |Xn − X|)] = 0.
Proof. Without loss of generality, take X = 0. Thus, we want to show that Xn →^P 0 if and only if lim_{n→∞} E[|Xn|/(1 + |Xn|)] = 0.
Suppose Xn →^P 0. Then, given ε > 0, we have lim_{n→∞} P(|Xn| > ε) = 0. Now,
|Xn|/(1 + |Xn|) = (|Xn|/(1 + |Xn|)) 1_{|Xn|>ε} + (|Xn|/(1 + |Xn|)) 1_{|Xn|≤ε} ≤ 1_{|Xn|>ε} + ε
=⇒ E[|Xn|/(1 + |Xn|)] ≤ P(|Xn| > ε) + ε =⇒ limsup_{n→∞} E[|Xn|/(1 + |Xn|)] ≤ ε.
Since ε > 0 is arbitrary, we have lim_{n→∞} E[|Xn|/(1 + |Xn|)] = 0.
Conversely, let lim_{n→∞} E[|Xn|/(1 + |Xn|)] = 0. Observe that the function f(x) = x/(1 + x) is strictly increasing on [0, ∞). Thus,
(ε/(1 + ε)) 1_{|Xn|>ε} ≤ (|Xn|/(1 + |Xn|)) 1_{|Xn|>ε} ≤ |Xn|/(1 + |Xn|)
=⇒ (ε/(1 + ε)) P(|Xn| > ε) ≤ E[|Xn|/(1 + |Xn|)]
=⇒ lim_{n→∞} P(|Xn| > ε) ≤ ((1 + ε)/ε) lim_{n→∞} E[|Xn|/(1 + |Xn|)] = 0 =⇒ Xn →^P 0.
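For the sequence of Example 5.4 the quantity in Theorem 5.7 can be computed exactly: E[|Xn|/(1 + |Xn|)] = (1/n) · n/(1 + n) = 1/(n + 1) → 0. The sketch below (standard library only) confirms this by direct simulation:

```python
import random

def metric_term(n, trials=100_000):
    """Monte Carlo estimate of E[|X_n|/(1+|X_n|)] with P(X_n=n)=1/n, P(X_n=0)=1-1/n."""
    total = 0.0
    for _ in range(trials):
        x = n if random.random() < 1 / n else 0
        total += abs(x) / (1 + abs(x))
    return total / trials

for n in (2, 10, 100):
    print(f"n={n:3d}  simulated={metric_term(n):.4f}  exact={1/(n+1):.4f}")
```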
Definition 5.2 (Convergence in r-th mean). Let X, {Xn} be random variables defined on a given probability space (Ω, F, P) such that, for some r ∈ N, E[|X|^r] < ∞ and E[|Xn|^r] < ∞ for all n. We say that {Xn} converges in the r-th mean to X, denoted by Xn →^r X, if the following holds:
lim_{n→∞} E[|Xn − X|^r] = 0.
Example 5.6. Let {Xn} be i.i.d. random variables with E[Xn] = µ and Var(Xn) = σ². Define Yn = (1/n) Σ_{i=1}^n Xi. Then Yn →^2 µ. Indeed,
E[|Yn − µ|²] = E[|(Σ_{i=1}^n Xi − nµ)/n|²] = (1/n²) E[|Sn − E(Sn)|²] = (1/n²) Var(Sn) = σ²/n,
where Sn = Σ_{i=1}^n Xi. Hence Yn →^2 µ.
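A quick simulation sketch of this L² rate (again a hypothetical illustration with Uniform(0,1) summands, so σ² = 1/12) shows E[|Yn − µ|²] tracking σ²/n:

```python
import random

def mean_square_error(n, trials=5_000, mu=0.5):
    """Estimate E[|Y_n - mu|^2] for Y_n the sample mean of n Uniform(0,1) draws."""
    return sum((sum(random.random() for _ in range(n)) / n - mu) ** 2
               for _ in range(trials)) / trials

var = 1 / 12
for n in (10, 100, 1000):
    print(f"n={n:5d}  simulated={mean_square_error(n):.6f}  sigma^2/n={var/n:.6f}")
```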
Theorem 5.8. The following hold:
i) Xn →^r X =⇒ Xn →^P X for any r ≥ 1.
ii) Let f be a given continuous function. If Xn →^P X, then f(Xn) →^P f(X).
Proof. Proof of (i) follows from Markov's inequality. Indeed, for any given ε > 0,
P(|Xn − X| > ε) ≤ E[|Xn − X|^r]/ε^r
=⇒ lim_{n→∞} P(|Xn − X| > ε) ≤ (1/ε^r) lim_{n→∞} E[|Xn − X|^r] = 0.
Proof of (ii): For any k > 0, we see that
{|f (Xn ) − f (X)| > ε} ⊂ {|f (Xn ) − f (X)| > ε, |X| ≤ k} ∪ {|X| > k} .
Since f is continuous, it is uniformly continuous on any bounded interval. Therefore, for any
given ε > 0, there exists δ > 0 such that |f (x) − f (y)| ≤ ε if |x − y| ≤ δ for x and y in [−k, k].
This means that
{|f (Xn ) − f (X)| > ε, |X| ≤ k} ⊂ {|Xn − X| > δ, |X| ≤ k} ⊂ {|Xn − X| > δ} .
Thus we have
{|f (Xn ) − f (X)| > ε} ⊂ {|Xn − X| > δ} ∪ {|X| > k}
=⇒ P(|f (Xn ) − f (X)| > ε) ≤ P(|Xn − X| > δ) + P(|X| > k).
Since Xn →^P X and lim_{k→∞} P(|X| > k) = 0, we obtain that lim_{n→∞} P(|f(Xn) − f(X)| > ε) = 0.
This completes the proof.
In general, convergence in probability does not imply convergence in r-th mean. To see this, consider the following example.
Example 5.7. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Let Xn = n 1_{(0,1/n)}. Then Xn →^P 0 but Xn does not converge to 0 in r-th mean for any r ≥ 1. To show this, observe that
P(|Xn| > ε) ≤ 1/n =⇒ lim_{n→∞} P(|Xn| > ε) = 0, i.e., Xn →^P 0.
On the other hand, E[|Xn|^r] = n^r P((0, 1/n)) = n^{r−1}, which does not tend to 0 for any r ≥ 1.
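Since the law of Xn is explicit, both quantities can be tabulated directly; a minimal sketch:

```python
# X_n = n on the event (0, 1/n), which has probability 1/n, and X_n = 0 otherwise.
r = 2
for n in (10, 100, 1000):
    tail = 1 / n                  # P(|X_n| > eps) for any 0 < eps < n
    rth_moment = n**r * (1 / n)   # E[|X_n|^r] = n^(r-1), which blows up for r = 2
    print(f"n={n:5d}  P(|X_n|>eps)={tail:.4f}  E[|X_n|^2]={rth_moment:.1f}")
```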
Definition 5.3 (Almost sure convergence). Let X, {Xn } be random variables defined on
a given probability space (Ω, F, P). We say that {Xn } converges to X almost surely or with
probability 1 if the following holds:
P(lim_{n→∞} Xn = X) = 1.
We denote it by Xn →^a.s. X.
Example 5.8. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Define
Xn(ω) = 1 if ω ∈ (0, 1 − 1/n), and Xn(ω) = n otherwise.
It is easy to check that if ω = 0 or ω = 1, then lim_{n→∞} Xn(ω) = ∞. For any ω ∈ (0, 1), we can find n0 ∈ N such that ω ∈ (0, 1 − 1/n) for all n ≥ n0. As a consequence, Xn(ω) = 1 for all n ≥ n0. In other words, for ω ∈ (0, 1), lim_{n→∞} Xn(ω) = 1. Define X(ω) = 1 for all ω ∈ [0, 1]. Then
P(ω ∈ [0, 1] : {Xn(ω)} does not converge to X(ω)) = P({0, 1}) = 0 =⇒ Xn →^a.s. 1.
Sufficient condition for almost sure convergence: Let {An } be a sequence of events in F.
Define
limsup_{n→∞} An = ∩_{n=1}^∞ ∪_{m≥n} Am = lim_{n→∞} ∪_{m≥n} Am.
This can be interpreted probabilistically as
limsup_n An = "An occurs infinitely often".
We denote this as
{An i.o.} = limsup_n An.
Theorem 5.9 (Borel-Cantelli lemma). Let {An } be a sequence of events in (Ω, F, P).
i) If Σ_{n=1}^∞ P(An) < +∞, then P(An i.o.) = 0.
ii) If the An are mutually independent events, and if Σ_{n=1}^∞ P(An) = ∞, then P(An i.o.) = 1.
Remark 5.1. For mutually independent events An, since Σ_{n=1}^∞ P(An) is either finite or infinite, the event {An i.o.} has probability either 0 or 1. This is sometimes called a zero-one law.
As a consequence of the Borel-Cantelli lemma, we have the following proposition.
Proposition 5.10. Let {Xn} be a sequence of random variables defined on a probability space (Ω, F, P). If Σ_{n=1}^∞ P(|Xn| > ε) < +∞ for every ε > 0, then Xn →^a.s. 0.
Example 5.9. Let {Xn} be a sequence of i.i.d. random variables such that P(Xn = 1) = 1/2 and P(Xn = −1) = 1/2. Let Sn = Σ_{i=1}^n Xi. Then (1/n²) S_{n²} →^a.s. 0. To show the result, we use Proposition 5.10. Since the Xi are independent with mean 0 and variance 1, E[|S_{n²}|²] = n², so
P((1/n²)|S_{n²}| > ε) ≤ E[|S_{n²}|²]/(n⁴ε²) = 1/(n²ε²) =⇒ Σ_{n=1}^∞ P((1/n²)|S_{n²}| > ε) < ∞ =⇒ (1/n²) S_{n²} →^a.s. 0.
Note that, for each positive integer n, there exist uniquely determined integers j and k such that
n = 2^k + j, j ∈ {0, . . . , 2^k − 1}, k = ⌊log₂(n)⌋
(for n = 1, k = j = 0; for n = 5, k = 2, j = 1; and so on). On Ω = [0, 1] with Lebesgue measure, define the "typewriter" sequence of sliding indicators Xn = 1_{[j2^{−k}, (j+1)2^{−k}]}, and let An = {Xn > 0}. Then clearly P(An) = 2^{−k} → 0. Consequently, Xn →^P 0, but Xn(ω) ↛ 0 for every ω ∈ Ω, since each ω belongs to one interval of every dyadic generation and hence Xn(ω) = 1 infinitely often.
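A small sketch that evaluates this sliding-indicator sequence at a fixed point ω illustrates the phenomenon: P(Xn > 0) tends to 0, yet Xn(ω) returns to the value 1 in every dyadic generation:

```python
from math import floor, log2

def X(n, omega):
    """Typewriter sequence: indicator of [j*2^-k, (j+1)*2^-k] with n = 2^k + j."""
    k = floor(log2(n))
    j = n - 2**k
    return 1 if j * 2**-k <= omega <= (j + 1) * 2**-k else 0

omega = 0.3
hits = [n for n in range(1, 130) if X(n, omega) == 1]
print(hits)   # one hit per dyadic block, so X_n(0.3) = 1 infinitely often
```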
Theorem 5.11. The following hold.
i) If Xn →^a.s. X, then Xn →^P X.
ii) If Xn →^P X, then there exists a subsequence X_{n_k} of Xn such that X_{n_k} →^a.s. X.
iii) If Xn →^a.s. X, then for any continuous function f, f(Xn) →^a.s. f(X).
Proof. Proof of i): For any ε > 0, define A_n^ε = {|Xn − X| > ε} and B_m^ε = ∪_{n≥m} A_n^ε. Since Xn →^a.s. X and ∩_m B_m^ε = {|Xn − X| > ε i.o.} is contained in the null event where Xn does not converge to X, we have P(∩_m B_m^ε) = 0. Note that {B_m^ε} is a nested, decreasing sequence of events. Hence, from the continuity of the probability measure P, we have
lim_{m→∞} P(B_m^ε) = P(∩_m B_m^ε) = 0.
Since A_m^ε ⊂ B_m^ε, we have P(A_m^ε) ≤ P(B_m^ε). This implies that lim_{m→∞} P(A_m^ε) = 0. In other words, Xn →^P X.
Proof of ii): We will use the Borel-Cantelli lemma. Since Xn →^P X, we can choose a subsequence X_{n_k} such that P(|X_{n_k} − X| > 1/k) ≤ 1/2^k. Let Ak := {|X_{n_k} − X| > 1/k}. Then Σ_{k=1}^∞ P(Ak) < +∞. Hence, by the Borel-Cantelli lemma, P(Ak i.o.) = 0. This implies that
P(∪_{n=1}^∞ ∩_{k≥n} Akᶜ) = 1 =⇒ P({ω ∈ Ω : ∃ n0 such that |X_{n_k}(ω) − X(ω)| ≤ 1/k for all k ≥ n0}) = 1
=⇒ X_{n_k} →^a.s. X.
Proof of iii): Let N = {ω : lim_{n→∞} Xn(ω) ≠ X(ω)}. Then P(N) = 0. If ω ∉ N, then by the continuity property of f, we have
lim_{n→∞} f(Xn(ω)) = f(lim_{n→∞} Xn(ω)) = f(X(ω)).
This is true for any ω ∉ N and P(N) = 0. Hence f(Xn) →^a.s. f(X).
Theorem 5.13 (Continuity theorem). Let X, {Xn} be random variables having the characteristic functions φX, {φXn} respectively. Then the following are equivalent.
i) Xn →^d X.
ii) E[g(Xn)] → E[g(X)] for every bounded Lipschitz continuous function g.
iii) lim_{n→∞} φXn(t) = φX(t) for all t ∈ R.
Example 5.14. Let {Xn} be a sequence of Poisson random variables with parameter λn = n. Define Zn = (Xn − n)/√n. Then
Zn →^d Z, where L(Z) = N(0, 1).
Solution: To see this, we use Lévy's continuity theorem. Let ΦZn : R → C be the characteristic function of Zn. Then we have
ΦZn(u) = E[e^{iuZn}] = e^{−iu√n} E[e^{i(u/√n)Xn}] = e^{−iu√n} e^{n(e^{iu/√n} − 1)}.
Since e^{iu/√n} − 1 = iu/√n − u²/(2n) + o(1/n), the exponent satisfies −iu√n + n(e^{iu/√n} − 1) → −u²/2, and hence lim_{n→∞} ΦZn(u) = e^{−u²/2}, the characteristic function of N(0, 1). By Theorem 5.13, Zn →^d Z.
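One can check this convergence of characteristic functions numerically; the following sketch evaluates |ΦZn(u) − e^{−u²/2}| for increasing n:

```python
import cmath

def phi_Zn(u, n):
    """Characteristic function of Z_n = (X_n - n)/sqrt(n) with X_n ~ Poisson(n)."""
    return cmath.exp(-1j * u * n**0.5 + n * (cmath.exp(1j * u / n**0.5) - 1))

u = 1.7
for n in (10, 100, 10_000):
    err = abs(phi_Zn(u, n) - cmath.exp(-u**2 / 2))
    print(f"n={n:6d}  |phi_Zn(u) - e^(-u^2/2)| = {err:.2e}")
```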
Proof of the strong law of large numbers: Without loss of generality we can assume that µ = 0 (otherwise, if µ ≠ 0, set X̃i = Xi − µ and work with X̃i). Set Yn = Sn/n. Observe that, thanks to independence,
E[Yn] = 0, E[Yn²] = (1/n²) Σ_{1≤j,k≤n} E[Xj Xk] = (1/n²) Σ_{j=1}^n E[Xj²] = σ²/n.
Thus, lim_{n→∞} E[Yn²] = 0, and hence along a subsequence, Yn converges to 0 almost surely. But we need to show that the original sequence converges to 0 with probability 1. To do so, we proceed as follows. Since E[Yn²] = σ²/n, we see that Σ_{n=1}^∞ E[Y_{n²}²] = Σ_{n=1}^∞ σ²/n² < +∞, and hence by Lemma 5.15, ii), the series Σ_{n=1}^∞ Y_{n²}² converges almost surely; in particular its terms tend to 0. Thus,
lim_{n→∞} Y_{n²} = 0 with probability 1. (5.2)
Let n ∈ N. Then there exists m(n) ∈ N such that (m(n))² ≤ n < (m(n) + 1)². Now
Yn − ((m(n))²/n) Y_{(m(n))²} = (1/n) Σ_{i=1}^n Xi − ((m(n))²/n) (1/(m(n))²) Σ_{i=1}^{(m(n))²} Xi = (1/n) Σ_{i=(m(n))²+1}^n Xi
=⇒ E[(Yn − ((m(n))²/n) Y_{(m(n))²})²] = (1/n²) Σ_{i=(m(n))²+1}^n E[Xi²] = ((n − (m(n))²)/n²) σ² ≤ ((2m(n) + 1)/n²) σ² (since n < (m(n) + 1)²)
≤ ((2√n + 1)/n²) σ² ≤ 3σ²/n^{3/2} (since m(n) ≤ √n)
=⇒ Σ_{n=1}^∞ E[(Yn − ((m(n))²/n) Y_{(m(n))²})²] ≤ Σ_{n=1}^∞ 3σ²/n^{3/2} < +∞.
Hence, as before, Yn − ((m(n))²/n) Y_{(m(n))²} → 0 almost surely. Since (m(n))²/n → 1 as n → ∞, combining this with (5.2) gives lim_{n→∞} Yn = 0 with probability 1.
Application (Monte Carlo integration): Suppose we wish to compute α = ∫_0^1 f(x) dx but have no closed-form expression for α and need to estimate it. Let {Uj} be a sequence of independent uniform random variables on [0, 1]. Then by Theorem 5.16,
lim_{n→∞} (1/n) Σ_{j=1}^n f(Uj) = E[f(U1)] = ∫_0^1 f(x) dx
a.s. and in L². Thus, to get an approximation of ∫_0^1 f(x) dx, we need to simulate the uniform random variables Uj (by using a random number generator).
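A minimal sketch of this procedure (taking f(x) = x² as an illustrative choice, so the true value is 1/3):

```python
import random

def monte_carlo(f, n):
    """Estimate the integral of f over [0, 1] by averaging f at n uniform samples."""
    return sum(f(random.random()) for _ in range(n)) / n

f = lambda x: x * x          # integral over [0, 1] equals 1/3
for n in (100, 10_000, 1_000_000):
    print(f"n={n:8d}  estimate={monte_carlo(f, n):.5f}")
```

By the central limit theorem below, the error of this estimator decays like n^{−1/2}.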
Theorem 5.17 (Central limit theorem). Let {Xn} be a sequence of i.i.d. random variables with finite mean µ and variance σ², where 0 < σ² < +∞. Let Yn = (Sn − nµ)/(σ√n). Then Yn converges in distribution to Y, where L(Y) = N(0, 1).
Proof. Without loss of generality, we assume that µ = 0. Let Φ and ΦYn be the characteristic functions of Xj and Yn respectively. Since the {Xj} are i.i.d., we have
ΦYn(u) = E[e^{iuYn}] = E[e^{iu Sn/(σ√n)}] = E[Π_{i=1}^n e^{iu Xi/(σ√n)}] = Π_{i=1}^n E[e^{iu Xi/(σ√n)}] = (Φ(u/(σ√n)))^n.
Since E[|Xj|²] < +∞, the function Φ has two continuous derivatives. In particular,
Φ′(u) = iE[Xj e^{iuXj}], Φ″(u) = −E[Xj² e^{iuXj}] =⇒ Φ′(0) = 0, Φ″(0) = −σ².
Expanding Φ in a Taylor expansion about u = 0, we have
Φ(u) = 1 − σ²u²/2 + h(u)u², where h(u) → 0 as u → 0.
Thus, we get
ΦYn(u) = e^{n log Φ(u/(σ√n))} = e^{n log(1 − u²/(2n) + (u²/(nσ²)) h(u/(σ√n)))}
=⇒ lim_{n→∞} ΦYn(u) = e^{−u²/2} = ΦY(u) (by L'Hôpital's rule).
Example 5.17. Let {Xn} be a sequence of i.i.d. random variables such that P(Xn = 1) = 1/2 and P(Xn = 0) = 1/2. Then µ = 1/2 and σ² = 1/4. Hence by the central limit theorem, Yn = (2Sn − n)/√n converges in distribution to Y with L(Y) = N(0, 1).
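A quick simulation sketch of Example 5.17 checks the normal approximation by estimating P(−1.96 ≤ Yn ≤ 1.96), which should be close to 0.95:

```python
import random

def standardized_sum(n):
    """Y_n = (2 S_n - n)/sqrt(n) for S_n a sum of n fair Bernoulli(1/2) variables."""
    s = sum(random.randint(0, 1) for _ in range(n))
    return (2 * s - n) / n**0.5

n, trials = 1_000, 20_000
inside = sum(-1.96 <= standardized_sum(n) <= 1.96 for _ in range(trials)) / trials
print(f"P(-1.96 <= Y_n <= 1.96) ~ {inside:.3f}  (N(0,1) gives 0.950)")
```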
Example 5.18. Let X ∼ B(n, p) with p < 1/2. For any given 0 ≤ α ≤ 1, we want to find n such that P(X > n/2) ≤ 1 − α. We can think of X as a sum of n i.i.d. random variables Xi with Xi ∼ B(1, p). Hence by the central limit theorem, for large n,
P(X − np ≤ x√(np(1 − p))) ≈ Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.
Choose x such that np + x√(np(1 − p)) = n/2. This implies that x = (√n/2)(1 − 2p)/√(p(1 − p)). Thus,
P(X > n/2) = 1 − P(X ≤ n/2) ≈ 1 − (1/√(2π)) ∫_{−∞}^{(√n/2)(1−2p)/√(p(1−p))} e^{−u²/2} du.
Therefore, we need to choose n such that
α ≤ Φ((√n/2)(1 − 2p)/√(p(1 − p))) =⇒ √n ≥ (2√(p(1 − p))/(1 − 2p)) Φ^{−1}(α)
=⇒ n ≥ (4p(1 − p)/(1 − 2p)²) (Φ^{−1}(α))².
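A small helper sketch evaluates this bound, using statistics.NormalDist from the standard library for Φ^{−1} (p = 0.4 and α = 0.95 below are hypothetical illustrative values):

```python
from statistics import NormalDist

def required_n(p, alpha):
    """CLT-based bound n >= 4p(1-p)/(1-2p)^2 * (Phi^{-1}(alpha))^2, assuming p < 1/2."""
    z = NormalDist().inv_cdf(alpha)
    return 4 * p * (1 - p) / (1 - 2 * p) ** 2 * z * z

print(required_n(0.4, 0.95))   # minimal n suggested by the bound
```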
Example 5.19. Let {Xi} be a sequence of i.i.d. exp(1)-distributed random variables. Let X̄ = (Σ_{i=1}^n Xi)/n. How large should n be so that
P(0.9 ≤ X̄ ≤ 1.1) ≥ 0.95?
Since the Xi are exp(1)-distributed, µ = E[Xi] = 1 and σ² = Var(Xi) = 1. Let Y = Σ_{i=1}^n Xi. Then by the central limit theorem, (Y − n)/√n is approximately N(0, 1). Now
P(0.9 ≤ X̄ ≤ 1.1) = P(0.9n ≤ Y ≤ 1.1n) = P((0.9n − n)/√n ≤ (Y − n)/√n ≤ (1.1n − n)/√n)
= P(−0.1√n ≤ (Y − n)/√n ≤ 0.1√n) ≈ Φ(0.1√n) − Φ(−0.1√n)
= 2Φ(0.1√n) − 1 (since Φ(−x) = 1 − Φ(x)).
Hence we need to find n such that
2Φ(0.1√n) − 1 ≥ 0.95 =⇒ Φ(0.1√n) ≥ 0.975 =⇒ 0.1√n ≥ Φ^{−1}(0.975) = 1.96
=⇒ √n ≥ 19.6 =⇒ n ≥ 384.16 =⇒ n ≥ 385 (since n ∈ N).
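The answer n = 385 can be double-checked by simulation; a minimal sketch (sampling exp(1) variables by inversion):

```python
import random
from math import log

def coverage(n, trials=20_000):
    """Estimate P(0.9 <= sample mean <= 1.1) for n i.i.d. exp(1) variables."""
    hits = 0
    for _ in range(trials):
        xbar = sum(-log(1.0 - random.random()) for _ in range(n)) / n
        hits += 0.9 <= xbar <= 1.1
    return hits / trials

for n in (100, 385, 1000):
    print(f"n={n:4d}  P(0.9 <= Xbar <= 1.1) ~ {coverage(n):.3f}")
```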