Convergence of sequences of random variables
Throughout this chapter we assume that {X_1, X_2, . . .} is a sequence of r.v. and X is a r.v., and that all of them are defined on the same probability space (Ω, F, P).
Stochastic convergence formalizes the idea that a sequence of r.v. sometimes is expected to settle into a pattern.¹
The pattern may for instance be that:
• there is convergence of X_n(ω) in the classical sense to a fixed value X(ω), for each and every ω ∈ Ω;
• the probability that the distance between X_n and a particular r.v. X exceeds any prescribed positive value decreases and converges to zero;
• the series formed by calculating the expected value of the (absolute or quadratic) distance between X_n and X converges to zero;
• the distribution of X_n may grow increasingly similar to the distribution of a particular r.v. X.
Just as in analysis, we can distinguish among several types of convergence (Rohatgi,
1976, p. 240). Thus, in this chapter we investigate modes of convergence of sequences of
r.v.:
• almost sure convergence (→^a.s.);
• convergence in probability (→^P);
• convergence in quadratic mean or in L² (→^q.m.);
• convergence in L¹ or in mean (→^L¹);
• convergence in distribution (→^d).

¹ See http://en.wikipedia.org/wiki/Convergence_of_random_variables.
Two laws of large numbers and central limit theorems are also stated.
It is important for the reader to be familiar with all these modes of convergence, with the way they can be related, and with the applications of such results, and to understand their considerable significance in probability, statistics and stochastic processes.
5.1 Modes of convergence
The first four modes of convergence (→^*, where * = a.s., P, q.m., L¹) pertain to the sequence of r.v. and to X as functions of ω, while the fifth (→^d) is related to the convergence of d.f. (Karr, 1993, p. 135).
5.1.1 Convergence of r.v. as functions on Ω
Motivation 5.1 Almost sure convergence (Karr, 1993, p. 135)
Almost sure convergence, or convergence with probability one, is the probabilistic version of pointwise convergence known from elementary real analysis.
Definition 5.2 Almost sure convergence (Karr, 1993, p. 135; Rohatgi, 1976, p. 249)
The sequence of r.v. {X_1, X_2, . . .} is said to converge almost surely to a r.v. X if

    P({ω : lim_{n→+∞} X_n(ω) = X(ω)}) = 1.    (5.1)
In this case we write X_n →^a.s. X (or X_n → X with probability 1).
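As a quick numerical illustration (an assumed example, not one from the chapter): take Ω = (0, 1) with the uniform probability and X_n(ω) = ω^n. Every fixed path ω^n converges to 0, so P({ω : lim X_n(ω) = 0}) = 1. A minimal sketch:

```python
import random

# Sketch of almost sure convergence (assumed illustrative example):
# Omega = (0, 1) with the uniform probability, X_n(w) = w**n.
# For every fixed w in (0, 1), w**n -> 0, so the set of convergent
# paths has probability 1.
random.seed(2024)
omegas = [random.random() for _ in range(1_000)]  # sampled outcomes w

def X(n, w):
    return w ** n

# Each sampled path is monotone decreasing and eventually tiny.
for w in omegas:
    assert X(20, w) <= X(10, w)          # pathwise monotonicity
    if w <= 0.99:                        # staying away from the endpoint 1
        assert X(10_000, w) < 1e-12      # the pathwise limit is 0
```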
Exercise 5.3 Almost sure convergence
Let {X_1, X_2, . . .} be a sequence of r.v. such that X_n ~ Bernoulli(1/n), n ∈ IN.
Prove that X_n ↛^a.s. 0, by deriving P({X_n = 0, for every m ≤ n ≤ n₀}) and observing that this probability does not converge to 1 as n₀ → +∞, for all values of m (Rohatgi, 1976, p. 252, Example 9).
Motivation 5.4 Convergence in probability (Karr, 1993, p. 135; http://en.wikipedia.org/wiki/Convergence_of_random_variables)
Convergence in probability essentially means that the probability that |X_n − X| exceeds any prescribed, strictly positive value converges to zero.
The basic idea behind this type of convergence is that the probability of an unusual outcome becomes smaller and smaller as the sequence progresses.
Definition 5.5 Convergence in probability (Karr, 1993, p. 136; Rohatgi, 1976, p. 243)
The sequence of r.v. {X_1, X_2, . . .} is said to converge in probability to a r.v. X, denoted by X_n →^P X, if

    lim_{n→+∞} P({|X_n − X| > ε}) = 0,    (5.2)

for every ε > 0.
Remarks 5.6 Convergence in probability (Rohatgi, 1976, p. 243; http://en.wikipedia.org/wiki/Convergence_of_random_variables)
• The definition of convergence in probability says nothing about the convergence of the r.v. X_n to the r.v. X in the sense in which it is understood in real analysis. Thus, X_n →^P X does not imply that, given ε > 0, we can find an N such that |X_n − X| < ε, for n ≥ N. Definition 5.5 speaks only of the convergence of the sequence of probabilities P({|X_n − X| > ε}) to zero.
• Formally, Definition 5.5 means that

    ∀ε, δ > 0, ∃N_{ε,δ} : P({|X_n − X| > ε}) < δ, ∀n ≥ N_{ε,δ}.    (5.3)

• The concept of convergence in probability is used very often in statistics. For example, an estimator is called consistent if it converges in probability to the parameter being estimated.
• Convergence in probability is also the type of convergence established by the weak law of large numbers.
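For the Bernoulli sequence used in the exercises below, the probability in Definition 5.5 has a closed form; a short sketch (the case split on ε is the one worked out in Exercise 5.7):

```python
# P({|X_n - 0| > eps}) for X_n ~ Bernoulli(1/n):
# for 0 < eps < 1 the event reduces to {X_n = 1}, of probability 1/n;
# for eps >= 1 the event is empty, since |X_n| <= 1.
def prob_exceeds(n, eps):
    return 0.0 if eps >= 1 else 1.0 / n

assert [prob_exceeds(n, 0.5) for n in (10, 100, 1000)] == [0.1, 0.01, 0.001]
assert prob_exceeds(10, 1.0) == 0.0
# The probabilities vanish as n -> +infinity, i.e. X_n converges to 0
# in probability.
```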
Exercise 5.7 Convergence in probability
Let {X_1, X_2, . . .} be a sequence of r.v. such that X_n ~ Bernoulli(1/n), n ∈ IN.
(a) Prove that X_n →^P 0, by obtaining P({|X_n| > ε}), for 0 < ε < 1 and for ε ≥ 1 (Rohatgi, 1976, pp. 243-244, Example 5).
(b) Verify that E(X_n^k) → E(X^k), where k ∈ IN and X =^d 0 (i.e. X is degenerate at 0).
Exercise 5.8 Convergence in probability does not imply convergence of kth moments
Let {X_1, X_2, . . .} be a sequence of r.v. such that X_n =^d n × Bernoulli(1/n), n ∈ IN, i.e.

    P({X_n = x}) = 1 − 1/n,  x = 0,
                   1/n,      x = n,
                   0,        otherwise.    (5.4)

Prove that X_n →^P 0 but that E(X_n^k) ↛ E(X^k), where k ∈ IN and the r.v. X is degenerate at 0 (Rohatgi, 1976, p. 247, Remark 3).
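The point of Exercise 5.8 can be checked with exact arithmetic: the mass 1/n escaping to the ever larger value n keeps every moment from converging, even though the distribution piles up at 0. A sketch (helper name `kth_moment` is ours, not the text's):

```python
from fractions import Fraction

# Moments of X_n with P(X_n = n) = 1/n, P(X_n = 0) = 1 - 1/n:
# E(X_n^k) = n^k * (1/n) = n^(k-1).
def kth_moment(n, k):
    return Fraction(n) ** k * Fraction(1, n)

assert [kth_moment(n, 1) for n in (10, 100)] == [1, 1]      # E(X_n) = 1, not 0
assert [kth_moment(n, 2) for n in (10, 100)] == [10, 100]   # E(X_n^2) diverges
# Meanwhile P(|X_n| > eps) = 1/n -> 0, so X_n still converges to 0
# in probability.
```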
Motivation 5.9 Convergence in quadratic mean and in L¹
We have just seen that convergence in probability does not imply the convergence of moments, namely of orders 2 or 1.
Definition 5.10 Convergence in quadratic mean or in L² (Karr, 1993, p. 136)
Let X, X_1, X_2, . . . belong to L². Then the sequence of r.v. {X_1, X_2, . . .} is said to converge to X in quadratic mean (or in L²), denoted by X_n →^q.m. X (or X_n →^L² X), if

    lim_{n→+∞} E[(X_n − X)²] = 0.    (5.5)
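For instance, for the Bernoulli sequence of Exercise 5.53 below, E[(X_n − 0)²] has a closed form, so the limit in (5.5) can be read off directly (a minimal sketch):

```python
# q.m. convergence of X_n ~ Bernoulli(1/n) to 0:
# E[(X_n - 0)^2] = 1^2 * (1/n) + 0^2 * (1 - 1/n) = 1/n -> 0.
def mean_square_error(n):
    return 1.0 / n

mses = [mean_square_error(n) for n in (1, 10, 100, 1000)]
assert mses == [1.0, 0.1, 0.01, 0.001]
assert mses == sorted(mses, reverse=True)   # monotonically shrinking to 0
```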
[…] S_n = Σ_{i=1}^n X_i.
Prove that if there is a constant c such that V(X_i) ≤ c, for every i, then S_n/n^α →^q.m. 0 for all α > 1/2.
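Under the hypotheses usually attached to this statement (supplied here as assumptions, since the fragment above omits them: the X_i uncorrelated with mean 0), V(S_n) ≤ cn, so E[(S_n/n^α)²] ≤ c·n^{1−2α} → 0 whenever α > 1/2. A numeric sketch of that bound:

```python
import math

# Bound E[(S_n / n^a)^2] <= c * n^(1 - 2a), assuming (not stated in the
# fragment above) that the X_i are uncorrelated with mean 0 and
# V(X_i) <= c, so that V(S_n) <= c * n.
def qm_bound(n, a, c=1.0):
    return c * n ** (1 - 2 * a)

vals = [qm_bound(10 ** p, 0.75) for p in (2, 4, 6)]
assert vals[0] > vals[1] > vals[2] > 0                 # the bound shrinks to 0
assert math.isclose(qm_bound(10 ** 6, 0.75), 1e-3)
```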
Definition 5.15 Convergence in mean or in L¹ (Karr, 1993, p. 136)
Let X, X_1, X_2, . . . belong to L¹. Then the sequence of r.v. {X_1, X_2, . . .} is said to converge to X in mean (or in L¹), denoted by X_n →^L¹ X, if

    lim_{n→+∞} E(|X_n − X|) = 0.    (5.6)
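The Cauchy-Schwarz inequality gives E|X_n − X| ≤ sqrt(E[(X_n − X)²]), which is why convergence in quadratic mean forces convergence in L¹. The inequality also holds for the empirical distribution of any sample, which this sketch checks:

```python
import math
import random

# Cauchy-Schwarz: E|Y| <= sqrt(E[Y^2]) for Y = X_n - X, so L^2 (q.m.)
# convergence implies L^1 convergence.  Checked here on the empirical
# distribution of a simulated sample.
random.seed(7)
ys = [random.gauss(0.0, 1.0) for _ in range(10_000)]   # stand-ins for X_n - X
mean_abs = sum(abs(y) for y in ys) / len(ys)
root_mean_sq = math.sqrt(sum(y * y for y in ys) / len(ys))
assert mean_abs <= root_mean_sq
```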
[…]

    f(x) = …,  0 < x < +∞,
           0,  otherwise,    (5.8)

where 0 < λ < +∞, and X_(n) = max_{1≤i≤n} X_i.
Prove that X_(n) →^d … (Rohatgi, 1976, p. 241, Example 2).
Exercise 5.21 A sequence of d.f. converging to a non d.f.
Consider the sequence of d.f.

    F_{X_n}(x) = 0, x < n,
                 1, x ≥ n,    (5.9)

where F_{X_n}(x) is the d.f. of the r.v. X_n degenerate at x = n.
Verify that F_{X_n}(x) converges to a function (identically equal to 0!) which is not a d.f. (Rohatgi, 1976, p. 241, Example 1).
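The pointwise limit in Exercise 5.21 can be tabulated directly (a minimal sketch):

```python
# Exercise 5.21's d.f.: F_n(x) = 0 for x < n and 1 for x >= n.
def F(n, x):
    return 1.0 if x >= n else 0.0

x = 7.3
assert F(5, x) == 1.0                                  # early terms have jumped
assert [F(n, x) for n in (10, 100, 1000)] == [0.0, 0.0, 0.0]
# For every fixed x, F_n(x) = 0 once n > x: the pointwise limit is the
# zero function, which never reaches 1 and hence is not a d.f.
```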
Exercise 5.22 The requirement that only the continuity points of F_X should be considered is essential
Let X_n ~ Uniform(1/2 − 1/n, 1/2 + 1/n) and X be a r.v. degenerate at 1/2.
(a) Prove that X_n →^d X (Karr, 1993, p. 142).
(b) Verify that F_{X_n}(1/2) = 1/2 for each n, and that these values do not converge to F_X(1/2) = 1.
Is there any contradiction with the convergence in distribution previously proved? (Karr, 1993, p. 142.)
Exercise 5.23 The requirement that only the continuity points of F_X should be considered is essential (bis)
Let X_n ~ Uniform(0, 1/n) and X be a r.v. degenerate at 0.
Prove that X_n →^d X, even though F_{X_n}(0) = 0, for all n, and F_X(0) = 1, that is, the convergence of d.f. fails at the point x = 0 where F_X is discontinuous (http://en.wikipedia.org/wiki/Convergence_of_random_variables).
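The two regimes of Exercise 5.23 can be seen numerically from the explicit d.f. of Uniform(0, 1/n) (a sketch):

```python
# Exercise 5.23: X_n ~ Uniform(0, 1/n) has d.f. F_n(x) = min(max(n*x, 0), 1),
# while X degenerate at 0 has F_X(x) = 1 for x >= 0 and 0 for x < 0.
def F_n(n, x):
    return min(max(n * x, 0.0), 1.0)

# Convergence holds at every continuity point of F_X (i.e. x != 0) ...
assert F_n(10 ** 6, 0.5) == 1.0
assert F_n(10 ** 6, -0.5) == 0.0
# ... but fails at the discontinuity point x = 0, where F_X(0) = 1:
assert all(F_n(n, 0.0) == 0.0 for n in (1, 10, 100))
```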
Exercise 5.24 Convergence in distribution does not imply convergence of corresponding p.(d.)f.
Let {X_1, X_2, . . .} be a sequence of r.v. with p.f.

    P({X_n = x}) = 1,  x = 2 + 1/n,
                   0,  otherwise.    (5.10)

(a) Prove that X_n →^d X, where X is a r.v. degenerate at 2.
(b) Verify that none of the p.f. P({X_n = x}) assigns any probability to the point x = 2, for any n, and that P({X_n = x}) → 0 for all x (Rohatgi, 1976, p. 242, Example 4).
The following table condenses the definitions of convergence of sequences of r.v.

Mode of convergence               Assumption                Defining condition
X_n →^a.s. X (almost sure)                                  P({ω : X_n(ω) → X(ω)}) = 1
X_n →^P X (in probability)                                  P({|X_n − X| > ε}) → 0, for all ε > 0
X_n →^q.m. X (in quadratic mean)  X, X_1, X_2, . . . ∈ L²   E[(X_n − X)²] → 0
X_n →^L¹ X (in L¹)                X, X_1, X_2, . . . ∈ L¹   E(|X_n − X|) → 0
X_n →^d X (in distribution)                                 F_{X_n}(x) → F_X(x), at continuity points x of F_X
Exercise 5.25 Modes of convergence and uniqueness of limit (Karr, 1993, p. 158, Exercise 5.1)
Prove that for all five forms of convergence the limit is unique. In particular:
(a) if X_n →* X, where * = a.s., P, q.m., L¹, then

    X_n − X →* 0,    (5.11)

i.e. the four function-based forms of convergence are compatible with the vector space structure of the family of r.v.
5.1.3 Alternative criteria
The definition of almost sure convergence and its verification are far from trivial. More tractable criteria have to be stated.
Proposition 5.27 Relating almost sure convergence and convergence in probability (Karr, 1993, p. 137; Rohatgi, 1976, p. 249)
X_n →^a.s. X iff

    ∀ε > 0, lim_{n→+∞} P({sup_{k≥n} |X_k − X| > ε}) = 0,    (5.12)

i.e.

    X_n →^a.s. X ⟺ Y_n = sup_{k≥n} |X_k − X| →^P 0.    (5.13)
The sequence of r.v. {X_1, X_2, . . .} is said to converge completely to the r.v. X if

    Σ_{n=1}^{+∞} P({|X_n − X| > ε}) < +∞,    (5.17)

for every ε > 0.
The next results relate almost sure convergence and complete convergence; the latter is stronger than almost sure convergence, but sometimes more convenient to establish (Karr, 1993, p. 137).
Proposition 5.34 Relating almost sure convergence and complete convergence (Karr, 1993, p. 138)

    Σ_{n=1}^{+∞} P({|X_n − X| > ε}) < +∞, ∀ε > 0  ⇒  X_n →^a.s. X.    (5.18)
[…] the series Σ_{n=1}^{+∞} P({|X_n − X| > ε}) is finite.
Exercise 5.36 Relating almost sure convergence and complete convergence
Prove Proposition 5.34, by using the (first) Borel-Cantelli lemma (Karr, 1993, p. 138).
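A quick numeric look at the series criterion of Proposition 5.34 for two assumed Bernoulli sequences: X_n ~ Bernoulli(1/n²) satisfies it, while for X_n ~ Bernoulli(1/n) the series diverges, so the criterion is inconclusive there.

```python
import math

# Series test of Proposition 5.34, for 0 < eps < 1:
# X_n ~ Bernoulli(1/n^2): sum_n P(|X_n| > eps) = sum_n 1/n^2 < +inf,
#   hence X_n converges to 0 almost surely;
# X_n ~ Bernoulli(1/n): sum_n 1/n = +inf, so the criterion is silent.
N = 100_000
partial_square = sum(1.0 / n ** 2 for n in range(1, N + 1))
partial_harmonic = sum(1.0 / n for n in range(1, N + 1))

assert partial_square < math.pi ** 2 / 6   # bounded partial sums: converges
assert partial_harmonic > 10               # grows without bound (~ log N)
```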
Theorem 5.37 Almost sure convergence of a sequence of independent r.v. (Rohatgi, 1976, p. 265)
Let {X_1, X_2, . . .} be a sequence of independent r.v. Then

    X_n →^a.s. 0  ⟺  Σ_{n=1}^{+∞} P({|X_n| > ε}) < +∞, ∀ε > 0.    (5.19)
The next table summarizes the alternative criteria and sufficient conditions for almost sure convergence and convergence in distribution of sequences of r.v.

Alternative criterion or sufficient condition                 Mode of convergence
∀ε > 0, lim_{n→+∞} P({sup_{k≥n} |X_k − X| > ε}) = 0           X_n →^a.s. X
Y_n = sup_{k≥n} |X_k − X| →^P 0                               X_n →^a.s. X
lim_{n→+∞} P({sup_m |X_{n+m} − X_n| ≤ ε}) = 1, ∀ε > 0         X_n →^a.s. X
Σ_{n=1}^{+∞} P({|X_n − X| > ε}) < +∞, ∀ε > 0                  X_n →^a.s. X
E[f(X_n)] → E[f(X)], ∀f ∈ C                                   X_n →^d X
E[f(X_n)] → E[f(X)], ∀f ∈ C^(k), for a fixed k ∈ IN₀          X_n →^d X
5.2 Relationships among the modes of convergence
Given the plethora of modes of convergence, it is natural to inquire how they relate to one another: which implications are always valid, and which hold only in the presence of additional assumptions (Karr, 1993, pp. 140 and 142).
5.2.1 Implications always valid
Proposition 5.43 Almost sure convergence implies convergence in probability (Karr, 1993, p. 140; Rohatgi, 1976, p. 250)

    X_n →^a.s. X  ⇒  X_n →^P X.    (5.22)
Exercise 5.50 Convergence in probability implies convergence in distribution
Prove Proposition 5.49 (Karr, 1993, p. 141).
Figure 5.1 shows that convergence in distribution is the weakest form of convergence,
since it is implied by all other types of convergence studied so far.
[Figure 5.1: Implications always valid between modes of convergence:
    X_n →^q.m. X ⇒ X_n →^L¹ X ⇒ X_n →^P X ⇒ X_n →^d X;
    X_n →^a.s. X ⇒ X_n →^P X.]
5.2.2 Counterexamples
Counterexamples to all implications among the modes of convergence (and more!) are
condensed in Figure 5.2 and presented by means of several exercises.
[Figure 5.2: Counterexamples to implications among the modes of convergence, involving the modes →^a.s., →^P, →^q.m., →^L¹ and →^d.]
Before proceeding with the exercises, recall Exercises 5.3 and 5.7, which pertain to the sequence of r.v. {X_1, X_2, . . .}, where X_n ~ Bernoulli(1/n), n ∈ IN. In the first exercise we proved that X_n ↛^a.s. 0, whereas in the second one we concluded that X_n →^P 0. Thus, combining these results, we can state that X_n →^P 0 ⇏ X_n →^a.s. 0.
Exercise 5.51 Almost sure convergence does not imply convergence in quadratic mean
Let {X_1, X_2, . . .} be a sequence of r.v. such that

    P({X_n = x}) = 1 − 1/n,  x = 0,
                   1/n,      x = n,
                   0,        otherwise.    (5.26)

Prove that X_n →^a.s. 0 (and hence X_n →^P 0 and X_n →^d 0), but X_n ↛^L¹ 0 and X_n ↛^q.m. 0 (Karr, 1993, p. 141, Counterexample a)).
Exercise 5.52 Almost sure convergence does not imply convergence in quadratic mean (bis)
Let {X_1, X_2, . . .} be a sequence of r.v. such that

    P({X_n = x}) = 1 − 1/n^r,  x = 0,
                   1/n^r,      x = n,
                   0,          otherwise,    (5.27)

where r ≥ 2.
Prove that X_n →^a.s. 0, but X_n ↛^q.m. 0 (Rohatgi, 1976, p. 252, Example 10).
Exercise 5.53 Convergence in quadratic mean does not imply almost sure convergence
Let X_n ~ Bernoulli(1/n).
Prove that X_n →^q.m. 0, but X_n ↛^a.s. 0 (Rohatgi, 1976, p. 252, Example 9).
Exercise 5.54 Convergence in L¹ does not imply convergence in quadratic mean
Let {X_1, X_2, . . .} be a sequence of r.v. such that

    P({X_n = x}) = 1 − 1/n,  x = 0,
                   1/n,      x = √n,
                   0,        otherwise.    (5.28)

Prove that X_n →^a.s. 0 and X_n →^L¹ 0, however X_n ↛^q.m. 0 (Karr, 1993, p. 141, Counterexample b)).
Exercise 5.55 Convergence in probability does not imply almost sure convergence
For each positive integer n there exist uniquely determined integers m and k such that

    n = 2^k + m,  m = 0, 1, . . . , 2^k − 1,  k = 0, 1, 2, . . .    (5.29)

Thus, for n = 1, k = m = 0; for n = 5, k = 2, m = 1; and so on.
Define r.v. X_n, for n = 1, 2, . . ., on Ω = [0, 1] by

    X_n(ω) = 2^k,  if m/2^k ≤ ω < (m + 1)/2^k,
             0,    otherwise.    (5.30)

Let the probability distribution of X_n be given by P({I}) = length of the interval I. Thus,

    P({X_n = x}) = 1 − 1/2^k,  x = 0,
                   1/2^k,      x = 2^k,
                   0,          otherwise.    (5.31)

Prove that X_n →^P 0, but X_n ↛^a.s. 0 (Rohatgi, 1976, pp. 251-252, Example 8).
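This "typewriter" construction can be coded directly (a sketch; the decomposition n = 2^k + m is recovered from the binary length of n):

```python
# The typewriter sequence of Exercise 5.55 on Omega = [0, 1):
# n = 2^k + m with 0 <= m < 2^k, and X_n(w) = 2^k on [m/2^k, (m+1)/2^k).
def X(n, w):
    k = n.bit_length() - 1        # since 2^k <= n < 2^(k+1)
    m = n - 2 ** k
    lo, hi = m / 2 ** k, (m + 1) / 2 ** k
    return 2 ** k if lo <= w < hi else 0

w = 0.3
for k in range(1, 12):
    generation = [X(n, w) for n in range(2 ** k, 2 ** (k + 1))]
    # P(X_n != 0) = 2^-k -> 0 (convergence in probability), yet the path
    # at w is hit exactly once in every generation k, so X_n(w) never
    # settles at 0: no almost sure convergence.
    assert sum(v != 0 for v in generation) == 1
```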
Exercise 5.56 Convergence in distribution does not imply convergence in probability
Let {X_1, X_2, . . .} be a sequence of r.v. such that

    F_{X_n}(x) = 0,          x < 0,
                 1/2 − 1/n,  0 ≤ x < 1,
                 1,          x ≥ 1,    (5.32)

i.e. X_n ~ Bernoulli(1/2 + 1/n).
Prove that X_n →^d X, where X ~ Bernoulli(1/2), but X_n ↛^P X (Karr, 1993, p. 142, Counterexample d)).
Exercise 5.57 Convergence in distribution does not imply convergence in probability (bis)
Let X, X_1, X_2, . . . be identically distributed r.v. and let the joint p.f. of (X, X_n) be P({X = 0, X_n = 1}) = P({X = 1, X_n = 0}) = 1/2.
Prove that X_n →^d X, but X_n ↛^P X (Rohatgi, 1976, p. 247, Remark 2).
5.2.3 Implications of restricted validity
Proposition 5.58 Convergence in distribution to a constant implies convergence in probability (Karr, 1993, p. 140; Rohatgi, 1976, p. 246)
Let {X_1, X_2, . . .} be a sequence of r.v. and c ∈ IR. Then

    X_n →^d c  ⇒  X_n →^P c.    (5.33)
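A concrete instance (an assumed illustration, not taken from the text): X_n ~ Uniform(c − 1/n, c + 1/n) converges in distribution to the constant c, and the probability in Definition 5.5 can be written down explicitly.

```python
# X_n ~ Uniform(c - 1/n, c + 1/n), illustrating Proposition 5.58:
# P(|X_n - c| > eps) is the uniform mass outside (c - eps, c + eps),
# namely max(0, 1 - eps * n), independent of c.
def prob_far(n, eps):
    return max(0.0, 1.0 - eps * n)

assert prob_far(2, 0.25) == 0.5        # early on, half the mass is far from c
assert prob_far(100, 0.25) == 0.0      # vanishes once 1/n <= eps
assert all(prob_far(n, 0.1) <= prob_far(n - 1, 0.1) for n in range(2, 50))
```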
Exercise 5.64 Combining convergence in probability and uniform integrability is equivalent to convergence in L¹
Prove Proposition 5.63 (Karr, 1993, p. 144).
Exercise 5.65 Combining convergence in probability of the sequence of r.v. with convergence of the sequence of means implies convergence in L¹ (Karr, 1993, p. 160, Exercise 5.16)
Let X, X_1, X_2, . . . be positive r.v.
Prove that if X_n →^P X and E(X_n) → E(X), then X_n →^L¹ X.
Exercise 5.66 Increasing character and convergence in probability combined imply almost sure convergence (Karr, 1993, p. 160, Exercise 5.15)
Prove that if X_1 ≤ X_2 ≤ . . . and X_n →^P X, then X_n →^a.s. X.
Exercise 5.67 Strictly decreasing and positive character and convergence in probability combined imply almost sure convergence (Rohatgi, 1976, p. 252, Theorem 13)
Let {X_1, X_2, . . .} be a strictly decreasing sequence of positive r.v.
Prove that if X_n →^P 0, then X_n →^a.s. 0.
References
Karr, A. F. (1993). Probability. Springer-Verlag.
Resnick, S. I. (1999). A Probability Path. Birkhäuser. (QA273.4-.67.RES.49925)
Rohatgi, V. K. (1976). An Introduction to Probability Theory and Mathematical Statistics. John Wiley & Sons. (QA273-280/4.ROH.34909)