
Chapter 11

The Central Limit Theorem

Printed: April 13, 2020

1. Sums of independent identically distributed random variables

Denote by $Z$ the "standard normal random variable" with density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$.

Lemma 11.1. $E e^{itZ} = e^{-t^2/2}$

Proof. We use the same calculation as for the moment generating function:
$$\int_{-\infty}^{\infty} \exp(itx - \tfrac{1}{2}x^2)\,dx = e^{-t^2/2}\int_{-\infty}^{\infty} \exp\big(-\tfrac{1}{2}(x-it)^2\big)\,dx$$
Note that $e^{-z^2/2}$ is an analytic function, so $\oint_\gamma e^{-z^2/2}\,dz = 0$ over any closed path. Integrating over the boundary of the rectangle with vertices $\pm A$ and $\pm A + it$,
$$\int_{-A}^{A} \exp(-(x-it)^2/2)\,dx - \int_{-A}^{A} e^{-x^2/2}\,dx + \int_{0}^{t} \exp(-(A-is)^2/2)\,ds - \int_{0}^{t} \exp(-(-A-is)^2/2)\,ds = 0$$
Letting $A \to \infty$, the last two integrals tend to zero, so $\int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}(x-it)^2)\,dx = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$, and the claim follows. $\square$

Theorem 11.2 (CLT for i.i.d.). Suppose $\{X_n\}$ is i.i.d. with mean $m$ and variance $0 < \sigma^2 < \infty$. Let $S_n = X_1 + \dots + X_n$. Then
$$\frac{S_n - nm}{\sigma\sqrt{n}} \xrightarrow{D} Z$$

This is one of the special cases of the Lindeberg theorem and the proof uses characteristic functions. Note that $\varphi_{S_n/\sqrt{n}}(t) = e^{-t^2/2}$ when the $X_j$ are independent $N(0,1)$.


In general, $\varphi_{S_n/\sqrt{n}}(t)$ is a complex number. For example, when the $X_n$ are exponential with parameter $\lambda = 1$, the conclusion says that
$$\varphi_{(S_n - n)/\sqrt{n}}(t) = \frac{e^{-it\sqrt{n}}}{(1 - \frac{it}{\sqrt{n}})^n} \to e^{-t^2/2}$$
which is not so obvious to see. On the other hand, the characteristic function in Exercise 10.5 on page 119 is real and the limit can be found using calculus:
$$\varphi_{S_n/\sqrt{n}}(t) = \cos^n(t/\sqrt{n}) \to e^{-t^2/2}.$$
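The second limit can be checked numerically with a few lines of standard-library Python; this is an illustrative sketch (the choice $t = 1.5$ is arbitrary):

```python
import math

def phi_n(t, n):
    # cos(t/sqrt(n))^n: characteristic function of S_n/sqrt(n)
    # for a symmetric +-1 law, whose characteristic function is real
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
limit = math.exp(-t ** 2 / 2)
for n in (10, 100, 10000):
    print(n, phi_n(t, n))
print("limit:", limit)
```

As $n$ grows the printed values approach $e^{-t^2/2}$, in agreement with the calculus limit above.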
Here is a simple inequality that will suffice for the proof in the general case.

Lemma 11.3. If $z_1, \dots, z_m$ and $w_1, \dots, w_m$ are complex numbers of modulus at most 1 then
$$(11.1)\qquad |z_1 \cdots z_m - w_1 \cdots w_m| \le \sum_{k=1}^m |z_k - w_k|$$

Proof. Write the left-hand side of (11.1) as a telescoping sum:
$$z_1 \cdots z_m - w_1 \cdots w_m = \sum_{k=1}^m z_1 \cdots z_{k-1}(z_k - w_k)\, w_{k+1} \cdots w_m \qquad\square$$
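Inequality (11.1) is easy to probe with a quick randomized check; this is only an illustrative sketch, not part of the proof:

```python
import cmath
import random
from functools import reduce

def prod(xs):
    # product of a list of complex numbers
    return reduce(lambda a, b: a * b, xs, 1)

rng = random.Random(0)
worst = float("-inf")
for _ in range(1000):
    m = rng.randint(1, 6)
    # random complex numbers of modulus at most 1
    z = [cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi)) for _ in range(m)]
    w = [cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi)) for _ in range(m)]
    lhs = abs(prod(z) - prod(w))
    rhs = sum(abs(a - b) for a, b in zip(z, w))
    worst = max(worst, lhs - rhs)
print("max of lhs - rhs:", worst)
```

The printed maximum of `lhs - rhs` stays nonpositive (up to rounding), as the lemma asserts.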

(Omitted in 2020)
Example 11.1. We show how to complete the proof for the exponential distribution. By Lemma 11.3 applied with $z_k = e^{-it/\sqrt{n}}/(1 - \frac{it}{\sqrt{n}})$ and $w_k = e^{-t^2/(2n)}$,
$$\left|\frac{e^{-it\sqrt{n}}}{(1-\frac{it}{\sqrt{n}})^n} - e^{-t^2/2}\right| = \left|\Big(\frac{e^{-it/\sqrt{n}}}{1-\frac{it}{\sqrt{n}}}\Big)^n - \big(e^{-t^2/(2n)}\big)^n\right| \le n\left|\frac{e^{-it/\sqrt{n}}}{1-\frac{it}{\sqrt{n}}} - e^{-t^2/(2n)}\right|$$
Expanding $e^{-it/\sqrt{n}}$, $(1-\frac{it}{\sqrt{n}})^{-1}$ and $e^{-t^2/(2n)}$ in powers of $1/\sqrt{n}$, the constant terms and the terms of order $n^{-1/2}$ and $n^{-1}$ cancel, so the difference on the right is at most $C(t)/(n\sqrt{n})$ for a constant $C(t)$ depending only on $t$. Hence the whole expression is bounded by $n \cdot \frac{C(t)}{n\sqrt{n}} = \frac{C(t)}{\sqrt{n}} \to 0$.

Proof of Theorem 11.2. Without loss of generality we may assume $m = 0$ and $\sigma = 1$. We have $\varphi_{S_n/\sqrt{n}}(t) = \varphi_X(t/\sqrt{n})^n$. For a fixed $t \in \mathbb{R}$ choose $n$ large enough so that $1 - \frac{t^2}{2n} > -1$. For such $n$, we can apply (11.1) with $z_k = \varphi_X(t/\sqrt{n})$ and $w_k = 1 - \frac{t^2}{2n}$. We get
$$\left|\varphi_{S_n/\sqrt{n}}(t) - \Big(1 - \frac{t^2}{2n}\Big)^n\right| \le n\left|\varphi_X(t/\sqrt{n}) - \Big(1 - \frac{t^2}{2n}\Big)\right| \le t^2\, E\min\Big\{\frac{|t|\,|X|^3}{\sqrt{n}},\, X^2\Big\}$$
Noting that $\lim_{n\to\infty} \min\{|t||X|^3/\sqrt{n}, X^2\} = 0$, by the dominated convergence theorem (the integrand is dominated by the integrable function $X^2$) we have $E\min\big\{\frac{|t||X|^3}{\sqrt{n}}, X^2\big\} \to 0$ as $n \to \infty$. So
$$\lim_{n\to\infty}\left|\varphi_{S_n/\sqrt{n}}(t) - \Big(1 - \frac{t^2}{2n}\Big)^n\right| = 0.$$
It remains to notice that $(1 - \frac{t^2}{2n})^n \to e^{-t^2/2}$. $\square$
Remark 11.4. If $X_n \xrightarrow{D} Z$ then the cumulative distribution functions converge uniformly: $\sup_x |P(X_n \le x) - P(Z \le x)| \to 0$.
Example 11.2 (Normal approximation to Binomial). If $X_n$ is $\mathrm{Bin}(n,p)$ and $p$ is fixed then
$$P\Big(\tfrac{1}{n}X_n < p + x/\sqrt{n}\Big) \to P\Big(Z \le x/\sqrt{p(1-p)}\Big) \quad\text{as } n \to \infty.$$
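The normal approximation to the binomial can be checked by simulation using only the standard library; the sample sizes below are arbitrary choices in this sketch:

```python
import math
import random

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = random.Random(1)
n, p, x, trials = 400, 0.3, 1.0, 5000
hits = 0
for _ in range(trials):
    X = sum(rng.random() < p for _ in range(n))  # one Bin(n, p) sample
    hits += X / n < p + x / math.sqrt(n)
print(hits / trials, Phi(x / math.sqrt(p * (1 - p))))
```

The empirical frequency on the left should be close to the limiting probability $\Phi\big(x/\sqrt{p(1-p)}\big)$ on the right.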
Example 11.3 (Normal approximation to Poisson). If $X_\lambda$ is $\mathrm{Poiss}(\lambda)$ then $(X_\lambda - \lambda)/\sqrt{\lambda} \xrightarrow{D} Z$ as $\lambda \to \infty$. (Strictly speaking, the CLT gives only the convergence $(X_{n\lambda} - n\lambda)/\sqrt{n\lambda} \xrightarrow{D} Z$ as $n \to \infty$.)

(Omitted in 2020)

2. General form of a limit theorem

The general problem of convergence in distribution can be stated as follows: given a sequence $Z_n$ of random variables, find normalizing constants $a_n, b_n$ and a limiting distribution/random variable $Z$ such that $(Z_n - b_n)/a_n \xrightarrow{D} Z$.
In Example 9.1, $Z_n$ is a maximum, $a_n = 1$, $b_n = \log n$.
In Theorem 11.2, $Z_n$ is the sum, the normalizing constants are $b_n = E(S_n)$ and $a_n = \sqrt{\mathrm{Var}(S_n)}$, and we will make the same choice for sums of independent random variables in the next section. However, finding an appropriate normalization for the CLT may not be obvious or easy; see Section 5.
One may wonder how much flexibility we have in the choice of the normalizing constants $a_n, b_n$.

Theorem 11.5 (Convergence of types). Suppose $X_n \xrightarrow{D} X$ and $a_n X_n + b_n \xrightarrow{D} Y$ for some $a_n > 0$, $b_n \in \mathbb{R}$, and both $X, Y$ are non-degenerate. Then $a_n \to a > 0$ and $b_n \to b$, and in particular $Y$ has the same law as $aX + b$.

So if $(Z_n - b_n)/a_n \to Z$ and $(Z_n - b'_n)/a'_n \to Z'$ then $(Z_n - b'_n)/a'_n = \frac{a_n}{a'_n}\big((Z_n - b_n)/a_n\big) + (b_n - b'_n)/a'_n$, which means that $a_n/a'_n \to a > 0$ and $(b_n - b'_n)/a'_n \to b$. So $a'_n \approx a_n/a$, $b'_n \approx b_n - \frac{b}{a}a_n$, and $Z' = aZ + b$.

(Omitted in 2020)

Proof. To be written...  

It is clear that independence alone is not sufficient for the CLT.

3. Lindeberg's theorem

The setting is of sums of triangular arrays: for each $n$ we have a family of independent random variables
$$X_{n,1}, \dots, X_{n,r_n}$$
and we set $S_n = X_{n,1} + \dots + X_{n,r_n}$.
For Theorem 11.2, the triangular array can be $X_{n,k} = \frac{X_k - m}{\sigma\sqrt{n}}$. Or one can take $X_{n,k} = \frac{X_k - m}{\sigma}$...
Throughout this section we assume that the random variables are square-integrable with mean zero, and we use the notation
$$(11.2)\qquad E(X_{n,k}) = 0, \qquad \sigma_{nk}^2 = E(X_{n,k}^2), \qquad s_n^2 = \sum_{k=1}^{r_n} \sigma_{nk}^2$$

Definition 11.1 (The Lindeberg condition). We say that the Lindeberg condition holds if
$$(11.3)\qquad \lim_{n\to\infty} \frac{1}{s_n^2} \sum_{k=1}^{r_n} \int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP = 0 \quad\text{for all } \varepsilon > 0$$
(Note that the strict inequality in $\int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP$ can be replaced by $\int_{|X_{nk}| \ge \varepsilon s_n} X_{nk}^2\, dP$ and the resulting condition is the same.)
Remark 11.6. Under the Lindeberg condition, we have
$$(11.4)\qquad \lim_{n\to\infty} \max_{k \le r_n} \frac{\sigma_{nk}^2}{s_n^2} = 0$$
Indeed,
$$\sigma_{nk}^2 = \int_{|X_{nk}| \le \varepsilon s_n} X_{nk}^2\, dP + \int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP \le \varepsilon^2 s_n^2 + \int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP$$
So
$$\max_{k \le r_n} \frac{\sigma_{nk}^2}{s_n^2} \le \varepsilon^2 + \frac{1}{s_n^2} \max_{k \le r_n} \int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP \le \varepsilon^2 + \frac{1}{s_n^2} \sum_{k=1}^{r_n} \int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP$$

Theorem 11.7 (Lindeberg CLT). Suppose that for each $n$ the sequence $X_{n1}, \dots, X_{n,r_n}$ is independent with mean zero. If the Lindeberg condition holds for all $\varepsilon > 0$ then $S_n/s_n \xrightarrow{D} Z$.

Example 11.4 (Proof of Theorem 11.2). In the setting of Theorem 11.2, we have $X_{n,k} = \frac{X_k - m}{\sigma}$ and $s_n = \sqrt{n}$. The Lindeberg condition is
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n \int_{|X_k - m| > \varepsilon\sigma\sqrt{n}} \frac{(X_k - m)^2}{\sigma^2}\, dP = \lim_{n\to\infty} \frac{1}{\sigma^2} \int_{|X_1 - m| > \varepsilon\sigma\sqrt{n}} (X_1 - m)^2\, dP = 0$$
by the Lebesgue dominated convergence theorem, say. (Or by Corollary 6.12 on page 71.)

Proof. Without loss of generality we may assume that $s_n^2 = 1$, so that $\sum_{k=1}^{r_n} \sigma_{nk}^2 = 1$. Denote $\varphi_{nk}(t) = E(e^{itX_{nk}})$. From (10.13) we have
$$(11.5)\qquad \left|\varphi_{nk}(t) - \Big(1 - \tfrac{1}{2}t^2\sigma_{nk}^2\Big)\right| \le E\min\{|tX_{nk}|^2, |tX_{nk}|^3\} \le \int_{|X_{nk}| < \varepsilon} |tX_{nk}|^3\, dP + \int_{|X_{nk}| \ge \varepsilon} |tX_{nk}|^2\, dP \le \varepsilon|t|^3\sigma_{nk}^2 + t^2 \int_{|X_{nk}| \ge \varepsilon} X_{nk}^2\, dP$$
Using (11.1), we see that
$$(11.6)\qquad \left|\varphi_{S_n}(t) - \prod_{k=1}^{r_n}\Big(1 - \tfrac{1}{2}t^2\sigma_{nk}^2\Big)\right| \le \varepsilon|t|^3 \sum_{k=1}^{r_n} \sigma_{nk}^2 + t^2 \sum_{k=1}^{r_n} \int_{|X_{nk}| \ge \varepsilon} X_{nk}^2\, dP$$
This shows that
$$\lim_{n\to\infty}\left|\varphi_{S_n}(t) - \prod_{k=1}^{r_n}\Big(1 - \tfrac{1}{2}t^2\sigma_{nk}^2\Big)\right| = 0$$
It remains to verify that $\lim_{n\to\infty}\big|e^{-t^2/2} - \prod_{k=1}^{r_n}\big(1 - \tfrac{1}{2}t^2\sigma_{nk}^2\big)\big| = 0$.
To do so, we apply the previous proof to the triangular array $Z_{nk} = \sigma_{nk}Z_k$ of independent normal random variables. Note that
$$\varphi_{\sum_k Z_{nk}}(t) = \prod_{k=1}^{r_n} e^{-t^2\sigma_{nk}^2/2} = e^{-t^2/2}$$
We only need to verify the Lindeberg condition for $\{Z_{nk}\}$. With $f$ denoting the standard normal density,
$$\int_{|Z_{nk}| > \varepsilon} Z_{nk}^2\, dP = \sigma_{nk}^2 \int_{|x| > \varepsilon/\sigma_{nk}} x^2 f(x)\, dx$$
So
$$\sum_{k=1}^{r_n} \int_{|Z_{nk}| > \varepsilon} Z_{nk}^2\, dP \le \sum_{k=1}^{r_n} \sigma_{nk}^2 \, \max_{1 \le k \le r_n} \int_{|x| > \varepsilon/\sigma_{nk}} x^2 f(x)\, dx \le \int_{|x| > \varepsilon/\max_k \sigma_{nk}} x^2 f(x)\, dx$$
The right-hand side goes to zero as $n \to \infty$, because $\max_{1 \le k \le r_n} \sigma_{nk} \to 0$ by (11.4). $\square$

4. Lyapunov's theorem

Theorem 11.8. Suppose that for each $n$ the sequence $X_{n1}, \dots, X_{n,r_n}$ is independent with mean zero. If Lyapunov's condition
$$(11.7)\qquad \lim_{n\to\infty} \frac{1}{s_n^{2+\delta}} \sum_{k=1}^{r_n} E|X_{nk}|^{2+\delta} = 0$$
holds for some $\delta > 0$, then $S_n/s_n \xrightarrow{D} Z$.

Proof. We use the following bound to verify Lindeberg's condition:
$$\frac{1}{s_n^2} \sum_{k=1}^{r_n} \int_{|X_{nk}| > \varepsilon s_n} X_{nk}^2\, dP \le \frac{1}{\varepsilon^\delta s_n^{2+\delta}} \sum_{k=1}^{r_n} \int_{|X_{nk}| > \varepsilon s_n} |X_{nk}|^{2+\delta}\, dP \le \frac{1}{\varepsilon^\delta s_n^{2+\delta}} \sum_{k=1}^{r_n} E|X_{nk}|^{2+\delta} \qquad\square$$

Corollary 11.9. Suppose $X_k$ are independent with mean zero, variance $\sigma^2$, and $\sup_k E|X_k|^{2+\delta} < \infty$. Then $S_n/\sqrt{n} \xrightarrow{D} \sigma Z$.

Proof. Let $C = \sup_k E|X_k|^{2+\delta}$. Then $s_n = \sigma\sqrt{n}$ and
$$\frac{1}{s_n^{2+\delta}} \sum_{k=1}^n E\big(|X_k|^{2+\delta}\big) \le \frac{Cn}{\sigma^{2+\delta}\, n^{1+\delta/2}} = \frac{C}{\sigma^{2+\delta}\, n^{\delta/2}} \to 0,$$
so Lyapunov's condition is satisfied. $\square$

Corollary 11.10. Suppose $X_k$ are independent, uniformly bounded, and have mean zero. If $\sum_n \mathrm{Var}(X_n) = \infty$, then $S_n/\sqrt{\mathrm{Var}(S_n)} \xrightarrow{D} N(0,1)$.

Proof. Suppose $|X_n| \le C$ for a constant $C$. Since $\sum_n \mathrm{Var}(X_n) = \infty$ we have $s_n \to \infty$, so
$$\frac{1}{s_n^3} \sum_{k=1}^n E|X_k|^3 \le C\,\frac{s_n^2}{s_n^3} = \frac{C}{s_n} \to 0$$
and Lyapunov's condition holds with $\delta = 1$. $\square$
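Corollary 11.10 can be illustrated numerically with the bounded variables $X_k = \pm 1/\sqrt{k}$, for which $\sum_k \mathrm{Var}(X_k) = \sum_k 1/k = \infty$. This is a rough simulation sketch; the sample sizes are arbitrary:

```python
import math
import random

rng = random.Random(3)
n, trials = 1000, 3000
s2 = sum(1.0 / k for k in range(1, n + 1))  # Var(S_n) = sum 1/k, which diverges
samples = []
for _ in range(trials):
    s = sum((1 if rng.random() < 0.5 else -1) / math.sqrt(k) for k in range(1, n + 1))
    samples.append(s / math.sqrt(s2))

frac = sum(x <= 0 for x in samples) / trials   # should be near 1/2 for N(0,1)
m2 = sum(x * x for x in samples) / trials      # should be near 1
print(frac, m2)
```

The normalized sums behave like a standard normal sample: roughly half fall below 0 and the empirical second moment is near 1.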

5. Normal approximation without Lindeberg condition

One basic idea is truncation: $X_n = X_n I_{|X_n| \le a_n} + X_n I_{|X_n| > a_n}$. One wants to show that $\frac{1}{s_n}\sum X_k I_{|X_k| \le a_n} \xrightarrow{D} Z$ and that $\frac{1}{s_n}\sum X_k I_{|X_k| > a_n} \xrightarrow{P} 0$. Then $S_n/s_n$ is asymptotically normal by Slutsky's theorem.

Example 11.5. Let $X_1, X_2, \dots$ be independent random variables with the distribution ($k \ge 1$)
$$P(X_k = \pm 1) = 1/4, \qquad P(X_k = \pm k^k) = \tfrac{1}{2}\cdot 4^{-k}, \qquad P(X_k = 0) = 1/2 - 4^{-k}.$$
Then $\sigma_k^2 = \frac{1}{2} + \frac{k^{2k}}{4^k}$ and $s_n^2 \ge \frac{n^{2n}}{4^n}$. But $S_n/s_n \xrightarrow{D} 0$, and in fact we have $S_n/\sqrt{n} \xrightarrow{D} Z/\sqrt{2}$. To see this, note that $Y_k = X_k I_{|X_k| \le 1}$ are independent with mean 0, variance $\frac{1}{2}$ and $P(Y_k \ne X_k) = 4^{-k}$, so by the first Borel–Cantelli Lemma (Theorem 3.8) only finitely many of the differences $Y_k - X_k$ are nonzero, and $\big|\frac{1}{\sqrt{n}}\sum_{k=1}^n (Y_k - X_k)\big| \le \frac{U}{\sqrt{n}} \to 0$ with probability one, where $U = \sum_k |Y_k - X_k| < \infty$ a.s.

It is sometimes convenient to use Corollary 9.5 (Exercise 9.2) combined with the law of large
numbers. This is how one needs to proceed in Exercise 11.2.

Example 11.6. Suppose $X_1, X_2, \dots$ are i.i.d. with mean 0 and variance $\sigma^2 > 0$. Then
$$\frac{\sum_{k=1}^n X_k}{\sqrt{\sum_{k=1}^n X_k^2}}$$
converges in distribution to $N(0,1)$. To see this, write
$$\frac{\sum_{k=1}^n X_k}{\sqrt{\sum_{k=1}^n X_k^2}} = \frac{\sigma}{\sqrt{\frac{1}{n}\sum_{k=1}^n X_k^2}} \times \frac{\sum_{k=1}^n X_k}{\sigma\sqrt{n}}$$
and note that the first factor converges to 1 with probability one.
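This self-normalized statistic is easy to simulate. A sketch with a normal sample standing in for the $X_k$ (any mean-zero, finite-variance law would do; the sizes are arbitrary):

```python
import math
import random

rng = random.Random(4)
n, trials = 500, 3000
vals = []
for _ in range(trials):
    xs = [rng.gauss(0, 2.0) for _ in range(n)]  # i.i.d. mean 0, variance 4
    vals.append(sum(xs) / math.sqrt(sum(x * x for x in xs)))

frac = sum(v <= 1.0 for v in vals) / trials
print(frac)  # compare with Phi(1) = 0.8413...
```

The empirical fraction below 1.0 is close to $\Phi(1) \approx 0.8413$, even though the statistic never sees $\sigma$.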

Required Exercises
Exercise 11.1. Suppose $(a_{nk})$ is an array of numbers such that $\sum_{k=1}^n a_{nk}^2 = 1$ and $\max_{1\le k\le n} |a_{nk}| \to 0$. Let $X_j$ be i.i.d. with mean zero and variance 1. Show that $\sum_{k=1}^n a_{nk} X_k \xrightarrow{D} Z$.
Exercise 11.2. Suppose that $X_1, X_2, \dots$ are i.i.d., $E(X_1) = 1$, $\mathrm{Var}(X_1) = \sigma^2 < \infty$. Let $\bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j$. Show that for all $k > 0$
$$\sqrt{n}\big(\bar{X}_n^k - 1\big) \xrightarrow{D} N(0, k\sigma)$$
as $n \to \infty$.
Exercise 11.3. Suppose $X_1, X_2, \dots$ are independent, $X_k = \pm 1$ with probability $\frac{1}{2}(1 - k^{-2})$ and $X_k = \pm k$ with probability $\frac{1}{2}k^{-2}$. Let $S_n = \sum_{k=1}^n X_k$.
(i) Show that $S_n/\sqrt{n} \xrightarrow{D} N(0,1)$.
(ii) Is the Lindeberg condition satisfied?
Exercise 11.4. Suppose $X_1, X_2, \dots$ are independent random variables with distribution $P(X_k = 1) = p_k$ and $P(X_k = 0) = 1 - p_k$. Prove that if $\sum_k \mathrm{Var}(X_k) = \infty$ then
$$\frac{\sum_{k=1}^n (X_k - p_k)}{\sqrt{\sum_{k=1}^n p_k(1 - p_k)}} \xrightarrow{D} N(0,1).$$

Exercise 11.5. Suppose $X_k$ are independent and have density $\frac{1}{|x|^3}$ for $|x| > 1$. Show that $\frac{S_n}{\sqrt{n\log n}} \xrightarrow{D} N(0,1)$.
Hint: Verify that Lyapunov's condition (11.7) holds with $\delta = 1$ for truncated random variables. Several different truncations can be used, but the technical details differ:
• $Y_k = X_k I_{|X_k| \le \sqrt{k}}$ is the solution in [Billingsley]. To show that $\frac{1}{\sqrt{n\log n}}\sum_{k=1}^n (X_k - Y_k) \xrightarrow{P} 0$ use $L^1$-convergence.
• The triangular array $Y_{nk} = X_k I_{|X_k| \le \sqrt{n}}$ is simpler computationally.
• The truncation $Y_k = X_k I_{|X_k| \le \sqrt{k\log k}}$ leads to "asymptotically equivalent" sequences.

Exercise 11.6 (stat). A real estate agent wishes to estimate the unknown mean sale price $\mu$ of a house, which she believes is well described by a distribution with finite second moment. She estimates $\mu$ by the sample mean $\bar{X}_n$ of the i.i.d. sample $X_1, \dots, X_n$, and she estimates the variance by the expression
$$S_n^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \bar{X}_n)^2.$$
She then uses the formula $\bar{X}_n \pm z_\alpha S_n/\sqrt{n}$ from Wikipedia to produce the large sample confidence interval for $\mu$. To understand why this procedure works, she would like to know that
$$\sqrt{n}\,(\bar{X}_n - \mu)/S_n \xrightarrow{D} N(0,1).$$
Please supply the proof.
Exercise 11.7 (stat). A psychologist wishes to estimate the parameter $\lambda > 0$ of the exponential distribution, see Example 2.4, by taking the average $\bar{X}_n$ of the i.i.d. sample $X_1, \dots, X_n$ and defining $\hat\lambda_n = 1/\bar{X}_n$. Show that $\hat\lambda_n$ is asymptotically normal, i.e. determine $a_n(\lambda)$ such that the $\alpha$-confidence interval for $\lambda$ is
$$\hat\lambda_n \pm a_n(\lambda)\, z_{\alpha/2}$$
where $z_{\alpha/2}$ comes from the normal table $P(Z > z_{\alpha/2}) = \alpha/2$.

Some previous prelim problems

Exercise 11.8 (May 2018). Suppose that $X_1, X_2, \dots$ are independent random variables with distributions
$$P(X_k = \pm 1) = \frac{1}{2k} \quad\text{and}\quad P(X_k = 0) = \frac{k-1}{k}.$$
Prove that
$$\frac{1}{\sqrt{\ln n}}\sum_{k=1}^n X_k \xrightarrow{D} N(0,1).$$

Exercise 11.9 (Aug 2017). Let $\{X_n\}_{n\in\mathbb{N}}$ be a collection of independent random variables with
$$P(X_n = \pm n^2) = \frac{1}{2n^\beta} \quad\text{and}\quad P(X_n = 0) = 1 - \frac{1}{n^\beta}, \quad n \in \mathbb{N},$$
where $\beta \in (0,1)$ is fixed for all $n \in \mathbb{N}$. Consider $S_n := X_1 + \dots + X_n$. Show that
$$\frac{S_n}{n^\gamma} \xrightarrow{D} N(0, \sigma^2)$$
for some $\sigma > 0$, $\gamma > 0$. Identify $\sigma$ and $\gamma$ as functions of $\beta$. You may use the formula
$$\sum_{k=1}^n k^\theta \sim \frac{n^{\theta+1}}{\theta+1}$$
for $\theta > 0$, and recall that by $a_n \sim b_n$ we mean $\lim_{n\to\infty} a_n/b_n = 1$.

Exercise 11.10 (May 2017). Let $\{X_n\}_{n\in\mathbb{N}}$ be independent random variables with $P(X_n = 1) = 1/n = 1 - P(X_n = 0)$. Let $S_n := X_1 + \dots + X_n$ be the partial sum.
(i) Show that
$$\lim_{n\to\infty}\frac{ES_n}{\log n} = 1 \quad\text{and}\quad \lim_{n\to\infty}\frac{\mathrm{Var}(S_n)}{\log n} = 1.$$
(ii) Prove that
$$\frac{S_n - \log n}{\sqrt{\log n}} \xrightarrow{D} N(0,1)$$
as $n \to \infty$. Explain which central limit theorem you use. State and verify all the conditions clearly.
Hint: recall the relation $\lim_{n\to\infty}\dfrac{\sum_{k=1}^n 1/k}{\log n} = 1$.
Exercise 11.11 (May 2016). (a) State the Lindeberg–Feller central limit theorem.
(b) Use the Lindeberg–Feller central limit theorem to prove the following. Consider a triangular array of random variables $\{Y_{n,k}\}_{n\in\mathbb{N},\, k=1,\dots,n}$ such that for each $n$, $EY_{n,k} = 0$, $k = 1,\dots,n$, and $\{Y_{n,k}\}_{k=1,\dots,n}$ are independent. In addition, with $\sigma_n := \big(\sum_{k=1}^n EY_{n,k}^2\big)^{1/2}$, assume that
$$\lim_{n\to\infty}\frac{1}{\sigma_n^4}\sum_{k=1}^n EY_{n,k}^4 = 0.$$
Show that
$$\frac{Y_{n,1} + \dots + Y_{n,n}}{\sigma_n} \xrightarrow{D} N(0,1).$$
Exercise 11.12 (Aug 2015). Let $\{U_n\}_{n\in\mathbb{N}}$ be a collection of i.i.d. random variables with $EU_n = 0$ and $EU_n^2 = \sigma^2 \in (0, \infty)$. Consider random variables $\{X_n\}_{n\in\mathbb{N}}$ defined by $X_n = U_n + U_{2n}$, $n \in \mathbb{N}$, and the partial sum $S_n = X_1 + \dots + X_n$. Find appropriate constants $\{a_n, b_n\}_{n\in\mathbb{N}}$ such that
$$\frac{S_n - b_n}{a_n} \xrightarrow{D} N(0,1).$$
Exercise 11.13 (May 2015). Let $\{U_n\}_{n\in\mathbb{N}}$ be a collection of i.i.d. random variables distributed uniformly on the interval $(0,1)$. Consider a triangular array of random variables $\{X_{n,k}\}_{k=1,\dots,n,\, n\in\mathbb{N}}$ defined as
$$X_{n,k} = 1_{\{\sqrt{n}\,U_k \le 1\}} - \frac{1}{\sqrt{n}}.$$
Find constants $\{a_n, b_n\}_{n\in\mathbb{N}}$ such that
$$\frac{X_{n,1} + \dots + X_{n,n} - b_n}{a_n} \xrightarrow{D} N(0,1).$$
Exercise 11.14 (Aug 2014). Let $X_1, X_2, \dots$ be independent and identically distributed random variables with
$$P(X_i = 1) = P(X_i = -1) = 1/2.$$
Prove that
$$\frac{\sqrt{3}}{\sqrt{n^3}}\sum_{k=1}^n k X_k \xrightarrow{D} N(0,1)$$
(You may use the formulas $\sum_{j=1}^n j^2 = \frac{1}{6}n(n+1)(2n+1)$ and $\sum_{j=1}^n j^3 = \frac{1}{4}n^2(n+1)^2$ without proof.)
Exercise 11.15 (May 2014). Let $\{X_{nk} : k = 1, \dots, n,\ n \in \mathbb{N}\}$ be a family of independent random variables satisfying
$$P\Big(X_{nk} = \frac{k}{\sqrt{n}}\Big) = P\Big(X_{nk} = -\frac{k}{\sqrt{n}}\Big) = P(X_{nk} = 0) = 1/3$$
Let $S_n = X_{n1} + \dots + X_{nn}$. Prove that $S_n/s_n$ converges in distribution to a standard normal random variable for a suitable sequence of real numbers $s_n$.
Some useful identities:
$$\sum_{k=1}^n k = \frac{1}{2}n(n+1), \qquad \sum_{k=1}^n k^2 = \frac{1}{6}n(n+1)(2n+1), \qquad \sum_{k=1}^n k^3 = \frac{1}{4}n^2(n+1)^2$$

Exercise 11.16 (Aug 2013). Suppose $X_1, Y_1, X_2, Y_2, \dots$ are independent identically distributed with mean zero and variance 1. For integer $n$, let
$$U_n = \Big(\frac{1}{\sqrt{n}}\sum_{j=1}^n X_j\Big)^2 + \Big(\frac{1}{\sqrt{n}}\sum_{j=1}^n Y_j\Big)^2.$$
Prove that $\lim_{n\to\infty} P(U_n \le u) = 1 - e^{-u/2}$ for $u > 0$.
Exercise 11.17 (May 2013). Suppose $X_{n,1}, X_{n,2}, \dots$ are independent random variables centered at expectations (mean 0) and set $s_n^2 = \sum_{k=1}^n E\big((X_{n,k})^2\big)$. Assume for all $k$ that $|X_{n,k}| \le M_n$ with probability 1 and that $M_n/s_n \to 0$. Let $Y_{n,i} = 3X_{n,i} + X_{n,i+1}$. Show that
$$\frac{Y_{n,1} + Y_{n,2} + \dots + Y_{n,n}}{s_n}$$
converges in distribution and find the limiting distribution.
Chapter 12

Limit Theorems in $\mathbb{R}^k$

This is based on [Billingsley, Section 29]. Printed: April 13, 2020

1. The basic theorems

If $X : \Omega \to \mathbb{R}^k$ is measurable, then $X$ is called a random vector. $X$ is also called a $k$-variate random variable, as $X = (X_1, \dots, X_k)$.
Recall that the probability distribution of $X$ is the probability measure $\mu$ on the Borel subsets of $\mathbb{R}^k$ defined by $\mu(U) = P(\{\omega : X(\omega) \in U\})$.
Recall that the (joint) cumulative distribution function of $X = (X_1, \dots, X_k)$ is the function $F : \mathbb{R}^k \to [0,1]$ such that
$$F(x_1, \dots, x_k) = P(X_1 \le x_1, \dots, X_k \le x_k)$$
From the $\pi$–$\lambda$ theorem we know that $F$ determines $\mu$ uniquely. In particular, if
$$F(x_1, \dots, x_k) = \int_{-\infty}^{x_1}\!\!\cdots\int_{-\infty}^{x_k} f(y_1, \dots, y_k)\, dy_1 \dots dy_k$$
then $\mu(U) = \int_U f(y_1, \dots, y_k)\, dy_1 \dots dy_k$.
Let $X_n : \Omega \to \mathbb{R}^k$ be a sequence of random vectors.
Let Xn : Ω → Rk be a sequence of random vectors.

Definition 12.1. We say that $X_n$ converges in distribution to $X$ if for every bounded continuous function $f : \mathbb{R}^k \to \mathbb{R}$ the sequence of numbers $E(f(X_n))$ converges to $Ef(X)$.

We will write $X_n \xrightarrow{D} X$; if $\mu_n$ is the law of $X_n$ we will also write $\mu_n \xrightarrow{D} \mu$; the same notation in the language of cumulative distribution functions is $F_n \xrightarrow{D} F$; the latter can be defined as $F_n(x) \to F(x)$ for all points of continuity of $F$, but it is simpler to use Definition 12.1.
Proposition 12.1. If $X_n \xrightarrow{D} X$ and $g : \mathbb{R}^k \to \mathbb{R}^m$ is a continuous function then $g(X_n) \xrightarrow{D} g(X)$.

For example, if $(X_n, Y_n) \xrightarrow{D} (Z_1, Z_2)$ then $X_n^2 + Y_n^2 \xrightarrow{D} Z_1^2 + Z_2^2$.

Proof. Denoting $Y_n = g(X_n)$, we see that for any bounded continuous function $f : \mathbb{R}^m \to \mathbb{R}$, $f(Y_n)$ is the bounded continuous function $f \circ g$ of $X_n$. $\square$


(Omitted in 2020) The following is a $k$-dimensional version of the Portmanteau Theorem 9.7.

Theorem 12.2. For a sequence $\mu_n$ of probability measures on the Borel sets of $\mathbb{R}^k$, the following are equivalent:
(i) $\mu_n \xrightarrow{D} \mu$
(ii) $\limsup_{n\to\infty} \mu_n(C) \le \mu(C)$ for all closed sets $C \subset \mathbb{R}^k$.
(iii) $\liminf_{n\to\infty} \mu_n(G) \ge \mu(G)$ for all open sets $G \subset \mathbb{R}^k$.
(iv) $\lim_{n\to\infty} \mu_n(A) = \mu(A)$ for all sets $A \subset \mathbb{R}^k$ such that $\mu(\partial A) = 0$.

Proof. The detailed proof is omitted. Here are some steps:
• By passing to complements, it is clear that (ii) and (iii) are equivalent.
• Since the interior $A^\circ$ of a set $A$ is its subset, $A^\circ \subset A \subset \bar{A}$. So $\mu_n(A^\circ) \le \mu_n(A) \le \mu_n(\bar{A})$ and we get
$$\mu(A^\circ) \le \liminf \mu_n(A) \le \limsup \mu_n(A) \le \mu(\bar{A})$$
Since $\partial A = \bar{A} \setminus A^\circ$, when $\mu(\partial A) = 0$ we have $\mu(A^\circ) = \mu(\bar{A}) = \mu(A)$, so it is clear that (ii)+(iii) imply (iv).
• To see how (i) implies (ii), fix a closed set $C \subset \mathbb{R}^k$ and consider a bounded continuous function $f$ such that $f = 1$ on $C$ and $f = 0$ outside $C^\varepsilon = \{x \in \mathbb{R}^k : d(x, C) \le \varepsilon\}$. Then $\mu_n(C) \le \int f(x)\,\mu_n(dx) \to \int f\,\mu(dx) \le \mu(C^\varepsilon)$. Since $\lim_{\varepsilon\to 0}\mu(C^\varepsilon) = \mu\big(\bigcap_{\varepsilon>0} C^\varepsilon\big) = \mu(C)$, we get the conclusion. $\square$

Definition 12.2. The sequence of measures $\mu_n$ on $\mathbb{R}^k$ is tight if for every $\varepsilon > 0$ there exists a compact set $K \subset \mathbb{R}^k$ such that $\mu_n(K) \ge 1 - \varepsilon$ for all $n$.

Theorem 12.3. If $\mu_n$ is a tight sequence of probability measures then there exist $\mu$ and a subsequence $n_k$ such that $\mu_{n_k} \xrightarrow{D} \mu$.

Proof. The detailed proof is omitted. $\square$

Corollary 12.4. If $\{\mu_n\}$ is a tight sequence of probability measures on the Borel subsets of $\mathbb{R}^k$ and if each convergent subsequence has the same limit $\mu$, then $\mu_n \xrightarrow{D} \mu$.

2. Multivariate characteristic function

Recall the dot-product $x \cdot y := x'y = \sum_{j=1}^k x_j y_j$. The multivariate characteristic function $\varphi : \mathbb{R}^k \to \mathbb{C}$ is

$$(12.1)\qquad \varphi(t) = E\exp(it \cdot X)$$

This is also written as $\varphi(t_1, \dots, t_k) = E\exp\big(i\sum_{j=1}^k t_j X_j\big)$.

The inversion formula shows how to determine $\mu(U)$ for a rectangle $U = (a_1, b_1] \times (a_2, b_2] \times \dots \times (a_k, b_k]$ such that $\mu(\partial U) = 0$:
$$(12.2)\qquad \mu(U) = \lim_{T\to\infty} \frac{1}{(2\pi)^k} \int_{-T}^{T}\!\!\cdots\int_{-T}^{T} \prod_{j=1}^k \frac{e^{-ia_j t_j} - e^{-ib_j t_j}}{it_j}\, \varphi(t_1, \dots, t_k)\, dt_1 \dots dt_k$$
Thus the characteristic function determines the probability measure $\mu$ uniquely.

Corollary 12.5 (Cramér–Wold device). The law of $X$ is uniquely determined by the univariate laws of $t \cdot X = \sum_{j=1}^k t_j X_j$.

Corollary 12.6. $X, Y$ are independent iff $\varphi_{X,Y}(s,t) = \varphi_X(s)\varphi_Y(t)$.

Theorem 12.7. $X_n \xrightarrow{D} Y$ iff $\varphi_n(t) \to \varphi(t)$ for all $t \in \mathbb{R}^k$.

Note that this means that $X_n \xrightarrow{D} Y$ iff $\sum_j t_j X_j(n) \xrightarrow{D} \sum_j t_j Y_j$ for all $t_1, \dots, t_k$.

Example 12.1. If $X, Y$ are independent standard normal then $X + Y$ and $X - Y$ are independent normal. Indeed, $\varphi_{X+Y,\,X-Y}(s,t) = \varphi_X(s+t)\varphi_Y(s-t) = \exp(-(s+t)^2/2 - (s-t)^2/2) = e^{-s^2}e^{-t^2}$, and $\varphi_{X\pm Y}(s) = e^{-s^2/2}e^{-s^2/2} = e^{-s^2}$.

Corollary 12.8. If $Z_1, \dots, Z_m$ are independent normal and $X = AZ$ then $\sum_j t_j X_j$ is (univariate) normal.

Proof. Let's simplify the calculations by assuming the $Z_j$ are standard normal. The characteristic function of $S = \sum_j t_j X_j$ is
$$\varphi_S(s) = E\exp(is\, t \cdot X) = E\exp(is\, t \cdot AZ) = E\exp\big(is(A^T t) \cdot Z\big) = \prod_{i=1}^m e^{-s^2 [A^T t]_i^2/2} = e^{-s^2 \|A^T t\|^2/2} \qquad\square$$

The generalization of this property is the simplest definition of the multivariate normal distribution. Note that
$$\|A^T t\|^2 = (A^T t) \cdot (A^T t) = t' A A^T t = t' \Sigma t$$
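The identity $\|A^T t\|^2 = t'AA^T t$ is plain linear algebra and can be sanity-checked without any libraries; a throwaway sketch with random entries:

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

random.seed(5)
k, m = 3, 4
A = [[random.gauss(0, 1) for _ in range(m)] for _ in range(k)]  # X = A Z, Z in R^m
t = [random.gauss(0, 1) for _ in range(k)]

At_t = matvec(transpose(A), t)          # A' t
lhs = sum(x * x for x in At_t)          # ||A' t||^2
Sigma = [[sum(A[i][r] * A[j][r] for r in range(m)) for j in range(k)] for i in range(k)]  # A A'
rhs = sum(t[i] * Sigma[i][j] * t[j] for i in range(k) for j in range(k))  # t' Sigma t
print(lhs, rhs)
```

Both printed numbers agree up to rounding, whatever $A$ and $t$ are.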

3. Multivariate normal distribution

Two equivalent approaches

Definition 12.3. $X$ is multivariate normal if there is a vector $m$ and a positive-definite matrix $\Sigma$ such that its characteristic function is
$$\varphi(t) = \exp\big(im't - \tfrac{1}{2}t'\Sigma t\big)$$
Notation: $N(m, \Sigma)$. (How do we know that this is a characteristic function? See the proof of Corollary 12.8!)

We need to show that this is indeed a characteristic function! But if it is, then by differentiation the parameters have an interpretation: $EX = m$ and $\Sigma_{i,j} = \mathrm{cov}(X_i, X_j)$.

Remark 12.9. If $X$ is normal $N(m, \Sigma)$, then $X - m$ is centered normal $N(0, \Sigma)$. In the sequel, to simplify notation we only discuss the centered case.

The simplest way to define the univariate distribution is to start with a standard normal random variable $Z$ with density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, and then define the general normal as the linear function $X = \mu + \sigma Z$. It is then easy to work out the density of $X$ and the characteristic function, which is $\varphi_X(t) = e^{it\mu - \frac{1}{2}\sigma^2 t^2}$.
Exercise 12.1 is worth doing in two ways, using both definitions.
In $\mathbb{R}^k$ the role of the standard normal distribution is played by the distribution of $Z = (Z_1, \dots, Z_k)$ of i.i.d. $N(0,1)$ random variables. Their density is
$$(12.3)\qquad f(x) = \frac{1}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\|x\|^2}$$
The characteristic function $Ee^{it'Z}$ is just the product of the individual characteristic functions $\prod_{j=1}^k e^{-t_j^2/2}$, which in vector notation is
$$\varphi_Z(t) = e^{-\|t\|^2/2}$$

Definition 12.4. We will say that $X$, written as a column vector, has multivariate normal distribution if $X = m + AZ$.

Clearly, $E(X) = m$. In the sequel we will only consider the centered multivariate normal distribution with $E(X) = 0$.

Remark 12.10. Denoting by $a_j$ the columns of $A$, we have $X = \sum_{j=1}^k Z_j a_j$. This is the universal feature of Gaussian vectors, even in infinite-dimensional vector spaces: they all can be written as linear combinations of deterministic vectors with independent real-valued "noises" as coefficients. For example, the random "vector" $(W_t)_{0\le t\le 1}$ with values in the vector space $C[0,1]$ of continuous functions on $[0,1]$ can be written as $W_t = \sum_{j=1}^\infty Z_j g_j(t)$ with deterministic functions $g_j(t) = \frac{1}{2j+1}\sin((2j+1)\pi t)$.

Proposition 12.11. The characteristic function of the centered normal distribution is
$$(12.4)\qquad \varphi(t) = \exp\big(-\tfrac{1}{2}t'\Sigma t\big)$$
where $\Sigma$ is a $k \times k$ positive definite matrix.

Proof. This is just a calculation:
$$Ee^{it'X} = Ee^{it'AZ} = Ee^{i(A't)'Z} = e^{-\|A't\|^2/2} = \exp\big(-\tfrac{1}{2}(A't)'A't\big) = \exp\big(-\tfrac{1}{2}t'AA't\big) = \exp\big(-\tfrac{1}{2}t'\Sigma t\big) \qquad\square$$

Remark 12.12. Notice that $E(XX') = E(AZZ'A') = A\,E(ZZ')\,A' = AIA' = \Sigma$ is the covariance matrix of $X$.

Remark 12.13. From linear algebra, any positive definite matrix can be diagonalized, $\Sigma = U\Lambda U'$, so each such matrix can be written as $\Sigma = AA'$ with $A = U\Lambda^{1/2}U'$. So $\varphi(t) = \exp(-\frac{1}{2}t'\Sigma t)$ is the characteristic function of $X = AZ$.

Remark 12.14. If $\det(\Sigma) > 0$ then $\det A \ne 0$ and (by linear algebra) the inverse $A^{-1}$ exists. The density of $X$ is recalculated from (12.3) as follows:
$$f(x) = \frac{1}{(2\pi)^{k/2}\det(A)}\, e^{-\frac{1}{2}\|A^{-1}x\|^2} = \frac{1}{(2\pi)^{k/2}\det(\Sigma)^{1/2}}\, e^{-\frac{1}{2}x'\Sigma^{-1}x}$$

Remark 12.15. The matrix $A$ in the representation $X = AZ$ is not unique, but the covariance matrix $\Sigma = AA'$ is unique. For example
$$X = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$$
can also be represented as
$$X = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$$
(Exercise 12.6)
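The non-uniqueness of $A$ is easy to verify directly: the identity matrix and the rotation matrix above produce the same covariance $AA'$. A quick sketch:

```python
import math

s = 1 / math.sqrt(2)
for A in ([[1.0, 0.0], [0.0, 1.0]], [[s, s], [-s, s]]):
    # covariance of X = A Z is A A'
    Sigma = [[sum(A[i][r] * A[j][r] for r in range(2)) for j in range(2)]
             for i in range(2)]
    print(Sigma)
```

Both iterations print the identity matrix (up to rounding), so both representations give the same normal law $N(0, I)$.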
Example 12.2. Suppose $\varphi(s,t) = e^{-s^2/2 - t^2/2 - \rho st}$. Then
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
is non-negative definite for any $|\rho| \le 1$ and this is the characteristic function of a random variable $X = (X_1, X_2)$ with univariate $N(0,1)$ laws, with correlation $E(X_1X_2) = -\frac{\partial^2}{\partial s\,\partial t}\varphi(s,t)\big|_{s=t=0} = \rho$. If $Z_1, Z_2$ are independent $N(0,1)$ then
$$(12.5)\qquad X_1 = Z_1, \qquad X_2 = \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2$$
will have exactly the same second moments, and the same characteristic function.
Since $\det\Sigma = 1 - \rho^2$, when $\rho^2 \ne 1$ the matrix is invertible and the resulting bivariate normal density is
$$f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\, \exp\Big(-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\Big)$$
From (12.5) we also see that $X_2 - \rho X_1$ is independent of $X_1$ and has variance $1 - \rho^2$.
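Construction (12.5) is easy to verify by simulation; the value of $\rho$ and the sample size below are arbitrary choices in this sketch:

```python
import math
import random

rng = random.Random(6)
rho, trials = 0.6, 20000
pairs = []
for _ in range(trials):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # the construction of (12.5)
    pairs.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))

corr = sum(x * y for x, y in pairs) / trials   # E(X1 X2), should be near rho
var2 = sum(y * y for _, y in pairs) / trials   # Var(X2), should be near 1
print(corr, var2)
```

The empirical correlation is close to $\rho$ and the second coordinate has variance close to 1, as (12.5) predicts.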

4. The CLT

Theorem 12.16. Let $X_n = (X_{n1}, \dots, X_{nk})$ be independent random vectors with the same distribution and finite second moments. Denote $m = EX_1$ and $S_n = X_1 + \dots + X_n$. Then
$$(S_n - nm)/\sqrt{n} \xrightarrow{D} Y$$
where $Y$ has the centered normal distribution with covariance matrix $\Sigma = E(X_n X_n') - mm'$.

The notation is $N(0, \Sigma)$. Note that this is inconsistent with the univariate notation $N(\mu, \sigma)$, which for consistency with the multivariate case should be replaced by $N(\mu, \sigma^2)$.

Proof. Without loss of generality we can assume $m = 0$. Let $t \in \mathbb{R}^k$. Then $\xi_n := t'X_n$ are independent random variables with mean zero and variance $\sigma^2 = t'\Sigma t$. By Theorem 11.2, we have $t'S_n/\sqrt{n} \xrightarrow{D} \sigma Z$.
If $Y = (Y_1, \dots, Y_k)$ has the multivariate normal distribution with covariance $\Sigma$, then $t'Y$ is univariate normal with variance $\sigma^2$. So we showed that $t'S_n/\sqrt{n} \xrightarrow{D} t'Y$ for all $t \in \mathbb{R}^k$. This ends the proof by Theorem 12.7. $\square$

Example 12.3. Suppose $\xi_k, \eta_k$ are i.i.d. with mean zero and variance one. Then $\frac{1}{\sqrt{n}}\big(\sum_{k=1}^n \xi_k,\ \sum_{k=1}^n (\xi_k + \eta_k)\big) \xrightarrow{D} (Z_1, Z_1 + Z_2)$.
Indeed, the random vector $X_k = \begin{pmatrix} \xi_k \\ \xi_k + \eta_k \end{pmatrix}$ has covariance matrix $\Sigma = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, and $(Z_1, Z_1 + Z_2)$ has the same covariance matrix.

4.1. Application: Chi-squared test for the multinomial distribution. A multinomial experiment has $k$ outcomes with probabilities $p_1, \dots, p_k$. A multinomial random variable $S_n = (S_1(n), \dots, S_k(n))$ lists the observed counts per category in $n$ repeats of the multinomial experiment.
The following result is behind the use of the chi-squared statistic in tests of consistency.

Theorem 12.17. $\displaystyle\sum_{j=1}^k \frac{(S_j(n) - np_j)^2}{np_j} \xrightarrow{D} Z_1^2 + \dots + Z_{k-1}^2$

Proof. Let's prove this for $k = 3$. Consider independent random vectors $X_k$ that take the three values
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
with probabilities $p_1, p_2, p_3$. Then $S_n$ is the sum of the $n$ independent identically distributed vectors $X_1, \dots, X_n$.
Clearly, $EX_k = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}$. To compute the covariance matrix, write $X$ for $X_k$. For non-centered vectors, the covariance is $E(XX') - E(X)E(X')$. We have
$$E(XX') = p_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}(1\ 0\ 0) + p_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}(0\ 1\ 0) + p_3\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}(0\ 0\ 1) = \begin{pmatrix} p_1 & 0 & 0 \\ 0 & p_2 & 0 \\ 0 & 0 & p_3 \end{pmatrix}$$
So
$$\Sigma = E(XX') - E(X)E(X') = \begin{pmatrix} p_1(1-p_1) & -p_1p_2 & -p_1p_3 \\ -p_1p_2 & p_2(1-p_2) & -p_2p_3 \\ -p_1p_3 & -p_2p_3 & p_3(1-p_3) \end{pmatrix}$$
Then $S_n$ is the sum of $n$ independent vectors, and the central limit theorem implies that
$$\frac{1}{\sqrt{n}}\Big(S_n - n\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}\Big) \xrightarrow{D} X.$$
In particular, by Proposition 12.1 we have
$$\sum_{j=1}^3 \frac{(S_j(n) - np_j)^2}{np_j} \xrightarrow{D} \sum_{j=1}^3 \frac{X_j^2}{p_j}$$
where $(X_1, X_2, X_3)$ is multivariate normal with covariance matrix $\Sigma$.
Note that since $\sum_{j=1}^k S_j(n) = n$, the Gaussian distribution is degenerate: $X_1 + X_2 + X_3 = 0$.

It remains to show that $\sum_{j=1}^3 \frac{X_j^2}{p_j}$ has the same law as $Z_1^2 + Z_2^2$, i.e. that it is exponential. To do so, we first note that the covariance of $(Y_1, Y_2, Y_3) := (X_1/\sqrt{p_1},\, X_2/\sqrt{p_2},\, X_3/\sqrt{p_3})$ is
$$\Sigma_Y = \begin{pmatrix} 1-p_1 & -\sqrt{p_1p_2} & -\sqrt{p_1p_3} \\ -\sqrt{p_1p_2} & 1-p_2 & -\sqrt{p_2p_3} \\ -\sqrt{p_1p_3} & -\sqrt{p_2p_3} & 1-p_3 \end{pmatrix} = I - \begin{pmatrix} \sqrt{p_1} \\ \sqrt{p_2} \\ \sqrt{p_3} \end{pmatrix}\big(\sqrt{p_1}\ \sqrt{p_2}\ \sqrt{p_3}\big)$$
Since $v_1 = (\sqrt{p_1}, \sqrt{p_2}, \sqrt{p_3})'$ is a unit vector, we can complete it with two additional vectors $v_2 = (\alpha_1, \alpha_2, \alpha_3)'$ and $v_3 = (\beta_1, \beta_2, \beta_3)'$ to form an orthonormal basis $\{v_1, v_2, v_3\}$ of $\mathbb{R}^3$. This can be done in many ways, for example by applying the Gram–Schmidt orthogonalization to $v_1, [1\,0\,0]', [0\,1\,0]'$. However, the specific form of $v_2, v_3$ does not enter the calculation; we only need to know that $v_2, v_3$ are orthonormal.
The point is that $I = v_1v_1' + v_2v_2' + v_3v_3'$, as these are orthogonal eigenvectors of $I$ with $\lambda = 1$. (Or, because $x = v_1v_1'x + v_2v_2'x + v_3v_3'x$, as $v_j'x = x \cdot v_j$ are the coefficients of the expansion of $x$ in the orthonormal basis $\{v_1, v_2, v_3\}$ of $\mathbb{R}^3$.)
Therefore,
$$\Sigma_Y = v_2v_2' + v_3v_3'$$
We now notice that $\Sigma_Y$ is the covariance of the multivariate normal random variable $Z = v_2Z_2 + v_3Z_3$, where $Z_2, Z_3$ are independent real-valued $N(0,1)$. Indeed,
$$EZZ' = \sum_{i,j=2}^3 v_iv_j'\, E(Z_iZ_j) = \sum_{i=2}^3 v_iv_i'$$
Therefore, the vector $[Y_1\ Y_2\ Y_3]'$ has the same distribution as $Z$, and $Y_1^2 + Y_2^2 + Y_3^2$ has the same distribution as
$$\|Z\|^2 = \|v_2Z_2 + v_3Z_3\|^2 = \|v_2Z_2\|^2 + \|v_3Z_3\|^2 = Z_2^2 + Z_3^2$$
(recall that $v_2$ and $v_3$ are orthogonal unit vectors). $\square$

Remark 12.18. It is clear that this proof generalizes to all $k$.

We note that the distribution of $Z_1^2 + \dots + Z_{k-1}^2$ is Gamma with parameters $\alpha = (k-1)/2$ and $\beta = 2$, which is known under the name of the chi-squared distribution with $k-1$ degrees of freedom. To see that $Z_2^2 + Z_3^2$ is indeed chi-squared with two degrees of freedom (i.e., exponential), we can determine the cumulative distribution function by computing $1 - F(u)$:
$$P(Z_2^2 + Z_3^2 > u) = \frac{1}{2\pi}\iint_{x^2+y^2>u} e^{-(x^2+y^2)/2}\, dx\, dy = \frac{1}{2\pi}\int_0^{2\pi}\!\!\int_{r>\sqrt{u}} e^{-r^2/2}\, r\, dr\, d\theta = e^{-u/2}$$
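Theorem 12.17 (for $k = 3$) can be checked by simulating the chi-squared statistic and comparing its tail with $e^{-u/2}$. A standard-library sketch with arbitrary choices of $n$, $p$ and $u$:

```python
import math
import random

rng = random.Random(7)
p = [0.2, 0.3, 0.5]
n, trials, u = 600, 4000, 2.0
exceed = 0
for _ in range(trials):
    counts = [0, 0, 0]
    for _ in range(n):
        r = rng.random()
        counts[0 if r < p[0] else (1 if r < p[0] + p[1] else 2)] += 1
    chi2 = sum((counts[j] - n * p[j]) ** 2 / (n * p[j]) for j in range(3))
    exceed += chi2 > u
print(exceed / trials, math.exp(-u / 2))
```

The empirical exceedance frequency is close to the exponential tail $e^{-u/2}$, consistent with the limit having 2 degrees of freedom.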

Required Exercises

Exercise 12.1. Suppose X, Y are independent univariate normal random variables. Use Definition
12.3 to verify that each of the following is bivariate normal: X = (X, X), X = (X, Y ), X =
(X + εY, X − εY ).
Exercise 12.2. Suppose that the $\mathbb{R}^{2k}$-valued random variables $(X_n, Y_n)$ are such that $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} 0$ (that is, $\lim_{n\to\infty} P(\|Y_n\| > \varepsilon) = 0$ for all $\varepsilon > 0$).
Prove that $X_n + Y_n \xrightarrow{D} X$.
Exercise 12.3. Suppose $(X_n, Y_n)$ are pairs of independent random variables and $X_n \xrightarrow{D} X$, $Y_n \xrightarrow{D} Y$. Show that $(X_n, Y_n) \xrightarrow{D} \mu$ where $\mu$ is the product of the laws of $X$ and $Y$.
Exercise 12.4. Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables such that $E(\xi_1) = 0$, $E(\xi_1^2) = 1$. For $i = 1, 2, \dots$, define the $\mathbb{R}^2$-valued random variables $X_i = \begin{pmatrix} \xi_i \\ \xi_{i+1} \end{pmatrix}$ and let $S_n = \sum_{i=1}^n X_i$. Show that
$$\frac{1}{\sqrt{n}}\, S_n \xrightarrow{D} N(0, \Sigma)$$
for a suitable $2 \times 2$ covariance matrix $\Sigma$.
Exercise 12.5. Suppose $\xi_j, \eta_j, \gamma_j$ are i.i.d. with mean zero and variance 1. Construct the following vectors:
$$X_j = \begin{pmatrix} \xi_j - \eta_j \\ \eta_j - \gamma_j \\ \gamma_j - \xi_j \end{pmatrix}$$
Let $S_n = X_1 + \dots + X_n$. Show that $\frac{1}{n}\|S_n\|^2 \xrightarrow{D} Y$. (In fact, $Y$ has a gamma density.)
Exercise 12.6. Use the characteristic function to verify that Remark 12.15 indeed gives two
representations of the same normal law.
Bibliography

[Billingsley] P. Billingsley, Probability and Measure, 3rd edition
[Durrett] R. Durrett, Probability: Theory and Examples, Edition 4.1 (online)
[Gut] A. Gut, Probability: A Graduate Course
[Resnik] S. Resnick, A Probability Path, Birkhäuser 1998
[Proschan-Shaw] M. Proschan and P. Shaw, Essentials of Probability Theory for Statisticians, CRC Press 2016
[Varadhan] S.R.S. Varadhan, Probability Theory (online pdf from 2000)

Index

Central Limit Theorem, 121
Convergence of types, 123
converges in distribution, 131
covariance matrix, 135
Lindeberg condition, 124
Lyapunov's condition, 125
multivariate normal, 134
multivariate normal distribution, 134