Lemma 11.1. $Ee^{itZ}=e^{-t^2/2}$.
Proof. We use the same calculation as for the moment generating function:
$$Ee^{itZ}=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\Big(itx-\frac{x^2}{2}\Big)\,dx=\frac{e^{-t^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\Big(-\frac{(x-it)^2}{2}\Big)\,dx.$$
Note that $e^{-z^2/2}$ is an analytic function, so $\oint_\gamma e^{-z^2/2}\,dz=0$ over any closed path. Taking $\gamma$ to be the boundary of the rectangle with vertices $-A-it$, $A-it$, $A$, $-A$, we get
$$\int_{-A}^{A}\exp(-(x-it)^2/2)\,dx-\int_{-A}^{A}e^{-x^2/2}\,dx+i\int_{0}^{t}\exp(-(A-is)^2/2)\,ds-i\int_{0}^{t}\exp(-(-A-is)^2/2)\,ds=0.$$
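The last two integrals vanish as $A\to\infty$: for $0\le s\le t$,
$$\big|\exp(-(\pm A-is)^2/2)\big|=e^{-(A^2-s^2)/2}\le e^{-(A^2-t^2)/2}\to0.$$
Letting $A\to\infty$ therefore gives $\int_{-\infty}^{\infty}e^{-(x-it)^2/2}\,dx=\int_{-\infty}^{\infty}e^{-x^2/2}\,dx=\sqrt{2\pi}$, which completes the proof of the lemma.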
Theorem 11.2 (CLT for i.i.d.). Suppose $\{X_n\}$ is i.i.d. with mean $m$ and variance $0<\sigma^2<\infty$. Let $S_n=X_1+\dots+X_n$. Then
$$\frac{S_n-nm}{\sigma\sqrt{n}}\xrightarrow{D}Z.$$
This is one of the special cases of the Lindeberg theorem and the proof uses characteristic functions.
Note that $\varphi_{S_n/\sqrt{n}}(t)=e^{-t^2/2}$ when the $X_j$ are independent $N(0,1)$.
In general, $\varphi_{S_n/\sqrt{n}}(t)$ is a complex number. For example, when the $X_n$ are exponential with parameter $\lambda=1$, the conclusion says that
$$\varphi_{S_n/\sqrt{n}}(t)=\frac{e^{-it\sqrt{n}}}{\big(1-\frac{it}{\sqrt{n}}\big)^{n}}\to e^{-t^2/2},$$
which is not so obvious to see. On the other hand, the characteristic function in Exercise 10.5 on page 119 is real and the limit can be found using calculus:
$$\varphi_{S_n/\sqrt{n}}(t)=\cos^n(t/\sqrt{n})\to e^{-t^2/2}.$$
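For instance, for fixed $t$ one can take logarithms:
$$n\log\cos(t/\sqrt{n})=n\log\Big(1-\frac{t^2}{2n}+O(n^{-2})\Big)=-\frac{t^2}{2}+O(n^{-1})\to-\frac{t^2}{2}.$$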
Here is a simple inequality that will suffice for the proof in the general case: if $z_1,\dots,z_n$ and $w_1,\dots,w_n$ are complex numbers of modulus at most one, then
$$\Big|\prod_{k=1}^{n}z_k-\prod_{k=1}^{n}w_k\Big|\le\sum_{k=1}^{n}|z_k-w_k|.\tag{11.1}$$
(Omitted in 2020)
Example 11.1. We show how to complete the proof for the exponential distribution. Since both $\frac{e^{-it/\sqrt n}}{1-it/\sqrt n}$ and $e^{-t^2/(2n)}$ have modulus at most one, (11.1) gives
$$\left|\frac{e^{-it\sqrt n}}{\big(1-\frac{it}{\sqrt n}\big)^n}-e^{-t^2/2}\right|=\left|\left(\frac{e^{-it/\sqrt n}}{1-\frac{it}{\sqrt n}}\right)^{n}-\left(e^{-t^2/(2n)}\right)^{n}\right|\le n\left|\frac{e^{-it/\sqrt n}}{1-\frac{it}{\sqrt n}}-e^{-t^2/(2n)}\right|$$
$$=n\left|\frac{1-\frac{it}{\sqrt n}-\frac{t^2}{2n}+\frac{it^3}{6n\sqrt n}-\dots}{1-\frac{it}{\sqrt n}}-1+\frac{t^2}{2n}-\frac{t^4}{8n^2}+\dots\right|$$
$$=n\left|\Big(1-\frac{it}{\sqrt n}-\frac{t^2}{2n}+\dots\Big)\Big(1+\frac{it}{\sqrt n}-\frac{t^2}{n}+\dots\Big)-1+\frac{t^2}{2n}-\frac{t^4}{8n^2}+\dots\right|$$
$$=n\left|1-\frac{t^2}{n}+\frac{t^2}{2n}+\frac{it^3}{6n\sqrt n}-\dots-1+\frac{t^2}{2n}-\frac{t^4}{8n^2}+\dots\right|\le n\,\frac{C(t)}{n\sqrt n}\to0.$$
Proof of Theorem 11.2. Without loss of generality we may assume $m=0$ and $\sigma=1$. We have $\varphi_{S_n/\sqrt n}(t)=\varphi_X(t/\sqrt n)^n$. For a fixed $t\in\mathbb R$ choose $n$ large enough so that $1-\frac{t^2}{2n}>-1$. For such $n$, we can apply (11.1) with $z_k=\varphi_X(t/\sqrt n)$ and $w_k=1-\frac{t^2}{2n}$. We get
$$\left|\varphi_{S_n/\sqrt n}(t)-\Big(1-\frac{t^2}{2n}\Big)^{n}\right|\le n\left|\varphi_X(t/\sqrt n)-\Big(1-\frac{t^2}{2n}\Big)\right|\le t^2\,E\min\Big\{\frac{|t|\,|X|^3}{\sqrt n},\,X^2\Big\}.$$
Noting that $\lim_{n\to\infty}\min\{|t||X|^3/\sqrt n,\,X^2\}=0$, by the dominated convergence theorem (the integrand is dominated by the integrable function $X^2$) we have $E\min\big\{\frac{|t||X|^3}{\sqrt n},X^2\big\}\to0$ as $n\to\infty$. So
$$\lim_{n\to\infty}\left|\varphi_{S_n/\sqrt n}(t)-\Big(1-\frac{t^2}{2n}\Big)^{n}\right|=0.$$
It remains to notice that $\big(1-\frac{t^2}{2n}\big)^{n}\to e^{-t^2/2}$.
Remark 11.4. If $X_n\xrightarrow{D}Z$ then the cumulative distribution functions converge uniformly: $\sup_x|P(X_n\le x)-P(Z\le x)|\to0$.
Example 11.2 (Normal approximation to Binomial). If $X_n$ is $\mathrm{Bin}(n,p)$ and $p$ is fixed, then
$$P\Big(\tfrac1n X_n<p+x/\sqrt n\Big)\to P\big(Z\le x/\sqrt{p(1-p)}\big)\quad\text{as }n\to\infty.$$
Example 11.3 (Normal approximation to Poisson). If $X_\lambda$ is $\mathrm{Poiss}(\lambda)$, then $(X_\lambda-\lambda)/\sqrt\lambda\xrightarrow{D}Z$ as $\lambda\to\infty$. (Strictly speaking, the CLT gives only convergence of $(X_{n\lambda}-n\lambda)/\sqrt{n\lambda}\xrightarrow{D}Z$ as $n\to\infty$ for fixed $\lambda$.)
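For real $\lambda\to\infty$ the convergence can also be checked directly from the Poisson characteristic function $Ee^{itX_\lambda}=\exp(\lambda(e^{it}-1))$:
$$\varphi_{(X_\lambda-\lambda)/\sqrt\lambda}(t)=\exp\Big(\lambda\big(e^{it/\sqrt\lambda}-1-\tfrac{it}{\sqrt\lambda}\big)\Big)\to e^{-t^2/2},$$
since $e^{iu}-1-iu=-\frac{u^2}{2}+O(|u|^3)$ with $u=t/\sqrt\lambda$.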
(Omitted in 2020)
Theorem 11.5 (Convergence of types). Suppose $X_n\xrightarrow{D}X$ and $a_nX_n+b_n\xrightarrow{D}Y$ for some $a_n>0$, $b_n\in\mathbb R$, and both $X,Y$ are non-degenerate. Then $a_n\to a>0$, $b_n\to b$, and in particular $Y$ has the same law as $aX+b$.
So if $(Z_n-b_n)/a_n\to Z$ and $(Z_n-b_n')/a_n'\to Z'$, then
$$\frac{Z_n-b_n'}{a_n'}=\frac{a_n}{a_n'}\cdot\frac{Z_n-b_n}{a_n}+\frac{b_n-b_n'}{a_n'},$$
which means that $a_n/a_n'\to a>0$ and $(b_n-b_n')/a_n'\to b$. So $a_n'\approx a_n/a$, $b_n'\approx b_n-\frac{b}{a}a_n$, and $Z'=aZ+b$.
(Omitted in 2020)
Proof. To be written...
3. Lindeberg’s theorem
The setting is of sums of triangular arrays: For each n we have a family of independent random
variables
Xn,1 , . . . , Xn,rn
and we set Sn = Xn,1 + · · · + Xn,rn .
For Theorem 11.2, the triangular array can be $X_{n,k}=\frac{X_k-m}{\sigma\sqrt n}$. Or one can take $X_{n,k}=\frac{X_k-m}{\sigma}$ and normalize by $s_n=\sqrt n$ instead, as in Example 11.4 below.
Throughout this section we assume that the random variables are square-integrable with mean zero, and we use the notation
$$E(X_{n,k})=0,\qquad\sigma_{nk}^2=E(X_{n,k}^2),\qquad s_n^2=\sum_{k=1}^{r_n}\sigma_{nk}^2.\tag{11.2}$$
Definition 11.1 (The Lindeberg condition). We say that the Lindeberg condition holds if
$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^{r_n}\int_{|X_{nk}|>\varepsilon s_n}X_{nk}^2\,dP=0\quad\text{for all }\varepsilon>0.\tag{11.3}$$
Theorem 11.7 (Lindeberg CLT). Suppose that for each $n$ the sequence $X_{n1},\dots,X_{n,r_n}$ is independent with mean zero. If the Lindeberg condition (11.3) holds for all $\varepsilon>0$ then $S_n/s_n\xrightarrow{D}Z$.
Example 11.4 (Proof of Theorem 11.2). In the setting of Theorem 11.2, we have $X_{n,k}=\frac{X_k-m}{\sigma}$ and $s_n=\sqrt n$. The Lindeberg condition is
$$\lim_{n\to\infty}\frac1n\sum_{k=1}^{n}\int_{|X_k-m|>\varepsilon\sigma\sqrt n}\frac{(X_k-m)^2}{\sigma^2}\,dP=\lim_{n\to\infty}\frac{1}{\sigma^2}\int_{|X_1-m|>\varepsilon\sigma\sqrt n}(X_1-m)^2\,dP=0$$
by the Lebesgue dominated convergence theorem, say. (Or by Corollary 6.12 on page 71.)
Proof. Without loss of generality we may assume that $s_n^2=1$, so that $\sum_{k=1}^{r_n}\sigma_{nk}^2=1$. Denote $\varphi_{nk}(t)=E(e^{itX_{nk}})$. From (10.13) we have, for any $\varepsilon>0$,
$$\left|\varphi_{nk}(t)-\Big(1-\frac12t^2\sigma_{nk}^2\Big)\right|\le E\min\{|tX_{nk}|^2,|tX_{nk}|^3\}\tag{11.5}$$
$$\le\int_{|X_{nk}|<\varepsilon}|tX_{nk}|^3\,dP+\int_{|X_{nk}|\ge\varepsilon}|tX_{nk}|^2\,dP\le|t|^3\varepsilon\sigma_{nk}^2+t^2\int_{|X_{nk}|\ge\varepsilon}X_{nk}^2\,dP.$$
Since $\varphi_{S_n}(t)=\prod_{k=1}^{r_n}\varphi_{nk}(t)$, summing over $k$ and applying (11.1) gives
$$\Big|\varphi_{S_n}(t)-\prod_{k=1}^{r_n}\Big(1-\frac12t^2\sigma_{nk}^2\Big)\Big|\le|t|^3\varepsilon+t^2\sum_{k=1}^{r_n}\int_{|X_{nk}|\ge\varepsilon}X_{nk}^2\,dP;$$
by the Lindeberg condition the sum on the right tends to $0$, and since $\varepsilon>0$ is arbitrary, the left-hand side tends to $0$.
It remains to verify that $\lim_{n\to\infty}\big|e^{-t^2/2}-\prod_{k=1}^{r_n}(1-\frac12t^2\sigma_{nk}^2)\big|=0$.
To do so, we apply the previous proof to the triangular array $\sigma_{n,k}Z_k$ of independent normal random variables. Note that
$$\varphi_{\sum_k\sigma_{nk}Z_k}(t)=\prod_{k=1}^{r_n}e^{-t^2\sigma_{nk}^2/2}=e^{-t^2/2},$$
so the same estimate yields
$$\Big|e^{-t^2/2}-\prod_{k=1}^{r_n}\Big(1-\frac12t^2\sigma_{nk}^2\Big)\Big|\le|t|^3\varepsilon+t^2\sum_{k=1}^{r_n}\sigma_{nk}^2\,E\big(Z^2\mathbf1_{\{|Z|\ge\varepsilon/\sigma_{nk}\}}\big)\le|t|^3\varepsilon+t^2\,E\big(Z^2\mathbf1_{\{|Z|\ge\varepsilon/\max_k\sigma_{nk}\}}\big).$$
The right-hand side goes to zero (let $n\to\infty$ and then $\varepsilon\to0$), because $\max_{1\le k\le r_n}\sigma_{nk}\to0$ by (11.4).
4. Lyapunov’s theorem
Theorem 11.8. Suppose that for each $n$ the sequence $X_{n1},\dots,X_{n,r_n}$ is independent with mean zero. If Lyapunov's condition
$$\lim_{n\to\infty}\frac{1}{s_n^{2+\delta}}\sum_{k=1}^{r_n}E|X_{nk}|^{2+\delta}=0\tag{11.7}$$
holds for some $\delta>0$, then $S_n/s_n\xrightarrow{D}Z$.
Corollary 11.9. Suppose $X_k$ are independent with mean zero, variance $\sigma^2$, and $\sup_kE|X_k|^{2+\delta}<\infty$. Then $S_n/\sqrt n\xrightarrow{D}\sigma Z$.
Proof. Let $C=\sup_kE|X_k|^{2+\delta}$. Then $s_n=\sigma\sqrt n$ and
$$\frac{1}{s_n^{2+\delta}}\sum_{k=1}^{n}E(|X_k|^{2+\delta})\le\frac{C}{\sigma^{2+\delta}\,n^{\delta/2}}\to0,$$
so Lyapunov's condition is satisfied.
Corollary 11.10. Suppose $X_k$ are independent, uniformly bounded, and have mean zero. If $\sum_n\mathrm{Var}(X_n)=\infty$, then $S_n/\sqrt{\mathrm{Var}(S_n)}\xrightarrow{D}N(0,1)$.
Proof. If $|X_k|\le C$ then $E|X_k|^3\le C\,E(X_k^2)$, so Lyapunov's condition holds with $\delta=1$:
$$\frac{1}{s_n^3}\sum_{k=1}^{n}E|X_k|^3\le C\,\frac{s_n^2}{s_n^3}=\frac{C}{s_n}\to0,$$
since $s_n^2=\mathrm{Var}(S_n)\to\infty$.
Example 11.5. Consider independent $X_k$ with $P(X_k=\pm1)=\frac14$, $P(X_k=\pm4^k)=\frac12\cdot4^{-k}$, and $P(X_k=0)=\frac12-4^{-k}$. Then $\sigma_k^2=\frac12+4^k$ and $s_n^2\ge4^n$. But $S_n/s_n\xrightarrow{D}0$, and in fact we have $S_n/\sqrt n\xrightarrow{D}Z/\sqrt2$. To see this, note that $Y_k=X_kI_{\{|X_k|\le1\}}$ are independent with mean $0$, variance $\frac12$, and $P(Y_k\ne X_k)=4^{-k}$, so by the first Borel-Cantelli Lemma (Theorem 3.8) only finitely many of the differences $Y_k-X_k$ are nonzero with probability one, hence $|\frac{1}{\sqrt n}\sum_{k=1}^n(Y_k-X_k)|\le\frac{U}{\sqrt n}\to0$ with probability one for some almost surely finite $U$.
It is sometimes convenient to use Corollary 9.5 (Exercise 9.2) combined with the law of large
numbers. This is how one needs to proceed in Exercise 11.2.
Example 11.6. Suppose $X_1,X_2,\dots$ are i.i.d. with mean $0$ and variance $\sigma^2>0$. Then
$$\frac{\sum_{k=1}^nX_k}{\sqrt{\sum_{k=1}^nX_k^2}}\xrightarrow{D}Z.$$
Indeed,
$$\frac{\sum_{k=1}^nX_k}{\sqrt{\sum_{k=1}^nX_k^2}}=\frac{\sigma}{\sqrt{\frac1n\sum_{k=1}^nX_k^2}}\times\frac{\sum_{k=1}^nX_k}{\sigma\sqrt n}$$
and note that the first factor converges to 1 with probability one.
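In more detail: by the strong law of large numbers $\frac1n\sum_{k=1}^nX_k^2\to E(X_1^2)=\sigma^2$ with probability one, so the first factor tends to $1$ almost surely, while the second factor converges in distribution to $Z$ by Theorem 11.2. Combining the two via Corollary 9.5 (which, as the preceding paragraph indicates, handles exactly such products of a factor converging to a constant and a factor converging in distribution) gives
$$\frac{\sum_{k=1}^nX_k}{\sqrt{\sum_{k=1}^nX_k^2}}\xrightarrow{D}Z.$$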
Required Exercises
Exercise 11.1. Suppose $a_{nk}$ is an array of numbers such that $\sum_{k=1}^na_{nk}^2=1$ and $\max_{1\le k\le n}|a_{nk}|\to0$. Let $X_j$ be i.i.d. with mean zero and variance 1. Show that $\sum_{k=1}^na_{nk}X_k\xrightarrow{D}Z$.
Exercise 11.2. Suppose that $X_1,X_2,\dots$ are i.i.d., $E(X_1)=1$, $\mathrm{Var}(X_1)=\sigma^2<\infty$. Let $\bar X_n=\frac1n\sum_{j=1}^nX_j$. Show that for all $k>0$
$$\sqrt n\big(\bar X_n^k-1\big)\xrightarrow{D}N(0,k\sigma)$$
as $n\to\infty$.
Exercise 11.3. Suppose $X_1,X_2,\dots$ are independent, $X_k=\pm1$ with probability $\frac12(1-k^{-2})$ and $X_k=\pm k$ with probability $\frac12k^{-2}$. Let $S_n=\sum_{k=1}^nX_k$.
(i) Show that $S_n/\sqrt n\xrightarrow{D}N(0,1)$.
(ii) Is the Lindeberg condition satisfied?
Exercise 11.4. Suppose $X_1,X_2,\dots$ are independent random variables with distribution $\Pr(X_k=1)=p_k$ and $\Pr(X_k=0)=1-p_k$. Prove that if $\sum_k\mathrm{Var}(X_k)=\infty$ then
$$\frac{\sum_{k=1}^n(X_k-p_k)}{\sqrt{\sum_{k=1}^np_k(1-p_k)}}\xrightarrow{D}N(0,1).$$
Exercise 11.6 (stat). A real estate agent wishes to estimate the unknown mean sale price $\mu$ of a house, which she believes is well described by a distribution with finite second moment. She estimates $\mu$ by the sample mean $\bar X_n$ of the i.i.d. sample $X_1,\dots,X_n$, and she estimates the variance by the expression
$$S_n^2=\frac{1}{n-1}\sum_{k=1}^n(X_k-\bar X_n)^2.$$
She then uses a formula $\bar X_n\pm z_\alpha S_n/\sqrt n$ from Wikipedia to produce the large sample confidence interval for $\mu$. To understand why this procedure works, she would like to know that
$$\sqrt n\,(\bar X_n-\mu)/S_n\xrightarrow{D}N(0,1).$$
Please supply the proof.
Exercise 11.7 (stat). A psychologist wishes to estimate the parameter $\lambda>0$ of the exponential distribution, see Example 2.4, by taking the average $\bar X_n$ of the i.i.d. sample $X_1,\dots,X_n$ and defining $\hat\lambda_n=1/\bar X_n$. Show that $\hat\lambda_n$ is asymptotically normal, i.e. determine $a_n(\lambda)$ such that the $\alpha$-confidence interval for $\lambda$ is
$$\hat\lambda_n\pm a_n(\lambda)\,z_{\alpha/2},$$
where $z_{\alpha/2}$ comes from the normal table, $P(Z>z_{\alpha/2})=\alpha/2$.
Some previous prelim problems
Exercise 11.8 (May 2018). Suppose that $X_1,X_2,\dots$ are independent random variables with distributions
$$P(X_k=\pm1)=\frac{1}{2k}\quad\text{and}\quad P(X_k=0)=\frac{k-1}{k}.$$
Prove that
$$\frac{1}{\sqrt{\ln n}}\sum_{k=1}^nX_k\xrightarrow{D}N(0,1).$$
Exercise 11.9 (Aug 2017). Let $\{X_n\}_{n\in\mathbb N}$ be a collection of independent random variables with
$$\mathbb P(X_n=\pm n^2)=\frac{1}{2n^\beta}\quad\text{and}\quad\mathbb P(X_n=0)=1-\frac{1}{n^\beta},\quad n\in\mathbb N,$$
where $\beta\in(0,1)$ is fixed for all $n\in\mathbb N$. Consider $S_n:=X_1+\dots+X_n$. Show that
$$\frac{S_n}{n^\gamma}\xrightarrow{D}N(0,\sigma^2)$$
for some $\sigma>0$, $\gamma>0$. Identify $\sigma$ and $\gamma$ as functions of $\beta$. You may use the formula
$$\sum_{k=1}^nk^\theta\sim\frac{n^{\theta+1}}{\theta+1}.$$
Exercise 11.10 (May 2017). Let $\{X_n\}_{n\in\mathbb N}$ be independent random variables with $\mathbb P(X_n=1)=1/n=1-\mathbb P(X_n=0)$. Let $S_n:=X_1+\dots+X_n$ be the partial sum.
(i) Show that
$$\lim_{n\to\infty}\frac{ES_n}{\log n}=1\quad\text{and}\quad\lim_{n\to\infty}\frac{\mathrm{Var}(S_n)}{\log n}=1.$$
(ii) Prove that
$$\frac{S_n-\log n}{\sqrt{\log n}}\xrightarrow{D}N(0,1)$$
as $n\to\infty$. Explain which central limit theorem you use. State and verify all the conditions clearly.
Hint: recall the relation $\lim_{n\to\infty}\frac{\sum_{k=1}^n1/k}{\log n}=1$.
Exercise 11.11 (May 2016). (a) State the Lindeberg–Feller central limit theorem.
(b) Use the Lindeberg–Feller central limit theorem to prove the following. Consider a triangular array of random variables $\{Y_{n,k}\}_{n\in\mathbb N,\,k=1,\dots,n}$ such that for each $n$, $EY_{n,k}=0$, $k=1,\dots,n$, and $\{Y_{n,k}\}_{k=1,\dots,n}$ are independent. In addition, with $\sigma_n:=\big(\sum_{k=1}^nEY_{n,k}^2\big)^{1/2}$, assume that
$$\lim_{n\to\infty}\frac{1}{\sigma_n^4}\sum_{k=1}^nEY_{n,k}^4=0.$$
Show that
$$\frac{Y_{n,1}+\dots+Y_{n,n}}{\sigma_n}\xrightarrow{D}N(0,1).$$
Exercise 11.12 (Aug 2015). Let $\{U_n\}_{n\in\mathbb N}$ be a collection of i.i.d. random variables with $EU_n=0$ and $EU_n^2=\sigma^2\in(0,\infty)$. Consider random variables $\{X_n\}_{n\in\mathbb N}$ defined by $X_n=U_n+U_{2n}$, $n\in\mathbb N$, and the partial sum $S_n=X_1+\dots+X_n$. Find appropriate constants $\{a_n,b_n\}_{n\in\mathbb N}$ such that
$$\frac{S_n-b_n}{a_n}\xrightarrow{D}N(0,1).$$
Exercise 11.13 (May 2015). Let $\{U_n\}_{n\in\mathbb N}$ be a collection of i.i.d. random variables distributed uniformly on the interval $(0,1)$. Consider a triangular array of random variables $\{X_{n,k}\}_{k=1,\dots,n,\,n\in\mathbb N}$ defined as
$$X_{n,k}=\mathbf1_{\{\sqrt nU_k\le1\}}-\frac{1}{\sqrt n}.$$
Find constants $\{a_n,b_n\}_{n\in\mathbb N}$ such that
$$\frac{X_{n,1}+\dots+X_{n,n}-b_n}{a_n}\xrightarrow{D}N(0,1).$$
Exercise 11.14 (Aug 2014). Let $X_1,X_2,\dots$ be independent and identically distributed random variables with
$$P(X_i=1)=P(X_i=-1)=1/2.$$
Prove that
$$\frac{\sqrt3}{\sqrt{n^3}}\sum_{k=1}^nkX_k\xrightarrow{D}N(0,1).$$
(You may use the formulas $\sum_{j=1}^nj^2=\frac16n(n+1)(2n+1)$ and $\sum_{j=1}^nj^3=\frac14n^2(n+1)^2$ without proof.)
Limit Theorems in $\mathbb R^k$
Definition 12.1. We say that $X_n$ converges in distribution to $X$ if for every bounded continuous function $f:\mathbb R^k\to\mathbb R$ the sequence of numbers $E(f(X_n))$ converges to $E(f(X))$.
We will write $X_n\xrightarrow{D}X$; if $\mu_n$ is the law of $X_n$ we will also write $\mu_n\xrightarrow{D}\mu$; the same notation in the language of cumulative distribution functions is $F_n\xrightarrow{D}F$; the latter can be defined as $F_n(x)\to F(x)$ for all points of continuity of $F$, but it is simpler to use Definition 12.1.
Proposition 12.1. If $X_n\xrightarrow{D}X$ and $g:\mathbb R^k\to\mathbb R^m$ is a continuous function, then $g(X_n)\xrightarrow{D}g(X)$.
For example, if $(X_n,Y_n)\xrightarrow{D}(Z_1,Z_2)$ then $X_n^2+Y_n^2\xrightarrow{D}Z_1^2+Z_2^2$.
Proof. Denoting $Y_n=g(X_n)$, we see that for any bounded continuous function $f:\mathbb R^m\to\mathbb R$, $f(Y_n)$ is a bounded continuous function $f\circ g$ of $X_n$.
Theorem 12.2. For a sequence $\mu_n$ of probability measures on the Borel sets of $\mathbb R^k$, the following are equivalent:
(i) $\mu_n\xrightarrow{D}\mu$
(ii) $\limsup_{n\to\infty}\mu_n(C)\le\mu(C)$ for all closed sets $C\subset\mathbb R^k$.
(iii) $\liminf_{n\to\infty}\mu_n(G)\ge\mu(G)$ for all open sets $G\subset\mathbb R^k$.
(iv) $\lim_{n\to\infty}\mu_n(A)=\mu(A)$ for all Borel sets $A\subset\mathbb R^k$ such that $\mu(\partial A)=0$.
Definition 12.2. The sequence of measures $\mu_n$ on $\mathbb R^k$ is tight if for every $\varepsilon>0$ there exists a compact set $K\subset\mathbb R^k$ such that $\mu_n(K)\ge1-\varepsilon$ for all $n$.
Theorem 12.3. If $\mu_n$ is a tight sequence of probability measures then there exist $\mu$ and a subsequence $n_k$ such that $\mu_{n_k}\xrightarrow{D}\mu$.
Corollary 12.4. If $\{\mu_n\}$ is a tight sequence of probability measures on Borel subsets of $\mathbb R^k$ and if each convergent subsequence has the same limit $\mu$, then $\mu_n\xrightarrow{D}\mu$.
The inversion formula shows how to determine $\mu(U)$ for a rectangle $U=(a_1,b_1]\times(a_2,b_2]\times\dots\times(a_k,b_k]$ such that $\mu(\partial U)=0$:
$$\mu(U)=\lim_{T\to\infty}\frac{1}{(2\pi)^k}\int_{-T}^{T}\cdots\int_{-T}^{T}\prod_{j=1}^{k}\frac{e^{-ia_jt_j}-e^{-ib_jt_j}}{it_j}\,\varphi(t_1,\dots,t_k)\,dt_1\dots dt_k.\tag{12.2}$$
Corollary 12.5 (Cramér–Wold device). The law of $X$ is uniquely determined by the univariate laws of $t\cdot X=\sum_{j=1}^kt_jX_j$.
Theorem 12.7. $X_n\xrightarrow{D}Y$ iff $\varphi_{X_n}(t)\to\varphi_Y(t)$ for all $t\in\mathbb R^k$.
Note that this means that $X_n\xrightarrow{D}Y$ iff $\sum_jt_jX_j(n)\xrightarrow{D}\sum_jt_jY_j$ for all $t_1,\dots,t_k$.
Example 12.1. If $X,Y$ are independent $N(0,1)$ then $X+Y$ and $X-Y$ are independent normal. Indeed, $\varphi_{X+Y,X-Y}(s,t)=\varphi_X(s+t)\varphi_Y(s-t)=\exp(-(s+t)^2/2-(s-t)^2/2)=e^{-s^2}e^{-t^2}$, and $\varphi_{X\pm Y}(s)=e^{-s^2/2}e^{-s^2/2}=e^{-s^2}$.
Corollary 12.8. If $Z_1,\dots,Z_m$ are independent normal and $X=AZ$ then $\sum_jt_jX_j$ is (univariate) normal.
The generalization of this property (that every linear combination $t\cdot X$ is univariate normal) is the simplest definition of the multivariate normal distribution. Note that
$$\|A^Tt\|^2=(A^Tt)\cdot(A^Tt)=t'AA^Tt=t'\Sigma t$$
with $\Sigma=AA^T$. Notation: $N(m,\Sigma)$. (How do we know that this is a characteristic function? See the proof of Corollary 12.8!)
We need to show that this is indeed a characteristic function! But if it is, then by differentiation the parameters have the interpretation $EX=m$ and $\Sigma_{i,j}=\mathrm{cov}(X_i,X_j)$.
Remark 12.9. If $X$ is normal $N(m,\Sigma)$, then $X-m$ is centered normal $N(0,\Sigma)$. In the sequel, to simplify notation we only discuss the centered case.
The simplest way to define the univariate normal distribution is to start with a standard normal random variable $Z$ with density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, and then define the general normal as the linear function $X=\mu+\sigma Z$. It is then easy to work out the density of $X$ and the characteristic function, which is
$$\varphi_X(t)=e^{it\mu-\frac12\sigma^2t^2}.$$
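For example, the characteristic function comes out of Lemma 11.1 in one line:
$$\varphi_X(t)=Ee^{it(\mu+\sigma Z)}=e^{it\mu}\,Ee^{i(t\sigma)Z}=e^{it\mu}e^{-\sigma^2t^2/2}.$$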
Exercise 12.1 is worth doing in two ways - using both definitions.
In $\mathbb R^k$ the role of the standard normal distribution is played by the distribution of $Z=(Z_1,\dots,Z_k)$ of i.i.d. $N(0,1)$ random variables. Their density is
$$f(x)=\frac{1}{(2\pi)^{k/2}}e^{-\frac12\|x\|^2}.\tag{12.3}$$
The characteristic function $Ee^{it'Z}$ is just the product of the individual characteristic functions $\prod_{j=1}^ke^{-t_j^2/2}$, which in vector notation is
$$\varphi_Z(t)=e^{-\|t\|^2/2}.$$
Definition 12.4. We will say that $X$, written as a column vector, has multivariate normal distribution if $X=m+AZ$ for some non-random vector $m$ and matrix $A$.
Clearly, $E(X)=m$. In the sequel we will only consider the centered multivariate normal distribution with $E(X)=0$.
Remark 12.10. Denoting by $a_j$ the columns of $A$, we have $X=\sum_jZ_ja_j$. This is the universal feature of Gaussian vectors, even in infinite-dimensional vector spaces: they all can be written as linear combinations of deterministic vectors with independent real-valued "noises" as coefficients. For example, the random "vector" $(W_t)_{0\le t\le1}$ with values in the vector space $C[0,1]$ of continuous functions on $[0,1]$ can be written as $W_t=\sum_{j=1}^{\infty}Z_jg_j(t)$ with deterministic functions $g_j(t)=\frac{1}{2j+1}\sin((2j+1)\pi t)$.
Remark 12.12. Notice that $E(XX')=E(AZZ'A')=AE(ZZ')A'=AIA'=AA'=\Sigma$ is the covariance matrix of $X$.
Remark 12.13. From linear algebra, any positive definite matrix can be written as $\Sigma=U\Lambda U'$, so each such matrix can be written as $\Sigma=AA'$ with $A=U\Lambda^{1/2}U'$. So $\varphi(t)=\exp(-\frac12t'\Sigma t)$ is a characteristic function of $X=AZ$.
Remark 12.14. If $\det(\Sigma)>0$ then $\det A\ne0$ and (by linear algebra) the inverse $A^{-1}$ exists. The density of $X$ is recalculated from (12.3) as follows:
$$f(x)=\frac{1}{(2\pi)^{k/2}\det(A)}e^{-\frac12\|A^{-1}x\|^2}=\frac{1}{(2\pi)^{k/2}\det(\Sigma)^{1/2}}e^{-\frac12x'\Sigma^{-1}x}.$$
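In more detail, this is the standard linear change of variables $x=Az$: for a Borel set $B$,
$$P(X\in B)=P(Z\in A^{-1}B)=\int_Bf_Z(A^{-1}x)\,\frac{dx}{|\det A|},$$
and $\|A^{-1}x\|^2=x'(A^{-1})'A^{-1}x=x'(AA')^{-1}x=x'\Sigma^{-1}x$, while $\det\Sigma=\det(A)\det(A')=\det(A)^2$.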
Remark 12.15. Matrix $A$ in the representation $X=AZ$ is not unique, but the covariance matrix $\Sigma=AA'$ is unique. For example,
$$X=\begin{pmatrix}Z_1\\Z_2\end{pmatrix}=\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}Z_1\\Z_2\end{pmatrix}$$
can also be represented as
$$X=\begin{pmatrix}1/\sqrt2&1/\sqrt2\\-1/\sqrt2&1/\sqrt2\end{pmatrix}\begin{pmatrix}Z_1\\Z_2\end{pmatrix}.$$
(Exercise 12.6)
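Indeed, the second matrix is a rotation, so both representations yield the same covariance $\Sigma=AA'=I$:
$$\begin{pmatrix}1/\sqrt2&1/\sqrt2\\-1/\sqrt2&1/\sqrt2\end{pmatrix}\begin{pmatrix}1/\sqrt2&-1/\sqrt2\\1/\sqrt2&1/\sqrt2\end{pmatrix}=\begin{pmatrix}1&0\\0&1\end{pmatrix}.$$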
Example 12.2. Suppose $\varphi(s,t)=e^{-s^2/2-t^2/2-\rho st}$. Then
$$\Sigma=\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}$$
is non-negative definite for any $|\rho|\le1$ and this is a characteristic function of a random vector $X=(X_1,X_2)$ with univariate $N(0,1)$ laws, with correlation $E(X_1X_2)=-\frac{\partial^2}{\partial s\partial t}\varphi(s,t)\big|_{s=t=0}=\rho$. If $Z_1,Z_2$ are independent $N(0,1)$ then
$$X_1=Z_1,\qquad X_2=\rho Z_1+\sqrt{1-\rho^2}\,Z_2\tag{12.5}$$
will have exactly the same second moments, and the same characteristic function.
Since $\det\Sigma=1-\rho^2$, when $\rho^2\ne1$ the matrix is invertible and the resulting bivariate normal density is
$$f(x,y)=\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\Big(-\frac{x^2+y^2-2\rho xy}{2(1-\rho^2)}\Big).$$
From (12.5) we also see that $X_2-\rho X_1=\sqrt{1-\rho^2}\,Z_2$ is independent of $X_1$ and has variance $1-\rho^2$.
4. The CLT
Theorem 12.16. Let $X_n=(X_{n1},\dots,X_{nk})$ be independent random vectors with the same distribution and finite second moments. Denote $m=EX_1$ and $S_n=X_1+\dots+X_n$. Then
$$(S_n-nm)/\sqrt n\xrightarrow{D}Y,$$
where $Y$ is centered normal with the covariance matrix $\Sigma=E(X_nX_n')-mm'$.
The notation is $N(0,\Sigma)$. Note that this is inconsistent with the univariate notation $N(\mu,\sigma)$, which for consistency with the multivariate case should be replaced by $N(\mu,\sigma^2)$.
Example 12.3. Suppose $\xi_k,\eta_k$ are i.i.d. with mean zero, variance one. Then
$$\frac{1}{\sqrt n}\Big(\sum_{k=1}^n\xi_k,\ \sum_{k=1}^n(\xi_k+\eta_k)\Big)\xrightarrow{D}(Z_1,Z_1+Z_2).$$
Indeed, the random vector $X_k=\begin{pmatrix}\xi_k\\\xi_k+\eta_k\end{pmatrix}$ has covariance matrix $\Sigma=\begin{pmatrix}1&1\\1&2\end{pmatrix}$, and $\begin{pmatrix}Z_1\\Z_1+Z_2\end{pmatrix}$ has the same covariance matrix.
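The covariance entries are quick to check: by independence $E\xi_k^2=1$, $E\big(\xi_k(\xi_k+\eta_k)\big)=E\xi_k^2+E\xi_k\,E\eta_k=1$, and $E(\xi_k+\eta_k)^2=E\xi_k^2+2E\xi_k\,E\eta_k+E\eta_k^2=2$; the same three computations give the covariance matrix of $(Z_1,Z_1+Z_2)$.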
Theorem 12.17. Let $S_j(n)$, $j=1,\dots,k$, be the number of times outcome $j$ occurs in $n$ independent trials, where each trial yields outcome $j$ with probability $p_j$. Then
$$\sum_{j=1}^k\frac{(S_j(n)-np_j)^2}{np_j}\xrightarrow{D}Z_1^2+\dots+Z_{k-1}^2.$$
Proof. Let's prove this for $k=3$. Consider independent random vectors $X_k$ that take the three values
$$\begin{pmatrix}1\\0\\0\end{pmatrix},\quad\begin{pmatrix}0\\1\\0\end{pmatrix},\quad\begin{pmatrix}0\\0\\1\end{pmatrix}$$
with probabilities $p_1,p_2,p_3$. Then $S_n$ is the sum of $n$ independent identically distributed vectors $X_1,\dots,X_n$.
Clearly, $EX_k=\begin{pmatrix}p_1\\p_2\\p_3\end{pmatrix}$. To compute the covariance matrix, write $X$ for $X_k$. For non-centered vectors, the covariance is $E(XX')-E(X)E(X')$. We have
$$E(XX')=p_1\begin{pmatrix}1\\0\\0\end{pmatrix}\begin{pmatrix}1&0&0\end{pmatrix}+p_2\begin{pmatrix}0\\1\\0\end{pmatrix}\begin{pmatrix}0&1&0\end{pmatrix}+p_3\begin{pmatrix}0\\0\\1\end{pmatrix}\begin{pmatrix}0&0&1\end{pmatrix}=\begin{pmatrix}p_1&0&0\\0&p_2&0\\0&0&p_3\end{pmatrix}.$$
So
$$\Sigma=E(XX')-E(X)E(X')=\begin{pmatrix}p_1(1-p_1)&-p_1p_2&-p_1p_3\\-p_1p_2&p_2(1-p_2)&-p_2p_3\\-p_1p_3&-p_2p_3&p_3(1-p_3)\end{pmatrix}.$$
Then $S_n$ is the sum of $n$ independent vectors, and the central limit theorem implies that
$$\frac{1}{\sqrt n}\Big(S_n-n\begin{pmatrix}p_1\\p_2\\p_3\end{pmatrix}\Big)\xrightarrow{D}X,$$
where $X$ is centered normal with covariance matrix $\Sigma$.
In particular, by Proposition 12.1 we have
$$\sum_{j=1}^3\frac{(S_j(n)-np_j)^2}{np_j}\xrightarrow{D}\sum_{j=1}^3\frac{X_j^2}{p_j}.$$
It remains to show that $\sum_{j=1}^3\frac{X_j^2}{p_j}$ has the same law as $Z_1^2+Z_2^2$, i.e. that it is exponential. To do so, we first note that the covariance of $(Y_1,Y_2,Y_3):=(X_1/\sqrt{p_1},X_2/\sqrt{p_2},X_3/\sqrt{p_3})$ is
$$\Sigma_Y=\begin{pmatrix}1-p_1&-\sqrt{p_1p_2}&-\sqrt{p_1p_3}\\-\sqrt{p_1p_2}&1-p_2&-\sqrt{p_2p_3}\\-\sqrt{p_1p_3}&-\sqrt{p_2p_3}&1-p_3\end{pmatrix}=I-\begin{pmatrix}\sqrt{p_1}\\\sqrt{p_2}\\\sqrt{p_3}\end{pmatrix}\begin{pmatrix}\sqrt{p_1}&\sqrt{p_2}&\sqrt{p_3}\end{pmatrix}.$$
Since $v_1=\begin{pmatrix}\sqrt{p_1}\\\sqrt{p_2}\\\sqrt{p_3}\end{pmatrix}$ is a unit vector, we can complete it with two additional vectors $v_2=\begin{pmatrix}\alpha_1\\\alpha_2\\\alpha_3\end{pmatrix}$ and $v_3=\begin{pmatrix}\beta_1\\\beta_2\\\beta_3\end{pmatrix}$ to form an orthonormal basis $\{v_1,v_2,v_3\}$ of $\mathbb R^3$. This can be done in many ways, for example by applying the Gram-Schmidt orthogonalization to $v_1,[1\ 0\ 0]',[0\ 1\ 0]'$. However, the specific form of $v_2,v_3$ does not enter the calculation; we only need to know that $v_2,v_3$ are orthonormal.
The point is that $I=v_1v_1'+v_2v_2'+v_3v_3'$, as these are orthogonal eigenvectors of $I$ with $\lambda=1$. (Or, because $x=v_1v_1'x+v_2v_2'x+v_3v_3'x$, as $v_j'x=x\cdot v_j$ are the coefficients of the expansion of $x$ in the orthonormal basis $\{v_1,v_2,v_3\}$ of $\mathbb R^3$.)
Therefore, $\Sigma_Y=v_2v_2'+v_3v_3'$.
We now notice that $\Sigma_Y$ is the covariance of the multivariate normal random variable $Z=v_2Z_2+v_3Z_3$, where $Z_2,Z_3$ are independent real-valued $N(0,1)$. Indeed,
$$E(ZZ')=\sum_{i,j=2}^3v_iv_j'E(Z_iZ_j)=\sum_{i=2}^3v_iv_i'.$$
Therefore, the vector $[Y_1\ Y_2\ Y_3]'$ has the same distribution as $Z$, and $Y_1^2+Y_2^2+Y_3^2$ has the same distribution as
$$\|Z\|^2=\|v_2Z_2+v_3Z_3\|^2=\|v_2Z_2\|^2+\|v_3Z_3\|^2=Z_2^2+Z_3^2$$
(recall that $v_2$ and $v_3$ are orthogonal unit vectors).
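As for the claim that $Z_1^2+Z_2^2$ is exponential, a quick computation in polar coordinates gives
$$P(Z_1^2+Z_2^2>r)=\frac{1}{2\pi}\iint_{x^2+y^2>r}e^{-(x^2+y^2)/2}\,dx\,dy=\int_{\sqrt r}^{\infty}e^{-\rho^2/2}\,\rho\,d\rho=e^{-r/2},$$
so $Z_1^2+Z_2^2$ is exponential with mean $2$.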
Remark 12.18. It is clear that this proof generalizes to all k.
Required Exercises
Exercise 12.1. Suppose $X,Y$ are independent univariate normal random variables. Use Definition 12.3 to verify that each of the following is bivariate normal: $\mathbf X=(X,X)$, $\mathbf X=(X,Y)$, $\mathbf X=(X+\varepsilon Y,X-\varepsilon Y)$.
Exercise 12.2. Suppose that $\mathbb R^{2k}$-valued random variables $(X_n,Y_n)$ are such that $X_n\xrightarrow{D}X$ and $Y_n\xrightarrow{P}0$ (that is, $\lim_{n\to\infty}P(\|Y_n\|>\varepsilon)=0$ for all $\varepsilon>0$).
Prove that $X_n+Y_n\xrightarrow{D}X$.
Exercise 12.3. Suppose $(X_n,Y_n)$ are pairs of independent random variables and $X_n\xrightarrow{D}X$, $Y_n\xrightarrow{D}Y$. Show that $(X_n,Y_n)\xrightarrow{D}\mu$ where $\mu$ is the product of the laws of $X$ and $Y$.
Exercise 12.4. Let $\xi_1,\xi_2,\dots$ be i.i.d. random variables such that $E(\xi_1)=0$, $E(\xi_1^2)=1$. For $i=1,2,\dots$, define $\mathbb R^2$-valued random variables $X_i=\begin{pmatrix}\xi_i\\\xi_{i+1}\end{pmatrix}$ and let $S_n=\sum_{i=1}^nX_i$. Show that
$$\frac{1}{\sqrt n}S_n\xrightarrow{D}N(0,\Sigma)$$
for a suitable $2\times2$ covariance matrix $\Sigma$.
Exercise 12.5. Suppose $\xi_j,\eta_j,\gamma_j$ are i.i.d. mean zero, variance 1. Construct the following vectors:
$$X_j=\begin{pmatrix}\xi_j-\eta_j\\\eta_j-\gamma_j\\\gamma_j-\xi_j\end{pmatrix}$$
Let $S_n=X_1+\dots+X_n$. Show that $\frac1n\|S_n\|^2\xrightarrow{D}Y$. (In fact, $Y$ has a gamma density.)
Exercise 12.6. Use the characteristic function to verify that Remark 12.15 indeed gives two
representations of the same normal law.