
Lecture 1.

Random vectors and multivariate normal distribution

1.1 Moments of a random vector


A random vector X of size p is a column vector consisting of p random variables X1, . . . , Xp, written X = (X1, . . . , Xp)'. The mean or expectation of X is defined by the vector of expectations,

µ ≡ E(X) = (E(X1), . . . , E(Xp))',

which exists if E|Xi| < ∞ for all i = 1, . . . , p.
Lemma 1. Let X be a random vector of size p and Y be a random vector of size q. For any
non-random matrices A(m×p) , B(m×q) , C(1×n) , and D(m×n) ,

E(AX + BY) = AE(X) + BE(Y),

E(AXC + D) = AE(X)C + D.

For a random vector X of size p satisfying E(Xi^2) < ∞ for all i = 1, . . . , p, the variance–covariance matrix (or just covariance matrix) of X is

Σ ≡ Cov(X) = E[(X − EX)(X − EX)'].

The covariance matrix of X is a p × p square, symmetric matrix. In particular, Σij = Cov(Xi, Xj) = Cov(Xj, Xi) = Σji.
Some properties:

1. Cov(X) = E(XX') − E(X)E(X)'.

2. If c = c(p×1) is a constant, Cov(X + c) = Cov(X).

3. If A(m×p) is a constant matrix, Cov(AX) = A Cov(X) A'.


Lemma 2. The p × p matrix Σ is a covariance matrix if and only if it is non-negative
definite.
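These properties are easy to check numerically. A minimal NumPy sketch (the particular µ, Σ, and A below are arbitrary choices for illustration, not taken from the lecture):

import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 100_000
mu = [1.0, -2.0, 0.5]
Sigma = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 0.0],
         [0.0, 0.0, 1.0]]
X = rng.multivariate_normal(mu, Sigma, size=n)   # n draws of a p-dimensional vector

Sigma_hat = np.cov(X, rowvar=False)              # sample covariance matrix of X
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                 # arbitrary constant 2 x 3 matrix

# Property 3: Cov(AX) = A Cov(X) A' (holds exactly for sample covariances as well)
print(np.allclose(np.cov(X @ A.T, rowvar=False), A @ Sigma_hat @ A.T))

# Lemma 2 (one direction): a covariance matrix is non-negative definite
print(np.all(np.linalg.eigvalsh(Sigma_hat) >= -1e-10))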

1.2 Multivariate normal distribution - nonsingular case


Recall that the univariate normal distribution with mean µ and variance σ^2 has density

f(x) = (2πσ^2)^{-1/2} exp[ -(1/2)(x − µ)σ^{-2}(x − µ) ].
Similarly, the multivariate normal distribution for the special case of nonsingular covariance
matrix Σ is defined as follows.
Definition 1. Let µ ∈ Rp and Σ(p×p) > 0. A random vector X ∈ Rp has the p-variate normal distribution with mean µ and covariance matrix Σ if it has probability density function

f(x) = |2πΣ|^{-1/2} exp[ -(1/2)(x − µ)'Σ^{-1}(x − µ) ],        (1)

for x ∈ Rp. We use the notation X ∼ Np(µ, Σ).

Theorem 3. If X ∼ Np (µ, Σ) for Σ > 0, then


1. Y = Σ^{-1/2}(X − µ) ∼ Np(0, Ip),

2. X has the same distribution as Σ^{1/2}Y + µ, where Y ∼ Np(0, Ip),

3. E(X) = µ and Cov(X) = Σ,

4. for any fixed v ∈ Rp, v'X is univariate normal,

5. U = (X − µ)'Σ^{-1}(X − µ) ∼ χ^2(p).
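Items 1 and 5 can be checked by simulation. A hedged sketch assuming NumPy and SciPy (the specific µ and Σ are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, n = 3, 100_000
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=n)

# Item 1: Y = Sigma^{-1/2}(X - mu) should be Np(0, I_p)
w, U = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
Y = (X - mu) @ Sigma_inv_sqrt
print(np.round(np.cov(Y, rowvar=False), 2))        # approximately the identity

# Item 5: U = (X - mu)' Sigma^{-1} (X - mu) should be chi^2(p)
Uq = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(Sigma), X - mu)
print(stats.kstest(Uq, 'chi2', args=(p,)).pvalue)  # a large p-value is expected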

Example 1 (Bivariate normal).

1.2.1 Geometry of multivariate normal

The multivariate normal distribution has location parameter µ and shape parameter Σ > 0. In particular, let us look at the contour of equal density

E_c = {x ∈ Rp : f(x) = c_0}
    = {x ∈ Rp : (x − µ)'Σ^{-1}(x − µ) = c^2}.

Moreover, consider the spectral decomposition Σ = UΛU', where U = [u1, . . . , up] and Λ = diag(λ1, . . . , λp) with λ1 ≥ λ2 ≥ . . . ≥ λp > 0. The set E_c, for any c > 0, is an ellipsoid centered at µ with principal axes u_i of length proportional to √λ_i. If Σ = Ip, the ellipsoid is the surface of a sphere of radius c centered at µ.
As an example, consider a bivariate normal distribution N2(0, Σ) with

Σ = [ 2  1 ; 1  2 ] = [ cos(π/4)  −sin(π/4) ; sin(π/4)  cos(π/4) ] [ 3  0 ; 0  1 ] [ cos(π/4)  −sin(π/4) ; sin(π/4)  cos(π/4) ]'.

The location of the distribution is the origin (µ = 0), and the shape of the distribution (Σ) is determined by the ellipse given by the two principal axes (one along the 45-degree line, the other along the −45-degree line). Figure 1 shows the density function and the corresponding E_c for c = 0.5, 1, 1.5, 2, . . . .

Figure 1: Bivariate normal density and its contours. Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions d > 2, ellipsoids play a similar role.
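To make the geometry concrete, the following sketch (assuming NumPy) recovers the principal axes and relative axis lengths for the Σ in this example:

import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Spectral decomposition Sigma = U Lambda U'
lam, U = np.linalg.eigh(Sigma)        # eigh returns eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]        # reorder so that lambda_1 >= lambda_2

print(lam)                            # [3. 1.]
print(U)                              # columns ~ (1,1)/sqrt(2) and (-1,1)/sqrt(2), up to sign
print(np.sqrt(lam))                   # axis lengths of E_c are proportional to sqrt(lambda_i)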

1.3 General multivariate normal distribution


The characteristic function of a random vector X is defined as

ϕ_X(t) = E(e^{i t'X}),    for t ∈ Rp.

Note that the characteristic function is C-valued and always exists. We collect some important facts.

1. ϕ_X(t) = ϕ_Y(t) for all t ∈ Rp if and only if X and Y have the same distribution.

2. If X and Y are independent, then ϕ_{X+Y}(t) = ϕ_X(t) ϕ_Y(t).

3. Xn ⇒ X if and only if ϕ_{Xn}(t) → ϕ_X(t) for all t.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér–Wold device). If X is a p × 1 random vector, then its distribution is uniquely determined by the distributions of the linear functions t'X, for every t ∈ Rp.

Corollary 4 paves the way to the definition of (general) multivariate normal distribution.

Definition 2. A random vector X ∈ Rp has a multivariate normal distribution if t'X is univariate normal for all t ∈ Rp.

The definition says that X is MVN if every projection of X onto a 1-dimensional subspace
is normal, with a convention that a degenerate distribution δc has a normal distribution with
variance 0, i.e., c ∼ N (c, 0). The definition does not require that Cov(X) is nonsingular.

Theorem 5. The characteristic function of a multivariate normal distribution with mean µ and covariance matrix Σ ≥ 0 is, for t ∈ Rp,

ϕ(t) = exp[ i t'µ − (1/2) t'Σt ].

If Σ > 0, then the pdf exists and is the same as (1).

In the following, the notation X ∼ N (µ, Σ) is valid for a non-negative definite Σ. How-
ever, whenever Σ−1 appears in the statement, Σ is assumed to be positive definite.
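Even when Σ is singular (so no density exists), a vector with this distribution can be generated as X = µ + AZ with Σ = AA' and Z standard normal; every linear combination t'X is then normal, possibly degenerate, matching Definition 2. A minimal sketch assuming NumPy, with an arbitrary rank-1 Σ:

import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.0, 0.0])
a = np.array([[1.0], [2.0]])
Sigma = a @ a.T                     # rank-1, positive semi-definite, singular

# Factor Sigma = A A' via the eigendecomposition (Cholesky would fail here)
w, U = np.linalg.eigh(Sigma)
w = np.clip(w, 0.0, None)           # guard against tiny negative round-off
A = U @ np.diag(np.sqrt(w))

Z = rng.standard_normal((100_000, 2))
X = mu + Z @ A.T                    # draws from N2(mu, Sigma) with singular Sigma

print(np.round(np.cov(X, rowvar=False), 2))   # approximately [[1, 2], [2, 4]]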

Proposition 6. If X ∼ Np(µ, Σ) and Y = AX + b for A(q×p) and b(q×1), then Y ∼ Nq(Aµ + b, AΣA').

The next two results concern independence and conditional distributions of normal random vectors. Let X1 and X2 be a partition of X into subvectors of dimensions r and s, r + s = p, and suppose µ and Σ are partitioned accordingly. That is,

X = (X1 ; X2) ∼ Np( (µ1 ; µ2), [ Σ11  Σ12 ; Σ21  Σ22 ] ).

Proposition 7. The normal random vectors X1 and X2 are independent if and only if
Cov(X1 , X2 ) = Σ12 = 0.

Proposition 8. The conditional distribution of X1 given X2 = x2 is

Nr( µ1 + Σ12 Σ22^{-1}(x2 − µ2), Σ11 − Σ12 Σ22^{-1} Σ21 ).

Proof. Consider the new random vectors X1* = X1 − Σ12 Σ22^{-1} X2 and X2* = X2, that is,

X* = (X1* ; X2*) = AX,    A = [ Ir   −Σ12 Σ22^{-1} ; 0(s×r)   Is ].

By Proposition 6, X* is multivariate normal. An inspection of the covariance matrix of X* shows that X1* and X2* are independent. The result follows by writing

X1 = X1* + Σ12 Σ22^{-1} X2,

so that the distribution (law) of X1 given X2 = x2 is L(X1 | X2 = x2) = L(X1* + Σ12 Σ22^{-1} X2 | X2 = x2) = L(X1* + Σ12 Σ22^{-1} x2 | X2 = x2), and by the independence of X1* and X2 this is simply the law of X1* shifted by Σ12 Σ22^{-1} x2. Since X1* ∼ Nr(µ1 − Σ12 Σ22^{-1} µ2, Σ11 − Σ12 Σ22^{-1} Σ21), the shift gives the stated conditional distribution, which is an MVN of dimension r.
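A small numerical sketch of Proposition 8, assuming NumPy (the partition sizes and parameter values are arbitrary):

import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
r = 1                                          # X1 has dimension r, X2 has dimension s = p - r

S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
x2 = np.array([1.5, -0.5])                     # conditioning value X2 = x2

# Conditional mean and covariance of X1 given X2 = x2 (Proposition 8)
cond_mean = mu[:r] + S12 @ np.linalg.solve(S22, x2 - mu[r:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)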

1.4 Multivariate Central Limit Theorem
If X1, X2, . . . ∈ Rp are i.i.d. with E(Xi) = µ and Cov(Xi) = Σ, then

n^{-1/2} Σ_{j=1}^n (Xj − µ) ⇒ Np(0, Σ) as n → ∞,

or equivalently,

n^{1/2}(X̄n − µ) ⇒ Np(0, Σ) as n → ∞,

where X̄n = n^{-1} Σ_{j=1}^n Xj.
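A quick Monte Carlo sketch of the multivariate CLT (assuming NumPy), using i.i.d. vectors with independent Exponential(1) coordinates, so that µ = (1, 1)' and Σ = I2:

import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 2, 500, 5_000

X = rng.exponential(scale=1.0, size=(reps, n, p))   # reps independent samples of size n
Z = np.sqrt(n) * (X.mean(axis=1) - 1.0)             # sqrt(n) (Xbar_n - mu) for each replication

print(np.round(Z.mean(axis=0), 2))                  # approximately (0, 0)
print(np.round(np.cov(Z, rowvar=False), 2))         # approximately the identity (= Sigma)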
The delta method can be used to establish asymptotic normality of h(X̄n) for a function h : Rp → R. In particular, denote by ∇h(x) the gradient of h at x. Using the first two terms of the Taylor series,

h(X̄n) = h(µ) + (∇h(µ))'(X̄n − µ) + Op(||X̄n − µ||_2^2).

Then Slutsky's theorem gives the result:

√n (h(X̄n) − h(µ)) = (∇h(µ))' √n (X̄n − µ) + Op(√n ||X̄n − µ||_2^2)
                   ⇒ N(0, (∇h(µ))'Σ ∇h(µ)) as n → ∞,

since the remainder term is Op(n^{-1/2}) = op(1) and (∇h(µ))'Z ∼ N(0, (∇h(µ))'Σ ∇h(µ)) for Z ∼ Np(0, Σ).
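For example, with h(x) = x1 x2 one has ∇h(µ) = (µ2, µ1)', so the limiting variance is (∇h(µ))'Σ∇h(µ). A simulation sketch assuming NumPy (the choices of h, µ, and Σ are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
reps, n = 2_000, 500

X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
Xbar = X.mean(axis=1)                            # reps independent copies of Xbar_n
h = lambda x: x[..., 0] * x[..., 1]              # h(x) = x1 * x2, so grad h(mu) = (mu2, mu1)'

T = np.sqrt(n) * (h(Xbar) - h(mu))
grad = np.array([mu[1], mu[0]])
print(T.var(), grad @ Sigma @ grad)              # both approximately 7.2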

1.5 Quadratic forms in normal random vectors


Let X ∼ Np (µ, Σ). A quadratic form in X is a random variable of the form
Y = X'AX = Σ_{i=1}^p Σ_{j=1}^p Xi aij Xj,

where A is a p × p symmetric matrix. We are interested in the distribution of quadratic forms and the conditions under which two quadratic forms are independent.
Example 2. A special case: if X ∼ Np(0, Ip) and A = Ip,

Y = X'AX = X'X = Σ_{i=1}^p Xi^2 ∼ χ^2(p).

Fact 1. Recall the following:

1. A p × p matrix A is idempotent if A^2 = A.

2. If A is symmetric, then A = Γ'ΛΓ, where Λ = diag(λi) and Γ is orthogonal.

3. If A is symmetric idempotent,

(a) its eigenvalues are either 0 or 1,

(b) rank(A) = #{nonzero eigenvalues} = trace(A).

Theorem 9. Let X ∼ Np(0, σ^2 I) and A be a p × p symmetric matrix. Then

Y = X'AX / σ^2 ∼ χ^2(m)

if and only if A is idempotent of rank m < p.
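A simulation sketch of Theorem 9 (assuming NumPy/SciPy), taking A to be the projection onto a random m-dimensional subspace, which is symmetric idempotent of rank m:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p, m, sigma2, n = 5, 2, 2.0, 100_000

# Build a symmetric idempotent A of rank m: projection onto a random m-dimensional subspace
Q, _ = np.linalg.qr(rng.standard_normal((p, m)))
A = Q @ Q.T                                 # A' = A and A @ A = A, rank m

X = rng.normal(scale=np.sqrt(sigma2), size=(n, p))
Y = np.einsum('ij,jk,ik->i', X, A, X) / sigma2   # quadratic form X'AX / sigma^2

print(stats.kstest(Y, 'chi2', args=(m,)).pvalue)  # a large p-value is expected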

Corollary 10. Let X ∼ Np (0, Σ) and A be a p × p symmetric matrix. Then

Y = X'AX ∼ χ^2(m)

if and only if either i) AΣ is idempotent of rank m or ii) ΣA is idempotent of rank m.

Example 3. If X ∼ Np(µ, Σ) then (X − µ)'Σ^{-1}(X − µ) ∼ χ^2(p).

Theorem 11. Let X ∼ Np(0, I) and A be a p × p symmetric matrix, and B be a k × p matrix. If BA = 0, then BX and X'AX are independent.

Example 4. Let Xi ∼ N(µ, σ^2) i.i.d. The sample mean X̄n and the sample variance Sn^2 = (n − 1)^{-1} Σ_{i=1}^n (Xi − X̄n)^2 are independent. Moreover, (n − 1) Sn^2 / σ^2 ∼ χ^2(n − 1).
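A quick simulation check of this example, assuming NumPy/SciPy (µ, σ, and n are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma, n, reps = 1.0, 2.0, 10, 50_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)                       # sample variance S_n^2

# Independence implies zero correlation between X̄n and S_n^2
print(np.corrcoef(xbar, s2)[0, 1])               # approximately 0
print(stats.kstest((n - 1) * s2 / sigma**2, 'chi2', args=(n - 1,)).pvalue)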

Theorem 12. Let X ∼ Np(0, I). Suppose A and B are p × p symmetric matrices. If BA = 0, then X'AX and X'BX are independent.

Corollary 13. Let X ∼ Np (0, Σ) and A be a p × p symmetric matrix.

1. For B(k×p), BX and X'AX are independent if BΣA = 0;

2. For symmetric B, X'AX and X'BX are independent if BΣA = 0.

Example 5. The residual sum of squares in the standard linear regression model has a scaled chi-squared distribution and is independent of the coefficient estimates.
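A brief simulation sketch for this example, assuming NumPy/SciPy: under y = Dβ + ε with ε ∼ N(0, σ^2 I) and a fixed n × k design matrix D (the names and values here are illustrative, not from the lecture), RSS/σ^2 ∼ χ^2(n − k) and RSS is independent of the OLS estimate.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k, sigma2, reps = 50, 3, 1.5, 5_000

D = rng.standard_normal((n, k))                  # fixed design matrix (hypothetical)
beta = np.array([1.0, -2.0, 0.5])

rss, bhat0 = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = D @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b_hat, res, *_ = np.linalg.lstsq(D, y, rcond=None)
    rss[r] = res[0]                              # residual sum of squares
    bhat0[r] = b_hat[0]                          # first coefficient estimate

print(stats.kstest(rss / sigma2, 'chi2', args=(n - k,)).pvalue)  # approx chi^2(n - k)
print(np.corrcoef(rss, bhat0)[0, 1])             # approximately 0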

The next lecture is on the distribution of the sample covariance matrix.
