Random Vectors and Multivariate Normal Distribution
For a random matrix X and conformable constant matrices A, C, D,

E(AXC + D) = A E(X) C + D.
For a random vector X of size p satisfying E(X_i^2) < ∞ for all i = 1, . . . , p, the variance–covariance matrix (or just covariance matrix) of X is

Cov(X) = E[(X − E X)(X − E X)'].
The multivariate normal distribution has location parameter µ and shape parameter Σ > 0. In particular, let us look into the contours of equal density,

E_c = {x ∈ R^p : f(x) = c_0}
    = {x ∈ R^p : (x − µ)' Σ^{-1} (x − µ) = c^2}.
The location of the distribution is the origin (µ = 0), and the shape (Σ) of the distribution
is determined by the ellipse given by the two principal axes (one at 45 degree line, the
other at -45 degree line). Figure 1 shows the density function and the corresponding Ec for
c = 0.5, 1, 1.5, 2, . . ..
Figure 1: Bivariate normal density and its contours. Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions d > 2, ellipsoids play a similar role.
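As a quick numerical sketch of the contour description above (the particular Σ, correlation 0.6, and level c below are arbitrary illustrative choices): with equal variances, the principal axes of the contour ellipse are the ±45-degree lines, and the density is constant on each ellipse because the Mahalanobis quadratic form is.

```python
import numpy as np

# A bivariate Sigma with equal variances and positive correlation:
# its principal axes then lie on the ±45-degree lines.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Eigenvectors of Sigma give the principal axes of the contour ellipse.
eigvals, eigvecs = np.linalg.eigh(Sigma)      # ascending eigenvalues
axis = eigvecs[:, 1]                          # major axis (largest eigenvalue)
print(np.abs(axis))                           # ~ [0.707, 0.707], i.e. the 45-degree line

# Points on the contour {x : (x - mu)' Sigma^{-1} (x - mu) = c^2}
# can be generated as mu + c * L u for Sigma = L L' and unit vectors u.
c = 1.5
L = np.linalg.cholesky(Sigma)
theta = np.linspace(0, 2 * np.pi, 200)
U = np.vstack([np.cos(theta), np.sin(theta)])   # unit circle
X = (mu[:, None] + c * (L @ U)).T               # points on the ellipse E_c

# The Mahalanobis quadratic form is constant (= c^2) on the ellipse,
# hence the density f is constant there as well.
q = np.einsum('ij,jk,ik->i', X - mu, Sigma_inv, X - mu)
print(np.allclose(q, c ** 2))  # True
```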
Note that the characteristic function is C-valued, and always exists. We collect some
important facts.
1. ϕ_X(t) = ϕ_Y(t) for all t if and only if X =^L Y (equality in law).
Corollary 4 paves the way to the definition of (general) multivariate normal distribution.
The definition says that X is MVN if every projection of X onto a 1-dimensional subspace is normal, with the convention that a degenerate distribution δ_c is normal with variance 0, i.e., c ∼ N(c, 0). The definition does not require that Cov(X) be nonsingular.
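The projection property can be checked by simulation (a sketch with numpy; the particular µ, Σ, projection direction a, and sample size are arbitrary choices): if X ∼ N_p(µ, Σ), then a'X should behave like N(a'µ, a'Σa).

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])   # a positive definite covariance

# Draw many samples of X ~ N(mu, Sigma).
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Any projection a'X should be N(a'mu, a' Sigma a).
a = np.array([0.3, -1.0, 2.0])
proj = X @ a
print(abs(proj.mean() - a @ mu))          # close to 0
print(abs(proj.var() - a @ Sigma @ a))    # close to 0
```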
Theorem 5. The characteristic function of a multivariate normal distribution with mean µ
and covariance matrix Σ ≥ 0 is, for t ∈ Rp ,
ϕ(t) = exp[ i t'µ − (1/2) t'Σt ].
If Σ > 0, then the pdf exists and is the same as (1).
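A simulation sketch of Theorem 5 (numpy; the µ, Σ, argument t, and sample size below are arbitrary choices): the empirical characteristic function, the average of exp(i t'X_j) over samples, should approach exp[i t'µ − (1/2) t'Σt].

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

t = np.array([0.3, -0.2])
# Empirical characteristic function: average of exp(i t'X_j).
phi_hat = np.mean(np.exp(1j * (X @ t)))
# Theoretical value from Theorem 5.
phi = np.exp(1j * (t @ mu) - 0.5 * (t @ Sigma @ t))
print(abs(phi_hat - phi))   # small
```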
In the following, the notation X ∼ N (µ, Σ) is valid for a non-negative definite Σ. How-
ever, whenever Σ−1 appears in the statement, Σ is assumed to be positive definite.
The next two results concern independence and conditional distributions of normal random vectors. Let X_1 and X_2 be the partition of X with dimensions r and s, r + s = p, and suppose µ and Σ are partitioned accordingly. That is,

X = (X_1', X_2')' ∼ N_p( (µ_1', µ_2')', [ Σ_11  Σ_12 ; Σ_21  Σ_22 ] ).
Proposition 7. The normal random vectors X1 and X2 are independent if and only if
Cov(X1 , X2 ) = Σ12 = 0.
X* = (X_1*', X_2*')' = AX,   A = [ I_r  −Σ_12 Σ_22^{-1} ; 0_{s×r}  I_s ],
and that the distribution (law) of X_1 given X_2 = x_2 is L(X_1 | X_2 = x_2) = L(X_1* + Σ_12 Σ_22^{-1} X_2 | X_2 = x_2) = L(X_1* + Σ_12 Σ_22^{-1} x_2 | X_2 = x_2), which is an MVN of dimension r.
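A small numerical sketch of the decomposition (numpy; the block sizes r = 1, s = 2 and the parameter values are illustrative choices): X_1* = X_1 − Σ_12 Σ_22^{-1} X_2 is uncorrelated with X_2 by construction, and the conditional mean and covariance follow from the displayed law.

```python
import numpy as np

# Partitioned parameters (r = 1, s = 2); values are illustrative.
mu1 = np.array([1.0])
mu2 = np.array([0.0, 2.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.5, 0.3]])
S22 = np.array([[1.0, 0.2],
                [0.2, 1.5]])

B = S12 @ np.linalg.inv(S22)      # Sigma_12 Sigma_22^{-1}

# X1* = X1 - B X2 is uncorrelated with X2:
# Cov(X1*, X2) = Sigma_12 - B Sigma_22 = 0.
print(np.allclose(S12 - B @ S22, 0))   # True

# Conditional moments of X1 given X2 = x2 implied by the law above:
x2 = np.array([0.5, 1.0])
cond_mean = mu1 + B @ (x2 - mu2)       # mu_1 + B (x2 - mu_2)
cond_cov = S11 - B @ S12.T             # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
print(cond_mean, cond_cov)
```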
1.4 Multivariate Central Limit Theorem
If X_1, X_2, . . . ∈ R^p are i.i.d. with E(X_i) = µ and Cov(X_i) = Σ, then
n^{-1/2} Σ_{j=1}^n (X_j − µ) ⇒ N_p(0, Σ) as n → ∞,
or equivalently,
n^{1/2} (X̄_n − µ) ⇒ N_p(0, Σ) as n → ∞,

where X̄_n = n^{-1} Σ_{j=1}^n X_j.
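The multivariate CLT can be sketched by simulation (numpy; the non-normal exponential input, the µ and Σ, and the sample sizes are arbitrary choices): n^{1/2}(X̄_n − µ) should have covariance close to Σ even though the X_j are not normal.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 2, 500, 5_000
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

# Non-normal i.i.d. X_j with mean mu and covariance Sigma:
# centered exponentials pushed through a square root of Sigma.
L = np.linalg.cholesky(Sigma)
Z = rng.exponential(size=(reps, n, p)) - 1.0   # mean 0, covariance I
X = mu + Z @ L.T                               # mean mu, covariance Sigma

# n^{1/2} (Xbar_n - mu) should be approximately N_p(0, Sigma).
T = np.sqrt(n) * (X.mean(axis=1) - mu)
print(np.cov(T.T))   # close to Sigma
```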
The delta-method can be used to obtain asymptotic normality of h(X̄_n) for some function h : R^p → R. In particular, denote by ∇h(x) the gradient of h at x. Using the first two terms of the Taylor series, h(X̄_n) ≈ h(µ) + ∇h(µ)'(X̄_n − µ), so that

n^{1/2} (h(X̄_n) − h(µ)) ⇒ N(0, ∇h(µ)' Σ ∇h(µ)) as n → ∞.
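A simulation sketch of the delta-method (numpy; the choice h(x) = x_1/x_2 and all parameter values are illustrative): the variance of n^{1/2}(h(X̄_n) − h(µ)) should match ∇h(µ)'Σ∇h(µ).

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 5_000
mu = np.array([2.0, 1.0])
Sigma = np.array([[0.5, 0.1],
                  [0.1, 0.3]])

# h(x) = x1 / x2, with gradient grad_h(x) = (1/x2, -x1/x2^2).
def h(x):
    return x[..., 0] / x[..., 1]

grad = np.array([1.0 / mu[1], -mu[0] / mu[1] ** 2])
asym_var = grad @ Sigma @ grad          # delta-method asymptotic variance

X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
xbar = X.mean(axis=1)                   # reps independent sample means
T = np.sqrt(n) * (h(xbar) - h(mu))
print(T.var(), asym_var)                # should be close
```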
1. A p × p matrix A is idempotent if A^2 = A.
3. If A is symmetric and idempotent, then
   (a) its eigenvalues are all either 0 or 1;
   (b) rank(A) = #{nonzero eigenvalues} = trace(A).
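These facts can be checked on a concrete symmetric idempotent matrix (a numpy sketch; the hat matrix of a random 10 × 3 design is an arbitrary example of such a matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
# The hat matrix H = X (X'X)^{-1} X' of a full-rank design is symmetric idempotent.
Xd = rng.normal(size=(10, 3))
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T

print(np.allclose(H @ H, H))                 # idempotent: H^2 = H
eig = np.sort(np.linalg.eigvalsh(H))
print(eig)                                   # eigenvalues are all 0 or 1
print(np.linalg.matrix_rank(H), np.trace(H)) # rank = trace = 3
```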
Y = X'AX / σ^2 ∼ χ^2(m)

if and only if A is idempotent of rank m < p.
Y = X'AX ∼ χ^2(m)
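A simulation sketch of this quadratic-form fact (numpy; the dimension p = 6, rank m = 3, and sample size are arbitrary choices): for X ∼ N_p(0, I) and A symmetric idempotent of rank m, the quadratic form X'AX should have the χ^2(m) mean m and variance 2m.

```python
import numpy as np

rng = np.random.default_rng(5)
p, m, reps = 6, 3, 200_000

# A symmetric idempotent A of rank m: projection onto an m-dim subspace.
Q, _ = np.linalg.qr(rng.normal(size=(p, m)))
A = Q @ Q.T
assert np.allclose(A @ A, A)            # idempotent

X = rng.normal(size=(reps, p))          # draws of X ~ N_p(0, I)
Y = np.einsum('ij,jk,ik->i', X, A, X)   # quadratic forms X'AX

# chi^2(m) has mean m and variance 2m.
print(Y.mean(), Y.var())                # approximately 3 and 6
```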
Example 4. Let X_i ∼ N(µ, σ^2) i.i.d. The sample mean X̄_n and the sample variance S_n^2 = (n − 1)^{-1} Σ_{i=1}^n (X_i − X̄_n)^2 are independent. Moreover, (n − 1) S_n^2 / σ^2 ∼ χ^2(n − 1).
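Example 4 is easy to probe by simulation (numpy; the n = 10, µ, σ, and replication count are arbitrary choices): across many samples, X̄_n and S_n^2 should be uncorrelated, and (n − 1)S_n^2/σ^2 should have the χ^2(n − 1) mean n − 1 and variance 2(n − 1).

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, mu, sigma = 10, 100_000, 2.0, 1.5
X = rng.normal(mu, sigma, size=(reps, n))

xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)              # sample variances

# Independence implies zero correlation between Xbar and S^2.
print(np.corrcoef(xbar, S2)[0, 1])      # near 0

# (n-1) S^2 / sigma^2 ~ chi^2(n-1): mean n-1 = 9, variance 2(n-1) = 18.
W = (n - 1) * S2 / sigma ** 2
print(W.mean(), W.var())
```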
Theorem 12. Let X ∼ Np (0, I). Suppose A and B are p × p symmetric matrices. If
BA = 0, then X0 AX and X0 BX are independent.
Example 5. The residual sum of squares in the standard linear regression has a scaled chi-squared distribution and is independent of the coefficient estimates.
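A simulation sketch of Example 5 (numpy; the design size n = 30, k = 3 regressors, β, σ, and replication count are arbitrary choices): RSS/σ^2 should have mean n − k, and the coefficient estimates should be uncorrelated with the RSS.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, sigma, reps = 30, 3, 2.0, 10_000
Xd = rng.normal(size=(n, k))            # fixed design matrix
beta = np.array([1.0, -2.0, 0.5])

rss = np.empty(reps)
bhat = np.empty((reps, k))
for r in range(reps):
    y = Xd @ beta + rng.normal(0, sigma, size=n)
    b, res, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    bhat[r] = b
    rss[r] = res[0]                     # residual sum of squares

# RSS / sigma^2 ~ chi^2(n - k): mean n - k = 27.
print((rss / sigma ** 2).mean())
# Independence implies zero correlation between each estimate and RSS.
print(np.corrcoef(bhat[:, 0], rss)[0, 1])   # near 0
```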