JG Note
1 Introduction
1.1 Definitions
We list equivalent definitions of jointly Gaussian random variables below.
Definition 1. Let the random vector X := (X1, . . . , Xn)⊤. Let Z ∈ Rℓ be a standard normal random vector (i.e., Z1, . . . , Zℓ are i.i.d. with Zi ∼ N(0, 1)). Then X1, . . . , Xn are jointly Gaussian if there exist µ ∈ Rn and A ∈ Rn×ℓ such that X = AZ + µ.
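As a concrete illustration of Definition 1, here is a minimal sketch assuming NumPy, with an arbitrary mixing matrix A and mean µ (neither comes from the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    n, ell = 2, 3                          # dimensions of X and Z
    A = rng.standard_normal((n, ell))      # arbitrary mixing matrix A
    mu = np.array([1.0, -2.0])             # arbitrary mean vector mu

    Z = rng.standard_normal(ell)           # standard normal vector Z
    X = A @ Z + mu                         # X = AZ + mu is jointly Gaussian by Definition 1
    print(X)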
Definition 2. X1, . . . , Xn are jointly Gaussian if every linear combination u⊤X of X1, . . . , Xn, for any u ∈ Rn, follows a normal distribution.
With X = AZ + µ as in Definition 1, the covariance matrix of X is
\begin{align*}
\Sigma = \mathbb{E}[(X - \mu)(X - \mu)^\top] &= \mathbb{E}[(AZ)(AZ)^\top] \\
&= \mathbb{E}[A Z Z^\top A^\top] \\
&= A\, \mathbb{E}[Z Z^\top]\, A^\top \\
&= A A^\top,
\end{align*}
since E[ZZ⊤] = I for a standard normal vector Z.
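Empirically, the sample covariance of many draws of X = AZ + µ should approach AA⊤. A quick check, again assuming NumPy and arbitrary A and µ:

    import numpy as np

    rng = np.random.default_rng(0)
    n, ell = 2, 3
    A = rng.standard_normal((n, ell))
    mu = np.array([1.0, -2.0])

    Z = rng.standard_normal((500_000, ell))
    X = Z @ A.T + mu                       # many samples of X = AZ + mu, one per row
    print(np.cov(X, rowvar=False))         # close to ...
    print(A @ A.T)                         # ... A A^T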
2 Properties of JG RVs
2.1 Independent iff Uncorrelated
In general, for any two random variables X1, X2, if X1 and X2 are independent, then they are necessarily uncorrelated. The converse fails in general, but it does hold for jointly Gaussian random variables (Theorem 1): if X1 and X2 are jointly Gaussian and uncorrelated, then they are independent.
Proof. Without loss of generality, we will consider the case of two jointly
Gaussian random variables. Extensions to higher dimensions follow by the
same reasoning. Suppose that X1 , X2 are uncorrelated. Recall that the entries
of the covariance matrix are Σi,j = cov(Xi , Xj ), which means that
\[
\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}
\quad\text{and}\quad
\Sigma^{-1} = \begin{pmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{pmatrix}.
\]
Substituting this into the jointly Gaussian PDF, the joint density factorizes:
\[
f_X(x_1, x_2) = \frac{1}{\sqrt{(2\pi)^2 \sigma_1^2 \sigma_2^2}} \exp\!\left( -\frac{1}{2}\left( \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right) \right)
\]
\[
= \frac{1}{\sqrt{2\pi \sigma_1^2}} \exp\!\left( -\frac{1}{2} \frac{(x_1 - \mu_1)^2}{\sigma_1^2} \right) \cdot \frac{1}{\sqrt{2\pi \sigma_2^2}} \exp\!\left( -\frac{1}{2} \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right),
\]
which is the product of the marginal densities of X1 and X2. Hence X1 and X2 are independent.
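As a numerical sanity check of this factorization, the following sketch (assuming NumPy; the means and standard deviations are arbitrary) compares the diagonal-covariance joint density above with the product of the two marginal densities:

    import numpy as np

    mu1, mu2 = 1.0, -2.0                   # arbitrary means
    s1, s2 = 0.7, 1.5                      # arbitrary standard deviations

    def joint_pdf(x1, x2):
        # bivariate normal density with diagonal covariance, as displayed above
        return (1.0 / np.sqrt((2 * np.pi) ** 2 * s1**2 * s2**2)
                * np.exp(-0.5 * ((x1 - mu1) ** 2 / s1**2 + (x2 - mu2) ** 2 / s2**2)))

    def marginal_pdf(x, mu, s):
        return 1.0 / np.sqrt(2 * np.pi * s**2) * np.exp(-0.5 * (x - mu) ** 2 / s**2)

    x1 = np.linspace(-3.0, 3.0, 7)
    x2 = np.linspace(-4.0, 2.0, 7)
    print(np.allclose(joint_pdf(x1, x2), marginal_pdf(x1, mu1, s1) * marginal_pdf(x2, mu2, s2)))  # True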
Note. We have shown that for jointly Gaussian random variables, the
variables being uncorrelated implies that they are independent. This does not,
however, mean that any two uncorrelated marginally normally distributed
random variables are necessarily independent. To see why the variables being
jointly Gaussian is so crucial, we will consider an example.
Example 1. Consider X ∼ N(0, 1) and Y = WX, where W is independent of X with
\[
W = \begin{cases} 1 & \text{w.p. } 0.5, \\ -1 & \text{w.p. } 0.5. \end{cases}
\]
Then Y ∼ N(0, 1) as well (flipping the sign of a standard normal does not change its distribution), and cov(X, Y) = E[WX²] = E[W] E[X²] = 0, so X and Y are uncorrelated random variables with normal marginals. Yet they are clearly not independent, since |Y| = |X|. Indeed, X and Y are not jointly Gaussian: X + Y equals 0 with probability 1/2, so it cannot follow a normal distribution.
Therefore, one must ensure that the random variables are jointly Gaussian
before assuming that any of these properties necessarily hold.
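A small simulation sketch of Example 1 (assuming NumPy; the sample size and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    X = rng.standard_normal(n)
    W = rng.choice([-1.0, 1.0], size=n)    # Rademacher sign, independent of X
    Y = W * X

    print(np.corrcoef(X, Y)[0, 1])         # close to 0: X and Y are uncorrelated
    print(Y.mean(), Y.var())               # close to 0 and 1: Y is (marginally) standard normal
    print(np.mean(X + Y == 0.0))           # close to 0.5: X + Y has an atom at 0,
                                           # so (X, Y) cannot be jointly Gaussian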
where Ai denotes the ith row of A written as a column vector, so that Xi = Ai⊤Z + µi. Now, for any C, D, c, d ∈ R, let U = CX1 + c and V = DX2 + d. Substituting in the above expressions, we find
\[
U = C(A_1^\top Z + \mu_1) + c, \qquad V = D(A_2^\top Z + \mu_2) + d,
\]
so that
\[
\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} C A_1^\top \\ D A_2^\top \end{pmatrix} Z + \begin{pmatrix} C\mu_1 + c \\ D\mu_2 + d \end{pmatrix},
\]
which is of the form in Definition 1, so U and V are jointly Gaussian.
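A minimal numerical check of this stacking, assuming NumPy (the matrix A, mean µ, and scalars C, D, c, d below are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 3))        # rows A_1^T and A_2^T
    mu = np.array([0.5, -1.0])
    C, D, c, d = 2.0, -3.0, 1.0, 4.0

    Z = rng.standard_normal(3)
    X = A @ Z + mu                         # X = AZ + mu, so X_i = A_i^T Z + mu_i
    U, V = C * X[0] + c, D * X[1] + d      # affine functions of X_1 and X_2

    B = np.vstack([C * A[0], D * A[1]])    # stacked matrix with rows C A_1^T and D A_2^T
    b = np.array([C * mu[0] + c, D * mu[1] + d])
    print(np.allclose(np.array([U, V]), B @ Z + b))   # True: (U, V) = BZ + b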
Therefore, X −L[X | Y ] and Y are uncorrelated. By Theorem 2, X −L[X | Y ]
and Y are jointly Gaussian since they are linear combinations of X and Y .
Thus, by Theorem 1, the uncorrelated jointly Gaussian X − L[X | Y ] and Y
must be independent.
We know that functions of independent random variables are independent
(see Lemma 1 in the Appendix). This implies that X − L[X | Y ] and φ(Y )
are independent for all functions φ(·). Independent random variables are uncorrelated, so
\[
\operatorname{cov}\!\big(X - L[X \mid Y],\ \varphi(Y)\big) = 0 \quad \text{for every function } \varphi(\cdot);
\]
that is, the error of the linear estimator is uncorrelated with every function of Y.
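A simulation sketch of this orthogonality, assuming NumPy (the covariance matrix and the test function φ(y) = y² are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500_000
    cov = np.array([[2.0, 0.8], [0.8, 1.0]])           # arbitrary covariance of (X, Y)
    X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # LLSE of X given Y with zero means: L[X | Y] = (cov(X, Y) / var(Y)) * Y
    resid = X - (cov[0, 1] / cov[1, 1]) * Y

    print(np.cov(resid, Y)[0, 1])          # close to 0: residual uncorrelated with Y
    print(np.cov(resid, Y**2)[0, 1])       # close to 0: also uncorrelated with phi(Y) = Y^2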
Clearly the first point is true for the covariance matrix of jointly Gaussian random variables by definition. In the following subsections, we shall see how to interpret each of these statements in different ways.
Note. In order for the PDF of a multivariate normal to be defined, the covariance matrix must be positive definite, meaning that x⊤Σx > 0 for all x ≠ 0, or equivalently that all eigenvalues of Σ are real and positive.
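One quick way to test this numerically, assuming NumPy (Sigma below is an arbitrary example matrix):

    import numpy as np

    Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])        # arbitrary symmetric matrix
    print(np.all(np.linalg.eigvalsh(Sigma) > 0))      # True iff Sigma is positive definite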
3.2 Projection
Suppose we had a jointly Gaussian vector X and its centered version X̂ =
X − µ, and wanted to find the variance when projecting X̂ along a particular
unit direction u. By the definition of projection, this quantity is
\[
\operatorname{var}(u^\top \hat{X}) = \mathbb{E}\big[(u^\top \hat{X})(\hat{X}^\top u)\big] = u^\top\, \mathbb{E}[\hat{X} \hat{X}^\top]\, u = u^\top \Sigma u.
\]
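A quick simulation check of this formula, assuming NumPy (µ, Σ, and the unit direction u are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    mu = np.array([1.0, -1.0])
    Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
    u = np.array([0.6, 0.8])               # a unit direction (||u||_2 = 1)

    X = rng.multivariate_normal(mu, Sigma, size=500_000)
    proj = (X - mu) @ u                    # projection of the centered vector onto u

    print(proj.var())                      # close to ...
    print(u @ Sigma @ u)                   # ... u^T Sigma u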
Recall that any real symmetric matrix M (in particular, a covariance matrix Σ) admits a spectral decomposition
\[
M = U \Lambda U^\top,
\]
where U is orthogonal and Λ is diagonal with the (real) eigenvalues of M on its diagonal.
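For a concrete look at this decomposition, assuming NumPy (Sigma below is an arbitrary symmetric example):

    import numpy as np

    Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
    lam, U = np.linalg.eigh(Sigma)         # eigenvalues (ascending) and orthonormal eigenvectors
    print(np.allclose(Sigma, U @ np.diag(lam) @ U.T))    # True: Sigma = U Lambda U^T
    print(np.allclose(U.T @ U, np.eye(2)))                # True: U is orthogonal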
3.4 Density Level Curves
If we examine the PDF of a JG RV (assuming it has positive definite Σ, so
an inverse exists), the significant term is
g(x) = (x − µ)⊤ Σ−1 (x − µ).
The level curves of g are the sets of points on which g, and hence the density, is constant. It turns out that the level curves of g are hyperellipsoids centered at µ. For additional details, see Section 4.2 in the Appendix.
4 Appendix
4.1 Functions of Independent RVs Are Independent
Lemma 1 (Functions of independent RVs are independent). Let X, Y be two
independent random variables and g, h be real valued functions defined on the
codomains of X and Y respectively. Then, g(X) and h(Y ) are independent
random variables.
Proof.
P(g(X) ∈ A, h(Y ) ∈ B) = P(X ∈ g −1 (A), Y ∈ h−1 (B))
= P(X ∈ g −1 (A)) · P(Y ∈ h−1 (B))
= P(g(X) ∈ A) · P(h(Y ) ∈ B).
4.2 Level Curves of g
Throughout Sections 4.2.1–4.2.3 we take µ = 0; Section 4.2.4 handles nonzero µ.
4.2.1 When Σ = I
Let us start by considering the level curves of g when Σ = I:
g(x) = x⊤ Σ⁻¹ x = x⊤ x = ∥x∥₂².
From this, we can clearly see that the level curves of g are hyperspheres
centered at the origin.
4.2.2 When Σ = Λ
Things get slightly more complicated when we generalize to a positive diagonal
matrix for Σ, but not by much:
\[
g(x) = x^\top \Sigma^{-1} x = x^\top \Lambda^{-1} x = \sum_{i=1}^{\ell} \frac{1}{\lambda_i} x_i^2.
\]
The level curves of g are therefore axis-aligned hyperellipsoids centered at the origin, with semi-axis lengths proportional to √λi along the coordinate directions.
4.2.3 When Σ = U ΛU ⊤
Now let us consider the most general case:
\[
g(x) = x^\top \Sigma^{-1} x = x^\top U \Lambda^{-1} U^\top x = \sum_{i=1}^{\ell} \frac{1}{\lambda_i} (U^\top x)_i^2.
\]
The level curves are again hyperellipsoids with the same semi-axis lengths of √λi. However, this time, the semi-axis directions are not along the coordinate directions, but along the directions defined by the columns of U!
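To make this concrete, the following sketch, assuming NumPy (Sigma is an arbitrary positive definite example), checks that the semi-axis endpoints √λi ui, where ui are the columns of U, all lie on the level curve g(x) = 1:

    import numpy as np

    Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
    lam, U = np.linalg.eigh(Sigma)         # Sigma = U diag(lam) U^T
    Sigma_inv = np.linalg.inv(Sigma)

    g = lambda x: x @ Sigma_inv @ x
    # Each point sqrt(lam_i) * u_i satisfies g(x) = 1.
    print([g(np.sqrt(lam[i]) * U[:, i]) for i in range(2)])   # both entries close to 1.0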
4.2.4 Nonzero µ
Previously we have assumed µ = 0, but what if that isn’t actually true?
When our random vector has nonzero mean, we effectively have a translation.
The level curves of g will keep the same shape, but will simply be translated so that their center is at µ instead of the origin.