
Lecture Notes 3

Random Vectors

• Specifying a Random Vector

• Mean and Covariance Matrix

• Coloring and Whitening

• Gaussian Random Vectors

EE 278: Random Vectors Page 3 – 1


Specifying a Random Vector

• Let X1, X2, . . . , Xn be random variables defined on the same probability space.
We define a random vector (RV) as

      X = [ X1 ]
          [ X2 ]
          [ ⋮  ]
          [ Xn ]

• X is completely specified by its joint cdf for x = (x1, x2, . . . , xn):


FX(x) = P{X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn} ,   x ∈ R^n

• If X is continuous, i.e., FX(x) is a continuous function of x, then X can be


specified by its joint pdf:
fX(x) = fX1,X2,...,Xn (x1, x2, . . . , xn) ,   x ∈ R^n

• If X is discrete then it can be specified by its joint pmf:


pX(x) = pX1,X2,...,Xn (x1, x2, . . . , xn) ,   x ∈ X^n

EE 278: Random Vectors Page 3 – 2


• A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a subset of
{X1, . . . , Xn}; e.g., for

      X = [ X1 ]
          [ X2 ]
          [ X3 ]
the marginals are
fX1 (x1) , fX2 (x2) , fX3 (x3)
fX1,X2 (x1, x2) , fX1,X3 (x1, x3) , fX2,X3 (x2, x3)

• The marginals can be obtained from the joint in the usual way. For the previous
example,
      FX1(x1) = lim_{x2,x3→∞} FX(x1, x2, x3)

      fX1,X2(x1, x2) = ∫_{−∞}^{∞} fX1,X2,X3(x1, x2, x3) dx3
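• As a quick numerical illustration (my addition, not part of the notes), the marginalization integral above can be evaluated directly for a bivariate Gaussian joint pdf; the covariance values and evaluation point below are arbitrary choices, and scipy is assumed to be available:

```python
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal, norm

# Illustrative joint pdf: (X1, X2) zero mean, unit variances, correlation 0.5
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
joint = multivariate_normal(mean=[0.0, 0.0], cov=Sigma)

x1 = 0.7  # arbitrary point at which to evaluate the marginal
# fX1(x1) = integral of fX1,X2(x1, x2) over x2
marginal, _ = integrate.quad(lambda x2: joint.pdf([x1, x2]), -np.inf, np.inf)

print(marginal)        # ≈ 0.3123
print(norm.pdf(x1))    # N(0, 1) marginal of X1, same value
```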

EE 278: Random Vectors Page 3 – 3


• Conditional cdf (pdf, pmf) can also be defined in the usual way. E.g., the
  conditional pdf of X_{k+1}^n = (Xk+1, . . . , Xn) given X^k = (X1, . . . , Xk) is

      f_{X_{k+1}^n | X^k}(x_{k+1}^n | x^k) = fX(x1, x2, . . . , xn) / f_{X^k}(x1, x2, . . . , xk) = fX(x) / f_{X^k}(x^k)

• Chain Rule: We can write

      fX(x) = fX1(x1) fX2|X1(x2|x1) fX3|X1,X2(x3|x1, x2) · · · f_{Xn|X^{n−1}}(xn|x^{n−1})

  Proof: By induction. The chain rule holds for n = 2 by definition of conditional
  pdf. Now suppose it is true for n − 1. Then

      fX(x) = f_{X^{n−1}}(x^{n−1}) f_{Xn|X^{n−1}}(xn|x^{n−1})
            = fX1(x1) fX2|X1(x2|x1) · · · f_{Xn−1|X^{n−2}}(xn−1|x^{n−2}) f_{Xn|X^{n−1}}(xn|x^{n−1}) ,

  which completes the proof
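• As an aside (my addition), the chain rule is also the recipe for sampling a random vector sequentially: draw x1 from fX1, then x2 from fX2|X1(·|x1), and so on. A minimal sketch for a bivariate Gaussian, where the conditional is available in closed form; the correlation value is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                # illustrative correlation
n_samples = 100_000

# Chain rule sampling: fX1,X2(x1, x2) = fX1(x1) fX2|X1(x2|x1)
x1 = rng.standard_normal(n_samples)                                   # X1 ~ N(0, 1)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_samples)  # X2 | X1 = x1 ~ N(rho*x1, 1 - rho^2)

print(np.corrcoef(x1, x2)[0, 1])   # ≈ rho, as expected for the joint distribution
```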

EE 278: Random Vectors Page 3 – 4


Independence and Conditional Independence

• Independence is defined in the usual way; e.g., X1, X2, . . . , Xn are independent
  if

      fX(x) = ∏_{i=1}^{n} fXi(xi)   for all (x1, . . . , xn)

• Important special case, i.i.d. r.v.s: X1, X2, . . . , Xn are said to be independent,
identically distributed (i.i.d.) if they are independent and have the same
marginals
Example: if we flip a coin n times independently, we generate i.i.d. Bern(p)
r.v.s. X1, X2, . . . , Xn
• R.v.s X1 and X3 are said to be conditionally independent given X2 if
fX1,X3|X2 (x1 , x3|x2) = fX1|X2 (x1|x2)fX3|X2 (x3|x2) for all (x1 , x2, x3)

• Conditional independence neither implies nor is implied by independence;


X1 and X3 independent given X2 does not mean that X1 and X3 are
independent (or vice versa)

EE 278: Random Vectors Page 3 – 5


• Example: Coin with random bias. Given a coin with random bias P ∼ fP (p),
flip it n times independently to generate the r.v.s X1, X2, . . . , Xn , where
Xi = 1 if i-th flip is heads, 0 otherwise
◦ X1, X2, . . . , Xn are not independent
◦ However, X1, X2, . . . , Xn are conditionally independent given P ; in fact, they
are i.i.d. Bern(p) for every P = p
• Example: Additive noise channel. Consider an additive noise channel with signal
X , noise Z , and observation Y = X + Z , where X and Z are independent
◦ Although X and Z are independent, they are not in general conditionally
independent given Y

EE 278: Random Vectors Page 3 – 6


Mean and Covariance Matrix

• The mean of the random vector X is defined as

      E(X) = [ E(X1)  E(X2)  · · ·  E(Xn) ]^T

• Denote the covariance between Xi and Xj , Cov(Xi, Xj), by σij (so the
  variance of Xi is denoted by σii , Var(Xi), or σXi^2 )
• The covariance matrix of X is defined as

      ΣX = [ σ11  σ12  · · ·  σ1n ]
           [ σ21  σ22  · · ·  σ2n ]
           [  ⋮    ⋮    ⋱     ⋮  ]
           [ σn1  σn2  · · ·  σnn ]

• For n = 2, we can use the definition of correlation coefficient to obtain

      ΣX = [ σ11  σ12 ]  =  [ σX1^2             ρX1,X2 σX1 σX2 ]
           [ σ21  σ22 ]     [ ρX1,X2 σX1 σX2    σX2^2          ]
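• A minimal numpy sketch (my addition) of estimating E(X) and ΣX from samples; the target covariance is an arbitrary choice, and np.cov treats rows as variables by default:

```python
import numpy as np

rng = np.random.default_rng(1)
true_cov = np.array([[2.0, 1.0],
                     [1.0, 3.0]])           # illustrative covariance matrix

# Generate zero-mean samples with the chosen covariance (columns are realizations)
L = np.linalg.cholesky(true_cov)
X = L @ rng.standard_normal((2, 100_000))

print(X.mean(axis=1))    # ≈ [0, 0], the mean vector
print(np.cov(X))         # ≈ true_cov, the covariance matrix
```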

EE 278: Random Vectors Page 3 – 7


Properties of Covariance Matrix ΣX

• ΣX is real and symmetric (since σij = σji )


• ΣX is positive semidefinite, i.e., the quadratic form
      a^T ΣX a ≥ 0   for every real vector a
Equivalently, all the eigenvalues of ΣX are nonnegative, and also all principal
minors are nonnegative
• To show that ΣX is positive semidefinite we write

      ΣX = E[ (X − E(X))(X − E(X))^T ] ,

  i.e., as the expectation of an outer product. Thus

      a^T ΣX a = a^T E[ (X − E(X))(X − E(X))^T ] a
               = E[ a^T (X − E(X))(X − E(X))^T a ]
               = E[ (a^T (X − E(X)))^2 ]  ≥  0

EE 278: Random Vectors Page 3 – 8


Which of the Following Can Be a Covariance Matrix?

      1. [ 1 0 0 ]      2. [ 1 2 1 ]      3. [ 1 0 1 ]
         [ 0 1 0 ]         [ 2 1 1 ]         [ 1 2 1 ]
         [ 0 0 1 ]         [ 1 1 1 ]         [ 0 1 3 ]

      4. [ −1 1 1 ]     5. [ 1 1 1 ]      6. [ 1 2 3 ]
         [  1 1 1 ]        [ 1 2 1 ]         [ 2 4 6 ]
         [  1 1 1 ]        [ 1 1 3 ]         [ 3 6 9 ]
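• One way to check the candidates numerically (my addition): a covariance matrix must be symmetric with nonnegative eigenvalues, so test symmetry first and then use a symmetric eigensolver. Under this test, matrices 1, 5, and 6 qualify; 2 has a negative eigenvalue, 3 is not symmetric, and 4 has a negative diagonal entry.

```python
import numpy as np

candidates = {
    1: [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    2: [[1, 2, 1], [2, 1, 1], [1, 1, 1]],
    3: [[1, 0, 1], [1, 2, 1], [0, 1, 3]],
    4: [[-1, 1, 1], [1, 1, 1], [1, 1, 1]],
    5: [[1, 1, 1], [1, 2, 1], [1, 1, 3]],
    6: [[1, 2, 3], [2, 4, 6], [3, 6, 9]],
}

for k, M in candidates.items():
    M = np.array(M, dtype=float)
    symmetric = np.allclose(M, M.T)
    # eigvalsh is only valid for symmetric matrices, so short-circuit on symmetry
    psd = symmetric and bool(np.all(np.linalg.eigvalsh(M) >= -1e-12))
    print(k, "symmetric:", symmetric, "positive semidefinite:", psd)
```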

EE 278: Random Vectors Page 3 – 9


Coloring and Whitening

• Square root of covariance matrix: Let Σ be a covariance matrix. Then there
  exists an n × n matrix Σ^{1/2} such that Σ = Σ^{1/2}(Σ^{1/2})^T . The matrix Σ^{1/2} is
  called the square root of Σ
• Coloring: Let X be a white RV, i.e., one with zero mean and ΣX = aI , a > 0. Assume
  without loss of generality that a = 1
  Let Σ be a covariance matrix; then the RV Y = Σ^{1/2}X has covariance matrix
  Σ (why?)
  Hence we can generate a RV with any prescribed covariance from a white RV
• Whitening: Given a zero mean RV Y with nonsingular covariance matrix Σ,
  the RV X = Σ^{-1/2}Y is white
  Hence, we can generate a white RV from any RV with nonsingular covariance
  matrix (a short numpy sketch of both operations appears at the end of this page)
• Coloring and whitening have applications in simulations, detection, and
estimation
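• A minimal numpy sketch of coloring and whitening (my addition); here the square root is taken to be the lower triangular Cholesky factor, one of several valid choices discussed on the following pages, and the target covariance is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])          # target covariance (illustrative)

# Coloring: white X (zero mean, identity covariance) -> Y with covariance Sigma
X = rng.standard_normal((2, 50_000))    # columns are independent white samples
S_half = np.linalg.cholesky(Sigma)      # Sigma = S_half @ S_half.T
Y = S_half @ X
print(np.cov(Y))                        # ≈ Sigma

# Whitening: Y with nonsingular covariance Sigma -> white X
X_rec = np.linalg.solve(S_half, Y)      # applies the inverse square root without forming it
print(np.cov(X_rec))                    # ≈ identity
```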

EE 278: Random Vectors Page 3 – 10


Finding Square Root of Σ

• For convenience, we assume throughout that Σ is nonsingular


• Since Σ is symmetric, it has n real eigenvalues λ1, λ2, . . . , λn and n
corresponding orthogonal eigenvectors u1, u2, . . . , un
Further, since Σ is positive definite, the eigenvalues are all positive
• Thus, we have
      Σui = λi ui ,   λi > 0,   i = 1, 2, . . . , n
      ui^T uj = 0   for every i ≠ j
Without loss of generality assume that the ui vectors are unit vectors
• The first set of equations can be rewritten in the matrix form
ΣU = U Λ,
where
U = [u1 u2 . . . un]
and Λ is a diagonal matrix with diagonal elements λi

EE 278: Random Vectors Page 3 – 11


• Note that U is a unitary matrix (U^T U = U U^T = I ), hence

      Σ = U Λ U^T

  and the square root of Σ is

      Σ^{1/2} = U Λ^{1/2} ,

  where Λ^{1/2} is a diagonal matrix with diagonal elements λi^{1/2}

• The inverse of the square root is straightforward to find as

      Σ^{-1/2} = Λ^{-1/2} U^T

• Example: Let

      Σ = [ 2  1 ]
          [ 1  3 ]

  To find the eigenvalues of Σ, we find the roots of the polynomial equation

      det(Σ − λI) = λ^2 − 5λ + 5 = 0 ,

  which gives λ1 = 3.62, λ2 = 1.38
  To find the eigenvectors, consider

      [ 2  1 ] [ u11 ]  =  3.62 [ u11 ] ,
      [ 1  3 ] [ u12 ]          [ u12 ]

EE 278: Random Vectors Page 3 – 12


  and u11^2 + u12^2 = 1, which yields

      u1 = [ 0.53 ]
           [ 0.85 ]

  Similarly, we can find the second eigenvector

      u2 = [ −0.85 ]
           [  0.53 ]

  Hence,

      Σ^{1/2} = [ 0.53  −0.85 ] [ √3.62    0    ]  =  [ 1     −1   ]
                [ 0.85   0.53 ] [   0    √1.38  ]     [ 1.62  0.62 ]

  The inverse of the square root is

      Σ^{-1/2} = [ 1/√3.62      0     ] [  0.53  0.85 ]  =  [  0.28  0.45 ]
                 [    0      1/√1.38  ] [ −0.85  0.53 ]     [ −0.72  0.45 ]
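• The decomposition can be cross-checked with numpy's symmetric eigensolver (my addition); note that np.linalg.eigh orders eigenvalues ascending and fixes eigenvector signs arbitrarily, so the resulting square root may differ from the one above by column order and sign while still satisfying Σ^{1/2}(Σ^{1/2})^T = Σ:

```python
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

lam, U = np.linalg.eigh(Sigma)                  # eigenvalues ≈ [1.38, 3.62], ascending
S_half = U @ np.diag(np.sqrt(lam))              # a square root: U Lambda^{1/2}
S_inv_half = np.diag(1 / np.sqrt(lam)) @ U.T    # its inverse: Lambda^{-1/2} U^T

print(S_half @ S_half.T)                        # recovers Sigma
print(S_inv_half @ Sigma @ S_inv_half.T)        # ≈ identity, so the inverse square root whitens
```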

EE 278: Random Vectors Page 3 – 13


Geometric Interpretation

• To generate a RV Y with covariance matrix Σ from a white RV X, we use the
  transformation Y = U Λ^{1/2} X
• Equivalently, we first scale each component of X to obtain the RV Z = Λ^{1/2} X,
  and then rotate Z using U to obtain Y = U Z
• We can visualize this by plotting the contours x^T I x = c, z^T Λ z = c, and y^T Σ y = c
  [Figure: three contour plots, in the (x1, x2), (z1, z2), and (y1, y2) coordinates.
   Scaling maps X to Z = Λ^{1/2} X and rotation maps Z to Y = U Z; the reverse maps
   are Z = U^T Y and X = Λ^{-1/2} Z.]

EE 278: Random Vectors Page 3 – 14


Cholesky Decomposition

• Σ has many square roots:
  If Σ^{1/2} is a square root, then for any unitary matrix V , Σ^{1/2}V is also a square
  root since Σ^{1/2} V V^T (Σ^{1/2})^T = Σ
• The Cholesky decomposition is an efficient algorithm for computing a lower
  triangular square root that can be used to perform coloring causally (sequentially)
• For n = 3, we want to find a lower triangular matrix (square root) A such that

      Σ = [ σ11  σ12  σ13 ]  =  [ a11   0    0  ] [ a11  a21  a31 ]
          [ σ21  σ22  σ23 ]     [ a21  a22   0  ] [  0   a22  a32 ]
          [ σ31  σ32  σ33 ]     [ a31  a32  a33 ] [  0    0   a33 ]
  The elements of A are computed in a raster scan manner (a numerical cross-check
  appears after the summary on the next page):

      a11 :  σ11 = a11^2             ⇒  a11 = √σ11
      a21 :  σ21 = a21 a11           ⇒  a21 = σ21/a11
      a22 :  σ22 = a21^2 + a22^2     ⇒  a22 = √(σ22 − a21^2)
      a31 :  σ31 = a11 a31           ⇒  a31 = σ31/a11

EE 278: Random Vectors Page 3 – 15


      a32 :  σ32 = a21 a31 + a22 a32       ⇒  a32 = (σ32 − a21 a31)/a22
      a33 :  σ33 = a31^2 + a32^2 + a33^2   ⇒  a33 = √(σ33 − a31^2 − a32^2)
• The inverse of a lower triangular square root is also lower triangular
• Coloring and whitening summary:
  ◦ Coloring:   X (ΣX = I)  →  [ Σ^{1/2} ]   →  Y (ΣY = Σ)
  ◦ Whitening:  Y (ΣY = Σ)  →  [ Σ^{-1/2} ]  →  X (ΣX = I)
  ◦ The lower triangular square root and its inverse can be efficiently computed
    using the Cholesky decomposition
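• A numerical cross-check of the raster-scan formulas above (my addition); the 3 × 3 matrix is an arbitrary positive definite example, and np.linalg.cholesky returns the same lower triangular factor:

```python
import numpy as np

Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])      # illustrative positive definite matrix

A = np.linalg.cholesky(Sigma)            # library lower triangular factor

# Raster-scan formulas from the slides, written out for n = 3
a11 = np.sqrt(Sigma[0, 0])
a21 = Sigma[1, 0] / a11
a22 = np.sqrt(Sigma[1, 1] - a21**2)
a31 = Sigma[2, 0] / a11
a32 = (Sigma[2, 1] - a21 * a31) / a22
a33 = np.sqrt(Sigma[2, 2] - a31**2 - a32**2)
A_manual = np.array([[a11, 0, 0], [a21, a22, 0], [a31, a32, a33]])

print(np.allclose(A, A_manual))          # True: the formulas match the library factor
print(np.allclose(A @ A.T, Sigma))       # True: A is indeed a square root of Sigma
```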

EE 278: Random Vectors Page 3 – 16


Gaussian Random Vectors

• A random vector X = (X1, . . . , Xn) is a Gaussian random vector (GRV) (or
  X1, X2, . . . , Xn are jointly Gaussian r.v.s) if the joint pdf is of the form

      fX(x) = (2π)^{-n/2} |Σ|^{-1/2} exp( −(1/2) (x − µ)^T Σ^{-1} (x − µ) ) ,

  where µ is the mean and Σ is the covariance matrix of X, and Σ > 0, i.e., Σ
  is positive definite
• Verify that this joint pdf is the same as the case n = 2 from Lecture Notes 2
• Notation: X ∼ N (µ, Σ) denotes a GRV with given mean and covariance matrix
• Since Σ is positive definite, Σ^{-1} is positive definite. Thus if x − µ ≠ 0,

      (x − µ)^T Σ^{-1} (x − µ) > 0 ,
which means that the contours of equal pdf are ellipsoids
• The GRV X ∼ N (0, aI), where I is the identity matrix and a > 0, is called
white; its contours of equal joint pdf are spheres centered at the origin
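• A small sketch (my addition) of evaluating this joint pdf directly and comparing against scipy's implementation of the same density; the mean, covariance, and evaluation point are arbitrary choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
x = np.array([0.5, 1.5])

# Direct evaluation of the GRV pdf formula
d = x - mu
pdf_manual = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma)))

grv = multivariate_normal(mean=mu, cov=Sigma)
print(pdf_manual)          # same value as the line below
print(grv.pdf(x))
samples = grv.rvs(size=5)  # a few GRV samples with the prescribed mean and covariance
```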

EE 278: Random Vectors Page 3 – 17


Properties of GRVs

• Property 1: For a GRV, uncorrelatedness implies independence
  This can be verified by substituting σij = 0 for all i ≠ j in the joint pdf.
  Then Σ becomes diagonal and so does Σ^{-1}, and the joint pdf reduces to the
  product of the marginals Xi ∼ N (µi, σii)
For the white GRV X ∼ N (0, aI), the r.v.s are i.i.d. N (0, a)
• Property 2: Linear transformation of a GRV yields a GRV, i.e., given any
m × n matrix A, where m ≤ n and A has full rank m, then
      Y = AX ∼ N (Aµ, AΣA^T )

• Example: Let

      X ∼ N ( 0 , [ 2  1 ] )
                  [ 1  3 ]

  Find the joint pdf of

      Y = [ 1  1 ] X
          [ 1  0 ]

EE 278: Random Vectors Page 3 – 18


  Solution: From Property 2, we conclude that

      Y ∼ N ( 0 , [ 1  1 ] [ 2  1 ] [ 1  1 ] )  =  N ( 0 , [ 7  3 ] )
                  [ 1  0 ] [ 1  3 ] [ 1  0 ]               [ 3  2 ]
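• The matrix product in the example can be checked in a couple of lines (my addition):

```python
import numpy as np

A = np.array([[1, 1],
              [1, 0]])
Sigma = np.array([[2, 1],
                  [1, 3]])

print(A @ Sigma @ A.T)   # [[7, 3], [3, 2]], the covariance of Y = AX
```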

  Before we prove Property 2, let us show that

      E(Y) = Aµ   and   ΣY = AΣA^T

  These results follow from linearity of expectation. First, expectation:

      E(Y) = E(AX) = A E(X) = Aµ

  Next consider the covariance matrix:

      ΣY = E[ (Y − E(Y))(Y − E(Y))^T ]
         = E[ (AX − Aµ)(AX − Aµ)^T ]
         = A E[ (X − µ)(X − µ)^T ] A^T = AΣA^T

Of course this is not sufficient to show that Y is a GRV — we must also show
that the joint pdf has the right form
We do so using the characteristic function for a random vector

EE 278: Random Vectors Page 3 – 19


• Definition: If X ∼ fX(x), the characteristic function of X is

      ΦX(ω) = E[ e^{iω^T X} ] ,

  where ω is an n-dimensional real valued vector and i = √−1
  Thus

      ΦX(ω) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX(x) e^{iω^T x} dx

  This is the inverse of the multi-dimensional Fourier transform of fX(x), which
  implies that there is a one-to-one correspondence between ΦX(ω) and fX(x).
  The joint pdf can be found by taking the Fourier transform of ΦX(ω), i.e.,

      fX(x) = (1/(2π)^n) ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} ΦX(ω) e^{−iω^T x} dω

• Example: The characteristic function for X ∼ N (µ, σ^2) is

      ΦX(ω) = e^{−(1/2)ω^2 σ^2 + iµω} ,

  and for a GRV X ∼ N (µ, Σ),

      ΦX(ω) = e^{−(1/2)ω^T Σω + iω^T µ}
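• A Monte Carlo sanity check (my addition, not from the notes): the empirical characteristic function of GRV samples should match the closed form above. The mean, covariance, test frequency ω, and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
omega = np.array([0.3, -0.2])

# Empirical Phi_X(omega): average of exp(i omega^T X) over GRV samples
X = rng.multivariate_normal(mu, Sigma, size=200_000)
phi_empirical = np.mean(np.exp(1j * (X @ omega)))

# Closed form: exp(-(1/2) omega^T Sigma omega + i omega^T mu)
phi_closed = np.exp(-0.5 * omega @ Sigma @ omega + 1j * (omega @ mu))

print(phi_empirical)   # ≈ phi_closed, up to Monte Carlo error
print(phi_closed)
```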

EE 278: Random Vectors Page 3 – 20


• Now let's go back to proving Property 2
  Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore the
  characteristic function of Y is

      ΦY(ω) = E[ e^{iω^T Y} ]
            = E[ e^{iω^T AX} ]
            = ΦX(A^T ω)
            = e^{−(1/2)(A^T ω)^T Σ(A^T ω) + iω^T Aµ}
            = e^{−(1/2)ω^T (AΣA^T)ω + iω^T Aµ}

  Thus Y = AX ∼ N (Aµ, AΣA^T )


• An equivalent definition of GRV: X is a GRV iff for every real vector a ≠ 0, the
  r.v. Y = a^T X is Gaussian (see HW for proof)
• Whitening transforms a GRV to a white GRV; conversely, coloring transforms a
white GRV to a GRV with prescribed covariance matrix

EE 278: Random Vectors Page 3 – 21


• Property 3: Marginals of a GRV are Gaussian, i.e., if X is a GRV then for any
  subset {i1, i2, . . . , ik} ⊂ {1, 2, . . . , n} of indexes, the RV

      Y = [ Xi1  Xi2  · · ·  Xik ]^T

  is a GRV
 
• To show this we use Property 2. For example, let n = 3 and Y = [ X1  X3 ]^T
  We can express Y as a linear transformation of X:

      Y = [ 1  0  0 ] [ X1 ]  =  [ X1 ]
          [ 0  0  1 ] [ X2 ]     [ X3 ]
                      [ X3 ]

  Therefore

      Y ∼ N ( [ µ1 ] , [ σ11  σ13 ] )
              [ µ3 ]   [ σ31  σ33 ]
• As we have seen in Lecture Notes 2, the converse of Property 3 does not hold in
general, i.e., Gaussian marginals do not necessarily mean that the r.v.s are
jointly Gaussian
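• In code, taking a marginal of a GRV amounts to index selection on µ and Σ (a small sketch I am adding; the numbers are illustrative):

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])            # illustrative mean vector
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])       # illustrative covariance matrix

idx = [0, 2]                              # keep X1 and X3
mu_marginal = mu[idx]                     # [mu1, mu3]
Sigma_marginal = Sigma[np.ix_(idx, idx)]  # [[sigma11, sigma13], [sigma31, sigma33]]

print(mu_marginal)
print(Sigma_marginal)
```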

EE 278: Random Vectors Page 3 – 22


• Property 4: Conditionals of a GRV are Gaussian, more specifically, if

      X = [ X1 ]  ∼  N ( [ µ1 ] , [ Σ11  Σ12 ] ) ,
          [ X2 ]         [ µ2 ]   [ Σ21  Σ22 ]

  where X1 is a k-dim RV and X2 is an (n − k)-dim RV, then

      X2 | {X1 = x} ∼ N ( Σ21 Σ11^{-1} (x − µ1) + µ2 ,  Σ22 − Σ21 Σ11^{-1} Σ12 )

  Compare this to the case of n = 2 and k = 1:

      X2 | {X1 = x} ∼ N ( (σ21/σ11)(x − µ1) + µ2 ,  σ22 − σ12^2/σ11 )
• Example:

      [ X1 ]        ( [ 1 ]   [ 1 | 2  1 ] )
      [ X2 ]  ∼  N  ( [ 2 ] , [ 2 | 5  2 ] )
      [ X3 ]        ( [ 2 ]   [ 1 | 2  9 ] )

EE 278: Random Vectors Page 3 – 23


  From Property 4, it follows that

      E(X2 | X1 = x) = [ 2 ] (x − 1) + [ 2 ]  =  [ 2x    ]
                       [ 1 ]           [ 2 ]     [ x + 1 ]

      Σ_{X2|X1=x} = [ 5  2 ] − [ 2 ] [ 2  1 ]  =  [ 1  0 ]
                    [ 2  9 ]   [ 1 ]              [ 0  8 ]

  (these values are checked numerically in the sketch below)

• The proof of Property 4 follows from properties 1 and 2 and the orthogonality
principle (HW exercise)
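• Returning to the example above, the conditional mean and covariance can be checked numerically with the block formula from Property 4 (a sketch I am adding):

```python
import numpy as np

mu = np.array([1.0, 2.0, 2.0])
Sigma = np.array([[1.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 9.0]])

# Partition: X1 = X_1 (k = 1) and X2 = (X_2, X_3)
mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x = 2.0   # condition on X_1 = x
cond_mean = S21 @ np.linalg.inv(S11) @ (np.array([x]) - mu1) + mu2
cond_cov = S22 - S21 @ np.linalg.inv(S11) @ S12

print(cond_mean)   # [2x, x + 1] at x = 2, i.e. [4, 3]
print(cond_cov)    # [[1, 0], [0, 8]]
```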

EE 278: Random Vectors Page 3 – 24
