
Lecture #4

Chapter 6. Vector Random Variables


Overview
1. Vector Random Variables
2. Functions of Several Random Variables
3. Expected Values of Vector Random Variables
4. Jointly Gaussian Random Vectors
5. Estimation of Random Variables (Constant and Linear Case)
6.1 Vector Random Variables
A vector random variable X is a function that assigns a vector of real numbers to each outcome ζ ∈ S, the sample space of the random experiment.
The uppercase boldfaced X denotes a column vector (an n × 1 matrix):

X = [X1, X2, ..., Xn]^T

Row-vector notation may also be used (just omit the transpose):

X = (X1, X2, ..., Xn)

Values of the random vector are denoted by lowercase letters:

x = (x1, x2, ..., xn)
Examples

An MP3 codec (coder and decoder), or a cellular phone codec, processes audio in blocks of n samples, X = (X1, X2, ..., Xn). Then X is a vector random variable.

Example 6.1 Arrivals at a Packet Switch
Packets arrive at each of three input ports according to independent Bernoulli trials with p = 1/2, and each packet is equally likely to be relayed to any of three output ports. Let X = (X1, X2, X3), where Xi is the total number of packets arriving for output port i. Then X is a vector random variable whose values are determined by the pattern of arrivals at the input ports.

Example 6.3 Samples of an Audio Signal
Let the outcome ζ of a random experiment be an audio signal X(t). Let the random variable Xk = X(kT) be the sample of the signal at time kT, with sampling interval T. The mapping from X(t) to the samples Xk is performed by an analog-to-digital converter (A2D or ADC).

[Figure: sampling of sin(t) with T = π/5, showing the continuous signal X(t) and the samples Xk = X(kT).]
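To make Example 6.1 concrete, here is a minimal simulation sketch in Python (the sample size, random seed, and helper name packet_switch_sample are arbitrary choices, not part of the lecture): it draws realizations of the vector random variable X = (X1, X2, X3) and checks that each component has mean 3 · (1/2) · (1/3) = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

def packet_switch_sample(p=0.5, n_ports=3):
    """One realization of X = (X1, X2, X3) from Example 6.1:
    each input port sees a Bernoulli(p) arrival, and every arriving
    packet is routed to one of n_ports output ports uniformly at random."""
    arrivals = rng.random(n_ports) < p          # Bernoulli trials at the input ports
    counts = np.zeros(n_ports, dtype=int)
    for _ in range(arrivals.sum()):             # route each arriving packet
        counts[rng.integers(n_ports)] += 1
    return counts

# Collect many realizations to see the behavior of the vector RV
samples = np.array([packet_switch_sample() for _ in range(10_000)])
print("empirical mean of (X1, X2, X3):", samples.mean(axis=0))   # each component ≈ 0.5
```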
Events and Joint Probabilities
For the n-dimensional random variable X, events have the following product form:

A = {X1 ∈ A1} ∩ {X2 ∈ A2} ∩ ··· ∩ {Xn ∈ An}

where each Ai is a one-dimensional event that involves Xi only.

The probability of a product-form event is

P[A] = P[{X1 ∈ A1} ∩ {X2 ∈ A2} ∩ ··· ∩ {Xn ∈ An}]
     ≜ P[X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An]

Joint probability mass function

For a discrete vector random variable, the joint pmf is

p_X(x) = p_{X1,...,Xn}(x1, ..., xn) = P[X1 = x1, X2 = x2, ..., Xn = xn]

Marginal pmf
The marginal pmf of Xj is obtained by summing the joint pmf over all variables other than xj:

p_{Xj}(xj) = Σ_{x1} ··· Σ_{x_{j-1}} Σ_{x_{j+1}} ··· Σ_{xn} p_{X1,...,Xn}(x1, ..., xn)

Any marginal pmf of k of the random variables (k ≤ n) can be obtained by marginalizing out the (n − k) "other" variables. For example, the marginal pmf for (X1, ..., Xn−1) is obtained by summing over all values of xn:

p_{X1,...,Xn−1}(x1, ..., xn−1) = Σ_{xn} p_{X1,...,Xn}(x1, ..., xn)
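As a quick illustration of marginalization (not part of the original slides), the sketch below stores a small, made-up joint pmf as a numpy array and obtains marginals by summing over the unwanted axes.

```python
import numpy as np

# Hypothetical joint pmf p_{X1,X2,X3}(x1, x2, x3) on a 2 x 2 x 2 grid
p_joint = np.array([[[0.10, 0.05], [0.15, 0.10]],
                    [[0.20, 0.05], [0.25, 0.10]]])
assert np.isclose(p_joint.sum(), 1.0)            # a valid pmf sums to 1

# Marginal pmf of X1: sum out x2 and x3 (axes 1 and 2)
p_x1 = p_joint.sum(axis=(1, 2))

# Marginal pmf of the pair (X1, X2): sum out x3 only
p_x1_x2 = p_joint.sum(axis=2)

print(p_x1)       # [0.4 0.6]
print(p_x1_x2)    # 2 x 2 table of joint probabilities for (X1, X2)
```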


Joint cdf: General Case
The joint cumulative distribution function of X is a function of x = (x1, ..., xn):

F_X(x) ≜ F_{X1,...,Xn}(x1, ..., xn) = P[X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn],

whose range is [0, 1] (by Axioms 1 and 2):

0 ≤ F_X(x) ≤ 1

The joint cdf is defined for random variables of discrete, continuous, and mixed type.
A joint marginal cdf is obtained by letting the variables that are not of interest tend to +∞, e.g.,

F_{X1,...,Xn−1}(x1, ..., xn−1) = F_X(x1, ..., xn−1, +∞)
Joint pdf: Continuous RV
X1, ..., Xn are said to be jointly continuous random variables if the probability of any n-dimensional event A is given by an n-dimensional integral of a joint pdf, i.e., if a joint pdf exists:

P[X ∈ A] = ∫···∫_A f_{X1,...,Xn}(x1', ..., xn') dx1' ··· dxn'

If the derivative of the joint cdf exists, the joint pdf is given by

f_{X1,...,Xn}(x1, ..., xn) = ∂^n F_{X1,...,Xn}(x1, ..., xn) / (∂x1 ··· ∂xn)

The marginal pdf for a subset of the random variables is obtained by integrating out the other variables, e.g.,

f_{X1}(x1) = ∫ ··· ∫ f_{X1,...,Xn}(x1, x2', ..., xn') dx2' ··· dxn'
Example 6.6
The random variables X1, X2, and X3 have the joint Gaussian pdf given below. Find the marginal pdf of X1 and X3.

The marginal pdf for the pair X1 and X3 is found by integrating the joint pdf over x2.
6.1.3 Independence
RVs X1, ..., Xn are independent if and only if their joint cdf factorizes into the product of the marginal cdfs:

F_{X1,...,Xn}(x1, ..., xn) = F_{X1}(x1) F_{X2}(x2) ··· F_{Xn}(xn)

If the RVs are discrete, the joint pmf factorizes in the same way:

p_{X1,...,Xn}(x1, ..., xn) = p_{X1}(x1) p_{X2}(x2) ··· p_{Xn}(xn)

If the RVs are jointly continuous, the same factorization applies to the joint pdf as well:

f_{X1,...,Xn}(x1, ..., xn) = f_{X1}(x1) f_{X2}(x2) ··· f_{Xn}(xn)

Example 6.8 The n samples X1, X2, ..., Xn of a noise signal have joint pdf

f_{X1,...,Xn}(x1, ..., xn) = e^{−(x1² + ··· + xn²)/2} / (2π)^{n/2}

Are they independent?

The given joint pdf is the product of n one-dimensional Gaussian(0, 1) pdfs; hence the samples are independent.
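A short numerical sanity check of Example 6.8 (assuming numpy and scipy are available): at an arbitrary point, the joint pdf above equals the product of n standard normal marginal pdfs.

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.3, -1.2, 0.8, 2.1])                       # an arbitrary point, n = 4
joint = np.exp(-0.5 * np.sum(x**2)) / (2 * np.pi) ** (len(x) / 2)
product = np.prod(norm.pdf(x))                            # product of N(0, 1) marginals
print(np.isclose(joint, product))                         # True: the joint pdf factorizes
```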
6.2 Functions of Several RVs
One Function of Several RVs. Let Z = g(X1, X2, ..., Xn) be a random variable. The cdf of Z is found from

F_Z(z) = P[Z ≤ z] = P[{x : g(x) ≤ z}]

and the pdf of Z is the derivative of F_Z(z).

Transformations of Random Vectors. Let X = (X1, ..., Xn), and

Z1 = g1(X), Z2 = g2(X), ..., Zn = gn(X).

The joint cdf of Z = (Z1, ..., Zn) at the point z = (z1, ..., zn) is

F_{Z1,...,Zn}(z1, ..., zn) = P[g1(X) ≤ z1, g2(X) ≤ z2, ..., gn(X) ≤ zn]

Example 6.9 Maximum and Minimum of n RVs
Let W = max(X1, X2, ..., Xn) and Z = min(X1, X2, ..., Xn), where the Xi are i.i.d. RVs. Find F_W(w) and F_Z(z) in terms of F_X(·).
(i.i.d. ≡ "independent, identically distributed")
Solution
The Xi are i.i.d. RVs with common cdf F_X(x), so

P[Xi ≤ x] = F_X(x),  P[Xi > x] = 1 − F_X(x),  for all i ∈ {1, 2, ..., n}

Then,

F_W(w) = P[max(X1, ..., Xn) ≤ w] = P[X1 ≤ w] ··· P[Xn ≤ w] = {F_X(w)}^n

F_Z(z) = P[min(X1, ..., Xn) ≤ z] = P[{X1 ≤ z} ∪ ··· ∪ {Xn ≤ z}]
       = P[(X1 > z ∩ ··· ∩ Xn > z)^c] = 1 − P[X1 > z, ..., Xn > z]
       = 1 − P[X1 > z] ··· P[Xn > z] = 1 − {1 − F_X(z)}^n
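The formulas of Example 6.9 are easy to verify by simulation. A sketch with i.i.d. Uniform(0, 1) samples, for which F_X(x) = x on [0, 1], so that F_W(w) = w^n and F_Z(z) = 1 − (1 − z)^n (the sample size and evaluation points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 200_000
x = rng.random((trials, n))                  # i.i.d. Uniform(0, 1), F_X(x) = x
w = x.max(axis=1)                            # W = max(X1, ..., Xn)
z = x.min(axis=1)                            # Z = min(X1, ..., Xn)

w0, z0 = 0.7, 0.2
print(np.mean(w <= w0), w0**n)               # empirical vs F_W(0.7) = 0.7^5
print(np.mean(z <= z0), 1 - (1 - z0)**n)     # empirical vs F_Z(0.2) = 1 - 0.8^5
```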
Max of Two Random Variables

Z = max(X, Y) = X if X ≥ Y, and Y if X < Y.

F_Z(z) = P[X ≤ z, Y ≤ z] = F_{X,Y}(z, z)

If X and Y are independent, then

F_Z(z) = F_X(z) F_Y(z)

If X and Y are independent and identically distributed (i.i.d.), then

F_Z(z) = F_X(z) F_X(z) = {F_X(z)}²

Min of Two Random Variables

W = min(X, Y) = Y if X ≥ Y, and X if X < Y.

F_W(w) = 1 − P[W > w] = 1 − P[X > w, Y > w]

If X and Y are independent, then

F_W(w) = 1 − P[X > w] P[Y > w]

If, in addition, X and Y are identically distributed (i.i.d.), then

F_W(w) = 1 − {P[X > w]}² = 1 − {1 − F_X(w)}²
Example 6.12
6.3 Expected Values of Vector RVs
The expected value of a function g(X) = g(X1, ..., Xn) of a vector random variable X = (X1, ..., Xn) is given by

E[g(X)] = ∫ ··· ∫ g(x1, ..., xn) f_{X1,...,Xn}(x1, ..., xn) dx1 ··· dxn

for jointly continuous RVs (with the integrals replaced by sums over the joint pmf in the discrete case).

Expected value of a sum of functions of X:

E[g1(X) + ··· + gn(X)] = E[g1(X)] + ··· + E[gn(X)]

If X1, ..., Xn are independent random variables, then

E[g1(X1) g2(X2) ··· gn(Xn)] = E[g1(X1)] E[g2(X2)] ··· E[gn(Xn)]

Mean Vector and Correlation Matrix
The expected value of a vector random variable is itself a vector (size n × 1):

m_X = E[X] = [E[X1], E[X2], ..., E[Xn]]^T

Correlation matrix (size n × n, symmetric, with second moments as entries):

R_X = E[XX^T],  (R_X)_{ij} = E[Xi Xj]

Covariance Matrix

K_X = E[(X − m_X)(X − m_X)^T] = R_X − m_X m_X^T

The diagonal elements of K_X are the variances VAR(Xi).
If the components of X are uncorrelated (in particular, if they are independent), K_X is a diagonal matrix whose diagonal is the vector of variances.
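A small sketch (simulated data, arbitrary mixing matrix and mean vector) showing how the mean vector, correlation matrix, and covariance matrix are estimated from samples, using the identity K_X = R_X − m_X m_X^T above:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])                 # arbitrary mixing matrix
X = rng.normal(size=(50_000, 3)) @ A.T + np.array([1.0, -2.0, 0.5])   # rows are samples of X

m_X = X.mean(axis=0)                            # mean vector, shape (3,)
R_X = (X.T @ X) / len(X)                        # correlation matrix estimate of E[X X^T]
K_X = R_X - np.outer(m_X, m_X)                  # covariance matrix K_X = R_X - m_X m_X^T

print(m_X)
print(np.allclose(K_X, np.cov(X, rowvar=False), atol=1e-2))   # True: matches numpy's estimate
```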
Linear Transformation: Mean
Let Y = AX, where A is an n × n matrix and X is an n × 1 random vector. Then

m_Y = E[AX] = A E[X] = A m_X

Linear Transformation: Covariance

K_Y = E[(Y − m_Y)(Y − m_Y)^T] = E[A(X − m_X){A(X − m_X)}^T] = A K_X A^T,

using {A(X − m_X)}^T = (X − m_X)^T A^T.


Example 6.17
Transformation of Uncorrelated Random Vector

Suppose that the components of X are uncorrelated and have unit variance; then K_X = I, the identity matrix. The covariance matrix for Y = AX is

K_Y = A K_X A^T = A I A^T = A A^T

In general, K_Y = A A^T is not a diagonal matrix, so the components of Y are correlated.
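A short numerical illustration of Example 6.17 (the matrix A below is an arbitrary choice): starting from uncorrelated unit-variance components, the sample covariance of Y = AX comes out close to A A^T, with nonzero off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.0, 2.0],
              [0.5, 1.0]])                    # arbitrary transformation matrix

X = rng.normal(size=(200_000, 2))             # uncorrelated, unit-variance components: K_X ≈ I
Y = X @ A.T                                   # each row is y = A x

K_Y = np.cov(Y, rowvar=False)                 # sample covariance of Y
print(K_Y)                                    # ≈ A @ A.T, off-diagonal terms are nonzero
print(A @ A.T)
```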
6.4 Jointly Gaussian Random Vectors
The random variables X1, X2, ..., Xn are said to be jointly Gaussian if their joint pdf has the form

f_X(x) = exp{ −(1/2)(x − m)^T K^{−1} (x − m) } / ( (2π)^{n/2} |K|^{1/2} )

where x = [x1, x2, ..., xn]^T and m = [E[X1], E[X2], ..., E[Xn]]^T are n × 1 vectors, and K is the covariance matrix (n × n, symmetric).
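Assuming scipy is available, the formula above can be checked against scipy.stats.multivariate_normal for an illustrative mean vector and covariance matrix:

```python
import numpy as np
from scipy.stats import multivariate_normal

m = np.array([1.0, -1.0])                      # illustrative mean vector
K = np.array([[2.0, 0.6],
              [0.6, 1.0]])                     # illustrative covariance matrix
x = np.array([0.5, 0.0])                       # point at which to evaluate the pdf

d = x - m
quad = d @ np.linalg.solve(K, d)               # (x - m)^T K^{-1} (x - m)
pdf_formula = np.exp(-0.5 * quad) / ((2 * np.pi) ** (len(x) / 2) * np.sqrt(np.linalg.det(K)))

print(np.isclose(pdf_formula, multivariate_normal(mean=m, cov=K).pdf(x)))   # True
```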
Example 6.20: Exponent term calculation
Example 6.16
Example 6.18
Example 6.21
Example 6.22
Independence of Uncorrelated Jointly Gaussian Random Variables.
Suppose X1, X2, ..., Xn are jointly Gaussian random variables with COV(Xi, Xj) = 0 for i ≠ j. Show that X1, X2, ..., Xn are independent random variables.

When all covariances are zero, the covariance matrix K is diagonal, and so is its inverse K^{−1}.
The inverse of a diagonal matrix is obtained by replacing each diagonal element with its reciprocal; the determinant of a diagonal matrix is the product of its diagonal values.

For n = 2, the quadratic form in the exponent becomes

(x − m)^T K^{−1} (x − m) = [x1 − m1, x2 − m2] diag(1/σ1², 1/σ2²) [x1 − m1, x2 − m2]^T

= (x1 − m1)²/σ1² + (x2 − m2)²/σ2²

so the exponent separates, the joint pdf factors into a product of one-dimensional Gaussian pdfs, and the random variables are independent.
6.5 Estimation of Random Variables

• We want to estimate the value of an inaccessible random variable X in terms of the observation of an accessible random variable Y.
• For example, X could be the input to a communication channel and Y the observed output.
• In prediction applications, X could be a future value of some quantity and Y its present value.
• Two types of estimators:
  • MAP: maximum a posteriori estimator (Bayesian estimator)
  • ML: maximum likelihood estimator (non-Bayesian estimator)

Y is the observation (measurement); X is unknown.

Goal: estimate X using the observation Y.
6.5.1 MAP and ML Estimators
Maximum a posteriori (MAP) estimator: find the most probable value of the input x given the observation Y = y, i.e., the x maximizing the a posteriori probability of x given y:

x̂_MAP = arg max_x P[X = x | Y = y] = arg max_x P[Y = y | X = x] P[X = x] / P[Y = y]

This requires knowledge of the a priori probabilities P[X = x].

Maximum likelihood (ML) estimator: in situations where we do not know P[X = x], we choose the estimate x to maximize the likelihood of the observed value Y = y:

x̂_ML = arg max_x P[Y = y | X = x]

When X and Y are continuous random variables,

x̂_MAP = arg max_x f_X(x|y) = arg max_x f_Y(y|x) f_X(x) / f_Y(y)

x̂_ML = arg max_x f_Y(y|x)
Example 6.25 ML vs. MAP Estimators
Let X and Y be the random pair in Example 5.16. Find the MAP and ML estimators for X in terms of Y.

From Example 5.16, we know

f_{X,Y}(x, y) = 2e^{−x} e^{−y},  0 ≤ y ≤ x < ∞

f_X(x) = 2e^{−x}(1 − e^{−x}),  0 ≤ x < ∞

f_Y(y) = 2e^{−2y},  0 ≤ y < ∞

Since y ≤ x, define a positive variable δ = x − y and find the ML and MAP estimators with y held constant.

f_X(x|y) = f_{X,Y}(x, y) / f_Y(y) = e^{−(x−y)} = e^{−δ}   (from Example 5.32),

which is maximized at δ = 0, so X̂_MAP = y.

f_Y(y|x) = f_{X,Y}(x, y) / f_X(x) = e^{−y} / (1 − e^{−x}) = e^{−y} / (1 − e^{−y} e^{−(x−y)}),

which, for x ≥ y, is maximized at x = y, so X̂_ML = y.
[Figure: f_X(x|y) plotted for y = 2 and x ∈ [2, 20]; the maximum occurs at x = 2 = y, which is the MAP estimate X̂_MAP = y.]

[Figure: f_Y(y|x) plotted for y = 2 and x ∈ [2, 20]; the maximum occurs at x = 2 = y, which is the ML estimate X̂_ML = y.]
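A quick numerical check of Example 6.25 (with y = 2 and the x-range [2, 20] used in the plots): both conditional densities are maximized at x = y.

```python
import numpy as np

y = 2.0
x = np.linspace(y, 20.0, 10_000)               # x ranges over [y, 20], as in the plots

f_x_given_y = np.exp(-(x - y))                 # a posteriori density of X given Y = y
f_y_given_x = np.exp(-y) / (1.0 - np.exp(-x))  # likelihood of Y = y as a function of x

print(x[np.argmax(f_x_given_y)])               # ≈ 2.0 -> MAP estimate is y
print(x[np.argmax(f_y_given_x)])               # ≈ 2.0 -> ML estimate is y
```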
Example: MAP Estimate
Let X be a continuous random variable with the following pdf (prior density):

f_X(x) = 2x for 0 ≤ x ≤ 1, and 0 otherwise

The conditional pmf (likelihood) of Y given X = x is

f_Y(y|x) = x(1 − x)^{y−1},  y = 1, 2, 3, ...

Find the MAP estimate of X given Y = 3.

The MAP estimate of X lies in [0, 1] and is the value of x that maximizes (with y = 3)

f_Y(y|x) f_X(x) = (2x) · x(1 − x)^{3−1} = 2x²(1 − x)²

We find the maximum by differentiating with respect to x and setting the derivative to zero:

d/dx [2x²(1 − x)²] = 4x(1 − x)² − 4x²(1 − x) = 4x(1 − x)(1 − 2x) = 0 ⇒ x ∈ {0, 1/2, 1}

The maximum on [0, 1] occurs at x̂_MAP = 1/2.
Example: ML Estimate
Let X be a continuous random variable with the following pdf (prior density):

f_X(x) = 2x for 0 ≤ x ≤ 1, and 0 otherwise

The conditional pmf (likelihood) of Y given X = x is

f_Y(y|x) = x(1 − x)^{y−1},  y = 1, 2, 3, ...

Find the ML estimate of X given Y = 3.

The ML estimate of X lies in [0, 1] and is the value of x that maximizes (with y = 3)

f_Y(y|x) = x(1 − x)^{3−1} = x(1 − x)²

We find the maximum by differentiating with respect to x and setting the derivative to zero:

d/dx [x(1 − x)²] = d/dx [x − 2x² + x³] = 1 − 4x + 3x² = 0 ⇒ 3x² − 4x + 1 = 0 ⇒ x ∈ {1, 1/3}

The second derivative is 6x − 4, which equals 2 > 0 at x = 1 (a minimum) and −2 < 0 at x = 1/3 (the maximum), so

x̂_ML = 1/3
[Figure: plot of the MAP objective 2x²(1 − x)² on [0, 1]; the maximum occurs at x̂_MAP = 1/2.]

[Figure: plot of the ML objective x(1 − x)² on [0, 1]; the maximum occurs at x̂_ML = 1/3 ≈ 0.333.]
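A short grid search (not in the original slides) reproduces both worked examples: with prior f_X(x) = 2x, likelihood f_Y(y|x) = x(1 − x)^{y−1}, and observation y = 3, the MAP objective 2x²(1 − x)² peaks at x = 1/2 and the ML objective x(1 − x)² peaks at x = 1/3.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_001)             # fine grid over the support of X
y = 3

likelihood = x * (1 - x) ** (y - 1)            # f_Y(y|x) = x(1 - x)^(y-1)
prior = 2 * x                                  # f_X(x) = 2x on [0, 1]

print(x[np.argmax(likelihood * prior)])        # ≈ 0.5  (MAP estimate)
print(x[np.argmax(likelihood)])                # ≈ 1/3  (ML estimate)
```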
6.5.2 Minimum MSE Linear Estimator
The estimate for X is given by a function of the observation, X̂ = g(Y), and the estimation error (nonzero most of the time) is

X − X̂ = X − g(Y)

The cost C(·) is a function of the error, and our goal is to minimize the expected value of the cost (expected, because of the random nature of the error):

g*(·) = arg min_{g(·)} E[C(X − g(Y))]

When X and Y are continuous RVs, we often use the mean square error (MSE) as the cost function:

g*(·) = arg min_{g(·)} E[(X − g(Y))²]

e = E[(X − g(Y))²]   (mean-square error)


Min. MSE Estimator: Constant Case
Estimate the RV X by a constant a:

g(Y) = a

a* = arg min_a E[(X − a)²] = arg min_a {E[X²] − 2aE[X] + a²}
   = arg min_a {a² − 2aE[X]}   (keep only the terms that involve a)

Since a² − 2aE[X] is a convex function of a, a* is obtained as the value of a that sets the derivative with respect to a to zero:

d/da (a² − 2aE[X]) = 2a − 2E[X] = 0 ⇒ a* = E[X]   (MMSE constant estimate)

The resulting minimum mean-square error is

e = E[(X − a*)²] = E[(X − E[X])²] = VAR[X]
Min. MSE Estimator: Linear Case
g(Y) = aY + b

(a*, b*) = arg min_{a,b} E[(X − aY − b)²]

This can be viewed as approximating X − aY by the constant b. From the constant case, the best value of b is

b* = E[X − aY] = E[X] − aE[Y]

Substituting b* and solving for the best a:

a* = arg min_a E[(X − aY − E[X] + aE[Y])²] = arg min_a E[{(X − E[X]) − a(Y − E[Y])}²]

Taking the derivative with respect to a and equating it to zero gives

a* = COV(X, Y) / VAR(Y) = ρ_{X,Y} σ_X / σ_Y

X̂ = a*Y + b* = ρ_{X,Y} σ_X (Y − E[Y]) / σ_Y + E[X]   (best linear estimator)

e*_L = VAR(X)(1 − ρ²_{X,Y})   (minimum MSE)

Interpretation:

X̂ = ρ_{X,Y} σ_X (Y − E[Y]) / σ_Y + E[X]

Adding E[X] makes sure that the estimator has the correct mean.
(Y − E[Y]) / σ_Y is the zero-mean, unit-variance version of Y; scaling it by σ_X gives a zero-mean term with the variance of X.
ρ_{X,Y} specifies the sign and weight of that term.

If X and Y are uncorrelated (ρ_{X,Y} = 0), then the best estimate for X is E[X].
If X and Y are linearly related (ρ_{X,Y} = ±1), then the best estimate for X is

X̂ = ±σ_X (Y − E[Y]) / σ_Y + E[X]

and the minimum MSE is zero.
Orthogonality Condition
a* = arg min_a E[{(X − E[X]) − a(Y − E[Y])}²]

Setting d/da E[{(X − E[X]) − a(Y − E[Y])}²] = 0 gives

E[{(X − E[X]) − a*(Y − E[Y])}(Y − E[Y])] = 0

The error of the best linear estimator (the quantity inside the braces) is orthogonal to the observation (Y − E[Y]).

The error and the observation are always orthogonal; this is a fundamental result in mean-square estimation (the orthogonality principle).
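The linear MMSE formulas and the orthogonality condition can be verified empirically. A minimal sketch, assuming a simple simulated pair (X, Y) whose joint distribution is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
Y = rng.normal(1.0, 2.0, size=n)                   # observation
X = 0.8 * Y + rng.normal(0.0, 1.0, size=n)         # inaccessible RV (arbitrary model)

a = np.cov(X, Y, bias=True)[0, 1] / np.var(Y)      # a* = COV(X, Y) / VAR(Y)
b = X.mean() - a * Y.mean()                        # b* = E[X] - a* E[Y]
error = X - (a * Y + b)                            # estimation error X - X_hat

rho = np.corrcoef(X, Y)[0, 1]
print(np.mean(error * (Y - Y.mean())))             # ≈ 0: error is orthogonal to the observation
print(np.mean(error**2), np.var(X) * (1 - rho**2)) # minimum MSE ≈ VAR(X)(1 - rho^2)
```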
