
Estimation 2

The document discusses the Optimum Bayesian Estimator, focusing on minimizing expected costs through various estimation methods such as Minimum Mean Square Error (MMSE), Minimum Mean Absolute Error (MMAE), and Maximum A Posteriori (MAP) estimates. It provides mathematical formulations for these estimators, particularly in the context of jointly Gaussian random vectors and exponential observations. Examples illustrate how these estimates can differ based on the underlying distributions of the random variables involved.


Section 4.2

Ajit K Chaturvedi


Optimum Bayesian Estimator

The objective of Bayesian estimation is to find a vector estimator
\[
\hat{X}(Y) = \begin{bmatrix} \hat{X}_1(Y) & \cdots & \hat{X}_i(Y) & \cdots & \hat{X}_m(Y) \end{bmatrix}^T
\]
taking values in ℝ^m which minimizes the expected cost E[C(X, X̂(Y))].

To characterize this estimator, we observe that the expected cost can be expressed as
\[
E[C(X, \hat{X}(Y))] = \iint C\big(x, \hat{X}(y)\big)\, f_{X,Y}(x, y)\,dx\,dy
= \int \left[ \int C\big(x, \hat{X}(y)\big)\, f_{X|Y}(x \mid y)\,dx \right] f_Y(y)\,dy
\]

Here the posterior density of the vector X given the observations Y is evaluated by applying Bayes' rule:
\[
f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)}
\]
The marginal density fY(y) is obtained by integrating the joint density of X and Y with respect to x, i.e.,
\[
f_Y(y) = \int f_{X,Y}(x, y)\,dx = \int f_{Y|X}(y \mid x)\, f_X(x)\,dx
\]

Since fY(y) ≥ 0, the expected cost is minimized if the term between brackets is minimized for each y. This gives
\[
\hat{X}(y) = \arg\min_{\hat{x} \in \mathbb{R}^m} \int C(x, \hat{x})\, f_{X|Y}(x \mid y)\,dx
\]
Furthermore, by observing that for a fixed y the marginal density fY(y) appearing in the denominator of the conditional density fX|Y(x | y) is a positive scaling factor that does not affect the outcome of the minimization, we also have
\[
\hat{X}(y) = \arg\min_{\hat{x} \in \mathbb{R}^m} \int C(x, \hat{x})\, f_{X,Y}(x, y)\,dx
\]
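As a hypothetical illustration (not part of the original slides), the following Python sketch carries out this minimization numerically for a scalar X: it evaluates the posterior expected cost on a grid of candidate estimates x̂ and returns the minimizer. The function name bayes_estimate, the grid, and the Gaussian toy posterior are assumptions made for the example.

```python
import numpy as np

def bayes_estimate(x_grid, post_pdf, cost):
    """Minimize the posterior expected cost J(x_hat | y) over a grid of candidates.

    x_grid   : 1-D grid covering the support of the posterior f_{X|Y}(x|y)
    post_pdf : values of f_{X|Y}(x|y) on x_grid
    cost     : vectorized cost function C(x, x_hat)
    """
    dx = x_grid[1] - x_grid[0]
    # J(x_hat | y) = integral of C(x, x_hat) f_{X|Y}(x|y) dx, via a Riemann sum.
    J = np.array([np.sum(cost(x_grid, x_hat) * post_pdf) * dx for x_hat in x_grid])
    return x_grid[np.argmin(J)]

# Toy check: quadratic cost with a Gaussian posterior centered at 1.0.
# The minimizer should be close to the posterior mean, consistent with the
# MMSE result derived next.
x = np.linspace(-5.0, 7.0, 2001)
post = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)
print(bayes_estimate(x, post, lambda x, xh: (x - xh) ** 2))   # ~ 1.0
```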

MSE
For
\[
C(x, \hat{x}) = \|x - \hat{x}\|_2^2,
\]
the minimum mean-square error (MMSE) estimate X̂MSE(y) minimizes the conditional mean-square error given by
\[
J(\hat{x} \mid y) = \int \|x - \hat{x}\|_2^2\, f_{X|Y}(x \mid y)\,dx
\]
To minimize this function, we set its gradient equal to zero. The gradient is defined as
\[
\nabla_{\hat{x}} J = \begin{bmatrix} \dfrac{\partial J}{\partial \hat{x}_1} & \cdots & \dfrac{\partial J}{\partial \hat{x}_i} & \cdots & \dfrac{\partial J}{\partial \hat{x}_m} \end{bmatrix}^T.
\]
Therefore,
\[
\nabla_{\hat{x}} J(\hat{x} \mid y) = 2 \int (\hat{x} - x)\, f_{X|Y}(x \mid y)\,dx = 0
\]

We get
\[
\int \hat{x}\, f_{X|Y}(x \mid y)\,dx = \int x\, f_{X|Y}(x \mid y)\,dx
\]
The area under a density is unity, therefore
\[
\int f_{X|Y}(x \mid y)\,dx = 1
\]
This leads to
\[
\hat{X}_{MSE}(y) = \int x\, f_{X|Y}(x \mid y)\,dx = E[X \mid Y = y]
\]
Thus the mean-square error estimate X̂MSE(Y) is just the conditional mean of X given Y.
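The following Monte Carlo sketch (a hypothetical illustration, not from the slides) checks this for an assumed scalar toy model X ~ N(0, 1), Y = X + W with W ~ N(0, 1) independent of X, for which E[X | Y] = Y/2. Among a few candidate estimators, the conditional mean attains the smallest empirical mean-square error.

```python
import numpy as np

# Assumed toy model: X ~ N(0,1), Y = X + W, W ~ N(0,1) independent of X.
# Standard Gaussian conditioning gives E[X | Y] = Y / 2 and K_{X|Y} = 1/2.
rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal(n)
Y = X + rng.standard_normal(n)

candidates = {
    "conditional mean E[X|Y] = Y/2": Y / 2,
    "raw observation Y":             Y,
    "ignore the observation (0)":    np.zeros(n),
}
for name, X_hat in candidates.items():
    print(f"{name:32s} empirical MSE = {np.mean((X - X_hat) ** 2):.4f}")
# The first candidate should come out near the theoretical minimum tr(K_{X|Y}) = 0.5.
```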

MMSE

The conditional covariance matrix KX|Y(y) of X given Y = y is given by:
\[
K_{X|Y}(y) = E\big[(X - E[X \mid y])(X - E[X \mid y])^T \mid Y = y\big]
= \int (x - E[X \mid y])(x - E[X \mid y])^T f_{X|Y}(x \mid y)\,dx.
\]
Recall
\[
J(\hat{x} \mid y) = \int \|x - \hat{x}\|_2^2\, f_{X|Y}(x \mid y)\,dx
\]
Therefore, the performance of the estimator can be expressed in terms of KX|Y(y) as:
\[
J\big(\hat{X}_{MSE}(y) \mid y\big) = \mathrm{tr}\big(K_{X|Y}(y)\big)
\]

Averaging with respect to Y, the minimum mean-square error (MMSE) can be expressed as
\[
\mathrm{MMSE} = E\big[\|X - E[X \mid Y]\|_2^2\big] = \mathrm{tr}(K_E),
\]
where the error covariance matrix K_E is given by
\[
K_E = E\big[(X - E[X \mid Y])(X - E[X \mid Y])^T\big]
= \iint (x - E[X \mid y])(x - E[X \mid y])^T f_{X,Y}(x, y)\,dx\,dy.
\]

MAE Estimate
For
\[
C(x, \hat{x}) = \|x - \hat{x}\|_1,
\]
the minimum mean absolute error (MMAE) estimate X̂MAE(y) minimizes the objective function
\[
J(\hat{x} \mid y) = \int \|x - \hat{x}\|_1\, f_{X|Y}(x \mid y)\,dx.
\]
Taking the partial derivative of J with respect to x̂i for 1 ≤ i ≤ m gives
\[
\frac{\partial J}{\partial \hat{x}_i} = -\int \mathrm{sgn}(x_i - \hat{x}_i)\, f_{X_i|Y}(x_i \mid y)\,dx_i,
\]
where the sgn function is defined as
\[
\mathrm{sgn}(z) = \begin{cases} 1 & \text{for } z \geq 0 \\ -1 & \text{for } z < 0 \end{cases}
\]
Setting ∂J/∂x̂i = 0, we get
\[
\int_{-\infty}^{\hat{x}_i} f_{X_i|Y}(x_i \mid y)\,dx_i - \int_{\hat{x}_i}^{\infty} f_{X_i|Y}(x_i \mid y)\,dx_i = 0
\]
\[
\int_{-\infty}^{\hat{x}_i} f_{X_i|Y}(x_i \mid y)\,dx_i = \int_{\hat{x}_i}^{\infty} f_{X_i|Y}(x_i \mid y)\,dx_i
\]
Noting that the total probability mass of the density fXi|Y(xi|y) equals one, we get
\[
\int_{-\infty}^{\hat{x}_i} f_{X_i|Y}(x_i \mid y)\,dx_i = \int_{\hat{x}_i}^{\infty} f_{X_i|Y}(x_i \mid y)\,dx_i = \frac{1}{2}
\]
Recall that the median of the probability density of a random variable is the point on the real axis where half of the probability mass is located on one side and the other half on the other side. Thus, for each i, the ith entry of X̂MAE(y) is the median of the conditional density fXi|Y(xi|y).
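A minimal numerical sketch (hypothetical, not from the slides) of this result: the MMAE estimate of a scalar parameter is located by finding where the cumulative distribution of the posterior reaches 1/2. The posterior used below is an assumed Gamma-shaped density of the same form that appears later in Example 4.2, with an arbitrary rate b.

```python
import numpy as np

def posterior_median(x_grid, post_pdf):
    """Return the point where the posterior CDF crosses 1/2 (the MMAE estimate)."""
    dx = x_grid[1] - x_grid[0]
    cdf = np.cumsum(post_pdf) * dx       # numerical CDF of f_{X|Y}(x|y)
    return np.interp(0.5, cdf, x_grid)   # invert the CDF at probability 1/2

# Assumed posterior: f(x|y) = b^2 x exp(-b x) for x >= 0 (compare Example 4.2,
# where b = y + a).  Its median satisfies (1 + b x) exp(-b x) = 1/2.
b = 3.0
x = np.linspace(0.0, 8.0, 100_001)
pdf = b**2 * x * np.exp(-b * x)
print(posterior_median(x, pdf))          # ~ 1.678 / b ~ 0.559
```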

MAP Estimate
The estimator corresponding to
\[
C(x, \hat{x}) = L_\epsilon(x - \hat{x}),
\]
minimizes the objective function
\[
J(\hat{x} \mid y) = \int L_\epsilon(x - \hat{x})\, f_{X|Y}(x \mid y)\,dx
\]
For vanishingly small ϵ, this can be simplified as
\[
J(\hat{x} \mid y) = 1 - \int_{\|x - \hat{x}\|_\infty < \epsilon} f_{X|Y}(x \mid y)\,dx
\]
This is equivalent to maximizing
\[
\int_{\|x - \hat{x}\|_\infty < \epsilon} f_{X|Y}(x \mid y)\,dx \approx f_{X|Y}(\hat{x} \mid y)\,(2\epsilon)^m
\]

Therefore,
\[
\hat{X}_{MAP}(y) = \arg\max_{x \in \mathbb{R}^m} f_{X|Y}(x \mid y)
\]
corresponds to the maximum, also called the mode, of the a posteriori density fX|Y(x|y). This is why the estimator is called the MAP estimate.
This choice makes sense only if the conditional density fX|Y(x|y) has a dominant peak. When the posterior density has several peaks of similar size, selecting the largest one and ignoring the others may lead to unacceptably large errors.
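This caveat can be checked numerically. The sketch below (a hypothetical illustration with an assumed two-component Gaussian-mixture posterior) shows the MAP estimate sitting on the slightly taller peak while the posterior mean and median land elsewhere.

```python
import numpy as np

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Assumed bimodal posterior: mixture of N(-2, 0.3^2) and N(+2, 0.3^2) with
# nearly equal weights (illustrative numbers only).
x = np.linspace(-6.0, 6.0, 100_001)
post = 0.51 * gauss(x, -2.0, 0.3) + 0.49 * gauss(x, 2.0, 0.3)

dx = x[1] - x[0]
post /= post.sum() * dx                 # renormalize on the grid
x_map = x[np.argmax(post)]              # mode   (MAP estimate)
x_mse = np.sum(x * post) * dx           # mean   (MMSE estimate)
cdf = np.cumsum(post) * dx
x_mae = np.interp(0.5, cdf, x)          # median (MMAE estimate)
print(f"MAP ~ {x_map:+.2f}, MMSE ~ {x_mse:+.2f}, MMAE ~ {x_mae:+.2f}")
# MAP picks one peak (~ -2), while the mean (~ -0.04) falls between the peaks;
# with strongly bimodal posteriors any single point estimate can be misleading.
```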

Example 4.1: Jointly Gaussian Random Vectors

Let X ∈ ℝ^m and Y ∈ ℝ^n be jointly Gaussian, so that
\[
\begin{bmatrix} X \\ Y \end{bmatrix} \sim N(m, K)
\]
with
\[
m = \begin{bmatrix} m_X \\ m_Y \end{bmatrix} = \begin{bmatrix} E[X] \\ E[Y] \end{bmatrix}
\]
and
\[
K = \begin{bmatrix} K_X & K_{XY} \\ K_{YX} & K_Y \end{bmatrix}
= E\left[ \begin{bmatrix} X - m_X \\ Y - m_Y \end{bmatrix} \begin{bmatrix} (X - m_X)^T & (Y - m_Y)^T \end{bmatrix} \right]
\]

The conditional density of X given Y is also Gaussian. It is given by
\[
f_{X|Y}(x \mid y) = \frac{1}{(2\pi)^{m/2}\, |K_{X|Y}|^{1/2}}
\exp\left( -\frac{1}{2} (x - m_{X|Y})^T K_{X|Y}^{-1} (x - m_{X|Y}) \right)
\]
Here
\[
m_{X|Y} = m_X + K_{XY} K_Y^{-1} (Y - m_Y)
\]
(recall the scalar case: E[x | y] = η_x + r σ_x (y − η_y)/σ_y)
and
\[
K_{X|Y} = K_X - K_{XY} K_Y^{-1} K_{YX}
\]
mX|Y and KX|Y denote the conditional mean vector and the conditional covariance matrix of X given Y.
This can be written compactly as:
\[
f_{X|Y}(x \mid y) \sim N(m_{X|Y}, K_{X|Y})
\]

Then
\[
\hat{X}_{MSE}(Y) = m_{X|Y} = m_X + K_{XY} K_Y^{-1} (Y - m_Y)
\]
It can be seen that the estimate depends linearly on the observation vector Y, while the conditional error covariance matrix KX|Y does not depend on the observation vector Y.
It can also be seen that the error covariance matrix KE is the same as KX|Y, i.e.,
\[
K_E = K_X - K_{XY} K_Y^{-1} K_{YX}
\]
Since the median of a Gaussian distribution equals its mean, and since the maximum of a Gaussian density is achieved at its mean, we also have
\[
\hat{X}_{MAE}(Y) = \hat{X}_{MAP}(Y) = m_{X|Y}
\]
Thus, in the Gaussian case the MSE, MAE and MAP estimates coincide.
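A short numerical sketch of the jointly Gaussian case (the partitioned means and covariances below are assumed, illustrative values, not taken from the slides):

```python
import numpy as np

# Assumed partitioned statistics for jointly Gaussian X in R^2, Y in R^2.
m_X  = np.array([1.0, -1.0])
m_Y  = np.array([0.5,  2.0])
K_X  = np.array([[2.0, 0.3], [0.3, 1.0]])
K_Y  = np.array([[1.5, 0.2], [0.2, 1.2]])
K_XY = np.array([[0.8, 0.1], [0.0, 0.4]])

def gaussian_mmse(y):
    """MMSE (= MAE = MAP) estimate: m_X + K_XY K_Y^{-1} (y - m_Y)."""
    return m_X + K_XY @ np.linalg.solve(K_Y, y - m_Y)

# Error covariance K_E = K_{X|Y} = K_X - K_XY K_Y^{-1} K_YX; it does not depend
# on the observed value y, and the achieved MMSE equals tr(K_E).
K_E = K_X - K_XY @ np.linalg.solve(K_Y, K_XY.T)

y_obs = np.array([1.0, 1.5])
print("estimate x_hat =", gaussian_mmse(y_obs))
print("MMSE = tr(K_E) =", np.trace(K_E))
```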
However, this last property does not hold in general as can be
seen from the following example.
Example 4.2: Exponential Observation of an Exponential Parameter
Assume that Y is an exponential random variable with parameter X, so that
\[
f_{Y|X}(y \mid x) = \begin{cases} x \exp(-x y) & \text{for } y \geq 0 \\ 0 & \text{otherwise} \end{cases}
\]
The parameter X is itself exponentially distributed with parameter a > 0, i.e.,
\[
f_X(x) = \begin{cases} a \exp(-a x) & \text{for } x \geq 0 \\ 0 & \text{for } x < 0 \end{cases}
\]
Then the joint density of X and Y can be expressed as
\[
f_{X,Y}(x, y) = f_{Y|X}(y \mid x)\, f_X(x) = a x \exp(-(y + a)x)\, u(x)\, u(y)
\]
where u(·) denotes the unit step function.


We find the marginal density using integration by parts:
\[
f_Y(y) = \int f_{X,Y}(x, y)\,dx = \int_0^{\infty} a x \exp(-(y + a)x)\,dx \; u(y) = \frac{a}{(y + a)^2}\, u(y)
\]
The conditional density of X given Y = y is given by
\[
f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = (y + a)^2\, x \exp(-(y + a)x)\, u(x)
\]

We find the MMSE estimate using integration by parts:
\[
\hat{X}_{MSE}(y) = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\,dx = \frac{2}{y + a}
\]
To obtain the MAE estimate, we need to solve:
\[
\frac{1}{2} = \int_{-\infty}^{\hat{x}} f_{X|Y}(x \mid y)\,dx = (y + a)^2 \int_0^{\hat{x}} x \exp(-(y + a)x)\,dx
= 1 - \big[1 + (y + a)\hat{x}\big] \exp(-(y + a)\hat{x}),
\]
which is equivalent to
\[
\big[1 + (y + a)\hat{x}\big] \exp(-(y + a)\hat{x}) = \frac{1}{2}
\]
Replacing (y + a)x̂ by c in this equation, we obtain
\[
(1 + c)\exp(-c) = \frac{1}{2}
\]
c ≈ 1.68 is the unique positive solution of this equation. The MAE estimate is therefore given by:
\[
\hat{X}_{MAE}(y) = \frac{c}{y + a}
\]
Finally, to find the MAP estimate, i.e., the maximum of the conditional density, we set the derivative equal to zero:
\[
\frac{\partial}{\partial x} f_{X|Y}(x \mid y) = (y + a)^2 \exp(-(y + a)x)\,\big[1 - (y + a)x\big] = 0,
\]
which yields
\[
\hat{X}_{MAP}(y) = \frac{1}{y + a}
\]
So in this example, the MAP, MAE and MSE estimates take different values.
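The sketch below (illustrative, with an assumed prior parameter a and observation y) reproduces the three estimates, solving for c numerically, and checks the conditional mean by Monte Carlo simulation.

```python
import numpy as np
from scipy.optimize import brentq

a, y = 2.0, 1.5          # assumed prior parameter and observed value

# Closed-form estimates derived above.
x_mse = 2.0 / (y + a)
c = brentq(lambda c: (1 + c) * np.exp(-c) - 0.5, 1e-6, 10.0)   # c ~ 1.678
x_mae = c / (y + a)
x_map = 1.0 / (y + a)
print(f"MSE = {x_mse:.4f}, MAE = {x_mae:.4f}, MAP = {x_map:.4f}")

# Monte Carlo sanity check of E[X | Y ~ y]: draw X ~ Exp(a), then Y | X ~ Exp(X),
# and average X over the samples whose Y falls in a narrow window around y.
rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0 / a, size=2_000_000)
Y = rng.exponential(scale=1.0 / X)
near = np.abs(Y - y) < 0.02
print("empirical E[X | Y ~ y] =", X[near].mean())   # close to 2 / (y + a)
```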

