Estimation Theory Overview

The document provides an overview of estimation theory and discusses several classical and Bayesian approaches to parameter estimation from observations. It distinguishes between classical approaches that assume parameters are deterministic and Bayesian approaches that treat parameters as random variables. For the classical linear model, closed-form optimal estimators can be determined, such as the minimum variance unbiased estimator, which is the best linear unbiased estimator and achieves the Cramér-Rao lower bound. Bayesian approaches include the minimum mean square error estimator, maximum a-posteriori estimator, and linear minimum mean square error estimator, which are also optimal under certain conditions such as joint Gaussian distributions.


Overview on Estimation Theory
(Wolfgang Rave, February 4, 2015; see S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Ch. 14)

We distinguish the following modelling assumptions to estimate a p-dimensional parameter vector x from N observations y:
Classical Estimation Approaches (Parameters x assumed to be deterministic)
1. Cramér-Rao Lower Bound (CRLB) on mean squared estimation error
Data Model / Assumptions
The PDF f(y; x) of the observations y for fixed parameter(s) x is known

Estimator

If the equality condition for the CRLB,

$$\frac{\partial \ln f_Y(y; x)}{\partial x} = I(x)\,\big(g(y) - x\big),$$

is satisfied, i.e. the derivative of the log-likelihood can be factored into a p × p matrix I(x) dependent only on the parameters and a function g(y) of the data/observations, then the optimal estimator is given as $\hat{x} = g(y)$.

Optimality / Error Criterion

$\hat{x}$ achieves the CRLB, the lower bound on the variance of any unbiased estimator (hence it is said to be efficient), and is therefore the minimum variance unbiased (MVU) estimator. The MVU estimator is the one whose variance for each component is minimum among all unbiased estimators.

Overview on Estimation Theory


Performance

The estimator is unbiased, i.e. $E[\hat{x}_i] = x_i$ for $i = 1, 2, \dots, p$, and has minimum (error) variance (mean squared estimation error)

$$\operatorname{var}(\hat{x}_i) = \big[I^{-1}(x)\big]_{i,i}, \qquad i = 1, 2, \dots, p,$$

where

$$\big[I(x)\big]_{i,j} = E\!\left[\frac{\partial \ln f_Y(y; x)}{\partial x_i}\,\frac{\partial \ln f_Y(y; x)}{\partial x_j}\right]$$

is the Fisher information matrix.

Comment
An efficient estimator may not exist and hence the approach may fail
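
As a simple illustration, consider the textbook example of estimating a DC level A from y[n] = A + w[n], n = 0, ..., N−1, with white Gaussian noise w[n] ~ $\mathcal{N}(0, \sigma^2)$. Then

$$\frac{\partial \ln f_Y(y; A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\big(y[n] - A\big) = \frac{N}{\sigma^2}\,(\bar{y} - A),$$

which has exactly the form $I(A)\,(g(y) - A)$ with $I(A) = N/\sigma^2$ and $g(y) = \bar{y}$. Hence $\hat{A} = \bar{y}$ is efficient, with variance $\sigma^2/N$.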


Overview on Estimation Theory


2. Rao-Blackwell-Lehmann-Scheffé Theorem (how to find a sufficient statistic)
Data Model / Assumptions
PDF f(y; x) is known

Estimator
Step 1: Find a sufficient statistic T(y) by factoring the PDF as f(y; x) = g(T(y), x) h(y), where T(y) is a p-dimensional function of y, g is a function depending only on T and x, and h depends only on y.
Step 2: If E[T(y)] = x, then $\hat{x} = T(y)$. If not, we must find a p-dimensional function g such that E[g(T)] = x; then the estimator reads $\hat{x} = g(T)$.

Optimality/Error Criterion
$\hat{x}$ is the MVU estimator.
Performance
$\hat{x}_i$ for i = 1, 2, ..., p is unbiased. The variance depends on the PDF; no general formula is available.

Comments: The "completeness" of the sufficient statistic must be checked. A p-dimensional sufficient statistic may not exist, so this method may fail.
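
For illustration, the same DC-level example y[n] = A + w[n] with w[n] ~ $\mathcal{N}(0, \sigma^2)$ factors as

$$f(y; A) = \underbrace{\exp\!\left(-\frac{1}{2\sigma^2}\big(N A^2 - 2A\,T(y)\big)\right)}_{g(T(y),\,A)} \cdot \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} y[n]^2\right)}_{h(y)}$$

with sufficient statistic $T(y) = \sum_{n=0}^{N-1} y[n]$. Since $E[T(y)] = N A$, step 2 gives $\hat{A} = T(y)/N = \bar{y}$.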

Overview on Estimation Theory


3. Best Linear Unbiased Estimator (BLUE)
Data Model / Assumptions
E[y] = Hx (linear data model),
where H is a known N × p matrix and R_y, the covariance matrix of the observations y, is known. Equivalently we have y = Hx + v, where E[v] = 0 and R_v = R_y.

Estimator

$$\hat{x} = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y \qquad \text{(a linear estimator)}$$

Optimality/Error Criterion
$\hat{x}_i$ for i = 1, 2, ..., p has the minimum variance of all unbiased estimators that are linear in the observations y.

Performance

$\hat{x}_i$ for i = 1, 2, ..., p is unbiased. The variance is

$$\operatorname{var}(\hat{x}_i) = \Big[\big(H^T R_v^{-1} H\big)^{-1}\Big]_{i,i}, \qquad i = 1, 2, \dots, p.$$
Comments: If v is a Gaussian random vector, i.e. $v \sim \mathcal{N}(0, R_v)$, then $\hat{x}$ is also the MVU estimator (among all estimators, linear or nonlinear in y).
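
A minimal numerical sketch of the BLUE (assuming NumPy; H, Rv and the true parameters below are made-up illustration values):

import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 2
H = np.column_stack([np.ones(N), np.arange(N)])    # known N x p observation matrix
x_true = np.array([1.0, 0.05])                     # parameters to be estimated

# correlated noise with known covariance Rv
Rv = 0.5 * np.exp(-0.3 * np.abs(np.subtract.outer(np.arange(N), np.arange(N))))
y = H @ x_true + rng.multivariate_normal(np.zeros(N), Rv)

# BLUE: x_hat = (H^T Rv^-1 H)^-1 H^T Rv^-1 y  (use solves instead of explicit inverses)
Rv_inv_H = np.linalg.solve(Rv, H)
x_hat = np.linalg.solve(H.T @ Rv_inv_H, Rv_inv_H.T @ y)

# error covariance (H^T Rv^-1 H)^-1; its diagonal gives var(x_hat_i)
cov_x_hat = np.linalg.inv(H.T @ Rv_inv_H)
print(x_hat, np.diag(cov_x_hat))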

Overview on Estimation Theory


4. Maximum Likelihood Estimator (MLE)
Data Model / Assumptions
PDF f(y; x) is known
Estimator
$\hat{x}$ is the value of x that maximizes the likelihood f(y; x), where y is replaced by the observed data samples.

Optimality/Error Criterion
Not optimal in general. Under certain conditions on the PDF, however, the MLE is efficient for large data records, i.e. as N → ∞ (asymptotically). Hence, asymptotically it is the MVU estimator.

Performance
For finite N the performance depends on the PDF; no general formula is available. Asymptotically, under certain conditions, it holds that $\hat{x} \sim \mathcal{N}\big(x, I^{-1}(x)\big)$.

Comments: If an efficient estimator exists, the maximum likelihood procedure will produce it.
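
A minimal numerical sketch of the ML principle (assuming NumPy and SciPy; the exponential data model and its parameter are an illustrative choice), minimizing the negative log-likelihood and comparing with the closed-form MLE:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
lam_true = 2.0
y = rng.exponential(scale=1.0 / lam_true, size=1000)   # f(y[n]; lam) = lam * exp(-lam * y[n])

def nll(lam):
    # negative log-likelihood of the observed samples
    return -(y.size * np.log(lam) - lam * np.sum(y))

res = minimize_scalar(nll, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1.0 / np.mean(y))   # numerical MLE vs. closed-form MLE 1/ybar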

Overview on Estimation Theory


5. Least Squares Estimator (LSE)
Data Model / Assumptions
y[n] = s[n; x] + v[n],   n = 0, 1, ..., N−1,
where the signal s[n; x] depends explicitly on the unknown parameters (typically a functional model or ansatz for the signal has to be chosen). Equivalently the model is y = s(x) + v, where s is a known N-dimensional function of x. The noise (perturbation) v has zero mean.

Estimator

$\hat{x}$ is the value of x that minimizes

$$J(x) = \big(y - s(x)\big)^T \big(y - s(x)\big) = \sum_{n=0}^{N-1} \big(y[n] - s[n; x]\big)^2.$$

Optimality/Error Criterion
None in general, because noise statistics are ignored and only the squared distance between observations and the assumed data model is minimized

Performance: Depends on the PDF of v; no general formula is available.


Comments: The fact that an LS error criterion is minimized does not in general translate into minimizing the estimation error. If v is a Gaussian random vector with $v \sim \mathcal{N}(0, \sigma^2 I)$, then the LSE is equivalent to the MLE.
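
A minimal sketch of a nonlinear LS fit (assuming NumPy and SciPy; the sinusoidal ansatz s[n; x] = a cos(ω n + φ) and its values are illustrative):

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
n = np.arange(200)
x_true = np.array([1.0, 0.2, 0.5])                  # amplitude a, frequency w, phase phi

def s(x):
    return x[0] * np.cos(x[1] * n + x[2])           # signal model s[n; x]

y = s(x_true) + 0.3 * rng.standard_normal(n.size)   # zero-mean noise, statistics ignored

# minimize J(x) = sum_n (y[n] - s[n; x])^2 via the residual vector y - s(x)
res = least_squares(lambda x: y - s(x), x0=np.array([0.8, 0.25, 0.0]))
print(res.x)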


Overview on Estimation Theory

Bayesian Estimation Approaches (Parameters x assumed to be random!)


6. Minimum Mean Square Error (MMSE) Estimator
Data Model / Assumptions
The joint PDF of y, x or f(y, x) is known, where x is now considered to be a random
vector. Usually f(y|x) is specified as the data model and f(x) as the prior PDF for x, so that
f(y, x) = f(y|x) f(x).

Estimator

$$\hat{x} = E_x[x \mid y],$$

where the expectation is with respect to the posterior PDF

$$f(x \mid y) = \frac{f(y \mid x)\, f(x)}{\int f(y \mid x)\, f(x)\, dx}$$

(remember the meaning of the posterior: after observing y, we determine the most probable value of x). If y, x are jointly Gaussian, the estimator becomes linear:

$$\hat{x} = E[x] + R_{xy} R_y^{-1}\big(y - E[y]\big).$$

The rule of Bayes, a conservation law for probabilities:

$$f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\, f_Y(y) = f_{Y|X}(y \mid x)\, f_X(x).$$

Optimality / Error Criterion

$\hat{x}$ minimizes the Bayesian MSE

$$\operatorname{Bmse}(\hat{x}_i) = E\big[(x_i - \hat{x}_i)^2\big], \qquad i = 1, 2, \dots, p,$$

where the expectation is with respect to $f(y, x_i)$.



Overview on Estimation Theory


Performance

The error $\epsilon_i = x_i - \hat{x}_i$ has zero mean (the estimator is unbiased) and variance

$$\operatorname{var}(\epsilon_i) = \operatorname{Bmse}(\hat{x}_i) = \int \big[R_{x|y}\big]_{i,i}\, f(y)\, dy,$$

where $R_{x|y}$ is the covariance matrix of x conditioned on y, i.e. of the posterior PDF f(x | y). If y, x are jointly Gaussian, then the error is Gaussian with zero mean and variance

$$\operatorname{var}(\epsilon_i) = \operatorname{Bmse}(\hat{x}_i) = \big[R_x - R_{xy} R_y^{-1} R_{yx}\big]_{i,i}.$$

Comment
In the non-Gaussian case, this will be difficult to implement
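
In the jointly Gaussian case a standard scalar example is the DC level A ~ $\mathcal{N}(\mu_A, \sigma_A^2)$ observed in white Gaussian noise, y[n] = A + w[n] with w[n] ~ $\mathcal{N}(0, \sigma^2)$. The MMSE estimator and its Bayesian MSE are

$$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}\,\bar{y} + \frac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}\,\mu_A, \qquad \operatorname{Bmse}(\hat{A}) = \frac{\sigma_A^2\,\sigma^2/N}{\sigma_A^2 + \sigma^2/N},$$

a weighting between the data mean and the prior mean, with a Bayesian MSE smaller than both $\sigma^2/N$ and $\sigma_A^2$.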


Overview on Estimation Theory


7. Maximum A-Posteriori (MAP) Estimator
Data Model / Assumptions
Joint PDF of y, x or f(y, x) is known; same as for the MMSE estimator

Estimator
$\hat{x}$ is the value of x that maximizes f(x | y) or, equivalently, the value that maximizes f(y | x) f(x). If y, x are jointly Gaussian, then $\hat{x}$ is given by

$$\hat{x} = E[x] + R_{xy} R_y^{-1}\big(y - E[y]\big)$$

(but almost always the solution for the estimator cannot be given in closed form).

Optimality / Error Criterion


Minimizes the "hit-or-miss" cost function (in contrast to the MMSE, not a distance but a probability is optimized; interesting for non-unimodal densities).

Performance
Depends on the PDF; no general formula is available.

Comment
For PDFs whose mean and mode (the location of the maximum) coincide, e.g. the Gaussian PDF, the MMSE and MAP estimators are identical.

Overview on Estimation Theory


8. Linear Minimum Mean Square Error (LMMSE) Estimator
Data Model / Assumptions
The first two moments of the joint PDF f(y, x) are known, i.e. the mean vector and the covariance matrix

$$\begin{bmatrix} E[x] \\ E[y] \end{bmatrix}, \qquad \begin{bmatrix} R_x & R_{xy} \\ R_{yx} & R_y \end{bmatrix}.$$

Estimator

$$\hat{x} = E[x] + R_{xy} R_y^{-1}\big(y - E[y]\big)$$

Optimality / Error Criterion

$\hat{x}$ has the minimum Bayesian MSE $E\big[(x_i - \hat{x}_i)^2\big]$ of all estimators that are linear functions of y.

Performance: The error $\epsilon_i = x_i - \hat{x}_i$ has zero mean and variance

$$\operatorname{var}(\epsilon_i) = \operatorname{Bmse}(\hat{x}_i) = \big[R_x - R_{xy} R_y^{-1} R_{yx}\big]_{i,i}.$$
Comment: If y, x are jointly Gaussian, this is identical to the MMSE and MAP estimators. (The Kalman filter can be considered as a particular implementation of the MMSE
estimator within this framework.)
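
A minimal numerical sketch of the LMMSE estimator (assuming NumPy; only second-order moments are used, and the moments below come from a made-up linear model chosen for illustration):

import numpy as np

rng = np.random.default_rng(3)
p, N, trials = 2, 4, 20000
H = rng.standard_normal((N, p))
Rx = np.array([[1.0, 0.3], [0.3, 0.5]])
Rv = 0.2 * np.eye(N)

# draw (x, y) pairs from the model y = Hx + v only to check the estimator empirically
x = rng.multivariate_normal(np.zeros(p), Rx, size=trials)
y = x @ H.T + rng.multivariate_normal(np.zeros(N), Rv, size=trials)

Rxy = Rx @ H.T                # cov(x, y) for this model
Ry = H @ Rx @ H.T + Rv        # cov(y)

# x_hat = E[x] + Rxy Ry^-1 (y - E[y]); here E[x] = 0 and E[y] = 0
x_hat = np.linalg.solve(Ry, y.T).T @ Rxy.T
print(np.mean((x - x_hat) ** 2, axis=0))                 # empirical Bayesian MSE
print(np.diag(Rx - Rxy @ np.linalg.solve(Ry, Rxy.T)))    # theoretical Bmse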

Linear Model
When the linear model can be used to describe the data, the various estimation approaches yield closed form estimators.
In fact, by assuming the linear model we are able to determine the optimal estimator as
well as its performance for both the classical and Bayesian approaches. We first consider the classical approach. The classical general linear model assumes the data to be
described by
y = Hx + v
where y is an N × 1 vector of observations, H is a known N × p observation matrix (N > p) of rank p, x is a p × 1 vector of parameters to be estimated, and v is an N × 1 noise vector with PDF $\mathcal{N}(0, R_v)$.
The PDF of the real-valued Gaussian random vector y is

$$f_y(y; x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det R_v}}\, \exp\!\left(-\frac{1}{2}\,(y - Hx)^T R_v^{-1} (y - Hx)\right)$$

Closed-Form Estimators for linear Model


1. Cramér-Rao Lower Bound (CRLB)

$$\frac{\partial \ln f_y(y; x)}{\partial x} = H^T R_v^{-1}\,(y - Hx) = H^T R_v^{-1} H \left[\big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y - x\right] = I(x)\,\big(g(y) - x\big),$$

where $\hat{x} = g(y) = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y$, so that $\hat{x}$ is the MVU estimator (and also efficient) and has minimum variance given by the diagonal elements of the error covariance matrix

$$R_{\hat{x}} = I^{-1}(x) = \big(H^T R_v^{-1} H\big)^{-1}.$$

2. Rao-Blackwell-Lehmann-Scheffé

A factorization of the PDF yields

$$f_y(y; x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det R_v}}\, \exp\!\left(-\frac{1}{2}\,(\hat{x} - x)^T H^T R_v^{-1} H\, (\hat{x} - x)\right) \exp\!\left(-\frac{1}{2}\,(y - H\hat{x})^T R_v^{-1} (y - H\hat{x})\right),$$

where $\hat{x} = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y$; the first exponential, which depends on the data only through $\hat{x}$, plays the role of $g(T(y), x)$ (up to the constant factor), while the remaining factor is $h(y)$.

The sufficient statistic is $T(y) = \hat{x}$, which can be shown to be unbiased as well as complete. Thus, it is the MVU estimator.


Closed-Form Estimators for linear Model


3. Best Linear Unbiased Estimator (BLUE)
Now we obtain the identical estimator as in the previous two cases, since $\hat{x}$ is already linear in the observations y. However, if v were not Gaussian, $\hat{x}$ would still be the BLUE but not the MVU estimator. Note that the data modeling assumption for the BLUE is satisfied by the general linear model.

4. Maximum Likelihood Estimator (MLE)

To find the MLE we maximize $f_y(y; x)$ as given above or, equivalently, we minimize the quadratic form

$$(y - Hx)^T R_v^{-1}\,(y - Hx).$$

This leads to

$$\hat{x} = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y,$$

which we know to be the MVU estimator. Thus, as expected, since an efficient estimator exists (satisfies the CRLB), the maximum likelihood procedure produces it.

Closed-Form Estimators for linear Model


5. Least Squares Estimator (LSE)

Viewing Hx as the signal vector s(x), we must minimize

$$J(x) = \big(y - s(x)\big)^T \big(y - s(x)\big) = (y - Hx)^T (y - Hx).$$

The LSE is $\hat{x} = \big(H^T H\big)^{-1} H^T y$, which is identical to the maximum likelihood estimator $\hat{x} = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y$ (and hence also the MVU estimator) if $R_v = \sigma^2 I$. If $R_v \neq \sigma^2 I$, then $\hat{x}$ will not be the MVU estimator. However, if we minimize the weighted LS criterion

$$J(x) = (y - Hx)^T\, W\, (y - Hx)$$

with the weighting matrix $W = R_v^{-1}$, the resulting estimator is $\hat{x} = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y$, which is the MVU estimator. If v is not Gaussian, the weighted LSE is still $\hat{x} = \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y$, but it would only be the BLUE (and would not attain the CRLB).
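
A minimal numerical comparison of the unweighted and the weighted LSE when $R_v \neq \sigma^2 I$ (assuming NumPy; H, Rv and the true parameters are made-up illustration values):

import numpy as np

rng = np.random.default_rng(4)
N, p, trials = 50, 2, 5000
H = np.column_stack([np.ones(N), np.arange(N) / N])
x_true = np.array([1.0, -2.0])
Rv = np.diag(0.1 + 2.0 * rng.random(N))      # non-white noise covariance

err_ls, err_wls = [], []
for _ in range(trials):
    y = H @ x_true + rng.multivariate_normal(np.zeros(N), Rv)
    x_ls = np.linalg.solve(H.T @ H, H.T @ y)                  # unweighted LSE
    Rv_inv_H = np.linalg.solve(Rv, H)
    x_wls = np.linalg.solve(H.T @ Rv_inv_H, Rv_inv_H.T @ y)   # weighted LSE, W = Rv^-1
    err_ls.append(x_ls - x_true)
    err_wls.append(x_wls - x_true)

# both are unbiased; the weighted LSE has the smaller variance per component
print(np.var(err_ls, axis=0), np.var(err_wls, axis=0))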


Closed-Form Estimators for linear Model


Proceeding to the Bayesian linear model, we assume that y = Hx + v, where y is an N × 1 vector of observations, H is a known N × p observation matrix (N > p) of rank p, x is a p × 1 random vector to be estimated with prior mean $\mu_x = E[x]$ and prior covariance $R_x$, and v is an N × 1 noise vector with PDF $\mathcal{N}(0, R_v)$.
The conditional PDF of y is

$$f_{y|x}(y \mid x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det R_v}}\, \exp\!\left(-\frac{1}{2}\,(y - Hx)^T R_v^{-1} (y - Hx)\right)$$

and the prior PDF of x is

$$f_x(x) = \frac{1}{(2\pi)^{p/2} \sqrt{\det R_x}}\, \exp\!\left(-\frac{1}{2}\,(x - \mu_x)^T R_x^{-1} (x - \mu_x)\right).$$

The posterior PDF $f_{x|y}(x \mid y)$ is also Gaussian, with mean and covariance given by

$$E_x[x \mid y] = \mu_x + R_x H^T \big(H R_x H^T + R_v\big)^{-1} (y - H\mu_x) = \mu_x + \big(R_x^{-1} + H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, (y - H\mu_x)$$

(two formulas for the optimal estimator) and

$$R_{x|y} = R_x - R_x H^T \big(H R_x H^T + R_v\big)^{-1} H R_x = \big(R_x^{-1} + H^T R_v^{-1} H\big)^{-1}$$

(two formulas for the associated error covariance matrix; only second-order moments are needed). The equivalent relations are obtained by applying the matrix inversion lemma.
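
A minimal numerical check of the two equivalent pairs of formulas (assuming NumPy; μx, Rx, Rv, H and y below are made-up illustration values):

import numpy as np

rng = np.random.default_rng(5)
N, p = 6, 2
H = rng.standard_normal((N, p))
Rx = np.array([[1.0, 0.2], [0.2, 0.5]])      # prior covariance of x
Rv = 0.3 * np.eye(N)                         # noise covariance
mu_x = np.array([0.5, -1.0])                 # prior mean of x
y = rng.standard_normal(N)                   # some observation vector

# form 1: gain built from (H Rx H^T + Rv)^-1
K = Rx @ H.T @ np.linalg.inv(H @ Rx @ H.T + Rv)
mean1 = mu_x + K @ (y - H @ mu_x)
cov1 = Rx - K @ H @ Rx

# form 2: via (Rx^-1 + H^T Rv^-1 H)^-1 (matrix inversion lemma)
cov2 = np.linalg.inv(np.linalg.inv(Rx) + H.T @ np.linalg.inv(Rv) @ H)
mean2 = mu_x + cov2 @ H.T @ np.linalg.inv(Rv) @ (y - H @ mu_x)

print(np.allclose(mean1, mean2), np.allclose(cov1, cov2))   # both True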

Closed-Form Estimators for linear Model


6. Minimum Mean Square Error Estimator (MMSE)

The MMSE estimator is just the mean $E_x[x \mid y]$ of the posterior PDF $f_{x|y}(x \mid y)$ stated before. The minimum Bayesian MSE, i.e. $E\big[(x_i - \hat{x}_i)^2\big]$, is

$$\operatorname{Bmse}(\hat{x}_i) = \big[R_{x|y}\big]_{i,i},$$

since $R_{x|y}$ does not depend on y.


7. Maximum A-Posteriori Estimator (MAP)
Because the location of the peak (mode) of the Gaussian PDF is equal to the
mean, the MAP estimator is identical to the MMSE estimator.
8. Linear Minimum Mean Square Error Estimator (LMMSE)
Since the MMSE estimator is linear in y, the LMMSE estimator is given by the
same expression in this important special case. Hence for the Bayesian linear
model, the MMSE, MAP and LMMSE estimators are identical!

Closed-Form Estimators for linear Model


A last comment concerns the form of the estimator when there is no prior information. This may be modeled by letting $R_x^{-1} \to 0$. Then, in the zero-mean case ($\mu_x = 0$), we have

$$E_x[x \mid y] = \mu_x + \big(R_x^{-1} + H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, (y - H\mu_x) = \big(R_x^{-1} + H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y \;\to\; \big(H^T R_v^{-1} H\big)^{-1} H^T R_v^{-1}\, y,$$

which is recognized as having the identical form as the MVU estimator for the classical general linear model.
Of course, the estimators cannot really be compared since they have been derived under
different data modeling assumptions. However, this apparent equivalence has often
been identified by asserting that the Bayesian approach with no prior information is
equivalent to the classical approach.
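
A short numerical sketch of this limit (assuming NumPy; the values are made up): as the prior becomes uninformative (Rx scaled up, zero prior mean), the Bayesian posterior mean approaches the classical MVU estimator.

import numpy as np

rng = np.random.default_rng(6)
N, p = 8, 2
H = rng.standard_normal((N, p))
Rv = 0.4 * np.eye(N)
y = rng.standard_normal(N)

# classical MVU estimator for the general linear model
x_mvu = np.linalg.solve(H.T @ np.linalg.solve(Rv, H), H.T @ np.linalg.solve(Rv, y))

for scale in [1.0, 100.0, 1e6]:              # Rx^-1 -> 0 as the prior variance grows
    Rx = scale * np.eye(p)
    x_bayes = np.linalg.solve(np.linalg.inv(Rx) + H.T @ np.linalg.solve(Rv, H),
                              H.T @ np.linalg.solve(Rv, y))
    print(scale, np.linalg.norm(x_bayes - x_mvu))   # difference shrinks toward 0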
