Estimation Theory Overview
(following Kay, Fundamentals of Statistical Signal Processing: Estimation Theory)
We distinguish the following modelling assumptions to estimate a p-dimensional parameter vector x from N observations y:
Classical Estimation Approaches (Parameters x assumed to be deterministic)
1. Cramér-Rao Lower Bound (CRLB) on the mean squared estimation error
Data Model / Assumptions
The PDF f(y; x) of the observations y for fixed parameter(s) x is known.
Estimator
If the equality condition for the CRLB

\[ \frac{\partial \ln f_Y(y; x)}{\partial x} = I(x)\,\big(g(y) - x\big) \]

is satisfied, i.e. the derivative of the log PDF can be factored into a \( p \times p \) matrix \( I(x) \) dependent only on the parameters and a function \( g(y) \) of the data/observations, the optimal (efficient) estimator is given as \( \hat{x} = g(y) \).
Wolfgang Rave
Slide 1
Performance
The estimator attains the CRLB:

\[ \operatorname{var}(\hat{x}_i) = \big[ I^{-1}(x) \big]_{i,i}, \qquad i = 1, 2, \ldots, p \]

where the Fisher information matrix is

\[ I(x)_{i,j} = \operatorname{E}\!\left[ \frac{\partial \ln f_Y(y; x)}{\partial x_i} \, \frac{\partial \ln f_Y(y; x)}{\partial x_j} \right] \]
Comment
An efficient estimator may not exist and hence the approach may fail
February 4, 2015
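A minimal numerical sketch (my example, not from the slides): for a DC level A in white Gaussian noise, the score factors exactly into the CRLB equality form, so the sample mean is efficient and its variance attains the bound.

```python
import numpy as np

# Hypothetical example (not from the slides): y[n] = A + v[n], v[n] ~ N(0, sigma2).
# The score factors as d/dA ln f(y; A) = (N / sigma2) * (mean(y) - A)
#                                      = I(A) * (g(y) - A),
# so g(y) = mean(y) is efficient and attains the CRLB sigma2 / N.
rng = np.random.default_rng(0)
A, sigma2, N, trials = 2.0, 0.5, 100, 20000

y = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = y.mean(axis=1)          # the efficient estimator g(y)

crlb = sigma2 / N               # CRLB = 1 / I(A)
print(np.var(A_hat), crlb)      # empirical variance should be close to the CRLB
```

Running this shows the Monte Carlo variance of the sample mean sitting at the CRLB, as the equality condition predicts.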
2. Rao-Blackwell-Lehmann-Scheffé (RBLS)

Estimator
Step 1: Find a sufficient statistic T(y) by factoring the PDF as \( f(y; x) = g(T(y), x)\, h(y) \), where T(y) is a p-dimensional function of y, g is a function depending only on T and x, and h depends only on y.
Step 2: If \( \operatorname{E}[T(y)] = x \), then \( \hat{x} = T(y) \). If not, we must find a p-dimensional function g such that \( \operatorname{E}[g(T)] = x \); the estimator then reads \( \hat{x} = g(T) \).

Optimality/Error Criterion
\( \hat{x} \) is the MVU estimator.

Performance
\( \hat{x}_i \) for i = 1, 2, ..., p is unbiased. The variance depends on the PDF; no general formula is available.
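As a worked sketch (my example, not from the slides): for N i.i.d. samples \( y[n] \sim \mathcal{N}(A, \sigma^2) \) with unknown mean A and known variance, the factorization of Step 1 reads

```latex
f(y; A)
  = \underbrace{\exp\!\Big(\frac{A}{\sigma^2}\,T(y) - \frac{N A^2}{2\sigma^2}\Big)}_{g(T(y),\,A)}
    \cdot
    \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}}
      \exp\!\Big(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} y[n]^2\Big)}_{h(y)},
\qquad T(y) = \sum_{n=0}^{N-1} y[n].
```

Since \( \operatorname{E}[T(y)] = N A \), Step 2 gives the MVU estimator \( \hat{A} = T(y)/N \), the sample mean.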
3. Best Linear Unbiased Estimator (BLUE)

Data Model / Assumptions
y = Hx + v, where H is a known observation matrix and the noise v has zero mean and known covariance \( R_v \); the full PDF need not be known.

Estimator
\[ \hat{x} = \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \]
(a linear estimator)

Optimality/Error Criterion
\( \hat{x}_i \) for i = 1, 2, ..., p has the minimum variance of all unbiased estimators that are linear in the observations y.

Performance
\( \hat{x}_i \) for i = 1, 2, ..., p is unbiased. The variance is
\[ \operatorname{var}(\hat{x}_i) = \left[ \left( H^T R_v^{-1} H \right)^{-1} \right]_{i,i}, \qquad i = 1, 2, \ldots, p \]

Comments: If v is a Gaussian random vector, so that \( v \sim \mathcal{N}(0, R_v) \), then \( \hat{x} \) is also the MVU estimator (among all linear or nonlinear functions of y).
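A minimal numerical sketch (my example, not from the slides) of the BLUE formula above, checking that it is unbiased and recovers x exactly from noise-free data:

```python
import numpy as np

# Hypothetical check (not from the slides): BLUE for y = H x + v with
# zero-mean noise of known covariance Rv.
rng = np.random.default_rng(1)
N, p = 6, 2
H = rng.normal(size=(N, p))
Rv = np.diag(rng.uniform(0.5, 2.0, size=N))           # known noise covariance
x_true = np.array([1.0, -2.0])

Rv_inv = np.linalg.inv(Rv)
P = np.linalg.solve(H.T @ Rv_inv @ H, H.T @ Rv_inv)   # x_hat = P @ y

trials = 20000
v = rng.multivariate_normal(np.zeros(N), Rv, size=trials)
y = x_true @ H.T + v
x_hat = y @ P.T

print(x_hat.mean(axis=0))   # close to x_true: the BLUE is unbiased
```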
4. Maximum Likelihood Estimator (MLE)

Estimator
\( \hat{x} \) is the value of x that maximizes the likelihood \( f_Y(y; x) \).

Optimality/Error Criterion
Not optimal in general. Under certain conditions on the PDF, however, the MLE is efficient for large data records, i.e. as \( N \to \infty \) (asymptotically). Hence, asymptotically it is the MVU estimator.

Performance
For finite N the performance depends on the PDF; no general formula is available. Asymptotically, under certain conditions, \( \hat{x} \sim \mathcal{N}\big(x,\, I^{-1}(x)\big) \).

Comments: If an efficient estimator exists, the maximum likelihood procedure will produce it.
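A small illustration of the asymptotic behaviour (my example, not from the slides): the MLE of the rate of i.i.d. exponential data, whose variance approaches \( I^{-1}(\lambda) = \lambda^2 / N \) for large N.

```python
import numpy as np

# Hypothetical example (not from the slides): for i.i.d. exponential data,
# maximizing ln f(y; lam) = N ln(lam) - lam * sum(y) gives lam_hat = 1 / mean(y).
# The Fisher information is I(lam) = N / lam^2, so asymptotically
# lam_hat ~ N(lam, lam^2 / N).
rng = np.random.default_rng(2)
lam, N, trials = 3.0, 500, 5000

y = rng.exponential(scale=1.0 / lam, size=(trials, N))
lam_hat = 1.0 / y.mean(axis=1)

print(lam_hat.mean())                  # close to lam (asymptotically unbiased)
print(np.var(lam_hat), lam**2 / N)     # empirical variance vs. asymptotic bound
```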
5. Least Squares Estimator (LSE)

Estimator
\( \hat{x} \) is the value of x that minimizes

\[ J(x) = \big\| y - s(x) \big\|^2 = \sum_{n=0}^{N-1} \big( y[n] - s[n; x] \big)^2 \]

Optimality/Error Criterion
None in general, because the noise statistics are ignored and only the squared distance between the observations and the assumed data model is minimized.
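A minimal sketch (my example, not from the slides): least squares fit of a hypothetical line \( s[n; x] = x_0 + x_1 n \), which minimizes exactly the criterion J(x) above; at the minimum the residual is orthogonal to the columns of the model matrix.

```python
import numpy as np

# Hypothetical example (not from the slides): LS fit of s[n; x] = x0 + x1 * n
# to noisy observations, minimizing J(x) = ||y - s(x)||^2.
rng = np.random.default_rng(3)
N = 50
n = np.arange(N)
x_true = np.array([1.0, 0.2])

H = np.column_stack([np.ones(N), n])           # s(x) = H x for this model
y = H @ x_true + rng.normal(0.0, 0.3, size=N)

x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)  # minimizes ||y - H x||^2
print(x_hat)                                   # close to x_true
```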
Bayesian Estimation Approaches (Parameters x assumed to be random with a known prior PDF)

6. Minimum Mean Square Error (MMSE) Estimator

Estimator
\[ \hat{x} = \operatorname{E}[x \mid y] \]
i.e. the mean of the posterior PDF \( f_{X|Y}(x \mid y) \), which follows from Bayes' rule:
\[ f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\, f_Y(y) = f_{Y|X}(y \mid x)\, f_X(x) \]
If y and x are jointly Gaussian, the estimator takes the closed form
\[ \hat{x} = \operatorname{E}[x] + R_{xy} R_y^{-1} \big( y - \operatorname{E}[y] \big) \]
Performance
\[ \operatorname{Bmse}(\hat{x}_i) = \big[ R_{x|y} \big]_{i,i} \]
the diagonal elements of the posterior covariance matrix (averaged over the data when \( R_{x|y} \) depends on y).

Comment
In the non-Gaussian case, this will be difficult to implement.
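A minimal Gaussian sketch (my example, not from the slides) of the closed form above for a scalar parameter observed in noise, comparing the empirical Bayesian MSE with its theoretical minimum:

```python
import numpy as np

# Hypothetical example (not from the slides): MMSE estimate for jointly
# Gaussian x and y with y = x + v, x ~ N(mx, Rx), v ~ N(0, Rv), using
# x_hat = E[x] + Rxy Ry^-1 (y - E[y]).
rng = np.random.default_rng(4)
mx, Rx, Rv = 1.0, 2.0, 0.5
Rxy = Rx                # cov(x, y) = cov(x, x + v) = Rx
Ry = Rx + Rv            # var(y)
my = mx                 # E[y] = E[x]

x = rng.normal(mx, np.sqrt(Rx), size=100000)
y = x + rng.normal(0.0, np.sqrt(Rv), size=x.size)

x_hat = mx + Rxy / Ry * (y - my)
print(np.mean((x - x_hat)**2))      # empirical Bayesian MSE
print(Rx - Rxy**2 / Ry)             # theoretical minimum Bmse
```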
7. Maximum A Posteriori (MAP) Estimator

Estimator
\( \hat{x} \) is the value of x that maximizes the posterior PDF f(x | y) or, equivalently, the value that maximizes f(y | x) f(x). If y and x are jointly Gaussian, then \( \hat{x} \) is given by
\[ \hat{x} = \operatorname{E}[x] + R_{xy} R_y^{-1} \big( y - \operatorname{E}[y] \big) \]
(but in general the solution for the estimator can almost never be given in closed form).

Performance
Depends on the PDF; no general formula is available.

Comment
For posterior PDFs whose mean and mode (the location of the maximum) coincide, such as the Gaussian PDF, the MMSE and MAP estimators are identical.
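A minimal sketch (my example, not from the slides) of the mean-equals-mode remark: maximizing \( \ln f(y \mid x) + \ln f(x) \) on a grid for a scalar Gaussian prior and likelihood reproduces the closed-form posterior mean.

```python
import numpy as np

# Hypothetical example (not from the slides): MAP estimate for a scalar
# Gaussian prior x ~ N(mx, Rx) and likelihood y | x ~ N(x, Rv).  The posterior
# is Gaussian, so its mode (MAP) equals its mean (MMSE).
mx, Rx, Rv, y = 1.0, 2.0, 0.5, 2.3

grid = np.linspace(-5.0, 5.0, 200001)
log_post = -(y - grid)**2 / (2 * Rv) - (grid - mx)**2 / (2 * Rx)  # ln f(y|x) + ln f(x)
x_map = grid[np.argmax(log_post)]

x_mmse = mx + Rx / (Rx + Rv) * (y - mx)   # closed-form posterior mean
print(x_map, x_mmse)                      # agree up to the grid spacing
```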
8. Linear Minimum Mean Square Error (LMMSE) Estimator

Data Model / Assumptions
Only the means \( \operatorname{E}[x], \operatorname{E}[y] \) and the joint covariance matrix
\[ \begin{pmatrix} R_x & R_{xy} \\ R_{yx} & R_y \end{pmatrix} \]
are required.

Estimator
\[ \hat{x} = \operatorname{E}[x] + R_{xy} R_y^{-1} \big( y - \operatorname{E}[y] \big) \]

Optimality/Error Criterion
\( \hat{x} \) minimizes the Bayesian MSE among all estimators that are linear functions of y.
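A minimal sketch (my example, not from the slides) emphasizing that the LMMSE estimator needs only first and second moments: here x and the noise are uniform rather than Gaussian, yet the same formula gives the best linear estimator.

```python
import numpy as np

# Hypothetical example (not from the slides): LMMSE with non-Gaussian data.
# Only the moments enter x_hat = E[x] + Rxy Ry^-1 (y - E[y]).
rng = np.random.default_rng(5)
n = 200000

x = rng.uniform(-1.0, 1.0, size=n)        # E[x] = 0, Rx = 1/3
v = rng.uniform(-0.5, 0.5, size=n)        # E[v] = 0, Rv = 1/12
y = x + v

Rx, Rv = 1.0 / 3.0, 1.0 / 12.0
Rxy, Ry = Rx, Rx + Rv
x_hat = Rxy / Ry * y                      # means are zero here

bmse = np.mean((x - x_hat)**2)
print(bmse, Rx - Rxy**2 / Ry)             # empirical vs. theoretical LMMSE error
```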
Linear Model

When the linear model can be used to describe the data, the various estimation approaches yield closed-form estimators. In fact, by assuming the linear model we are able to determine the optimal estimator as well as its performance for both the classical and Bayesian approaches.

We first consider the classical approach. The classical general linear model assumes the data to be described by
\[ y = Hx + v \]
where y is an N × 1 vector of observations, H is a known N × p observation matrix (N > p) of rank p, x is a p × 1 vector of parameters to be estimated, and v is an N × 1 noise vector with PDF \( \mathcal{N}(0, R_v) \).
The PDF of the real-valued Gaussian random vector y is
\[ f_Y(y; x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det R_v}} \exp\!\left( -\tfrac{1}{2} (y - Hx)^T R_v^{-1} (y - Hx) \right) \]
1. Cramér-Rao Lower Bound (CRLB)
Differentiating the log PDF puts it directly into the CRLB equality form:
\[ \frac{\partial \ln f_Y(y; x)}{\partial x} = H^T R_v^{-1} (y - Hx) = H^T R_v^{-1} H \left[ \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y - x \right] = I(x)\big(g(y) - x\big) \]
where
\[ \hat{x} = \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \]
so that \( \hat{x} \) is the MVU estimator (and also efficient) and has minimum variance given by the diagonal elements of the error covariance matrix
\[ R_{\hat{x}} = I^{-1}(x) = \left( H^T R_v^{-1} H \right)^{-1} \]
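A minimal numerical check (my example, not from the slides) that the score of the linear Gaussian model factors into the CRLB equality form \( I(x)(g(y) - x) \):

```python
import numpy as np

# Hypothetical check (not from the slides) of the factorization
#   d/dx ln f(y; x) = H^T Rv^-1 (y - H x) = I(x) * (g(y) - x).
rng = np.random.default_rng(6)
N, p = 5, 2
H = rng.normal(size=(N, p))
Rv = np.diag(rng.uniform(0.5, 2.0, size=N))
Rv_inv = np.linalg.inv(Rv)

x = np.array([0.7, -1.2])
y = rng.normal(size=N)                         # arbitrary observation vector

score = H.T @ Rv_inv @ (y - H @ x)             # d/dx ln f(y; x)
I_x = H.T @ Rv_inv @ H                         # Fisher information matrix
g_y = np.linalg.solve(I_x, H.T @ Rv_inv @ y)   # the MVU estimator g(y)

print(np.allclose(score, I_x @ (g_y - x)))     # True: the factorization holds
```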
2. Rao-Blackwell-Lehmann-Scheffé
A factorization of the PDF yields
\[ f_Y(y; x) = \underbrace{\exp\!\left( -\tfrac{1}{2} (\hat{x} - x)^T H^T R_v^{-1} H\, (\hat{x} - x) \right)}_{g(T(y),\, x)} \cdot \underbrace{\frac{1}{(2\pi)^{N/2} \sqrt{\det R_v}} \exp\!\left( -\tfrac{1}{2} (y - H\hat{x})^T R_v^{-1} (y - H\hat{x}) \right)}_{h(y)} \]
with the sufficient statistic \( T(y) = \hat{x} = \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \).
3. Maximum Likelihood
Maximizing the likelihood is equivalent to minimizing the exponent \( (y - Hx)^T R_v^{-1} (y - Hx) \). This leads to
\[ \hat{x} = \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \]
4. Least Squares
\[ J(x) = \big( y - s(x) \big)^T \big( y - s(x) \big) = (y - Hx)^T (y - Hx) \]
which is identical to the maximum likelihood procedure if \( R_v = \sigma^2 I \); in that case \( \hat{x} = (H^T H)^{-1} H^T y \), which is also the MVU estimator. If \( R_v \neq \sigma^2 I \), then \( \hat{x} \) will not be the MVU estimator. However, we may instead minimize the weighted LS criterion
\[ J(x) = (y - Hx)^T W (y - Hx) \]
with the weighting matrix \( W = R_v^{-1} \). The resultant estimator
\[ \hat{x} = \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \]
is then the MVU estimator. If v is not Gaussian, the weighted LSE is still \( \hat{x} = (H^T R_v^{-1} H)^{-1} H^T R_v^{-1} y \), but it would only be the BLUE (and would not attain the CRLB).
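A minimal numerical comparison (my example, not from the slides) of why the weighting matters: under colored noise, the error covariance of the unweighted LSE dominates that of the weighted LSE with \( W = R_v^{-1} \) (the Gauss-Markov result).

```python
import numpy as np

# Hypothetical comparison (not from the slides): under colored noise the
# weighted LSE with W = Rv^-1 (the MVU/BLUE here) has an error covariance no
# larger than that of the unweighted LSE.
rng = np.random.default_rng(7)
N, p = 8, 2
H = rng.normal(size=(N, p))
Rv = np.diag(np.linspace(0.1, 4.0, N))     # strongly unequal noise variances
Rv_inv = np.linalg.inv(Rv)

P_ols = np.linalg.solve(H.T @ H, H.T)      # unweighted LS: x_hat = P_ols @ y
cov_ols = P_ols @ Rv @ P_ols.T             # its error covariance
cov_wls = np.linalg.inv(H.T @ Rv_inv @ H)  # weighted LS error covariance

# cov_ols - cov_wls is positive semidefinite, so every variance grows:
print(np.all(np.diag(cov_ols) >= np.diag(cov_wls)))   # True
```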
Bayesian Linear Model
In the Bayesian linear model, y = Hx + v as before, but x is now treated as a Gaussian random vector with prior \( x \sim \mathcal{N}(\mu_x, R_x) \). The conditional and prior PDFs are
\[ f(y \mid x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det R_v}} \exp\!\left( -\tfrac{1}{2} (y - Hx)^T R_v^{-1} (y - Hx) \right) \]
\[ f(x) = \frac{1}{(2\pi)^{p/2} \sqrt{\det R_x}} \exp\!\left( -\tfrac{1}{2} (x - \mu_x)^T R_x^{-1} (x - \mu_x) \right) \]
The posterior PDF f(x | y) is also Gaussian, with mean and covariance given by
\[ \operatorname{E}[x \mid y] = \mu_x + R_x H^T \left( H R_x H^T + R_v \right)^{-1} \big( y - H\mu_x \big) = \mu_x + \left( R_x^{-1} + H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} \big( y - H\mu_x \big) \]
\[ R_{x|y} = \left( R_x^{-1} + H^T R_v^{-1} H \right)^{-1} \]
(the equivalent relations are obtained by applying the matrix inversion lemma).
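A minimal numerical check (my example, not from the slides) that the two forms of the posterior mean gain coincide, as the matrix inversion lemma guarantees:

```python
import numpy as np

# Hypothetical check (not from the slides) of the identity
#   Rx H^T (H Rx H^T + Rv)^-1  ==  (Rx^-1 + H^T Rv^-1 H)^-1 H^T Rv^-1
rng = np.random.default_rng(8)
N, p = 6, 3
H = rng.normal(size=(N, p))
A = rng.normal(size=(p, p)); Rx = A @ A.T + np.eye(p)   # SPD prior covariance
B = rng.normal(size=(N, N)); Rv = B @ B.T + np.eye(N)   # SPD noise covariance

K1 = Rx @ H.T @ np.linalg.inv(H @ Rx @ H.T + Rv)
K2 = np.linalg.inv(np.linalg.inv(Rx) + H.T @ np.linalg.inv(Rv) @ H) @ H.T @ np.linalg.inv(Rv)
print(np.allclose(K1, K2))    # True
```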
Performance
\[ \operatorname{Bmse}(\hat{x}_i) = \big[ R_{x|y} \big]_{i,i} \]
With no prior knowledge (\( \mu_x = 0 \) and \( R_x^{-1} \to 0 \)), the estimator becomes
\[ \hat{x} = \left( R_x^{-1} + H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \;\to\; \left( H^T R_v^{-1} H \right)^{-1} H^T R_v^{-1} y \]
which is recognized as having the identical form as the MVU estimator for the classical general linear model.
Of course, the estimators cannot really be compared, since they have been derived under different data modelling assumptions. However, this apparent equivalence has often led to the assertion that the Bayesian approach with no prior information is equivalent to the classical approach.
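A minimal numerical sketch of this limit (my example, not from the slides): taking the prior covariance \( R_x = \alpha I \) with growing \( \alpha \), the Bayesian posterior mean approaches the classical MVU estimator.

```python
import numpy as np

# Hypothetical check (not from the slides): as the prior becomes noninformative
# (Rx = alpha * I, alpha -> infinity, mu_x = 0), the posterior mean approaches
# the classical MVU estimator (H^T Rv^-1 H)^-1 H^T Rv^-1 y.
rng = np.random.default_rng(9)
N, p = 6, 2
H = rng.normal(size=(N, p))
Rv = np.diag(rng.uniform(0.5, 2.0, size=N))
Rv_inv = np.linalg.inv(Rv)
y = rng.normal(size=N)

x_mvu = np.linalg.solve(H.T @ Rv_inv @ H, H.T @ Rv_inv @ y)

for alpha in [1.0, 100.0, 1e6]:
    Rx_inv = np.eye(p) / alpha
    x_bayes = np.linalg.solve(Rx_inv + H.T @ Rv_inv @ H, H.T @ Rv_inv @ y)
    print(alpha, np.linalg.norm(x_bayes - x_mvu))   # shrinks toward 0
```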