Advanced Signal Processing: Introduction To Estimation Theory
Danilo Mandic,
room 813, ext: 46271
Introduction to Estimation: Aims of this lecture
◦ Notions of an Estimator, Estimate, Estimandum
◦ The bias and variance in statistical estimation theory, asymptotically
unbiased and consistent estimators
◦ Performance metrics, such as the Mean Square Error (MSE)
◦ The bias–variance dilemma and the MSE, feasible MSE estimators
◦ A class of Minimum Variance Unbiased (MVU) estimators, that is, out
of all unbiased estimators find those with the lowest possible variance
◦ Extension to the vector parameter case
◦ Statistical goodness of an estimator, the role of noise
◦ Enabling technology for many applications: radar and sonar (range
and azimuth), image analysis (motion estimation), speech (features in
recognition and identification), seismics (oil reservoirs), communications
(equalisation, symbol detection), biomedicine (ECG, EEG, respiration)
An example from Lecture 2: Optimality in model order
selection (under- vs. over-fitting)
Original AR(2) process x[n] = −0.2x[n − 1] − 0.9x[n − 2] + w[n],
w[n] ∼ N (0, 1), estimated using AR(1), AR(2) and AR(20) models:
[Figure: (left) a segment of the original AR(2) signal, plotted against Time [sample], together with the AR(1), AR(2) and AR(20) model fits; (right) the estimated model coefficients against Coefficient index. Prediction errors: AR(1) = 5.2627, AR(2) = 1.0421, AR(20) = 1.0621.]
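A minimal numpy sketch of this experiment (a least-squares AR fit is assumed here; the error values quoted above come from the lecture's particular realisation, so a fresh run will give different numbers):

import numpy as np

np.random.seed(0)
N = 1000
w = np.random.randn(N)                        # driving noise w[n] ~ N(0, 1)

# generate the AR(2) process x[n] = -0.2 x[n-1] - 0.9 x[n-2] + w[n]
x = np.zeros(N)
for n in range(2, N):
    x[n] = -0.2 * x[n - 1] - 0.9 * x[n - 2] + w[n]

def ar_fit(x, p):
    # least-squares fit of an AR(p) model: returns coefficients and residual power
    X = np.column_stack([x[p - k - 1: len(x) - k - 1] for k in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, np.mean((y - X @ a) ** 2)

for p in (1, 2, 20):
    _, err = ar_fit(x, p)
    print(f"AR({p}) prediction error: {err:.4f}")   # AR(1) clearly under-fits, AR(20) slightly over-fits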
Discrete–time estimation problem
(try also the function specgram in Matlab # it produces a time–frequency (TF) diagram)
[Figure: the PDF p(x[0]; A), plotted against the observation x[0] and the DC level A.]
p(x[0]; θi) = (1/√(2πσ²)) exp( −(x[0] − θi)²/(2σ²) ),   i = 1, 2

[Figure: the two candidate densities, centred at θ1 = A1 and θ2 = A2, plotted against x[0].]

Clearly, the observed value of x[0] critically impacts upon the likely value of the parameter θ (here, the DC level A).
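A small numerical illustration of this point; the values of x[0], A1, A2 and σ below are illustrative only, not taken from the slide:

import numpy as np

def likelihood(x0, theta, sigma=1.0):
    # p(x[0]; theta) for a single observation x[0] = theta + w[0], w[0] ~ N(0, sigma^2)
    return np.exp(-(x0 - theta) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x0 = 0.7                    # observed value (illustrative)
A1, A2 = 0.0, 3.0           # two candidate DC levels (illustrative)
print(likelihood(x0, A1), likelihood(x0, A2))   # x[0] = 0.7 makes A1 far more plausible than A2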
Estimator vs. Estimate
specification of the PDF is critical to determining a good estimator
Example 3: Finding the parameters of a straight line
recall that we have the N observed points x[n], n = 0, . . . , N − 1, collected in the vector x
In practice, the chosen PDF should fit the problem set–up and incorporate
any “prior” information; it must also be mathematically tractable.
Example: Assume that “on the average” the data values are increasing.
Data: a straight line embedded in random noise, w[n] ∼ N(0, σ²).
Unknown parameters: A, B ⇔ θ ≡ [A, B]^T
[Figure: noisy data points scattered about the ideal noiseless line, with intercept A at n = 0.]
Careful: What would be the effects of bias in A and B?
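A minimal sketch of estimating A and B by least squares, assuming the standard linear model x[n] = A + Bn + w[n] for this example (the particular values of A, B and σ below are illustrative):

import numpy as np

np.random.seed(1)
N, A_true, B_true, sigma = 100, 1.0, 0.05, 0.5
n = np.arange(N)
x = A_true + B_true * n + sigma * np.random.randn(N)    # noisy straight line

H = np.column_stack([np.ones(N), n])                    # design matrix for theta = [A, B]^T
theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)       # least-squares estimate
print("A_hat, B_hat =", theta_hat)                      # close to (1.0, 0.05)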
Bias in parameter estimation
Our goal: Estimate the value of an unknown parameter, θ, from a set of
observations of a random variable described by that parameter
θ̂ = g( x[0], x[1], . . . , x[N − 1] )   (θ̂ is a RV too)
Example: Given a set of observations from a Gaussian distribution,
estimate the mean or variance from these observations.
◦ Recall that in linear mean square estimation, when estimating the value
of a random variable y from an observation of a related random variable
x, the coefficients A and B within the estimator y = Ax + B depend
upon the mean and variance of x and y, as well as on their correlation.
The difference between the expected value of the estimate, θ̂, and
the actual value, θ, is called the bias and will be denoted by B.
B = E{θ̂N } − θ
where θ̂N denotes estimation over N data samples, x[0], . . . , x[N − 1].
Example 4: When estimating a DC level in noise, x[n] = A + w[n], the estimator Â = (1/N) Σ_{n=0}^{N−1} |x[n]| is biased for A < 0. (see Appendix)
Now that we have a statistical estimation set–up
how do we measure “goodness” of the estimate?
Noise w is usually assumed white with i.i.d. samples (independent,
identically distributed)
whiteness often does not hold in real–world scenarios
Gaussianity is more realistic, due to validity of Central Limit Theorem
zero–mean noise is a nearly universal assumption; it is realistic since any non–zero–mean noise can be written as
w[n] = wzm[n] + µ,   where wzm[n] is zero–mean noise and µ is the mean
Good news: We can use these assumptions to find a bound on the
performance of “optimal” estimators.
More good news: Then, the performance of any practical estimator and
for any noise statistics will be bounded by that theoretical bound!
◦ Variance of noise does not always have to be known to make an estimate
◦ But, we must have tools to assess the “goodness” of the estimate
◦ Usually, the goodness analysis is a function of the noise variance σw²,
expressed in terms of the SNR = signal to noise ratio. (noise sets SNR level)
An alternative assessment via the estimation error
Since θ̂ is a RV, it has a PDF of its own (more in the next lecture on CRLB)
[Figure: the PDF p(θ̂) of the estimate, concentrated about the true value θ, and the PDF p(η) of the estimation error, concentrated about zero.]
Asymptotic unbiasedness
If the bias is zero, then for sufficiently many observations of x[n] (N large),
the expected value of the estimate, θ̂, is equal to its true value, that is
E{θ̂N} = θ,   that is,   B = E{θ̂N} − θ = 0
and the estimate is said to be unbiased.
If B ≠ 0, then the estimator θ̂ = g(x) is said to be biased.
Example 5: Consider the sample mean estimator of the DC level in WGN, x[n] = A + w[n], w ∼ N(0, 1), given by
Â = x̄ = (1/(N + 2)) Σ_{n=0}^{N−1} x[n],   that is, θ = A
Is the above sample mean estimator of the true mean A biased?
Observe: This estimator is biased, but the bias B → 0 when N → ∞, so that
lim_{N→∞} E{θ̂N} = θ
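A one-line check, using the linearity of E{·} and E{x[n]} = A:
E{Â} = (1/(N + 2)) Σ_{n=0}^{N−1} E{x[n]} = NA/(N + 2),   so that   B = E{Â} − A = −2A/(N + 2) → 0 as N → ∞
i.e. the estimator is biased for every finite N (unless A = 0), yet asymptotically unbiased.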
Example 6: Asymptotically unbiased estimator of DC
level in noise
Consider the measurements x[n] = A + w[n], w ∼ N(1, σ² = 1),
and the estimator   Â = (1/(N + 2)) Σ_{n=0}^{N−1} x[n]
How about the variance?
◦ It is desirable that an estimator be either unbiased or asymptotically
unbiased (think about the power of estimation error due to DC offset)
◦ For an estimate to be meaningful, it is necessary that we use the
available statistics effectively, that is,
var(θ̂) → 0 as N →∞
or, in other words,
lim_{N→∞} var{θ̂N} = lim_{N→∞} E{ (θ̂N − E{θ̂N})² } = 0
so that, by Chebyshev's inequality, for any ε > 0,
Pr{ |θ̂N − θ| ≥ ε } ≤ var{θ̂N} / ε²
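For instance, for the sample mean of N i.i.d. samples of variance σ², var{Â} = σ²/N, so that
Pr{ |Â − A| ≥ ε } ≤ σ²/(N ε²) → 0 as N → ∞
that is, the estimate converges to the true value in probability (consistency).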
Mean square convergence
NB: Mean square error criterion is very different from the variance criterion
x[n] = A + w[n]
◦ Intuitively, the sample mean is a reasonable estimator, and has the form
Â = (1/N) Σ_{n=0}^{N−1} x[n]
Example 7 (contd.): Mean and variance of the Sample
Mean estimator
Estimator = f( random data ) =⇒ it is a random variable itself
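For completeness, a sketch of the standard calculation behind this slide, for x[n] = A + w[n] with i.i.d. zero–mean noise of variance σ²:
E{Â} = (1/N) Σ_{n=0}^{N−1} E{x[n]} = A   (unbiased)
var{Â} = (1/N²) Σ_{n=0}^{N−1} var{x[n]} = σ²/N → 0 as N → ∞   (consistent)
and, for Gaussian noise, Â ∼ N(A, σ²/N) (see the Recap slide).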
Some intricacies which are often not fully spelled–out
In our example, each data sample has the same mean, namely A (a statement of probability theory),
and this mean, A, is exactly the quantity we are trying to estimate;
we estimate A using the sample mean, Â = (1/N) Σ_{n=0}^{N−1} x[n] (a statement of statistics).
Minimum Variance Unbiased (MVU) estimation
Aim: To establish “good” estimators of unknown deterministic parameters
Unbiased estimator # “on the average” yields the true value of the
unknown parameter, independently of its particular value, i.e.
E(θ̂) = θ a<θ<b
Careful: The estimator is parameter dependent!
An estimator may be unbiased for certain values of the unknown
parameter but not for all values; such an estimator is biased
Example 9: Consider another sample mean estimator of a DC level:
Ǎ = (1/(2N)) Σ_{n=0}^{N−1} x[n]
Therefore: E{Ǎ} = 0 when A = 0, but E{Ǎ} = A/2 when A ≠ 0 (parameter dependent).
Hence Ǎ is not an unbiased estimator.
◦ A biased estimator introduces a “systematic error” which should not be
present if at all possible
◦ Our goal is to avoid bias if we can, as we are interested in stochastic
signal properties and bias is largely deterministic
Effects of averaging for real world data
Problem 3.4 from your P/A sets: heart rate estimation
The heart rate, h, of a patient is automatically recorded by a computer every 100 ms. One second of the measurements, {ĥ1, ĥ2, . . . , ĥ10}, is averaged to obtain ĥ. Given that E{ĥi} = αh for some constant α and var(ĥi) = 1 for all i, determine whether averaging improves the estimator, for α = 1 and α = 1/2.

ĥ = (1/10) Σ_{i=1}^{10} ĥi
E{ĥ} = (1/10) Σ_{i=1}^{10} αh = αh
var{ĥ} = (1/10²) Σ_{i=1}^{10} var{ĥi} = 1/10

[Figure: the PDFs of a single measurement ĥi and of the average ĥ, before and after averaging. For α = 1 both are centred at h, and averaging narrows the PDF; for α = 1/2 both are centred at h/2, so averaging reduces the variance but cannot remove the bias.]
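A quick Monte Carlo check of this result (a sketch; Gaussian measurement noise is assumed here, which the problem statement does not specify):

import numpy as np

rng = np.random.default_rng(0)
h, trials = 70.0, 100_000                     # true heart rate (illustrative) and number of experiments

for alpha in (1.0, 0.5):
    hi = alpha * h + rng.standard_normal((trials, 10))   # E{h_i} = alpha*h, var(h_i) = 1
    h_bar = hi.mean(axis=1)                               # the averaged estimate
    print(f"alpha={alpha}:  MSE(single) = {np.mean((hi[:, 0] - h)**2):.1f},"
          f"  MSE(averaged) = {np.mean((h_bar - h)**2):.1f}")
# alpha = 1   : averaging reduces the MSE from ~1 to ~0.1 (pure variance reduction)
# alpha = 1/2 : both MSEs are dominated by the bias term (h/2 - h)^2, so averaging barely helps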
Remedy: How about averaging? Averaging data segments vs
averaging estimators? Also look in your CW Assignment dealing with PSD.
Our assumption was that the individual estimates, θ̂l = g(x), are unbiased,
with equal variances, and mutually uncorrelated.
Then (NB: averaging biased estimators will not remove the bias)
E{θ̂} = θ
and
var{θ̂} = (1/L²) Σ_{l=1}^{L} var{θ̂l} = (1/L) var{θ̂l}
Mean square error criterion & bias – variance dilemma
An optimality criterion is necessary to define an optimal estimator
One such natural criterion is the Mean Square Error (MSE), given by
MSE(θ̂) = E{ (θ̂ − θ)² } = E{ error² }
which measures the mean squared deviation of the estimate, θ̂, from the true value (the error power).
MSE(θ̂) = E{ (θ̂ − θ)² } = E{ [ (θ̂ − E{θ̂}) + (E{θ̂} − θ) ]² }      (the second bracket is the bias, B(θ̂))
= E{ (θ̂ − E{θ̂})² } + 2 B(θ̂) E{ θ̂ − E{θ̂} } + B²(θ̂)      (the middle term vanishes, since E{θ̂ − E{θ̂}} = 0)
= var(θ̂) + B²(θ̂)
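A numerical sanity check of this decomposition (a sketch, using the biased estimator of Example 5 as a test case):

import numpy as np

rng = np.random.default_rng(0)
A, N, sigma, trials = 2.0, 10, 1.0, 200_000

x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.sum(axis=1) / (N + 2)                  # the biased estimator of Example 5

mse = np.mean((A_hat - A) ** 2)
bias = np.mean(A_hat) - A
print(mse, np.var(A_hat) + bias ** 2)            # the two numbers agree up to Monte Carlo error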
Example 10: An MSE estimator with a ’gain factor’
(motivation for unbiased estimators)
Consider the following estimator for DC level in WGN
Â = (a/N) Σ_{n=0}^{N−1} x[n]
for which E{Â} = aA and var(Â) = a²σ²/N, so that we have
MSE(Â) = a²σ²/N + (a − 1)²A²
Of course, the choice a = 1 removes the bias, but does it also minimise the MSE?
Example 10: (continued) An MSE estimator with a ’gain’
(is a biased estimator feasible?)
Can we find an optimum a analytically? Differentiate the MSE with respect to a to yield
∂MSE(Â)/∂a = 2aσ²/N + 2(a − 1)A²
and set the result to zero to arrive at the optimal value
aopt = A²/( A² + σ²/N )
The optimal gain aopt depends on the unknown parameter A, so the resulting estimator is not realisable # it belongs to those estimators which are not solely a function of the data (see Example 6).
Practically, the minimum MSE (MMSE) estimator needs to be
abandoned, and the estimator must be constrained to be unbiased.
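To see the realisability issue numerically, one can compare the two MSEs for an assumed true A (a sketch with illustrative values; note that aopt itself requires this unknown A, which is precisely why the estimator is impractical):

A, sigma2, N = 1.0, 1.0, 10                  # illustrative true DC level, noise variance, data length

def mse(a):
    # MSE(A_hat) = a^2*sigma^2/N + (a-1)^2*A^2 for the scaled sample mean
    return a ** 2 * sigma2 / N + (a - 1) ** 2 * A ** 2

a_opt = A ** 2 / (A ** 2 + sigma2 / N)
print("MSE(a = 1)    =", mse(1.0))           # sigma^2/N = 0.1 (the unbiased choice)
print("MSE(a = aopt) =", mse(a_opt))         # ~0.091, smaller, but aopt needs the unknown A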
Minimum variance estimation & MSE criterion, together
Basic idea of MVU: Out of all possible unbiased estimators, find the one
with the lowest variance.
If the Mean Square Error (MSE) is used as a criterion, this means that
MSE(θ̂) = var(θ̂) + B²(θ̂),   with   B(θ̂) = 0 for an MVU estimator
By constraining the bias to be zero, our task is much easier, that is, to find
an estimator that minimises the variance.
◦ In this way, the realisability problem of MSE is completely avoided.
Have you noticed:
MVU estimator = Minimum mean square error unbiased estimator
We will use the acronym MVUE for minimum variance unbiased estimator.
(see the Appendix for an alternative relation between the error function
and estimator quality)
Desired: minimum variance unbiased (MVU) estimator
Minimising the variance of an unbiased estimator concentrates the PDF of
the error about zero ⇒ estimation error is therefore less likely to be large
◦ Existence of the MVU estimator
[Figure: the variance of three unbiased estimators, θ̂1, θ̂2 and θ̂3, plotted as a function of θ. Left: θ̂3 has the smallest variance for every θ, so θ̂3 is a MVU estimator. Right: no single estimator has the smallest variance for all θ, so no MVU estimator exists.]
Methods to find the MVU estimator
The MVU estimator may not always exist, for example, when:
◦ There are no unbiased estimators, in which case a search for the MVU is futile
◦ None of the unbiased estimators has uniformly minimum variance, as in
the right hand side figure on the previous slide
If the MVU estimator (MVUE) exists, we may not always be able to find
it. While there is no general “turn-the-crank” method for this purpose,
the approaches to finding the MVUE employ the following procedures:
◦ Determine the Cramer-Rao lower bound (CRLB) and find some
estimator which satisfies the so defined MVU criteria (Lecture 4)
◦ Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem (rare in pract.)
◦ Restrict the class of estimators to be not only unbiased, but also linear
in the parameters, this gives MVU for linear problems (Lecture 5)
◦ Employ optimisation and prior knowledge about the model (Lecture 6)
◦ Choose a suitable real–time adaptive estimation architecture and
perform on-line estimation on streaming data (Lecture 7)
Extensions to the vector parameter case
◦ If θ = [θ1, θ2, . . . , θp]^T ∈ R^{p×1} is a vector of unknown parameters, an estimator θ̂ is said to be unbiased if
E(θ̂i) = θi,   where ai < θi < bi,   for i = 1, 2, . . . , p
By defining
E(θ̂) = [ E(θ̂1), E(θ̂2), . . . , E(θ̂p) ]^T
an unbiased estimator has the property E(θ̂) = θ within the
p–dimensional space of parameters spanned by θ = [θ1, . . . , θp]^T.
◦ An MVU estimator has the additional property that its var(θ̂i), for
i = 1, 2, . . . , p, is the minimum among all unbiased estimators.
Summary
◦ We are now equipped with performance metrics for assessing the
goodness of any estimator (bias, variance, MSE).
◦ Since MSE = var + bias², some biased estimators may yield low MSE.
However, we prefer minimum variance unbiased (MVU) estimators.
◦ Even a simple Sample Mean estimator is an example of the power of
statistical estimators.
◦ The knowledge of the parametrised PDF p(data;parameters) is very
important for designing efficient estimators.
◦ We have introduced statistical “point estimators”, would it be useful to
also know the “confidence” we have in our point estimate?
◦ In many disciplines it is useful to design so called “set membership
estimates”, where the output of an estimator belongs to a pre-defined
bound (range) of values.
◦ In our course, we will address linear, best linear unbiased, maximum
likelihood, least squares, sequential least squares, and adaptive estimators.
Homework: Check another proof for the MSE expression
MSE(θ̂) = var(θ̂) + bias²(θ̂)
Note:   var(x) = E[x²] − ( E[x] )²     (∗)
Idea:   Let x = θ̂ − θ and substitute into (∗) to give
var(θ̂ − θ) = E[ (θ̂ − θ)² ] − ( E[θ̂ − θ] )²     (∗∗)
term (1) = var(θ̂ − θ),   term (2) = E[(θ̂ − θ)²],   term (3) = ( E[θ̂ − θ] )²
Recap: Unbiased estimators
Due to the linearity of the statistical expectation operator, E{·}, the sample mean estimator of a DC level in WGN satisfies E{Â} = A and var(Â) = σ²/N, that is,
Â ∼ N(A, σ²/N)
Appendix: Some usual assumptions in the analysis
How realistic are the assumptions on the noise?
◦ Whiteness of the noise is quite realistic to assume, unless the evidence
or physical insight suggest otherwise
◦ The independent identically distributed (i.i.d.) assumption is
straightforward to remove through e.g. the weighting matrix
W = diag( 1/σ₀², . . . , 1/σ²_{N−1} )   (see Lectures 5 and 6)
◦ In real world scenarios, whiteness is often replaced by bandpass
correlated noise (e.g. pink or 1/f noise in physiological recordings)
◦ The assumption of Gaussianity is often realistic to keep, due to e.g. the
validity of Central Limit Theorem
Is the zero–mean assumption realistic? Yes, as even for non–zero mean
noise, w[n] = wzm[n] + µ, where wzm[n] is zero–mean noise, the mean of
the noise µ can be incorporated into the signal model.
Do we always need to know noise variance? In principle no, but when
assessing performance (goodness) variance is needed to measure the SNR.
Appendix. Example 11: A counter-example # a little
bias can help (but the estimator is difficult to control)
Q: Let {y[n]}, n = 1, . . . , N, be i.i.d. Gaussian variables ∼ N(0, σ²).
Consider the following estimate of σ²
σ̂² = (α/N) Σ_{n=1}^{N} y²[n],   α > 0
Find the α which minimises the MSE of σ̂².
A: It is straightforward to show that E{σ̂²} = ασ², and that
MSE(σ̂²) = E{ (σ̂² − σ²)² } = E{σ̂⁴} + σ⁴(1 − 2α)
= (α²/N²) Σ_{n=1}^{N} Σ_{s=1}^{N} E{ y²[n] y²[s] } + σ⁴(1 − 2α)     ( Hint: (Σ_n)² = Σ_n Σ_s )
= (α²/N²) [ N²σ⁴ + 2Nσ⁴ ] + σ⁴(1 − 2α) = σ⁴ [ α²(1 + 2/N) + (1 − 2α) ]
The MMSE is obtained for α_min = N/(N + 2), and equals MMSE(σ̂²) = 2σ⁴/(N + 2).
Given that the variance of the corresponding optimal unbiased estimator of σ² (the CRLB, covered later) is 2σ⁴/N, this is an example of a biased estimator
which obtains a lower MSE than the CRLB.
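A Monte Carlo check of this counter-example (a sketch):

import numpy as np

rng = np.random.default_rng(0)
sigma2, N, trials = 1.0, 10, 200_000
y = np.sqrt(sigma2) * rng.standard_normal((trials, N))

for alpha in (1.0, N / (N + 2)):
    var_hat = alpha * np.mean(y ** 2, axis=1)              # estimate of sigma^2 with gain alpha
    print(f"alpha = {alpha:.3f}:  MSE = {np.mean((var_hat - sigma2) ** 2):.4f}")
# alpha = 1       : MSE ~ 2*sigma^4/N     = 0.2000
# alpha = N/(N+2) : MSE ~ 2*sigma^4/(N+2) = 0.1667  (the slightly biased choice wins)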
Appendix (full analysis of Example 4)
Biased estimator:
Ã = (1/N) Σ_{n=1}^{N} |x[n]|
Therefore,
◦ if A ≥ 0 (and the noise rarely drives x[n] negative), then |x[n]| = x[n] and E{Ã} ≈ A
⇒ Bias = 0 for A ≥ 0,   Bias ≠ 0 for A < 0
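A quick numerical illustration of this bias (a sketch, assuming Gaussian noise of unit variance and illustrative values of A):

import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 2000

for A in (2.0, -2.0):
    x = A + rng.standard_normal((trials, N))
    A_tilde = np.mean(np.abs(x), axis=1)       # the absolute-value based estimator
    print(f"A = {A:+.1f}:  mean of A_tilde = {A_tilde.mean():+.3f}")
# for A = +2 the estimate is close to +2, but for A = -2 it is also close to +2:
# the absolute value discards the sign of A, so the estimator is heavily biased for A < 0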