
Statistical Signal Processing & Inference

Introduction to Estimation Theory

Danilo Mandic
room 813, ext: 46271

Department of Electrical and Electronic Engineering


Imperial College London, UK
[email protected], URL: www.commsp.ee.ic.ac.uk/∼mandic

c D. P. Mandic Statistical Signal Processing & Inference 1


Aims of this lecture
◦ We have seen that it is often the case that data, which may be generated
through even the most complicated physical signal-generating mechanisms,
still admit accurate modelling based only on the available historical data
◦ For example, when concerned with the incredibly complex phenomenon
of the generation and number of sunspots, it is sufficient to consider just the
historical sunspot samples in the task of Sunspot Number Prediction
This highlights the need for a unifying & rigorous framework for the
assessment of “goodness of performance” of any Data Analytics model,
from the simplest “persistent” estimate, to linear ARMA processes,
through to nonlinear Neural Network models: the subject of this Lecture
We will typically consider prediction/forecasting scenarios:
Prediction: Employs an already built model (based on in-sample data,
training data) to estimate out-of-sample values (prediction, inference).
Forecasting: A type of prediction which implicitly assumes time-series,
where historical data are used to predict future data. Often involves
“confidence intervals” (e.g., there is a 20% chance of rain at 14:00).
c D. P. Mandic Statistical Signal Processing & Inference 2
Example from Lecture 2: How expressive are these
inference models (e.g. under-fitting vs. over-fitting)
Original AR(2) process x[n] = −0.2x[n − 1] − 0.9x[n − 2] + w[n],
w[n] ∼ N (0, 1), is estimated using AR(1), AR(2) and AR(20) models.

[Figure: left panel, the original AR(2) signal and its estimates over samples 360-410; right panel, the estimated AR coefficients against coefficient index. Prediction error powers: AR(1) Error = 5.2627, AR(2) Error = 1.0421, AR(20) Error = 1.0621.]

Can we consider this within a bigger “estimation theory” framework?


Can we quantify the “goodness” of an estimator (bias, variance, prediction
error, optimality, scalability, sensitivity, sufficient statistics)?
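
A minimal sketch of the experiment above, assuming NumPy (illustrative only; the data and seed differ from those behind the figure): generate the AR(2) process, fit AR(1), AR(2) and AR(20) models by least squares, and compare their training error powers.

import numpy as np

rng = np.random.default_rng(0)

# Generate the AR(2) process x[n] = -0.2 x[n-1] - 0.9 x[n-2] + w[n],  w[n] ~ N(0, 1)
N = 2000
w = rng.standard_normal(N)
x = np.zeros(N)
for n in range(2, N):
    x[n] = -0.2 * x[n - 1] - 0.9 * x[n - 2] + w[n]

for p in (1, 2, 20):
    # Regression matrix: row for sample n holds [x[n-1], ..., x[n-p]], target is x[n]
    X = np.column_stack([x[p - k - 1 : N - k - 1] for k in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    err_power = np.mean((y - X @ a) ** 2)
    print(f"AR({p:2d}): first coefficients {np.round(a[:2], 3)}, training error power {err_power:.4f}")

The AR(2) fit recovers coefficients close to (-0.2, -0.9) with error power near the noise variance, while the under-modelled AR(1) fit has a much larger error and the AR(20) fit gains nothing.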

c D. P. Mandic Statistical Signal Processing & Inference 3


Example from Lecture 6: Method of Least Squares (LS)
Least Squares | Order Selection | Interactive Animation | Sequential LS

◦ The LS estimator of a ’noisy line’ x[n] = s[n; A, B] + w[n] is very


sensitive to the correct model of the signal of interest, s, as shown in
the figure below for the LS fit of x[n] = A + Bn + w[n].
◦ The error power for fitting the seen (training) data monotonically
decreases with the model order.
◦ The goodness of inference of the higher order model (extrapolation, test
data) is however not adequate (overparametrisation, lack of expressivity)
[Figure: observations of x[n] = A + Bn + w[n] (blue dots) over samples n = 0, ..., 100, and LS fits of varying order: the single-parameter model s[n] = A (Order-0, error power = 195.05), the two-parameter model s[n] = A + Bn (Order-1, error power = 98.39), and an Order-7 model (error power = 94.11).]
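
A minimal sketch of this LS order-selection experiment, assuming NumPy; the values of A, B and the noise variance below are hypothetical, since the slide does not state them, so the printed error powers will differ from those in the figure.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters for x[n] = A + B*n + w[n]
N, A, B, sigma = 100, 1.0, 0.03, 1.0
n = np.arange(N)
x = A + B * n + sigma * rng.standard_normal(N)

for order in (0, 1, 7):
    # Polynomial design matrix [1, n, n^2, ...] up to the chosen order
    H = np.vander(n, order + 1, increasing=True).astype(float)
    theta, *_ = np.linalg.lstsq(H, x, rcond=None)
    err_power = np.sum((x - H @ theta) ** 2)
    print(f"order {order}: training error power = {err_power:.2f}")
# The training error can only decrease with the order, but the high-order fit
# extrapolates poorly outside the observed range (over-parametrisation).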

c D. P. Mandic Statistical Signal Processing & Inference 4


Objectives: Introduction to Estimation Theory
◦ Notions of an Estimator, Estimate, Estimandum
◦ The bias and variance in statistical estimation theory, asymptotically
unbiased and consistent estimators
◦ Performance metrics, such as the Mean Square Error (MSE)
◦ The bias–variance dilemma and the MSE, feasible MSE estimators
◦ A class of Minimum Variance Unbiased (MVU) estimators, that is,
those with the lowest possible variance of all unbiased estimators
◦ Extension to the vector parameter case
◦ Statistical goodness of an estimator, the role of noise
◦ Enabling technology for many applications: Radar and sonar (range
and azimuth), image analysis (motion estimation), speech (recognition
and identification), finance, seismics (oil reservoirs), communications
(equalisation, symbol detection), biomedicine (ECG, EEG, respiration)

c D. P. Mandic Statistical Signal Processing & Inference 5


Discrete–time statistical estimation problem
(try also the function specgram in Matlab # it produces the TF diagram below)

Consider, e.g., the estimation of the fundamental frequency, f0, of a
speaker from the time–frequency (TF) spectrogram of speech shown here
(the spoken word “Matlab”).
The signal s[n; f0, Φ0] is buried in noise:
x[n] = s[n; f0, Φ0] + w[n]
◦ Each time we observe x[n], it contains the desired s[n] but also
a different realisation of the noise w[n]
◦ The estimated frequency, f̂0, and phase, Φ̂0, are random variables
Our goal: Find an estimator which maps the data x to the estimates
f̂0 = g1(x) and Φ̂0 = g2(x).
The RVs f̂0, Φ̂0 are best described via a probability model which depends
on: the structure of s[n], the pdf of w[n], and the form of g(x).
[Figure: time–frequency spectrogram (frequency vs. time) of the spoken word “Matlab”; observe also the mathematical artefacts in the spectrogram.]
c D. P. Mandic Statistical Signal Processing & Inference 6
Statistical estimation problem (learning from data)
The notation p(x; θ) indicates that θ are not random but unknown parameters
Problem statement: Given an N -point dataset, x[0], x[1], . . . , x[N − 1],
which depends on an unknown scalar parameter, θ, an estimator is
defined as a function, g(·), of the dataset {x}, that is

θ̂ = g(x[0], x[1], . . . , x[N − 1])
which may be used to estimate θ. (single parameter or “scalar” case)
Vector case: Analogously to the scalar case, we seek to determine a set of
parameters, θ = [θ1, . . . , θp]T , from data samples x = [x[0], . . . , x[N − 1]]T
such that the values of these parameters would yield the highest
probability of obtaining the observed data. This can be formalised as
max_θ p(x; θ),   where p(x; θ) reads: “p(x) parametrised by θ”

There are essentially two alternatives to estimate the unknown θ


◦ Classical estimation: Unknown parameter(s) is deterministic with no
means to include a priori information about θ (minimum var., ML, LS)
◦ Bayesian estimation: Parameter θ is a random variable, which allows
us to use prior knowledge on θ (Wiener and Kalman filters, adaptive SP)
c D. P. Mandic Statistical Signal Processing & Inference 7
The need for a PDF of the data, parametrised by θ
(really, just re–phrasing the previous slide)

Mathematical statement of the general estimation problem:


From the measured data x = [x[0], x[1], . . . , x[N − 1]]^T (an N-dimensional random vector),
find the unknown (vector) parameter θ = [θ1, θ2, . . . , θp]^T (θ is not random).
Q: What captures all the statistics needed for successful estimation of θ
A: It has to be the N-dimensional PDF of the data, parametrised by θ

So, it is p(x ; θ) that contains all the information needed

↑ we will use p(x ; θ) to find θ̂ = g(x)
When we know this PDF, we can design optimal estimators
In practice, this PDF is not given, and our goal is to choose a model which:
◦ Captures the essence of the signal generating physical model;
◦ Leads to a mathematically tractable form of an estimator.
c D. P. Mandic Statistical Signal Processing & Inference 8
Random variable (RV), some general observations
Random variable # quantifies the outcome of a random event.
For example, “heads” or “tails” on a coin or a blue square on Rubik’s cube
are not random variables per se, but can be made random variables
through numerical characterisation.


We therefore do not know how to determine the value of a RV, but can
specify the probability of occurrence of a certain value of a RV.
A random variable X with the pdf

p_X(x) = (1/(σ√(2π))) exp( −(x − µ)²/(2σ²) )

is called a Gaussian RV, where
→ µ is the mean of the RV X
→ σ is the standard deviation of the RV X, with σ > 0
→ σ² is the variance of the RV X
So, we can write X ∼ N(µ, σ²).
[Figure: the effect of the variance on the Gaussian pdf; the interval µ ± σ contains 68.3% of the area under the pdf; a large σ means large variability (large uncertainty), a small σ means small variability (small uncertainty).]

c D. P. Mandic Statistical Signal Processing & Inference 9


Conditional pdf
“slice and normalise” the joint pdf p(x, y)
Formal definition (left), or, more often written as (right):

p_{Y|X}(y|x) = p_{XY}(x, y)/p_X(x) if p_X(x) ≠ 0, and 0 otherwise   (x is held fixed)
p(x|y) = p(x, y)/p(y) if p(y) ≠ 0, and 0 otherwise   (y is held fixed)

The conditional pdf p(x|y) depends on the joint pdf p(x, y) because
there are two random variables, x and y.
Example: Length of holidays, X,
conditioned on the salary Y =£60k?
Ans: Find all people who make exactly
£60k, how is holiday length distributed?
We therefore:
◦ slice the joint p(x, y) at Y = £60k
◦ normalise by pY (60, 000) so that
p(x|y) = p(x, 60k)/p_Y(60k) is a valid pdf

c D. P. Mandic Statistical Signal Processing & Inference 10


Joint pdf pXY (x, y) versus parametrised pdf p(x; θ)
We will use p(x; θ) to find θ̂ = g(x)
[Figure: left, a joint pdf p(x, y); right, the parametrised pdf p(x[0]; A) plotted as a surface over A and x[0].]

The parametrised p(x; θ) should be


looked at as a function of θ for a fixed
value of observed data x
Right: For x[0]=A+w[0], if we
observe x[0] = 3, then p(x[0] =
3; A) is a slice of the parametrised
p(x[0]; A) for a fixed x[0] = 3.
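
A minimal sketch of this slicing operation, assuming NumPy and σ = 1 (a hypothetical value): evaluate the parametrised pdf p(x[0]; A) at the fixed observation x[0] = 3 over a grid of candidate A values.

import numpy as np

sigma = 1.0
x0 = 3.0                                    # the observed datum x[0]
A_grid = np.linspace(-5.0, 10.0, 1501)

# p(x[0]; A) evaluated at the fixed observation x[0] = 3, viewed as a function of A
likelihood = np.exp(-(x0 - A_grid) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

print("A maximising p(x[0]=3; A):", A_grid[np.argmax(likelihood)])   # close to 3.0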
c D. P. Mandic Statistical Signal Processing & Inference 11
The statistical estimation problem
First step: To model, mathematically, the data
Consider a single observation of a DC level, A, in WGN, w (that is, θ = A)
x[0] = A + w[0] where w[0] ∼ N (0, σ 2). Then x[0] ∼ N (A, σ 2).

The “parametrised” pdf, p(x[0]; θ) = p(x[0]; A), is obviously Gaussian with
mean A; the parametrisation affects the mean of p(x[0]; A).
Example 1: For N = 1, and with θ denoting the mean value, a generic
form of p(x[0]; θ) for the class of Gaussian parametrised PDFs is given by

p(x[0]; θᵢ) = (1/√(2πσ²)) exp( −(1/(2σ²)) (x[0] − θᵢ)² ),   i = 1, 2

Clearly, the observed value of x[0] critically impacts upon the likely value of the
parameter θ (here, the DC level A).
[Figure: two such Gaussian pdfs, centred at θ₂ = A₂ and θ₁ = A₁, plotted against x[0].]

c D. P. Mandic Statistical Signal Processing & Inference 12


Estimator vs. Estimate
specification of the PDF is critical to determining a good estimator

An estimator is a rule, g(x), that assigns a value to the parameter θ from


each realisation of x = [x[0], . . . , x[N − 1]]^T.
An estimate of the true value of θ, also called the ’estimandum’, is obtained
for a given realisation of x = [x[0], . . . , x[N − 1]]^T, in the form θ̂ = g(x).


Upon establishing the parametrised p(x; θ), the estimate θ̂ = g(x) itself is
then viewed as a random variable and has a pdf of its own, p(θ̂).
Example 2: Estimate a DC level A in WGN.

x[0] = A + w[0], w[0] ∼ N (0, σ 2)

◦ The mean of p(Â) measures the centroid


◦ The variance of p(Â) measures the spread
of the pdf around the centroid
PDF concentration ↑ =⇒ Accuracy ↑
This pdf displays the quality of performance.

c D. P. Mandic Statistical Signal Processing & Inference 13


Example 3a: Need for pdf concentration in real–world
An application of risk assessment in finance

In finance, the risk of investment is minimised through “diversification”,


that is, the investment in many assets (a portfolio) as opposed to
single–asset investment. Here, “returns”, Rt = pt/pt−1, with p a price.

PDF concentration ↑ # accuracy ↑


c D. P. Mandic Statistical Signal Processing & Inference 14
Example 3a (contd): Justification for a Gaussian PDF
the closer a pdf is to a Gaussian one, the more appropriate a 2nd-order linear model
An application to COVID-19 infection rate prediction in the UK

◦ The raw data have asymmetric distribution (non–Gaussian)


◦ The x[n]/x[n-1] transformed data have a more concentrated distribution
◦ The ln(x[n]/x[n-1]) transformation → pdf is closer to a Gaussian one
c D. P. Mandic Statistical Signal Processing & Inference 15
Linear Models (regression models) (see Lecture 6)
These underpin many areas e.g. the CAPM and Fama-French models in finance

[Figure: top, daily returns of crude oil vs. the energy sector (Vanguard Energy ETF), data from April 2024, with a regression line, and the residuals of the linear fit; bottom, S&P 500 vs. gold prices in April 2024 with linear and quadratic fits, and the residuals of both fits (linear fit: sum(Res²) = 47375; quadratic fit: sum(Res²) = 42270).]
c D. P. Mandic Statistical Signal Processing & Inference 16


Example 3b: Finding the parameters of a line from N
observed data points (important generic problem, e.g. in regression)

In practice, the chosen PDF should fit the problem set–up and incorporate
any “prior” information; it must also be mathematically tractable.
Example: Assume that “on the average” the data values are increasing.
Data: A straight line embedded in random noise w[n] ∼ N(0, σ²):

x[n] = A + Bn + w[n] = s[n; A, B] + w[n]

p(x; A, B) = (1/(2πσ²)^{N/2}) exp( −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² )

Unknown parameters: A, B ⇔ θ ≡ [A B]^T
[Figure: the noisy line x[n] plotted against n, together with the ideal noiseless line A + Bn.]
Careful: What are the effects of bias in A and B on the previous example?

c D. P. Mandic Statistical Signal Processing & Inference 17


Bias in parameter estimation
Our goal: Estimate the value of an unknown parameter, θ, from a set of
observations of a random variable described by that parameter

θ̂ = g(x[0], x[1], . . . , x[N − 1])   (θ̂ is a RV too)
Example: Given a set of observations from a Gaussian distribution,
estimate the mean or variance from these observations.
◦ Recall that in linear mean square estimation, when estimating the value
of a random variable y from an observation of a related random variable
x, the coefficients A and B within the estimator y = Ax + B depend
upon the mean and variance of x and y, as well as on their correlation.
The difference between the expected value of the estimate, θ̂, and
the actual value, θ, is called the bias and will be denoted by B.
B = E{θ̂N } − θ
where θ̂N designates estimation over N data samples, x[0], . . . , x[N − 1].
Example 4: When estimating a DC level in noise, x[n] = A + w[n], the
estimator Â = (1/N) Σ_{n=0}^{N−1} |x[n]| is biased for A < 0. (see Appendix)

c D. P. Mandic Statistical Signal Processing & Inference 18


Now that we have a statistical estimation set–up

how do we measure “goodness” of the estimate?
◦ Noise w is usually assumed white with i.i.d. (independent, identically
distributed) samples; whiteness often does not hold in real–world scenarios
◦ Gaussianity is more realistic, due to the validity of the Central Limit Theorem
◦ Zero–mean noise is a nearly universal assumption, and it is realistic since any
non–zero–mean noise can be written as w[n] = wzm[n] + µ, where wzm[n] is the
zero–mean part and µ is the mean
Good news: We can use these assumptions to find a bound on the
performance of “optimal” estimators.
More good news: Then, the performance of any practical estimator and
for any noise statistics will be bounded by that theoretical bound!
◦ Variance of noise does not always have to be known to make an estimate
◦ But, we must have tools to assess the “goodness” of the estimate
◦ Usually, the goodness analysis is a function of the noise variance, σ²_w,
expressed in terms of the SNR = signal to noise ratio (the noise sets the SNR level)
c D. P. Mandic Statistical Signal Processing & Inference 19
Assessing the performance of an estimator
Recall that the estimate θ̂ = g(x) is a random variable. As such, it has a
pdf of its own, and this pdf completely depicts the quality of the estimate.
We can only assess performance when the value of θ is known.
The quality (goodness) of an estimator is typically captured through the mean and
variance of θ̂ = g(x).
We desire: µ_θ̂ = E{θ̂} = θ   and   σ²_θ̂ = E{(θ̂ − E{θ̂})²} # small
[Figure: pdf p(θ̂) of the estimate, a Gaussian of height 1/√(2πσ²) centred at θ; the intervals θ ± σ, θ ± 2σ and θ ± 3σ contain 68%, 95% and >99% of the area.]

◦ In an ideal scenario, we would like to always be able to theoretically


analyse the problem to assess its goodness (bias and variance). This also
shows how performance depends on problem specification.
◦ Sometimes, we have to make use of simulations: i) to verify theoretical
analysis, ii) if theoretical results cannot be found.

c D. P. Mandic Statistical Signal Processing & Inference 20


An equivalent assessment via the estimation error
Since θ̂ is a RV, it has a PDF of its own (more in the next lecture on CRLB)

Given that θ̂ = g(x), we can write θ̂ = θ + η, where θ̂ and η are random
variables while θ is not random (η is the estimation error).
Since θ̂ is a random variable (RV), the estimation error η is also a RV:

η = θ̂ − θ   =⇒   E{η} = 0 indicates an unbiased estimator

[Figure: the pdf p(θ̂), centred about θ, and the pdf p(η), centred about 0.]

Quality of the estimator is completely described by the error PDF p(η)


We desire: 1) an unbiased estimator, that is, E{η} = 0
2) minimum variance, var(η) = E{(η − E{η})2} −→ small

c D. P. Mandic Statistical Signal Processing & Inference 21


Can we resort to (approximately) Gaussian distribution?
Yes, very often, if we re–cast our problem in an appropriate way
Top panel. Share prices, pn, of Apple (AAPL), General Electric (GE) and
Boeing (BA) and their histogram (right). Bottom panel. Logarithmic
returns for these assets, ln(pₙ/pₙ₋₁), that is, the difference of the log prices on
consecutive days (left), and the histogram of the log returns (right).

Clearly, by a suitable data transformation, we may arrive at symmetric


distributions which are more amenable to analysis (bottom right).
c D. P. Mandic Statistical Signal Processing & Inference 22
Asymptotic unbiasedness
If the bias is zero, then for sufficiently many observations of x[n] (N large),
the expected value of the estimate, θ̂, is equal to its true value, that is
E{θ̂_N} = θ,   i.e.   B = E{θ̂_N} − θ = 0
and the estimate is said to be unbiased.
If B ≠ 0, then the estimator θ̂ = g(x) is said to be biased.
Example 5: Consider the sample mean estimator of a DC level, A, in
WGN, x[n] = A + w[n], w ∼ N (0, 1), given by
Â = x̄ = (1/(N + 2)) Σ_{n=0}^{N−1} x[n],   that is, θ = A
Is the above sample mean estimator of the true mean A biased?
Observe: This estimator is biased but the bias B → 0 when N → ∞
lim_{N→∞} E{θ̂_N} = θ

Such an estimator is said to be asymptotically unbiased.

c D. P. Mandic Statistical Signal Processing & Inference 23


Example 6: Asymptotically unbiased estimator of DC
level in noise
Consider the measurements x[n] = A + w[n], w ∼ N(0, σ² = 1)

and the estimator Â = (1/(N + 2)) Σ_{n=0}^{N−1} x[n]

For a “deterministic” noise sequence with w[n] ∈ {−0.2, +0.2} and A = 1, so that x[n] alternates between 1.2 and 0.8:

Â₁ = (1/(1+2)) · 1.2 = 0.4
Â₂ = (1/(2+2)) · (1.2 + 0.8) = 0.5
Â₃ = (1/(3+2)) · 3.2 = 0.64
...
Â₈ = (1/(8+2)) · 8 = 0.8
...
Â₁₀₀ = (1/(100+2)) · 100 = 0.98

[Figure: the samples x[n] alternating between 0.8 and 1.2 around A = 1, together with the noise distribution for random noise.]
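
A minimal Monte Carlo sketch, assuming NumPy, of how the bias of this estimator vanishes as N grows (A = 1 and σ = 1 are illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
A, sigma, trials = 1.0, 1.0, 10000

for N in (1, 2, 3, 8, 100, 1000):
    x = A + sigma * rng.standard_normal((trials, N))
    A_hat = x.sum(axis=1) / (N + 2)           # the estimator Â = (1/(N+2)) Σ x[n]
    print(f"N={N:5d}: E{{Â}} ≈ {A_hat.mean():.4f}  (theory A*N/(N+2) = {A * N / (N + 2):.4f})")
# The bias, of magnitude 2A/(N+2), vanishes as N → ∞: asymptotically unbiased.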

c D. P. Mandic Statistical Signal Processing & Inference 24


How about the variance of the estimate θ̂
◦ It is desirable that an estimator be either unbiased or asymptotically
unbiased (think about the power of estimation error due to DC offset)
◦ For an estimate to be meaningful, it is necessary that we use the
available statistics effectively, that is,
var(θ̂) → 0 as N → ∞
or, in other words,
lim_{N→∞} var{θ̂_N} = lim_{N→∞} E{ (θ̂_N − E{θ̂_N})² } = 0

If θ̂_N is unbiased, then E{θ̂_N} = θ, and from the Tchebycheff inequality, for all ε > 0,

Pr{ |θ̂_N − θ| ≥ ε } ≤ var{θ̂_N} / ε²

If var{θ̂_N} → 0 as N → ∞, then the probability that θ̂_N differs by
more than ε from the true value will go to zero (showing consistency).
In this case, θ̂_N is said to converge to θ with probability one. (see Appendix)

c D. P. Mandic Statistical Signal Processing & Inference 25


Mean square convergence
NB: Mean square error criterion is very different from the variance criterion

Another form of convergence, stronger than convergence with probability
one, is the mean square convergence.
An estimate, θ̂_N, is said to converge to θ in the mean–square sense, if

lim_{N→∞} E{|θ̂_N − θ|²} = 0   (the quantity inside the limit is the mean square error)

◦ This is different from the previous slide, as θ is now assumed to be


known, in order to be able to measure the performance
◦ For an unbiased estimator, this is equivalent to the previous
condition that the variance of the estimate goes to zero
◦ An estimate is said to be consistent if it converges, in some sense, to
the true value of the parameter, or more formally:


We say that the estimator is consistent if it is asymptotically unbiased
and has a variance that goes to zero as N → ∞
c D. P. Mandic Statistical Signal Processing & Inference 26
Example 7: Assessing the performance of the Sample
Mean as a statistical estimator
Consider the estimation of a DC level, A, in random noise, w[n], whereby
the measured signal, x[n], can be modelled as

x[n] = A + w[n]

where w[n] ∼ some zero-mean random i.i.d. process.


Goal: To estimate DC level, A, from the data {x[0], x[1], . . . , x[N − 1]}

◦ Intuitively, the sample mean is a reasonable estimator, and has the form

Â = (1/N) Σ_{n=0}^{N−1} x[n]

Q1: How close will  be to A?

Q2: Are there better estimators than the sample mean?

c D. P. Mandic Statistical Signal Processing & Inference 27


Example 7 (contd.): Mean and variance of the Sample
Mean estimator
Estimator = f ( random data ) # it is a random variable itself

=⇒ its performance must be judged statistically


(1) What is the mean of Â?

E{Â} = E{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N) Σ_{n=0}^{N−1} E{x[n]} = A   # unbiased

(2) What is the variance of Â?

Assumption: The samples of w[n] are uncorrelated.

var{Â} = E{ (Â − E{Â})² }   (variability around the mean)
       = var{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N²) Σ_{n=0}^{N−1} var{x[n]} = (1/N²) N σ² = σ²/N   (as the noise is white i.i.d.)
Since var{Â} → 0 as N → ∞ # consistent estimator (see P&A sets)
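
A minimal Monte Carlo sketch, assuming NumPy, which verifies both results empirically: the sample mean is unbiased and its variance shrinks as σ²/N.

import numpy as np

rng = np.random.default_rng(3)
A, sigma, trials = 1.0, 2.0, 10000

for N in (10, 100, 1000):
    x = A + sigma * rng.standard_normal((trials, N))
    A_hat = x.mean(axis=1)                    # the sample mean estimator, one value per trial
    print(f"N={N:5d}: mean(Â) = {A_hat.mean():.4f} (A = {A}), "
          f"var(Â) = {A_hat.var():.5f} (theory σ²/N = {sigma**2 / N:.5f})")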

c D. P. Mandic Statistical Signal Processing & Inference 28


Some intricacies which are often not fully spelled–out
◦ In our example, each random data sample has the same mean, namely A,
and the mean, A, is exactly the quantity we are trying to estimate
(this is the probability-theory view)
◦ We are estimating A using the sample mean, Â = (1/N) Σ_{n=0}^{N−1} x[n]
(this is the statistics view)

◦ We desire to be always able to perform theoretical analysis to find the


bias and variance of the estimator (measure of its goodness)
theoretical results show how estimates depend on problem spec.
◦ Sometimes it is necessary to make use of simulations
to verify correctness of theoretical results
when we cannot find theoretical results (e.g. Monte Carlo
simulations, see Lecture 5)
when estimators have no optimality properties, but do work in practice

c D. P. Mandic Statistical Signal Processing & Inference 29


Minimum Variance Unbiased (MVU) estimation
Aim: To establish “good” estimators of unknown deterministic parameters
Unbiased estimator # “on the average” yields the true value of the
unknown parameter, independently of its particular value, i.e.
E(θ̂) = θ a<θ<b

where (a, b) denotes the range of possible values of θ


Example 8: Consider an unbiased estimator for a DC level in white
Gaussian noise (WGN), observed as
x[n] = A + w[n] n = 0, 1, . . . , N − 1

where A is the unknown, but deterministic, parameter to be estimated


which lies within the interval (−∞, ∞). Then, the sample mean can be
used as an estimator of A, namely
Â = (1/N) Σ_{n=0}^{N−1} x[n]

c D. P. Mandic Statistical Signal Processing & Inference 30


Careful: The estimator is parameter dependent!
An estimator may be unbiased for certain values of the unknown
parameter but not for all values; such an estimator is biased
Example 9: Consider another sample mean estimator of a DC level:

Ǎ = (1/(2N)) Σ_{n=0}^{N−1} x[n]

Therefore: E{Ǎ} = 0 when A = 0, but
E{Ǎ} = A/2 when A ≠ 0 (parameter dependent)

Hence Ǎ is not an unbiased estimator.
◦ A biased estimator introduces a “systematic error” which should not be
present if at all possible
◦ Our goal is to avoid bias if we can, as we are interested in stochastic
signal properties and bias is largely deterministic

c D. P. Mandic Statistical Signal Processing & Inference 31


Effects of averaging for real world data
Problem 3.4 from your P/A sets: heart rate estimation
The heart rate, h, of a patient is automatically recorded by a computer every 100 ms.
One second of the measurements, {ĥ₁, ĥ₂, . . . , ĥ₁₀}, is averaged to obtain ĥ. Given
that E{ĥᵢ} = αh for some constant α and var(ĥᵢ) = 1 for all i, determine whether
averaging improves this estimator, for α = 1 and α = 1/2.

ĥ = (1/10) Σ_{i=1}^{10} ĥᵢ
E{ĥ} = (α/10) Σ_{i=1}^{10} h = αh

For α = 1, the estimator is unbiased. For α = 1/2 it will not be unbiased unless the
estimator is formed as ĥ = (1/5) Σ_{i=1}^{10} ĥᵢ.

var{ĥ} = (1/10²) Σ_{i=1}^{10} var{ĥᵢ} = 1/10

[Figure: pdfs of ĥᵢ (before averaging) and ĥ (after averaging), for α = 1 (centred at h) and for α = 1/2 (centred at h/2); averaging narrows the pdf in both cases.]

c D. P. Mandic Statistical Signal Processing & Inference 32


Remedy: How about averaging? Averaging data segments vs
averaging estimators? Also look in your CW Assignment dealing with PSD.

Several unbiased estimators of the same quantity may be averaged
together. For example, given the L independent estimates

{θ̂₁, θ̂₂, . . . , θ̂_L}

we may choose to average them, to yield

θ̂ = (1/L) Σ_{l=1}^{L} θ̂ₗ

Our assumption was that the individual estimates, θ̂ₗ = g(x), are unbiased,
with equal variances, and mutually uncorrelated.
Then (NB: averaging biased estimators will not remove the bias)

E{θ̂} = θ
and
var{θ̂} = (1/L²) Σ_{l=1}^{L} var{θ̂ₗ} = (1/L) var{θ̂ₗ}

Note, as L → ∞, θ̂ → θ (consistent estimator)

c D. P. Mandic Statistical Signal Processing & Inference 33


Example 10: Effect of averaging in spectral estimation
Averaging power spectra of 50 independent realisations of a mixture of two
sinewaves in noise, x[n] = sin(0.4πn) + sin(0.45πn) + w[n]
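
A minimal sketch of the averaging experiment, assuming NumPy (the raw periodogram is used as the spectral estimator, which the slide does not specify): average the periodograms of 50 independent realisations and compare the variability with that of a single realisation.

import numpy as np

rng = np.random.default_rng(4)
N, K = 256, 50                            # samples per realisation, number of realisations
n = np.arange(N)

def periodogram(x):
    """Raw periodogram estimate of the power spectrum (one-sided)."""
    X = np.fft.rfft(x)
    return (np.abs(X) ** 2) / len(x)

psds = np.array([
    periodogram(np.sin(0.4 * np.pi * n) + np.sin(0.45 * np.pi * n) + rng.standard_normal(N))
    for _ in range(K)
])
averaged = psds.mean(axis=0)              # average over the K independent realisations

f = np.fft.rfftfreq(N)                    # normalised frequency; the sinusoids sit near 0.2 and 0.225
noise_bin = np.argmin(np.abs(f - 0.35))   # a bin away from both sinusoids
var_single = psds[:, noise_bin].var()
print(f"variance of a single periodogram at a noise-only bin: {var_single:.3f}")
print(f"expected variance after averaging {K} realisations:  {var_single / K:.3f}")
print(f"strongest bin of the averaged PSD is at f = {f[np.argmax(averaged)]:.3f}")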

c D. P. Mandic Statistical Signal Processing & Inference 34


Mean square error criterion & bias–variance dilemma
An optimality criterion is necessary to define an optimal estimator
One such natural criterion is the Mean Square Error (MSE), given by

MSE(θ̂) = E{(θ̂ − θ)²}   # E{error²} = error power

which measures the average squared deviation of the estimate, θ̂,
from the true value (error power).

MSE(θ̂) = E{(θ̂ − θ)²} = E{ [ (θ̂ − E{θ̂}) + (E{θ̂} − θ) ]² },   where E{θ̂} − θ = B(θ̂) is the bias
        = E{(θ̂ − E{θ̂})²} + 2 B(θ̂) E{θ̂ − E{θ̂}} + B²(θ̂),   with E{θ̂ − E{θ̂}} = 0
        = var(θ̂) + B²(θ̂)

MSE = VARIANCE OF THE ESTIMATOR + SQUARED BIAS

c D. P. Mandic Statistical Signal Processing & Inference 35


Example 11: An MSE estimator with a ’gain factor’
(motivation for unbiased estimators)
Consider the following estimator for DC level in WGN
Â = (a/N) Σ_{n=0}^{N−1} x[n]

Task: Find the value of a which results in the minimum MSE.


Solution:

E{Â} = aA   and   var(Â) = a²σ²/N

so that we have

MSE(Â) = a²σ²/N + (a − 1)²A²
Of course, the choice a = 1 removes the bias, but does this choice minimise the MSE?

c D. P. Mandic Statistical Signal Processing & Inference 36


Example 11 (continued): An MSE estimator with a ’gain’
(is a biased estimator feasible?)
Can we find an optimum a analytically? Differentiate the MSE wrt a to yield

∂MSE(Â)/∂a = 2aσ²/N + 2(a − 1)A²

and set the result to zero to arrive at the optimal value

a_opt = A² / (A² + σ²/N)

but we do not know the value of θ = A


Although MSE makes sense, estimates usually rely on the unknown θ
Without any constraints, this criterion leads to unrealisable estimators
# those which are not solely a function of the data (see Example 6).
Practically, the minimum MSE (MMSE) estimator needs to be
abandoned, and the estimator must be constrained to be unbiased.
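
A minimal numerical check, assuming NumPy and hypothetical values A = 1, σ = 1, N = 10: sweep the gain a, evaluate the analytic MSE, and confirm that the minimiser matches a_opt = A²/(A² + σ²/N), which indeed depends on the unknown A.

import numpy as np

A, sigma, N = 1.0, 1.0, 10
a = np.linspace(0.0, 1.5, 1501)

# Analytic MSE of Â = (a/N) Σ x[n]: variance a²σ²/N plus squared bias (a − 1)²A²
mse = a**2 * sigma**2 / N + (a - 1) ** 2 * A**2
a_opt = A**2 / (A**2 + sigma**2 / N)       # analytic minimiser (depends on the unknown A!)

print(f"grid minimiser:  a = {a[np.argmin(mse)]:.3f}")
print(f"analytic a_opt = {a_opt:.3f},  MSE(a_opt) = {np.min(mse):.4f}")
print(f"MSE at a = 1 (unbiased estimator): {sigma**2 / N:.4f}")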

c D. P. Mandic Statistical Signal Processing & Inference 37


Minimum variance estimation & MSE criterion, together
Basic idea behind MVU: Out of all possible unbiased estimators, find the
one with the lowest variance.
If the Mean Square Error (MSE) is used as a criterion, this means that

MSE(θ̂) = var(θ̂) + B²(θ̂),   where B²(θ̂) = 0 for the MVU estimator
By constraining the bias to be zero, our task is much easier, that is, to find
an estimator that minimises the variance.
◦ In this way, the feasibility problem of MSE is completely avoided.
Therefore:
MVU estimator = Minimum mean square error unbiased estimator

We will use the acronym MVUE for minimum variance unbiased estimator.
Course goal: To find optimal statistical estimators and inference
(see the Appendix for an alternative relation between the error function
and the quality (goodness) of an estimator)
c D. P. Mandic Statistical Signal Processing & Inference 38
Bias–variance illustration: ARIMA prediction of
COVID-19 death data
Consider the prediction of COVID-19 death rates in the UK.
[Figure: left, AR(1) model prediction, 10 days ahead; right, ARIMA(7,1,1) prediction, 10 days ahead.]

◦ The AR(1) prediction exhibits bias, as the mean of the predicted data (in
red) is “off-set” from the mean of true data (in blue) for most of the plot
◦ The ARIMA(7,1,1) prediction coincides with the original data in terms of
the mean for the whole plot, but exhibits large variability (which do you prefer?)

c D. P. Mandic Statistical Signal Processing & Inference 39


Desired: Minimum variance unbiased (MVU) estimator
Minimising the variance of an unbiased estimator concentrates the PDF of
the error about zero # estimation error is therefore less likely to be large
◦ Existence of the MVU estimator
[Figure: var(θ̂) plotted against θ for three unbiased estimators θ̂₁, θ̂₂, θ̂₃. Left: θ̂₃ has the lowest variance for every θ, so θ̂₃ is the MVU estimator. Right: no single estimator has uniformly minimum variance, so no MVU estimator exists.]

The MVU estimator is an unbiased estimator with minimum


variance for all θ, that is, θ̂3 on the plot above

c D. P. Mandic Statistical Signal Processing & Inference 40


Extensions to the vector parameter case
◦ If θ = [θ1, θ2, . . . , θp]^T ∈ R^{p×1} is a vector of unknown parameters,
then a vector parameter estimator is said to be unbiased if
E(θ̂i) = θi that is, every θi is unbiased, for i = 1, 2, . . . , p
By defining E(θ) = [E(θ1), E(θ2), . . . , E(θp)]^T,
an unbiased vector parameter estimator has the property
E(θ̂) = θ

within the p–dimensional space of parameters spanned by


θ = [θ1, . . . , θp]T .

◦ An MVU estimator has the additional property that its var(θˆi), for
i = 1, 2, . . . , p, is the minimum among all unbiased estimators.
c D. P. Mandic Statistical Signal Processing & Inference 41
Multivariate inference often helps (see also Lecture 2)
For a rigorous account of multivariate inference, see Lecture 4

Apple stock prediction using a vector autoregressive VAR(5) model (Apple


as one variate and 4 other stocks from S&P 500 as other variates)
c D. P. Mandic Statistical Signal Processing & Inference 42
Methods to find the MVU estimator
The MVU estimator may not always exist, for example, when:
◦ There are no unbiased estimators, in which case a search for the MVU is futile
◦ None of the unbiased estimators has uniformly minimum variance, as on
the right hand side of the figure on Slide 40
If the MVU estimator (MVUE) exists, we may not always be able to find
it. While there is no general “turn-the-crank” method for this purpose,
the approaches to finding the MVUE employ the following procedures:
◦ Determine the Cramer-Rao lower bound (CRLB) and find some
estimator which satisfies the so defined MVU criteria (Lecture 4)
◦ Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem (rare in pract.)
◦ Restrict the class of estimators to be not only unbiased, but also linear
in the parameters, this gives MVU for linear problems (Lecture 5)
◦ Employ optimisation and prior knowledge about the model (Lecture 6)
◦ Drop all assumptions, employ real–time adaptive estimation schemes
and perform on-line inference on streaming data (Lecture 7)
c D. P. Mandic Statistical Signal Processing & Inference 43
Summary
◦ We are now equipped with performance metrics for assessing the
goodness of any estimator (bias, variance, MSE).
◦ Since MSE = var + bias2, some biased estimators may yield low MSE.
However, we prefer minimum variance unbiased (MVU) estimators.
◦ Even a simple Sample Mean estimator is an example of the power of
statistical estimators.
◦ The knowledge of the parametrised PDF p(data;parameters) is very
important for designing efficient estimators.
◦ We have introduced statistical “point estimators”, would it be useful to
also know the “confidence” we have in our point estimate? (Bayesian est.)
◦ In many disciplines it is useful to design so called “set membership
estimates”, where the output of an estimator belongs to a pre-defined
bound (range) of values.
◦ In our SSPI course, we will address linear, best linear unbiased, maximum
likelihood, least squares, sequential least squares, and adaptive estimators.

c D. P. Mandic Statistical Signal Processing & Inference 44


Homework: Check another proof for the MSE expression
MSE(θ̂) = var(θ̂) + bias²(θ̂)

Note: var(x) = E[x²] − (E[x])²   (*)
Idea: Let x = θ̂ − θ and substitute into (*), to give

var(θ̂ − θ) = E[(θ̂ − θ)²] − (E[θ̂ − θ])²   (**)
   term (1)       term (2)        term (3)

Let us now evaluate these terms:

(1) var(θ̂ − θ) = var(θ̂)
(2) E[(θ̂ − θ)²] = MSE
(3) (E[θ̂ − θ])² = (E[θ̂] − E[θ])² = (E[θ̂] − θ)² = bias²(θ̂)

Substitute (1), (2), (3) into (**) to give

var(θ̂) = MSE − bias²(θ̂)  ⇒  MSE = var(θ̂) + bias²(θ̂)

c D. P. Mandic Statistical Signal Processing & Inference 45


Recap: Unbiased estimators
Due to the linearity property of the statistical expectation operator, E {·},
that is

E{a + b} = E{a} + E{b}

the sample mean estimator can be shown to be unbiased, i.e.


E{Â} = (1/N) Σ_{n=0}^{N−1} E{x[n]} = (1/N) Σ_{n=0}^{N−1} A = A

◦ In some applications, the value of A may be constrained to be positive.


For example, the value of an electronic component such as an inductor,
capacitor or resistor would be positive (prior knowledge).
◦ For N data points in i.i.d. random noise, unbiased estimators generally
have symmetric PDFs centred about their true value, that is

 ∼ N (A, σ 2/N )

c D. P. Mandic Statistical Signal Processing & Inference 46


Appendix: Some usual assumptions in the analysis
How realistic are the assumptions on the noise?
◦ Whiteness of the noise is quite realistic to assume, unless the evidence
or physical insight suggest otherwise
◦ The independent identically distributed (i.i.d.) assumption is
straightforward to implement through e.g. the weighting matrix
W = diag(1/σ₀², . . . , 1/σ²_{N−1})   (see Lectures 5 and 6)
◦ In real world scenarios, we often deal with e.g. bandpass or correlated
noise (e.g. so called pink or 1/f noise in physiological recordings)
◦ The assumption of Gaussianity is often realistic to keep, due to e.g. the
validity of Central Limit Theorem, or an appropriate data transformation
Is the zero–mean assumption realistic? Yes, as even for non–zero mean
noise, w[n] = wzm[n] + µ, where wzm[n] is zero–mean noise, the mean of
the noise µ can be incorporated into the signal model.
Do we always need to know noise variance? In principle not, but when
assessing performance (goodness), variance is needed to measure the SNR.

c D. P. Mandic Statistical Signal Processing & Inference 47


Appendix. Example 12: A counter-example # a little
bias can help (but the estimator is difficult to control)
Q: Let {y[n]}, n = 1, . . . , N be iid Gaussian variables ∼ N (0, σ 2).
Consider the following estimate of σ 2 (The Stoica-Moses counter-example)
σ̂² = (α/N) Σ_{n=1}^{N} y²[n],   α > 0

Find α which minimises the MSE of σ̂².
A: It is straightforward to show that E{σ̂²} = ασ² and

MSE(σ̂²) = E{(σ̂² − σ²)²} = E{σ̂⁴} + σ⁴(1 − 2α)
        = (α²/N²) Σ_{n=1}^{N} Σ_{s=1}^{N} E{y²[n] y²[s]} + σ⁴(1 − 2α)   (Hint: (Σₙ)² = Σₙ Σₛ)
        = (α²/N²) [N²σ⁴ + 2Nσ⁴] + σ⁴(1 − 2α) = σ⁴ [ α²(1 + 2/N) + (1 − 2α) ]

The MMSE is obtained for α_min = N/(N + 2) and is MMSE(σ̂²) = 2σ⁴/(N + 2).
Given that the corresponding σ̂ 2 of an optimal unbiased estimator
(CRLB, later) is 2σ 4/N , this is an example of a biased estimator
which obtains a lower MSE than the CRLB.
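
A minimal Monte Carlo sketch, assuming NumPy, which compares the MSE of the unbiased choice α = 1 with that of α_min = N/(N + 2) (σ² = 2 and N = 10 are illustrative).

import numpy as np

rng = np.random.default_rng(6)
N, sigma2, trials = 10, 2.0, 200000
y = np.sqrt(sigma2) * rng.standard_normal((trials, N))
sum_sq = (y**2).sum(axis=1)                      # Σ y²[n] for each trial

for alpha in (1.0, N / (N + 2)):
    sig2_hat = alpha * sum_sq / N                # the estimator σ̂² = (α/N) Σ y²[n]
    mse = np.mean((sig2_hat - sigma2) ** 2)
    print(f"alpha = {alpha:.4f}: Monte Carlo MSE = {mse:.4f}")

print("theory, alpha = 1 (unbiased):     2σ⁴/N     =", 2 * sigma2**2 / N)
print("theory, alpha = N/(N+2) (biased): 2σ⁴/(N+2) =", 2 * sigma2**2 / (N + 2))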

c D. P. Mandic Statistical Signal Processing & Inference 48


Appendix (full analysis of Example 4)
Biased estimator:

Ã = (1/N) Σ_{n=0}^{N−1} |x[n]|

Therefore,
◦ if A ≥ 0, then |x[n]| = x[n], and E{Ã} = A
◦ if A < 0, then E{Ã} ≠ A

⇒ Bias = 0 for A ≥ 0, and Bias ≠ 0 for A < 0

c D. P. Mandic Statistical Signal Processing & Inference 49


Appendix: Some Tips and Tricks
Let us show that the expected value of the sample variance estimator is ((N − 1)/N) σ², i.e. that this estimator is biased.

Assume that the mean of a random variable x is µ = 0. Then, by def.


µ̂ = (1/N) Σ_{n=0}^{N−1} x[n],    σ̂² = (1/N) Σ_{n=0}^{N−1} (x[n] − µ̂)²

Upon applying the statistical expectation operator, E{·}, we have


E{σ̂²} = (1/N) Σ_{n=0}^{N−1} E{(x[n] − µ̂)²} = (1/N) Σ_{n=0}^{N−1} [ E{x²[n]} − 2 E{µ̂ x[n]} + E{µ̂²} ]

where, for zero-mean, uncorrelated samples,

E{x²[n]} = σ²
E{µ̂ x[n]} = E{ (1/N) Σ_{j=0}^{N−1} x[j] x[n] } = (1/N) σ²   (only the j = n term survives)
E{µ̂²} = (1/N²) Σ_{n=0}^{N−1} Σ_{j=0}^{N−1} E{x[n] x[j]} = (1/N²) N σ² = σ²/N

⇒ E{σ̂²} = (1/N) Σ_{n=0}^{N−1} [ σ² − (2/N) σ² + (1/N) σ² ] = σ² − σ²/N = ((N − 1)/N) σ²
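
A minimal Monte Carlo sketch, assuming NumPy, which confirms that E{σ̂²} ≈ ((N − 1)/N) σ² for this estimator (N = 5 and σ² = 1 are illustrative):

import numpy as np

rng = np.random.default_rng(7)
N, sigma2, trials = 5, 1.0, 200000

x = np.sqrt(sigma2) * rng.standard_normal((trials, N))   # zero-mean Gaussian samples
mu_hat = x.mean(axis=1, keepdims=True)
sig2_hat = np.mean((x - mu_hat) ** 2, axis=1)            # σ̂² = (1/N) Σ (x[n] − µ̂)²

print(f"Monte Carlo E{{σ̂²}} ≈ {sig2_hat.mean():.4f}")
print(f"theory ((N-1)/N)·σ²  = {(N - 1) / N * sigma2:.4f}")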

c D. P. Mandic Statistical Signal Processing & Inference 50


Appendix: Tschebycheff (or Chebyshev) inequality, Slide 25
We do not need full knowledge of pdf, just knowledge of the mean and variance

It provides an upper bound to the probability of the absolute deviation of a


random variable (RV) exceeding a given threshold (probability of rare
events). One way to prove it is through the Markov inequality.
Markov inequality: For a positive RV, X, and a positive threshold, ε, the
following holds:

P(X ≥ ε) ≤ E{X}/ε = µ/ε
For a positive X, this property holds almost surely (with probability one).
Example 13: An average salary in a company is 40,000 GBP. What is the
probability of a given person having salary greater than 100,000 GBP?
Answer: Markov’s inequality gives the upper bound on this probability
P(X ≥ 100,000) ≤ 40,000 / 100,000 = 1/2.5
Markov’s inequality can be used to prove that mean square
convergence implies convergence in probability, and also to prove
Chebyshev’s inequality. (see Lecture 1 for more detail)

c D. P. Mandic Statistical Signal Processing & Inference 51


Appendix: Tschebycheff (or Chebyshev) inequality, Slide 25
We do not need full knowledge of pdf, just knowledge of the mean and variance

Chebyshev’s inequality: Consider a random variable, X, with a finite


mean µ and a finite variance σ², and a positive number ε. Then

P(|X − µ| ≥ ε) ≤ σ²/ε²
To arrive at the Chebyshev inequality on Slide 25, substitute
X → θ̂, µ → θ, σ² → var{θ̂_N} into the Markov inequality. The proof
follows immediately.
Example 14: An average salary in a company is 40,000 GBP, with a stand.
dev. of 20,000 GBP. What is the probability of a given person having
salary which is either less than 10,000 GBP or greater than 70,000 GBP?
Answer: This probability cannot be computed exactly, however,
Chebyshev’s inequality will give an upper bound to this probability.
We are looking for the bound on |X − µ| ≥ ε, with µ = 40,000 and
ε = 30,000. The probability of this happening is then

P(|X − µ| ≥ ε) ≤ σ²/ε² = 400,000,000 / 900,000,000 = 4/9

For more details, examples, and proofs, see Lecture 1.


c D. P. Mandic Statistical Signal Processing & Inference 52
Appendix: A note on generating correlated noise
Let η ∼ N(0, 1) and let r be the desired correlation coefficient. Then, the
correlated noise, w[n], with correlation coefficient r, can be generated as

w[n + 1] = r × w[n] + √(1 − r²) × η[n + 1],   with w[0] = η[0]

(the correlated noise, w[n], is Gaussian as it is a sum of Gaussians)
Assume now an autocorrelation function of the form r(k) = σ² e^{−|k|/τ},
which is exponentially decaying and is governed by τ.
Then, the corresponding power spectral density behaves as PSD_{pink noise}(f) ∝ 1/f^α.
For α = 1, we refer to such a signal as pink noise, or “1/f noise”.
[Figure: exponentially correlated Gaussian samples (white vs. pink noise), their autocorrelation functions, and their power spectral densities; the pink noise PSD decays with frequency while the white noise PSD is flat.]
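
A minimal sketch of this noise generator, assuming NumPy: the recursion above is applied sample by sample, and the sample autocorrelation is compared with r^|k|.

import numpy as np

rng = np.random.default_rng(5)
N, r = 100000, 0.95                 # number of samples, desired correlation coefficient

eta = rng.standard_normal(N)        # η ~ N(0, 1), white driving noise
w = np.empty(N)
w[0] = eta[0]
for n in range(N - 1):
    # w[n+1] = r*w[n] + sqrt(1 - r^2)*η[n+1]  -> unit-variance, exponentially correlated
    w[n + 1] = r * w[n] + np.sqrt(1 - r**2) * eta[n + 1]

# The sample autocorrelation should decay roughly as r^|k|
for k in (0, 1, 5, 10):
    acf = np.mean(w[: N - k] * w[k:])
    print(f"lag {k:2d}: sample ACF = {acf:.3f},  r^k = {r**k:.3f}")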
c D. P. Mandic Statistical Signal Processing & Inference 53
Notes

c D. P. Mandic Statistical Signal Processing & Inference 54


Notes

c D. P. Mandic Statistical Signal Processing & Inference 55


Notes

c D. P. Mandic Statistical Signal Processing & Inference 56
