The document discusses various statistical techniques for estimation and prediction, including:
- Maximum likelihood estimation, which selects the parameters that make the observed data most probable.
- Efficiency of estimators, which aim for zero bias and minimum error; optimal estimators are unbiased, efficient and consistent.
- The least mean square error criterion, which minimises the mean squared error between estimates and true values.
- Wiener filters, which provide optimal linear estimates of a desired signal by minimising the mean square error.
- The Kalman filter, which provides recursive state estimates by minimising the mean square error using a predictor-corrector mechanism.


UNIT 3 – ESTIMATION AND PREDICTION

MAXIMUM LIKELIHOOD ESTIMATION

Let X1, X2, . . . , Xn have joint density denoted fθ(x1, x2, . . . , xn) = f(x1, x2, . . . , xn | θ).
Given observed values X1 = x1, X2 = x2, . . . , Xn = xn, the likelihood of θ is the function lik(θ) =
f(x1, x2, . . . , xn | θ) considered as a function of θ. If the distribution is discrete, f will be the
frequency (probability mass) function. In words: lik(θ) is the probability of observing the given data,
viewed as a function of θ.

Definition: The maximum likelihood estimate (MLE) of θ is the value of θ that maximises
lik(θ): it is the value that makes the observed data the "most probable". If the Xi are iid, then the
likelihood simplifies to the product

lik(θ) = f(x1 | θ) f(x2 | θ) · · · f(xn | θ)

Rather than maximising this product, which can be quite tedious, we often use the fact
that the logarithm is an increasing function, so it is equivalent to maximise the log-likelihood:

l(θ) = log lik(θ) = log f(x1 | θ) + log f(x2 | θ) + · · · + log f(xn | θ)

The maximum-likelihood (ML) estimate θ̂ML is obtained as the parameter vector that
maximises the likelihood function fY|Θ(y|θ). The ML estimator corresponds to a Bayesian
estimator with a notch-shaped cost function and a uniform parameter prior pdf:

θ̂ML(y) = arg maxθ fY|Θ(y|θ) fΘ(θ) = arg maxθ fY|Θ(y|θ)

where the prior function fΘ(θ) = const.


From a Bayesian perspective the main difference between the ML and MAP estimators is that
the ML estimator assumes that the prior pdf of θ is uniform. Note that a uniform prior, in
addition to modelling genuinely uniform pdfs, is also used when the parameter prior pdf is
unknown, or when the parameter is an unknown constant. Minimisation of the risk function is
achieved by maximisation of the likelihood function.

In practice it is convenient to maximise the log-likelihood function instead of the likelihood:

θ̂ML(y) = arg maxθ log fY|Θ(y|θ)

The log-likelihood is usually chosen in practice because of the following properties:
1. The logarithm is a monotonic function, and hence the log-likelihood has the same turning
points as the likelihood function.
2. The joint log-likelihood of a set of independent variables is the sum of the log-likelihoods of
the individual variables.
3. Unlike the likelihood function, the log-likelihood has a dynamic range that does not cause
computational underflow.
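
As an illustration of these points, the following minimal Python sketch (the exponential model, sample size and random seed are arbitrary choices for the example, not taken from the text) maximises a log-likelihood numerically and compares the result with the closed-form MLE obtained by setting the derivative of the log-likelihood to zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0 / 2.5, size=1000)   # iid samples, true rate lambda = 2.5
n = len(x)

def log_likelihood(lam):
    # joint log-likelihood of iid Exponential(lam) samples: l(lam) = n*log(lam) - lam*sum(x)
    return n * np.log(lam) - lam * np.sum(x)

# numerical maximisation over a grid of candidate parameter values
grid = np.linspace(0.1, 10.0, 10000)
lam_ml_grid = grid[np.argmax(log_likelihood(grid))]

# closed-form MLE obtained by setting dl/dlam = n/lam - sum(x) = 0
lam_ml_closed = n / np.sum(x)

print(lam_ml_grid, lam_ml_closed)   # both close to the true rate 2.5
```

Because the logarithm is monotonic, the grid maximiser of the log-likelihood coincides with the maximiser of the likelihood itself, while avoiding the underflow that the raw product of 1000 densities would cause.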

EFFICIENCY OF AN ESTIMATOR

In estimation of a parameter vector θ from N observation samples y, a set of performance
measures is used to quantify and compare the characteristics of different estimators. In general
an estimate of a parameter vector is a function of the observation vector y, the length of the
observation N and the process model M. This dependence may be expressed as

θ̂ = f(y, N, M)

Different parameter estimators produce different results depending on the estimation method, the
utilisation of the observation and the influence of the prior information. Due to the randomness of
the observations, even the same estimator would produce different results with different
observations from the same process. Therefore an estimate is itself a random variable: it has a
mean and a variance, and it may be described by a probability density function. However, for
most cases, it is sufficient to characterise an estimator in terms of the mean and the variance of
the estimation error. The most commonly used performance measures for an estimator are the following:
1. Expected value of the estimate: E[θ̂]
2. Bias of the estimate: Δθ̂ = E[θ̂] − θ
3. Covariance of the estimate: Cov[θ̂] = E[(θ̂ − E[θ̂])(θ̂ − E[θ̂])^T]

Optimal estimators aim for zero bias and minimum estimation error covariance. The desirable
properties of an estimator can be listed as follows:
1. Unbiased estimator: an estimator of θ is unbiased if the expectation of the estimate is
equal to the true parameter value:

E[θ̂] = θ

An estimator is asymptotically unbiased if, for increasing length of observation N,

lim N→∞ E[θ̂] = θ

2. Efficient estimator: an unbiased estimator of θ is an efficient estimator if it has the smallest
covariance matrix compared with all other unbiased estimates of θ:

Cov[θ̂Efficient] ≤ Cov[θ̂]

where θ̂ is any other unbiased estimate of θ.


3. Consistent estimator: an estimator is consistent if the estimate improves with increasing
length of observation N, such that the estimate θ̂ converges probabilistically to the true
value θ as N becomes infinitely large:

lim N→∞ P(|θ̂ − θ| > ε) = 0

where ε is arbitrarily small.
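
A small Monte Carlo sketch can make these definitions concrete. Assuming (purely for illustration, not from the text) Gaussian data and the ML variance estimator, which divides by N and is therefore biased but asymptotically unbiased and consistent:

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0

def variance_estimates(N, trials=5000):
    # draw `trials` independent data sets of length N and estimate the variance of each
    y = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))
    return y.var(axis=1, ddof=0)        # ML estimator (divides by N, hence biased)

for N in (10, 100, 1000):
    est = variance_estimates(N)
    bias = est.mean() - true_var        # bias = E[theta_hat] - theta
    spread = est.var()                  # variance of the estimate around its own mean
    print(N, bias, spread)              # both shrink towards 0 as N grows (consistency)
```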

LEAST MEAN SQUARE ERROR CRITERION

The least squares solution, for an input matrix X and output vector y, is

w = (X^T X)^-1 X^T y

The FIR least mean squares (LMS) filter is related to the Wiener filter, but minimising the error
criterion of the former does not rely on cross-correlations or auto-correlations; its solution
converges to the Wiener filter solution. Most linear adaptive filtering problems can be
formulated using the standard system-identification block diagram: an unknown system h(n) is to be
identified, and the adaptive filter attempts to adapt the filter ĥ(n) to make it as close as possible
to h(n), while using only the observable signals x(n), d(n) and e(n); the signals y(n), v(n) and h(n)
are not directly observable. Its solution is closely related to the Wiener filter.
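
A minimal sketch of such a system-identification setup, assuming an illustrative 4-tap unknown system and a fixed step size (both arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
h = np.array([0.5, -0.4, 0.2, 0.1])         # unknown system h(n) to be identified
L = len(h)
mu = 0.01                                   # LMS step size
w = np.zeros(L)                             # adaptive filter h_hat(n), initialised to zero

x = rng.normal(size=5000)                   # observable input x(n)
d = np.convolve(x, h, mode="full")[:len(x)] # desired response d(n) = system output ...
d += 0.01 * rng.normal(size=len(x))         # ... plus a little measurement noise

for n in range(L, len(x)):
    xn = x[n:n-L:-1]                        # most recent L input samples [x(n), ..., x(n-L+1)]
    e = d[n] - w @ xn                       # error e(n) = d(n) - y(n)
    w = w + mu * e * xn                     # LMS update: no explicit correlations needed

print(w)                                    # drifts towards the Wiener solution (close to h)
```

The update w ← w + μ e(n) x(n) uses only the instantaneous error and input, with no correlation estimates, yet it converges towards the Wiener solution.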

WIENER FILTERS
For the FIR filter structure, the coefficient values in W(n) that minimise JMSE(n)
are well defined if the statistics of the input and desired response signals are known. The
formulation of this problem for continuous-time signals, and the resulting solution, was first
derived by Wiener. Hence, this optimum coefficient vector WMSE(n) is often called the Wiener
solution to the adaptive filtering problem. The extension of Wiener's analysis to the discrete-time
case is attributed to Levinson. To determine WMSE(n), we note that the function JMSE(n) is
quadratic in the parameters {wi(n)}, and the function is also differentiable. Thus, we can use a
result from optimisation theory which states that the derivatives of a smooth cost function with
respect to each of the parameters are zero at a minimising point of the cost function error surface.
Thus, WMSE(n) can be found from the solution to the system of equations

∂JMSE(n)/∂wi(n) = 0, 0 ≤ i ≤ L − 1

Taking derivatives of JMSE(n) with respect to the coefficients, we obtain

∂JMSE(n)/∂W(n) = −2E{e(n)X(n)} = −2E{d(n)X(n)} + 2E{X(n)X^T(n)}W(n) = 0

where we have used the definitions of e(n) and of y(n) for the FIR filter structure to expand the
last result. By defining the matrix RXX(n) and vector PdX(n) as
RXX(n) = E{X(n)X^T(n)} and PdX(n) = E{d(n)X(n)}
we can combine both equations to obtain the system of equations in vector form as
RXX(n)WMSE(n) − PdX(n) = 0, where 0 is the zero vector.
Thus, so long as the matrix RXX(n) is invertible, the optimum Wiener solution vector for
this problem is WMSE(n) = RXX^-1(n) PdX(n)
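
A numerical sketch of this solution, replacing the exact expectations RXX and PdX with sample averages over an illustrative data record (the system h, noise level and record length are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
h = np.array([0.5, -0.4, 0.2, 0.1])                  # system generating the desired response
L = len(h)

x = rng.normal(size=20000)                           # stationary input x(n)
d = np.convolve(x, h, mode="full")[:len(x)]          # desired response d(n)
d += 0.05 * rng.normal(size=len(x))                  # observation noise

# input vectors X(n) = [x(n), x(n-1), ..., x(n-L+1)]^T stacked as rows
X = np.array([x[n:n-L:-1] for n in range(L, len(x))])
dn = d[L:]

R_xx = (X.T @ X) / len(dn)                           # sample estimate of E{X(n) X^T(n)}
P_dx = (X.T @ dn) / len(dn)                          # sample estimate of E{d(n) X(n)}

w_mse = np.linalg.solve(R_xx, P_dx)                  # Wiener solution RXX^-1 PdX
print(w_mse)                                         # close to h
```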

IIR WIENER FILTER (NONCAUSAL)

The estimator x̂[n] is given by the two-sided convolution

x̂[n] = Σ k=−∞..∞ h[k] y[n − k]

For the LMMSE estimator, the error is orthogonal to the data:

E[(x[n] − x̂[n]) y[m]] = 0 for all m

This leads to the Wiener–Hopf equation Σk h[k] ryy[m − k] = rxy[m], which here holds for all m and is
therefore simple to analyse.
It is easily solved in the frequency domain: H(e^jω) = Sxy(e^jω) / Syy(e^jω).
The resulting filter is noncausal and not realizable in real time.
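
As a sketch, assume the common special case y[n] = x[n] + v[n] with signal and noise independent, so that Sxy = Sxx, Syy = Sxx + Svv and the noncausal filter reduces to H(e^jω) = Sxx/(Sxx + Svv). The AR(1) signal model and the variances below are arbitrary choices for the example, not from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4096
a, sw, sv = 0.9, 1.0, 1.0                        # AR(1) pole, driving-noise and added-noise variances

w = rng.normal(0, np.sqrt(sw), N)
x = np.zeros(N)
for n in range(1, N):                            # AR(1) signal x[n] = a x[n-1] + w[n]
    x[n] = a * x[n-1] + w[n]
y = x + rng.normal(0, np.sqrt(sv), N)            # noisy observation y[n] = x[n] + v[n]

# power spectra of signal and noise (known here from the model)
omega = 2 * np.pi * np.fft.fftfreq(N)
Sxx = sw / np.abs(1 - a * np.exp(-1j * omega))**2
Svv = sv * np.ones(N)

H = Sxx / (Sxx + Svv)                            # noncausal Wiener filter, applied in the frequency domain
x_hat = np.real(np.fft.ifft(H * np.fft.fft(y)))

print(np.mean((y - x)**2), np.mean((x_hat - x)**2))   # MSE is reduced by the filter
```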

KALMAN FILTER
The basic mechanism in the Kalman filter is to estimate the signal recursively by a relation of
the form
x̂[n] = A x̂[n−1] + Kn y[n]
The Kalman filter is based on the innovation representation of the signal.
The simplest Kalman filter uses the first-order AR signal model
x[n] = ax[n-1]+ w[n]
where w[n] is a white noise sequence.
The observed data is given by
y[n] = x[n] + v[n]
where v[n] is another white noise sequence independent of the signal.
A general stationary signal is modelled by a difference equation representing the ARMA(p, q)
model. Such a signal can be modelled by the state-space model
x[n] = A x[n−1] + B w[n]
The observations are represented as a linear combination of the 'states' and the observation
noise:
y[n] = c′ x[n] + v[n]
Both equations correspond directly to the state-space model used in control systems, where the
'unobservable' states of the system are estimated through an observer that performs well in the
presence of noise.
KALMAN FILTER ALGORITHM
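The algorithm alternates a prediction (time-update) step with a correction (measurement-update) step driven by the innovation y[n] − a x̂[n−1]. A minimal scalar sketch for the first-order model above (the values of a, the noise variances and the initial conditions are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
N, a, q, r = 500, 0.95, 0.1, 1.0        # AR coefficient, Var(w[n]) = q, Var(v[n]) = r

# simulate the signal model x[n] = a x[n-1] + w[n] and observations y[n] = x[n] + v[n]
x = np.zeros(N)
for n in range(1, N):
    x[n] = a * x[n-1] + rng.normal(0, np.sqrt(q))
y = x + rng.normal(0, np.sqrt(r), N)

x_hat, P = 0.0, 1.0                     # initial state estimate and error variance
estimates = np.empty(N)
for n in range(N):
    # predict (time update)
    x_pred = a * x_hat
    P_pred = a * P * a + q
    # correct (measurement update) using the innovation y[n] - x_pred
    K = P_pred / (P_pred + r)           # Kalman gain K_n
    x_hat = x_pred + K * (y[n] - x_pred)
    P = (1 - K) * P_pred
    estimates[n] = x_hat

print(np.mean((y - x)**2), np.mean((estimates - x)**2))   # filtering reduces the MSE
```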

INVERSE FILTER AND WHITENING FILTER

An inverse filter compensates for the distortion introduced by a known filter H(z); ideally its
transfer function is 1/H(z), so that the cascade of the two filters has unit response. A whitening
filter converts a correlated (coloured) input process into an uncorrelated (white) output process;
for a signal modelled as white noise driving a filter H(z), the whitening filter is the inverse
filter 1/H(z). The forward prediction error filter discussed below is one realization of a
whitening filter.
LINEAR PREDICTION

In the filtering problem, we use the L-most recent samples x(n), x(n − 1), . . . , x(n − L + 1) and
estimate the value of the reference signal at time n. The idea behind a forward linear prediction is
to use a certain set of samples x(n −1), x(n −2), . . . to estimate (with a linear combination) the
value x(n + k) for k ≥ 0. On the other hand, in a backward linear prediction (also known
as smoothing) the set of samples x(n), x(n − 1), . . . , x(n − M + 1) is used to linearly estimate the
value x(n − k) for k ≥ M.

FORWARD LINEAR PREDICTION


Firstly, we explore the forward prediction case of estimating x(n) based on the previous L
samples. Since x(n − 1) = [x(n − 1), x(n − 2), . . . , x(n − L)]^T, using a transversal filter w the
forward linear prediction error can be put as

ef,L(n) = x(n) − w^T x(n − 1)

To find the optimum forward filter wf,L = [wf,1, wf,2, . . . , wf,L]^T we minimize the MSE. The input
correlation matrix would be

Rx = E[x(n − 1) x^T(n − 1)]

an L × L Toeplitz matrix whose (i, j) entry is rx(i − j), where rx(k) is the autocorrelation
function for lag k of the WSS input process.
As for the cross-correlation vector, the desired signal would be x(n), so
rf = E[x(n − 1) x(n)] = [rx(1), rx(2), . . . , rx(L)]^T.
As wf,L will be the Wiener filter, it satisfies the modified Wiener–Hopf equation
Rx wf,L = rf
In addition, the forward prediction error power is
Pf,L = rx(0) − rf^T wf,L
Putting the equations together gives the augmented Wiener–Hopf equation

[ rx(0)  rf^T ]        [ Pf,L ]
[ rf     Rx   ] aL  =  [ 0L×1 ]

where aL = [1  −wf,L^T]^T. In fact, the block matrix on the left-hand side is the autocorrelation
matrix of the (L+1)×1 input vector [x(n), x(n−1), . . . , x(n−L)]^T. When this vector passes
through the filter aL it produces the forward linear prediction error as its output. For this reason,
aL is known as the forward prediction error filter. Now, in order to estimate x(n) we might use
only the (L − i)-most recent samples, leading to a prediction error

ef,L−i(n) = x(n) − wf,L−i^T [x(n − 1), . . . , x(n − (L − i))]^T

But the orthogonality principle tells us that, when using the optimum forward filter,
E[ef,L(n) x(n − 1)] = 0L×1
Then, since ef,L−i(n − i) is a linear combination of x(n − i), . . . , x(n − L), we can see that
E[ef,L(n) ef,L−i(n − i)] = 0 for 1 ≤ i ≤ L.

Therefore, we see that as L→∞, E[ef(n) ef(n − i)] = 0 for i ≠ 0, which means that the
sequence of forward errors ef(n) is asymptotically white. This means that a sufficiently long
forward prediction error filter is capable of whitening a stationary discrete-time stochastic
process applied to its input.
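
A brief numerical sketch of this whitening property, using an illustrative AR(2) process and an order-8 forward predictor obtained by solving Rx wf,L = rf from sample autocorrelations (all numerical choices are arbitrary, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
N, L = 20000, 8

# generate a correlated WSS process: an AR(2) model driven by white noise
x = np.zeros(N)
w = rng.normal(size=N)
for n in range(2, N):
    x[n] = 1.2 * x[n-1] - 0.5 * x[n-2] + w[n]

# sample autocorrelation r_x(k) for lags k = 0..L
r = np.array([np.dot(x[:N-k], x[k:]) / N for k in range(L + 1)])

# optimum forward predictor: solve Rx wf = rf with Rx Toeplitz built from r_x(0..L-1)
R = np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])
w_f = np.linalg.solve(R, r[1:L+1])

# forward prediction error e_f,L(n) = x(n) - wf^T [x(n-1), ..., x(n-L)]
e = np.array([x[n] - w_f @ x[n-1:n-1-L:-1] for n in range(L + 1, N)])

# normalised autocorrelation of the error sequence: close to zero for nonzero lags
re = np.array([np.dot(e[:len(e)-k], e[k:]) / len(e) for k in range(5)])
print(re / re[0])
```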
BACKWARD LINEAR PREDICTION
In this case we start by trying to estimate x(n − L) based on the next L samples, so the
backward linear prediction error can be put as

eb,L(n) = x(n − L) − wb,L^T x(n)

where x(n) = [x(n), x(n − 1), . . . , x(n − L + 1)]^T.
To find the optimum backward filter wb,L = [wb,1, wb,2, . . . , wb,L]^T we minimize the MSE. Following
a similar procedure as before to solve the Wiener filter, the augmented Wiener–Hopf equation
has the form

[ Rx     rb    ]        [ 0L×1 ]
[ rb^T   rx(0) ] bL  =  [ Pb,L ]

where rb = E[x(n) x(n − L)] = [rx(L), rx(L − 1), . . . , rx(1)]^T


Pb,L = rx(0) − rb^T wb,L
bL = [−wb,L^T  1]^T is the backward prediction error filter.
Consider now a stack of backward prediction error filters from order 0 to L − 1. If we compute the
errors eb,i(n) for 0 ≤ i ≤ L − 1, it leads to

eb(n) = Tb x(n)

The L×L matrix Tb, which is defined in terms of the backward prediction error filter coefficients,
is lower triangular with 1's along its main diagonal. The transformation is known as Gram–
Schmidt orthogonalization, which defines a one-to-one correspondence between eb(n) and x(n).
(The Gram–Schmidt process is also used for the orthogonalization of a set of linearly independent
vectors in a linear space with a defined inner product.) In this case, the principle of orthogonality
states that
E[eb,i(n) x(n − k)] = 0, 0 ≤ k ≤ i − 1
Then, it is easy to show that, at each time n, the sequence of backward prediction errors of
increasing order {eb,i(n)} will be decorrelated. This means that the autocorrelation matrix of the
backward prediction errors is diagonal. More precisely,
E[eb(n) eb^T(n)] = diag{Pb,i}, 0 ≤ i ≤ L − 1
Another way to get to this result comes from
E[eb(n) eb^T(n)] = Tb Rx Tb^T
By definition, this is a symmetric matrix, and it is easy to show that Rx Tb^T is a lower triangular
matrix with Pb,i being the elements on its main diagonal. However, since Tb is also a lower
triangular matrix, the product of both matrices must retain the same structure. But it has to be
also symmetric, and hence it must be diagonal. Moreover, since the determinant of Tb is 1, it is a
nonsingular matrix. Therefore,
Rx^-1 = Tb^T diag{Pb,i}^-1 Tb = (diag{Pb,i}^-1/2 Tb)^T (diag{Pb,i}^-1/2 Tb)
This is called the Cholesky decomposition of the inverse of the autocorrelation matrix. Notice that
the inverse of the autocorrelation matrix is factorized into the product of an upper triangular and
a lower triangular matrix that are related to each other through a transposition operation. These
matrices are completely determined by the coefficients of the backward prediction error filters
and the backward prediction error powers.
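
A short numerical sketch of this factorisation, assuming (for illustration only) an AR(1) autocorrelation sequence: the backward prediction error filters of orders 0 to L − 1 are stacked into Tb, and Tb Rx Tb^T is checked to be diag{Pb,i}, giving the Cholesky-type factorisation of Rx^-1:

```python
import numpy as np

# exact autocorrelation of an AR(1) process x[n] = a x[n-1] + w[n], Var(w) = 1:
# r_x(k) = a^|k| / (1 - a^2)
a, L = 0.8, 5
r = np.array([a**k / (1 - a**2) for k in range(L)])
R = np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])   # L x L autocorrelation matrix

# build the backward prediction error filters of orders 0..L-1 and stack them into T_b
T_b = np.eye(L)
P_b = np.empty(L)
P_b[0] = r[0]                                   # order-0 "error" is x(n) itself
for i in range(1, L):
    Ri = R[:i, :i]                              # autocorrelation matrix of [x(n), ..., x(n-i+1)]
    rb = r[i:0:-1]                              # cross-correlation with x(n-i): [r(i), ..., r(1)]
    wb = np.linalg.solve(Ri, rb)                # backward Wiener-Hopf equation: Ri wb = rb
    T_b[i, :i] = -wb                            # row i implements e_b,i(n) = x(n-i) - wb^T [...]
    P_b[i] = r[0] - rb @ wb                     # backward prediction error power

D = T_b @ R @ T_b.T                             # should equal diag{P_b,i}
print(np.round(D, 6))
print(np.allclose(np.linalg.inv(R), T_b.T @ np.diag(1 / P_b) @ T_b))  # Cholesky-type factorisation
```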
LATTICE REALIZATION
LEVINSON DURBIN ALGORITHM

The Levinson–Durbin algorithm is the most popular technique for determining the LPC parameters
from a given autocorrelation sequence. Consider the Yule–Walker equations for the mth-order linear
predictor,

Rm wm = rm

where Rm is the m × m autocorrelation matrix with entries rx(i − j) and rm = [rx(1), rx(2), . . . , rx(m)]^T.
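A minimal sketch of the recursion (the autocorrelation values used for the check are an arbitrary positive-definite example, not from the text):

```python
import numpy as np

def levinson_durbin(r, m):
    """Solve the Yule-Walker equations Rm w = rm recursively.
    r: autocorrelation sequence r[0..m]; returns predictor coefficients and error power."""
    a = np.zeros(m + 1)
    E = r[0]                                   # zeroth-order prediction error power
    for i in range(1, m + 1):
        # reflection coefficient k_i computed from the previous-order solution
        k = (r[i] - np.dot(a[1:i], r[i-1:0:-1])) / E
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i-1:0:-1]  # order-update of the predictor coefficients
        a = a_new
        E = (1 - k * k) * E                    # updated prediction error power
    return a[1:], E                            # a[1:] are the LPC coefficients w_1..w_m

# check against a direct solve of Rm w = rm for an example autocorrelation sequence
r = np.array([2.0, 1.2, 0.6, 0.2])
w, E = levinson_durbin(r, 3)
R = np.array([[r[abs(i - j)] for j in range(3)] for i in range(3)])
print(w, np.linalg.solve(R, r[1:4]))           # the two solutions agree
```

The recursion solves the order-m normal equations in O(m^2) operations by updating the order-(i−1) solution with a single reflection coefficient per step, instead of the O(m^3) cost of a direct matrix solve.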
