Unit 3 - Estimation And Prediction
X1, X2, X3, . . . Xn have joint density denoted fθ(x1, x2, . . . , xn) = f(x1, x2, . . . , xn|θ).
Given observed values X1 = x1, X2 = x2, . . . , Xn = xn, the likelihood of θ is the function lik(θ) =
f(x1, x2, . . . , xn|θ) considered as a function of θ. If the distribution is discrete, f will be the
frequency distribution function. In words: lik(θ) = the probability of observing the given data, viewed as a
function of θ.
Definition: The maximum likelihood estimate (mle) of θ is the value of θ that maximises
lik(θ): it is the value that makes the observed data the "most probable". If the Xi are iid, then the
likelihood simplifies to the product
lik(θ) = f(x1|θ) f(x2|θ) · · · f(xn|θ).
Rather than maximising this product, which can be quite tedious, we often use the fact
that the logarithm is an increasing function, so it is equivalent to maximise the log
likelihood
l(θ) = Σ (i = 1 to n) log f(xi|θ).
The maximum-likelihood (ML) estimate θ^ML is obtained as the parameter vector that
maximises the likelihood function fY|Θ(y|θ). The ML estimator corresponds to a Bayesian
estimator with a notch-shaped cost function and a uniform parameter prior pdf:
θ^ML = arg max over θ of fY|Θ(y|θ).
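As a minimal sketch of this maximisation (assuming, for illustration only, an exponential model f(x|θ) = θ exp(−θx) and synthetic data, neither of which comes from these notes), the log likelihood can be maximised numerically and compared with the analytic MLE 1/x̄:

import numpy as np

def log_likelihood(theta, x):
    # l(theta) = sum_i log f(x_i | theta) with f(x|theta) = theta * exp(-theta * x)
    return np.sum(np.log(theta) - theta * x)

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0 / 2.5, size=1000)   # synthetic data, true theta = 2.5

thetas = np.linspace(0.1, 10.0, 1000)             # grid search over candidate theta values
ll = np.array([log_likelihood(t, x) for t in thetas])
theta_ml = thetas[np.argmax(ll)]
print(theta_ml, 1.0 / x.mean())                   # both close to the true value 2.5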
EFFICIENCY OF AN ESTIMATOR
Different parameter estimators produce different results depending on the estimation method, the
utilisation of the observations, and the influence of the prior information. Due to the randomness of
the observations, even the same estimator produces different results with different
observations from the same process. Therefore an estimate is itself a random variable: it has a
mean and a variance, and it may be described by a probability density function. However, for
most cases it is sufficient to characterise an estimator in terms of the mean and the variance of the
estimation error. The most commonly used performance measures for an estimator are the following:
1. Expected value of estimate: E[θ̂]
2. Bias of the estimate: E[θ̂] − θ
Optimal estimators aim for zero bias and minimum estimation error covariance. The desirable
properties of an estimator can be listed as follows:
1. Unbiased estimator: an estimator θ̂ of θ is unbiased if the expectation of the estimate is
equal to the true parameter value: E[θ̂] = θ.
An estimator is asymptotically unbiased if, as the length of observations N increases, E[θ̂] → θ.
2. Efficient estimator: an unbiased estimator θ̂ of θ is an efficient estimator if it has the smallest
covariance matrix compared with all other unbiased estimates of θ: Cov[θ̂ efficient] ≤ Cov[θ̂] for
any other unbiased estimator θ̂.
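As a small empirical sketch of bias (the Gaussian model, sample size and number of trials are assumptions made purely for illustration), the two usual variance estimators can be compared over repeated experiments:

import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0
N, trials = 10, 20000

biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(0.0, np.sqrt(true_var), size=N)
    biased.append(np.var(x))             # divides by N   -> biased for finite N
    unbiased.append(np.var(x, ddof=1))   # divides by N-1 -> unbiased

print(np.mean(biased) - true_var)        # negative bias, roughly -true_var/N
print(np.mean(unbiased) - true_var)      # approximately zero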
The least squares solution, for input matrix X and output vector y, is
w = (XTX)−1 XTy.
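As a quick sketch (the matrix X, weight vector and noise level below are illustrative assumptions), the normal-equation form can be checked against numpy's least-squares solver:

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))              # 100 observations, 3 inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(100)

w_normal = np.linalg.inv(X.T @ X) @ X.T @ y    # normal-equation form (X^T X)^-1 X^T y
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_normal, w_lstsq)                       # both close to w_true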
The FIR least mean squares (LMS) filter is related to the Wiener filter, but minimizing its error
criterion does not rely on cross-correlations or auto-correlations, and its solution converges to the
Wiener filter solution. Most linear adaptive filtering problems can be formulated with the standard
adaptive-filtering block diagram: an unknown system h(n) is to be identified, and the adaptive filter
attempts to adapt the filter h^(n) to make it as close as possible to h(n), while using only the
observable signals x(n), d(n) and e(n); the signals y(n), v(n) and h(n) are not directly observable.
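A minimal sketch of such an LMS identification loop is given below; the unknown system h, the step size mu and the noise level are illustrative assumptions rather than values from these notes:

import numpy as np

rng = np.random.default_rng(3)
h = np.array([0.5, -0.3, 0.1])              # unknown system (not observable in practice)
N, L, mu = 5000, 3, 0.01                    # samples, filter taps, step size

x = rng.standard_normal(N)                  # observable input signal
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)   # desired signal d(n)

w = np.zeros(L)                             # adaptive filter estimate h^(n)
for n in range(L, N):
    xn = x[n:n - L:-1]                      # L most recent input samples, newest first
    e = d[n] - w @ xn                       # error e(n) = d(n) - y(n)
    w = w + mu * e * xn                     # LMS coefficient update
print(w)                                    # converges towards h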
WIENER FILTERS
For the FIR filter structure, the coefficient values in W(n) that minimize JMSE(n)
are well-defined if the statistics of the input and desired response signals are known. The
formulation of this problem for continuous-time signals and the resulting solution was first
derived by Wiener. Hence, this optimum coefficient vector WMSE(n) is often called the Wiener
solution to the adaptive filtering problem. The extension of Wiener's analysis to the discrete-time
case is attributed to Levinson. To determine WMSE(n), we note that the function JMSE(n) is
quadratic in the parameters {wi(n)}, and the function is also differentiable. Thus, we can use a
result from optimization theory which states that the derivative of a smooth cost function with
respect to each of the parameters is zero at a minimizing point on the cost function error surface.
Thus, WMSE(n) can be found from the solution to the system of equations
∂JMSE(n)/∂wi(n) = 0, 0 ≤ i ≤ L − 1.
Taking derivatives of JMSE(n) = E{e2(n)} with respect to wi(n), we obtain
∂JMSE(n)/∂wi(n) = −2E{e(n)x(n − i)} = −2E{d(n)x(n − i)} + 2 Σj wj(n)E{x(n − j)x(n − i)},
where we have used the definitions of e(n) and of y(n) for the FIR filter structure to expand the
last result. By defining the matrix RXX(n) and vector PdX(n) as
RXX(n) = E{X(n)XT(n)} and PdX(n) = E{d(n)X(n)},
we can combine both equations to obtain the system of equations in vector form as
RXX(n)WMSE(n) − PdX(n) = 0, where 0 is the zero vector.
Thus, so long as the matrix RXX(n) is invertible, the optimum Wiener solution vector for
this problem is WMSE(n) = RXX−1(n)PdX(n).
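The following sketch estimates RXX and PdX from synthetic data and solves for the Wiener coefficients; the generating system, filter length and signal length are assumptions made for illustration:

import numpy as np

rng = np.random.default_rng(4)
h = np.array([0.7, 0.2, -0.1])                   # system generating d(n) from x(n)
N, L = 20000, 3
x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.05 * rng.standard_normal(N)

# Stack the input vectors X(n) = [x(n), x(n-1), ..., x(n-L+1)]^T as rows
Xmat = np.column_stack([x[L - 1 - k:N - k] for k in range(L)])
dvec = d[L - 1:]

Rxx = Xmat.T @ Xmat / len(dvec)                  # sample estimate of E{X(n)X^T(n)}
Pdx = Xmat.T @ dvec / len(dvec)                  # sample estimate of E{d(n)X(n)}
W_mse = np.linalg.solve(Rxx, Pdx)                # Wiener solution R^-1 P
print(W_mse)                                     # close to h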
KALMAN FILTER
The basic mechanism of the Kalman filter is to estimate the signal recursively by a relation of the
form
x̂[n] = A x̂[n-1] + Kn y[n]
The Kalman filter is also based on the innovation representation of the signal.
The simplest Kalman filter uses the first-order AR signal model
x[n] = a x[n-1] + w[n]
where w[n] is a white noise sequence.
The observed data is given by
y[n] = x[n] + v[n]
where v[n] is another white noise sequence independent of the signal.
A general stationary signal is modeled by a difference equation representing the ARMA(p, q)
model. Such a signal can be modeled by the state-space model
x[n] = A x[n-1] + B w[n]
The observations can be represented as a linear combination of the 'states' and the observation
noise:
y[n] = c′ x[n] + v[n]
Both equations correspond directly to the state-space model used in control systems, where the
'unobservable' states of the system are estimated through an observer that performs well against
noise.
KALMAN FILTER ALGORITHM
At each time step the filter first predicts the state and its error covariance from the signal model,
then computes the Kalman gain (which involves inverting the innovation covariance) and corrects
the prediction using the innovation, i.e. the difference between the new observation y[n] and its
prediction.
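A minimal sketch of this recursion for the scalar AR(1) model above is given below; the values of a, the noise variances Q and R, and the initial conditions are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(5)
a, Q, R, N = 0.9, 0.1, 0.5, 200                # AR coefficient, process and measurement noise variances

# Simulate the signal x[n] = a x[n-1] + w[n] and the observations y[n] = x[n] + v[n]
x = np.zeros(N)
y = np.zeros(N)
for n in range(1, N):
    x[n] = a * x[n - 1] + np.sqrt(Q) * rng.standard_normal()
    y[n] = x[n] + np.sqrt(R) * rng.standard_normal()

# Kalman recursion: predict, compute the gain, correct with the innovation
x_hat, P = 0.0, 1.0
est = np.zeros(N)
for n in range(N):
    x_pred = a * x_hat                         # state prediction
    P_pred = a * P * a + Q                     # prediction error variance
    K = P_pred / (P_pred + R)                  # Kalman gain K_n
    x_hat = x_pred + K * (y[n] - x_pred)       # correction with innovation y[n] - x_pred
    P = (1.0 - K) * P_pred                     # updated error variance
    est[n] = x_hat

print(np.mean((est - x) ** 2), np.mean((y - x) ** 2))   # filtered MSE is below the raw observation MSE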
LINEAR PREDICTION
In the filtering problem, we use the L-most recent samples x(n), x(n − 1), . . . , x(n − L + 1) and
estimate the value of the reference signal at time n. The idea behind a forward linear prediction is
to use a certain set of samples x(n −1), x(n −2), . . . to estimate (with a linear combination) the
value x(n +k) for k ≥ 0. On the other hand, in a backward linear prediction (also known
as smoothing) the set of samples x(n), x(n − 1), . . . , x(n − M + 1) is used to linearly estimate the
value x(n − k) for k ≥ M.
To find the optimum forward filter wf,L = [wf,1, wf,2, . . . , wf,L]T we minimize the MSE. The input
correlation matrix would be
Rx = E[x(n − 1)xT(n − 1)], the L × L Toeplitz matrix whose (i, j) entry is rx(i − j),
where rx(k) is the autocorrelation function for lag k of the WSS input process and
x(n − 1) = [x(n − 1), x(n − 2), . . . , x(n − L)]T.
As for the cross correlation vector, the desired signal would be x(n), so
rf = E[x(n − 1)x(n)] = [rx(1), rx(2), . . . , rx(L)]T.
As wf,L will be the Wiener filter, it satisfies the modified Wiener–Hopf equation
Rx wf,L = rf
In addition, the forward prediction error power is
Pf,L = rx(0) − rfT wf,L
Putting the equations together gives the augmented Wiener–Hopf equation
Rx,L+1 aL = [Pf,L 0LT]T, where aL = [1 −wf,LT]T.
In fact the block matrix Rx,L+1 on the left-hand side is the autocorrelation
matrix of the (L + 1) × 1 input vector [x(n), x(n − 1), . . . , x(n − L)]T. When this vector passes
through the filter aL it produces the forward linear prediction error as its output. For this reason,
aL is known as the forward prediction error filter. Now, in order to estimate x(n − i) we might use
only the (L − i)-most recent samples, leading to a prediction error ef,L−i(n − i).
But the orthogonality principle tells us that when using the optimum forward filter,
E[ef,L(n)x(n − 1)] = 0L×1.
Then, we can see that E[ef,L(n) ef,L−i(n − i)] = 0 for 1 ≤ i ≤ L.
Therefore, we see that as L → ∞, E[ef(n)ef(n − i)] = 0 for i ≠ 0, which means that the
sequence of forward errors ef(n) is asymptotically white. In other words, a sufficiently long
forward prediction error filter is capable of whitening a stationary discrete-time stochastic
process applied to its input.
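The sketch below (using an assumed AR(2) process as the WSS input, which is not taken from these notes) solves Rx wf,L = rf for the forward predictor, computes Pf,L, and checks that the prediction error is approximately white:

import numpy as np

rng = np.random.default_rng(6)
N, L = 50000, 2
x = np.zeros(N)
for n in range(2, N):                           # AR(2) test signal
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + rng.standard_normal()

def autocorr(s, k):                             # sample autocorrelation at lag k
    return np.mean(s[k:] * s[:len(s) - k])

r = np.array([autocorr(x, k) for k in range(L + 1)])
Rx = np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])   # Toeplitz Rx
rf = r[1:L + 1]                                 # cross-correlation vector rf
wf = np.linalg.solve(Rx, rf)                    # forward predictor coefficients
Pf = r[0] - rf @ wf                             # forward prediction error power

e = x[L:] - wf[0] * x[L - 1:-1] - wf[1] * x[L - 2:-2]   # forward prediction error
print(wf, Pf)                                   # wf close to [0.75, -0.5], Pf close to 1
print(autocorr(e, 1) / autocorr(e, 0))          # near zero: the error is nearly white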
BACKWARD LINEAR PREDICTION
In this case we start by trying to estimate x(n − L) based on the next L samples, so the
backward linear prediction error can be put as
eb,L(n) = x(n − L) − wb,LT x(n), with x(n) = [x(n), x(n − 1), . . . , x(n − L + 1)]T.
To find the optimum backward filter wb,L = [wb,1, wb,2, . . . , wb,L]T we minimize the MSE. Following
a similar procedure as before to solve the Wiener filter, the augmented Wiener–Hopf equation
has the form
Rx,L+1 bL = [0LT Pb,L]T, where bL = [−wb,LT 1]T and Pb,L is the backward prediction error power.
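As a sketch (again with an assumed AR(2) input), the backward predictor can be obtained from the same Toeplitz matrix as the forward one; for a real WSS process it turns out to be the time-reversed forward predictor:

import numpy as np

rng = np.random.default_rng(7)
N, L = 50000, 2
x = np.zeros(N)
for n in range(2, N):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + rng.standard_normal()

r = np.array([np.mean(x[k:] * x[:N - k]) for k in range(L + 1)])       # autocorrelation lags
Rx = np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])

rf = r[1:L + 1]                      # forward cross-correlation  [r(1), ..., r(L)]
rb = r[1:L + 1][::-1]                # backward cross-correlation [r(L), ..., r(1)]
wf = np.linalg.solve(Rx, rf)         # forward predictor
wb = np.linalg.solve(Rx, rb)         # backward predictor
print(wf, wb[::-1])                  # wb is wf reversed for a real WSS input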
The Levinson–Durbin algorithm is the most popular technique for determining the LPC parameters
from a given autocorrelation sequence. Consider the Yule–Walker equations for an m-th order linear
predictor.
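A sketch of the recursion is given below; the autocorrelation values used in the example are assumed for illustration, and the result is checked against a direct solve of the Yule–Walker system:

import numpy as np

def levinson_durbin(r, m):
    # Given autocorrelations r[0..m], return the m-th order predictor coefficients a
    # (with x^[n] = sum_j a[j] x[n-1-j]) and the prediction error power E.
    a = np.zeros(m)
    E = r[0]                                    # zeroth-order prediction error power
    for i in range(m):
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / E   # reflection coefficient
        a[:i] = a[:i] - k * a[:i][::-1]         # order-update of the earlier coefficients
        a[i] = k
        E = E * (1.0 - k * k)                   # updated prediction error power
    return a, E

r = np.array([2.0, 1.2, 0.4, 0.1])              # assumed autocorrelation lags r(0)..r(3)
m = 3
a_ld, E = levinson_durbin(r, m)

Rm = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])   # Toeplitz matrix
a_direct = np.linalg.solve(Rm, r[1:m + 1])      # direct Yule-Walker solve
print(a_ld, a_direct, E)                        # the two coefficient vectors agree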