ADSP Unit 3

Unit III

LINEAR PREDICTION AND FILTERING


Linear Prediction – Forward and Backward - Wiener filters for filtering and prediction – FIR
Wiener Filter – IIR Wiener Filter – Kalman Filter
Linear Prediction

o Linear prediction is a technique for analyzing time series; it allows us to predict future values from historical data. It is often used in digital signal processing because it allows the future values of a signal to be estimated as a linear function of past samples.

o Linear prediction is a signal processing technique where future values of a discrete-time signal are estimated as a
linear function of its past values, often used in speech analysis and synthesis.

o The technique assumes that the current value of a signal can be predicted accurately by considering a linear
combination of its past values.

Applications

o Linear prediction is widely used in speech analysis and synthesis, forming the basis for many speech coding and
compression algorithms.

o It's also used in areas like filter design, spectral analysis, and system identification.

o We can linearly predict the value of a stationary random process in both the forward and backward directions in time.
Forward Linear Prediction

o Let us begin with the problem of predicting a future value of a stationary random process from observation of past values of the process.

o In particular, we consider the one-step forward linear predictor, which forms the prediction of the value 𝑥(𝑛) by a weighted linear
combination of the past values 𝑥(𝑛 − 1), 𝑥(𝑛 − 2), … , 𝑥(𝑛 − 𝑝).

o Hence the linearly predicted value of 𝑥(𝑛) is

$$\hat{x}(n) = -\sum_{k=1}^{p} a_p(k)\,x(n-k) \qquad\rightarrow\ ①$$

o Where the −𝑎𝑝 (𝑘) represent the weights in the linear combination. These weights are called the Prediction Coefficients of the one-
step forward linear predictor of order 𝑝.

o The negative sign in the definition of $\hat{x}(n)$ is for mathematical convenience and conforms with current practice in the technical literature.
o The difference between the value $x(n)$ and the predicted value $\hat{x}(n)$ is called the forward prediction error, denoted $f_p(n)$:

$$f_p(n) = x(n) - \hat{x}(n) = x(n) + \sum_{k=1}^{p} a_p(k)\,x(n-k) \qquad\rightarrow\ ②$$

o We view linear prediction as equivalent to linear filtering where the predictor is embedded in the linear filter, as shown in Figure 11.2.
This is called a prediction error filter with input sequence 𝑥 𝑛 and output sequence 𝑓𝑝 (𝑛) .

o An equivalent realization of the prediction-error filter is shown in Figure 11.3. This realization is a direct-form FIR filter with system function
$$A_p(z) = \sum_{k=0}^{p} a_p(k)\,z^{-k} \qquad\rightarrow\ ③$$

o where, by definition, $a_p(0) = 1$.
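o As an illustration, a minimal NumPy sketch of the prediction-error filter in ② and ③ is given below (the helper name is ours, and it assumes the coefficients $a_p(k)$ are already known):

```python
import numpy as np

def forward_prediction_error(x, a):
    """Forward prediction error f_p(n) = sum_{k=0}^{p} a_p(k) x(n-k), with a_p(0) = 1.

    x : 1-D array of input samples x(n)
    a : prediction coefficients [a_p(1), ..., a_p(p)]
    """
    p = len(a)
    A = np.concatenate(([1.0], np.asarray(a, dtype=float)))   # [1, a_p(1), ..., a_p(p)]
    # keep only the outputs n = p, ..., N-1, for which all past samples exist
    return np.convolve(x, A, mode="full")[p:len(x)]
```

Filtering $x(n)$ with $A_p(z)$ in this way is the direct-form realization of Figure 11.3.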
o The direct-form FIR filter is equivalent to an all-zero lattice filter.

o The lattice filter is generally described by the following set of order-recursive equations:
$$f_0(n) = g_0(n) = x(n)$$
$$f_m(n) = f_{m-1}(n) + K_m\,g_{m-1}(n-1), \qquad m = 1, 2, \ldots, p$$
$$g_m(n) = K_m^*\,f_{m-1}(n) + g_{m-1}(n-1), \qquad m = 1, 2, \ldots, p \qquad\rightarrow\ ④$$

o Where {𝐾𝑚 } are the reflection coefficients and 𝑔𝑚(𝑛) is the backward prediction error.

o Note that for complex valued data, the conjugate of 𝐾𝑚 is used in the equation for 𝑔𝑚(𝑛). Figure 11.4 illustrates a 𝑝-stage lattice filter in
block diagram form along with a typical stage showing the computation given by ④.
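o The order-recursive equations in ④ translate directly into code. Below is one possible sketch (a hypothetical helper of ours, assuming the reflection coefficients $K_m$ are known) that runs the $p$-stage all-zero lattice over an input record:

```python
import numpy as np

def lattice_prediction_errors(x, K):
    """All-zero lattice of eq. ④: returns the forward and backward errors f_p(n), g_p(n).

    x : input samples x(n)
    K : reflection coefficients [K_1, ..., K_p]
    """
    f = np.asarray(x, dtype=complex).copy()          # f_0(n) = x(n)
    g = f.copy()                                     # g_0(n) = x(n)
    for Km in K:
        g_delayed = np.concatenate(([0.0], g[:-1]))  # g_{m-1}(n-1)
        f_new = f + Km * g_delayed                   # f_m(n) = f_{m-1}(n) + K_m g_{m-1}(n-1)
        g_new = np.conj(Km) * f + g_delayed          # g_m(n) = K_m^* f_{m-1}(n) + g_{m-1}(n-1)
        f, g = f_new, g_new
    return f, g
```

The final pair returned corresponds to $f_p(n)$ and $g_p(n)$ of the last lattice stage.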
o As a consequence of the equivalence between the direct-form prediction error FIR filter and the FIR lattice filter, the output of the p-
stage lattice filter is expressed as
$$f_p(n) = \sum_{k=0}^{p} a_p(k)\,x(n-k), \qquad a_p(0) = 1 \qquad\rightarrow\ ⑤$$

o Since ⑤ is a convolution sum, the z-transform relationship is

$$F_p(z) = A_p(z)\,X(z) \qquad\rightarrow\ ⑥$$

o Or, equivalently,

$$A_p(z) = \frac{F_p(z)}{X(z)} = \frac{F_p(z)}{F_0(z)} \qquad\rightarrow\ ⑦$$

o The mean-square value of the forward linear prediction error 𝑓𝑝(𝑛) is

$$\varepsilon_p^{f} = E\left[\,|f_p(n)|^2\,\right] = \gamma_{xx}(0) + 2\,\mathrm{Re}\!\left[\sum_{k=1}^{p} a_p^*(k)\,\gamma_{xx}(k)\right] + \sum_{k=1}^{p}\sum_{l=1}^{p} a_p^*(l)\,a_p(k)\,\gamma_{xx}(l-k) \qquad\rightarrow\ ⑧$$

o $\varepsilon_p^{f}$ is a quadratic function of the predictor coefficients and its minimization leads to the set of linear equations
$$\gamma_{xx}(l) = -\sum_{k=1}^{p} a_p(k)\,\gamma_{xx}(l-k), \qquad l = 1, 2, \ldots, p \qquad\rightarrow\ ⑨$$

o These are called the normal equations for the coefficients of the linear predictor.
o The minimum mean-square prediction error is simply
$$\min\,\varepsilon_p^{f} \equiv E_p^{f} = \gamma_{xx}(0) + \sum_{k=1}^{p} a_p(k)\,\gamma_{xx}(-k) \qquad\rightarrow\ ⑩$$
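o For real-valued data, the normal equations ⑨ form a $p \times p$ Toeplitz system and can therefore be solved efficiently. A sketch using SciPy's Toeplitz (Levinson-type) solver, with names of our choosing:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def forward_predictor(gamma, p):
    """Solve the normal equations ⑨ for a_p(1..p) and the minimum error ⑩.

    gamma : autocorrelation values [gamma_xx(0), ..., gamma_xx(p)]  (real-valued case)
    """
    r = np.asarray(gamma, dtype=float)
    # sum_k a_p(k) gamma_xx(l-k) = -gamma_xx(l),  l = 1..p
    a = -solve_toeplitz(r[:p], r[1:p + 1])        # a_p(1), ..., a_p(p)
    # E_p^f = gamma_xx(0) + sum_k a_p(k) gamma_xx(-k); gamma_xx(-k) = gamma_xx(k) for real data
    E = r[0] + np.dot(a, r[1:p + 1])
    return a, E
```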

o In the following section we extend the development above to the problem of predicting the value of a time series in the opposite
direction, namely backward in time.

Backward Linear Prediction

o Let us assume that we have the data sequence 𝑥(𝑛), 𝑥(𝑛 − 1), … 𝑥(𝑛 − 𝑝 + 1) from a stationary random process and we wish to
predict the value 𝑥(𝑛 − 𝑝) of the process.

o In this case we employ a one-step backward linear predictor of order 𝑝. Hence


$$\hat{x}(n-p) = -\sum_{k=0}^{p-1} b_p(k)\,x(n-k) \qquad\rightarrow\ ⑪$$

o The difference between the value $x(n-p)$ and the estimate $\hat{x}(n-p)$ is called the backward prediction error, denoted $g_p(n)$:
$$g_p(n) = x(n-p) - \hat{x}(n-p) = x(n-p) + \sum_{k=0}^{p-1} b_p(k)\,x(n-k) = \sum_{k=0}^{p} b_p(k)\,x(n-k), \qquad b_p(p) = 1 \qquad\rightarrow\ ⑫$$

o The backward linear predictor can be realized either by a direct-form FIR filter structure similar to the structure shown in Figure 11.2 or
as a lattice structure.

o The lattice structure shown in Figure 11.4 provides the backward linear predictor as well as the forward linear predictor.
o The weighting coefficients in the backward linear predictor are the complex conjugates of the coefficients of the forward linear predictor, but they occur in reverse order. Thus, we have

$$b_p(k) = a_p^*(p-k), \qquad k = 0, 1, \ldots, p \qquad\rightarrow\ ⑬$$

o In the z-domain, the convolution sum in ⑫ becomes

$$G_p(z) = B_p(z)\,X(z) \qquad\rightarrow\ ⑭$$

o or, equivalently,
$$B_p(z) = \frac{G_p(z)}{X(z)} = \frac{G_p(z)}{G_0(z)} \qquad\rightarrow\ ⑮$$
o where 𝐵𝑝 𝑧 represents the system function of the FIR filter with coefficients 𝑏𝑝 𝑘 .

o Since $b_p(k) = a_p^*(p-k)$, $B_p(z)$ is related to $A_p(z)$ by
$$B_p(z) = \sum_{k=0}^{p} b_p(k)\,z^{-k} = \sum_{k=0}^{p} a_p^*(p-k)\,z^{-k} = z^{-p}\sum_{k=0}^{p} a_p^*(k)\,z^{k} = z^{-p}A_p^*(z^{-1}) \qquad\rightarrow\ ⑯$$

o The relationship in ⑯ implies that the zeroes of the FIR filter with system function 𝐵𝑝 𝑧 are simply the (conjugate) reciprocals of the
zeros of 𝐴𝑝 𝑧 . Hence 𝐵𝑝 𝑧 is called the reciprocal or reverse polynomial of 𝐴𝑝 𝑧 .

o Now that we have established these interesting relationships between the forward and backward one-step predictors, let us return to the recursive lattice equations in ④ and transform them to the z-domain.
o Thus, we have
$$F_0(z) = G_0(z) = X(z)$$
$$F_m(z) = F_{m-1}(z) + K_m\,z^{-1}G_{m-1}(z), \qquad m = 1, 2, \ldots, p \qquad\rightarrow\ ⑰$$
$$G_m(z) = K_m^*\,F_{m-1}(z) + z^{-1}G_{m-1}(z), \qquad m = 1, 2, \ldots, p$$

o If we divide each equation by 𝑋(𝑧), we obtain the desired results in the form
$$A_0(z) = B_0(z) = 1$$
$$A_m(z) = A_{m-1}(z) + K_m\,z^{-1}B_{m-1}(z), \qquad m = 1, 2, \ldots, p \qquad\rightarrow\ ⑱$$
$$B_m(z) = K_m^*\,A_{m-1}(z) + z^{-1}B_{m-1}(z), \qquad m = 1, 2, \ldots, p$$

o Thus a lattice filter is described in the z-domain by the matrix equation

$$\begin{bmatrix} A_m(z) \\ B_m(z) \end{bmatrix} = \begin{bmatrix} 1 & K_m\,z^{-1} \\ K_m^* & z^{-1} \end{bmatrix}\begin{bmatrix} A_{m-1}(z) \\ B_{m-1}(z) \end{bmatrix} \qquad\rightarrow\ ⑲$$

o The relations in ⑱ for $A_m(z)$ and $B_m(z)$ allow us to obtain the direct-form FIR filter coefficients $a_m(k)$ from the reflection coefficients $K_m$, and vice versa.

o The lattice structure with parameters 𝐾1 , 𝐾2 , . . . , 𝐾𝑝 corresponds to a class of 𝑝 direct-form FIR filters with system functions 𝐴1 (𝑧),
𝐴2 (𝑧), . . . , 𝐴𝑝 (𝑧).

o It is interesting to note that the characterization requires only the 𝑝 reflection coefficients 𝐾𝑖 .
o The reason the lattice provides a more compact representation for the class of $p$ FIR filters is that appending stages to the lattice does not alter the parameters of the previous stages.

o On the other hand, appending a $p$th stage to a lattice with $(p-1)$ stages is equivalent to increasing the length of an FIR filter by one coefficient.

o The resulting FIR filter with system function 𝐴𝑝 (𝑧) has coefficients totally different from the coefficients of the lower-order FIR filter with
system function 𝐴𝑝−1 (𝑧).

o The formula for determining the filter coefficients $a_p(k)$ recursively is easily derived from the polynomial relationships in ⑱. We have
$$A_m(z) = A_{m-1}(z) + K_m\,z^{-1}B_{m-1}(z)$$
$$\sum_{k=0}^{m} a_m(k)\,z^{-k} = \sum_{k=0}^{m-1} a_{m-1}(k)\,z^{-k} + K_m\sum_{k=0}^{m-1} a_{m-1}^*(m-1-k)\,z^{-(k+1)} \qquad\rightarrow\ ⑳$$
o By equating the coefficients of equal powers of $z^{-1}$ and recalling that $a_m(0) = 1$ for $m = 1, 2, \ldots, p$, we obtain the desired recursive equations for the FIR filter coefficients in the form
$$a_m(0) = 1$$
$$a_m(m) = K_m$$
$$a_m(k) = a_{m-1}(k) + K_m\,a_{m-1}^*(m-k) = a_{m-1}(k) + a_m(m)\,a_{m-1}^*(m-k), \qquad 1 \le k \le m-1,\; m = 1, 2, \ldots, p \qquad\rightarrow\ ㉑$$
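o The step-up recursion ㉑ can be coded directly; a minimal sketch (our own helper, assuming the reflection coefficients are given in order $K_1, \ldots, K_p$):

```python
import numpy as np

def step_up(K):
    """Convert reflection coefficients K_1..K_p to direct-form coefficients a_p(0..p), eq. ㉑."""
    a = np.array([1.0 + 0j])                           # A_0(z): a_0(0) = 1
    for Km in K:
        a_prev = a
        a_new = np.concatenate((a_prev, [0.0 + 0j]))   # extend by one coefficient
        # a_m(k) = a_{m-1}(k) + K_m a*_{m-1}(m-k); the k = m term yields a_m(m) = K_m
        a_new[1:] = a_new[1:] + Km * np.conj(a_prev[::-1])
        a = a_new
    return a
```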

o The conversion formula from the direct-form FIR filter coefficients [𝑎𝑝 (𝑘)] to the lattice reflection coefficients 𝐾𝑖 is also very simple.
o For the $p$-stage lattice we immediately obtain the reflection coefficient $K_p = a_p(p)$. To obtain $K_{p-1}, \ldots, K_1$, we need the polynomials $A_m(z)$ for $m = p-1, \ldots, 1$. From ⑲ we obtain

$$A_{m-1}(z) = \frac{A_m(z) - K_m\,B_m(z)}{1 - |K_m|^2}, \qquad m = p, p-1, \ldots, 1 \qquad\rightarrow\ ㉒$$

o This is just a step-down recursion. Thus, we compute all lower-degree polynomials $A_m(z)$ beginning with $A_{p-1}(z)$ and obtain the desired lattice reflection coefficients from the relation $K_m = a_m(m)$.

o We observe that the procedure works as long as $|K_m| \ne 1$ for $m = 1, 2, \ldots, p-1$. From this step-down recursion for the polynomials, it is relatively easy to obtain a formula for recursively and directly computing $K_m$, $m = p-1, \ldots, 1$. For $m = p-1, \ldots, 1$, we have
$$K_m = a_m(m)$$
$$a_{m-1}(k) = \frac{a_m(k) - K_m\,b_m(k)}{1 - |K_m|^2} = \frac{a_m(k) - a_m(m)\,a_m^*(m-k)}{1 - |a_m(m)|^2}, \qquad 1 \le k \le m-1 \qquad\rightarrow\ ㉓$$

o Which is just the recursion in the Schur-Cohn stability test for the polynomial 𝐴𝑚 (𝑧).
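o The corresponding step-down conversion ㉒–㉓, from direct-form coefficients back to reflection coefficients, may be sketched as follows (again an illustrative helper, not from the text):

```python
import numpy as np

def step_down(a):
    """Recover reflection coefficients K_1..K_p from a_p(0..p) (a_p(0) = 1), eqs. ㉒-㉓."""
    a = np.asarray(a, dtype=complex)
    K = []
    while len(a) > 1:
        Km = a[-1]                                   # K_m = a_m(m)
        K.append(Km)
        if abs(Km) == 1:
            raise ValueError("step-down recursion breaks down when |K_m| = 1")
        b = np.conj(a[::-1])                         # b_m(k) = a*_m(m-k)
        a = (a[:-1] - Km * b[:-1]) / (1 - abs(Km) ** 2)   # a_{m-1}(k), k = 0..m-1
    return K[::-1]                                   # ordered as K_1, ..., K_p
```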

o As just indicated, the recursive equation ㉓ breaks down if any of the lattice parameters $|K_m| = 1$.

o If this occurs, it indicates that the polynomial $A_{m-1}(z)$ has a root located on the unit circle. Such a root can be factored out from $A_{m-1}(z)$ and the iterative process in ㉓ carried out for the reduced-order system.
o Finally, let us consider the minimization of the mean-square error in a backward linear predictor. The backward prediction error is

$$g_p(n) = x(n-p) + \sum_{k=0}^{p-1} b_p(k)\,x(n-k) = x(n-p) + \sum_{k=1}^{p} a_p^*(k)\,x(n-p+k) \qquad\rightarrow\ ㉔$$

o And its mean square value is


$$\varepsilon_p^{b} = E\left[\,|g_p(n)|^2\,\right] \qquad\rightarrow\ ㉕$$

o The minimization of 𝜀𝑝𝑏 with respect to the prediction coefficients yields the same set of linear equations as in ⑨. Hence the minimum
mean-square error is
$$\min\,\varepsilon_p^{b} \equiv E_p^{b} = E_p^{f} \qquad\rightarrow\ ㉖$$

o which is given by ⑩.
Wiener Filters for Filtering and Prediction

o In many practical applications we are given an input signal $\{x(n)\}$, consisting of the sum of a desired signal $\{s(n)\}$ and an undesired noise or interference $\{w(n)\}$, and we are asked to design a filter that suppresses the undesired interference component.

o In such a case, the objective is to design a system that filters out the additive interference while preserving the characteristics of the
desired signal {𝑠(𝑛)}.

o Here we treat the problem of signal estimation in the presence of an additive noise disturbance.

o The estimator is constrained to be a linear filter with impulse response {ℎ(𝑛)}, designed so that its output approximates some specified
desired signal sequence {𝑑(𝑛)}.

o Figure 11.8 illustrates the linear estimation problem.


o The input sequence to the filter is $x(n) = s(n) + w(n)$, and its output sequence is $y(n)$.

o The difference between the desired signal and the filter output is the error sequence 𝒆(𝒏) = 𝒅(𝒏) − 𝒚(𝒏).

o We distinguish three special cases:

o If 𝒅(𝒏) = 𝒔(𝒏), the linear estimation problem is referred to as filtering.

o If 𝒅(𝒏) = 𝒔(𝒏 + 𝑫), where 𝐷 > 0, the linear estimation problem is referred to as signal prediction.

o If 𝒅(𝒏) = 𝒔(𝒏 − 𝑫), where 𝐷 > 0, the linear estimation problem is referred to as signal smoothing.

o The criterion selected for optimizing the filter impulse response {ℎ(𝑛)} is the minimization of the mean-square error.

o This criterion has the advantages of simplicity and mathematical tractability.

o The basic assumptions are that the sequences {𝑠(𝑛)}, {𝑤(𝑛)}, 𝑎𝑛𝑑 {𝑑(𝑛)} are zero mean and wide sense stationary.

o The linear filter will be assumed to be either FIR or IIR.

o If it is IIR, we assume that the input data {𝑥(𝑛)} are available over the infinite past.

o The optimum linear filter, in the sense of Minimum Mean-Square Error (MMSE), is called a Wiener Filter.
FIR Wiener Filter
o Suppose that the filter is constrained to be of length $M$ with coefficients $\{h(k),\ 0 \le k \le M-1\}$. Hence its output $y(n)$ depends on the finite data record $x(n), x(n-1), \ldots, x(n-M+1)$:
$$y(n) = \sum_{k=0}^{M-1} h(k)\,x(n-k) \qquad\rightarrow\ ❶$$

o The mean-square value of the error between the desired output 𝑑(𝑛) and 𝑦(𝑛) is

$$\varepsilon_M = E\left[\,|e(n)|^2\,\right] = E\left[\,\Big|\,d(n) - \sum_{k=0}^{M-1} h(k)\,x(n-k)\Big|^2\,\right] \qquad\rightarrow\ ❷$$

o Since this is a quadratic function of the filter coefficients, the minimization of 𝜀𝑀 yields the set of linear equations

$$\sum_{k=0}^{M-1} h(k)\,\gamma_{xx}(l-k) = \gamma_{dx}(l), \qquad l = 0, 1, \ldots, M-1 \qquad\rightarrow\ ❸$$

o where $\gamma_{xx}(k)$ is the autocorrelation of the input sequence $\{x(n)\}$ and $\gamma_{dx}(k) = E\left[\,d(n)\,x^*(n-k)\,\right]$ is the cross-correlation between the desired sequence $\{d(n)\}$ and the input sequence $\{x(n),\ 0 \le n \le M-1\}$.
o The set of linear equations that specify the optimum filter is called the Wiener-Hopf Equation. These equations are also called the
normal equations.
o In general, the equations in ❸ can be expressed in matrix form as
$$\Gamma_M\,\mathbf{h}_M = \boldsymbol{\gamma}_d \qquad\rightarrow\ ❹$$
o where $\Gamma_M$ is an $M \times M$ (Hermitian) Toeplitz matrix with elements $\Gamma_{lk} = \gamma_{xx}(l-k)$ and $\boldsymbol{\gamma}_d$ is the $M \times 1$ cross-correlation vector with elements $\gamma_{dx}(l)$, $l = 0, 1, \ldots, M-1$. The solution for the optimum filter coefficients is
$$\mathbf{h}_{\mathrm{opt}} = \Gamma_M^{-1}\boldsymbol{\gamma}_d \qquad\rightarrow\ ❺$$

o and the resulting minimum MSE achieved by the Wiener filter is
$$\mathrm{MMSE}_M = \min_{\mathbf{h}_M}\,\varepsilon_M = \sigma_d^2 - \sum_{k=0}^{M-1} h_{\mathrm{opt}}(k)\,\gamma_{dx}^*(k) \qquad\rightarrow\ ❻$$

o Or, equivalently,
$$\mathrm{MMSE}_M = \sigma_d^2 - \boldsymbol{\gamma}_d^{*T}\,\Gamma_M^{-1}\boldsymbol{\gamma}_d \qquad\rightarrow\ ❼$$

o where $\sigma_d^2 = E\left[\,|d(n)|^2\,\right]$.
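o A direct, if not the most efficient, way to obtain ❺ and ❼ numerically is to form $\Gamma_M$ and solve the linear system. A small sketch for the real-valued case (the function name and interface are assumptions of ours):

```python
import numpy as np
from scipy.linalg import toeplitz

def fir_wiener(gamma_xx, gamma_dx, sigma_d2):
    """Solve the Wiener-Hopf equations ❸ and return h_opt (❺) and the MMSE (❼).

    gamma_xx : [gamma_xx(0), ..., gamma_xx(M-1)]   (real-valued case)
    gamma_dx : [gamma_dx(0), ..., gamma_dx(M-1)]
    sigma_d2 : variance of the desired signal d(n)
    """
    Gamma = toeplitz(gamma_xx)                  # M x M Toeplitz autocorrelation matrix
    h_opt = np.linalg.solve(Gamma, gamma_dx)    # eq. ❺
    mmse = sigma_d2 - np.dot(gamma_dx, h_opt)   # eq. ❼ (real case)
    return h_opt, mmse
```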

o Let us consider some special cases of ❸.


o If we are dealing with filtering, then $d(n) = s(n)$. Furthermore, if $s(n)$ and $w(n)$ are uncorrelated random sequences, as is usually the case in practice, then
$$\gamma_{xx}(k) = \gamma_{ss}(k) + \gamma_{ww}(k)$$
$$\gamma_{dx}(k) = \gamma_{ss}(k) \qquad\rightarrow\ ❽$$
o And the normal equations in ❸ become

$$\sum_{k=0}^{M-1} h(k)\left[\,\gamma_{ss}(l-k) + \gamma_{ww}(l-k)\,\right] = \gamma_{ss}(l), \qquad l = 0, 1, \ldots, M-1 \qquad\rightarrow\ ❾$$

o If we are dealing with prediction, then 𝑑(𝑛) = 𝑠(𝑛 + 𝐷) where 𝐷 > 0. Assuming that 𝑠(𝑛) and 𝑤(𝑛) are uncorrelated random
sequences, we have
$$\gamma_{dx}(l) = \gamma_{ss}(l+D) \qquad\rightarrow\ ❿$$
o Hence the equations for the Wiener prediction filter become

$$\sum_{k=0}^{M-1} h(k)\left[\,\gamma_{ss}(l-k) + \gamma_{ww}(l-k)\,\right] = \gamma_{ss}(l+D), \qquad l = 0, 1, \ldots, M-1 \qquad\rightarrow\ ⓫$$

o In all these cases, the correlation matrix to be inverted is Toeplitz. Hence the (generalized) Levinson-Durbin algorithm may be used to solve for the optimum filter coefficients.
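o For the filtering case ❾, when in addition the noise is white so that $\gamma_{ww}(k) = \sigma_w^2\,\delta(k)$ (an assumption made only for this sketch), the Toeplitz structure can be exploited with a Levinson-type solver:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def fir_wiener_filtering(gamma_ss, sigma_w2, M):
    """FIR Wiener filter of eq. ❾ for d(n) = s(n), x(n) = s(n) + w(n), w(n) white."""
    gamma_ss = np.asarray(gamma_ss[:M], dtype=float)
    gamma_xx = gamma_ss.copy()
    gamma_xx[0] += sigma_w2                    # gamma_xx(k) = gamma_ss(k) + sigma_w^2 delta(k)
    h_opt = solve_toeplitz(gamma_xx, gamma_ss) # Toeplitz system solved by a Levinson-type method
    mmse = gamma_ss[0] - np.dot(h_opt, gamma_ss)
    return h_opt, mmse
```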
IIR Wiener Filter
o Here we allow the filter to be infinite in duration (IIR) and the data sequence to be infinite as well. Hence the filter output is
$$y(n) = \sum_{k=0}^{\infty} h(k)\,x(n-k) \qquad\rightarrow\ ①$$
o The filter coefficients are selected to minimize the mean-square error between the desired output 𝑑(𝑛) and 𝑦(𝑛), that is,
$$\varepsilon_\infty = E\left[\,|e(n)|^2\,\right] = E\left[\,\Big|\,d(n) - \sum_{k=0}^{\infty} h(k)\,x(n-k)\Big|^2\,\right] \qquad\rightarrow\ ②$$

o Application of the orthogonality principle leads to the Wiener-Hopf equation,


$$\sum_{k=0}^{\infty} h(k)\,\gamma_{xx}(l-k) = \gamma_{dx}(l), \qquad l \ge 0 \qquad\rightarrow\ ③$$

o The residual MMSE is obtained by applying the condition
$$\mathrm{MMSE}_M = E\left[\,e(n)\,d^*(n)\,\right]$$

o Thus we obtain

$$\mathrm{MMSE}_\infty = \min_{h}\,\varepsilon_\infty = \sigma_d^2 - \sum_{k=0}^{\infty} h_{\mathrm{opt}}(k)\,\gamma_{dx}^*(k) \qquad\rightarrow\ ④$$

o The Wiener-Hopf equation given by ③ cannot be solved directly with z-transform techniques, because the equation holds only for $l \ge 0$. We shall solve for the optimum IIR Wiener filter based on the innovations representation of the stationary random process $\{x(n)\}$.

o Recall that a stationary random process $\{x(n)\}$ with autocorrelation $\gamma_{xx}(k)$ and power spectral density $\Gamma_{xx}(f)$ can be represented by an equivalent innovations process $\{i(n)\}$ by passing $\{x(n)\}$ through a noise-whitening filter with system function $1/G(z)$, where $G(z)$ is the minimum-phase part obtained from the spectral factorization of $\Gamma_{xx}(z)$:
$$\Gamma_{xx}(z) = \sigma_i^2\,G(z)\,G(z^{-1}) \qquad\rightarrow\ ⑤$$
o Hence $G(z)$ is analytic in the region $|z| > r_1$, where $r_1 < 1$.

o Now, the optimum Wiener filter can be viewed as the cascade of the whitening filter $1/G(z)$ with a second filter, say $Q(z)$, whose output $y(n)$ is identical to the output of the optimum Wiener filter. Since
$$y(n) = \sum_{k=0}^{\infty} q(k)\,i(n-k) \qquad\rightarrow\ ⑥$$

o and $e(n) = d(n) - y(n)$, application of the orthogonality principle yields the new Wiener-Hopf equation as
$$\sum_{k=0}^{\infty} q(k)\,\gamma_{ii}(l-k) = \gamma_{di}(l), \qquad l \ge 0 \qquad\rightarrow\ ⑦$$

o But since $\{i(n)\}$ is white, it follows that $\gamma_{ii}(l-k) = 0$ unless $l = k$. Thus the solution is
$$q(l) = \frac{\gamma_{di}(l)}{\gamma_{ii}(0)} = \frac{\gamma_{di}(l)}{\sigma_i^2}, \qquad l \ge 0 \qquad\rightarrow\ ⑧$$
o The z-transform of the sequence {𝑞(𝑙)} is

$$Q(z) = \sum_{k=0}^{\infty} q(k)\,z^{-k} = \frac{1}{\sigma_i^2}\sum_{k=0}^{\infty} \gamma_{di}(k)\,z^{-k} \qquad\rightarrow\ ⑨$$
o If we denote the z-transform of the two-sided cross-correlation sequence $\gamma_{di}(k)$ by $\Gamma_{di}(z)$:
$$\Gamma_{di}(z) = \sum_{k=-\infty}^{\infty} \gamma_{di}(k)\,z^{-k} \qquad\rightarrow\ ⑩$$

o and define $\left[\Gamma_{di}(z)\right]_+$ as
$$\left[\Gamma_{di}(z)\right]_+ = \sum_{k=0}^{\infty} \gamma_{di}(k)\,z^{-k} \qquad\rightarrow\ ⑪$$

o Then
$$Q(z) = \frac{1}{\sigma_i^2}\left[\Gamma_{di}(z)\right]_+ \qquad\rightarrow\ ⑫$$
o To determine $\left[\Gamma_{di}(z)\right]_+$, we begin with the output of the noise-whitening filter, which can be expressed as
$$i(n) = \sum_{k=0}^{\infty} v(k)\,x(n-k) \qquad\rightarrow\ ⑬$$
where $\{v(k)\}$ is the impulse response of the whitening filter $1/G(z)$.

o Then
$$\gamma_{di}(k) = E\left[\,d(n)\,i^*(n-k)\,\right] = \sum_{m=0}^{\infty} v(m)\,E\left[\,d(n)\,x^*(n-m-k)\,\right] = \sum_{m=0}^{\infty} v(m)\,\gamma_{dx}(k+m) \qquad\rightarrow\ ⑭$$
o The z-transform of the cross-correlation 𝛾𝑑𝑖 (𝑘) is

$$\Gamma_{di}(z) = \sum_{k=-\infty}^{\infty}\left[\,\sum_{m=0}^{\infty} v(m)\,\gamma_{dx}(k+m)\right]z^{-k} = \sum_{m=0}^{\infty} v(m)\,z^{m}\sum_{k=-\infty}^{\infty} \gamma_{dx}(k)\,z^{-k}$$
$$\Gamma_{di}(z) = V(z^{-1})\,\Gamma_{dx}(z) = \frac{\Gamma_{dx}(z)}{G(z^{-1})} \qquad\rightarrow\ ⑮$$

o Therefore,

$$Q(z) = \frac{1}{\sigma_i^2}\left[\frac{\Gamma_{dx}(z)}{G(z^{-1})}\right]_+ \qquad\rightarrow\ ⑯$$

o Finally, the optimum IIR Wiener filter has the system function

$$H_{\mathrm{opt}}(z) = \frac{Q(z)}{G(z)} = \frac{1}{\sigma_i^2\,G(z)}\left[\frac{\Gamma_{dx}(z)}{G(z^{-1})}\right]_+ \qquad\rightarrow\ ⑰$$

o In summary, the solution for the optimum IIR Wiener filter requires that we perform a spectral factorization of $\Gamma_{xx}(z)$ to obtain $G(z)$, the minimum-phase component, and then solve for the causal part of $\dfrac{\Gamma_{dx}(z)}{G(z^{-1})}$.
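o As a concrete illustration (an assumed signal model used only for this example, not one of the figures of this unit), let $d(n) = s(n)$ with $s(n) = 0.6\,s(n-1) + v(n)$, $\sigma_v^2 = 0.64$, and white measurement noise of unit variance. Then
$$\Gamma_{ss}(z) = \frac{0.64}{(1-0.6z^{-1})(1-0.6z)}, \qquad \Gamma_{xx}(z) = \Gamma_{ss}(z) + 1 = \frac{1.8\left(1-\tfrac{1}{3}z^{-1}\right)\left(1-\tfrac{1}{3}z\right)}{(1-0.6z^{-1})(1-0.6z)}$$
so that $\sigma_i^2 = 1.8$ and $G(z) = \dfrac{1-\tfrac{1}{3}z^{-1}}{1-0.6z^{-1}}$. Since $\Gamma_{dx}(z) = \Gamma_{ss}(z)$,
$$\left[\frac{\Gamma_{dx}(z)}{G(z^{-1})}\right]_+ = \left[\frac{0.64}{(1-0.6z^{-1})\left(1-\tfrac{1}{3}z\right)}\right]_+ = \frac{0.8}{1-0.6z^{-1}}$$
and ⑰ gives
$$H_{\mathrm{opt}}(z) = \frac{1}{1.8}\,\frac{1-0.6z^{-1}}{1-\tfrac{1}{3}z^{-1}}\cdot\frac{0.8}{1-0.6z^{-1}} = \frac{4/9}{1-\tfrac{1}{3}z^{-1}},$$
i.e., $h_{\mathrm{opt}}(n) = \tfrac{4}{9}\left(\tfrac{1}{3}\right)^{n} u(n)$.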

o We can express the minimum MSE given by ④ in terms of frequency-domain characteristics of the filter. First, we note that $\sigma_d^2 \equiv E\left[\,|d(n)|^2\,\right]$ is simply the value of the autocorrelation sequence $\{\gamma_{dd}(k)\}$ evaluated at $k = 0$.
o Since
$$\gamma_{dd}(k) = \frac{1}{2\pi j}\oint_C \Gamma_{dd}(z)\,z^{k-1}\,dz \qquad\rightarrow\ ⑱$$

o It follows that

$$\sigma_d^2 = \gamma_{dd}(0) = \frac{1}{2\pi j}\oint_C \frac{\Gamma_{dd}(z)}{z}\,dz \qquad\rightarrow\ ⑲$$

o Where the contour integral is evaluated along a closed path encircling the origin in the region of convergence of Γ𝑑𝑑 (𝑧).

o The second term in ④ is also easily transformed to the frequency domain by application of Parseval’s theorem. Since ℎ𝑜𝑝𝑡 𝑘 = 0 for
𝑘 < 0, we have

$$\sum_{k=-\infty}^{\infty} h_{\mathrm{opt}}(k)\,\gamma_{dx}^*(k) = \frac{1}{2\pi j}\oint_C H_{\mathrm{opt}}(z)\,\Gamma_{dx}(z^{-1})\,z^{-1}\,dz \qquad\rightarrow\ ⑳$$

o Where C is a closed contour encircling the origin that lies within the common region of convergence of 𝐻𝑜𝑝𝑡(𝑧) and Γ𝑑𝑥 𝑧 −1 .

o By combining ⑲ & ⑳, we obtain the desired expression for the 𝑀𝑀𝑆𝐸∞ in the form

$$\mathrm{MMSE}_\infty = \frac{1}{2\pi j}\oint_C \left[\,\Gamma_{dd}(z) - H_{\mathrm{opt}}(z)\,\Gamma_{dx}(z^{-1})\,\right]z^{-1}\,dz \qquad\rightarrow\ ㉑$$
Non-causal Wiener Filter

o In the preceding section we constrained the optimum Wiener filter to be causal [𝑖. 𝑒. , ℎ𝑜𝑝𝑡 𝑛 = 0 𝑓𝑜𝑟 𝑛 < 0]. In this section we drop
this condition and allow the filter to include both the infinite past and the infinite future of the sequence {𝑥(𝑛)} in forming the output
𝑦(𝑛), that is

$$y(n) = \sum_{k=-\infty}^{\infty} h(k)\,x(n-k) \qquad\rightarrow\ ❶$$

o The resulting filter is physically unrealizable. It can be viewed as a smoothing filter in which the infinite future signal values are used to smooth the estimate $\hat{d}(n) = y(n)$ of the desired signal $d(n)$.

o Application of the orthogonality principle yields the Wiener-Hopf equation for the non-causal filter in the form

$$\sum_{k=-\infty}^{\infty} h(k)\,\gamma_{xx}(l-k) = \gamma_{dx}(l), \qquad -\infty < l < \infty \qquad\rightarrow\ ❷$$

o and the resulting $\mathrm{MMSE}_{nc}$ is
$$\mathrm{MMSE}_{nc} = \sigma_d^2 - \sum_{k=-\infty}^{\infty} h(k)\,\gamma_{dx}^*(k) \qquad\rightarrow\ ❸$$
o Since ❷ holds for $-\infty < l < \infty$, this equation can be transformed directly to yield the optimum non-causal Wiener filter as
$$H_{nc}(z) = \frac{\Gamma_{dx}(z)}{\Gamma_{xx}(z)} \qquad\rightarrow\ ❹$$
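o For instance, in the filtering case $d(n) = s(n)$ with $s(n)$ and $w(n)$ uncorrelated, ❽ gives $\Gamma_{dx}(z) = \Gamma_{ss}(z)$ and $\Gamma_{xx}(z) = \Gamma_{ss}(z) + \Gamma_{ww}(z)$, so the non-causal filter reduces to
$$H_{nc}(z) = \frac{\Gamma_{ss}(z)}{\Gamma_{ss}(z) + \Gamma_{ww}(z)},$$
which, on the unit circle, attenuates most strongly the frequency bands where the noise spectrum dominates the signal spectrum.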

o The 𝑀𝑀𝑆𝐸𝑛𝑐 can also be simply expressed in the z-domain as

$$\mathrm{MMSE}_{nc} = \frac{1}{2\pi j}\oint_C \left[\,\Gamma_{dd}(z) - H_{nc}(z)\,\Gamma_{dx}(z^{-1})\,\right]z^{-1}\,dz \qquad\rightarrow\ ❺$$

Kalman Filter

o In the previous section we considered the problem of designing a Wiener filter to estimate a process from a set of noisy observations.
The primary limitation with the solution that was derived is that it requires that the desired sequence and input sequence be jointly wide
sense stationary processes.

o Since most of the processes encountered in practice are non-stationary, this constraint limits the usefulness of the Wiener filter.

o Therefore, we now re-examine this estimation problem within the context of non-stationary processes and derive what is known as the Discrete Kalman Filter.

o [Note – Here we have introduced a slight change in the notation in order to be consistent with the notation that is commonly used in
the Kalman filtering literature. Instead of d(n), the signal that is to be estimated will be denoted by x(n) and the noisy observations
denoted by y(n).]
o To begin, let us look briefly once again at the causal Wiener filter for estimating a process x(n) from noisy measurements
$$y(n) = x(n) + v(n) \qquad\rightarrow\ ①$$

o In the Kalman filtering literature, the process $x(n)$ is usually assumed to be generated according to the difference equation
$$x(n) = a(1)\,x(n-1) + w(n)$$

o In $y(n) = x(n) + v(n)$, $w(n)$ and $v(n)$ are uncorrelated white noise processes. What we discovered was that the optimum linear estimate of $x(n)$, using all measurements $y(k)$ for $k \le n$, could be computed with a recursion of the form
$$\hat{x}(n) = a(1)\,\hat{x}(n-1) + K\left[\,y(n) - a(1)\,\hat{x}(n-1)\,\right] \qquad\rightarrow\ ②$$
o where $K$ is a constant, referred to as the Kalman gain, that minimizes the mean-square error $E\left[\,|x(n) - \hat{x}(n)|^2\,\right]$.

o However, there are two problems with this solution that need to be addressed.

o First is the requirement that 𝑥 𝑛 𝑎𝑛𝑑 𝑦(𝑛) be jointly wide-sense stationary processes.

o For example, ② is not the optimum linear estimate if $x(n)$ is a non-stationary process, such as one that is generated by the time-varying difference equation
$$x(n) = a_{n-1}(1)\,x(n-1) + w(n)$$

o Nevertheless, what we will discover is that the optimum estimate may be written as
$$\hat{x}(n) = a_{n-1}(1)\,\hat{x}(n-1) + K(n)\left[\,y(n) - a_{n-1}(1)\,\hat{x}(n-1)\,\right] \qquad\rightarrow\ ③$$

o where $K(n)$ is a suitably chosen (time-varying) gain.


o The second problem with the Wiener solution is that it does not allow the filter to be “turned on” at time $n = 0$. In other words, implicit in the development of the causal Wiener filter is the assumption that the observations $y(k)$ are available for all $k \le n$.

o Again, however, we will find that this problem is addressed with an estimate of the form given in ③.

o Although the discussion above is only concerned with the estimation of an AR(1) process from noisy measurements, using state variables
we may easily extend these results to more general processes.

o For example, let $x(n)$ be an AR(p) process that is generated according to the difference equation
$$x(n) = \sum_{k=1}^{p} a(k)\,x(n-k) + w(n) \qquad\rightarrow\ ④$$

o And suppose that 𝑥(𝑛) is measured in the presence of additive noise


$$y(n) = x(n) + v(n) \qquad\rightarrow\ ⑤$$

o If we let $\mathbf{X}(n)$ be the $p$-dimensional state vector
$$\mathbf{X}(n) = \begin{bmatrix} x(n) \\ x(n-1) \\ \vdots \\ x(n-p+1) \end{bmatrix}$$
o Then ④ & ⑤ may be written in terms of 𝑿(𝑛) as follows
$$\mathbf{X}(n) = \begin{bmatrix} a(1) & a(2) & \cdots & a(p-1) & a(p) \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}\mathbf{X}(n-1) + \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} w(n) \qquad\rightarrow\ ⑥$$
o and
$$y(n) = \left[\,1,\ 0,\ \ldots,\ 0\,\right]\mathbf{X}(n) + v(n) \qquad\rightarrow\ ⑦$$

o Using matrix notation to simplify these equations we have


$$\mathbf{X}(n) = \mathbf{A}\,\mathbf{X}(n-1) + \mathbf{W}(n)$$
$$y(n) = \mathbf{c}^{T}\mathbf{X}(n) + v(n) \qquad\rightarrow\ ⑧$$

o where $\mathbf{A}$ is a $p \times p$ state transition matrix, $\mathbf{W}(n) = [\,w(n), 0, \ldots, 0\,]^{T}$ is a vector noise process, and $\mathbf{c}$ is a unit vector of length $p$. As in ③ for the case of an AR(1) process, the optimum estimate of the state vector $\mathbf{X}(n)$, using all of the measurements up to time $n$, may be expressed in the form
$$\hat{\mathbf{X}}(n) = \mathbf{A}\,\hat{\mathbf{X}}(n-1) + \mathbf{K}\left[\,y(n) - \mathbf{c}^{T}\mathbf{A}\,\hat{\mathbf{X}}(n-1)\,\right] \qquad\rightarrow\ ⑨$$

o where $\mathbf{K}$ is the Kalman gain vector.
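o The companion-form matrices in ⑥–⑧ are straightforward to build in code; a small sketch (the helper name is ours) for a real-valued AR(p) model:

```python
import numpy as np

def ar_state_space(a):
    """Build the companion-form state-space matrices of ⑥-⑧ for an AR(p) model.

    a : AR coefficients [a(1), ..., a(p)] in x(n) = sum_k a(k) x(n-k) + w(n)
    """
    p = len(a)
    A = np.zeros((p, p))
    A[0, :] = a                    # first row carries a(1), ..., a(p)
    A[1:, :-1] = np.eye(p - 1)     # sub-diagonal of ones shifts the delayed states
    c = np.zeros(p)
    c[0] = 1.0                     # y(n) = c^T X(n) + v(n) picks out x(n)
    return A, c
```

For example, `ar_state_space([1.5, -0.7])` returns the 2 x 2 transition matrix [[1.5, -0.7], [1, 0]] and c = [1, 0].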

o Although only applicable to stationary AR(p) processes, ⑧ may be easily generalized to non-stationary processes as follows.
o Let 𝑿(𝑛) be a state vector of dimension p that evolves according to the difference equation
$$\mathbf{X}(n) = \mathbf{A}(n-1)\,\mathbf{X}(n-1) + \mathbf{W}(n) \qquad\rightarrow\ ⑩$$

o Where 𝑨 𝑛 − 1 is a time-varying 𝑝 𝑥 𝑝 state transition matrix and 𝑾(𝑛) is a vector of zero-mean white noise processes with

$$E\left[\,\mathbf{W}(n)\,\mathbf{W}^{H}(k)\,\right] = \begin{cases} \mathbf{Q}_w(n), & k = n \\ \mathbf{0}, & k \ne n \end{cases} \qquad\rightarrow\ ⑪$$
o In addition, let 𝒚(𝑛) be a vector of observations that are formed according to
$$\mathbf{y}(n) = \mathbf{C}(n)\,\mathbf{X}(n) + \mathbf{V}(n) \qquad\rightarrow\ ⑫$$

o Where 𝒚 𝑛 is a vector of length 𝑞, 𝑪(𝑛) is a time varying 𝑞 𝑥 𝑝 matrix, and 𝑽(𝑛) is a vector of zero mean white noise processes that are
statistically independent of 𝑾(𝑛) with

$$E\left[\,\mathbf{V}(n)\,\mathbf{V}^{H}(k)\,\right] = \begin{cases} \mathbf{Q}_v(n), & k = n \\ \mathbf{0}, & k \ne n \end{cases} \qquad\rightarrow\ ⑬$$

o Generalizing the result given in ⑨, we expect the optimum linear estimate of 𝑿(𝑛) to be expressible in the form
$$\hat{\mathbf{X}}(n) = \mathbf{A}(n-1)\,\hat{\mathbf{X}}(n-1) + \mathbf{K}(n)\left[\,\mathbf{y}(n) - \mathbf{C}(n)\,\mathbf{A}(n-1)\,\hat{\mathbf{X}}(n-1)\,\right]$$

o With the appropriate Kalman gain matrix 𝑲(𝑛), this recursion corresponds to the discrete Kalman filter. We will now show that the
optimum linear recursive estimate of 𝑿(𝑛) has this form and derive the optimum Kalman gain K(n) that minimizes the mean-square
estimation error.

o In the following discussion it is assumed that 𝑨 𝑛 , 𝑪 𝑛 , 𝑸𝒘 𝑛 𝑎𝑛𝑑 𝑸𝒗 (𝑛) are known.


o In our development of the discrete Kalman filter, we will let $\hat{\mathbf{X}}(n|n)$ denote the best linear estimate of $\mathbf{X}(n)$ at time $n$ given the observations $\mathbf{y}(i)$ for $i = 1, 2, \ldots, n$, and we will let $\hat{\mathbf{X}}(n|n-1)$ denote the best estimate given the observations up to time $n-1$. With $\mathbf{e}(n|n)$ and $\mathbf{e}(n|n-1)$ the corresponding state estimation errors,
$$\mathbf{e}(n|n) = \mathbf{X}(n) - \hat{\mathbf{X}}(n|n)$$
$$\mathbf{e}(n|n-1) = \mathbf{X}(n) - \hat{\mathbf{X}}(n|n-1)$$

o and $\mathbf{P}(n|n)$ and $\mathbf{P}(n|n-1)$ the error covariance matrices,
$$\mathbf{P}(n|n) = E\left[\,\mathbf{e}(n|n)\,\mathbf{e}^{H}(n|n)\,\right]$$
$$\mathbf{P}(n|n-1) = E\left[\,\mathbf{e}(n|n-1)\,\mathbf{e}^{H}(n|n-1)\,\right] \qquad\rightarrow\ ⑭$$

o Suppose that we are given an estimate $\hat{\mathbf{X}}(0|0)$ of the state $\mathbf{X}(0)$, and that the error covariance matrix for this estimate, $\mathbf{P}(0|0)$, is known.

o When the measurement $\mathbf{y}(1)$ becomes available, the goal is to update $\hat{\mathbf{X}}(0|0)$ and find the estimate $\hat{\mathbf{X}}(1|1)$ of the state at time $n = 1$ that minimizes the mean-square error

$$\xi(1) = E\left[\,\|\mathbf{e}(1|1)\|^2\,\right] = \mathrm{tr}\,\mathbf{P}(1|1) = \sum_{i=0}^{p-1} E\left[\,|e_i(1|1)|^2\,\right] \qquad\rightarrow\ ⑮$$

o After $\hat{\mathbf{X}}(1|1)$ has been determined and the error covariance $\mathbf{P}(1|1)$ found, the estimation is repeated for the next observation $\mathbf{y}(2)$.
o Thus, for each $n > 0$, given $\hat{\mathbf{X}}(n-1|n-1)$ and $\mathbf{P}(n-1|n-1)$, when a new observation $\mathbf{y}(n)$ becomes available, the problem is to find the minimum mean-square estimate $\hat{\mathbf{X}}(n|n)$ of the state vector $\mathbf{X}(n)$.

o The solution to this problem will be derived in two steps.

o First, given $\hat{\mathbf{X}}(n-1|n-1)$ we will find $\hat{\mathbf{X}}(n|n-1)$, which is the best estimate of $\mathbf{X}(n)$ without the observation $\mathbf{y}(n)$. Then, given $\mathbf{y}(n)$ and $\hat{\mathbf{X}}(n|n-1)$, we will estimate $\mathbf{X}(n)$.

o In the first step, since no new measurements are used to estimate 𝑿(𝑛), all that is known is that 𝑿(𝑛) evolves according to the state
equation
$$\mathbf{X}(n) = \mathbf{A}(n-1)\,\mathbf{X}(n-1) + \mathbf{W}(n)$$

o Since $\mathbf{W}(n)$ is a zero-mean white noise process (and the values of $\mathbf{W}(n)$ are unknown), we may predict $\mathbf{X}(n)$ as follows,
$$\hat{\mathbf{X}}(n|n-1) = \mathbf{A}(n-1)\,\hat{\mathbf{X}}(n-1|n-1) \qquad\rightarrow\ ⑯$$

o which has an estimation error given by
$$\mathbf{e}(n|n-1) = \mathbf{X}(n) - \hat{\mathbf{X}}(n|n-1) = \mathbf{A}(n-1)\,\mathbf{X}(n-1) + \mathbf{W}(n) - \mathbf{A}(n-1)\,\hat{\mathbf{X}}(n-1|n-1) = \mathbf{A}(n-1)\,\mathbf{e}(n-1|n-1) + \mathbf{W}(n) \qquad\rightarrow\ ⑰$$

o Note that since $\mathbf{W}(n)$ has zero mean, if $\hat{\mathbf{X}}(n-1|n-1)$ is an unbiased estimate of $\mathbf{X}(n-1)$, i.e.,
$$E\left[\,\mathbf{e}(n-1|n-1)\,\right] = \mathbf{0}$$
o then $\hat{\mathbf{X}}(n|n-1)$ will be an unbiased estimate of $\mathbf{X}(n)$,
$$E\left[\,\mathbf{e}(n|n-1)\,\right] = \mathbf{0}$$
o Finally, since the estimation error $\mathbf{e}(n-1|n-1)$ is uncorrelated with $\mathbf{W}(n)$ (a consequence of the fact that $\mathbf{W}(n)$ is a white noise sequence), then
$$\mathbf{P}(n|n-1) = \mathbf{A}(n-1)\,\mathbf{P}(n-1|n-1)\,\mathbf{A}^{H}(n-1) + \mathbf{Q}_w(n) \qquad\rightarrow\ ⑱$$
o where 𝑸𝑤 𝑛 is the covariance matrix for the noise process 𝑾(𝑛). This completes the first step of the Kalman filter.
o In the second step we incorporate the new measurement $\mathbf{y}(n)$ into the estimate $\hat{\mathbf{X}}(n|n-1)$.
o A linear estimate of $\mathbf{X}(n)$ that is based on $\hat{\mathbf{X}}(n|n-1)$ and $\mathbf{y}(n)$ is of the form
$$\hat{\mathbf{X}}(n|n) = \mathbf{K}'(n)\,\hat{\mathbf{X}}(n|n-1) + \mathbf{K}(n)\,\mathbf{y}(n) \qquad\rightarrow\ ⑲$$
o where $\mathbf{K}(n)$ and $\mathbf{K}'(n)$ are matrices, yet to be specified.
o The requirement imposed on $\hat{\mathbf{X}}(n|n)$ is that it be unbiased, $E\left[\,\mathbf{e}(n|n)\,\right] = \mathbf{0}$, and that it minimize the mean-square error $E\left[\,\|\mathbf{e}(n|n)\|^2\,\right]$.
o Using ⑲ we may express 𝒆(𝑛|𝑛) in terms of 𝒆(𝑛|𝑛 − 1) as follows
$$\mathbf{e}(n|n) = \mathbf{X}(n) - \mathbf{K}'(n)\,\hat{\mathbf{X}}(n|n-1) - \mathbf{K}(n)\,\mathbf{y}(n) = \mathbf{X}(n) - \mathbf{K}'(n)\left[\,\mathbf{X}(n) - \mathbf{e}(n|n-1)\,\right] - \mathbf{K}(n)\left[\,\mathbf{C}(n)\,\mathbf{X}(n) + \mathbf{V}(n)\,\right]$$
$$= \left[\,\mathbf{I} - \mathbf{K}'(n) - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{X}(n) + \mathbf{K}'(n)\,\mathbf{e}(n|n-1) - \mathbf{K}(n)\,\mathbf{V}(n) \qquad\rightarrow\ ⑳$$

o Since $E\left[\,\mathbf{V}(n)\,\right] = \mathbf{0}$ and $E\left[\,\mathbf{e}(n|n-1)\,\right] = \mathbf{0}$, then $\hat{\mathbf{X}}(n|n)$ will be unbiased for any $\mathbf{X}(n)$ only if the term in brackets is zero,
$$\mathbf{K}'(n) = \mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)$$
o With this constraint, it follows from ⑲ that $\hat{\mathbf{X}}(n|n)$ has the form
$$\hat{\mathbf{X}}(n|n) = \left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\hat{\mathbf{X}}(n|n-1) + \mathbf{K}(n)\,\mathbf{y}(n) \qquad\rightarrow\ ㉑$$
o or
$$\hat{\mathbf{X}}(n|n) = \hat{\mathbf{X}}(n|n-1) + \mathbf{K}(n)\left[\,\mathbf{y}(n) - \mathbf{C}(n)\,\hat{\mathbf{X}}(n|n-1)\,\right] \qquad\rightarrow\ ㉒$$

o and the error is
$$\mathbf{e}(n|n) = \mathbf{K}'(n)\,\mathbf{e}(n|n-1) - \mathbf{K}(n)\,\mathbf{V}(n) = \left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{e}(n|n-1) - \mathbf{K}(n)\,\mathbf{V}(n) \qquad\rightarrow\ ㉓$$

o Since $\mathbf{V}(n)$ is uncorrelated with $\mathbf{W}(n)$, then $\mathbf{V}(n)$ is uncorrelated with $\mathbf{X}(n)$ and, therefore, it is uncorrelated with $\hat{\mathbf{X}}(n|n-1)$. In addition, since $\mathbf{e}(n|n-1) = \mathbf{X}(n) - \hat{\mathbf{X}}(n|n-1)$, then $\mathbf{V}(n)$ is uncorrelated with $\mathbf{e}(n|n-1)$,
$$E\left[\,\mathbf{e}(n|n-1)\,\mathbf{V}^{H}(n)\,\right] = \mathbf{0}$$

o Thus, the error covariance matrix for $\mathbf{e}(n|n)$ is
$$\mathbf{P}(n|n) = E\left[\,\mathbf{e}(n|n)\,\mathbf{e}^{H}(n|n)\,\right] \qquad\rightarrow\ ㉔$$
$$\mathbf{P}(n|n) = \left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{P}(n|n-1)\left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]^{H} + \mathbf{K}(n)\,\mathbf{Q}_v(n)\,\mathbf{K}^{H}(n) \qquad\rightarrow\ ㉕$$

o Next, we must find the value of the Kalman gain $\mathbf{K}(n)$ that minimizes the mean-square error
$$\xi(n) = \mathrm{tr}\left\{\mathbf{P}(n|n)\right\}$$

o This may be accomplished in a couple of different ways. Although requiring some special matrix differential formulas, we will take the
most expedient approach of differentiating 𝜉(𝑛) with respect to 𝑲(𝑛), setting the derivative to zero, and solving for 𝑲(𝑛). Using the
matrix differentiation formulas

$$\frac{d}{d\mathbf{K}}\,\mathrm{tr}\left\{\mathbf{K}\mathbf{A}\right\} = \mathbf{A}^{H} \quad\rightarrow\ ㉖ \qquad\text{and}\qquad \frac{d}{d\mathbf{K}}\,\mathrm{tr}\left\{\mathbf{K}\mathbf{A}\mathbf{K}^{H}\right\} = 2\,\mathbf{K}\mathbf{A} \quad\rightarrow\ ㉗$$
o We have

$$\frac{d}{d\mathbf{K}}\,\mathrm{tr}\left\{\mathbf{P}(n|n)\right\} = -2\left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{P}(n|n-1)\,\mathbf{C}^{H}(n) + 2\,\mathbf{K}(n)\,\mathbf{Q}_v(n) = \mathbf{0} \qquad\rightarrow\ ㉘$$
o Solving for K(n) gives the desired expression for the Kalman gain,
$$\mathbf{K}(n) = \mathbf{P}(n|n-1)\,\mathbf{C}^{H}(n)\left[\,\mathbf{C}(n)\,\mathbf{P}(n|n-1)\,\mathbf{C}^{H}(n) + \mathbf{Q}_v(n)\,\right]^{-1} \qquad\rightarrow\ ㉙$$
o Having found the Kalman gain vector, we may simplify the expression given in ㉕ for the error covariance. First, we rewrite the expression for $\mathbf{P}(n|n)$ as follows,
$$\mathbf{P}(n|n) = \left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{P}(n|n-1) - \left\{\left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{P}(n|n-1)\,\mathbf{C}^{H}(n) - \mathbf{K}(n)\,\mathbf{Q}_v(n)\,\right\}\mathbf{K}^{H}(n)$$
o From ㉘, however, it follows that the second term is equal to zero, which leads to the desired expression for the error covariance matrix
$$\mathbf{P}(n|n) = \left[\,\mathbf{I} - \mathbf{K}(n)\,\mathbf{C}(n)\,\right]\mathbf{P}(n|n-1) \qquad\rightarrow\ ㉚$$
o Thus far we have derived the Kalman filtering equations for recursively estimating the state vector 𝑿(𝑛).
o All that needs to be done to complete the recursion is to determine how the recursion should be initialized at time 𝑛 = 0.
o Since the value of the initial state is unknown, in the absence of any observed data at time $n = 0$, the initial estimate is chosen to be
$$\hat{\mathbf{X}}(0|0) = E\left[\,\mathbf{X}(0)\,\right]$$
o and, for the initial value of the error covariance matrix, we have
$$\mathbf{P}(0|0) = E\left[\,\mathbf{X}(0)\,\mathbf{X}^{H}(0)\,\right]$$
o This choice of initial conditions makes $\hat{\mathbf{X}}(0|0)$ an unbiased estimate of $\mathbf{X}(0)$ and ensures that $\hat{\mathbf{X}}(n|n)$ will be unbiased for all $n$.
o This completes the derivation of the discrete Kalman filter which is summarized in Table 7.4.
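o For reference, the complete recursion can be condensed into a few lines of code. The sketch below is one possible implementation, assuming real-valued and time-invariant $\mathbf{A}$, $\mathbf{C}$, $\mathbf{Q}_w$, $\mathbf{Q}_v$ (the general time-varying case simply indexes these by $n$); the function name and interface are ours:

```python
import numpy as np

def kalman_filter(y, A, C, Qw, Qv, x0, P0):
    """Discrete Kalman filter: prediction ⑯, ⑱ followed by correction ㉙, ㉒, ㉚.

    y  : sequence of measurement vectors y(n), n = 1, 2, ...
    A  : p x p state transition matrix;  C : q x p observation matrix
    Qw : p x p process noise covariance; Qv : q x q measurement noise covariance
    x0 : initial estimate X^(0|0);       P0 : initial error covariance P(0|0)
    """
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    estimates = []
    for yn in y:
        # prediction (time update)
        x_pred = A @ x                                      # eq. ⑯
        P_pred = A @ P @ A.T + Qw                           # eq. ⑱
        # correction (measurement update)
        S = C @ P_pred @ C.T + Qv
        K = P_pred @ C.T @ np.linalg.inv(S)                 # eq. ㉙
        x = x_pred + K @ (np.atleast_1d(yn) - C @ x_pred)   # eq. ㉒
        P = (np.eye(len(x)) - K @ C) @ P_pred               # eq. ㉚
        estimates.append(x.copy())
    return np.array(estimates)
```

Used together with `ar_state_space` above, this recursion estimates an AR(p) signal from its noisy measurements $y(n) = x(n) + v(n)$.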

o One interesting property to note about the Kalman filter is that the Kalman gain $\mathbf{K}(n)$ and the error covariance matrix $\mathbf{P}(n|n)$ do not depend on the measurements $\mathbf{y}(n)$.

o Therefore, it is possible for both of these terms to be computed off-line, prior to any filtering.

o The goal of the discrete Kalman filter is to use the measurements $\mathbf{y}(n)$ to estimate the state $\mathbf{X}(n)$ of a dynamic system. The Kalman filter is a remarkably versatile and powerful recursive estimation algorithm that has found applications in a wide variety of areas, including spacecraft orbit determination, radar tracking, estimation and prediction of target trajectories, adaptive equalization of telephone channels, adaptive equalization of fading dispersive channels, and adaptive antenna arrays.
