[Fig. 10.9.1 UWT/DWT decompositions and wavelet coefficients of housing data (decomposition levels 5-7 versus time and UWT index).]

10.3 Prove the downsampling replication property (10.4.11) by working backwards, that is, start from the Fourier transform expression and show that

    (1/L) Σ_{m=0}^{L−1} X(f − m f_s^down) = Σ_k s(k) x(k) e^{−2πjfk/f_s} = Σ_n x(nL) e^{−2πjfnL/f_s} = Y_down(f)

where s(k) is the periodic "sampling function" with the following representations:

    s(k) = (1/L) Σ_{m=0}^{L−1} e^{−2πjkm/L} = (1/L) · (1 − e^{−2πjk}) / (1 − e^{−2πjk/L}) = Σ_n δ(k − nL)

Moreover, show that the above representations are nothing but the inverse L-point DFT of the DFT of one period of the periodic pulse train:

    s(k) = [ . . . , 1, 0, 0, . . . , 0, 1, 0, 0, . . . , 0, 1, 0, 0, . . . , 0, . . . ] = Σ_n δ(k − nL)

where each group of L−1 zeros separates successive ones.

10.4 Show that the solution to the optimization problem (10.7.7) is the soft-thresholding rule of Eq. (10.7.8).

10.5 Study the "Tikhonov regularizer" wavelet thresholding function:

    d_thr = f(d, λ, a) = d · |d|^a / (|d|^a + λ^a) ,   a > 0 ,  λ > 0

11
Wiener Filtering

The problem of estimating one signal from another is one of the most important in signal processing. In many applications, the desired signal is not available or observable directly. Instead, the observable signal is a degraded or distorted version of the original signal. The signal estimation problem is to recover, in the best way possible, the desired signal from its degraded replica.

We mention some typical examples: (1) The desired signal may be corrupted by strong additive noise, such as weak evoked brain potentials measured against the strong background of ongoing EEGs; or weak radar returns from a target in the presence of strong clutter. (2) An antenna array designed to be sensitive towards a particular "look" direction may be vulnerable to strong jammers from other directions due to sidelobe leakage; the signal processing task here is to null the jammers while at the same time maintaining the sensitivity of the array towards the desired look direction. (3) A signal transmitted over a communications channel can suffer phase and amplitude distortions and can be subject to additive channel noise; the problem is to recover the transmitted signal from the distorted received signal. (4) A Doppler radar processor tracking a moving target must take into account dynamical noise—such as small purely random accelerations—affecting the dynamics of the target, as well as measurement errors. (5) An image recorded by an imaging system is subject to distortions such as blurring due to motion or to the finite aperture of the system, or other geometric distortions; the problem here is to undo the distortions introduced by the imaging system and restore the original image. A related problem, of interest in medical image processing, is that of reconstructing an image from its projections. (6) In remote sensing and inverse scattering applications, the basic problem is, again, to infer one signal from another; for example, to infer the temperature profile of the atmosphere from measurements of the spectral distribution of infrared energy; or to deduce the structure of a dielectric medium, such as the ionosphere, by studying its response to electromagnetic wave scattering; or, in oil exploration to infer the layered structure of the earth by measuring its response to an impulsive input near its surface.

In this chapter, we pose the signal estimation problem and discuss some of the criteria used in the design of signal estimation algorithms.

We do not present a complete discussion of all methods of signal recovery and estimation that have been invented for applications as diverse as those mentioned above.
Our emphasis is on traditional linear least-squares estimation methods, not only because they are widely used, but also because they have served as the motivating force for the development of other estimation techniques and as the yardstick for evaluating them.

We develop the theoretical solution of the Wiener filter both in the stationary and nonstationary cases, and discuss its connection to the orthogonal projection, Gram-Schmidt constructions, and correlation canceling ideas of Chap. 1. By means of an example, we introduce Kalman filtering concepts and discuss their connection to Wiener filtering and to signal modeling. Practical implementations of the Wiener filter are discussed in Chapters 12 and 16. Other signal recovery methods for deconvolution applications that are based on alternative design criteria are briefly discussed in Chap. 12, where we also discuss some interesting connections between Wiener filtering/linear prediction methods and inverse scattering methods.

11.1 Linear and Nonlinear Estimation of Signals

The resulting estimate x̂n will be a function of the observations yn. If the optimal processor is linear, such as a linear filter, then the estimate x̂n will be a linear function of the observations. We are going to concentrate mainly on linear processors. However, we would like to point out that, depending on the estimation criterion, there are cases where the estimate x̂n may turn out to be a nonlinear function of the yn's.

We discuss briefly four major estimation criteria for designing such optimal processors. They are:

(1) The maximum a posteriori (MAP) criterion.
(2) The maximum likelihood (ML) criterion.
(3) The mean square (MS) criterion.
(4) The linear mean-square (LMS) criterion.†

To explain the various estimation criteria, let us assume that the desired signal xn is to be estimated over a finite time interval na ≤ n ≤ nb. Without loss of generality, we may assume that the observed signal yn is also available over the same interval. Define the vectors

    x = [xna, xna+1, . . . , xnb]^T ,    y = [yna, yna+1, . . . , ynb]^T

For each value of n, we seek the functional dependence

    x̂n = x̂n(y)

of x̂n on the given observation vector y that provides the best estimate of xn.

1. The criterion for the MAP estimate is to maximize the a posteriori conditional density of xn given that y already occurred; namely,

       p(xn|y) = maximum                                          (11.1.1)

   This criterion selects x̂n as though the already collected observations y were the most likely to occur.

2. The ML criterion selects x̂n to maximize the conditional density of the observations given xn, that is, the likelihood

       p(y|xn) = maximum                                          (11.1.2)

3. The MS criterion minimizes the mean-square estimation error

       E = E[en²] = min ,   where   en = xn − x̂n                  (11.1.3)

   that is, the best choice of the functional dependence x̂n = x̂n(y) is sought that minimizes this expression. We know from our results of Sec. 1.4 that the required solution is the corresponding conditional mean

       x̂n = E[xn|y] = MS estimate                                 (11.1.4)

4. Finally, the LMS criterion requires the estimate to be a linear function of the observations,

       x̂n = Σ_{i=na}^{nb} h(n, i) yi                              (11.1.5)

   For each n, the weights h(n, i), na ≤ i ≤ nb are selected to minimize the mean-square estimation error

       E = E[en²] = E[(xn − x̂n)²] = minimum                       (11.1.6)

† Note that the acronym LMS is also used in the context of adaptive filtering, for least mean-square.
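As a small numerical illustration of criterion 4, the following sketch (synthetic jointly gaussian data with hypothetical sizes; each row plays the role of one realization of xn and its observation vector y for a fixed n) estimates the weights of Eq. (11.1.5) by minimizing the sample version of Eq. (11.1.6):

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 50_000, 4                         # hypothetical: N realizations, M observations each

# Synthetic example: scalar x_n and an M-vector y correlated with it (jointly gaussian).
x = rng.standard_normal(N)
A = rng.standard_normal(M)               # arbitrary mixing vector
y = np.outer(x, A) + 0.5 * rng.standard_normal((N, M))

# LMS weights h(n, i) of Eq. (11.1.5): least squares minimizes the sample form of Eq. (11.1.6).
h, *_ = np.linalg.lstsq(y, x, rcond=None)
xhat = y @ h
print("mean-square error of LMS estimate:", np.mean((x - xhat) ** 2))

# For comparison, the best estimate that uses only the single observation y[:, 0].
g = (x @ y[:, 0]) / (y[:, 0] @ y[:, 0])
print("mean-square error using y[:,0] only:", np.mean((x - g * y[:, 0]) ** 2))
```

Using the full observation vector always does at least as well as any single observation, which is the point of the linear combination in Eq. (11.1.5).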
With the exception of the LMS estimate, all other estimates x̂n(y) are, in general, nonlinear functions of y.

Example 11.1.1: If both xn and y are zero-mean and jointly gaussian, then Examples 1.4.1 and 1.4.2 imply that the MS and LMS estimates of xn are the same. Furthermore, since p(xn|y) is gaussian it will be symmetric about its maximum, which occurs at its mean, that is, at E[xn|y]. Therefore, the MAP estimate of xn is equal to the MS estimate. In conclusion, for zero-mean jointly gaussian xn and y, the three estimates MAP, MS, and LMS coincide.

Example 11.1.2: To see the nonlinear character and the differences among the various estimates, consider the following example: A discrete-amplitude, constant-in-time signal x can take on the three values

    x = −1 ,   x = 0 ,   x = 1

each with probability of 1/3. This signal is placed on a known carrier waveform cn and transmitted over a noisy channel. The received samples are of the form

    yn = cn x + vn ,   n = 1, 2, . . . , M

where vn are zero-mean white gaussian noise samples of variance σv², assumed to be independent of x. The above set of measurements can be written in an obvious vector notation

    y = c x + v

(a) Determine the conditional densities p(y|x) and p(x|y).
(b) Determine and compare the four alternative estimates MAP, ML, MS, and LMS.

Solution: To compute p(y|x), note that if x is given, then the only randomness left in y arises from the noise term v. Since vn are uncorrelated and gaussian, they will be independent; therefore,

    p(y|x) = p(v) = Π_{n=1}^{M} p(vn) = (2πσv²)^{−M/2} exp[ −(1/(2σv²)) Σ_{n=1}^{M} vn² ]
           = (2πσv²)^{−M/2} exp[ −(1/(2σv²)) v^T v ] = (2πσv²)^{−M/2} exp[ −(1/(2σv²)) (y − c x)^T (y − c x) ]

Using Bayes' rule we find p(x|y) = p(y|x)p(x)/p(y). Since

    p(x) = (1/3)[ δ(x − 1) + δ(x) + δ(x + 1) ]

we find

    p(x|y) = (1/A)[ p(y|1)δ(x − 1) + p(y|0)δ(x) + p(y|−1)δ(x + 1) ]

where the constant A is

    A = 3p(y) = 3∫ p(y|x)p(x)dx = p(y|1) + p(y|0) + p(y|−1)

To find the MAP estimate of x, the quantity p(x|y) must be maximized with respect to x. Since the expression for p(x|y) forces x to be one of the three values +1, 0, −1, it follows that the maximum among the three coefficients p(y|1), p(y|0), p(y|−1) will determine the value of x. Thus, for a given y we select that x that

    p(y|x) = maximum of { p(y|1), p(y|0), p(y|−1) }

Using the gaussian nature of p(y|x), we find equivalently

    (y − c x)^T(y − c x) = minimum of { (y − c)^T(y − c), y^T y, (y + c)^T(y + c) }

Subtracting y^T y from both sides, dividing by c^T c, and denoting

    ȳ = c^T y / (c^T c)

we find the equivalent equation

    x² − 2xȳ = minimum of { 1 − 2ȳ, 0, 1 + 2ȳ }

and in particular, applying these for +1, 0, −1, we find

    x̂_MAP = {  1 ,   if ȳ > 1/2
               0 ,   if −1/2 < ȳ < 1/2
              −1 ,   if ȳ < −1/2  }

To determine the ML estimate, we must maximize p(y|x) with respect to x. The ML estimate does not require knowledge of the a priori probability density p(x) of x. Therefore, differentiating p(y|x) with respect to x and setting the derivative to zero gives

    (∂/∂x) p(y|x) = 0   or   (∂/∂x) ln p(y|x) = 0   or   (∂/∂x) (y − c x)^T(y − c x) = 0

which gives

    x̂_ML = c^T y / (c^T c) = ȳ

The MS estimate is obtained by computing the conditional mean

    E[x|y] = ∫ x p(x|y) dx = ∫ x (1/A)[ p(y|1)δ(x − 1) + p(y|0)δ(x) + p(y|−1)δ(x + 1) ] dx
           = (1/A)[ p(y|1) − p(y|−1) ] ,   or,

    x̂_MS = [ p(y|1) − p(y|−1) ] / [ p(y|1) + p(y|0) + p(y|−1) ]

Canceling some common factors from the numerator and denominator leads to a simpler expression in terms of ȳ.
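As a concrete check of Example 11.1.2, the following sketch (assumed values M = 8 and σv = 1, and a randomly chosen carrier c, all hypothetical) simulates one transmission and evaluates the MAP, ML, and MS estimates directly from the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma_v = 8, 1.0                      # assumed example parameters (hypothetical)
c = rng.standard_normal(M)               # known carrier waveform c_n

def estimates(y, c, sigma_v):
    """MAP, ML, and MS estimates of x from y = c*x + v (Example 11.1.2)."""
    ybar = c @ y / (c @ c)
    # log p(y|x) up to a common constant, for x = +1, 0, -1
    logp = {x: -np.sum((y - c * x) ** 2) / (2 * sigma_v**2) for x in (1, 0, -1)}
    x_map = max(logp, key=logp.get)                       # maximize p(y|x) over {+1, 0, -1}
    x_ml = ybar                                           # x_ML = c^T y / c^T c
    w = {x: np.exp(v - max(logp.values())) for x, v in logp.items()}  # proportional to p(y|x)
    x_ms = (w[1] - w[-1]) / (w[1] + w[0] + w[-1])         # conditional mean E[x|y]
    return x_map, x_ml, x_ms

x_true = rng.choice([-1, 0, 1])
y = c * x_true + sigma_v * rng.standard_normal(M)
print("true x =", x_true, "  (MAP, ML, MS) =", estimates(y, c, sigma_v))
```

Running it a few times shows the MAP output snapping to one of the three levels while the MS output varies smoothly with ȳ, which is the nonlinearity discussed next.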
All four estimates have been expressed in terms of ȳ. Note that the ML estimate is linear but has a different slope than the LMS estimate. The nonlinearity of the various estimates is best seen in the following figure:

[Figure: the four estimates plotted as functions of ȳ.]

11.2 Orthogonality and Normal Equations

Since some of the observations are to the future of xn, the linear operation is not causal. This does not present a problem if the sequence yn is already available and stored in memory.

The optimal filtering problem, on the other hand, requires the linear operation (11.1.5) to be causal, that is, only those observations that are in the present and past of the current sample xn must be used in making up the estimate x̂n. This requires that the matrix of optimal weights h(n, i) be lower triangular, that is,

    h(n, i) = 0 ,   for i > n

Thus, in reference to the figure below, only the shaded portion of the observation interval is used at the current time instant:

[Figure: observation interval na ≤ i ≤ nb with the causal portion na ≤ i ≤ n shaded.]

as channel equalization and echo cancellation; we also discuss two alternative adaptive implementations—the so-called "gradient lattice," and the "recursive least-squares." Finally, the linear prediction problem is a special case of the optimal filtering problem with the additional stipulation that observations only up to time instant n − D must be used in obtaining the current estimate x̂n; this is equivalent to the problem of predicting D units of time into the future. The range of observations used in this case is shown below:

    x̂n = Σ_{i=na}^{n−D} h(n, i) yi

Of special interest to us will be the case of one-step prediction, corresponding to the choice D = 1. This is depicted below:

    x̂n = Σ_{i=na}^{n−1} h(n, i) yi

If we demand that the prediction be based only on the past M samples (from the current sample), we obtain the FIR version of the prediction problem, referred to as linear prediction based on the past M samples, which is depicted below:

    x̂n = Σ_{i=n−M}^{n−1} h(n, i) yi = Σ_{m=1}^{M} h(n, n − m) yn−m

Next, we set up the orthogonality and normal equations for the optimal weights. We begin with the smoothing problem. The estimation error is in this case

    en = xn − x̂n = xn − Σ_{i=na}^{nb} h(n, i) yi                                    (11.2.1)

Differentiating the mean-square estimation error (11.1.6) with respect to each weight h(n, i), na ≤ i ≤ nb, and setting the derivative to zero, we obtain the orthogonality equations that are enough to determine the weights:

    ∂E/∂h(n, i) = 2E[ en ∂en/∂h(n, i) ] = −2E[en yi] = 0 ,   for na ≤ i ≤ nb ,   or,

    Rey(n, i) = E[en yi] = 0    (orthogonality equations)                            (11.2.2)

that is, the estimation error en must be orthogonal to each observation yi used in making up the estimate x̂n. The orthogonality equations provide exactly as many equations as there are unknown weights.

Inserting Eq. (11.2.1) for en, the orthogonality equations may be written in an equivalent form, known as the normal equations

    E[ (xn − Σ_{k=na}^{nb} h(n, k) yk) yi ] = 0 ,   or,

    E[xn yi] = Σ_{k=na}^{nb} h(n, k) E[yk yi]    (normal equations)                   (11.2.3)

These determine the optimal weights at the current time instant n. In the vector notation of Sec. 11.1, we write Eq. (11.2.3) as

    E[x y^T] = H E[y y^T]

where H is the matrix of weights h(n, i). The optimal H and the estimate are then

    x̂ = Hy = E[x y^T] E[y y^T]^{−1} y

This is identical to the correlation canceler of Sec. 1.4. The orthogonality equations (11.2.2) are precisely the correlation cancellation conditions. Extracting the nth row of this matrix equation, we find an explicit expression for the nth estimate x̂n

    x̂n = E[xn y^T] E[y y^T]^{−1} y

which is recognized as the projection of the random variable xn onto the subspace spanned by the available observations; namely, Y = {yna, yna+1, . . . , ynb}. This is a general result: The minimum mean-square linear estimate x̂n is the projection of xn onto the subspace spanned by all the observations that are used to make up that estimate. This result is a direct consequence of the quadratic minimization criterion (11.1.6) and the orthogonal projection theorem discussed in Sec. 1.6.

Using the methods of Sec. 1.4, the minimized estimation error at time instant n is easily computed by

    En = E[en en] = E[en xn] = E[ (xn − Σ_{i=na}^{nb} h(n, i) yi) xn ]
       = E[xn²] − Σ_{i=na}^{nb} h(n, i) E[yi xn] = E[xn²] − E[xn y^T] E[y y^T]^{−1} E[y xn]

which corresponds to the diagonal entries of the covariance matrix of the estimation error e:

    Ree = E[e e^T] = E[x x^T] − E[x y^T] E[y y^T]^{−1} E[y x^T]

The optimum filtering problem is somewhat more complicated because of the causality condition. In this case, the estimate at time n is given by

    x̂n = Σ_{i=na}^{n} h(n, i) yi                                                      (11.2.4)

Inserting this into the minimization criterion (11.1.6) and differentiating with respect to h(n, i) for na ≤ i ≤ n, we find again the orthogonality conditions

    Rey(n, i) = E[en yi] = 0 ,   for na ≤ i ≤ n                                        (11.2.5)

where the most important difference from Eq. (11.2.2) is the restriction on the range of i, that is, en is decorrelated only from the present and past values of yi. Again, the estimation error en is orthogonal to each observation yi that is being used to make up
the estimate. The orthogonality equations can be converted into the normal equations as follows:

    E[en yi] = E[ (xn − Σ_{k=na}^{n} h(n, k) yk) yi ] = 0 ,   or,

    E[xn yi] = Σ_{k=na}^{n} h(n, k) E[yk yi] ,   for na ≤ i ≤ n ,   or,                 (11.2.6)

    Rxy(n, i) = Σ_{k=na}^{n} h(n, k) Ryy(k, i) ,   for na ≤ i ≤ n                        (11.2.7)

Such equations are generally known as Wiener-Hopf equations. Introducing the vector of observations up to the current time n, namely,

    yn = [yna, yna+1, . . . , yn]^T

we may write Eq. (11.2.6) in vector form as

    E[xn yn^T] = [ h(n, na), h(n, na+1), . . . , h(n, n) ] E[yn yn^T]

which can be solved for the vector of weights

    [ h(n, na), h(n, na+1), . . . , h(n, n) ] = E[xn yn^T] E[yn yn^T]^{−1}

and for the estimate x̂n:

    x̂n = E[xn yn^T] E[yn yn^T]^{−1} yn                                                   (11.2.8)

Again, x̂n is recognized as the projection of xn onto the space spanned by the observations that are used in making up the estimate; namely, Yn = {yna, yna+1, . . . , yn}. This solution of Eqs. (11.2.5) and (11.2.7) will be discussed in more detail in Sec. 11.8, using covariance factorization methods.

11.3 Stationary Wiener Filter

The first simplifying assumption is stationarity, so that the cross-correlation and autocorrelation appearing in Eq. (11.2.7) become functions of the differences of their arguments. The second assumption is to take the initial time na to be the infinite past, na = −∞, that is, the observation interval is Yn = {yi, −∞ < i ≤ n}.

The assumption of stationarity can be used as follows: Suppose we have the solution h(n, i) of Eq. (11.2.7) for the best weights to estimate xn, and wish to determine the best weights h(n + d, i), na ≤ i ≤ n + d for estimating the sample xn+d at the future time n + d. Then, the new weights will satisfy the same equations as (11.2.7) with the changes

    Rxy(n + d, i) = Σ_{k=na}^{n+d} h(n + d, k) Ryy(k, i) ,   for na ≤ i ≤ n + d           (11.3.1)

Making a change of variables i → i + d and k → k + d, we rewrite Eq. (11.3.1) as

    Rxy(n + d, i + d) = Σ_{k=na−d}^{n} h(n + d, k + d) Ryy(k + d, i + d) ,   for na − d ≤ i ≤ n      (11.3.2)

Now, if we assume stationarity, Eqs. (11.2.7) and (11.3.2) become

    Rxy(n − i) = Σ_{k=na}^{n} h(n, k) Ryy(k − i) ,   for na ≤ i ≤ n
                                                                                          (11.3.3)
    Rxy(n − i) = Σ_{k=na−d}^{n} h(n + d, k + d) Ryy(k − i) ,   for na − d ≤ i ≤ n

If it were not for the differences in the ranges of i and k, these two equations would be the same. But this is exactly what happens when we make the second assumption that na = −∞. Therefore, by uniqueness of the solution, we find in this case

    h(n + d, k + d) = h(n, k)

and since d is arbitrary, it follows that h(n, k) must be a function of the difference of its arguments, that is,

    h(n, k) = h(n − k)                                                                    (11.3.4)

Thus, the optimal linear processor becomes a shift-invariant causal linear filter and the estimate is given by

    x̂n = Σ_{i=−∞}^{n} h(n − i) yi = Σ_{i=0}^{∞} h(i) yn−i                                  (11.3.5)

and Eq. (11.3.3) becomes in this case

    Rxy(n − i) = Σ_{k=−∞}^{n} h(n − k) Ryy(k − i) ,   for −∞ < i ≤ n

or, replacing n − i by n and n − k by k,

    Rxy(n) = Σ_{k=0}^{∞} h(k) Ryy(n − k) ,   for n ≥ 0                                      (11.3.6)

and written in matrix form

    ⎡ Ryy(0)  Ryy(1)  Ryy(2)  Ryy(3)  ··· ⎤ ⎡ h(0) ⎤   ⎡ Rxy(0) ⎤
    ⎢ Ryy(1)  Ryy(0)  Ryy(1)  Ryy(2)  ··· ⎥ ⎢ h(1) ⎥   ⎢ Rxy(1) ⎥
    ⎢ Ryy(2)  Ryy(1)  Ryy(0)  Ryy(1)  ··· ⎥ ⎢ h(2) ⎥ = ⎢ Rxy(2) ⎥                           (11.3.7)
    ⎢ Ryy(3)  Ryy(2)  Ryy(1)  Ryy(0)  ··· ⎥ ⎢ h(3) ⎥   ⎢ Rxy(3) ⎥
    ⎣   ⋮       ⋮       ⋮       ⋮         ⎦ ⎣  ⋮   ⎦   ⎣   ⋮    ⎦

These are the discrete-time Wiener-Hopf equations. Were it not for the restriction n ≥ 0 (which reflects the requirement of causality), they could be solved easily by z-transform methods. As written above, they require methods of spectral factorization for their solution.
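As a rough numerical illustration (a sketch only, not one of the spectral-factorization methods discussed next), one can truncate the semi-infinite system (11.3.7) to a finite order and solve it directly; the correlation lags below are hypothetical stand-ins.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Hypothetical correlation lags Ryy(k), Rxy(k) for k = 0..M, for illustration only.
M = 4
Ryy = np.array([2.0, 1.2, 0.72, 0.432, 0.2592])   # first column/row of the Toeplitz matrix
Rxy = np.array([1.0, 0.6, 0.36, 0.216, 0.1296])

# Truncated version of Eq. (11.3.7): solve the (M+1)x(M+1) Toeplitz system for h(0..M).
h = solve_toeplitz((Ryy, Ryy), Rxy)
print("truncated Wiener weights h =", h)

# Sanity check against a dense solve of the same system.
T = np.array([[Ryy[abs(i - j)] for j in range(M + 1)] for i in range(M + 1)])
print("max residual:", np.max(np.abs(T @ h - Rxy)))
```

For reasonably behaved spectra, increasing M makes this truncation approach the causal solution; the methods that follow construct it exactly.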
Before we discuss such methods, we mention in passing the continuous-time version of the Wiener-Hopf equation:

    Rxy(t) = ∫_0^∞ Ryy(t − t′) h(t′) dt′ ,   t ≥ 0

We also consider the FIR Wiener filtering problem in the stationary case. The observation interval in this case is Yn = {yi, n − M ≤ i ≤ n}. Using the same arguments as above we have h(n, i) = h(n − i), and the estimate x̂n is obtained by an ordinary FIR linear filter

    x̂n = Σ_{i=n−M}^{n} h(n − i) yi = h(0)yn + h(1)yn−1 + ··· + h(M)yn−M                      (11.3.8)

where the (M+1) filter weights h(0), h(1), . . . , h(M) are obtained by the (M+1)×(M+1) matrix version of the Wiener-Hopf normal equations:

    ⎡ Ryy(0)   Ryy(1)    Ryy(2)    ···  Ryy(M)   ⎤ ⎡ h(0) ⎤   ⎡ Rxy(0) ⎤
    ⎢ Ryy(1)   Ryy(0)    Ryy(1)    ···  Ryy(M−1) ⎥ ⎢ h(1) ⎥   ⎢ Rxy(1) ⎥
    ⎢ Ryy(2)   Ryy(1)    Ryy(0)    ···  Ryy(M−2) ⎥ ⎢ h(2) ⎥ = ⎢ Rxy(2) ⎥                      (11.3.9)
    ⎢   ⋮        ⋮         ⋮               ⋮     ⎥ ⎢  ⋮   ⎥   ⎢   ⋮    ⎥
    ⎣ Ryy(M)   Ryy(M−1)  Ryy(M−2)  ···  Ryy(0)   ⎦ ⎣ h(M) ⎦   ⎣ Rxy(M) ⎦

Exploiting the Toeplitz property of the matrix Ryy, the above matrix equation can be solved efficiently using Levinson's algorithm. This will be discussed in Chap. 12. In Chap. 16, we will consider adaptive implementations of the FIR Wiener filter which produce the optimal filter weights adaptively without requiring prior knowledge of the autocorrelation and cross-correlation matrices Ryy and Rxy and without requiring any matrix inversion.

Rx2y = 0, then Rxy = Rx1y. Therefore, the solution of Eq. (11.3.7) for the best weights to estimate xn is also the solution for the best weights to estimate x1(n). The filter may also be thought of as the optimal signal separator of the two signal components x1(n) and x2(n).

11.4 Construction of the Wiener Filter by Prewhitening

The normal equations (11.3.6) would have a trivial solution if the sequence yn were a white-noise sequence with delta-function autocorrelation. Thus, the solution procedure is first to whiten the sequence yn and then solve the normal equations. To this end, let yn have a signal model, as guaranteed by the spectral factorization theorem

    Syy(z) = σε² B(z) B(z^{−1})                                                               (11.4.1)

where εn is the driving white noise, and B(z) a minimal-phase filter. The problem of estimating xn in terms of the sequence yn becomes equivalent to the problem of estimating xn in terms of the white-noise sequence εn.

If we could determine the combined filter

    F(z) = B(z) H(z)

we would then solve for the desired Wiener filter H(z)

    H(z) = F(z) / B(z)                                                                        (11.4.2)
where [Sxε(z)]_+ denotes the causal part of the double-sided z-transform Sxε(z). Generally, the causal part of a z-transform

    G(z) = Σ_{n=−∞}^{∞} gn z^{−n} = Σ_{n=−∞}^{−1} gn z^{−n} + Σ_{n=0}^{∞} gn z^{−n}

is defined as

    [G(z)]_+ = Σ_{n=0}^{∞} gn z^{−n}

The causal instruction in Eq. (11.4.5) was necessary since the above solution for fn was valid only for n ≥ 0. Since yn is the output of the filter B(z) driven by εn, it follows that

    Sxy(z) = Sxε(z) B(z^{−1})   or   Sxε(z) = Sxy(z) / B(z^{−1})

Combining Eqs. (11.4.2) and (11.4.5), we finally find

    H(z) = (1 / (σε² B(z))) [ Sxy(z) / B(z^{−1}) ]_+     (Wiener filter)                      (11.4.6)

Thus, the construction of the optimal filter first requires the spectral factorization of Syy(z) to obtain B(z), and then use of the above formula. This is the optimal realizable Wiener filter based on the infinite past. If the causal instruction is ignored, one obtains the optimal unrealizable Wiener filter

    Hunreal(z) = Sxy(z) / (σε² B(z) B(z^{−1})) = Sxy(z) / Syy(z)                               (11.4.7)

The minimum value of the mean-square estimation error can be conveniently expressed by a contour integral, as follows

    E = E[en²] = E[en(xn − x̂n)] = E[en xn] − E[en x̂n] = E[en xn] = Rex(0)
      = ∮_{u.c.} Sex(z) dz/(2πjz) = ∮_{u.c.} [ Sxx(z) − Sx̂x(z) ] dz/(2πjz) ,   or,

    E = ∮_{u.c.} [ Sxx(z) − H(z) Syx(z) ] dz/(2πjz)                                            (11.4.8)

11.5 Wiener Filter Example

This example, in addition to illustrating the above ideas, will also serve as a short introduction to Kalman filtering. It is desired to estimate the signal xn on the basis of the noisy observations

    yn = xn + vn

where vn is white noise of unit variance, σv² = 1, uncorrelated with xn. The signal xn is a first order Markov process, having a signal model

    xn+1 = 0.6 xn + wn

where wn is white noise of variance σw² = 0.82. Enough information is given above to determine the required power spectral densities Sxy(z) and Syy(z). First, we note that the signal generator transfer function for xn is

    M(z) = 1 / (z − 0.6)

so that

    Sxx(z) = σw² M(z) M(z^{−1}) = 0.82 / ((z − 0.6)(z^{−1} − 0.6)) = 0.82 / ((1 − 0.6z^{−1})(1 − 0.6z))

Then, we find

    Sxy(z) = Sx(x+v)(z) = Sxx(z) + Sxv(z) = Sxx(z) = 0.82 / ((1 − 0.6z^{−1})(1 − 0.6z))

    Syy(z) = S(x+v)(x+v)(z) = Sxx(z) + Sxv(z) + Svx(z) + Svv(z) = Sxx(z) + Svv(z)
           = 0.82 / ((1 − 0.6z^{−1})(1 − 0.6z)) + 1 = (0.82 + (1 − 0.6z^{−1})(1 − 0.6z)) / ((1 − 0.6z^{−1})(1 − 0.6z))
           = 2(1 − 0.3z^{−1})(1 − 0.3z) / ((1 − 0.6z^{−1})(1 − 0.6z)) = 2 · (1 − 0.3z^{−1})/(1 − 0.6z^{−1}) · (1 − 0.3z)/(1 − 0.6z)
           = σε² B(z) B(z^{−1})

Then, according to Eq. (11.4.6), we must compute the causal part of

    G(z) = Sxy(z) / B(z^{−1}) = [ 0.82 / ((1 − 0.6z^{−1})(1 − 0.6z)) ] / [ (1 − 0.3z)/(1 − 0.6z) ] = 0.82 / ((1 − 0.6z^{−1})(1 − 0.3z))

This may be done by partial fraction expansion, but the fastest way is to use the contour inversion formula to compute gk for k ≥ 0, and then resum the z-transform:

    gk = ∮_{u.c.} G(z) z^k dz/(2πjz) = ∮_{u.c.} 0.82 z^k / ((1 − 0.3z)(z − 0.6)) dz/(2πj)
       = (residue at z = 0.6) = 0.82 (0.6)^k / (1 − (0.3)(0.6)) = (0.6)^k ,   k ≥ 0

Resumming, we find the causal part

    [G(z)]_+ = Σ_{k=0}^{∞} gk z^{−k} = 1 / (1 − 0.6z^{−1})

Finally, the optimum Wiener estimation filter is

    H(z) = (1 / (σε² B(z))) [ Sxy(z) / B(z^{−1}) ]_+ = [G(z)]_+ / (σε² B(z)) = 0.5 / (1 − 0.3z^{−1})           (11.5.1)
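A short simulation sketch of this example (randomly generated data; filter applied with scipy.signal.lfilter) confirms the numbers: the output of the filter (11.5.1) tracks xn and the mean-square error comes out near the value 0.5 computed next.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N = 200_000

# Signal model of Sec. 11.5: AR(1) signal with Sxx(z) = 0.82/((1-0.6 z^-1)(1-0.6 z)), plus unit-variance noise
w = rng.normal(scale=np.sqrt(0.82), size=N)
v = rng.normal(scale=1.0, size=N)
x = lfilter([1.0], [1.0, -0.6], w)
y = x + v

# Wiener filter H(z) = 0.5/(1 - 0.3 z^-1), Eq. (11.5.1)
xhat = lfilter([0.5], [1.0, -0.3], y)

print("estimation error E[e^2] ~", np.mean((x - xhat) ** 2))   # should be close to 0.5
print("no-processing error     ~", np.mean((x - y) ** 2))      # equals sigma_v^2 = 1
```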
The filter (11.5.1) can be realized as the difference equation

    x̂n = 0.3 x̂n−1 + 0.5 yn                                                                     (11.5.2)

The estimation error is also easily computed using the contour formula of Eq. (11.4.8):

    E = E[en²] = σe² = ∮_{u.c.} [ Sxx(z) − H(z) Syx(z) ] dz/(2πjz) = 0.5

To appreciate the improvement afforded by filtering, this error must be compared with the error in case no processing is made and yn is itself taken to represent a noisy estimate of xn. The estimation error in the latter case is yn − xn = vn, so that σv² = 1. Thus, the gain afforded by processing is

    σe²/σv² = 0.5   or   3 dB

11.6 Wiener Filter as Kalman Filter

We would like to cast this example in a Kalman filter form. The difference equation Eq. (11.5.2) for the Wiener filter seems to have the "wrong" state transition matrix; namely, 0.3 instead of 0.6, which is the state matrix for the state model of xn. However, it is not accidental that the Wiener filter difference equation may be rewritten in the alternative form

    x̂n = 0.6 x̂n−1 + 0.5 (yn − 0.6 x̂n−1)

The quantity x̂n is the best estimate of xn, at time n, based on all the observations up to that time, that is, Yn = {yi, −∞ < i ≤ n}. To simplify the subsequent notation, we denote it by x̂n/n. It is the projection of xn on the space Yn. Similarly, x̂n−1 denotes the best estimate of xn−1, based on the observations up to time n − 1, that is, Yn−1 = {yi, −∞ < i ≤ n − 1}. The above filtering equation is written in this notation as

    x̂n/n = 0.6 x̂n−1/n−1 + 0.5 (yn − 0.6 x̂n−1/n−1)                                               (11.6.1)

It allows the computation of the current best estimate x̂n/n, in terms of the previous best estimate x̂n−1/n−1 and the new observation yn that becomes available at the current time instant n.

The various terms of Eq. (11.6.1) have nice interpretations: Suppose that the best estimate x̂n−1/n−1 of the previous sample xn−1 is available. Even before the next observation yn comes in, we may use this estimate to make a reasonable prediction as to what the next best estimate ought to be. Since we know the system dynamics of xn, we may try to "boost" x̂n−1/n−1 to the next time instant n according to the system dynamics, that is, we take

    x̂n/n−1 = 0.6 x̂n−1/n−1 = prediction of xn on the basis of Yn−1                                (11.6.2)

Since yn = xn + vn, we may use this prediction of xn to make a prediction of the next measurement yn, that is, we take

    ŷn/n−1 = x̂n/n−1 = prediction of yn on the basis of Yn−1                                       (11.6.3)

If this prediction were perfect, and if the next observation yn were noise free, then this would be the value that we would observe. Since we actually observe yn, the observation or innovations residual will be

    αn = yn − ŷn/n−1                                                                              (11.6.4)

This quantity represents that part of yn that cannot be predicted on the basis of the previous observations Yn−1. It represents the truly new information contained in the observation yn. Actually, if we are making the best prediction possible, then the most we can expect of our prediction is to make the innovations residual a white-noise (uncorrelated) signal, that is, what remains after we make the best possible prediction should be unpredictable. According to the general discussion of the relationship between signal models and linear prediction given in Sec. 1.17, it follows that if ŷn/n−1 is the best predictor of yn then αn must be the whitening sequence that drives the signal model of yn. We shall verify this fact shortly. This establishes an intimate connection between the Wiener/Kalman filtering problem and the signal modeling problem. If we overestimate the observation yn the innovation residual will be negative; and if we underestimate it, the residual will be positive. In either case, we would like to correct our tentative estimate in the right direction. This may be accomplished by

    x̂n/n = x̂n/n−1 + G(yn − ŷn/n−1) = 0.6 x̂n−1/n−1 + G(yn − 0.6 x̂n−1/n−1)                           (11.6.5)

where the gain G, known as the Kalman gain, should be a positive quantity. The prediction/correction procedure defined by Eqs. (11.6.2) through (11.6.5) is known as the Kalman filter. It should be clear that any value for the gain G will provide an estimate, even if suboptimal, of xn. Our solution for the Wiener filter has precisely the above structure with a gain G = 0.5. This value is optimal for the given example. It is a very instructive exercise to show this in two ways: First, with G arbitrary, the estimation filter of Eq. (11.6.5) has transfer function

    H(z) = G / (1 − 0.6(1 − G) z^{−1})

Insert this expression into the mean-square estimation error E = E[en²], where en = xn − x̂n/n, and minimize it with respect to the parameter G. This should give G = 0.5.

Alternatively, G should be such as to render the innovations residual (11.6.4) a white noise signal. In requiring this, it is useful to use the spectral factorization model for yn, that is, the fact that yn is the output of B(z) when driven by the white noise signal εn. Working with z-transforms, we have:

    α(z) = Y(z) − 0.6 z^{−1} X̂(z) = Y(z) − 0.6 z^{−1} H(z) Y(z)
         = [ 1 − 0.6 z^{−1} G/(1 − 0.6(1 − G)z^{−1}) ] Y(z) = [ (1 − 0.6z^{−1}) / (1 − 0.6(1 − G)z^{−1}) ] Y(z)
         = [ (1 − 0.6z^{−1}) / (1 − 0.6(1 − G)z^{−1}) ] · [ (1 − 0.3z^{−1}) / (1 − 0.6z^{−1}) ] ε(z) = [ (1 − 0.3z^{−1}) / (1 − 0.6(1 − G)z^{−1}) ] ε(z)

Since εn is white, it follows that the transfer function relationship between αn and εn must be trivial; otherwise, there will be sequential correlations present in αn. Thus,
we must have 0.6(1 − G) = 0.3, or G = 0.5; and in this case, αn = εn. It is also possible to set 0.6(1 − G) = 1/0.3, but this would correspond to an unstable filter.

We have obtained a most interesting result; namely, that when the Wiener filtering problem is recast into its Kalman filter form given by Eq. (11.6.1), then the innovations residual αn, which is computable on line with the estimate x̂n/n, is identical to the whitening sequence εn of the signal model of yn. In other words, the Kalman filter can be thought of as the whitening filter for the observation signal yn.

To appreciate further the connection between Wiener and Kalman filters and between Kalman filters and the whitening filters of signal models, we consider a generalized version of the above example and cast it in standard Kalman filter notation.

It is desired to estimate xn from yn. The signal model for xn is taken to be the first-order autoregressive model

    xn+1 = a xn + wn     (state model)                                                           (11.6.6)

where we used the filtering equation X1(z) = zX(z). The spectral density of yn can be factored as follows:

    Syy(z) = c² Sxx(z) + Svv(z) = c² Q / ((1 − a z^{−1})(1 − a z)) + R
           = [ c² Q + R(1 − a z^{−1})(1 − a z) ] / ((1 − a z^{−1})(1 − a z)) ≡ σε² (1 − f z^{−1})(1 − f z) / ((1 − a z^{−1})(1 − a z))

where f and σε² satisfy the equations

    f σε² = a R                                                                                   (11.6.9)

    (1 + f²) σε² = c² Q + (1 + a²) R                                                               (11.6.10)

and f has magnitude less than one. Thus, the corresponding signal model for yn is

which is the precise justification of Eq. (11.6.2). The difference equations of the two filters are

    x̂n+1/n = f x̂n/n−1 + K yn
                                                                                                   (11.6.17)
    x̂n/n = f x̂n−1/n−1 + G yn

Using the results of Problem 1.50, we may express all the quantities f, σε², K, and G in terms of a single positive quantity P which satisfies the algebraic Riccati equation:

    Q = P − P R a² / (R + c² P)                                                                    (11.6.18)

Then, we find the interrelationships

    K = aG = a c P / (R + c² P) ,    σε² = R + c² P ,    f = a − cK = R a / (R + c² P)              (11.6.19)

It is left as an exercise to show that the minimized mean-square estimation errors are given in terms of P by

A realization of the estimation filter based on (11.6.20) is shown below:

    ŷn/n−1 = \widehat{c xn + vn} = c x̂n/n−1 + v̂n/n−1 = c x̂n/n−1

where the term v̂n/n−1 was dropped. This term represents the estimate of vn on the basis of the past ys; that is, Yn−1. Since vn is white and also uncorrelated with xn, it follows that it will be uncorrelated with all past ys; therefore, v̂n/n−1 = 0. The second way to show that ŷn/n−1 is the best prediction of yn is to show that the innovations residual

    αn = yn − ŷn/n−1 = yn − c x̂n/n−1                                                               (11.6.23)

is a white-noise sequence and coincides with the whitening sequence εn of yn. Indeed, working in the z-domain and using Eq. (11.6.17) and the signal model of yn we find

    α(z) = Y(z) − c z^{−1} X̂1(z) = Y(z) − c z^{−1} H1(z) Y(z)
         = [ 1 − c z^{−1} K/(1 − f z^{−1}) ] Y(z) = [ (1 − (f + cK) z^{−1}) / (1 − f z^{−1}) ] Y(z)
         = [ (1 − a z^{−1}) / (1 − f z^{−1}) ] Y(z) = [ 1/B(z) ] Y(z) = ε(z)

Fig. 11.6.1 shows 100 samples of the observed signal yn together with the desired signal xn. The signal yn processed through the Wiener filter H(z) defined by the above parameters is shown in Fig. 11.6.2 together with xn. The tracking properties of the filter are evident from the graph. It should be emphasized that this is the best one can do by means of ordinary causal linear filtering.

11.7 Construction of the Wiener Filter by the Gapped Function

The gapped function is defined as the cross-correlation between the estimation error en and the observation sequence yn, namely,

    g(k) = Rey(k) = E[en yn−k] ,   for −∞ < k < ∞                                                   (11.7.1)
[Figure: filtered estimate x̂n/n and desired signal xn versus n (time samples, 0-100).]

This definition is motivated by the orthogonality equations which state that the prediction error en must be orthogonal to all of the available observations; namely, Yn = {yi, −∞ < i ≤ n} = {yn−k, k ≥ 0}. That is, for the optimal set of filter weights we must have

    g(k) = Rey(k) = E[en yn−k] = 0 ,   for k ≥ 0                                                    (11.7.2)

and g(k) develops a right-hand side gap. On the other hand, g(k) may be written in

    G(z) = Sxy(z) − H(z) Syy(z) = Sxy(z) − H(z) σε² B(z) B(z^{−1})

The z-transform B(z^{−1}) is anticausal and, because of the gap conditions, so is the ratio G(z)/B(z^{−1}). Therefore, taking causal parts of both sides and noting that the product H(z)B(z) is already causal, we find

    0 = [ Sxy(z) / B(z^{−1}) ]_+ − σε² H(z) B(z)

11.8 Construction of the Wiener Filter by Covariance Factorization

In this section, we present a generalization of the gapped-function method to the more general non-stationary and/or finite-past Wiener filter. This is defined by the Wiener-Hopf equations (11.2.7), which are equivalent to the orthogonality equations (11.2.5). The latter are the non-stationary versions of the gapped function of the previous section. The best way to proceed is to cast Eqs. (11.2.5) in matrix form as follows: Without loss of generality we may take the starting point na = 0. The final point nb is left arbitrary. Introduce the vectors

    x = [x0, x1, . . . , xnb]^T ,    y = [y0, y1, . . . , ynb]^T
and the corresponding correlation matrices

    Rxy = E[x y^T] ,    Ryy = E[y y^T]

The filtering equation (11.2.4) may be written in vector form as

    x̂ = H y                                                                                        (11.8.1)

where H is the matrix of optimal weights {h(n, i)}. The causality of the filtering operation (11.8.1) requires H to be lower-triangular. The minimization problem becomes equivalent to the problem of minimizing the mean-square estimation error subject to the constraint that H be lower-triangular. The minimization conditions are the normal equations (11.2.5) which, in this matrix notation, state that the matrix Rey has no lower-triangular (causal) part; or, equivalently, that Rey is strictly upper-triangular (i.e., even the main diagonal of Rey is zero), therefore

This is the most general solution of the Wiener filtering problem [18, 19]. It includes the results of the stationary case, as a special case. Indeed, if all the signals are stationary, then the matrices Rxy, B, and B^T become Toeplitz and have a z-transform associated with them as discussed in Problem 1.51. Using the results of that problem, it is easily seen that Eq. (11.8.7) is the time-domain equivalent of Eq. (11.4.6).

The prewhitening approach of Sec. 11.4 can also be understood in the present matrix framework. Making the change of variables

    y = B ε

we find that Rxy = E[x y^T] = E[x ε^T] B^T = Rxε B^T, and therefore, Rxy B^{−T} = Rxε and the filter H becomes H = [Rxε]_+ Rε^{−1} B^{−1}. The corresponding estimate is then

    x̂ = H y = H B ε = F ε ,   where   F = H B = [Rxε]_+ Rε^{−1}                                     (11.8.8)
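The matrix construction just described can be prototyped in a few lines. The sketch below (hypothetical covariance matrices, numpy only) factors Ryy = B Rε B^T with B unit lower triangular and Rε diagonal, forms H = [Rxε]_+ Rε^{−1} B^{−1}, and checks that H is lower triangular and that Rey = Rxy − H Ryy is strictly upper triangular, as the orthogonality conditions require.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6

# Hypothetical joint covariances of x = [x_0..x_5], y = [y_0..y_5] (for illustration only).
A = rng.standard_normal((2 * N, 2 * N))
R = A @ A.T                      # a valid joint covariance matrix
Rxx, Rxy, Ryy = R[:N, :N], R[:N, N:], R[N:, N:]

# Innovations factorization  Ryy = B Reps B^T,  B unit lower triangular, Reps diagonal.
L = np.linalg.cholesky(Ryy)
d = np.diag(L)
B = L / d                        # unit lower triangular
Reps = np.diag(d**2)

# Causal (lower-triangular) part of Rxeps = Rxy B^{-T}, then H = [Rxeps]_+ Reps^{-1} B^{-1}.
Rxeps = Rxy @ np.linalg.inv(B).T
H = np.tril(Rxeps) @ np.linalg.inv(Reps) @ np.linalg.inv(B)

Rey = Rxy - H @ Ryy              # error/observation cross-covariance
print("H lower triangular?        ", np.allclose(H, np.tril(H)))
print("Rey strictly upper triang.?", np.allclose(np.tril(Rey), 0.0))
```

The diagonal matrix Rε plays the role that σε² plays in the stationary formula (11.4.6).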
11.9 The Kalman Filter

The Kalman filter discussion of Sec. 11.6 and its equivalence to the Wiener filter was based on the asymptotic Kalman filter for which the observations were available from the infinite past to the present, namely, {yi, −∞ < i ≤ n}. In Sec. 11.8, we solved the most general Wiener filtering problem based on the finite past for which the observation space was

    Yn = {y0, y1, . . . , yn}                                                                        (11.9.1)

Here, we recast these results in a time-recursive form and obtain the time-varying Kalman filter for estimating xn based on the finite observation subspace Yn. We also discuss its asymptotic properties for large n and show that it converges to the steady-state Kalman filter of Sec. 11.6.

Our discussion is based on Eq. (11.8.9), which is essentially the starting point in Kalman's original derivation [852]. To make Eq. (11.8.9) truly recursive, we must have a means of recursively computing the required gain Gn from one time instant to the next. As in Sec. 11.8, we denote by x̂n/n and x̂n/n−1 the optimal estimates of xn based on the observation subspaces Yn and Yn−1, defined in Eq. (11.9.1), with the initial condition x̂0/−1 = 0. Iterating the state and measurement models (11.6.6) and (11.6.7) starting at n = 0, we obtain the following two results, previously derived for the steady-state case

    x̂n+1/n = a x̂n/n ,    ŷn/n−1 = c x̂n/n−1                                                           (11.9.2)

The proof of both is based on the linearity property of estimates; for example,

    x̂n+1/n = \widehat{a xn + wn} = a x̂n/n + ŵn/n = a x̂n/n

where ŵn/n was set to zero because wn does not depend on any of the observations Yn. This is seen as follows. The iteration of the state equation (11.6.6) leads to the expression xn = aⁿ x0 + aⁿ⁻¹ w0 + aⁿ⁻² w1 + ··· + a wn−2 + wn−1. It follows from this and Eq. (11.6.7) that the observation subspace Yn will depend only on

    {x0, w0, w1, . . . , wn−1, v0, v1, . . . , vn}

Making the additional assumption that x0 is uncorrelated with wn it follows that wn will be uncorrelated with all random variables in the above set, and thus, with Yn. The second part of Eq. (11.9.2) is shown by similar arguments. Next, we develop the recursions for the gain Gn. Using Eq. (11.8.9), the estimation and prediction errors may be related as follows

    en/n = xn − x̂n/n = xn − x̂n/n−1 − Gn εn = en/n−1 − Gn εn

Taking the correlation of both sides with xn we find

    E[en/n xn] = E[en/n−1 xn] − Gn E[εn xn]                                                            (11.9.3)

Using the orthogonality properties E[en/n x̂n/n] = 0 and E[en/n−1 x̂n/n−1] = 0, which follow from the optimality of the two estimates x̂n/n and x̂n/n−1, we can write the mean-square estimation and prediction errors as

    Pn/n = E[e²n/n] = E[en/n xn] ,    Pn/n−1 = E[e²n/n−1] = E[en/n−1 xn]                                (11.9.4)

The innovations residual εn can be expressed in terms of the prediction error en/n−1 as

    εn = yn − ŷn/n−1 = (c xn + vn) − c x̂n/n−1 = c en/n−1 + vn

Using the fact that en/n−1 depends only on xn and Yn−1, it follows that the two terms in the right-hand side are uncorrelated with each other. Thus,

    E[εn²] = c² E[e²n/n−1] + E[vn²] = c² Pn/n−1 + R                                                     (11.9.5)

also

    E[εn xn] = c E[en/n−1 xn] + E[vn xn] = c Pn/n−1                                                      (11.9.6)

Therefore, the gain Gn is computable by

    Gn = E[εn xn] / E[εn²] = c Pn/n−1 / (R + c² Pn/n−1)                                                   (11.9.7)

Using Eqs. (11.9.4), (11.9.6), and (11.9.7) into Eq. (11.9.3), we obtain

    Pn/n = Pn/n−1 − Gn c Pn/n−1 = Pn/n−1 − c² P²n/n−1 / (R + c² Pn/n−1) = R Pn/n−1 / (R + c² Pn/n−1)       (11.9.8)

The subtracted term in (11.9.8) represents the improvement in estimating xn using x̂n/n over using x̂n/n−1. Equations (11.9.3), (11.9.7), and (11.9.8) admit a nice geometrical interpretation [867]. The two right-hand side terms in εn = c en/n−1 + vn are orthogonal and can be represented by the orthogonal triangle

[Figure: orthogonal-triangle interpretation of εn = c en/n−1 + vn.]

where the prediction error en/n−1 has been scaled up by the factor c. Thus, Eq. (11.9.5) is the statement of the Pythagorean theorem for this triangle. Next, write the equation en/n = en/n−1 − Gn εn as

    en/n−1 = en/n + Gn εn

Because en/n is orthogonal to all the observations in Yn and εn is a linear combination of the same observations, it follows that the two terms in the right-hand side will be orthogonal. Thus, en/n−1 may be resolved in two orthogonal parts, one being in the direction of εn. This is represented by the smaller orthogonal triangle in the previous diagram. Clearly, the length of the side en/n is minimized at right angles at point A. It follows from the similarity of the two orthogonal triangles that

    Gn √(E[εn²]) / √(E[e²n/n−1]) = c √(E[e²n/n−1]) / √(E[εn²])

which is equivalent to Eq. (11.9.7). Finally, the Pythagorean theorem applied to the smaller triangle implies E[e²n/n−1] = E[e²n/n] + Gn² E[εn²], which is equivalent to Eq. (11.9.8).
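These identities are easy to spot-check numerically. The sketch below (arbitrary made-up values of c, R, and Pn/n−1) draws independent samples playing the roles of en/n−1, vn, and x̂n/n−1 and compares the sample moments against Eqs. (11.9.5) through (11.9.8).

```python
import numpy as np

rng = np.random.default_rng(3)
c, R, P_pred = 0.8, 1.0, 2.5        # made-up values of c, R, and P_{n/n-1}
N = 1_000_000

e_pred = rng.normal(scale=np.sqrt(P_pred), size=N)   # e_{n/n-1}
v = rng.normal(scale=np.sqrt(R), size=N)             # v_n, independent of e_{n/n-1}
xhat = rng.normal(size=N)                            # stand-in for x̂_{n/n-1}, independent of e_pred
x = xhat + e_pred                                    # x_n = x̂_{n/n-1} + e_{n/n-1}
eps = c * e_pred + v                                 # innovations residual

G = np.mean(eps * x) / np.mean(eps**2)               # sample version of Eq. (11.9.7)
print("E[eps^2] :", np.mean(eps**2), " vs ", c**2 * P_pred + R)        # Eq. (11.9.5)
print("E[eps x] :", np.mean(eps * x), " vs ", c * P_pred)              # Eq. (11.9.6)
print("gain G   :", G, " vs ", c * P_pred / (R + c**2 * P_pred))       # Eq. (11.9.7)
print("P_{n/n}  :", np.mean((e_pred - G * eps) ** 2),
      " vs ", R * P_pred / (R + c**2 * P_pred))                        # Eq. (11.9.8)
```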
To obtain a truly recursive scheme, we need next to find a relationship between Pn/n and the next prediction error Pn+1/n. It is found as follows. From the state model (11.6.6) and (11.9.2), we have

    en+1/n = xn+1 − x̂n+1/n = (a xn + wn) − a x̂n/n = a en/n + wn

Because en/n depends only on xn and Yn, it follows that the two terms in the right-hand side will be uncorrelated. Therefore, E[e²n+1/n] = a² E[e²n/n] + E[wn²], or,

    Pn+1/n = a² Pn/n + Q                                                                             (11.9.9)

The first term corresponds to the propagation of the estimate x̂n/n forward in time according to the system dynamics; the second term represents the worsening of the estimate due to the presence of the dynamical noise wn. The Kalman filter algorithm is now complete. It is summarized below:

0. Initialize by x̂0/−1 = 0 and P0/−1 = E[x0²].
1. At time n, x̂n/n−1, Pn/n−1, and the new measurement yn are available.
2. Compute ŷn/n−1 = c x̂n/n−1, εn = yn − ŷn/n−1, and the gain Gn using Eq. (11.9.7).
3. Correct the predicted estimate x̂n/n = x̂n/n−1 + Gn εn and compute its mean-square error Pn/n, using Eq. (11.9.8).
4. Predict the next estimate x̂n+1/n = a x̂n/n, and compute the mean-square prediction error Pn+1/n, using Eq. (11.9.9).
5. Go to the next time instant, n → n + 1.

The optimal predictor x̂n/n−1 satisfies the Kalman filtering equation

    x̂n+1/n = a x̂n/n = a(x̂n/n−1 + Gn εn) = a x̂n/n−1 + a Gn (yn − c x̂n/n−1) ,   or,

    x̂n+1/n = fn x̂n/n−1 + Kn yn                                                                        (11.9.10)

where we defined

    Kn = a Gn ,   fn = a − c Kn                                                                        (11.9.11)

These are the time-varying analogs of Eqs. (11.6.17) and (11.6.19). Equations (11.9.8) and (11.9.9) may be combined into one updating equation for Pn/n−1, known as the discrete Riccati difference equation

    Pn+1/n = a² R Pn/n−1 / (R + c² Pn/n−1) + Q                                                          (11.9.12)

Since the model parameters a, c, Q, R are constant, we expect the solution of the Riccati equation (11.9.12) to converge, for large n, to some steady-state value Pn/n−1 → P. In this limit, the Riccati difference equation (11.9.12) tends to the steady-state algebraic Riccati equation (11.6.18), which determines the limiting value P. The Kalman filter parameters will converge to the limiting values fn → f, Kn → K, and Gn → G given by Eq. (11.6.19).

It is possible to solve Eq. (11.9.12) in closed form and explicitly demonstrate these convergence properties. Using the techniques of [871,872], we obtain

    Pn/n−1 = P + f^{2n} E0 / (1 + Sn E0) ,   for n = 0, 1, 2, . . . ,                                    (11.9.13)

where E0 = P0/−1 − P and

    Sn = B (1 − f^{2n}) / (1 − f²) ,    B = c² / (R + c² P)

We have already mentioned (see Problem 1.50) that the stability of the signal model and the positivity of the asymptotic solution P imply the minimum phase condition |f| < 1. Thus, the second term of Eq. (11.9.13) converges to zero exponentially with a time constant determined by f.

Example 11.9.1: Determine the closed form solutions of the time-varying Kalman filter for the state and measurement models:

    xn+1 = xn + wn ,    yn = xn + vn

with Q = 0.5 and R = 1. Thus, a = 1 and c = 1. The Riccati equations are

    Pn+1/n = Pn/n−1 / (1 + Pn/n−1) + 0.5 ,    P = P / (1 + P) + 0.5

The solution of the algebraic Riccati equation is P = 1. This implies that f = aR/(R + c²P) = 0.5. To illustrate the solution (11.9.13), we take the initial condition to be zero, P0/−1 = 0. We find B = c²/(R + c²P) = 0.5 and

    Sn = (2/3)(1 − (0.5)^{2n})
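The five-step algorithm above is only a few lines of code. The sketch below runs it on simulated data from the models of Example 11.9.1 and prints the last few values of Pn+1/n, fn, and Gn, which should approach the steady-state values P = 1, f = 0.5, and G = 0.5 of Sec. 11.6.

```python
import numpy as np

rng = np.random.default_rng(4)
a, c, Q, R = 1.0, 1.0, 0.5, 1.0          # Example 11.9.1 parameters
N = 30

# Simulate the state and measurement models x_{n+1} = a x_n + w_n,  y_n = c x_n + v_n
x = np.zeros(N)
for n in range(N - 1):
    x[n + 1] = a * x[n] + rng.normal(scale=np.sqrt(Q))
y = c * x + rng.normal(scale=np.sqrt(R), size=N)

# Time-varying Kalman filter, steps 0-5 of the algorithm above
xpred, Ppred = 0.0, 0.0                   # x̂_{0/-1} = 0, P_{0/-1} = E[x_0^2] = 0 here
for n in range(N):
    G = c * Ppred / (R + c**2 * Ppred)                # gain, Eq. (11.9.7)
    xfilt = xpred + G * (y[n] - c * xpred)            # correction
    Pfilt = R * Ppred / (R + c**2 * Ppred)            # Eq. (11.9.8)
    xpred = a * xfilt                                 # prediction
    Ppred = a**2 * Pfilt + Q                          # Eq. (11.9.9)
    if n >= N - 3:
        K = a * G
        print(f"n={n:2d}  P(n+1/n)={Ppred:.4f}  f={a - c*K:.4f}  G={G:.4f}")
```

The printed Pn+1/n values also follow the closed-form solution (11.9.13) with E0 = −1.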
11.10 Problems

11.1 Let x = [xna, . . . , xnb]^T and y = [yna, . . . , ynb]^T be the desired and available signal vectors. The relationship between x and y is assumed to be linear of the form

    y = Cx + v

(b) Repeat for the optimal linear predictor of order M = 2 for predicting xn on the basis of the past two samples yn−1 and yn−2.

11.5 A stationary random signal x(n) has autocorrelation function Rxx(k) = σx² a^{|k|}, for all k. Consider a time interval [na, nb]. The random signal x(n) is known only at the end-points of that interval; that is, the only available observations are

    y(na) = x(na) ,    y(nb) = x(nb)

Determine the optimal estimate of x(n) based on just these two samples in the form

    x̂(n) = h(n, na) y(na) + h(n, nb) y(nb)

for the following values of n: (a) na ≤ n ≤ nb, (b) n ≤ na, (c) n ≥ nb.

11.6 A stationary random signal xn is to be estimated on the basis of the noisy observations

    E = E[e²n/n−1] = (Q + K² R) / (1 − f²) = (Q + K² R) / (1 − (a − cK)²)                              (P.2)

(c) To select the optimal value of the Kalman gain K, differentiate E with respect to K and set the derivative to zero. Show that the resulting equation for K can be expressed in the form

    K = c a P / (R + c² P)
where P stands for the minimized value of E; that is, P = Emin.

(d) Inserting this expression for K back into the expression (P.2) for E, show that the quantity P must satisfy the algebraic Riccati equation

    Q = P − a² R P / (R + c² P)

Thus, the resulting estimator filter is identical to the optimal one-step prediction filter discussed in Sec. 11.6.

11.9 Show that Eq. (P.2) of Problem 11.8 can be derived without using z-transforms, by using only stationarity, as suggested below: Using the state and measurement model equations and Eq. (P.1), show that the estimation error en/n−1 satisfies the difference equation

    en+1/n = f en/n−1 + wn − K vn

Then, invoking stationarity, derive Eq. (P.2). Using similar methods, show that the mean-square estimation error is given by

    E[e²n/n] = R P / (R + c² P)

where en/n = xn − x̂n/n is the estimation error of the optimal filter (11.6.13).

11.10 Consider the general example of Sec. 11.6. It was shown there that the innovations residual was the same as the whitening sequence εn driving the signal model of yn

    εn = yn − ŷn/n−1 = yn − c x̂n/n−1

for n < i. To do this, introduce a set of Lagrange multipliers Λni for n < i, one for each constraint equation, and incorporate them into an effective performance index

    J = E[e e^T] + Λ H^T + H Λ^T = min

where the matrix Λ is strictly upper-triangular. Show that this formulation of the minimization problem yields exactly the same solution as Eq. (11.8.7).

11.13 Exponential Moving Average as Wiener Filter. The single EMA filter for estimating the local level of a signal that we discussed in Chap. 6 admits a nice Wiener-Kalman filtering interpretation. Consider the noisy random walk signal model,

    xn+1 = xn + wn
                                                                                                       (11.10.1)
    yn = xn + vn

where wn, vn are mutually uncorrelated, zero-mean, white noise signals of variances Q = σw² and R = σv². Based on the material in Section 12.6, show that the optimum Wiener/Kalman filter for predicting xn from yn is equivalent to the exponential smoother, that is, show that it is given by,

    x̂n+1/n = f x̂n/n−1 + (1 − f) yn                                                                     (11.10.2)

so that the forgetting-factor parameter λ of EMA is identified as the closed-loop parameter f of the Kalman filter, and show further that f is given in terms of Q, R as follows,

    1 − f = ( √(Q² + 4QR) − Q ) / (2R)
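As a quick check of this statement (not a proof), the sketch below picks arbitrary Q and R, applies the steady-state Riccati and gain formulas of Sec. 11.6 with a = c = 1, and confirms that the resulting closed-loop parameter f matches the formula above.

```python
import numpy as np

Q, R = 0.3, 2.0                      # arbitrary noise variances for the check
a = c = 1.0                          # random-walk-plus-noise model of Eq. (11.10.1)

# Steady state: algebraic Riccati equation Q = P - a^2 R P/(R + c^2 P)
# reduces to P^2 - Q P - Q R = 0 for a = c = 1.
P = (Q + np.sqrt(Q**2 + 4 * Q * R)) / 2
K = a * c * P / (R + c**2 * P)       # Kalman gain, Eq. (11.6.19)
f = a - c * K                        # closed-loop parameter

print("1 - f (from Kalman) :", 1 - f)
print("1 - f (EMA formula) :", (np.sqrt(Q**2 + 4 * Q * R) - Q) / (2 * R))
```

The two printed numbers agree, so the exponential smoother gain 1 − f is the steady-state Kalman gain for this model.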
[Figure: observations yn, signal xn, and prediction x̂n/n−1 versus time samples n (0-300).]

12
Linear Prediction
12.1 Pure Prediction and Signal Modeling
In Sec. 1.17, we discussed the connection between linear prediction and signal modeling.
Here, we rederive the same results by considering the linear prediction problem as a
special case of the Wiener filtering problem, given by Eq. (11.4.6). Our aim is to cast
the results in a form that will suggest a practical way to solve the prediction problem
and hence also the modeling problem. Consider a stationary signal yn having a signal
model

    Syy(z) = σε² B(z) B(z^{−1})                                                                         (12.1.1)
as guaranteed by the spectral factorization theorem. Let Ryy (k) denote the autocorre-
lation of yn :
Ryy (k)= E[yn+k yn ]
The linear prediction problem is to predict the current value yn on the basis of all the
past values Yn−1 = {yi , −∞ < i ≤ n − 1}. If we define the delayed signal y1 (n)= yn−1 ,
then the linear prediction problem is equivalent to the optimal Wiener filtering problem
of estimating yn from the related signal y1 (n). The optimal estimation filter H(z) is
given by Eq. (11.4.6), where we must identify xn and yn with yn and y1 (n) of the present
notation. Using the filtering equation Y1 (z)= z−1 Y(z), we find that yn and y1 (n) have
the same spectral factor B(z), and that Syy1(z) = z Syy(z). Inserting these into Eq. (11.4.6), we find for the optimal prediction filter

    H(z) = (1 / B(z)) [ z B(z) ]_+                                                                       (12.1.2)
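For instance (a minimal sketch, assuming an AR(1) model yn = a yn−1 + εn so that B(z) = 1/(1 − a z^{−1})), Eq. (12.1.2) gives [zB(z)]_+ = aB(z) and hence H(z) = a, that is, the best one-step prediction is ŷn = a yn−1. The code below verifies this numerically by least-squares fitting a two-tap predictor.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(5)
a, N = 0.7, 100_000                          # assumed AR(1) parameter

eps = rng.standard_normal(N)
y = lfilter([1.0], [1.0, -a], eps)           # y_n = a*y_{n-1} + eps_n

# Least-squares one-step predictor on the past two samples: y_n ~ h1*y_{n-1} + h2*y_{n-2}
Y = np.column_stack([y[1:-1], y[:-2]])
h, *_ = np.linalg.lstsq(Y, y[2:], rcond=None)
print("fitted predictor taps:", h)           # expected close to [a, 0]
```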