[Fig. 10.9.1 UWT/DWT decompositions and wavelet coefficients of housing data — four panels (DWT decomposition, DWT detail coefficients, UWT decomposition, UWT detail coefficients) plotted versus time and DWT/UWT index, over decomposition levels 5–7.]

10.3 Prove the downsampling replication property (10.4.11) by working backwards, that is, start from the Fourier transform expression and show that

$$\frac{1}{L}\sum_{m=0}^{L-1} X(f - m f_s^{\mathrm{down}}) \;=\; \sum_k s(k)\,x(k)\,e^{-2\pi j f k/f_s} \;=\; \sum_n x(nL)\,e^{-2\pi j f nL/f_s} \;=\; Y_{\mathrm{down}}(f)$$

where s(k) is the periodic "sampling function" with the following representations:

$$s(k) \;=\; \frac{1}{L}\sum_{m=0}^{L-1} e^{-2\pi j k m/L} \;=\; \frac{1}{L}\,\frac{1-e^{-2\pi j k}}{1-e^{-2\pi j k/L}} \;=\; \sum_n \delta(k-nL)$$

Moreover, show that the above representations are nothing but the inverse L-point DFT of the DFT of one period of the periodic pulse train:

$$s(k) \;=\; [\,\ldots,\,1,\underbrace{0,\ldots,0}_{L-1\ \text{zeros}},\,1,\underbrace{0,\ldots,0}_{L-1\ \text{zeros}},\,1,\underbrace{0,\ldots,0}_{L-1\ \text{zeros}},\,\ldots\,] \;=\; \sum_n \delta(k-nL)$$

10.4 Show that the solution to the optimization problem (10.7.7) is the soft-thresholding rule of Eq. (10.7.8).

10.5 Study the "Tikhonov regularizer" wavelet thresholding function:

$$d_{\mathrm{thr}} = f(d,\lambda,a) = d\,\frac{|d|^a}{|d|^a + \lambda^a}\,,\qquad a>0,\ \lambda>0$$

11 Wiener Filtering

The problem of estimating one signal from another is one of the most important in signal processing. In many applications, the desired signal is not available or observable directly. Instead, the observable signal is a degraded or distorted version of the original signal. The signal estimation problem is to recover, in the best way possible, the desired signal from its degraded replica.

We mention some typical examples: (1) The desired signal may be corrupted by strong additive noise, such as weak evoked brain potentials measured against the strong background of ongoing EEGs; or weak radar returns from a target in the presence of strong clutter. (2) An antenna array designed to be sensitive towards a particular "look" direction may be vulnerable to strong jammers from other directions due to sidelobe leakage; the signal processing task here is to null the jammers while at the same time maintaining the sensitivity of the array towards the desired look direction. (3) A signal transmitted over a communications channel can suffer phase and amplitude distortions and can be subject to additive channel noise; the problem is to recover the transmitted signal from the distorted received signal. (4) A Doppler radar processor tracking a moving target must take into account dynamical noise—such as small purely random accelerations—affecting the dynamics of the target, as well as measurement errors. (5) An image recorded by an imaging system is subject to distortions such as blurring due to motion or to the finite aperture of the system, or other geometric distortions; the problem here is to undo the distortions introduced by the imaging system and restore the original image. A related problem, of interest in medical image processing, is that of reconstructing an image from its projections. (6) In remote sensing and inverse scattering applications, the basic problem is, again, to infer one signal from another; for example, to infer the temperature profile of the atmosphere from measurements of the spectral distribution of infrared energy; or to deduce the structure of a dielectric medium, such as the ionosphere, by studying its response to electromagnetic wave scattering; or, in oil exploration, to infer the layered structure of the earth by measuring its response to an impulsive input near its surface.

In this chapter, we pose the signal estimation problem and discuss some of the criteria used in the design of signal estimation algorithms.

We do not present a complete discussion of all methods of signal recovery and estimation that have been invented for applications as diverse as those mentioned above.

Our emphasis is on traditional linear least-squares estimation methods, not only because they are widely used, but also because they have served as the motivating force for the development of other estimation techniques and as the yardstick for evaluating them.

We develop the theoretical solution of the Wiener filter both in the stationary and nonstationary cases, and discuss its connection to the orthogonal projection, Gram-Schmidt constructions, and correlation canceling ideas of Chap. 1. By means of an example, we introduce Kalman filtering concepts and discuss their connection to Wiener filtering and to signal modeling. Practical implementations of the Wiener filter are discussed in Chapters 12 and 16. Other signal recovery methods for deconvolution applications that are based on alternative design criteria are briefly discussed in Chap. 12, where we also discuss some interesting connections between Wiener filtering/linear prediction methods and inverse scattering methods.

11.1 Linear and Nonlinear Estimation of Signals

The signal estimation problem can be stated as follows: We wish to estimate a random signal xn on the basis of available observations of a related signal yn. The available signal yn is to be processed by an optimal processor that produces the best possible estimate of xn:

[diagram: yn → optimal processor → x̂n]

The resulting estimate x̂n will be a function of the observations yn. If the optimal processor is linear, such as a linear filter, then the estimate x̂n will be a linear function of the observations. We are going to concentrate mainly on linear processors. However, we would like to point out that, depending on the estimation criterion, there are cases where the estimate x̂n may turn out to be a nonlinear function of the yn's.

We discuss briefly four major estimation criteria for designing such optimal processors. They are:

(1) The maximum a posteriori (MAP) criterion.
(2) The maximum likelihood (ML) criterion.
(3) The mean square (MS) criterion.
(4) The linear mean-square (LMS) criterion.

The LMS criterion is a special case of the MS criterion. It requires, a priori, that the estimate x̂n be a linear function of the yn's.† The main advantage of the LMS processor is that it requires only knowledge of second order statistics for its design, whereas the other, nonlinear, processors require more detailed knowledge of probability densities.

† Note that the acronym LMS is also used in the context of adaptive filtering, for least mean-square.

To explain the various estimation criteria, let us assume that the desired signal xn is to be estimated over a finite time interval na ≤ n ≤ nb. Without loss of generality, we may assume that the observed signal yn is also available over the same interval. Define the vectors

$$\mathbf{x} = \begin{bmatrix} x_{n_a} \\ x_{n_a+1} \\ \vdots \\ x_{n_b} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_{n_a} \\ y_{n_a+1} \\ \vdots \\ y_{n_b} \end{bmatrix}$$

For each value of n, we seek the functional dependence

$$\hat{x}_n = \hat{x}_n(\mathbf{y})$$

of x̂n on the given observation vector y that provides the best estimate of xn.

1. The criterion for the MAP estimate is to maximize the a posteriori conditional density of xn given that y already occurred; namely,

$$p(x_n|\mathbf{y}) = \text{maximum} \tag{11.1.1}$$

in other words, the optimal estimate x̂n is that xn that maximizes this quantity for the given vector y; x̂n is therefore the most probable choice resulting from the given observations y.

2. The ML criterion, on the other hand, selects x̂n to maximize the conditional density of y given xn, that is,

$$p(\mathbf{y}|x_n) = \text{maximum} \tag{11.1.2}$$

This criterion selects x̂n as though the already collected observations y were the most likely to occur.

3. The MS criterion minimizes the mean-square estimation error

$$\mathcal{E} = E[e_n^2] = \min\,,\quad\text{where } e_n = x_n - \hat{x}_n \tag{11.1.3}$$

that is, the best choice of the functional dependence x̂n = x̂n(y) is sought that minimizes this expression. We know from our results of Sec. 1.4 that the required solution is the corresponding conditional mean

$$\hat{x}_n = E[x_n|\mathbf{y}] = \text{MS estimate} \tag{11.1.4}$$

computed with respect to the conditional density p(xn|y).

4. Finally, the LMS criterion requires the estimate to be a linear function of the observations

$$\hat{x}_n = \sum_{i=n_a}^{n_b} h(n,i)\, y_i \tag{11.1.5}$$

For each n, the weights h(n,i), na ≤ i ≤ nb are selected to minimize the mean-square estimation error

$$\mathcal{E} = E[e_n^2] = E\big[(x_n - \hat{x}_n)^2\big] = \text{minimum} \tag{11.1.6}$$
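As a small numerical illustration of the LMS criterion (11.1.5)–(11.1.6), the following sketch replaces the expectations by sample averages over a set of simulated realizations (an assumption made only for this illustration) and solves for the weights h(n, i) by least squares; it assumes numpy is available and the signal pair is hypothetical.

```python
import numpy as np

# Illustration of the LMS criterion (11.1.5)-(11.1.6): expectations are
# replaced by sample averages over many realizations, and the weights h(n,i)
# are found by least squares.
rng = np.random.default_rng(0)
N, trials = 8, 5000                       # interval length nb-na+1, number of realizations
X = rng.standard_normal((trials, N))      # hypothetical desired signal x_n (one row per realization)
Y = X + 0.5 * rng.standard_normal((trials, N))   # observations y_n = x_n + noise

n = 3                                     # estimate x_n at this time index
# Solve min_h E[(x_n - sum_i h(n,i) y_i)^2] with E[.] approximated by sample averages:
h, *_ = np.linalg.lstsq(Y, X[:, n], rcond=None)  # h[i] ~ h(n,i), i = 0..N-1
x_hat = Y @ h                             # linear estimates of x_n over all realizations
mse = np.mean((X[:, n] - x_hat) ** 2)
print("estimated weights h(n,i):", np.round(h, 3))
print("sample mean-square error:", round(mse, 4))
```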

With the exception of the LMS estimate, all other estimates x̂n(y) are, in general, nonlinear functions of y.

Example 11.1.1: If both xn and y are zero-mean and jointly gaussian, then Examples 1.4.1 and 1.4.2 imply that the MS and LMS estimates of xn are the same. Furthermore, since p(xn|y) is gaussian it will be symmetric about its maximum, which occurs at its mean, that is, at E[xn|y]. Therefore, the MAP estimate of xn is equal to the MS estimate. In conclusion, for zero-mean jointly gaussian xn and y, the three estimates MAP, MS, and LMS coincide.

Example 11.1.2: To see the nonlinear character and the differences among the various estimates, consider the following example: A discrete-amplitude, constant-in-time signal x can take on the three values

$$x = -1\,,\qquad x = 0\,,\qquad x = 1$$

each with probability of 1/3. This signal is placed on a known carrier waveform cn and transmitted over a noisy channel. The received samples are of the form

$$y_n = c_n x + v_n\,,\qquad n = 1, 2, \ldots, M$$

where vn are zero-mean white gaussian noise samples of variance σv², assumed to be independent of x. The above set of measurements can be written in an obvious vector notation

$$\mathbf{y} = \mathbf{c}\,x + \mathbf{v}$$

(a) Determine the conditional densities p(y|x) and p(x|y).
(b) Determine and compare the four alternative estimates MAP, ML, MS, and LMS.

Solution: To compute p(y|x), note that if x is given, then the only randomness left in y arises from the noise term v. Since vn are uncorrelated and gaussian, they will be independent; therefore,

$$p(\mathbf{y}|x) = p(\mathbf{v}) = \prod_{n=1}^{M} p(v_n) = \big(2\pi\sigma_v^2\big)^{-M/2}\exp\Big[-\frac{1}{2\sigma_v^2}\sum_{n=1}^{M}v_n^2\Big]
= \big(2\pi\sigma_v^2\big)^{-M/2}\exp\Big[-\frac{1}{2\sigma_v^2}\mathbf{v}^T\mathbf{v}\Big] = \big(2\pi\sigma_v^2\big)^{-M/2}\exp\Big[-\frac{1}{2\sigma_v^2}(\mathbf{y}-\mathbf{c}x)^T(\mathbf{y}-\mathbf{c}x)\Big]$$

Using Bayes' rule we find p(x|y) = p(y|x)p(x)/p(y). Since

$$p(x) = \frac{1}{3}\big[\delta(x-1)+\delta(x)+\delta(x+1)\big]$$

we find

$$p(x|\mathbf{y}) = \frac{1}{A}\big[p(\mathbf{y}|1)\delta(x-1)+p(\mathbf{y}|0)\delta(x)+p(\mathbf{y}|-1)\delta(x+1)\big]$$

where the constant A is

$$A = 3p(\mathbf{y}) = 3\int p(\mathbf{y}|x)p(x)\,dx = p(\mathbf{y}|1)+p(\mathbf{y}|0)+p(\mathbf{y}|-1)$$

To find the MAP estimate of x, the quantity p(x|y) must be maximized with respect to x. Since the expression for p(x|y) forces x to be one of the three values +1, 0, −1, it follows that the maximum among the three coefficients p(y|1), p(y|0), p(y|−1) will determine the value of x. Thus, for a given y we select that x that

$$p(\mathbf{y}|x) = \text{maximum of } \big\{p(\mathbf{y}|1),\ p(\mathbf{y}|0),\ p(\mathbf{y}|-1)\big\}$$

Using the gaussian nature of p(y|x), we find equivalently

$$(\mathbf{y}-\mathbf{c}x)^2 = \text{minimum of } \big\{(\mathbf{y}-\mathbf{c})^2,\ \mathbf{y}^2,\ (\mathbf{y}+\mathbf{c})^2\big\}$$

Subtracting y² from both sides, dividing by cᵀc, and denoting

$$\bar{y} = \frac{\mathbf{c}^T\mathbf{y}}{\mathbf{c}^T\mathbf{c}}$$

we find the equivalent equation

$$x^2 - 2x\bar{y} = \min\{1-2\bar{y},\ 0,\ 1+2\bar{y}\}$$

and in particular, applying these for +1, 0, −1, we find

$$\hat{x}_{\mathrm{MAP}} = \begin{cases} \;\;1, & \text{if } \bar{y} > \tfrac{1}{2} \\ \;\;0, & \text{if } -\tfrac{1}{2} < \bar{y} < \tfrac{1}{2} \\ -1, & \text{if } \bar{y} < -\tfrac{1}{2} \end{cases}$$

To determine the ML estimate, we must maximize p(y|x) with respect to x. The ML estimate does not require knowledge of the a priori probability density p(x) of x. Therefore, differentiating p(y|x) with respect to x and setting the derivative to zero gives

$$\frac{\partial}{\partial x}\,p(\mathbf{y}|x) = 0 \quad\text{or}\quad \frac{\partial}{\partial x}\ln p(\mathbf{y}|x) = 0 \quad\text{or}\quad \frac{\partial}{\partial x}(\mathbf{y}-\mathbf{c}x)^2 = 0$$

which gives

$$\hat{x}_{\mathrm{ML}} = \frac{\mathbf{c}^T\mathbf{y}}{\mathbf{c}^T\mathbf{c}} = \bar{y}$$

The MS estimate is obtained by computing the conditional mean

$$E[x|\mathbf{y}] = \int x\,p(x|\mathbf{y})\,dx = \int x\,\frac{1}{A}\big[p(\mathbf{y}|1)\delta(x-1)+p(\mathbf{y}|0)\delta(x)+p(\mathbf{y}|-1)\delta(x+1)\big]\,dx
= \frac{1}{A}\big[p(\mathbf{y}|1)-p(\mathbf{y}|-1)\big]\,,\quad\text{or,}$$

$$\hat{x}_{\mathrm{MS}} = \frac{p(\mathbf{y}|1)-p(\mathbf{y}|-1)}{p(\mathbf{y}|1)+p(\mathbf{y}|0)+p(\mathbf{y}|-1)}$$

Canceling some common factors from the numerator and denominator, we find the simpler expression

$$\hat{x}_{\mathrm{MS}} = \frac{2\sinh(2a\bar{y})}{e^{a} + 2\cosh(2a\bar{y})}\,, \qquad\text{where } a = \frac{\mathbf{c}^T\mathbf{c}}{2\sigma_v^2}$$

Finally, the LMS estimate can be computed as in Example 1.4.3. We find

$$\hat{x}_{\mathrm{LMS}} = \frac{\mathbf{c}^T\mathbf{y}}{\dfrac{\sigma_v^2}{\sigma_x^2} + \mathbf{c}^T\mathbf{c}} = \frac{\mathbf{c}^T\mathbf{c}}{\dfrac{\sigma_v^2}{\sigma_x^2} + \mathbf{c}^T\mathbf{c}}\;\bar{y}$$

All four estimates have been expressed in terms of ȳ. Note that the ML estimate is linear but has a different slope than the LMS estimate. The nonlinearity of the various estimates is best seen in the following figure:

[figure: the four estimates x̂MAP, x̂ML, x̂MS, x̂LMS plotted as functions of ȳ]

11.2 Orthogonality and Normal Equations

From now on, we will concentrate on the optimal linear estimate defined by Eqs. (11.1.5) and (11.1.6). For each time instant n at which an estimate x̂n is sought, the optimal weights h(n,i), na ≤ i ≤ nb must be determined that minimize the error criterion (11.1.6). In general, a new set of optimal weights must be computed for each time instant n. In the special case when the processes xn and yn are stationary and the observations are available for a long time, that is, na = −∞, the weights become time-invariant in the sense that h(n,i) = h(n−i), and the linear processor becomes an ordinary time-invariant linear filter. We will discuss the solution for h(n,i) both for the time-invariant and the more general cases.

The problem of determining the optimal weights h(n,i) according to the mean-square error minimization criterion (11.1.6) is in general referred to as the Wiener filtering problem [849–866]. An interesting historical account of the development of this problem and its ramifications is given in the review article by Kailath [866]. Wiener filtering problems are conventionally divided into three types:

1. The optimal smoothing problem,
2. The optimal filtering problem, and
3. The optimal prediction problem.

In all cases, the optimal estimate of xn at a given time instant n is given by an expression of the form (11.1.5), as a linear combination of the available observations yn in the interval na ≤ n ≤ nb. The division into three types of problems depends on which of the available observations in that interval are taken into account in making up the linear combination (11.1.5).

In the smoothing problem, all the observations in the interval [na, nb] are taken into account. The shaded part in the following figure denotes the range of observations that are used in the summation of Eq. (11.1.5):

$$\hat{x}_n = \sum_{i=n_a}^{n_b} h(n,i)\,y_i$$

Since some of the observations are to the future of xn, the linear operation is not causal. This does not present a problem if the sequence yn is already available and stored in memory.

The optimal filtering problem, on the other hand, requires the linear operation (11.1.5) to be causal, that is, only those observations that are in the present and past of the current sample xn must be used in making up the estimate x̂n. This requires that the matrix of optimal weights h(n,i) be lower triangular, that is,

$$h(n,i) = 0\,,\quad\text{for } n < i$$

Thus, in reference to the figure below, only the shaded portion of the observation interval is used at the current time instant:

$$\hat{x}_n = \sum_{i=n_a}^{n} h(n,i)\,y_i$$

The estimate x̂n depends on the present and all the past observations, from the fixed starting point na to the current time instant n. As n increases, more and more observations are taken into account in making up the estimate, and the actual computation of x̂n becomes less and less efficient. It is desirable, then, to be able to recast the expression for x̂n in a time-recursive form. This is what is done in Kalman filtering. But, there is another way to make the Wiener filter computationally manageable. Instead of allowing a growing number of observations, only the current and the past M observations yi, i = n, n−1, ..., n−M are taken into account. In this case, only (M+1) filter weights are to be computed at each time instant n. This is depicted below:

$$\hat{x}_n = \sum_{i=n-M}^{n} h(n,i)\,y_i = \sum_{m=0}^{M} h(n, n-m)\,y_{n-m}$$

This is referred to as the finite impulse response (FIR) Wiener filter. Because of its simple implementation, the FIR Wiener filter has enjoyed widespread popularity. Depending on the particular application, the practical implementation of the filter may vary. In Sec. 11.3 we present the theoretical formulation that applies to the stationary case; in Chap. 12 we reconsider it as a waveshaping and spiking filter and discuss a number of deconvolution applications. In Chap. 16, we consider its adaptive implementation using the Widrow-Hoff LMS algorithm and discuss a number of applications such

as channel equalization and echo cancellation; we also discuss two alternative adaptive implementations—the so-called "gradient lattice," and the "recursive least-squares."

Finally, the linear prediction problem is a special case of the optimal filtering problem with the additional stipulation that observations only up to time instant n − D must be used in obtaining the current estimate x̂n; this is equivalent to the problem of predicting D units of time into the future. The range of observations used in this case is shown below:

$$\hat{x}_n = \sum_{i=n_a}^{n-D} h(n,i)\,y_i$$

Of special interest to us will be the case of one-step prediction, corresponding to the choice D = 1. This is depicted below:

$$\hat{x}_n = \sum_{i=n_a}^{n-1} h(n,i)\,y_i$$

If we demand that the prediction be based only on the past M samples (from the current sample), we obtain the FIR version of the prediction problem, referred to as linear prediction based on the past M samples, which is depicted below:

$$\hat{x}_n = \sum_{i=n-M}^{n-1} h(n,i)\,y_i = \sum_{m=1}^{M} h(n, n-m)\,y_{n-m}$$

Next, we set up the orthogonality and normal equations for the optimal weights. We begin with the smoothing problem. The estimation error is in this case

$$e_n = x_n - \hat{x}_n = x_n - \sum_{i=n_a}^{n_b} h(n,i)\,y_i \tag{11.2.1}$$

Differentiating the mean-square estimation error (11.1.6) with respect to each weight h(n,i), na ≤ i ≤ nb, and setting the derivative to zero, we obtain the orthogonality equations that are enough to determine the weights:

$$\frac{\partial\mathcal{E}}{\partial h(n,i)} = 2E\Big[e_n\frac{\partial e_n}{\partial h(n,i)}\Big] = -2E[e_n y_i] = 0\,,\quad\text{for } n_a \le i \le n_b\,,\quad\text{or,}$$

$$R_{ey}(n,i) = E[e_n y_i] = 0 \qquad\text{(orthogonality equations)} \tag{11.2.2}$$

for na ≤ i ≤ nb. Thus, the estimation error en is orthogonal (uncorrelated) to each observation yi used in making up the estimate x̂n. The orthogonality equations provide exactly as many equations as there are unknown weights.

Inserting Eq. (11.2.1) for en, the orthogonality equations may be written in an equivalent form, known as the normal equations

$$E\Big[\Big(x_n - \sum_{k=n_a}^{n_b} h(n,k)\,y_k\Big)\,y_i\Big] = 0\,,\quad\text{or,}$$

$$E[x_n y_i] = \sum_{k=n_a}^{n_b} h(n,k)\,E[y_k y_i] \qquad\text{(normal equations)} \tag{11.2.3}$$

These determine the optimal weights at the current time instant n. In the vector notation of Sec. 11.1, we write Eq. (11.2.3) as

$$E[\mathbf{x}\mathbf{y}^T] = H\,E[\mathbf{y}\mathbf{y}^T]$$

where H is the matrix of weights h(n,i). The optimal H and the estimate are then

$$\hat{\mathbf{x}} = H\mathbf{y} = E[\mathbf{x}\mathbf{y}^T]\,E[\mathbf{y}\mathbf{y}^T]^{-1}\,\mathbf{y}$$

This is identical to the correlation canceler of Sec. 1.4. The orthogonality equations (11.2.2) are precisely the correlation cancellation conditions. Extracting the nth row of this matrix equation, we find an explicit expression for the nth estimate x̂n

$$\hat{x}_n = E[x_n\mathbf{y}^T]\,E[\mathbf{y}\mathbf{y}^T]^{-1}\,\mathbf{y}$$

which is recognized as the projection of the random variable xn onto the subspace spanned by the available observations; namely, Y = {yna, yna+1, ..., ynb}. This is a general result: The minimum mean-square linear estimate x̂n is the projection of xn onto the subspace spanned by all the observations that are used to make up that estimate. This result is a direct consequence of the quadratic minimization criterion (11.1.6) and the orthogonal projection theorem discussed in Sec. 1.6.

Using the methods of Sec. 1.4, the minimized estimation error at time instant n is easily computed by

$$\mathcal{E}_n = E[e_n e_n] = E[e_n x_n] = E\Big[\Big(x_n - \sum_{i=n_a}^{n_b} h(n,i)\,y_i\Big)x_n\Big]
= E[x_n^2] - \sum_{i=n_a}^{n_b} h(n,i)\,E[y_i x_n] = E[x_n^2] - E[x_n\mathbf{y}^T]\,E[\mathbf{y}\mathbf{y}^T]^{-1}\,E[\mathbf{y}x_n]$$

which corresponds to the diagonal entries of the covariance matrix of the estimation error e:

$$R_{ee} = E[\mathbf{e}\mathbf{e}^T] = E[\mathbf{x}\mathbf{x}^T] - E[\mathbf{x}\mathbf{y}^T]\,E[\mathbf{y}\mathbf{y}^T]^{-1}\,E[\mathbf{y}\mathbf{x}^T]$$

The optimum filtering problem is somewhat more complicated because of the causality condition. In this case, the estimate at time n is given by

$$\hat{x}_n = \sum_{i=n_a}^{n} h(n,i)\,y_i \tag{11.2.4}$$

Inserting this into the minimization criterion (11.1.6) and differentiating with respect to h(n,i) for na ≤ i ≤ n, we find again the orthogonality conditions

$$R_{ey}(n,i) = E[e_n y_i] = 0 \quad\text{for } n_a \le i \le n \tag{11.2.5}$$

where the most important difference from Eq. (11.2.2) is the restriction on the range of i, that is, en is decorrelated only from the present and past values of yi.
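Before continuing with the causal case, the unconstrained (smoothing) solution derived above, x̂ = E[xyᵀ]E[yyᵀ]⁻¹y with error covariance Ree, is easy to exercise numerically. The sketch below (assuming numpy, and using sample averages over simulated realizations in place of the true correlation matrices) forms the weight matrix H and compares the theoretical and sample error covariances.

```python
import numpy as np

# Numerical sketch of the unconstrained (smoothing) Wiener solution:
#   H = E[x y^T] E[y y^T]^{-1},  x_hat = H y,  R_ee = R_xx - R_xy R_yy^{-1} R_yx
# Ensemble averages are approximated by sample averages over many realizations.
rng = np.random.default_rng(1)
N, trials = 6, 20000
X = rng.standard_normal((N, trials))            # desired signal vectors x (one column per realization)
Y = X + 0.7 * rng.standard_normal((N, trials))  # observations y = x + noise

Rxy = X @ Y.T / trials                          # sample E[x y^T]
Ryy = Y @ Y.T / trials                          # sample E[y y^T]
Rxx = X @ X.T / trials                          # sample E[x x^T]

H = Rxy @ np.linalg.inv(Ryy)                    # optimal (non-causal) weight matrix
X_hat = H @ Y                                   # smoothed estimates
Ree_theory = Rxx - Rxy @ np.linalg.inv(Ryy) @ Rxy.T
Ree_sample = (X - X_hat) @ (X - X_hat).T / trials
print("diagonal of R_ee (theory):", np.round(np.diag(Ree_theory), 3))
print("diagonal of R_ee (sample):", np.round(np.diag(Ree_sample), 3))
```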

Again, the estimation error en is orthogonal to each observation yi that is being used to make up the estimate. The orthogonality equations can be converted into the normal equations as follows:

$$E[e_n y_i] = E\Big[\Big(x_n - \sum_{k=n_a}^{n} h(n,k)\,y_k\Big)\,y_i\Big] = 0\,,\quad\text{or,}$$

$$E[x_n y_i] = \sum_{k=n_a}^{n} h(n,k)\,E[y_k y_i]\quad\text{for } n_a \le i \le n\,,\quad\text{or,} \tag{11.2.6}$$

$$R_{xy}(n,i) = \sum_{k=n_a}^{n} h(n,k)\,R_{yy}(k,i)\quad\text{for } n_a \le i \le n \tag{11.2.7}$$

Such equations are generally known as Wiener-Hopf equations. Introducing the vector of observations up to the current time n, namely,

$$\mathbf{y}_n = [y_{n_a}, y_{n_a+1}, \ldots, y_n]^T$$

we may write Eq. (11.2.6) in vector form as

$$E[x_n\mathbf{y}_n^T] = \big[h(n,n_a),\ h(n,n_a+1),\ \ldots,\ h(n,n)\big]\,E[\mathbf{y}_n\mathbf{y}_n^T]$$

which can be solved for the vector of weights

$$\big[h(n,n_a),\ h(n,n_a+1),\ \ldots,\ h(n,n)\big] = E[x_n\mathbf{y}_n^T]\,E[\mathbf{y}_n\mathbf{y}_n^T]^{-1}$$

and for the estimate x̂n:

$$\hat{x}_n = E[x_n\mathbf{y}_n^T]\,E[\mathbf{y}_n\mathbf{y}_n^T]^{-1}\,\mathbf{y}_n \tag{11.2.8}$$

Again, x̂n is recognized as the projection of xn onto the space spanned by the observations that are used in making up the estimate; namely, Yn = {yna, yna+1, ..., yn}. This solution of Eqs. (11.2.5) and (11.2.7) will be discussed in more detail in Sec. 11.8, using covariance factorization methods.

11.3 Stationary Wiener Filter

In this section, we make two assumptions that simplify the structure of Eqs. (11.2.6) and (11.2.7). The first is to assume stationarity for all signals so that the cross-correlation and autocorrelation appearing in Eq. (11.2.7) become functions of the differences of their arguments. The second assumption is to take the initial time na to be the infinite past, na = −∞, that is, the observation interval is Yn = {yi, −∞ < i ≤ n}.

The assumption of stationarity can be used as follows: Suppose we have the solution h(n,i) of Eq. (11.2.7) for the best weights to estimate xn, and wish to determine the best weights h(n+d, i), na ≤ i ≤ n+d for estimating the sample xn+d at the future time n+d. Then, the new weights will satisfy the same equations as (11.2.7) with the changes

$$R_{xy}(n+d, i) = \sum_{k=n_a}^{n+d} h(n+d, k)\,R_{yy}(k,i)\,,\quad\text{for } n_a \le i \le n+d \tag{11.3.1}$$

Making a change of variables i → i+d and k → k+d, we rewrite Eq. (11.3.1) as

$$R_{xy}(n+d, i+d) = \sum_{k=n_a-d}^{n} h(n+d, k+d)\,R_{yy}(k+d, i+d)\,,\quad\text{for } n_a-d \le i \le n \tag{11.3.2}$$

Now, if we assume stationarity, Eqs. (11.2.7) and (11.3.2) become

$$\begin{aligned}
R_{xy}(n-i) &= \sum_{k=n_a}^{n} h(n,k)\,R_{yy}(k-i)\,,\quad\text{for } n_a \le i \le n\\
R_{xy}(n-i) &= \sum_{k=n_a-d}^{n} h(n+d, k+d)\,R_{yy}(k-i)\,,\quad\text{for } n_a-d \le i \le n
\end{aligned} \tag{11.3.3}$$

If it were not for the differences in the ranges of i and k, these two equations would be the same. But this is exactly what happens when we make the second assumption that na = −∞. Therefore, by uniqueness of the solution, we find in this case

$$h(n+d, k+d) = h(n,k)$$

and since d is arbitrary, it follows that h(n,k) must be a function of the difference of its arguments, that is,

$$h(n,k) = h(n-k) \tag{11.3.4}$$

Thus, the optimal linear processor becomes a shift-invariant causal linear filter and the estimate is given by

$$\hat{x}_n = \sum_{i=-\infty}^{n} h(n-i)\,y_i = \sum_{i=0}^{\infty} h(i)\,y_{n-i} \tag{11.3.5}$$

and Eq. (11.3.3) becomes in this case

$$R_{xy}(n-i) = \sum_{k=-\infty}^{n} h(n,k)\,R_{yy}(k-i)\,,\quad\text{for } -\infty < i \le n$$

With the change of variables n−i → n and n−k → k, we find

$$R_{xy}(n) = \sum_{k=0}^{\infty} R_{yy}(n-k)\,h(k)\,,\quad\text{for } n \ge 0 \tag{11.3.6}$$

and written in matrix form

$$\begin{bmatrix} R_{yy}(0) & R_{yy}(1) & R_{yy}(2) & R_{yy}(3) & \cdots \\ R_{yy}(1) & R_{yy}(0) & R_{yy}(1) & R_{yy}(2) & \cdots \\ R_{yy}(2) & R_{yy}(1) & R_{yy}(0) & R_{yy}(1) & \cdots \\ R_{yy}(3) & R_{yy}(2) & R_{yy}(1) & R_{yy}(0) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix}
\begin{bmatrix} h(0) \\ h(1) \\ h(2) \\ h(3) \\ \vdots \end{bmatrix} =
\begin{bmatrix} R_{xy}(0) \\ R_{xy}(1) \\ R_{xy}(2) \\ R_{xy}(3) \\ \vdots \end{bmatrix} \tag{11.3.7}$$

These are the discrete-time Wiener-Hopf equations. Were it not for the restriction n ≥ 0 (which reflects the requirement of causality), they could be solved easily by z-transform methods. As written above, they require methods of spectral factorization for their solution.
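Although the system (11.3.7) is semi-infinite, a truncated version keeping only M+1 weights (this is the FIR Wiener filter discussed next) can be solved directly as a finite Toeplitz system. The sketch below assumes scipy is available and that the first M+1 autocorrelation and cross-correlation lags are known; the numerical lag values are made up for illustration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Truncated (FIR) version of the Wiener-Hopf equations (11.3.7):
# solve sum_k R_yy(i-k) h(k) = R_xy(i), i = 0..M, for the M+1 weights h(k).
# The autocorrelation lags below are hypothetical illustration values.
M = 4
Ryy = np.array([2.0, 1.0, 0.5, 0.25, 0.125])     # R_yy(0..M), assumed known
Rxy = np.array([1.0, 0.6, 0.36, 0.216, 0.1296])  # R_xy(0..M), assumed known

# R_yy is symmetric, so the Toeplitz matrix is specified by its first column.
h = solve_toeplitz(Ryy, Rxy)
print("FIR Wiener weights h(0..M):", np.round(h, 4))

# The same weights from an explicit Toeplitz matrix, as a cross-check:
T = np.array([[Ryy[abs(i - k)] for k in range(M + 1)] for i in range(M + 1)])
print("cross-check:", np.round(np.linalg.solve(T, Rxy), 4))
```

In practice the same system is solved more efficiently by Levinson's algorithm, which exploits the Toeplitz structure, as noted in the next section.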

Before we discuss such methods, we mention in passing the continuous-time version of the Wiener-Hopf equation:

$$R_{xy}(t) = \int_0^\infty R_{yy}(t-t')\,h(t')\,dt'\,,\qquad t \ge 0$$

We also consider the FIR Wiener filtering problem in the stationary case. The observation interval in this case is Yn = {yi, n−M ≤ i ≤ n}. Using the same arguments as above we have h(n,i) = h(n−i), and the estimate x̂n is obtained by an ordinary FIR linear filter

$$\hat{x}_n = \sum_{i=n-M}^{n} h(n-i)\,y_i = h(0)y_n + h(1)y_{n-1} + \cdots + h(M)y_{n-M} \tag{11.3.8}$$

where the (M+1) filter weights h(0), h(1), ..., h(M) are obtained by the (M+1)×(M+1) matrix version of the Wiener-Hopf normal equations:

$$\begin{bmatrix} R_{yy}(0) & R_{yy}(1) & R_{yy}(2) & \cdots & R_{yy}(M) \\ R_{yy}(1) & R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(M-1) \\ R_{yy}(2) & R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(M-2) \\ \vdots & \vdots & \vdots & & \vdots \\ R_{yy}(M) & R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0) \end{bmatrix}
\begin{bmatrix} h(0) \\ h(1) \\ h(2) \\ \vdots \\ h(M) \end{bmatrix} =
\begin{bmatrix} R_{xy}(0) \\ R_{xy}(1) \\ R_{xy}(2) \\ \vdots \\ R_{xy}(M) \end{bmatrix} \tag{11.3.9}$$

Exploiting the Toeplitz property of the matrix Ryy, the above matrix equation can be solved efficiently using Levinson's algorithm. This will be discussed in Chap. 12. In Chap. 16, we will consider adaptive implementations of the FIR Wiener filter which produce the optimal filter weights adaptively without requiring prior knowledge of the autocorrelation and cross-correlation matrices Ryy and Rxy and without requiring any matrix inversion.

[Fig. 11.3.1 Time-Invariant Wiener Filter.]

We summarize our results on the stationary Wiener filter in Fig. 11.3.1. The optimal filter weights h(n), n = 0, 1, 2, ... are computed from Eq. (11.3.7) or Eq. (11.3.9). The action of the filter is precisely that of the correlation canceler: The filter processes the observation signal yn causally to produce the best possible estimate x̂n of xn, and then it proceeds to cancel it from the output en. As a result, the output en is no longer correlated with any of the present and past values of yn, that is, E[en yn−i] = 0, for i = 0, 1, 2, .... As we remarked in Sec. 1.4, it is better to think of x̂n as the optimal estimate of that part of the primary signal xn which happens to be correlated with the secondary signal yn. This follows from the property that if xn = x1(n) + x2(n) with Rx2y = 0, then Rxy = Rx1y. Therefore, the solution of Eq. (11.3.7) for the best weights to estimate xn is also the solution for the best weights to estimate x1(n). The filter may also be thought of as the optimal signal separator of the two signal components x1(n) and x2(n).

11.4 Construction of the Wiener Filter by Prewhitening

The normal equations (11.3.6) would have a trivial solution if the sequence yn were a white-noise sequence with delta-function autocorrelation. Thus, the solution procedure is first to whiten the sequence yn and then solve the normal equations. To this end, let yn have a signal model, as guaranteed by the spectral factorization theorem

$$S_{yy}(z) = \sigma_\epsilon^2\,B(z)\,B(z^{-1}) \tag{11.4.1}$$

where εn is the driving white noise, and B(z) a minimal-phase filter. The problem of estimating xn in terms of the sequence yn becomes equivalent to the problem of estimating xn in terms of the white-noise sequence εn:

[block diagram: εn → B(z) → yn → H(z) → x̂n]

If we could determine the combined filter

$$F(z) = B(z)H(z)$$

we would then solve for the desired Wiener filter H(z)

$$H(z) = \frac{F(z)}{B(z)} \tag{11.4.2}$$

Since B(z) is minimum-phase, the indicated inverse 1/B(z) is guaranteed to be stable and causal. Let fn be the causal impulse response of F(z). Then, it satisfies the normal equations of the type of Eq. (11.3.6):

$$R_{x\epsilon}(n) = \sum_{i=0}^{\infty} f_i\,R_{\epsilon\epsilon}(n-i)\,,\qquad n \ge 0 \tag{11.4.3}$$

Since R_{εε}(n−i) = σε²δ(n−i), Eq. (11.4.3) collapses to

$$R_{x\epsilon}(n) = \sigma_\epsilon^2 f_n\,,\quad n \ge 0\,,\quad\text{or,}$$

$$f_n = \frac{R_{x\epsilon}(n)}{\sigma_\epsilon^2}\,,\quad\text{for } n \ge 0 \tag{11.4.4}$$

Next, we compute the corresponding z-transform F(z)

$$F(z) = \sum_{n=0}^{\infty} f_n z^{-n} = \frac{1}{\sigma_\epsilon^2}\sum_{n=0}^{\infty} R_{x\epsilon}(n)\,z^{-n} = \frac{1}{\sigma_\epsilon^2}\,\big[S_{x\epsilon}(z)\big]_+ \tag{11.4.5}$$


where $\big[S_{x\epsilon}(z)\big]_+$ denotes the causal part of the double-sided z-transform $S_{x\epsilon}(z)$. Generally, the causal part of a z-transform

$$G(z) = \sum_{n=-\infty}^{\infty} g_n z^{-n} = \sum_{n=-\infty}^{-1} g_n z^{-n} + \sum_{n=0}^{\infty} g_n z^{-n}$$

is defined as

$$\big[G(z)\big]_+ = \sum_{n=0}^{\infty} g_n z^{-n}$$

The causal instruction in Eq. (11.4.5) was necessary since the above solution for fn was valid only for n ≥ 0. Since yn is the output of the filter B(z) driven by εn, it follows that

$$S_{xy}(z) = S_{x\epsilon}(z)\,B(z^{-1}) \quad\text{or}\quad S_{x\epsilon}(z) = \frac{S_{xy}(z)}{B(z^{-1})}$$

Combining Eqs. (11.4.2) and (11.4.5), we finally find

$$H(z) = \frac{1}{\sigma_\epsilon^2 B(z)}\left[\frac{S_{xy}(z)}{B(z^{-1})}\right]_+ \qquad\text{(Wiener filter)} \tag{11.4.6}$$

Thus, the construction of the optimal filter first requires the spectral factorization of Syy(z) to obtain B(z), and then use of the above formula. This is the optimal realizable Wiener filter based on the infinite past. If the causal instruction is ignored, one obtains the optimal unrealizable Wiener filter

$$H_{\mathrm{unreal}}(z) = \frac{S_{xy}(z)}{\sigma_\epsilon^2 B(z)B(z^{-1})} = \frac{S_{xy}(z)}{S_{yy}(z)} \tag{11.4.7}$$

The minimum value of the mean-square estimation error can be conveniently expressed by a contour integral, as follows

$$\mathcal{E} = E[e_n^2] = E\big[e_n(x_n - \hat{x}_n)\big] = E[e_n x_n] - E[e_n\hat{x}_n] = E[e_n x_n] = R_{ex}(0)
= \oint_{\mathrm{u.c.}} S_{ex}(z)\,\frac{dz}{2\pi j z} = \oint_{\mathrm{u.c.}} \big[S_{xx}(z) - S_{\hat{x}x}(z)\big]\frac{dz}{2\pi j z}\,,\quad\text{or,}$$

$$\mathcal{E} = \oint_{\mathrm{u.c.}} \big[S_{xx}(z) - H(z)S_{yx}(z)\big]\frac{dz}{2\pi j z} \tag{11.4.8}$$

11.5 Wiener Filter Example

This example, in addition to illustrating the above ideas, will also serve as a short introduction to Kalman filtering. It is desired to estimate the signal xn on the basis of noisy observations

$$y_n = x_n + v_n$$

where vn is white noise of unit variance, σv² = 1, uncorrelated with xn. The signal xn is a first order Markov process, having a signal model

$$x_{n+1} = 0.6\,x_n + w_n$$

where wn is white noise of variance σw² = 0.82. Enough information is given above to determine the required power spectral densities Sxy(z) and Syy(z). First, we note that the signal generator transfer function for xn is

$$M(z) = \frac{1}{z - 0.6}$$

so that

$$S_{xx}(z) = \sigma_w^2\,M(z)M(z^{-1}) = \frac{0.82}{(z-0.6)(z^{-1}-0.6)} = \frac{0.82}{(1-0.6z^{-1})(1-0.6z)}$$

Then, we find

$$S_{xy}(z) = S_{x(x+v)}(z) = S_{xx}(z) + S_{xv}(z) = S_{xx}(z) = \frac{0.82}{(1-0.6z^{-1})(1-0.6z)}$$

$$\begin{aligned}
S_{yy}(z) &= S_{(x+v)(x+v)}(z) = S_{xx}(z)+S_{xv}(z)+S_{vx}(z)+S_{vv}(z) = S_{xx}(z)+S_{vv}(z)\\
&= \frac{0.82}{(1-0.6z^{-1})(1-0.6z)} + 1 = \frac{0.82 + (1-0.6z^{-1})(1-0.6z)}{(1-0.6z^{-1})(1-0.6z)}\\
&= \frac{2(1-0.3z^{-1})(1-0.3z)}{(1-0.6z^{-1})(1-0.6z)} = 2\cdot\frac{1-0.3z^{-1}}{1-0.6z^{-1}}\cdot\frac{1-0.3z}{1-0.6z} = \sigma_\epsilon^2\,B(z)B(z^{-1})
\end{aligned}$$

Then according to Eq. (11.4.6), we must compute the causal part of

$$G(z) = \frac{S_{xy}(z)}{B(z^{-1})} = \frac{\dfrac{0.82}{(1-0.6z^{-1})(1-0.6z)}}{\dfrac{1-0.3z}{1-0.6z}} = \frac{0.82}{(1-0.6z^{-1})(1-0.3z)}$$

This may be done by partial fraction expansion, but the fastest way is to use the contour inversion formula to compute gk for k ≥ 0, and then resum the z-transform:

$$g_k = \oint_{\mathrm{u.c.}} G(z)z^k\,\frac{dz}{2\pi j z} = \oint_{\mathrm{u.c.}} \frac{0.82\,z^k}{(1-0.3z)(z-0.6)}\,\frac{dz}{2\pi j}
= (\text{residue at } z = 0.6) = \frac{0.82\,(0.6)^k}{1-(0.3)(0.6)} = (0.6)^k\,,\quad k \ge 0$$

Resumming, we find the causal part

$$\big[G(z)\big]_+ = \sum_{k=0}^{\infty} g_k z^{-k} = \frac{1}{1-0.6z^{-1}}$$

Finally, the optimum Wiener estimation filter is

$$H(z) = \frac{1}{\sigma_\epsilon^2 B(z)}\left[\frac{S_{xy}(z)}{B(z^{-1})}\right]_+ = \frac{\big[G(z)\big]_+}{\sigma_\epsilon^2 B(z)} = \frac{0.5}{1-0.3z^{-1}} \tag{11.5.1}$$

which can be realized as the difference equation

$$\hat{x}_n = 0.3\,\hat{x}_{n-1} + 0.5\,y_n \tag{11.5.2}$$
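As a numerical check of Eqs. (11.5.1)–(11.5.2), the following sketch (assuming numpy and scipy are available) simulates the model xn+1 = 0.6xn + wn, yn = xn + vn with σw² = 0.82, σv² = 1, filters yn through the difference equation (11.5.2), and prints the sample mean-square estimation error, which can be compared with the value computed analytically just below.

```python
import numpy as np
from scipy.signal import lfilter

# Simulation sketch for the Wiener filter example: x_{n+1} = 0.6 x_n + w_n,
# y_n = x_n + v_n, with sigma_w^2 = 0.82 and sigma_v^2 = 1. The observations
# are filtered by H(z) = 0.5/(1 - 0.3 z^{-1}) from Eq. (11.5.1).
rng = np.random.default_rng(2)
N = 200_000
w = rng.normal(0.0, np.sqrt(0.82), N)
v = rng.normal(0.0, 1.0, N)

x = np.zeros(N)
for n in range(N - 1):                    # state recursion x_{n+1} = 0.6 x_n + w_n
    x[n + 1] = 0.6 * x[n] + w[n]
y = x + v                                 # noisy observations

x_hat = lfilter([0.5], [1.0, -0.3], y)    # realizes x_hat[n] = 0.3 x_hat[n-1] + 0.5 y[n]
mse = np.mean((x - x_hat) ** 2)
print("empirical E[(x_n - xhat_n)^2]:", round(mse, 3))
# The text computes the theoretical minimum error to be 0.5 in the next paragraph.
```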

The estimation error is also easily computed using the contour formula of Eq. (11.4.8):

$$\mathcal{E} = E[e_n^2] = \sigma_e^2 = \oint_{\mathrm{u.c.}} \big[S_{xx}(z) - H(z)S_{yx}(z)\big]\frac{dz}{2\pi j z} = 0.5$$

To appreciate the improvement afforded by filtering, this error must be compared with the error in case no processing is made and yn is itself taken to represent a noisy estimate of xn. The estimation error in the latter case is yn − xn = vn, so that σv² = 1. Thus, the gain afforded by processing is

$$\frac{\sigma_e^2}{\sigma_v^2} = 0.5 \quad\text{or}\quad 3\ \mathrm{dB}$$

11.6 Wiener Filter as Kalman Filter

We would like to cast this example in a Kalman filter form. The difference equation Eq. (11.5.2) for the Wiener filter seems to have the "wrong" state transition matrix; namely, 0.3 instead of 0.6, which is the state matrix for the state model of xn. However, it is not accidental that the Wiener filter difference equation may be rewritten in the alternative form

$$\hat{x}_n = 0.6\,\hat{x}_{n-1} + 0.5\,(y_n - 0.6\,\hat{x}_{n-1})$$

The quantity x̂n is the best estimate of xn, at time n, based on all the observations up to that time, that is, Yn = {yi, −∞ < i ≤ n}. To simplify the subsequent notation, we denote it by x̂n/n. It is the projection of xn on the space Yn. Similarly, x̂n−1 denotes the best estimate of xn−1, based on the observations up to time n−1, that is, Yn−1 = {yi, −∞ < i ≤ n−1}. The above filtering equation is written in this notation as

$$\hat{x}_{n/n} = 0.6\,\hat{x}_{n-1/n-1} + 0.5\,(y_n - 0.6\,\hat{x}_{n-1/n-1}) \tag{11.6.1}$$

It allows the computation of the current best estimate x̂n/n, in terms of the previous best estimate x̂n−1/n−1 and the new observation yn that becomes available at the current time instant n.

The various terms of Eq. (11.6.1) have nice interpretations: Suppose that the best estimate x̂n−1/n−1 of the previous sample xn−1 is available. Even before the next observation yn comes in, we may use this estimate to make a reasonable prediction as to what the next best estimate ought to be. Since we know the system dynamics of xn, we may try to "boost" x̂n−1/n−1 to the next time instant n according to the system dynamics, that is, we take

$$\hat{x}_{n/n-1} = 0.6\,\hat{x}_{n-1/n-1} = \text{prediction of } x_n \text{ on the basis of } Y_{n-1} \tag{11.6.2}$$

Since yn = xn + vn, we may use this prediction of xn to make a prediction of the next measurement yn, that is, we take

$$\hat{y}_{n/n-1} = \hat{x}_{n/n-1} = \text{prediction of } y_n \text{ on the basis of } Y_{n-1} \tag{11.6.3}$$

If this prediction were perfect, and if the next observation yn were noise free, then this would be the value that we would observe. Since we actually observe yn, the observation or innovations residual will be

$$\alpha_n = y_n - \hat{y}_{n/n-1} \tag{11.6.4}$$

This quantity represents that part of yn that cannot be predicted on the basis of the previous observations Yn−1. It represents the truly new information contained in the observation yn. Actually, if we are making the best prediction possible, then the most we can expect of our prediction is to make the innovations residual a white-noise (uncorrelated) signal, that is, what remains after we make the best possible prediction should be unpredictable. According to the general discussion of the relationship between signal models and linear prediction given in Sec. 1.17, it follows that if ŷn/n−1 is the best predictor of yn then αn must be the whitening sequence that drives the signal model of yn. We shall verify this fact shortly. This establishes an intimate connection between the Wiener/Kalman filtering problem and the signal modeling problem. If we overestimate the observation yn the innovation residual will be negative; and if we underestimate it, the residual will be positive. In either case, we would like to correct our tentative estimate in the right direction. This may be accomplished by

$$\hat{x}_{n/n} = \hat{x}_{n/n-1} + G(y_n - \hat{y}_{n/n-1}) = 0.6\,\hat{x}_{n-1/n-1} + G(y_n - 0.6\,\hat{x}_{n-1/n-1}) \tag{11.6.5}$$

where the gain G, known as the Kalman gain, should be a positive quantity. The prediction/correction procedure defined by Eqs. (11.6.2) through (11.6.5) is known as the Kalman filter. It should be clear that any value for the gain G will provide an estimate, even if suboptimal, of xn. Our solution for the Wiener filter has precisely the above structure with a gain G = 0.5. This value is optimal for the given example. It is a very instructive exercise to show this in two ways: First, with G arbitrary, the estimation filter of Eq. (11.6.5) has transfer function

$$H(z) = \frac{G}{1 - 0.6(1-G)z^{-1}}$$

Insert this expression into the mean-square estimation error $\mathcal{E} = E[e_n^2]$, where en = xn − x̂n/n, and minimize it with respect to the parameter G. This should give G = 0.5.

Alternatively, G should be such as to render the innovations residual (11.6.4) a white noise signal. In requiring this, it is useful to use the spectral factorization model for yn, that is, the fact that yn is the output of B(z) when driven by the white noise signal εn. Working with z-transforms, we have:

$$\begin{aligned}
\alpha(z) &= Y(z) - 0.6z^{-1}\hat{X}(z) = Y(z) - 0.6z^{-1}H(z)Y(z)\\
&= \Big[1 - \frac{0.6z^{-1}G}{1-0.6(1-G)z^{-1}}\Big]Y(z) = \Big[\frac{1-0.6z^{-1}}{1-0.6(1-G)z^{-1}}\Big]Y(z)\\
&= \Big[\frac{1-0.6z^{-1}}{1-0.6(1-G)z^{-1}}\Big]\Big[\frac{1-0.3z^{-1}}{1-0.6z^{-1}}\Big]\epsilon(z) = \Big[\frac{1-0.3z^{-1}}{1-0.6(1-G)z^{-1}}\Big]\epsilon(z)
\end{aligned}$$

Since εn is white, it follows that the transfer function relationship between αn and εn must be trivial; otherwise, there will be sequential correlations present in αn.

Thus, we must have 0.6(1−G) = 0.3, or G = 0.5; and in this case, αn = εn. It is also possible to set 0.6(1−G) = 1/0.3, but this would correspond to an unstable filter.

We have obtained a most interesting result; namely, that when the Wiener filtering problem is recast into its Kalman filter form given by Eq. (11.6.1), then the innovations residual αn, which is computable on line with the estimate x̂n/n, is identical to the whitening sequence εn of the signal model of yn. In other words, the Kalman filter can be thought of as the whitening filter for the observation signal yn.

To appreciate further the connection between Wiener and Kalman filters and between Kalman filters and the whitening filters of signal models, we consider a generalized version of the above example and cast it in standard Kalman filter notation.

It is desired to estimate xn from yn. The signal model for xn is taken to be the first-order autoregressive model

$$x_{n+1} = a\,x_n + w_n \qquad\text{(state model)} \tag{11.6.6}$$

with |a| < 1. The observation signal yn is related to xn by

$$y_n = c\,x_n + v_n \qquad\text{(measurement model)} \tag{11.6.7}$$

It is further assumed that the state and measurement noises, wn and vn, are zero-mean, mutually uncorrelated, white noises of variances Q and R, respectively, that is,

$$E[w_n w_i] = Q\delta_{ni}\,,\quad E[v_n v_i] = R\delta_{ni}\,,\quad E[w_n v_i] = 0 \tag{11.6.8}$$

We also assume that vn is uncorrelated with the initial value of xn so that vn and xn will be uncorrelated for all n. The parameters a, c, Q, R are assumed to be known. Let x1(n) be the time-advanced version of xn:

$$x_1(n) = x_{n+1}$$

and consider the two related Wiener filtering problems of estimating xn and x1(n) on the basis of Yn = {yi, −∞ < i ≤ n}, depicted below

[diagram: yn → H(z) → x̂n/n and yn → H1(z) → x̂1(n) = x̂n+1/n]

The problem of estimating x1(n) = xn+1 is equivalent to the problem of one-step prediction into the future on the basis of the past and present. Therefore, we will denote this estimate by x̂1(n) = x̂n+1/n. The state equation (11.6.6) determines the spectral density of xn:

$$S_{xx}(z) = \frac{1}{(z-a)(z^{-1}-a)}\,S_{ww}(z) = \frac{Q}{(1-az^{-1})(1-az)}$$

The observation equation (11.6.7) determines the cross-densities

$$\begin{aligned}
S_{xy}(z) &= c\,S_{xx}(z) + S_{xv}(z) = c\,S_{xx}(z)\\
S_{x_1 y}(z) &= z\,S_{xy}(z) = z\,c\,S_{xx}(z)
\end{aligned}$$

where we used the filtering equation X1(z) = zX(z). The spectral density of yn can be factored as follows:

$$\begin{aligned}
S_{yy}(z) &= c^2 S_{xx}(z) + S_{vv}(z) = \frac{c^2 Q}{(1-az^{-1})(1-az)} + R\\
&= \frac{c^2 Q + R(1-az^{-1})(1-az)}{(1-az^{-1})(1-az)} \equiv \sigma_\epsilon^2\Big(\frac{1-fz^{-1}}{1-az^{-1}}\Big)\Big(\frac{1-fz}{1-az}\Big)
\end{aligned}$$

where f and σε² satisfy the equations

$$f\sigma_\epsilon^2 = aR \tag{11.6.9}$$

$$(1+f^2)\sigma_\epsilon^2 = c^2 Q + (1+a^2)R \tag{11.6.10}$$

and f has magnitude less than one. Thus, the corresponding signal model for yn is

$$B(z) = \frac{1-fz^{-1}}{1-az^{-1}} \tag{11.6.11}$$

Next, we compute the causal parts as required by Eq. (11.4.6):

$$\left[\frac{S_{xy}(z)}{B(z^{-1})}\right]_+ = \left[\frac{cQ}{(1-az^{-1})(1-fz)}\right]_+ = \frac{cQ}{1-fa}\,\frac{1}{1-az^{-1}}$$

$$\left[\frac{S_{x_1 y}(z)}{B(z^{-1})}\right]_+ = \left[\frac{cQz}{(1-az^{-1})(1-fz)}\right]_+ = \frac{cQa}{1-fa}\,\frac{1}{1-az^{-1}}$$

Using Eq. (11.4.6), we determine the Wiener filters H(z) and H1(z) as follows:

$$H(z) = \frac{1}{\sigma_\epsilon^2 B(z)}\left[\frac{S_{xy}(z)}{B(z^{-1})}\right]_+ = \frac{\dfrac{cQ/(1-fa)}{1-az^{-1}}}{\sigma_\epsilon^2\,\dfrac{1-fz^{-1}}{1-az^{-1}}} = \frac{cQ/\big(\sigma_\epsilon^2(1-fa)\big)}{1-fz^{-1}}$$

or, defining the gain G by

$$G = \frac{cQ}{\sigma_\epsilon^2(1-fa)} \tag{11.6.12}$$

we finally find

$$H(z) = \frac{G}{1-fz^{-1}} \tag{11.6.13}$$

$$H_1(z) = aH(z) = \frac{K}{1-fz^{-1}} \tag{11.6.14}$$

where in Eq. (11.6.14) we defined a related gain, also called the Kalman gain, as follows:

$$K = aG = \frac{cQa}{\sigma_\epsilon^2(1-fa)} \tag{11.6.15}$$

Eq. (11.6.14) immediately implies that

$$\hat{x}_{n+1/n} = a\,\hat{x}_{n/n} \tag{11.6.16}$$

which is the precise justification of Eq. (11.6.2).
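The gains defined above are easy to evaluate numerically. The sketch below (assuming numpy) solves Eqs. (11.6.9)–(11.6.10) for f and σε² (eliminating σε² gives a quadratic in f, of which the root with |f| < 1 is kept) and then forms G and K from Eqs. (11.6.12) and (11.6.15); the parameter values are those of the simulation presented later in this section.

```python
import numpy as np

# Solve Eqs. (11.6.9)-(11.6.10) for f and sigma_eps^2, then form the gains
# G of Eq. (11.6.12) and K = a*G of Eq. (11.6.15). Parameters as in the
# simulation of this section: a = 0.95, c = 1, Q = 1 - a^2, R = 1.
a, c, Q, R = 0.95, 1.0, 1 - 0.95**2, 1.0

# Eliminating sigma_eps^2 = a*R/f from (11.6.10) gives the quadratic
#   a*R*f^2 - (c^2*Q + (1 + a^2)*R)*f + a*R = 0
coeffs = [a * R, -(c**2 * Q + (1 + a**2) * R), a * R]
roots = np.roots(coeffs)
f = roots[np.abs(roots) < 1][0].real        # keep the minimum-phase root, |f| < 1
sig2 = a * R / f                            # sigma_eps^2 from (11.6.9)

G = c * Q / (sig2 * (1 - f * a))            # Eq. (11.6.12)
K = a * G                                   # Eq. (11.6.15)
print(f"f = {f:.4f}, sigma_eps^2 = {sig2:.4f}, G = {G:.4f}, K = {K:.4f}")
# For these parameters the text quotes f = 0.7239, G = 0.2380, K = 0.2261.
```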

The difference equations of the two filters are

$$\begin{aligned}
\hat{x}_{n+1/n} &= f\,\hat{x}_{n/n-1} + K\,y_n\\
\hat{x}_{n/n} &= f\,\hat{x}_{n-1/n-1} + G\,y_n
\end{aligned} \tag{11.6.17}$$

Using the results of Problem 1.50, we may express all the quantities f, σε², K, and G in terms of a single positive quantity P which satisfies the algebraic Riccati equation:

$$Q = P - \frac{PRa^2}{R + c^2 P} \tag{11.6.18}$$

Then, we find the interrelationships

$$K = aG = \frac{acP}{R + c^2 P}\,,\qquad \sigma_\epsilon^2 = R + c^2 P\,,\qquad f = a - cK = \frac{Ra}{R + c^2 P} \tag{11.6.19}$$

It is left as an exercise to show that the minimized mean-square estimation errors are given in terms of P by

$$E[e_{n/n-1}^2] = P\,,\qquad E[e_{n/n}^2] = \frac{RP}{R + c^2 P}$$

where

$$e_{n/n-1} = x_n - \hat{x}_{n/n-1}\,,\qquad e_{n/n} = x_n - \hat{x}_{n/n}$$

are the corresponding estimation errors for the optimally predicted and filtered estimates, respectively. Using Eq. (11.6.19), we may rewrite the filtering equation (11.6.17) in the following forms:

$$\begin{aligned}
\hat{x}_{n+1/n} &= (a - cK)\,\hat{x}_{n/n-1} + K\,y_n\,,\quad\text{or,}\\
\hat{x}_{n+1/n} &= a\,\hat{x}_{n/n-1} + K(y_n - c\,\hat{x}_{n/n-1})\,,\quad\text{or,}\\
\hat{x}_{n+1/n} &= a\,\hat{x}_{n/n-1} + K(y_n - \hat{y}_{n/n-1})
\end{aligned} \tag{11.6.20}$$

where we set

$$\hat{y}_{n/n-1} = c\,\hat{x}_{n/n-1} \tag{11.6.21}$$

A realization of the estimation filter based on (11.6.20) is shown below:

[block diagram: realization of the estimation filter of Eq. (11.6.20)]

Replacing K = aG and using Eq. (11.6.16) in (11.6.20), we also find

$$\hat{x}_{n/n} = \hat{x}_{n/n-1} + G(y_n - \hat{y}_{n/n-1}) \tag{11.6.22}$$

The quantity ŷn/n−1 defined in Eq. (11.6.21) is the best estimate of yn based on its past Yn−1. This can be seen in two ways: First, using the results of Problem 1.7 on the linearity of the estimates, we find

$$\hat{y}_{n/n-1} = \widehat{c x_n + v_n} = c\,\hat{x}_{n/n-1} + \hat{v}_{n/n-1} = c\,\hat{x}_{n/n-1}$$

where the term v̂n/n−1 was dropped. This term represents the estimate of vn on the basis of the past ys; that is, Yn−1. Since vn is white and also uncorrelated with xn, it follows that it will be uncorrelated with all past ys; therefore, v̂n/n−1 = 0. The second way to show that ŷn/n−1 is the best prediction of yn is to show that the innovations residual

$$\alpha_n = y_n - \hat{y}_{n/n-1} = y_n - c\,\hat{x}_{n/n-1} \tag{11.6.23}$$

is a white-noise sequence and coincides with the whitening sequence εn of yn. Indeed, working in the z-domain and using Eq. (11.6.17) and the signal model of yn we find

$$\begin{aligned}
\alpha(z) &= Y(z) - cz^{-1}\hat{X}_1(z) = Y(z) - cz^{-1}H_1(z)Y(z)\\
&= \Big[1 - \frac{cz^{-1}K}{1-fz^{-1}}\Big]Y(z) = \Big[\frac{1-(f+cK)z^{-1}}{1-fz^{-1}}\Big]Y(z)\\
&= \Big[\frac{1-az^{-1}}{1-fz^{-1}}\Big]Y(z) = \frac{1}{B(z)}\,Y(z) = \epsilon(z)
\end{aligned}$$

which implies that

$$\alpha_n = \epsilon_n$$

Finally, we note that the recursive updating of the estimate of xn given by Eq. (11.6.22) is identical to the result of Problem 1.11.

Our purpose in presenting this example was to tie together a number of ideas from Chapter 1 (correlation canceling, estimation, Gram-Schmidt orthogonalization, linear prediction, and signal modeling) to ideas from this chapter on Wiener filtering and its recursive reformulation as a Kalman filter.

We conclude this section by presenting a simulation of this example defined by the following choice of parameters:

$$a = 0.95\,,\quad c = 1\,,\quad Q = 1 - a^2\,,\quad R = 1$$

The above choice for Q normalizes the variance of xn to unity. Solving the Riccati equation (11.6.18) and using Eq. (11.6.19), we find

$$P = 0.3122\,,\quad K = 0.2261\,,\quad G = 0.2380\,,\quad f = a - cK = 0.7239$$

Fig. 11.6.1 shows 100 samples of the observed signal yn together with the desired signal xn. The signal yn processed through the Wiener filter H(z) defined by the above parameters is shown in Fig. 11.6.2 together with xn. The tracking properties of the filter are evident from the graph. It should be emphasized that this is the best one can do by means of ordinary causal linear filtering.
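The quoted parameter values, and the qualitative behavior shown in Figs. 11.6.1 and 11.6.2, can be reproduced with a short simulation. The sketch below (assuming numpy) obtains P by iterating the Riccati recursion of Sec. 11.9 to convergence and then runs the filtering equation (11.6.17) on a simulated record.

```python
import numpy as np

# Reproduce the Kalman/Wiener parameters quoted above for
# a = 0.95, c = 1, Q = 1 - a^2, R = 1, and filter a simulated record.
a, c = 0.95, 1.0
Q, R = 1 - a**2, 1.0

# Solve the algebraic Riccati equation (11.6.18) by fixed-point iteration
# of the Riccati difference equation (11.9.12).
P = 0.0
for _ in range(200):
    P = a**2 * R * P / (R + c**2 * P) + Q
K = a * c * P / (R + c**2 * P)      # Kalman gain, Eq. (11.6.19)
G = c * P / (R + c**2 * P)          # filtering gain, K = a G
f = a - c * K
print(f"P = {P:.4f}, K = {K:.4f}, G = {G:.4f}, f = {f:.4f}")  # 0.3122, 0.2261, 0.2380, 0.7239

# Simulate x, y and run the filtered estimate x_hat[n] = f*x_hat[n-1] + G*y[n], Eq. (11.6.17)
rng = np.random.default_rng(3)
N = 20_000
x = np.zeros(N)
for n in range(N - 1):
    x[n + 1] = a * x[n] + rng.normal(0.0, np.sqrt(Q))
y = x + rng.normal(0.0, np.sqrt(R), N)
x_hat = np.zeros(N)
for n in range(1, N):
    x_hat[n] = f * x_hat[n - 1] + G * y[n]
mse = np.mean((x - x_hat) ** 2)
print("sample MSE:", round(mse, 3), " theoretical R*P/(R + c^2*P):", round(R * P / (R + c**2 * P), 3))
```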

[Fig. 11.6.1 Desired signal and its noisy observation — plot of xn and yn versus n (time samples), 0–100.]

[Fig. 11.6.2 Best estimate of desired signal — plot of xn and x̂n/n versus n (time samples), 0–100.]

11.7 Construction of the Wiener Filter by the Gapped Function

Next, we would like to give an alternative construction of the optimal Wiener filter based on the concept of the gapped function. This approach is especially useful in linear prediction. The gapped function is defined as the cross-correlation between the estimation error en and the observation sequence yn, as follows:

$$g(k) = R_{ey}(k) = E[e_n y_{n-k}]\,,\quad\text{for } -\infty < k < \infty \tag{11.7.1}$$

This definition is motivated by the orthogonality equations which state that the prediction error en must be orthogonal to all of the available observations; namely, Yn = {yi, −∞ < i ≤ n} = {yn−k, k ≥ 0}. That is, for the optimal set of filter weights we must have

$$g(k) = R_{ey}(k) = E[e_n y_{n-k}] = 0\,,\quad\text{for } k \ge 0 \tag{11.7.2}$$

and g(k) develops a right-hand side gap. On the other hand, g(k) may be written in the alternative form

$$g(k) = E[e_n y_{n-k}] = E\Big[\Big(x_n - \sum_{i=0}^{\infty} h_i y_{n-i}\Big)y_{n-k}\Big] = R_{xy}(k) - \sum_{i=0}^{\infty} h_i R_{yy}(k-i)\,,\quad\text{or,}$$

$$g(k) = R_{ey}(k) = R_{xy}(k) - \sum_{i=0}^{\infty} h_i R_{yy}(k-i) \tag{11.7.3}$$

Taking z-transforms of both sides we find

$$G(z) = S_{ey}(z) = S_{xy}(z) - H(z)S_{yy}(z)$$

Because of the gap conditions, the left-hand side contains only positive powers of z, whereas the right-hand side contains both positive and negative powers of z. Thus, the non-positive powers of z must drop out of the right side. This condition precisely determines H(z). Introducing the spectral factorization of Syy(z) and dividing both sides by B(z⁻¹) we find

$$\begin{aligned}
G(z) &= S_{xy}(z) - H(z)S_{yy}(z) = S_{xy}(z) - H(z)\sigma_\epsilon^2 B(z)B(z^{-1})\\
\frac{G(z)}{B(z^{-1})} &= \frac{S_{xy}(z)}{B(z^{-1})} - \sigma_\epsilon^2 H(z)B(z)
\end{aligned}$$

The z-transform B(z⁻¹) is anticausal and, because of the gap conditions, so is the ratio G(z)/B(z⁻¹). Therefore, taking causal parts of both sides and noting that the product H(z)B(z) is already causal, we find

$$0 = \left[\frac{S_{xy}(z)}{B(z^{-1})}\right]_+ - \sigma_\epsilon^2 H(z)B(z)$$

which may be solved for H(z) to give Eq. (11.4.6).
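The gap condition (11.7.2) can also be checked numerically for the example of Sec. 11.5. The sketch below (assuming numpy and scipy) filters a simulated record with H(z) = 0.5/(1 − 0.3z⁻¹) and estimates g(k) = E[en yn−k] by sample averages; the values come out near zero for k ≥ 0 and nonzero for k < 0, as required.

```python
import numpy as np
from scipy.signal import lfilter

# Check the gap condition (11.7.2) for the example of Sec. 11.5:
# with the optimal filter H(z) = 0.5/(1 - 0.3 z^{-1}), the sample estimate of
# g(k) = E[e_n y_{n-k}] should vanish for k >= 0 and be nonzero for k < 0.
rng = np.random.default_rng(4)
N = 200_000
x = np.zeros(N)
w = rng.normal(0.0, np.sqrt(0.82), N)
for n in range(N - 1):
    x[n + 1] = 0.6 * x[n] + w[n]
y = x + rng.normal(0.0, 1.0, N)

e = x - lfilter([0.5], [1.0, -0.3], y)       # estimation error e_n = x_n - xhat_n

def g(k):
    """Sample estimate of g(k) = E[e_n y_{n-k}]."""
    if k >= 0:
        return np.mean(e[k:] * y[:N - k])
    return np.mean(e[:N + k] * y[-k:])

for k in (-2, -1, 0, 1, 2, 3):
    print(f"g({k:+d}) = {g(k):+.4f}")        # ~0 for k >= 0, nonzero for k < 0
```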

11.8 Construction of the Wiener Filter by Covariance Factorization

In this section, we present a generalization of the gapped-function method to the more general non-stationary and/or finite-past Wiener filter. This is defined by the Wiener-Hopf equations (11.2.7), which are equivalent to the orthogonality equations (11.2.5). The latter are the non-stationary versions of the gapped function of the previous section. The best way to proceed is to cast Eqs. (11.2.5) in matrix form as follows: Without loss of generality we may take the starting point na = 0. The final point nb is left arbitrary. Introduce the vectors

$$\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n_b} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n_b} \end{bmatrix}$$

and the corresponding correlation matrices

$$R_{xy} = E[\mathbf{x}\mathbf{y}^T]\,,\qquad R_{yy} = E[\mathbf{y}\mathbf{y}^T]$$

The filtering equation (11.2.4) may be written in vector form as

$$\hat{\mathbf{x}} = H\mathbf{y} \tag{11.8.1}$$

where H is the matrix of optimal weights {h(n,i)}. The causality of the filtering operation (11.8.1) requires H to be lower-triangular. The minimization problem becomes equivalent to the problem of minimizing the mean-square estimation error subject to the constraint that H be lower-triangular. The minimization conditions are the normal equations (11.2.5) which, in this matrix notation, state that the matrix Rey has no lower-triangular (causal) part; or, equivalently, that Rey is strictly upper-triangular (i.e., even the main diagonal of Rey is zero), therefore

$$R_{ey} = \text{strictly upper triangular} \tag{11.8.2}$$

Inserting Eq. (11.8.1) into Rey we find

$$R_{ey} = E[\mathbf{e}\mathbf{y}^T] = E\big[(\mathbf{x} - H\mathbf{y})\mathbf{y}^T\big]\,,\quad\text{or,}$$

$$R_{ey} = R_{xy} - HR_{yy} \tag{11.8.3}$$

The minimization conditions (11.8.2) require H to be that lower-triangular matrix which renders the combination (11.8.3) upper-triangular. In other words, H should be such that the lower triangular part of the right-hand side must vanish. To solve Eqs. (11.8.2) and (11.8.3), we introduce the LU Cholesky factorization of the covariance matrix Ryy given by

$$R_{yy} = BR_\epsilon B^T \tag{11.8.4}$$

where B is unit lower-triangular, and Rε is diagonal. This was discussed in Sec. 1.6. Inserting this into Eq. (11.8.3) we find

$$R_{ey} = R_{xy} - HR_{yy} = R_{xy} - HBR_\epsilon B^T \tag{11.8.5}$$

Multiplying by the inverse transpose of B we obtain

$$R_{ey}B^{-T} = R_{xy}B^{-T} - HBR_\epsilon \tag{11.8.6}$$

Now, the matrix B⁻ᵀ is unit upper-triangular, but Rey is strictly upper, therefore, the product ReyB⁻ᵀ will be strictly upper. This can be verified easily for any two such matrices. Extracting the lower-triangular parts of both sides of Eq. (11.8.6) we find

$$0 = \big[R_{xy}B^{-T}\big]_+ - HBR_\epsilon$$

where we used the fact that the left-hand side was strictly upper and that the term HBRε was already lower-triangular. The notation [ ]₊ denotes the lower triangular part of a matrix including the diagonal. We find finally

$$H = \big[R_{xy}B^{-T}\big]_+ R_\epsilon^{-1}B^{-1} \tag{11.8.7}$$

This is the most general solution of the Wiener filtering problem [18, 19]. It includes the results of the stationary case, as a special case. Indeed, if all the signals are stationary, then the matrices Rxy, B, and Bᵀ become Toeplitz and have a z-transform associated with them as discussed in Problem 1.51. Using the results of that problem, it is easily seen that Eq. (11.8.7) is the time-domain equivalent of Eq. (11.4.6).

The prewhitening approach of Sec. 11.4 can also be understood in the present matrix framework. Making the change of variables

$$\mathbf{y} = B\boldsymbol{\epsilon}$$

we find that Rxy = E[xyᵀ] = E[xεᵀ]Bᵀ = RxεBᵀ, and therefore, RxyB⁻ᵀ = Rxε and the filter H becomes H = [Rxε]₊Rε⁻¹B⁻¹. The corresponding estimate is then

$$\hat{\mathbf{x}} = H\mathbf{y} = HB\boldsymbol{\epsilon} = F\boldsymbol{\epsilon}\,,\quad\text{where } F = HB = \big[R_{x\epsilon}\big]_+ R_\epsilon^{-1} \tag{11.8.8}$$

This is the matrix equivalent of Eq. (11.4.5). The matrix F is lower-triangular by construction. Therefore, to extract the nth component x̂n of Eq. (11.8.8), it is enough to consider the n×n submatrices as shown below:

[figure: the leading subblock of the lower-triangular matrix F acting on the subvector εn used to form x̂n]

The nth row of F is f(n)ᵀ = E[xn εnᵀ]E[εn εnᵀ]⁻¹. Therefore, the nth estimate becomes

$$\hat{x}_n = \mathbf{f}(n)^T\boldsymbol{\epsilon}_n = E[x_n\boldsymbol{\epsilon}_n^T]\,E[\boldsymbol{\epsilon}_n\boldsymbol{\epsilon}_n^T]^{-1}\boldsymbol{\epsilon}_n$$

which may also be written in the recursive form

$$\hat{x}_{n/n} = \sum_{i=0}^{n} E[x_n\epsilon_i]E[\epsilon_i\epsilon_i]^{-1}\epsilon_i = \sum_{i=0}^{n-1} E[x_n\epsilon_i]E[\epsilon_i\epsilon_i]^{-1}\epsilon_i + G_n\epsilon_n\,,\quad\text{or,}$$

$$\hat{x}_{n/n} = \hat{x}_{n/n-1} + G_n\epsilon_n \tag{11.8.9}$$

where we made an obvious change in notation, and Gn = E[xn εn]E[εn εn]⁻¹. This is identical to Eq. (11.6.22); in the stationary case, Gn is a constant, independent of n. We can also recast the nth estimate in "batch" form, expressed directly in terms of the observation vector yn = [y0, y1, ..., yn]ᵀ. By considering the n×n subblock part of the Gram-Schmidt construction, we may write yn = Bnεn, where Bn is unit lower-triangular. Then, x̂n can be expressed as

$$\hat{x}_n = E[x_n\boldsymbol{\epsilon}_n^T]\,E[\boldsymbol{\epsilon}_n\boldsymbol{\epsilon}_n^T]^{-1}\boldsymbol{\epsilon}_n = E[x_n\mathbf{y}_n^T]\,E[\mathbf{y}_n\mathbf{y}_n^T]^{-1}\mathbf{y}_n$$

which is identical to Eq. (11.2.8).
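A small numerical sketch of the covariance-factorization solution (11.8.7), assuming numpy: the factorization (11.8.4) is obtained from a Cholesky factor of Ryy, the filter H = [RxyB⁻ᵀ]₊Rε⁻¹B⁻¹ is formed, and it is verified that H is lower-triangular and that Rey = Rxy − HRyy is strictly upper-triangular. The correlation matrices used are synthetic illustration values.

```python
import numpy as np

# Sketch of the covariance-factorization solution (11.8.7):
#   Ryy = B R_eps B^T  (B unit lower-triangular, R_eps diagonal),
#   H   = [Rxy B^{-T}]_+ R_eps^{-1} B^{-1},  [.]_+ = lower-triangular part incl. diagonal.
rng = np.random.default_rng(5)
N = 5
X = rng.standard_normal((N, 2 * N))
Y = X + 0.5 * rng.standard_normal((N, 2 * N))   # any correlated pair will do here
Ryy = Y @ Y.T / (2 * N)                         # positive-definite covariance of y
Rxy = X @ Y.T / (2 * N)                         # cross-covariance of x and y

L = np.linalg.cholesky(Ryy)                     # Ryy = L L^T
d = np.diag(L)
B = L / d                                       # unit lower-triangular factor
R_eps = np.diag(d**2)                           # diagonal factor, so Ryy = B R_eps B^T

H = np.tril(Rxy @ np.linalg.inv(B).T) @ np.linalg.inv(R_eps) @ np.linalg.inv(B)
Rey = Rxy - H @ Ryy
print("H is lower-triangular:      ", np.allclose(H, np.tril(H)))
print("Rey is strictly upper-tri.: ", np.allclose(np.tril(Rey), 0, atol=1e-10))
```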

11.9 The Kalman Filter

The Kalman filter discussion of Sec. 11.6 and its equivalence to the Wiener filter was based on the asymptotic Kalman filter for which the observations were available from the infinite past to the present, namely, {yi, −∞ < i ≤ n}. In Sec. 11.8, we solved the most general Wiener filtering problem based on the finite past for which the observation space was

$$Y_n = \{y_0, y_1, \ldots, y_n\} \tag{11.9.1}$$

Here, we recast these results in a time-recursive form and obtain the time-varying Kalman filter for estimating xn based on the finite observation subspace Yn. We also discuss its asymptotic properties for large n and show that it converges to the steady-state Kalman filter of Sec. 11.6.

Our discussion is based on Eq. (11.8.9), which is essentially the starting point in Kalman's original derivation [852]. To make Eq. (11.8.9) truly recursive, we must have a means of recursively computing the required gain Gn from one time instant to the next. As in Sec. 11.8, we denote by x̂n/n and x̂n/n−1 the optimal estimates of xn based on the observation subspaces Yn and Yn−1, defined in Eq. (11.9.1), with the initial condition x̂0/−1 = 0. Iterating the state and measurement models (11.6.6) and (11.6.7) starting at n = 0, we obtain the following two results, previously derived for the steady-state case

$$\hat{x}_{n+1/n} = a\,\hat{x}_{n/n}\,,\qquad \hat{y}_{n/n-1} = c\,\hat{x}_{n/n-1} \tag{11.9.2}$$

The proof of both is based on the linearity property of estimates; for example,

$$\hat{x}_{n+1/n} = \widehat{a x_n + w_n} = a\,\hat{x}_{n/n} + \hat{w}_{n/n} = a\,\hat{x}_{n/n}$$

where ŵn/n was set to zero because wn does not depend on any of the observations Yn. This is seen as follows. The iteration of the state equation (11.6.6) leads to the expression xn = aⁿx0 + aⁿ⁻¹w0 + aⁿ⁻²w1 + ··· + awn−2 + wn−1. It follows from this and Eq. (11.6.7) that the observation subspace Yn will depend only on

$$\{x_0, w_0, w_1, \ldots, w_{n-1}, v_0, v_1, \ldots, v_n\}$$

Making the additional assumption that x0 is uncorrelated with wn it follows that wn will be uncorrelated with all random variables in the above set, and thus, with Yn. The second part of Eq. (11.9.2) is shown by similar arguments. Next, we develop the recursions for the gain Gn. Using Eq. (11.8.9), the estimation and prediction errors may be related as follows

$$e_{n/n} = x_n - \hat{x}_{n/n} = x_n - \hat{x}_{n/n-1} - G_n\epsilon_n = e_{n/n-1} - G_n\epsilon_n$$

Taking the correlation of both sides with xn we find

$$E[e_{n/n}x_n] = E[e_{n/n-1}x_n] - G_n E[\epsilon_n x_n] \tag{11.9.3}$$

Using the orthogonality properties E[en/n x̂n/n] = 0 and E[en/n−1 x̂n/n−1] = 0, which follow from the optimality of the two estimates x̂n/n and x̂n/n−1, we can write the mean-square estimation and prediction errors as

$$P_{n/n} = E[e_{n/n}^2] = E[e_{n/n}x_n]\,,\qquad P_{n/n-1} = E[e_{n/n-1}^2] = E[e_{n/n-1}x_n] \tag{11.9.4}$$

We find also

$$\epsilon_n = y_n - \hat{y}_{n/n-1} = (cx_n + v_n) - c\hat{x}_{n/n-1} = c\,e_{n/n-1} + v_n$$

Using the fact that en/n−1 depends only on xn and Yn−1, it follows that the two terms in the right-hand side are uncorrelated with each other. Thus,

$$E[\epsilon_n^2] = c^2 E[e_{n/n-1}^2] + E[v_n^2] = c^2 P_{n/n-1} + R \tag{11.9.5}$$

also

$$E[\epsilon_n x_n] = cE[e_{n/n-1}x_n] + E[v_n x_n] = cP_{n/n-1} \tag{11.9.6}$$

Therefore, the gain Gn is computable by

$$G_n = \frac{E[\epsilon_n x_n]}{E[\epsilon_n^2]} = \frac{cP_{n/n-1}}{R + c^2 P_{n/n-1}} \tag{11.9.7}$$

Using Eqs. (11.9.4), (11.9.6), and (11.9.7) into Eq. (11.9.3), we obtain

$$P_{n/n} = P_{n/n-1} - G_n cP_{n/n-1} = P_{n/n-1} - \frac{c^2 P_{n/n-1}^2}{R + c^2 P_{n/n-1}} = \frac{RP_{n/n-1}}{R + c^2 P_{n/n-1}} \tag{11.9.8}$$

The subtracted term in (11.9.8) represents the improvement in estimating xn using x̂n/n over using x̂n/n−1. Equations (11.9.3), (11.9.7), and (11.9.8) admit a nice geometrical interpretation [867]. The two right-hand side terms in εn = c·en/n−1 + vn are orthogonal and can be represented by the orthogonal triangle

[figure: orthogonal triangle with legs c·en/n−1 and vn and hypotenuse εn; en/n−1 is resolved along εn in a smaller similar triangle whose right-angle vertex is labeled A]

where the prediction error en/n−1 has been scaled up by the factor c. Thus, Eq. (11.9.5) is the statement of the Pythagorean theorem for this triangle. Next, write the equation en/n = en/n−1 − Gnεn as

$$e_{n/n-1} = e_{n/n} + G_n\epsilon_n$$

Because en/n is orthogonal to all the observations in Yn and εn is a linear combination of the same observations, it follows that the two terms in the right-hand side will be orthogonal. Thus, en/n−1 may be resolved in two orthogonal parts, one being in the direction of εn. This is represented by the smaller orthogonal triangle in the previous diagram. Clearly, the length of the side en/n is minimized at right angles at point A. It follows from the similarity of the two orthogonal triangles that

$$\frac{G_n\sqrt{E[\epsilon_n^2]}}{\sqrt{E[e_{n/n-1}^2]}} = \frac{c\,\sqrt{E[e_{n/n-1}^2]}}{\sqrt{E[\epsilon_n^2]}}$$

which is equivalent to Eq. (11.9.7). Finally, the Pythagorean theorem applied to the smaller triangle implies E[e²n/n−1] = E[e²n/n] + G²nE[ε²n], which is equivalent to Eq. (11.9.8).

To obtain a truly recursive scheme, we need next to find a relationship between P_{n/n} and the next prediction error P_{n+1/n}. It is found as follows. From the state model (11.6.6) and (11.9.2), we have

e_{n+1/n} = x_{n+1} − x̂_{n+1/n} = (a x_n + w_n) − a x̂_{n/n} = a e_{n/n} + w_n

Because e_{n/n} depends only on x_n and Y_n, it follows that the two terms in the right-hand side will be uncorrelated. Therefore, E[e²_{n+1/n}] = a² E[e²_{n/n}] + E[w_n²], or,

P_{n+1/n} = a² P_{n/n} + Q        (11.9.9)

The first term corresponds to the propagation of the estimate x̂_{n/n} forward in time according to the system dynamics; the second term represents the worsening of the estimate due to the presence of the dynamical noise w_n. The Kalman filter algorithm is now complete. It is summarized below:

0. Initialize by x̂_{0/−1} = 0 and P_{0/−1} = E[x_0²].
1. At time n, x̂_{n/n−1}, P_{n/n−1}, and the new measurement y_n are available.
2. Compute ŷ_{n/n−1} = c x̂_{n/n−1}, ε_n = y_n − ŷ_{n/n−1}, and the gain G_n using Eq. (11.9.7).
3. Correct the predicted estimate x̂_{n/n} = x̂_{n/n−1} + G_n ε_n and compute its mean-square error P_{n/n}, using Eq. (11.9.8).
4. Predict the next estimate x̂_{n+1/n} = a x̂_{n/n}, and compute the mean-square prediction error P_{n+1/n}, using Eq. (11.9.9).
5. Go to the next time instant, n → n + 1.

The optimal predictor x̂_{n/n−1} satisfies the Kalman filtering equation

x̂_{n+1/n} = a x̂_{n/n} = a(x̂_{n/n−1} + G_n ε_n) = a x̂_{n/n−1} + a G_n (y_n − c x̂_{n/n−1}),   or,

x̂_{n+1/n} = f_n x̂_{n/n−1} + K_n y_n        (11.9.10)

where we defined

K_n = a G_n,    f_n = a − c K_n        (11.9.11)

These are the time-varying analogs of Eqs. (11.6.17) and (11.6.19). Equations (11.9.8) and (11.9.9) may be combined into one updating equation for P_{n/n−1}, known as the discrete Riccati difference equation

P_{n+1/n} = a² R P_{n/n−1} / (R + c² P_{n/n−1}) + Q        (11.9.12)

It is the time-varying version of Eq. (11.6.18). We note that in deriving all of the above results, we did not need to assume that the model parameters {a, c, Q, R} were constants, independent of time. They can just as well be replaced by time-varying model parameters:

{a_n, c_n, Q_n, R_n}

The asymptotic properties of the Kalman filter depend, of course, on the particular time variations in the model parameters. In the time-invariant case, with {a, c, Q, R} constant, we expect the solution of the Riccati equation (11.9.12) to converge, for large n, to some steady-state value P_{n/n−1} → P. In this limit, the Riccati difference equation (11.9.12) tends to the steady-state algebraic Riccati equation (11.6.18), which determines the limiting value P. The Kalman filter parameters will converge to the limiting values f_n → f, K_n → K, and G_n → G given by Eq. (11.6.19).

It is possible to solve Eq. (11.9.12) in closed form and explicitly demonstrate these convergence properties. Using the techniques of [871,872], we obtain

P_{n/n−1} = P + f^{2n} E_0 / (1 + S_n E_0),   for n = 0, 1, 2, . . . ,        (11.9.13)

where E_0 = P_{0/−1} − P and

S_n = B (1 − f^{2n}) / (1 − f²),    B = c² / (R + c² P)

We have already mentioned (see Problem 1.50) that the stability of the signal model and the positivity of the asymptotic solution P imply the minimum phase condition |f| < 1. Thus, the second term of Eq. (11.9.13) converges to zero exponentially with a time constant determined by f.

Example 11.9.1: Determine the closed form solutions of the time-varying Kalman filter for the state and measurement models:

x_{n+1} = x_n + w_n,    y_n = x_n + v_n

with Q = 0.5 and R = 1. Thus, a = 1 and c = 1. The Riccati equations are

P_{n+1/n} = P_{n/n−1} / (1 + P_{n/n−1}) + 0.5,    P = P / (1 + P) + 0.5

The solution of the algebraic Riccati equation is P = 1. This implies that f = aR/(R + c²P) = 0.5. To illustrate the solution (11.9.13), we take the initial condition to be zero, P_{0/−1} = 0. We find B = c²/(R + c²P) = 0.5 and

S_n = (2/3)(1 − (0.5)^{2n})

Thus,

P_{n/n−1} = 1 − (0.5)^{2n} / [1 − (2/3)(1 − (0.5)^{2n})] = [1 − (0.5)^{2n}] / [1 + 2(0.5)^{2n}]

The first few values calculated from this formula are

P_{1/0} = 1/2,    P_{2/1} = 5/6,    P_{3/2} = 21/22,  . . .

and quickly converge to P = 1. They may also be obtained by iterating Eq. (11.9.12).
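A quick numerical check of Example 11.9.1 (a minimal sketch in Python, using only the standard library): iterating the Riccati difference equation (11.9.12) with a = c = 1, Q = 0.5, R = 1 reproduces the values P_{1/0} = 1/2, P_{2/1} = 5/6, P_{3/2} = 21/22 quoted above, agrees with the closed-form expression (11.9.13), and converges toward P = 1.

```python
# Numerical check of Example 11.9.1 (illustrative sketch).
# Iterate the Riccati difference equation (11.9.12) and compare with the
# closed-form solution (11.9.13) for a = c = 1, Q = 0.5, R = 1.
from fractions import Fraction

a, c, Q, R = 1, 1, Fraction(1, 2), 1
P_inf = 1                                  # steady-state algebraic Riccati solution P
f = Fraction(a * R, R + c**2 * P_inf)      # f = aR/(R + c^2 P) = 1/2
B = Fraction(c**2, R + c**2 * P_inf)       # B = c^2/(R + c^2 P) = 1/2
E0 = Fraction(0) - P_inf                   # E0 = P_{0/-1} - P, with P_{0/-1} = 0

P = Fraction(0)                            # P_{0/-1} = 0
for n in range(1, 6):
    # Riccati step (11.9.12): P_{n+1/n} = a^2 R P_{n/n-1}/(R + c^2 P_{n/n-1}) + Q
    P = a**2 * R * P / (R + c**2 * P) + Q
    # Closed form (11.9.13): P_{n/n-1} = P_inf + f^{2n} E0 / (1 + S_n E0)
    Sn = B * (1 - f**(2 * n)) / (1 - f**2)
    P_closed = P_inf + f**(2 * n) * E0 / (1 + Sn * E0)
    print(n, P, P_closed)                  # 1/2, 5/6, 21/22, 85/86, 341/342 for both
```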

11.10 Problems

11.1 Let x = [x_{n_a}, . . . , x_{n_b}]^T and y = [y_{n_a}, . . . , y_{n_b}]^T be the desired and available signal vectors. The relationship between x and y is assumed to be linear of the form

y = Cx + v

where C represents a linear degradation and v is a vector of zero-mean independent gaussian samples with a common variance σ_v². Show that the maximum likelihood (ML) estimation criterion is in this case equivalent to the following least-squares criterion, based on the quadratic vector norm:

E = ‖y − Cx‖² = minimum with respect to x

Show that the resulting estimate is given by

x̂ = (C^T C)^{−1} C^T y

11.2 Let x̂ = Hy be the optimal linear smoothing estimate of x given by Eq. (11.1.5). It is obtained by minimizing the mean-square estimation error E_n = E[e_n²] for each n in the interval [n_a, n_b].

(a) Show that the solution for H also minimizes the error covariance matrix

R_ee = E[e e^T]

where e is the vector of estimation errors e = [e_{n_a}, . . . , e_{n_b}]^T.

(b) Show that H also minimizes every quadratic index of the form, for any positive semi-definite matrix Q:

E[e^T Q e] = min

(c) Explain how the minimization of each E[e_n²] can be understood in terms of part (b).

11.3 Consider the smoothing problem of estimating the signal vector x from the signal vector y. Assume that x and y are linearly related by

y = Cx + v

and that v and x are uncorrelated from each other, and that the covariance matrices of x and v, R_xx and R_vv, are known. Show that the smoothing estimate of x is in this case

x̂ = R_xx C^T [C R_xx C^T + R_vv]^{−1} y

11.4 A stationary random signal has autocorrelation function R_xx(k) = σ_x² a^{|k|}, for all k. The observation signal is y_n = x_n + v_n, where v_n is a zero-mean, white noise sequence of variance σ_v², uncorrelated from x_n.

(a) Determine the optimal FIR Wiener filter of order M = 1 for estimating x_n from y_n.

(b) Repeat for the optimal linear predictor of order M = 2 for predicting x_n on the basis of the past two samples y_{n−1} and y_{n−2}.

11.5 A stationary random signal x(n) has autocorrelation function R_xx(k) = σ_x² a^{|k|}, for all k. Consider a time interval [n_a, n_b]. The random signal x(n) is known only at the end-points of that interval; that is, the only available observations are

y(n_a) = x(n_a),    y(n_b) = x(n_b)

Determine the optimal estimate of x(n) based on just these two samples in the form

x̂(n) = h(n, n_a) y(n_a) + h(n, n_b) y(n_b)

for the following values of n: (a) n_a ≤ n ≤ n_b, (b) n ≤ n_a, (c) n ≥ n_b.

11.6 A stationary random signal x_n is to be estimated on the basis of the noisy observations

y_n = x_n + v_n

It is given that

S_xx(z) = 1 / [(1 − 0.5z^{−1})(1 − 0.5z)],    S_vv(z) = 5,    S_xv(z) = 0

(a) Determine the optimal realizable Wiener filter for estimating the signal x_n on the basis of the observations Y_n = {y_i, i ≤ n}. Write the difference equation of this filter. Compute the mean-square estimation error.

(b) Determine the optimal realizable Wiener filter for predicting one step into the future; that is, estimate x_{n+1} on the basis of Y_n.

(c) Cast the results of (a) and (b) in a predictor/corrector Kalman filter form, and show explicitly that the innovations residual of the observation signal y_n is identical to the corresponding whitening sequence ε_n driving the signal model of y_n.

11.7 Repeat the previous problem for the following choice of state and measurement models

x_{n+1} = x_n + w_n,    y_n = x_n + v_n

where w_n and v_n have variances Q = 0.5 and R = 1, respectively.

11.8 Consider the state and measurement equations

x_{n+1} = a x_n + w_n,    y_n = c x_n + v_n

as discussed in Sec. 11.6. For any value of the Kalman gain K, consider the Kalman predictor/corrector algorithm defined by the equation

x̂_{n+1/n} = a x̂_{n/n−1} + K(y_n − c x̂_{n/n−1}) = f x̂_{n/n−1} + K y_n        (P.1)

where f = a − cK. The stability requirement of this estimation filter requires further that K be such that |f| < 1.

(a) Let e_{n/n−1} = x_n − x̂_{n/n−1} be the corresponding estimation error. Assuming that all signals are stationary, and working with z-transforms, show that the power spectral density of e_{n/n−1} is given by

S_ee(z) = (Q + K²R) / [(1 − f z^{−1})(1 − f z)]

(b) Integrating S_ee(z) around the unit circle, show that the mean-square value of the estimation error is given by

E = E[e²_{n/n−1}] = (Q + K²R)/(1 − f²) = (Q + K²R)/(1 − (a − cK)²)        (P.2)

(c) To select the optimal value of the Kalman gain K, differentiate E with respect to K and set the derivative to zero. Show that the resulting equation for K can be expressed in the form

K = caP / (R + c²P)

where P stands for the minimized value of E; that is, P = E_min.

(d) Inserting this expression for K back into the expression (P.2) for E, show that the quantity P must satisfy the algebraic Riccati equation

Q = P − a²RP / (R + c²P)

Thus, the resulting estimator filter is identical to the optimal one-step prediction filter discussed in Sec. 11.6.

11.9 Show that Eq. (P.2) of Problem 11.8 can be derived without using z-transforms, by using only stationarity, as suggested below: Using the state and measurement model equations and Eq. (P.1), show that the estimation error e_{n/n−1} satisfies the difference equation

e_{n+1/n} = f e_{n/n−1} + w_n − K v_n

Then, invoking stationarity, derive Eq. (P.2). Using similar methods, show that the mean-square estimation error is given by

E[e²_{n/n}] = RP / (R + c²P)

where e_{n/n} = x_n − x̂_{n/n} is the estimation error of the optimal filter (11.6.13).

11.10 Consider the general example of Sec. 11.6. It was shown there that the innovations residual was the same as the whitening sequence ε_n driving the signal model of y_n

ε_n = y_n − ŷ_{n/n−1} = y_n − c x̂_{n/n−1}

Show that it can be written as

ε_n = c e_{n/n−1} + v_n

where e_{n/n−1} = x_n − x̂_{n/n−1} is the prediction error. Then, show that

σ_ε² = E[ε_n²] = R + c²P

11.11 Computer Experiment. Consider the signal and measurement model defined by Eqs. (11.6.6) through (11.6.8), with the choices a = 0.9, c = 1, Q = 1 − a², and R = 1. Generate 1500 samples of the random noises w_n and v_n. Generate the corresponding signals x_n and y_n according to the state and measurement equations. Determine the optimal Wiener filter of the form (11.6.13) for estimating x_n on the basis of y_n. Filter the sequence y_n through the Wiener filter to generate the sequence x̂_{n/n}.

(a) On the same graph, plot the desired signal x_n and the available noisy version y_n for n ranging over the last 100 values (i.e., n = 1400–1500).

(b) On the same graph, plot the recovered signal x̂_{n/n} together with the original signal x_n for n ranging over the last 100 values.

(c) Repeat (a) and (b) using a different realization of w_n and v_n.

(d) Repeat (a), (b), and (c) for the choice a = −0.9.

11.12 Consider the optimal Wiener filtering problem in its matrix formulation of Sec. 11.8. Let e = x − x̂ = x − Hy be the estimation error corresponding to a particular choice of the lower-triangular matrix H. Minimize the error covariance matrix R_ee = E[ee^T] with respect to H subject to the constraint that H be lower-triangular. These constraints are H_{ni} = 0 for n < i. To do this, introduce a set of Lagrange multipliers Λ_{ni} for n < i, one for each constraint equation, and incorporate them into an effective performance index

J = E[ee^T] + ΛH^T + HΛ^T = min

where the matrix Λ is strictly upper-triangular. Show that this formulation of the minimization problem yields exactly the same solution as Eq. (11.8.7).

11.13 Exponential Moving Average as Wiener Filter. The single EMA filter for estimating the local level of a signal that we discussed in Chap. 6 admits a nice Wiener/Kalman filtering interpretation. Consider the noisy random walk signal model,

x_{n+1} = x_n + w_n
y_n = x_n + v_n        (11.10.1)

where w_n, v_n are mutually uncorrelated, zero-mean, white noise signals of variances Q = σ_w² and R = σ_v². Based on the material in Section 12.6, show that the optimum Wiener/Kalman filter for predicting x_n from y_n is equivalent to the exponential smoother, that is, show that it is given by,

x̂_{n+1/n} = f x̂_{n/n−1} + (1 − f) y_n        (11.10.2)

so that the forgetting-factor parameter λ of EMA is identified as the closed-loop parameter f of the Kalman filter, and show further that f is given in terms of Q, R as follows,

1 − f = [√(Q² + 4QR) − Q] / (2R)

Show also that x̂_{n+1/n} = x̂_{n/n}.

a. For the values σ_w = 0.1 and σ_v = 1, generate N = 300 samples of x_n, y_n from Eq. (11.10.1) and run y_n through the equivalent Kalman filter of Eq. (11.10.2) to compute x̂_{n/n−1}. On the same graph, plot all three signals y_n, x_n, x̂_{n/n−1} versus 0 ≤ n ≤ N − 1. An example graph is shown at the end.

b. A possible way to determine λ or f from the data y_n is as follows. Assume a tentative value for λ, compute x̂_{n/n−1}, then the error e_{n/n−1} = x_n − x̂_{n/n−1}, and the mean-square error:

MSE(λ) = Σ_n e²_{n/n−1}

Repeat the calculation of MSE(λ) over a range of λ's, for example, 0.80 ≤ λ ≤ 0.95, chosen such that the interval [0.80, 0.95] contains the true λ. Then find the λ that minimizes MSE(λ); it should be close to the true value.

Because the estimated λ depends on the particular realization of the model (11.10.1), generate 20 different realizations of the pair x_n, y_n with the same Q, R, and for each realization carry out the estimate of λ as described above, and finally form the average of the 20 estimated λ's. Discuss whether this method generates an acceptable estimate of λ or f (see the sketch after this problem for one possible setup of parts (a) and (b)).

c. Repeat part (b), replacing the MSE by the mean-absolute error:

MAE(λ) = Σ_n |e_{n/n−1}|
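One possible starting point for parts (a) and (b) of Problem 11.13 is sketched below in Python (a minimal sketch, assuming numpy; the variable names and the grid of λ values are arbitrary choices, and the sketch is not a unique or complete solution). It generates a realization of the model (11.10.1), runs the exponential smoother (11.10.2), and scans λ over [0.80, 0.95] for the MSE-minimizing value.

```python
# Possible starting point for Problem 11.13, parts (a)-(b) -- a sketch, not a unique solution.
import numpy as np

def simulate(N, sigw, sigv, rng):
    """Generate one realization of the noisy random walk model (11.10.1)."""
    w = sigw * rng.standard_normal(N)
    v = sigv * rng.standard_normal(N)
    x = np.cumsum(np.concatenate(([0.0], w[:-1])))   # x_{n+1} = x_n + w_n, with x_0 = 0
    y = x + v                                        # y_n = x_n + v_n
    return x, y

def ema_predict(y, lam):
    """Exponential smoother (11.10.2): xhat_{n+1/n} = lam*xhat_{n/n-1} + (1-lam)*y_n."""
    xhat = np.zeros(len(y))                          # xhat[n] = xhat_{n/n-1}, initialized to zero
    for n in range(len(y) - 1):
        xhat[n + 1] = lam * xhat[n] + (1 - lam) * y[n]
    return xhat

rng = np.random.default_rng(0)
N, sigw, sigv = 300, 0.1, 1.0
Q, R = sigw**2, sigv**2
f_true = 1 - (np.sqrt(Q**2 + 4*Q*R) - Q) / (2*R)     # closed-loop parameter f of the problem
x, y = simulate(N, sigw, sigv, rng)

# part (a): plot y, x, and ema_predict(y, f_true) on the same graph, e.g. with matplotlib

# part (b): scan lambda over [0.80, 0.95] and pick the MSE-minimizing value
lams = np.arange(0.80, 0.951, 0.005)
mse = [np.sum((x - ema_predict(y, lam))**2) for lam in lams]
print("true f =", f_true, " estimated lambda =", lams[np.argmin(mse)])
```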

[Figure: "noisy random walk" — example graph for Problem 11.13(a), showing the observations y_n, the signal x_n, and the prediction x̂_{n/n−1} over time samples n = 0–300.]

12
Linear Prediction
12.1 Pure Prediction and Signal Modeling
In Sec. 1.17, we discussed the connection between linear prediction and signal modeling.
Here, we rederive the same results by considering the linear prediction problem as a
special case of the Wiener filtering problem, given by Eq. (11.4.6). Our aim is to cast
the results in a form that will suggest a practical way to solve the prediction problem
and hence also the modeling problem. Consider a stationary signal yn having a signal
model
S_yy(z) = σ_ε² B(z) B(z^{−1})        (12.1.1)

as guaranteed by the spectral factorization theorem. Let Ryy (k) denote the autocorre-
lation of yn :
Ryy (k)= E[yn+k yn ]
The linear prediction problem is to predict the current value yn on the basis of all the
past values Yn−1 = {yi , −∞ < i ≤ n − 1}. If we define the delayed signal y1 (n)= yn−1 ,
then the linear prediction problem is equivalent to the optimal Wiener filtering problem
of estimating yn from the related signal y1 (n). The optimal estimation filter H(z) is
given by Eq. (11.4.6), where we must identify xn and yn with yn and y1 (n) of the present
notation. Using the filtering equation Y1 (z)= z−1 Y(z), we find that yn and y1 (n) have
the same spectral factor B(z)

S_{y_1 y_1}(z) = (z^{−1})(z) S_yy(z) = S_yy(z) = σ_ε² B(z) B(z^{−1})

and also that


S_{y y_1}(z) = z S_yy(z) = z σ_ε² B(z) B(z^{−1})
Inserting these into Eq. (11.4.6), we find for the optimal filter H(z)

H(z) = (1 / (σ_ε² B(z))) [ S_{y y_1}(z) / B(z^{−1}) ]_+ = (1 / (σ_ε² B(z))) [ z σ_ε² B(z) B(z^{−1}) / B(z^{−1}) ]_+ ,   or,

H(z) = (1 / B(z)) [ z B(z) ]_+        (12.1.2)
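As a simple illustration of Eq. (12.1.2) (a worked special case), take the first-order model B(z) = 1/(1 − a z^{−1}) with |a| < 1, so that y_n is an AR(1) process. Then

z B(z) = z + a + a² z^{−1} + a³ z^{−2} + · · ·

and keeping only the causal part,

[z B(z)]_+ = a + a² z^{−1} + a³ z^{−2} + · · · = a / (1 − a z^{−1}) = a B(z)

so that Eq. (12.1.2) gives H(z) = (1/B(z)) · a B(z) = a. Acting on the delayed signal y_1(n) = y_{n−1}, the optimal predictor is therefore simply ŷ_n = a y_{n−1}, which is the expected one-step predictor of an AR(1) process.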
