Introduction To Adaptive Filters - Simon Haykin
INTRODUCTION TO ADAPTIVE FILTERS
SIMON HAYKIN
Communications Research Laboratory
McMaster University
Preface
Chapter 1. Introduction
1.1 Filters
1.2 Adaptivity
1.3 Classifications of Sampled-Data and Digital Filters
1.4 Examples of Adaptivity
1.5 What Do These Examples of Adaptivity Have in Common?
Notes
References
Index
PREFACE
SIMON HAYKIN
Hamilton, Ontario, Canada
CHAPTER
ONE
INTRODUCTION
1.1 FILTERS
The term “filter” is often used to describe a device in the form of a piece of
physical hardware or computer software that is applied to a set of noisy
data in order to extract information about a prescribed quantity of interest.
The noise may arise from a variety of sources. For example, the data may
have been derived by means of noisy sensors, or may represent a useful
signal component that has been corrupted by transmission through a
communication channel. In any event, we may use a filter to perform three
basic information-processing operations: filtering, smoothing, and prediction.
1.2 ADAPTIVITY
In the case of adaptive filters using an IIR structure, two major difficulties
may arise due to the feedback paths: (1) the filter may become unstable,
unless special precautions are taken, and (2) the presence of feedback may
have an adverse effect on the accuracy with which the filter coefficients have
to be specified. It is for these reasons that in practical applications requiring
the use of adaptive filters, we find that adaptive FIR filters are used almost
exclusively. In this book, we will concentrate on adaptive filters using FIR
structures.

Figure 1.1 (a) FIR filter, (b) IIR filter.
Another filter structure that we will consider is the multistage lattice
filter, so called because each stage of the filter has a latticelike form. This
filter has some interesting properties, which make it an attractive alternative
structure to a tapped-delay-line structure for adaptive filter applications.
[Figure: adaptive tapped-delay-line filter with coefficients h(1), ..., h(M), adjusted by an adaptive algorithm, used to model an unknown dynamic system driven by the input u(n).]
where T is the duration of the signalling interval, and p(t) is the impulse
response of the cascade connection of the transmitting filter, the channel,
and the receiving filter. By sampling u(t) synchronously with the trans-
mitter, and defining u(n) = u(nT) and p(n) = p(nT), we get
The first term on the right-hand side of Eq. (1.4) defines the desired symbol,
whereas the remaining series represents the intersymbol interference caused
by the combined action of the transmitting filter, the channel, and the
receiving filter. This intersymbol interference, if left unchecked, can result in
erroneous decisions when the sampled signal at the receiving filter output is
compared with some preassigned threshold by means of a decision device.
To overcome the intersymbol interference problem, control of the time
function p(t) is required. In principle, if the characteristics of the channel
are known precisely, then it is virtually always possible to design a pair of
transmitting and receiving filters that will make the effect of intersymbol
interference (at sampling times) arbitrarily small, and at the same time limit
the effect of the additive receiver noise by minimizing the average probabil-
ity of symbol error. In practice, however, we find that a channel is random
in the sense that it is one of an ensemble of possible channels. Accordingly,
the use of a fixed pair of transmitting and receiving filters, designed on the
Figure 1.3 Block diagram of a baseband data-transmission system.

[Figure: adaptive equalizer whose error signal is formed either from a test-signal generator (training mode) or from the decision-device output (decision-directed mode), selected by a switch.]
with a high probability, so that the estimate of the error signal is correct
often enough to allow the adaptive equalizer to maintain proper adjustment
of its coefficients. Another attractive feature of a decision-directed adaptive
equalizer is the fact that it can track slow variations in the channel
characteristics or perturbations in the receiver front end, such as slow jitter
in the sampler phase.
*The symbol /·/ is used to denote a phoneme, a basic linguistic unit.
The frequency response of the vocal-tract filter for unvoiced speech or that
of the vocal tract multiplied by the spectrum of the vocal-cord sound pulses
determines the short-time spectral envelope of the speech signal.
Linear Predictive Coding
The method of linear predictive coding (LPC) is an example of source
coding. This method is important because it provides not only a powerful
technique for the digital transmission of speech at low bit rates but also
accurate estimates of basic speech parameters.
The development of LPC relies on the model of Fig. 1.5 for the speech
production process. The frequency response of the vocal tract for unvoiced
speech or that of the vocal tract multiplied by the spectrum of the vocal-cord
sound pulse for voiced speech is described by the transfer function
H(z) = G / [1 + Σ_{k=1}^{M} a(k) z^{-k}]        (1.5)

where G is a gain parameter and z^{-1} is the unit-delay operator. The form of
excitation applied to this filter is changed by switching between voiced and
unvoiced sounds. Thus the filter with transfer function H(z) is excited by a
sequence of impulses to generate voiced sounds or a white-noise sequence to
generate unvoiced sounds.

Figure 1.5 Block diagram of simplified model for the speech production process.
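As a rough illustration of this source-filter synthesis model, here is a minimal Python sketch that excites an all-pole filter of the form of Eq. (1.5) with either an impulse train (voiced) or white noise (unvoiced). The coefficient values, gain, and pitch period are illustrative assumptions, not values from the text:

    import numpy as np

    def lpc_synthesize(a, G, excitation):
        """Run an excitation through H(z) = G / (1 + sum_k a(k) z^-k)."""
        M = len(a)
        u = np.zeros(len(excitation))
        for n in range(len(excitation)):
            # u(n) = G*x(n) - sum_{k=1}^{M} a(k) u(n-k)
            acc = G * excitation[n]
            for k in range(1, M + 1):
                if n - k >= 0:
                    acc -= a[k - 1] * u[n - k]
            u[n] = acc
        return u

    # Illustrative coefficients and excitations (assumed, not from the text)
    a = [-0.9, 0.2]                      # a(1), a(2); poles inside the unit circle
    G = 1.0
    pitch_period = 80                    # 100-Hz pitch at an 8-kHz sampling rate
    voiced = np.zeros(400); voiced[::pitch_period] = 1.0          # impulse train
    unvoiced = np.random.default_rng(0).standard_normal(400)      # white noise

    voiced_speech = lpc_synthesize(a, G, voiced)
    unvoiced_speech = lpc_synthesize(a, G, unvoiced)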
In linear predictive coding, as the name implies, linear prediction is used
to estimate the speech parameters. Given a set of past samples of a speech
signal, namely, u(n — 1),u(n — 2),...,u(n — M), a linear prediction of
u(n), the present sample value of the signal, is defined by
û(n | n−1, ..., n−M) = Σ_{k=1}^{M} h(k) u(n−k)        (1.6)
The predictor coefficients, h(1), h(2),...,h(M), are optimized by minimiz-
ing the mean square value of the prediction error, e(n), defined as the
difference between u(n) and û(n | n−1, ..., n−M). The use of the mini-
mum-mean-squared-error criterion for optimizing the predictor may be
justified for two basic reasons:
1. If the speech signal satisfies the model described by Eq. (1.5), and if the
mean square value of the error signal e(n) is minimized, then we find
that e(n) equals the excitation x(n) multiplied by the gain parameter G
in the model of Fig. 1.5, and a(k) = −h(k), k = 1, 2, ..., M.* Thus the
error signal e(n) consists of a train of impulses in the case of voiced
sounds or a white noise sequence in the case of unvoiced sounds. In
either case, the error signal e(n) would be small most of the time.
2. The use of the minimum-mean-squared-error criterion leads to tractable
mathematics.
*The relationship between the set of predictor coefficients, {h(k)}, and the set of all-pole
filter coefficients, {a(k)}, is derived in Chapter 3.
Figure 1.6 Block diagram of LPC vocoder: (a) transmitter, (b) receiver.
receiver uses these parameters to synthesize the speech signal by utilizing the
model of Fig. 1.5.
Waveform Coding
In waveform coding the operations performed on the speech signal are
designed to preserve the shape of the signal. Specifically, the operations
include sampling (time discretization) and quantization (amplitude discreti-
zation). The rationale for sampling follows from a basic property of all
speech signals, namely, they are bandlimited. This means that a speech
signal can be sampled in time at a finite rate in accordance with the
sampling theorem. For example, commercial telephone networks designed
to transmit speech signals occupy a bandwidth from 200 to 3200 Hz. To
satisfy the sampling theorem, a conservative sampling rate of 8 kHz is
commonly used in practice. Quantization is justified on the following
grounds. Although a speech signal has a continuous range of amplitudes
(and therefore its samples also have a continuous amplitude range), never-
theless, it is not necessary to transmit the exact amplitudes of the samples.
Basically, the human ear (as ultimate receiver) can only detect finite
amplitude differences.
Examples of waveform coding include pulse-code modulation (PCM)
and differential pulse-code modulation (DPCM). In PCM, as used in tele-
phony, the speech signal (after low-pass filtering) is sampled at the rate of
8 kHz, nonlinearly quantized, and then coded into 8-bit words, as in Fig.
1.7(a). The result is a good signal-to-quantization-noise ratio over a wide
dynamic range of input signal levels. DPCM involves the use of a predictor
as in Fig. 1.7(b). The predictor is designed to exploit the correlation that
exists between adjacent samples of the speech signal, in order to realize a
Figure 1.7 Waveform coders: (a) PCM, (b) DPCM, (c) ADPCM.
reduction in the number of bits required for the transmission of each sample
of the speech signal and yet maintain a prescribed quality of performance.
This is achieved by quantizing and then coding the prediction error that
results from the subtraction of the predictor output from the input signal. If
the prediction is optimized, the variance of the prediction error will be
significantly smaller than that of the input signal, so that a quantizer with a
given number of levels can be adjusted to produce a quantizing error with a
smaller variance than would be possible if the input signal were quantized
directly as in a standard PCM system. Equivalently, for a quantizing error
of prescribed variance, DPCM requires a smaller number of quantizing
levels (and therefore a smaller bit rate) than PCM.
Differential pulse-code modulation uses a fixed quantizer and a fixed
predictor. A further reduction in the transmission rate can be achieved by
using an adaptive quantizer and an adaptive predictor, as in Fig. 1.7(c). This
type of waveform coding is called adaptive differential pulse-code modulation
(ADPCM). An adaptive predictor is used in order to account for the
nonstationary nature of speech signals.
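To make the DPCM idea concrete, here is a minimal Python sketch of a DPCM encoder/decoder built around a fixed one-tap predictor and a fixed uniform quantizer. The predictor coefficient, step size, and test signal are illustrative assumptions, not values from the text:

    import numpy as np

    def dpcm_encode_decode(x, a=0.9, step=0.05):
        """DPCM with a fixed one-tap predictor: predict a * previous reconstruction."""
        x_rec = np.zeros(len(x))      # reconstructed samples (decoder output)
        q_err = np.zeros(len(x))      # quantized prediction errors (what is transmitted)
        prev = 0.0
        for n in range(len(x)):
            pred = a * prev                      # prediction from previous reconstructed sample
            e = x[n] - pred                      # prediction error
            q = step * np.round(e / step)        # fixed uniform quantizer
            q_err[n] = q
            x_rec[n] = pred + q                  # decoder forms the same reconstruction
            prev = x_rec[n]
        return q_err, x_rec

    rng = np.random.default_rng(1)
    x = np.cumsum(0.02 * rng.standard_normal(500))   # slowly varying, correlated test signal
    q_err, x_rec = dpcm_encode_decode(x)
    print("prediction-error variance:", np.var(q_err), " signal variance:", np.var(x))
    print("max reconstruction error :", np.max(np.abs(x - x_rec)))

The point of the comparison is that the quantizer only has to cover the range of the prediction error, whose variance is much smaller than that of the signal itself.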
when person A on the left speaks, his speech should follow the upper
transmission path to the hybrid on the right and from there be directed to
the two-wire circuit. In practice, however, not all the speech energy is
directed to this two-wire circuit, with the result that some is returned along
the lower four-wire path to be heard by the person on the left as an echo
that is delayed by 540 ms.
To overcome this problem, echo cancellers are installed in the network
in pairs, as illustrated in Fig. 1.9(a). The cancellation is achieved by making
an estimate of the echo and subtracting it from the return signal. The
underlying assumption here is that the echo return path, from the point
where the canceller bridges to the point where the echo estimate is sub-
tracted, is linear and time-invariant.
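Under this linear, time-invariant assumption, the echo can be estimated by convolving the far-end speech with an estimate of the echo-path impulse response and subtracting the result from the return signal, as described below. A minimal Python sketch of that operation follows; the echo-path response, its estimate, and the speech signals are synthetic stand-ins introduced only for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    N = 64                                        # assumed length of the echo-path model
    h = 0.5 ** np.arange(N)                       # "true" echo-path impulse response (synthetic)
    h_hat = h + 0.01 * rng.standard_normal(N)     # canceller's estimate of the echo path

    u = rng.standard_normal(2000)                 # far-end speech (speaker A), synthetic
    v = 0.1 * rng.standard_normal(2000)           # near-end speech plus noise (speaker B)

    echo = np.convolve(u, h)[:len(u)]             # echo produced by the true echo path
    y = echo + v                                  # return signal seen by the canceller
    echo_hat = np.convolve(u, h_hat)[:len(u)]     # echo estimate: sum_k h_hat(k) u(n-k)
    e = y - echo_hat                              # residual after cancellation

    print("return-signal power:", np.mean(y**2))
    print("residual power:     ", np.mean(e**2))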
Thus, referring to the single canceller in Fig. 1.9(b) for definitions, the
return signal at time n may be expressed as
y(n) = Σ_{k=0}^{∞} h(k) u(n−k) + v(n)
Figure 1.9 (a) Satellite circuit with a pair of echo suppressors. (b) Signal definitions.
where u(n), u(n — 1),..., are samples of the far-end speech (from speaker
A), v(n) is the near-end speech (from speaker B) plus any additive noise at
time n, and {h(k)} is the impulse response of the echo path. The echo
canceller makes an estimate {ĥ(k)} of the impulse response of the echo
path, and then estimates the echo as the convolution sum
where Δ is equal to or greater than the sample period. The main function of
the delay parameter Δ is to remove correlation that may exist between the
noise component in the original input signal u(n) and the noise component
in the delayed predictor input u(n − Δ). For this reason, the delay parame-
ter Δ is called the decorrelation parameter of the ALE. An ALE may thus be
viewed as an adaptive filter that is designed to suppress broad-band
components (e.g., white noise) contained in the input while at the same time
passing narrow-band components (e.g., sine waves) with little attenuation.
In other words, it can be used to enhance the presence of sine waves (whose
spectrum consists of harmonic lines) in an adaptive manner—hence the
name.
Table 1.1
Application                                                      Desired response
Adaptive prediction (as used in the digital representation       Present value of the input signal
of speech signals, and adaptive line enhancement)
1.6 NOTES
cation network. Flanagan et al. [26] present a most detailed review of the
subject, emphasizing the many practical issues involved in speech-coder
design. Gibson [27] focuses on the analysis and design of adaptive predic-
tors for the differential encoding of speech. The books by Flanagan [28],
Markel and Gray [29], and Rabiner and Schafer [30] are devoted to the
various issues involved in the analysis and synthesis of speech, and its
coding.
The initial work on the use of adaptivity for echo cancellation started
around 1965. It appears that Kelly was the first to propose the use of an
adaptive filter (with the speech signal itself utilized in performing the
adaptation) for echo cancellation. Kelly’s contribution is recognized in the
paper by Sondhi [31]. This invention and its refinement are described in
the patents by Kelly and Logan [32] and Sondhi [33]. The description given
in Example 4 on echo cancellation is based on the paper by Duttweiler and
Chen [34].
The adaptive line enhancer was originated by Widrow and his co-
workers. An early version of this device was built in 1965 to cancel 60-Hz
interference at the output of an electrocardiographic amplifier and recorder.
This work is described in the paper by Widrow et al. [35]. The adaptive line
enhancer and its application as an adaptive detector are patented by
McCool et al. [36, 37].
It should be noted that although the echo canceller and the adaptive
line enhancer are intended for different applications, nevertheless, they
represent special forms of the adaptive noise canceller [35].
Falconer [38] presents an overview of adaptive-filter theory and its
applications to adaptive equalization, adaptive prediction in speech coding,
and echo cancellation.
The theory of adaptive filters (operating on a time series) is closely
related to that of adaptive antennas (operating on blocks of spatial samples).
For material on adaptive antennas, the reader is referred to the book by
Monzingo and Miller [39] and the collection of papers edited by Haykin
[40].
REFERENCES
CHAPTER
TWO
WIENER FILTERS
Estimation theory deals with the intelligent use of information derived from
observations in order to make optimum decisions about physical parameters
of interest, with the decision being weighted by all available information. By
information we mean data of practical value in the decision-making process.
The subject of estimation theory is a vast one. However, our interest will be
limited to linear estimation performed by discrete-time devices whose im-
pulse response has a finite duration. In this chapter we will consider a
tapped-delay-line filter to perform the estimation, and use the classical
Wiener filter theory for the statistical characterization of the problem.
where w(t) is the noise. We are interested in the use of discrete-time devices
to process the received signal u(t). To accommodate this requirement, let
the signal u(t) be sampled uniformly at a rate equal to 1/T samples per
second, where T is the sample period. The result of this sampling process is
[Figure: discrete-time linear filter producing the output y(n).]
of k − 1:

p(k−1) = E[d(n) u(n−k+1)],    k = 1, 2, ..., M        (2.16)
We may therefore rewrite the single summation term on the right-hand
side of Eq. (2.14) as follows:

Σ_{k=1}^{M} h(k) E[d(n) u(n−k+1)] = Σ_{k=1}^{M} h(k) p(k−1)        (2.17)
Thus, substituting Eqs. (2.15), (2.17), and (2.19) in (2.14), we find that the
expression for the mean squared error ε may be rewritten in the form

ε = P_d − 2 Σ_{k=1}^{M} h(k) p(k−1) + Σ_{k=1}^{M} Σ_{m=1}^{M} h(k) h(m) r(m−k)        (2.20)
Equation (2.20) states that, for the case when the desired response
{d(n)} and the input signal {u(n)} are jointly stationary, the mean squared
error ε is precisely a second-order function of the tap coefficients
h(1), h(2), ..., h(M) of the tapped-delay-line filter. Accordingly, we may
visualize the dependence of the mean squared error ε on the tap coefficients
as a bowl-shaped surface with a unique minimum. We refer to this surface
as the error-performance surface of the tapped-delay-line filter. The require-
ment is to design this filter so that it operates at the bottom or minimum
point of the error-performance surface.
The mean squared error ε attains its minimum value when its deriva-
tives with respect to the tap coefficients h(k), for k = 1, 2, ..., M, are
simultaneously zero. Differentiating the expression for mean squared error
ε, defined by Eq. (2.20), with respect to h(k), we get
∂ε/∂h(k) = −2 p(k−1) + 2 Σ_{m=1}^{M} h(m) r(m−k)        (2.21)
Setting this result equal to zero, we obtain the optimum values of the tap
coefficients. Let these values be denoted by h_o(1), h_o(2), ..., h_o(M). They
satisfy the set of M simultaneous equations

Σ_{m=1}^{M} h_o(m) r(m−k) = p(k−1),    k = 1, 2, ..., M        (2.22)
Let ε_min denote the minimum value of the mean squared error, which results
when the tapped-delay-line filter assumes its optimum condition. Using
h_o(1), h_o(2), ..., h_o(M) for the tap coefficients in Eq. (2.20), we get
By substituting Eq. (2.22) in (2.23), we may simplify the expression for the
minimum mean squared error as follows:

ε_min = P_d − Σ_{k=1}^{M} h_o(k) p(k−1)        (2.24)
We refer to a tapped-delay-line filter whose impulse response is defined
by the normal equations (2.22) as optimum in the mean-square sense. There
is no other linear filter that we can design which can produce a mean
squared error [between the desired response d(n) and the filter output y(n)]
smaller in value than the minimum mean squared error ε_min of Eq. (2.24).
The normal equations (2.22) define the tap coefficients of the optimum
tapped-delay-line filter in the minimum-mean-square sense. We may rewrite
this set of equations by using the definitions of Eqs. (2.16) and (2.18) for the
cross-correlation function p(k — 1) and autocorrelation function r(k — m),
respectively, as follows:
Σ_{m=1}^{M} h_o(m) E[u(n−m+1) u(n−k+1)] = E[d(n) u(n−k+1)],    k = 1, 2, ..., M        (2.25)

E{[d(n) − Σ_{m=1}^{M} h_o(m) u(n−m+1)] u(n−k+1)} = 0,    k = 1, 2, ..., M        (2.26)
However, the summation term inside the square brackets in Eq. (2.26) is
recognized as the signal y_o(n) resulting at the optimum filter output in
response to the set of input samples u(n), u(n−1), ..., u(n−M+1). We
may therefore view y_o(n) as the minimum-mean-square estimate of the
desired response d(n), based on an input signal consisting of the samples
u(n), u(n−1), ..., u(n−M+1). Let this estimate be denoted by
d̂(n | n, ..., n−M+1). We may thus write
d̂(n | n, ..., n−M+1) = Σ_{m=1}^{M} h_o(m) u(n−m+1)        (2.27)
E[e_o(n) u(n−k+1)] = 0,    k = 1, 2, ..., M        (2.28)

where e_o(n) = d(n) − y_o(n) is the error signal resulting from use of the
optimum filter. Equation (2.28) states that, for the optimum filter, the error
signal and any of the tap inputs are orthogonal. This result is known as the
principle of orthogonality. Hence, we conclude that the two criteria: “mini-
mum mean squared error” and “orthogonality between error and input”
yield identical optimum filters.
As a corollary of the principle of orthogonality, we may also state that
the error signal e_o(n) and the optimum filter output y_o(n) are orthogonal,
as shown by

E[e_o(n) y_o(n)] = Σ_{k=1}^{M} h_o(k) E[e_o(n) u(n−k+1)]
                = 0        (2.29)
where in the last line we have made use of Eq. (2.28).
Equation (2.29) has an interesting geometric interpretation. If we view
the random variables representing the filter output, the desired response,
and the error signal as vectors, and recall that, by definition, the desired
response equals the filter output plus the error signal, we see that these three
vector quantities are related geometrically as shown in Fig. 2.3. In particu-
lar, the vector e_o is drawn “normal” to the vector y_o—hence the name
“normal equations.” Clearly, it is only when this condition is satisfied that
the vector representing the error signal attains its minimum length.
h_o = [h_o(1), h_o(2), ..., h_o(M)]^T        (2.30)
2. The M-by-1 cross-correlation vector, whose elements consist of the corre-
lation between the desired response d(n) and the tap inputs u(n),
p = [p(0), p(1), ..., p(M−1)]^T        (2.31)

where the kth element is

p(k−1) = E[d(n) u(n−k+1)],    k = 1, 2, ..., M        (2.32)
3. The M-by-M correlation matrix, whose elements consist of the mean-
square values of the individual tap inputs u(n),u(n — 1),...,
u(n — M + 1), as well as the correlations between these tap inputs, is
given by
R = [ r(0)      r(1)      ...   r(M−1)
      r(1)      r(0)      ...   r(M−2)
      ...       ...       ...   ...
      r(M−1)    r(M−2)    ...   r(0)  ]        (2.33)
where
r(m−k) = E[u(n−k+1) u(n−m+1)],    m, k = 1, 2, ..., M        (2.34)
Note that r(m — k) is the mkth element located at the intersection of row
m and column k of the matrix R.
Thus, using the definitions of Eqs. (2.30), (2.31), and (2.33), we may
rewrite the normal equations (2.22) in matrix form, as follows:
R h_o = p        (2.35)
This equation represents the discrete-time version of the well-known
Wiener-Hopf equation.
To solve for the coefficient vector of the optimum filter, we premultiply
both sides of Eq. (2.35) by the inverse of the correlation matrix R. Denoting
this inverse by R^{-1}, we may thus write

h_o = R^{-1} p        (2.36)

For the inverse matrix R^{-1} to exist, the correlation matrix R has to be
nonsingular. The justification for this is given in the next section.
Correspondingly, we may rewrite the expression for the minimum mean
squared error, given in Eq. (2.24), as follows:
ε_min = P_d − p^T h_o        (2.37)

where the 1-by-M vector p^T is the transpose of the vector p. Throughout the
book, we will use the superscript T to indicate matrix transposition.
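As a numerical illustration of Eq. (2.36), the following Python sketch estimates R and p from simulated data and solves for the optimum tap weights. The data model (a noisy, delayed version of the input serving as the desired response) is an assumption made only for this example:

    import numpy as np

    rng = np.random.default_rng(3)
    M, N = 4, 5000
    u = rng.standard_normal(N)
    d = 0.8 * np.roll(u, 1) + 0.1 * rng.standard_normal(N)   # desired response (illustrative)

    # Tap-input vectors u(n) = [u(n), u(n-1), ..., u(n-M+1)]^T, discarding wrap-around samples
    U = np.column_stack([np.roll(u, k) for k in range(M)])[M:]
    dvec = d[M:]

    R = (U.T @ U) / len(dvec)          # estimate of E[u(n) u^T(n)]
    p = (U.T @ dvec) / len(dvec)       # estimate of the cross-correlation vector

    h_o = np.linalg.solve(R, p)        # h_o = R^{-1} p, Eq. (2.36)
    eps_min = np.mean(dvec**2) - p @ h_o   # P_d - p^T h_o, Eq. (2.37)
    print("optimum tap weights:", h_o)
    print("minimum mean squared error:", eps_min)

With this data model the solution concentrates on the tap corresponding to u(n−1), as expected.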
The correlation matrix R of the filter input plays a key role in the solution
of the optimum filter, as evidenced by the matrix form of the normal
equations in (2.35). The development of efficient procedures for the
computation of this solution capitalizes on certain properties of the correla-
tion matrix, as we will see in subsequent chapters. It is therefore important
that we understand the properties of the correlation matrix R and their
implications.
Using the definition given in Section 2.6, we find that the correlation
matrix R of a stationary process has the following properties:
R^T = R        (2.38)

R = E[u(n) u^T(n)]        (2.39)

where u(n) is the M-by-1 tap-input vector, defined by

u(n) = [u(n), u(n−1), ..., u(n−M+1)]^T        (2.40)
By substituting Eq. (2.40) in (2.39) and expanding, it is a straightforward
matter to show that the result is the same as in Eq. (2.33). Taking the
transpose of both sides of Eq. (2.39), we get the result given in Eq. (2.38).
The statement that R^T = R is equivalent to saying that the mkth
element and kmth element of the correlation matrix R are equal. Accord-
ingly, the expanded form for the correlation matrix R takes on the following
special structure:

R = [ r(0)      r(1)      ...   r(M−1)
      r(1)      r(0)      ...   r(M−2)
      ...       ...       ...   ...
      r(M−1)    r(M−2)    ...   r(0)  ]
Q = [q_1, q_2, ..., q_M]        (2.45)

and the diagonal M-by-M matrix

Λ = diag(λ_1, λ_2, ..., λ_M)        (2.46)
Then it is a straightforward matter to show that the set of equations (2.44) is
equivalent to the single matrix equation:
RQ = QA (2.47)
Q^{-1} R Q = Λ        (2.48)
A matrix transformation of special interest is the unitary similarity trans-
formation, for which we have
Q^{-1} = Q^T        (2.49)

or equivalently

Q^T Q = I        (2.50)

where Q^T is the transpose of matrix Q. A matrix Q that satisfies this
condition is called a unitary matrix. Accordingly, we may rewrite Eq. (2.48)
in the form
Q^T R Q = Λ        (2.51)
Note that substitution of Eq. (2.45) in (2.50) yields
q_i^T q_k = 1 if i = k, and 0 otherwise        (2.52)
between the filter coefficient vector h and its optimum value h_o:

v = Q^T (h − h_o)        (2.59)

Using Eq. (2.59) in (2.58), we may then express the mean squared error in
the new form

ε = ε_min + v^T Λ v        (2.60)
Since the matrix Λ is a diagonal matrix, the quadratic form v^T Λ v is in its
canonical condition in that it contains no cross-product terms, as shown by

ε = ε_min + Σ_{k=1}^{M} λ_k v_k^2        (2.61)

where λ_k is the kth eigenvalue of the correlation matrix R, and v_k is the
kth component of the transformed coefficient error vector v. The feature
that makes the canonical form of Eq. (2.61) a useful representation of the
error-performance surface is the fact that the components of the vector v
are uncoupled from each other. In other words, the M components of the
transformed coefficient error vector v constitute the principal axes of the
error-performance surface. The significance of this result will become
apparent in the next chapter.
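A short Python sketch of this decomposition follows; the particular correlation matrix and coefficient-error vector are illustrative assumptions:

    import numpy as np

    # A symmetric Toeplitz correlation matrix (illustrative values)
    r = np.array([1.0, 0.5, 0.25, 0.125])
    R = np.array([[r[abs(m - k)] for k in range(4)] for m in range(4)])

    lam, Q = np.linalg.eigh(R)          # eigenvalues lam and orthonormal eigenvectors Q
    print("Q^T R Q (should be diagonal):")
    print(np.round(Q.T @ R @ Q, 6))

    # Canonical form of the error-performance surface: excess MSE = sum_k lam_k v_k^2
    h_err = np.array([0.1, -0.2, 0.05, 0.0])   # hypothetical coefficient error h - h_o
    v = Q.T @ h_err                            # transformed coefficient error vector
    excess_mse = np.sum(lam * v**2)
    print("excess mean squared error:", excess_mse,
          "=", h_err @ R @ h_err)              # equals (h - h_o)^T R (h - h_o)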
2.9 NOTES
In 1795, Gauss [1] used an estimation procedure called the method of least
squares in his efforts to determine the orbital parameters of the asteroid
Ceres. Accordingly, the method of least squares is credited to Gauss even
though it was first published by Legendre [2] in 1805. Since then, there has
been a vast literature on various aspects of the least-squares method. In
particular, Kolmogorov [3] in 1941 and Wiener [4] in 1942 reintroduced and
reformulated independently the linear least-squares problem for the filter-
ing, smoothing, and prediction of stochastic processes. Kolmogorov studied
discrete-time problems and solved them by using a recursive orthogonaliza-
tion procedure known as the Wold decomposition. Wiener, on the other
hand, studied the continuous-time problem and formulated the optimum
filter in terms of the famous Wiener—Hopf integral equation, which requires
knowledge of the correlation functions of the signal process. The solution of
this equation is rather difficult for all but the simplest problems. Equation
(2.35) is the matrix form of the discrete-time version of the Wiener—Hopf
equation.
When the filter input is a real-valued stationary process, the correlation
matrix contained in this equation is symmetric, Toeplitz, nonnegative defi-
nite, and almost always positive definite in practice. For a discussion of
Toeplitz matrices, see Grenander and Szegő [5] and Widom [6]. For a
REFERENCES
CHAPTER
THREE
LINEAR PREDICTION
Consider the time series u(n — 1), u(n — 2),...,u(n — M) obtained from a
stationary process. In the forward linear prediction (FLP) problem, this set
of samples is used to make a prediction of u(n). We refer to this special
form of forward prediction as one-step prediction, as we are looking exactly
one step into the future. Let û(n | n−1, ..., n−M) denote the value of
this prediction. Although this notation may at first sight appear cumber-
some, nevertheless, it properly describes the one-step prediction at time n,
given the sample values at times n — 1,...,n — M. With u(n) denoting the
actual sample value of the process at time n, we define the forward
prediction error as
ties appearing in Fig. 3.1 and those in Fig. 2.2; it is given in Table 3.1. Thus,
adapting the normal equations (2.22) to the one-step prediction problem, in
accordance with the correspondences indicated in the table, we may write
Σ_{m=1}^{M} h_o(m) r(m−k) = r(k),    k = 1, 2, ..., M        (3.3)

The minimum mean-square value of the forward prediction error is correspondingly

P_{f,M} = E[f_M^2(n)]
        = r(0) − Σ_{m=1}^{M} h_o(m) r(m)        (3.5)
where r(0) is the mean-square value of the desired response u(n), given by
r(0) = E[u^2(n)]
and r(m) is the correlation between the desired response and the tap inputs
for lag m = 1,2,...,M.
The normal equations (3.3) for one-step prediction and Eq. (3.5) for the
mean-square value of the forward prediction error are formulated in terms
of the predictor coefficients h_o(1), h_o(2), ..., h_o(M). We may combine these
equations into a single set by introducing a new set of filter coefficients
Table 3.1
Description           Tapped-delay-line filter of Fig. 3.1      Tapped-delay-line filter of Fig. 2.2
Tap inputs            u(n−1), u(n−2), ..., u(n−M)               u(n), u(n−1), ..., u(n−M+1)
Desired response      u(n)                                      d(n)
Error signal          f_M(n)                                    e(n)
defined by
a_M(m) = { 1,          m = 0
         { −h_o(m),    m = 1, ..., M        (3.6)
         { 0,          m > M
Accordingly, we may reformulate the normal equations (3.3) by moving
r(k) inside the summation on the left-hand side, and so write
Σ_{m=0}^{M} a_M(m) r(m−k) = 0,    k = 1, 2, ..., M        (3.7)
Similarly, we may reformulate Eq. (3.5) by moving r(0) inside the summa-
tion, and so write
Σ_{m=0}^{M} a_M(m) r(m) = P_{f,M}        (3.8)
where u(n — M) plays the role of the desired response. Here again we have
used the subscript M in the symbol b_M(n) to indicate that M input
samples are used to make the backward one-step prediction.
Figure 3.3 Relationship between the predictor and the prediction-error filter.
Table 3.2
Description           Tapped-delay-line filter of Fig. 3.4      Tapped-delay-line filter of Fig. 2.2
Tap inputs            u(n), u(n−1), ..., u(n−M+1)               u(n), u(n−1), ..., u(n−M+1)
Desired response      u(n−M)                                    d(n)
Error signal          b_M(n)                                    e(n)
P_{b,M} = E[b_M^2(n)]
        = r(0) − Σ_{m=1}^{M} g_o(m) r(M−m+1)        (3.15)
where r(0) is the mean-square value of the “desired response” and r(M —
m + 1) is the correlation between the “desired response” and the “tap
inputs”.
If in Eq. (3.13) we replace m with M — m + 1, replace k with M—k
+ 1, and also recognize that, for a stationary real-valued process, r(k — m)
equals r(m — k), we may rewrite this equation in the following equivalent
form:
Comparing Eqs. (3.16) and (3.3), we see that they have the same mathemati-
cal form with

h_o(m) = g_o(M−m+1),    m = 1, 2, ..., M        (3.17)

Equivalently, we may write

g_o(m) = h_o(M−m+1),    m = 1, 2, ..., M        (3.18)
Equation (3.18) suggests that we may use the forward predictor, with its
coefficients arranged in reverse order as in Fig. 3.5, to compute the back-
ward prediction error b_M(n).
If in Eq. (3.5) we replace m with M − m + 1, and then use Eq. (3.18),
we find that

P_{b,M} = P_{f,M}        (3.19)

That is, for a stationary input process the backward prediction error b_M(n)
and the forward prediction error f_M(n) have exactly the same mean-square
value.
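A brief Python sketch of Eqs. (3.18) and (3.19): the backward prediction error is obtained by running the forward prediction-error filter coefficients in reverse order, and the two errors are checked to have (nearly) the same mean-square value. The AR test signal and filter order are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(4)
    N = 20000
    u = np.zeros(N)
    for n in range(2, N):                      # synthetic AR(2) input (illustrative)
        u[n] = 0.75 * u[n - 1] - 0.5 * u[n - 2] + rng.standard_normal()

    a = np.array([1.0, -0.75, 0.5])            # forward prediction-error filter a_M(0..M)
    M = len(a) - 1

    f = np.convolve(u, a)[M:N]                 # forward prediction error f_M(n)
    b = np.convolve(u, a[::-1])[M:N]           # backward error: coefficients in reverse order
    print("mean-square forward error :", np.mean(f**2))
    print("mean-square backward error:", np.mean(b**2))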
(3.20)
Next, we replace k − 1 with j and replace m − 1 with l, obtaining the
result

Σ_{l=0}^{M−1} h_o(M−l) r(l−j) = r(M−j),    j = 0, 1, ..., M−1        (3.21)
We could have indeed obtained Eq. (3.21) directly from Eq. (3.20) by
replacing k — 1 with k and m — 1 with m. The only reason for making the
substitutions in two stages was for the sake of clarity. In any event, we
observe in Eq. (3.21) that r(M − k) equals r(m − k) for m = M. Hence,
moving the term r(M − k) inside the summation on the left-hand side of
Eq. (3.21), and also using Eq. (3.6), we may rewrite the normal equations for
backward prediction in terms of the forward prediction-error filter coeffi-
Figure 3.5 Realization of the backward predictor using forward predictor coefficients in reverse
order.
cients as follows:

Σ_{m=0}^{M} a_M(M−m) r(m−k) = 0,    k = 0, 1, ..., M−1        (3.22)
Figure 3.6 Backward prediction-error filter, based on forward prediction-error filter coeffi-
cients.
Δ_M = Σ_{m=0}^{M} a_M(m) r(M+1−m)        (3.28)

Δ_M = Σ_{m=0}^{M} a_M(m) r(m−M−1)        (3.29)
We note that the summation on the right-hand side of Eq. (3.29) equals the
summation on the left-hand side of the augmented normal equations (3.9)
for the case of forward linear prediction of order M and with k = M + 1.
We may therefore combine Eqs. (3.9) and (3.29) into a single set of M + 2
simultaneous equations as follows:
Σ_{m=0}^{M} a_M(m) r(m−k) = { P_{f,M},   k = 0
                            { 0,         k = 1, 2, ..., M        (3.30)
                            { Δ_M,       k = M+1
In this set of equations the variable k takes on values inside the interval
0 <k <M + 1, whereas the variable m takes on values inside the interval
0 <m<M. Our ultimate aim is to develop a set of augmented normal
equations for forward linear prediction of order M + 1, which requires that
m take on the same range of values as k. To do this, we first recognize that
the prediction-error filter coefficient a_M(M+1) is zero, because for a filter
of order M this coefficient is nonexistent. This means that a_M(M+1) r(M+1−k)
is also zero, regardless of the value of r(M+1−k). The term
a_M(M+1) r(M+1−k) equals a_M(m) r(m−k) for m = M+1.
Accordingly, we may extend the summation on the left-hand side of Eq.
(3.30) up to M + 1 without affecting the validity of this equation in any
Now we see that the summation on the right-hand side of Eq. (3.32) equals
the summation on the left-hand side of Eq. (3.25) with k = —1. Hence, we
may combine Eqs. (3.25) and (3.32) into a single set of M + 2 simultaneous
equations as follows:
Σ_{m=0}^{M} a_M(M−m) r(m−k) = { Δ_M,      k = −1
                              { 0,        k = 0, 1, ..., M−1        (3.33)
                              { P_{b,M},  k = M
Here again we see that the variables k and m have different ranges of
values; k lies inside the interval −1 ≤ k ≤ M, whereas m lies inside the
interval 0 ≤ m ≤ M. We may make both k and m take on the same range
of values by again recognizing that the prediction-error filter coefficient
a_M(M+1) is zero. This means that a_M(M+1) r(−1−k) is also zero,
regardless of the value of r(−1−k). Since a_M(M+1) r(−1−k) equals
a_M(M−m) r(m−k) for m = −1, it follows that we may extend the
summation on the left-hand side of Eq. (3.33) down to m = —1 without
affecting the validity of this equation in any way. We may thus write
Σ_{m=−1}^{M} a_M(M−m) r(m−k) = { Δ_M,      k = −1
                               { 0,        k = 0, 1, ..., M−1        (3.34)
                               { P_{b,M},  k = M

where both k and m now lie inside the same range of values (−1, M).
The next manipulation we wish to perform is to combine Eqs. (3.31)
and (3.34) together. However, before we can do this, we have to modify Eq.
(3.34) so that both k and m lie inside the range of values (0, M+1), as they do
in Eq. (3.31). To satisfy this requirement, we replace m with m — 1, and
replace k with k — 1 in Eq. (3.34), and thus rewrite these equations in the
equivalent form

Σ_{m=0}^{M+1} a_M(M−m+1) r(m−k) = { Δ_M,      k = 0
                                  { 0,        k = 1, 2, ..., M        (3.35)
                                  { P_{b,M},  k = M+1
We are now ready for the final step. Specifically, we multiply both sides of
Eq. (3.35) by a constant γ_{M+1}, and then add the resultant to Eq. (3.31),
thereby obtaining

Σ_{m=0}^{M+1} [a_M(m) + γ_{M+1} a_M(M−m+1)] r(m−k)
        = { P_{f,M} + γ_{M+1} Δ_M,      k = 0
          { 0,                          k = 1, 2, ..., M        (3.36)
          { Δ_M + γ_{M+1} P_{b,M},      k = M+1
The reason for introducing the constant γ_{M+1} is to give us the extra
degree of freedom we need, in order to ensure that this new set of M + 2
simultaneous equations represents the augmented normal equations for
forward linear prediction of order M + 1. Let a_{M+1}(0), a_{M+1}(1),
..., a_{M+1}(M+1) denote the coefficients of a prediction-error filter of order
M + 1. Let P_{f,M+1} denote the mean-square value of the forward prediction
error f_{M+1}(n) produced at the output of this filter. Then, using the standard form
for the augmented normal equations for forward linear prediction of order
M + 1, we may write

Σ_{m=0}^{M+1} a_{M+1}(m) r(m−k) = { P_{f,M+1},   k = 0
                                  { 0,           k = 1, 2, ..., M+1        (3.37)
Accordingly, comparing Eqs. (3.36) and (3.37), we may make the following
deductions:
Let us now try to summarize the results we have obtained thus far, and
develop physical interpretations for them:
1. Equation (3.42), in effect, states that the constant γ_{M+1} simply equals the
   last coefficient, a_{M+1}(M+1), of the prediction-error filter of order
   M + 1.
2. Equation (3.41) has the same mathematical form as the equation that
defines the transmission of power through a terminated two-port net-
work. Because of this analogy, the constant γ_{M+1} is referred to as the
reflection coefficient. Equation (3.41) states that, given the reflection
coefficient γ_{M+1} and the mean-square value of the forward prediction
error at the output of a filter of order M, we may compute the
mean-square value of the forward prediction error at the output of the
corresponding filter of order M + 1. Note that if the mean-square value
of the forward prediction error is to decrease (or, at worst, remain the
same) as the filter order increases (that is, P_{f,M+1} ≤ P_{f,M}), then we
require that |γ_{M+1}| ≤ 1.
3. The recursive relation of Eq. (3.38) states that, given the reflection
   coefficient γ_{M+1} and the coefficients of a prediction-error filter of order
M, we may compute the coefficients of the corresponding prediction-
error filter of order M + 1. This recursive relation is called the Levinson-
Durbin recursion.
To initiate the recursion, we start with the elementary case of a prediction-
error filter of order M = 0. If we put M = 0 in Eq. (3.8), we immediately
find that, since a_0(0) equals one,

P_{f,0} = r(0)        (3.43)
where r(0) is the autocorrelation function of the filter input for a lag of zero,
that is, the mean-square value of the filter input.
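The recursion is easy to program. The Python sketch below starts from a given autocorrelation sequence r(0), ..., r(M) and applies the order update of Eq. (3.38) together with Eq. (3.41); at each order the reflection coefficient is computed as γ_m = −Δ_{m−1}/P_{f,m−1}, with Δ as in Eq. (3.28) — that explicit step is an assumption made for this sketch. The numerical autocorrelation values are illustrative only:

    import numpy as np

    def levinson_durbin(r):
        """Given autocorrelations r[0..M], return the prediction-error filter
        coefficients a_M(0..M), the reflection coefficients, and P_{f,M}."""
        M = len(r) - 1
        a = np.array([1.0])            # a_0(0) = 1
        P = r[0]                       # P_{f,0} = r(0), Eq. (3.43)
        gammas = []
        for m in range(1, M + 1):
            # Delta_{m-1} = sum_{i=0}^{m-1} a_{m-1}(i) r(m - i), cf. Eq. (3.28)
            delta = np.dot(a, r[m:0:-1])
            gamma = -delta / P                       # assumed reflection-coefficient formula
            a = np.append(a, 0.0) + gamma * np.append(a, 0.0)[::-1]   # Eq. (3.38)
            P = P * (1.0 - gamma**2)                 # Eq. (3.41)
            gammas.append(gamma)
        return a, np.array(gammas), P

    # Illustrative autocorrelation sequence (not from the text)
    r = np.array([1.0, 0.5, 0.1, -0.05])
    a, gammas, P = levinson_durbin(r)
    print("a_M:", a, "\nreflection coefficients:", gammas, "\nP_f,M:", P)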
Example 1
Consider a prediction-error filter of order 3, whose input is a stationary
ergodic process. With u(n), u(n — 1), u(n — 2), u(n — 3) denoting the tap
inputs, we may use the time average
P_{av} = [1/(M+1)] Σ_{m=0}^{M} u^2(n−m)
       = (1/4)[u^2(n) + u^2(n−1) + u^2(n−2) + u^2(n−3)]        (3.44)
as the estimate of P_{f,0}. Given P_{av} and the values of the reflection coefficients
γ_1, γ_2, γ_3, we may proceed as follows:

2. For the prediction-error filter of order 1, shown in Fig. 3.7(a), we have

   P_{f,1} = P_{av}(1 − γ_1^2)
   a_1(0) = 1
   a_1(1) = γ_1

3. For the prediction-error filter of order 2, shown in Fig. 3.7(b), we have

   P_{f,2} = P_{f,1}(1 − γ_2^2)
   a_2(0) = 1
   a_2(1) = a_1(1) + γ_2 a_1(1)
   a_2(2) = γ_2

4. For the prediction-error filter of order 3, shown in Fig. 3.7(c), we have

   P_{f,3} = P_{f,2}(1 − γ_3^2)
   a_3(0) = 1
   a_3(1) = a_2(1) + γ_3 a_2(2)
   a_3(2) = a_2(2) + γ_3 a_2(1)
   a_3(3) = γ_3
Observations
Based on the results of this example, we may make the following observa-
tions:
where P_{av} is the average of the squared values of the tap inputs.
Figure 3.7 Prediction-error filter of (a) order 1, (b) order 2, (c) order 3.
F_M(z) = Z[f_M(n)]        (3.48)
Let U(z) denote the z-transform of the sequence at the filter input:

U(z) = Z[u(n)]
     = Σ_n u(n) z^{-n}        (3.49)
A_M(z) = Z[a_M(k)]
       = Σ_{k=0}^{M} a_M(k) z^{-k}        (3.50)
The only reason for using k as the time variable in Eq. (3.50) rather than n
is to conform to the notation used in Eq. (3.47). Then, using the linearity
and time-shifting properties of the z-transform, as well as the definitions
given in Eqs. (3.48), (3.49), and (3.50), it is shown in Appendix 2 that a
linear convolution sum as in Eq. (3.46) may be transformed as follows:

F_M(z) = A_M(z) U(z)        (3.51)

Equation (3.51) states that the convolution of two sequences in the time
domain is transformed into the product of their respective z-transforms.
The ratio of the z-transform of a filter output to the z-transform of the
filter input is called the transfer function of the filter. Except for a scaling
factor, the transfer function is uniquely defined by its poles and zeros. The
poles are obtained by solving for the roots of the denominator polynomial,
expressed as a function of z. The zeros are obtained by solving for the roots
of the numerator polynomial of the transfer function, expressed as a
function of z.
Accordingly, A_M(z) represents the transfer function of the prediction-
error filter. From Eq. (3.50) we see that, except for a pole of order M at the
origin, the transfer function A_M(z) consists only of zeros. The prediction-
error filter is therefore said to be an all-zero filter.
When the transfer function is evaluated for points on the unit circle,
that is, for z = e^{jω}, we get the frequency response of the filter. Thus, putting
z = e^{jω} in Eq. (3.50), we get the following expression for the frequency
response of a prediction-error filter:

A_M(e^{jω}) = Σ_{k=0}^{M} a_M(k) e^{-jωk}        (3.52)
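For example, the following Python sketch evaluates the amplitude response |A_M(e^{jω})| and locates the zeros of a prediction-error filter directly from its coefficients; the particular coefficient values are an illustrative assumption:

    import numpy as np

    a = np.array([1.0, -0.6, 0.2])            # a_M(0), a_M(1), a_M(2) (illustrative)

    # Zeros of A_M(z): roots of the polynomial obtained after multiplying by z^M
    zeros = np.roots(a)
    print("zeros:", zeros, " |zeros|:", np.abs(zeros))   # inside unit circle -> minimum-phase

    # Frequency response A_M(e^{jw}) = sum_k a_M(k) e^{-jwk}
    w = np.linspace(0.0, np.pi, 512)
    A = np.array([np.sum(a * np.exp(-1j * wi * np.arange(len(a)))) for wi in w])
    print("amplitude response at w = 0 and w = pi:", np.abs(A[0]), np.abs(A[-1]))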
Example 2
For the prediction-error filter of order 1, shown in Fig. 3.7(a), the
transfer function equals (using the results of Example 1)

A_1(z) = 1 + a_1(1) z^{-1}
       = 1 + γ_1 z^{-1}
Figure 3.8 Characteristics of prediction-error filter of order 1 for reflection coefficient γ_1 = 0.5:
(a) pole-zero pattern, (b) amplitude and phase responses.
Figure 3.9 Characteristics of prediction-error filter of order 2: (a) pole-zero pattern for
reflection coefficients γ_1 = 0.5 and γ_2 = 0.072; (b) pole-zero pattern for reflection coefficients
γ_1 = 0.5 and γ_2 = −0.6; (c) pole-zero pattern for reflection coefficients γ_1 = 0.5 and γ_2 = 0.2;
(d) amplitude and phase responses for (a), (b), and (c).
1. The two zeros are real and equal. This occurs when γ_1 and γ_2 satisfy
   the condition

   [γ_1(1 + γ_2)]^2 = 4γ_2

   This situation is illustrated in the pole-zero pattern of Fig. 3.9(a) for

   γ_1 = 0.5
   γ_2 = 0.072

   for which the two zeros lie at z_1, z_2 = −0.268. The corresponding
   amplitude and phase responses are shown as curves a in Fig. 3.9(d).
2. The two zeros are real and unequal. This occurs when γ_1 and γ_2 satisfy
   the condition

   [γ_1(1 + γ_2)]^2 > 4γ_2

   This situation is illustrated in the pole-zero pattern of Fig. 3.9(b) for

   γ_1 = 0.5
   γ_2 = −0.6

   for which the two zeros lie at z_1 = −0.881 and z_2 = 0.681. The corre-
   sponding amplitude and phase responses are shown as curves b in Fig.
   3.9(d).
3. The two zeros are complex conjugates. This occurs when γ_1 and γ_2 satisfy
   the condition

   [γ_1(1 + γ_2)]^2 < 4γ_2

   This situation is illustrated in the pole-zero pattern of Fig. 3.9(c) for

   γ_1 = 0.5
   γ_2 = 0.2

   for which the zeros lie at z_1, z_2 = −0.3 ± j0.332. The corresponding
   amplitude and phase responses are shown as curves c in Fig. 3.9(d).
have a magnitude less than one, then all the zeros of the transfer function of
the filter lie inside the unit circle, and the filter is minimum-phase. As a
corollary, we may state that if any one of the reflection coefficients has a
magnitude equal to or greater than one, the prediction-error filter is non-
minimum-phase.
One other point that is noteworthy is the fact that when the forward
prediction-error filter is designed to be minimum-phase, the corresponding
backward prediction-error filter (obtained by reversing the order of the
forward prediction-error filter coefficients, as in Fig. 3.6) is automatically
maximum-phase in that the phase response associated with its amplitude
response is the maximum possible. In such a case, the zeros of the transfer
function of the backward prediction-error filter are all located outside the
unit circle in the z-plane.
E[w(k) w(n)] = { σ^2,   k = n
              { 0,     k ≠ n        (3.53)
White noise has no information content, in the sense that the value of the
process at time n is uncorrelated with all past values up to and including
time n — 1 (and, indeed, with all future values of the process).
We may now state another important property of a prediction-error
filter. In theory, a prediction-error filter of order M can whiten any
stationary input process represented by the sequence u(n), u(n —
1),...,u(n — M), provided that the order of the filter, M, is sufficiently
large. For this reason, a prediction-error filter designed to whiten a sta-
tionary input process is called a whitening filter. Basically, prediction relies
on the presence of correlation between adjacent samples of the input
process. The implication of this is that as we increase the order of the
prediction-error filter, we successively reduce the correlation between adjac-
ent samples of the process applied to the filter input, until ultimately the
prediction-error process at the filter output consists of a sequence of
uncorrelated samples, and the whitening of the original process is thereby
accomplished.
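A small Python experiment illustrating the whitening property (the AR input process is a synthetic assumption): the prediction-error filter matched to the process removes essentially all correlation between adjacent output samples.

    import numpy as np

    rng = np.random.default_rng(5)
    N = 50000
    u = np.zeros(N)
    for n in range(2, N):                           # AR(2) input process (illustrative)
        u[n] = 0.75 * u[n - 1] - 0.5 * u[n - 2] + rng.standard_normal()

    a = np.array([1.0, -0.75, 0.5])                 # matching prediction-error filter
    e = np.convolve(u, a)[2:N]                      # output of the whitening filter

    def corr(x, lag):
        return np.mean(x[lag:] * x[:-lag]) / np.mean(x * x)

    print("input correlation at lag 1 :", corr(u, 1))
    print("output correlation at lag 1:", corr(e, 1))   # close to zero -> whitened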
H_AR(z) = 1 / [1 − Σ_{k=1}^{M} h_o(k) z^{-k}]        (3.56)
The prediction-error filter coefficients are related to the predictor coefficients
by Eq. (3.6). Hence, using Eq. (3.6) in (3.56), we get
H_AR(z) = 1 / A_M(z)        (3.57)
This shows that the transfer function of the AR model in Fig. 3.10 equals
the inverse of the transfer function of the prediction-error filter. Accord-
ingly, the AR model of Fig. 3.10 is often referred to as an inverse filter.
Earlier we indicated that a prediction-error filter is an all-zero filter.
From Eq. (3.57) it follows therefore that an AR model or inverse filter is an
all-pole filter in that its transfer function, except for a multiple zero at the
origin, consists only of poles. For the AR model or inverse filter of Fig. 3.10
to be stable, the transfer function H_AR(z) must have all of its poles inside
the unit circle in the z-plane. Equivalently, in view of Eq. (3.57), we may
state that A_M(z), the transfer function of the prediction-error filter, must
have all of its zeros inside the unit circle. In other words, the prediction-
error filter, represented by the transfer function A_M(z), must be minimum-
phase.
This restriction on H_AR(z) or A_M(z) may be derived from statistical
considerations, as illustrated in the following two examples.
Example 3
Consider an AR process of order 1, described by
u(n) = h_o(1) u(n−1) + w(n)        (3.58a)
where h_o(1) is a constant, and {w(n)} is a white noise process of zero mean
and variance σ^2. We wish to find the mean and autocorrelation function of
the process {u(n)}.
We start by rewriting Eq. (3.58a) in the form of a linear first-order
difference equation:
u(n) − h_o(1) u(n−1) = w(n)        (3.58b)
It is well known that, in the classical method of solving linear difference
equations with constant coefficients, the solution consists of the sum of two
parts: the complementary solution and the particular solution. Here, the
complementary solution is the solution of the homogeneous equation
u(n) − h_o(1) u(n−1) = 0

yielding an exponential function of the form C h_o^n(1), where C is a constant.
The particular solution is most conveniently obtained by using the unit-
delay operator z^{-1} to relate the delayed sample u(n−1) to u(n). Specifi-
cally, we may write

u(n−1) = z^{-1}[u(n)]

where z^{-1} plays the role of an operator. Accordingly, we may rewrite Eq.
(3.58b) as

(1 − h_o(1) z^{-1})[u(n)] = w(n)
Moving the operator (1 − h_o(1) z^{-1}) to the right-hand side to operate on
w(n), we have the particular solution

u(n) = [1 / (1 − h_o(1) z^{-1})] [w(n)]
     = Σ_k h_o^k(1) z^{-k} [w(n)]
     = Σ_k h_o^k(1) w(n−k)

Adding the complementary and particular solutions, we may express the
general solution as

u(n) = C h_o^n(1) + Σ_k h_o^k(1) w(n−k)        (3.60)
E[u(n) u(n−l)] = σ^2 h_o^l(1) Σ_{k=0}^{n−1} h_o^{2k}(1)        (3.62)
This is a geometric series with first term equal to σ^2 h_o^l(1), geometric ratio
equal to h_o^2(1), and number of terms equal to n. Hence, using the formula
for the sum of a geometric series, we may express the autocorrelation
function of u(n) as follows:
E[u(n) u(n−l)] = σ^2 h_o^l(1) [1 − h_o^{2n}(1)] / [1 − h_o^2(1)]        (3.64)
[Figure 3.11: autocorrelation function E[u(n)u(n−l)] of the order-1 AR process, plotted against the lag l, for (a) h_o(1) > 0 and (b) h_o(1) < 0.]
E[u(n) u(n−l)] = σ^2 h_o^l(1) / [1 − h_o^2(1)]        (3.65)
The right-hand side of Eq. (3.65) is now a function of l only, and we may
say that the process {u(n)} is asymptotically stationary up to order 2.
Thus the condition for an autoregressive process of order 1, described
by Eq. (3.58), to be asymptotically stationary up to order 2 is that |h_o(1)| < 1.
Remembering that the constant h_o(1) may assume a positive or negative
value, we find that the dependence of the autocorrelation function of Eq.
(3.65) on the lag l may take on either one of the two forms shown in Fig.
3.11. If h_o(1) > 0, the autocorrelation function decays to zero exponentially
as in Fig. 3.11(a). If, on the other hand, h_o(1) < 0, it alternates in sign, as in
Fig. 3.11(b).
It is also of interest to note that if the general solution given in Eq.
(3.60) is to represent an asymptotically stationary process, then the comple-
mentary solution represented by the first term C h_o^n(1) must decay to zero as
n approaches infinity. This shows, once again, that the condition for
asymptotic stationarity is |h_o(1)| < 1. When this condition is satisfied, and
the complementary solution has effectively decayed to zero, we find that the
steady-state behavior of the process {u(n)} is described purely by the
second term of Eq. (3.60). This part of the general solution is therefore
called the stationary solution of Eq. (3.58).
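The following Python sketch simulates the AR process of Eq. (3.58) for an illustrative value of h_o(1) and checks that, after the transient, the measured autocorrelation decays geometrically with the lag, as predicted by Eq. (3.65):

    import numpy as np

    rng = np.random.default_rng(6)
    h1, sigma, N = 0.8, 1.0, 200000
    w = sigma * rng.standard_normal(N)
    u = np.zeros(N)
    for n in range(1, N):
        u[n] = h1 * u[n - 1] + w[n]              # Eq. (3.58a)

    u = u[1000:]                                 # discard the transient (complementary solution)
    for lag in range(4):
        r_meas = np.mean(u[lag:] * u[:len(u) - lag])
        r_theory = sigma**2 * h1**lag / (1.0 - h1**2)    # Eq. (3.65)
        print(f"lag {lag}: measured {r_meas:.3f}   theory {r_theory:.3f}")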
Example 4
Consider next an AR process of order 2, described by

u(n) + a_2(1) u(n−1) + a_2(2) u(n−2) = w(n)        (3.67)

*In this example, we find it convenient to work with a_2(1) and a_2(2) rather than h_o(1)
and h_o(2).
or

(1 − p_1 z^{-1})(1 − p_2 z^{-1})[u(n)] = w(n)        (3.68)

Proceeding as in Example 3, we may expand the operator in partial fractions and so write

u(n) = [1 / ((1 − p_1 z^{-1})(1 − p_2 z^{-1}))] [w(n)]
     = [p_1/(p_1 − p_2)] Σ_k p_1^k z^{-k}[w(n)] − [p_2/(p_1 − p_2)] Σ_k p_2^k z^{-k}[w(n)]
tions also ensure that the autocorrelation function E[u(n) u(n−l)] con-
verges to a finite value as n approaches infinity.
To express the conditions of Eq. (3.71) for asymptotic stationarity in
terms of the coefficients a_2(1) and a_2(2), we consider the following cases:
1. The roots p_1 and p_2 are complex or coincident. This occurs when

   4a_2(2) ≥ a_2^2(1)

   In this case, we have

   or equivalently

   |a_2(1)| < 2
Figure 3.12 Illustrating the conditions for the roots p_1 and p_2 to be complex conjugates.
Also we must have A_2(1) > 0 and A_2(−1) > 0, where A_2(1) and A_2(−1)
are the values of A_2(z) for z = 1 and z = −1, respectively. The require-
ment A_2(1) > 0 yields

1 + a_2(1) + a_2(2) > 0

and the requirement A_2(−1) > 0 yields

1 − a_2(1) + a_2(2) > 0
Figure 3.13 Illustrating the condition for the roots p_1 and p_2 to be real and unequal.
Figure 3.14 (a) Analysis of a stationary process using prediction-error filtering. (b) Synthesis of
an asymptotically stationary process using an all-pole inverse filter.
Thus, the two-filter structures of Fig. 3.14 constitute a matched pair. The
prediction-error filter in part (a) of the figure is minimum-phase, with the
zeros of its transfer function located at exactly the same positions (inside the
unit circle in the z-plane) as the poles of the transfer function of the inverse
filter in part (b). This assures the stability of the inverse filter or, equiva-
lently, the asymptotic stationarity of the AR process generated at the output
of this filter. Note also that the impulse response of the prediction-error
filter has a finite duration, whereas the impulse response of the inverse filter
has infinite duration.
The principles described above provide the basics of linear predictive
coding (LPC) vocoders for the transmission and reception of digitized
speech (see Example 3 of Section 1.4).
f_{M+1}(n) = Σ_{m=0}^{M+1} a_{M+1}(m) u(n−m)
           = Σ_{m=0}^{M} a_M(m) u(n−m) + γ_{M+1} Σ_{m=1}^{M+1} a_M(M−m+1) u(n−m)        (3.76)

where in both terms of the last line we have used the fact that a_M(M+1) is
zero. The first summation term in the right-hand side of Eq. (3.76) is
recognized as the forward prediction error produced by a prediction-error
filter of order M. For the second summation term, we substitute m for
m — 1, and so find that this term is equal to the backward prediction error
produced by a prediction-error filter of order M, but delayed by one sample
period. We may thus simplify Eq. (3.76) as follows
f_{M+1}(n) = f_M(n) + γ_{M+1} b_M(n−1)        (3.77)
Next, we recognize that when a prediction-error filter of order M + 1 is
operated in the backward direction, we have the input-output relation
b_{M+1}(n) = Σ_{m=0}^{M+1} a_{M+1}(M+1−m) u(n−m)        (3.78)
b_{M+1}(n) = Σ_{m=1}^{M+1} a_M(M+1−m) u(n−m) + γ_{M+1} Σ_{m=0}^{M} a_M(m) u(n−m)        (3.80)
where in both terms of the last line we have used the fact that a_M(M+1) is
equal to zero. As before, the first summation term on the right-hand side of
Eq. (3.80) is equal to the backward prediction error produced by a predic-
tion-error filter of order M, but delayed by one time unit. The second
summation term is simply equal to the forward prediction error produced
by a prediction-error filter of order M. Hence, we may simplify Eq. (3.80) as
follows
b_{M+1}(n) = b_M(n−1) + γ_{M+1} f_M(n)        (3.81)
Figure 3.15 (a) Structure of a single-stage lattice predictor. (b) Structure of a multistage lattice
predictor.
The pair of recursive relations in Eqs. (3.77) and (3.81), involving the
forward and backward prediction errors, may be represented as in Fig.
3.15(a).
Note that for the elementary case of M = 0, Eqs. (3.73) and (3.74)
reduce to

f_0(n) = b_0(n) = u(n)
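These order recursions translate directly into code. The Python sketch below runs an M-stage lattice prediction-error filter over an input sequence, using Eqs. (3.77) and (3.81) together with the initialization f_0(n) = b_0(n) = u(n); the reflection coefficients and the input are illustrative assumptions:

    import numpy as np

    def lattice_prediction_errors(u, gammas):
        """Propagate u(n) through the multistage lattice predictor of Fig. 3.15(b)."""
        M = len(gammas)
        b_prev = np.zeros(M + 1)      # b_m(n-1) for m = 0, ..., M (previous time step)
        f_out, b_out = [], []
        for x in u:
            f = np.zeros(M + 1)
            b = np.zeros(M + 1)
            f[0] = b[0] = x                                       # f_0(n) = b_0(n) = u(n)
            for m in range(1, M + 1):
                f[m] = f[m - 1] + gammas[m - 1] * b_prev[m - 1]   # Eq. (3.77)
                b[m] = b_prev[m - 1] + gammas[m - 1] * f[m - 1]   # Eq. (3.81)
            f_out.append(f[M])
            b_out.append(b[M])
            b_prev = b
        return np.array(f_out), np.array(b_out)

    # Illustrative reflection coefficients and input
    gammas = [-0.5, 0.2]
    u = np.random.default_rng(7).standard_normal(1000)
    fM, bM = lattice_prediction_errors(u, gammas)
    print("mean-square forward and backward errors:", np.mean(fM**2), np.mean(bM**2))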
An important property of the lattice predictor of Fig. 3.15(b) is the fact that
the backward prediction errors resulting at the various stages of the model
are orthogonal to each other. That is,
E[b_i(n) b_k(n)] = { P_i,   i = k
                   { 0,     i ≠ k        (3.83)

To show this, we express the backward prediction errors as

b_i(n) = Σ_{m=0}^{i} a_i(i−m) u(n−m)        (3.84)
and
b_k(n) = Σ_{p=0}^{k} a_k(k−p) u(n−p)        (3.85)
E[b_i(n) b_k(n)] = Σ_{m=0}^{i} Σ_{p=0}^{k} a_i(i−m) a_k(k−p) E[u(n−m) u(n−p)]
                 = Σ_{m=0}^{i} Σ_{p=0}^{k} a_i(i−m) a_k(k−p) r(p−m)        (3.86)
where r( p — m) is the autocorrelation function of the predictor input for a
lag of p — m. However, from the augmented normal equations for back-
ward prediction, we have [see Eq. (3.25)]
Σ_{p=0}^{k} a_k(k−p) r(p−m) = { P_k,   m = k
                              { 0,     m = 0, 1, ..., k−1        (3.87)
Therefore, if i = k, we find that

E[b_k(n) b_k(n)] = P_k a_k(0)
                 = P_k        (3.88)
If, on the other hand, i ≤ k − 1, we find that

E[b_i(n) b_k(n)] = 0        (3.89)
Hence, the backward prediction error b_i(n) at stage i of the equivalent
lattice model and the backward prediction error b_k(n) at stage k are
orthogonal for i ≠ k.
The lattice structure of Fig. 3.15(b), in effect, transforms the input time
series u(n), u(n−1), ..., u(n−M) into another time series made up of the
backward prediction errors b_0(n), b_1(n), ..., b_M(n), which are orthogonal
to each other. No loss of information whatsoever is incurred in the course of
this transformation. The implications of this important property of the
lattice structure of Fig. 3.15(b) will be discussed in Chapter 6.
In this section we will show that if the correlation matrix of the sequence of
samples applied to the input of a multistage lattice structure is positive
definite, then all the reflection coefficients of this filter have a magnitude less
than one, and vice versa.
b(n) = [b_0(n), b_1(n), ..., b_M(n)]^T        (3.91)
Define the (M + 1)-by-1 input vector

u(n) = [u(n), u(n−1), ..., u(n−M)]^T        (3.92)
Define the (M + 1)-by-(M + 1) lower triangular transformation matrix

    [ 1          0           0          ...   0
      a_1(1)     1           0          ...   0
      a_2(2)     a_2(1)      1          ...   0
      ...        ...         ...        ...   ...
      a_M(M)     a_M(M−1)    a_M(M−2)   ...   1 ]        (3.93)
S = E[b(n) b^T(n)]        (3.95)
Since the backward prediction errors are orthogonal to each other, we have
[see Eq. (3.83)]
P_1 = P_0(1 − γ_1^2)

Therefore, with P_0 > 0, it is necessary that |γ_1| < 1 for P_1 > 0. Continuing
Figure 3.16 Signal-flow graph of multistage lattice-inverse filter for synthesizing an AR process
of order M.
with the result that the time series produced at its output is only asymptoti-
cally stationary.
We will illustrate the operation of the lattice-inverse filter of Fig. 3.16
with an example.
Example 5
Figure 3.17(a) shows a single-stage lattice-inverse filter. There are two
possible paths in this figure that can contribute to the makeup of the sample
u(n) at the output. We may write

u(n) = w(n) − γ_1 u(n−1)

which is the same as Eq. (3.58b), with a_1(1) = −h_o(1), for describing an
AR process of order one.
Consider next the two-stage lattice-inverse filter of Fig. 3.17(b). In this
case there are four possible paths that can contribute to the makeup of the
sample u(n) at the output. Specifically, we may write
Figure 3.17 Lattice-inverse filters for synthesizing AR processes: (a) process of order one, (b)
process of order two.
a_2(2) = γ_2

and

a_2(1) = γ_1(1 + γ_2)

We may therefore rewrite Eq. (3.103) as follows:

u(n) + a_2(1) u(n−1) + a_2(2) u(n−2) = w(n)
which is identical to Eq. (3.67), describing an AR process of order two.
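A Python sketch of the synthesis structure is given below. It feeds white noise in at stage M and works back down the stages to produce u(n); the reflection-coefficient values are illustrative, and the stage arrangement follows the usual all-pole lattice ordering, which is assumed here to match Fig. 3.16:

    import numpy as np

    def lattice_inverse_synthesize(w, gammas):
        """Synthesize an AR process from white noise using reflection coefficients."""
        M = len(gammas)
        b_prev = np.zeros(M + 1)          # b_m(n-1), m = 0, ..., M
        u_out = np.zeros(len(w))
        for n, wn in enumerate(w):
            f = np.zeros(M + 1)
            b = np.zeros(M + 1)
            f[M] = wn                                              # noise enters at stage M
            for m in range(M, 0, -1):
                f[m - 1] = f[m] - gammas[m - 1] * b_prev[m - 1]    # inverts Eq. (3.77)
                b[m] = b_prev[m - 1] + gammas[m - 1] * f[m - 1]    # Eq. (3.81)
            b[0] = f[0]                                            # b_0(n) = f_0(n) = u(n)
            u_out[n] = f[0]
            b_prev = b
        return u_out

    gammas = [0.5, 0.2]                   # illustrative order-two reflection coefficients
    w = np.random.default_rng(8).standard_normal(20000)
    u = lattice_inverse_synthesize(w, gammas)

    # The output should satisfy u(n) + a_2(1) u(n-1) + a_2(2) u(n-2) ~ w(n)
    a1, a2 = gammas[0] * (1 + gammas[1]), gammas[1]
    resid = u[2:] + a1 * u[1:-1] + a2 * u[:-2]
    print("residual variance (close to 1 for unit-variance noise):", np.var(resid))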
3.12 NOTES
It appears that the first use of the term “linear predictor” was made by
Wiener [1] in his classic book on “Extrapolation, Interpolation, and
Smoothing of Stationary Time Series.” The title of the second chapter of
this book reads: “The Linear Predictor for a Single Time Series”.
Early applications of linear prediction to the analysis and synthesis of
speech were made by Itakura and Saito [2,3], Atal and Schroeder [4], and
Atal [5]. The book by Markel and Gray [6] is devoted to an in-depth
treatment of linear prediction as applied to speech. This book also includes
an extensive list of references on the subject, up to and including 1975. A
detailed tutorial review of the linear prediction problem is given by Makhoul
[7].
The Levinson—Durbin recursion was first derived by Levinson [8] in
1947, and it was independently reformulated by Durbin [9] in 1960—hence
the name.
The idea of a minimum-phase network was originated by Bode [10]. For
a mathematical proof of the minimum-phase property of prediction-error
filters, see Burg [11], Pakula and Kay [12], and Haykin and Kesler [13]. A
filter that is minimum-phase also exhibits a minimum-delay property in the
sense that the energy contained in its unit-sample response is concentrated
as closely as possible at the front end of the response. Equivalently, we may
state that if the sequence a_M(0), a_M(1), ..., a_M(M) denotes the unit-sample
response of a minimum-delay filter, then the coefficient a_M(0), located at
the front end of the response, is the largest one in magnitude. For a
discussion of minimum-delay filters, see Robinson and Treitel [14].
The books by Oppenheim and Schafer [15] and Jury [16] give detailed
expositions of z-transform theory; the first of these two books also presents
a detailed treatment of digital filters.
The idea of a whitening filter was proposed by Bode and Shannon [17]
in order to use linear-system concepts to rederive the Wiener filter theory.
The autoregressive modelling of a random process is discussed in detail
by Box and Jenkins [18], Koopmans [19], and Priestley [20]. These books
also discuss other models, namely the moving-average (MA) and autoregres-
sive moving-average (ARMA) models, for describing random processes.
The lattice filter is credited to Itakura and Saito [2], although many
other investigators (including Burg and Robinson) had also used the idea of
a lattice filter in one form or another. For a discussion of the properties of
lattice filters, see Makhoul [21], Griffiths [22], and Haykin and Kesler [13]. A
formulation of the lattice filter to deal with complex-valued data is given in
Haykin and Kesler [13].
The hardware implementation of a digital filter is ordinarily performed
using fixed-point arithmetic. However, in order to utilize the full dynamic
range of the multipliers used in this form of implementation, it is highly desirable to normalize the signals that propagate through the filter. For the lattice filter, define the normalized prediction errors
f̄_m(n) = f_m(n) / √P_m

and

b̄_m(n) = b_m(n) / √P_m
where f,,(n) and b,,(n) are the normalized forward and backward prediction
errors, respectively, and P,, is the variance of the forward prediction error
f,,(n) or that of the backward prediction error 5,,(n) at the output of stage
m. (Note that for a random variable of zero mean, the mean-square value
and the variance are the same.) We may thus describe the propagation of
signals through stage m in the lattice filter as
we get the following relations for stage m of the normalized lattice filter:
b̄_m(n) = [ b̄_{m−1}(n − 1) + γ_m f̄_{m−1}(n) ] / √(1 − γ_m^2)     (3.105)
where m = 1, 2,..., M, and |γ_m| < 1 for all m. Let

γ_m = cos θ_m
Figure 3.18 Signal-flow graph for stage m in the normalized lattice filter.
H(z) = (A z^{−1} − 1) / (A − z^{−1})

where

A = (T + 1) / (T − 1)

and T is normalized with respect to the sample period of the incoming data. The transfer function H(z) is stable if and only if T > 1, so that A is a positive constant.
The tapped-delay-line and lattice filters represent the most widely used
structures for the realization of prediction-error filters. Ahmad and Youn
REFERENCES
CHAPTER
FOUR
ADAPTIVE TAPPED-DELAY-LINE FILTERS
USING THE GRADIENT APPROACH
y(n) = \sum_{k=1}^{M} h(k, n) u(n − k + 1)     (4.1)
By comparing this output with the desired response d(n), we produce the
error signal
e(n) =d(n) — y(n) (4.2)
The function of the control mechanism in Fig. 4.1 is to utilize the error
signal e(n) for generating corrections to be applied to the set of filter
coefficients {h(k,n)}, k = 1,2,..., M, in such a way that we move one step
closer to the optimum Wiener configuration defined by the normal equa-
tions.
With the coefficients of the tapped-delay-line filter assumed to have the
values h(1,n), h(2,n),...,h(M,n) at time n, we find that the correspond-
ing value of the mean squared error is [see Eq. (2.20)]

ε(n) = P_d − 2 \sum_{k=1}^{M} h(k, n) p(k − 1) + \sum_{k=1}^{M} \sum_{m=1}^{M} h(k, n) h(m, n) r(m − k)     (4.3)
In Eq. (4.3) we recognize the following points:
1. The average power P_d is defined by

   P_d = E[d^2(n)]     (4.4)

   The cross-correlation function p(k − 1) for a lag of (k − 1) is defined by [see Eq. (2.16)]

   p(k − 1) = E[d(n) u(n − k + 1)],     k = 1, 2,..., M     (4.5)

   The autocorrelation function r(m − k) for a lag of (m − k) is defined by [see Eq. (2.18)]

   r(m − k) = E[u(n − k + 1) u(n − m + 1)],     m, k = 1, 2,..., M     (4.6)
   The quantities P_d, p(k − 1), and r(m − k) are the results of ensemble averaging.
2. The filter coefficients h(1, n), h(2, n),..., h(M, n) are treated as con-
stants during the ensemble-averaging operation.
3. The dependence of the mean squared error e(n) on time n is intended to
show that its value depends on the values assigned to the filter coeffi-
cients at time n.
From Eq. (4.3), we observe that the mean squared error ε(n) is a second-order
function of the filter coefficients. Thus, we may visualize the dependence of
e(n) on the filter coefficients as a bowl-shaped surface with a unique
minimum. We refer to this surface as the error performance surface of the
adaptive filter. When the filter operates in a stationary environment, the
error performance surface has a constant shape as well as a constant
orientation. The adaptive process has the task of continually seeking the
bottom or minimum point of this surface, where the filter coefficients assume
their optimum values.
Equation (4.3) defines the value of the mean squared error at time n when
the filter coefficients have the values h(1, n), h(2, n),..., h(M, n). We as-
sume that the point so defined on the multidimensional error performance
surface is some distance away from the minimum point of the surface. We
would like to develop a recursive procedure whereby appropriate correc-
tions are applied to these filter coefficients in such a way that we continually
move closer to the minimum point of the error performance surface after
each iteration. If such a procedure were available to us, then starting from
an arbitrary point on the error performance surface we could move in a
step-by-step fashion toward the minimum point and thereby ultimately
realize the optimum Wiener configuration. The answer to this problem is
provided by an old optimization technique known as the method of steepest
descent.
According to the method of steepest descent we proceed as follows:
1. We begin with a set of initial values for the filter coefficients, which
provides an initial guess as to where the minimum point of the error
performance surface may be located.
2. Using this initial or present guess, we compute the gradient vector, whose
individual elements equal the first derivatives of the mean squared error
e(n) with respect to the filter coefficients.
3. We compute the next guess at the filter coefficients by making a change
in the initial or present guess in a direction opposite to that of the
gradient vector.
4. We go back to step (2) and repeat the procedure.
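The four steps above translate directly into a short iteration on the quadratic error surface of Eq. (4.3). The sketch below is a minimal NumPy illustration; the correlation matrix R, the cross-correlation vector p, and the step-size parameter mu are illustrative values chosen for the sketch, not taken from the text.

import numpy as np

# Illustrative second-order statistics (assumed for this sketch)
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])      # autocorrelation matrix of the tap inputs
p = np.array([0.7, 0.3])        # cross-correlation with the desired response
mu = 0.1                        # step size (must satisfy 0 < mu < 2/lambda_max)

h = np.zeros(2)                      # step 1: initial guess for the coefficients
for _ in range(200):
    grad = -2.0 * p + 2.0 * R @ h    # step 2: gradient of the mean squared error
    h = h - 0.5 * mu * grad          # step 3: move opposite to the gradient
                                     # step 4: repeat

h_opt = np.linalg.solve(R, p)        # Wiener solution of the normal equations, for comparison

Note that the update h − (mu/2)·grad is the same as h(n + 1) = h(n) + μ[p − R h(n)], which matches the step-size convention used in the text.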
Let ∇(n) denote the M-by-1 gradient vector at time n, where M equals
the number of filter coefficients. The kth element of ∇(n), by definition,
equals the first derivative of the mean squared error ε(n) with respect to the
filter coefficient h(k, n). Hence, differentiating both sides of Eq. (4.3) with
respect to h(k, n), we get

∂ε(n)/∂h(k, n) = −2 p(k − 1) + 2 \sum_{m=1}^{M} h(m, n) r(m − k),     k = 1, 2,..., M     (4.7)
From Eqs. (4.1) and (4.2), we may express the desired response as

d(n) = \sum_{m=1}^{M} h(m, n) u(n − m + 1) + e(n)     (4.8)

Hence, using Eq. (4.5), we may write

p(k − 1) = E[d(n) u(n − k + 1)]
         = E[ \sum_{m=1}^{M} h(m, n) u(n − k + 1) u(n − m + 1) ] + E[e(n) u(n − k + 1)]
         = \sum_{m=1}^{M} h(m, n) r(m − k) + E[e(n) u(n − k + 1)]     (4.10)
where we have made use of Eq. (4.6). Accordingly, substituting Eq. (4.10) in
(4.7) and simplifying, we get the desired expression
∂ε(n)/∂h(k, n) = −2 E[e(n) u(n − k + 1)],     k = 1, 2,..., M     (4.11)
Equation (4.11) states that, except for the scaling factor 2, the kth element
of the gradient vector ∇(n) is the negative of the cross-correlation between
the error signal e(n) and the signal u(n — k + 1) at the kth tap input.
At the minimum point of the error performance surface, all the M elements
of the gradient vector ∇(n) are simultaneously zero. Accordingly, for
minimum mean squared error, the cross-correlation between the error signal
and each tap input of the filter is zero. This merely restates the principle of
orthogonality discussed in Chapter 2.
We may rewrite Eq. (4.11) in matrix form as follows:

∇(n) = [ ∂ε(n)/∂h(1, n)   ∂ε(n)/∂h(2, n)   ···   ∂ε(n)/∂h(M, n) ]^T
     = [ −2E[e(n)u(n)]   −2E[e(n)u(n − 1)]   ···   −2E[e(n)u(n − M + 1)] ]^T

Taking out the common factor −2, the expectation operator, and the error signal e(n), we may thus write

∇(n) = −2 E[e(n) u(n)]     (4.12)

where u(n) is the M-by-1 tap-input vector

u(n) = [ u(n)   u(n − 1)   ···   u(n − M + 1) ]^T     (4.13)
Equation (4.12) states that, except for the scaling factor 2, the gradient vector
∇(n) is the negative of the cross-correlation vector between the error signal
e(n) and the tap-input vector u(n).
We are now ready to formulate the steepest-descent algorithm for
updating the filter coefficients. Define the M-by-1 coefficient vector of the
filter at time n as
h(n) = [ h(1, n)   h(2, n)   ···   h(M, n) ]^T     (4.14)
With these rules in mind, let us represent Eqs. (4.16) and (4.17) in the
form of the multidimensional signal-flow graph. For this representation, we
first eliminate the scalar-valued error signal e(n) by substituting Eq. (4.17)
in (4.16) and so write
h(n + 1) = h(n) + μ E[ u(n) ( d(n) − u^T(n) h(n) ) ]
v(n + 1) = Q^T c(n + 1)
Also, using the property of the unitary matrix, namely, the fact that
QQ^T = I, we may write

Q^T R c(n) = Q^T R I c(n)
           = Q^T R Q Q^T c(n)
           = Λ v(n)
Accordingly, we may rewrite Eq. (4.25) in the form

v(n + 1) = (I − μΛ) v(n)     (4.27)
This is the desired recursion for analyzing the stability of the steepest-
descent algorithm.
Putting n = 0 in Eq. (4.26), and using the definition of Eq. (4.21), we
find that the initial value of the transformed coefficient-error vector equals

v(0) = Q^T [ h(0) − h_0 ]
Figure 4.3 Signal-flow graph representation of the kth natural mode of the steepest-descent
algorithm.
are uncoupled from each other, we have a new representation for the
steepest-descent algorithm that is much simpler than the multidimensional
signal-flow graph of Fig. 4.2. This simplification is the result of the unitary
similarity transformation applied to the correlation matrix R and the
corresponding transformation applied to the coefficient vector of the filter.
The solution of the homogeneous difference equation (4.30) is simply

v_k(n) = (1 − μλ_k)^n v_k(0),     k = 1, 2,..., M
Figure 4.4 Illustrating the transient behavior of the kth natural mode of the steepest-descent
algorithm.
and only if

0 < μ < 2/λ_max     (4.34)

For the kth natural mode we may define a time constant τ_k through the relation

1 − μλ_k = exp(−1/τ_k)     (4.35)
Hence, from Eqs. (4.32) and (4.35), we find that the kth time constant can
be expressed in terms of the step-size parameter μ and the kth eigenvalue as
follows:

τ_k = −1 / ln(1 − μλ_k)     (4.36)

The time constant τ_k defines the time required for the amplitude of the kth
natural mode v_k(n) to decay to 1/e of its initial value v_k(0), where e is the
base of the natural logarithm.
For the special case of slow adaptation, for which the step-size parameter μ is small, we may use the following approximation for the logarithm in
the denominator of Eq. (4.36):

ln(1 − μλ_k) ≈ −μλ_k,     μλ_k ≪ 1

Correspondingly, we may approximate the time constant τ_k of Eq. (4.36) as

τ_k ≈ 1/(μλ_k)     (4.37)
Using Eq. (4.31), we may now formulate the solution for the original
coefficient vector h(n). We premultiply both sides of Eq. (4.26) by Q,
obtaining

Q v(n) = Q Q^T c(n)
       = c(n)

where we have used the relation QQ^T = I. Next, using Eq. (4.21) to
eliminate c(n), and solving for h(n), we obtain

h(n) = h_0 + Q v(n)     (4.38)
The expression for the coefficient vector h(n) may also be expressed in the
form

h(n) = h_0 + \sum_{k=1}^{M} v_k(n) q_k     (4.39)
h(i, n) = h_0(i) + \sum_{k=1}^{M} v_k(0) (1 − μλ_k)^n q_{ki},     i = 1, 2,..., M     (4.40)

where h_0(i) is the optimum value of the ith filter coefficient, and q_{ki} is the
ith element of the eigenvector q_k.
Equation (4.40) shows that each coefficient of the filter in the steepest-
descent algorithm converges as a weighted sum of exponentials of the
form (1 − μλ_k)^n. The time τ_k required for each term to reach 1/e of its
initial value is given by Eq. (4.36). However, the overall time constant, τ_a,
defined as the time required for the summation term in Eq. (4.40) to decay
to 1/e of its initial value, cannot be expressed in a simple closed form. We
may, however, bound τ_a as follows. The slowest rate of convergence is
attained when q_{ki} v_k(0) is zero for all k except for the one corresponding to
the minimum eigenvalue λ_min. Then the upper bound on τ_a is defined by
−1/ln(1 − μλ_min). The fastest rate of convergence is attained when all the
q_{ki} v_k(0) are zero except for the one corresponding to the maximum eigen-
value λ_max. Then the lower bound on τ_a is defined by −1/ln(1 − μλ_max).
Accordingly, the overall time constant τ_a for any coefficient of the tapped-
delay-line filter is bounded as follows:

−1 / ln(1 − μλ_max) ≤ τ_a ≤ −1 / ln(1 − μλ_min)     (4.41)
This shows that when the eigenvalues of the correlation matrix R are widely
spread, the settling time of the steepest-descent algorithm is limited by the
smallest eigenvalues or the slowest modes.
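This effect can be checked numerically. The short sketch below (the matrix R and the step size mu are illustrative values assumed for the sketch) computes the eigenvalues of a 2-by-2 correlation matrix, the per-mode time constants of Eq. (4.36), and the bounds of Eq. (4.41).

import numpy as np

R = np.array([[1.0, 0.9],
              [0.9, 1.0]])            # strongly correlated inputs: widely spread eigenvalues
mu = 0.2                              # illustrative step size, well below 2/lambda_max

lam = np.linalg.eigvalsh(R)           # eigenvalues lambda_k of R
tau = -1.0 / np.log(1.0 - mu * lam)   # per-mode time constants, Eq. (4.36)

tau_lower = -1.0 / np.log(1.0 - mu * lam.max())   # fastest possible overall decay
tau_upper = -1.0 / np.log(1.0 - mu * lam.min())   # slowest possible overall decay, Eq. (4.41)
print(lam, tau, tau_lower, tau_upper)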
The curve obtained by plotting the mean squared error ε(n) versus the
number of iterations, n, is called a learning curve. From Eq. (4.43) we see
that the learning curve of the steepest-descent algorithm consists of a sum
of exponentials, each of which corresponds to a natural mode of the
algorithm. The number of natural modes, in general, equals the number of
filter coefficients, M. In going from the initial value ε(0) to the final value
ε_min, the exponential decay for the kth mode has a time constant equal to

τ_{k, mse} = −1 / [ 2 ln(1 − μλ_k) ]     (4.44)
Example 1
Consider a tap-input vector u(n) that consists of two uncorrelated
samples. This assumption is satisfied when the sample period is equal to or
greater than the decorrelation time of the input process. The decorrelation
time is defined as the lag for which the autocorrelation function of the
process decreases to a small fraction (e.g., one percent) of the mean-square
value of the process. The vector u(n) is thus assumed to have a mean of zero
and a correlation matrix

R = [ σ^2   0
      0     σ^2 ]

where σ^2 is the variance of each sample. In this case, the two eigenvalues of
R are equal:

λ_1 = λ_2 = σ^2
At time n, the filter is characterized by two transformed coefficient
errors v_1(n) and v_2(n). For a constant value of the mean squared error
ε(n), the locus of possible values of v_1(n) and v_2(n) consists of a circle with
center at the origin and a radius equal to the square root of [ε(n) − ε_min]/σ^2.
Figure 4.5 shows a set of such concentric circular loci, corresponding to
different values of e(n). This figure also includes the trajectory obtained by
joining the points represented by the values of the transformed coefficient-
error vectors: v(0), v(1), v(2),..., v(∞), where v(0) is the initial value, and
v(1), v(2),..., v(∞) are the values resulting from the application of the
steepest-descent algorithm. The geometry shown in Fig. 4.5 assumes the
following values:
v_1(0) = 2
v_2(0) = −4
We see that the trajectory, irrespective of the value of μ, consists of a
straight line that is normal to the loci for constant values of ε(n). This
trajectory represents the shortest possible path between the points v(0) and
v(∞). We thus see that when the eigenvalues of the correlation matrix R are
equal, the steepest-descent algorithm attains the fastest rate of convergence
possible.
Figure 4.6 shows the learning curve of the steepest-descent algorithm
obtained by using Eq. (4.43) to plot [ε(n) − ε_min]/σ^2 versus the number of
iterations n for different values of the step-size parameter. In this case
we see that the learning curve consists of a single exponential; the rate at
which it decays increases with increasing μ.
Figure 4.5 The trajectory of the transformed coefficient-error vectors v(0), v(1),..., v(∞) obtained by the steepest-descent algorithm for the special case of two equal eigenvalues.
Example 2
Consider next the case of a zero-mean tap-input vector u(n) that
consists of two correlated samples. The correlation matrix of this vector is
assumed to equal
R = [ σ^2    ρσ^2
      ρσ^2   σ^2  ]
Figure 4.6 The learning curve of the steepest-descent algorithm for the case of two equal
eigenvalues.
That is,

λ_1 = (1 + ρ)σ^2

and

λ_2 = (1 − ρ)σ^2

Hence, the eigenvalue spread equals

λ_max/λ_min = λ_1/λ_2 = (1 + ρ)/(1 − ρ)
This shows that as the adjacent samples of the filter input become highly
correlated, the correlation coefficient ρ approaches unity, with the result that
the eigenvalue spread increases.
For ρ = 0.5, the two eigenvalues have the values λ_1 = 1.5σ^2 and λ_2 = 0.5σ^2, so that the eigenvalue spread equals 3.
In this case we find that for a constant value of the mean squared error
ε(n) the locus of possible values of v_1(n) and v_2(n) consists of an ellipse
Figure 4.7 The trajectory of the transformed coefficient-error vectors v(0), v(1),..., v(∞) obtained by the steepest-descent algorithm for the case of two unequal eigenvalues [v_1(0) = 2, v_2(0) = −4].
with a minor axis equal to the square root of [ε(n) − ε_min]/λ_1 and a major
axis equal to the square root of [ε(n) − ε_min]/λ_2. Figure 4.7 shows a set of
such ellipsoidal loci corresponding to different values of ε(n) and the
following values:

v_1(0) = 2
v_2(0) = −4
ρ = 0.5
This figure also includes the trajectory obtained by joining the points
represented by v(0), v(1), v(2),..., v(∞). Here again the trajectory is normal
to the loci for different values of ε(n). However, we now find that the
trajectory is curved, leaning towards the v_1(n)-axis. Accordingly, the rate of
convergence of the algorithm is slower than that of Example 1. Also, it is
dominated by the eigenvalue λ_2, the smaller one of the two.
Figure 4.8 shows the learning curve of the steepest-descent algorithm
obtained by using Eq. (4.43) to plot [ε(n) − ε_min]/σ^2 versus the number of
iterations n for two different values of the step-size parameter μ. In this case
we see that, since the correlation matrix has two unequal eigenvalues, the
learning curve consists of the sum of two exponentials with different time
Figure 4.8 The learning curve of the steepest-descent algorithm for the case of two unequal
eigenvalues.
constants. Here again, the overall rate at which the learning curve decays is limited by the smaller of the two eigenvalues.
Equation (4.47) states that the updated estimate of the coefficient vector is
obtained by incrementing the old estimate of the coefficient vector by an
amount proportional to the product of the input vector and the error signal.
This equation constitutes the adaptive process.
The error signal itself is defined by Eq. (4.17), reproduced here for
convenience:
e(n) = d(n) — u'(n)h(n) (4.48)
This equation constitutes the filtering process.
Equations (4.47) and (4.48) completely describe the LMS algorithm.
Figure 4.9 shows a multidimensional signal-flow graph representation of the
LMS algorithm, based on these two equations. In this figure we have also
included a branch describing the fact that h(n) may be obtained by
applying the matrix operator z^{−1}I to h(n + 1).
As with the steepest-descent algorithm, we initiate the LMS algorithm
by using an arbitrary value h(0) for the coefficient vector at time n = 0.
Here again, it is customary to set all the coefficients of the filter equal
initially to zero, so that h(0) equals the null vector.
With the initial conditions so determined, we then proceed as follows:
1. Given the following values at time n, the old estimate h(n) of the
coefficient vector of the filter, the tap-input vector u(n), and the desired
response d(n), compute the error signal

e(n) = d(n) − u^T(n) h(n)

where u^T(n) is the transpose of u(n).
2. Compute the updated estimate h(n + 1) of the coefficient vector of the
filter by using the recursion
h(n + 1) = h(n) + μ e(n) u(n)

where μ is the step-size parameter.
3. Increment the time index n by one, go back to step 1, and repeat the
procedure until a steady state is reached.
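As a concrete illustration of steps 1 to 3, a minimal LMS loop in NumPy might look as follows. The data model (an unknown four-tap system observed in low-level noise) and the step size are assumptions made only for this sketch.

import numpy as np

rng = np.random.default_rng(1)
M, N, mu = 4, 5000, 0.01
h_true = rng.standard_normal(M)                  # unknown system (assumed, for the demo)

u = rng.standard_normal(N)                       # tap-input samples u(n)
d = np.convolve(u, h_true)[:N] + 0.01 * rng.standard_normal(N)   # desired response d(n)

h = np.zeros(M)                                  # h(0) = null vector
for n in range(M - 1, N):
    u_vec = u[n - M + 1:n + 1][::-1]             # tap-input vector [u(n), ..., u(n-M+1)]
    e = d[n] - u_vec @ h                         # step 1: error signal e(n)
    h = h + mu * e * u_vec                       # step 2: coefficient update
                                                 # step 3: advance n and repeat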
At first sight it may seem that, because the instantaneous estimate ∇̂(n) of
the gradient vector has a large variance, the LMS algorithm is incapable of
good performance. However, we have to remember that the LMS algorithm
is recursive, effectively averaging out this coarse estimate during the course
of the adaptive process.
Although the initial value h(0) of the coefficient vector is usually a known
constant, the application of the LMS algorithm results in the propagation of
randomness into the filter coefficients. Accordingly, we have to treat the
coefficient vector h(n) as nonstationary. To simplify the statistical analysis
of the LMS algorithm it is customary to assume that the time between
successive iterations of the algorithm is sufficiently long for the following
two conditions to hold:
1. Each sample vector u(n) of the input signal is assumed to be uncorrelated with all previous sample vectors u(k) for k = 0, 1,..., n − 1.
2. Each sample vector u(n) of the input signal is uncorrelated with all previous samples of the desired response d(k) for k = 0, 1,..., n − 1.
Then from Eqs. (4.47) and (4.48), we observe that the coefficient vector
h(n + 1) at time n + 1 depends only on three inputs:
1. The previous sample vectors of the input signal, namely, u(n), u(n − 1),..., u(0).
2. The previous samples of the desired response, namely, d(n), d(n − 1),..., d(0).
3. The initial value h(0) of the coefficient vector.
Accordingly, in view of the assumptions made above, we find that the
coefficient vector h(n + 1) is independent of both u(n + 1) and d(n + 1).
There are many practical problems for which the tap-input vector and
the desired response do not satisfy the above assumptions. Nevertheless,
experience with the LMS algorithm has shown that sufficient information
about the structure of the adaptive process is retained for the results of the
analysis based on these assumptions to serve as reliable design guidelines
even for some problems having dependent data samples.
To proceed with the analysis, we eliminate the error signal e(n) by
substituting Eq. (4.48) in (4.47), and so write

h(n + 1) = [ I − μ u(n) u^T(n) ] h(n) + μ u(n) d(n)     (4.49)

where I is the identity matrix. Next, using Eq. (4.21) to eliminate h(n) from
the right-hand side of Eq. (4.49), we get
p = E[u(n) d(n)]

and

R = E[u(n) u^T(n)]
However, from the matrix form of the normal equations, we have
R h_0 = p
Therefore, the second term on the right-hand side of Eq. (4.51) is zero, and
so we may simplify this equation as follows:
E[c(n + 1)] = (I − μR) E[c(n)]     (4.52)
Comparing Eq. (4.52) with (4.23), we see that they are of exactly the
same mathematical form. That is, the average coefficient-error vector E[c(n)]
in the LMS algorithm has the same mathematical role as the coefficient-
error vector c(n) in the steepest-descent algorithm. From our study of the
steepest-descent algorithm in Section 4.4, we recall that it converges pro-
vided that Eq. (4.34) is satisfied. Correspondingly, the LMS algorithm
converges in the mean, that is, the average coefficient-error vector E[c(n)]
approaches zero as n approaches infinity, provided that the step-size parameter μ satisfies the condition

0 < μ < 2/λ_max     (4.53)

where λ_max is the largest eigenvalue of the correlation matrix R. Thus when
this condition is satisfied, the average value of the coefficient vector h(n)
approaches the optimum Wiener solution h_0 as the number of iterations, n,
approaches infinity.
Also, as with the steepest-descent algorithm, we find that when the
eigenvalues of the correlation matrix R are widely spread, the time taken by
the average coefficient vector E[h(n)] to converge to the optimum value h_0
is primarily limited by the smallest eigenvalues.
Ideally, the minimum mean squared error ε_min is realized when the coefficient vector h(n) of the tapped-delay-line filter approaches the optimum
value h_0, defined by the matrix form of the normal equations. Indeed, as
shown in Section 4.5, the steepest-descent algorithm does realize this idealized condition as the number of iterations, n, approaches infinity. The
steepest-descent algorithm has the capability to do this, because it uses
exact measurements of the gradient vector at each iteration of the algorithm. On the other hand, the LMS algorithm relies on a noisy estimate for
the gradient vector, with the result that the coefficient vector h(n) of the
filter only approaches the optimum value h_0 after a large number of
iterations and then executes small fluctuations about h_0. Consequently, use
of the LMS algorithm, after a large number of iterations, results in a mean
squared error ε(∞) that is greater than the minimum mean squared error
ε_min. The amount by which the actual value of ε(∞) is greater than ε_min is
called the excess mean squared error.
There is another basic difference between the steepest-descent algorithm
and the LMS algorithm. In Section 4.5 we showed that the steepest-descent
algorithm has a well-defined learning curve, obtained by plotting the mean
squared error versus the number of iterations. For this algorithm the
learning curve consists of the sum of decaying exponentials, the number of
which equals (in general) the number of tap coefficients. On the other hand,
in individual applications of the LMS algorithm we find that the learning
curve consists of noisy, decaying exponentials, as illustrated in Fig. 4.10(a).
The amplitude of the noise usually becomes smaller as the step-size parameter μ is reduced.
Figure 4.10 (a) Individual learning curve. (b) Ensemble-averaged learning curve. (Both curves plot mean squared error versus number of iterations; the ensemble-averaged curve settles at ε_min plus the average excess mean squared error.)
tr[R] = \sum_{k=1}^{M} λ_k
By definition, the trace of a square matrix equals the sum of its diagonal
elements:

tr[R] = E[u^2(n)] + E[u^2(n − 1)] + ··· + E[u^2(n − M + 1)]
      = total input power
Accordingly, we have
total input power = \sum_{k=1}^{M} λ_k     (4.55)
We may therefore restate the stability condition of Eq. (4.54) as follows:

0 < μ < 2 / (total input power)     (4.56)
On the other hand, the necessary and sufficient condition for the LMS
algorithm to be convergent in the mean, that is, for E[h(n)] to be
convergent, is (see Section 4.7)

0 < μ < 2/λ_max     (4.57)

The excess mean squared error equals

ε_excess = ε(∞) − ε_min     (4.58)
Eliminating μλ_k between Eqs. (4.61) and (4.62), we thus get the following formula for the misadjustment of the LMS algorithm in terms of the
number of filter coefficients and the average time constant of the adaptive process:

𝓜 = M / [ 4 (τ_mse)_av ]     (4.63)
This formula shows that: (1) the misadjustment increases linearly with
the number of tap coefficients, and (2) the misadjustment may be made
arbitrarily small by using a long adaptive time constant, which is in turn
realized by using a small step-size parameter μ.
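As a purely illustrative numerical example (the figures are assumed, not taken from the text): with M = 11 tap coefficients and an average time constant (τ_mse)_av of 500 iterations, Eq. (4.63) gives a misadjustment of 𝓜 = 11/(4 × 500) = 0.0055, or roughly 0.6 percent; halving the step-size parameter would double the average time constant and halve the misadjustment.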
Figure 4.11 Illustrating the choice of optimum step-size parameter for operation of the LMS algorithm in a nonstationary environment: the total misadjustment is the sum of the misadjustment due to coefficient-vector noise and the misadjustment due to coefficient-vector lag.
4.10 NOTES
Theory
The method of steepest descent is an old optimization technique. For a
discussion of the method, see Murray [1].
The least-mean-square (LMS) algorithm is also referred to in the
literature as the stochastic gradient algorithm. It was originally developed by
Widrow and Hoff [2] in 1960 in the study of adaptive switching circuits. In
[3,4], Widrow presents a detailed analysis of the steepest-descent algorithm
and its heuristic relationship to the LMS algorithm. Sharpe and Nolte [5]
present another approach for deriving the LMS algorithm; they start with
the solution to the normal equations in matrix form, that is,
h_0 = R^{−1} p
and they use a finite summation to approximate the inverse of the correla-
tion matrix.
The LMS algorithm, as described in Eqs. (4.47) and (4.48), is intended
for use with real-valued data. Widrow et al. [6] present the complex LMS
algorithm for dealing with complex-valued data. It has the following form:
h(n + 1) = h(n) + μ u*(n) e(n)
where
e(n) = d(n) − u^T(n) h(n)
and the asterisk denotes complex conjugation.
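In NumPy terms, the complex LMS update quoted above amounts to a one-line change relative to the real-valued case; the function and variable names below are ours, chosen for the sketch only.

import numpy as np

def complex_lms_step(h, u_vec, d_n, mu):
    """One iteration of the complex LMS update (sketch)."""
    e = d_n - u_vec @ h                       # e(n) = d(n) - u^T(n) h(n)
    return h + mu * np.conj(u_vec) * e, e     # h(n+1) = h(n) + mu u*(n) e(n)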
A detailed mathematical analysis of the convergence behavior of the
LMS algorithm in a stationary environment is presented by Ungerboeck [7]
and Widrow et al. [8]. In both of these papers it is assumed that (1) the time
LMS algorithm differs from the conventional form of the LMS algorithm in
that the step-size parameter μ is replaced by α/‖u(n)‖^2. Given that u(n) has
u(n), u(n − 1),..., u(n − M + 1) for its elements, we may express the
squared norm of the vector u(n) as

‖u(n)‖^2 = u^T(n) u(n)
         = u^2(n) + u^2(n − 1) + ··· + u^2(n − M + 1)

The normalization with respect to ‖u(n)‖^2 is used for mathematical conveni-
ence. Also, some implementations of adaptive filters do actually use this
normalization, as will be mentioned later. Weiss and Mitra [18, 19], derive a
variety of theoretical results for the normalized LMS algorithm. These
results pertain to the conditions for convergence, rates of convergence, and
the effects of errors due to digital implementation of the algorithm. Hsia [20]
presents a unified treatment of the convergence for both the normalized
LMS algorithm and the conventional form of the LMS algorithm. If we
assume that the random tap-input vectors, u(”), u(m — 1),..., u(n — M+
1), are statistically independent, and if the elements of u(7), denoted by u(/),
are independent and identically distributed (iid) with
L#J
and
E[u(i)] =0
then the necessary and sufficient condition for the normalized LMS algorithm to be convergent in mean square is (Hsia [20])

0 < α < 2
Hsia also shows that, under this set of conditions, the normalized LMS
algorithm converges faster than the conventional form of the LMS algo-
rithm, a fact that has been noticed by many investigators in computer
simulations, but never theoretically proven.
From Eq. (4.65) it is apparent that the normalized LMS algorithm alters
the magnitude of the correction term without change in its direction.
Accordingly, it bypasses the problem of noise amplification that is experienced in the LMS algorithm when u(n) is large. However, in so doing it
introduces a problem of its own, which is experienced for small u(n). This
problem may be overcome by using the alternate form of the normalized
LMS algorithm (Bitmead and Anderson [17]):

h(n + 1) = h(n) + [ α / (β + ‖u(n)‖^2) ] u(n) e(n)     (4.67)
where β is another positive constant, and the error signal e(n) is defined in
the same way as before. Bitmead and Anderson present an analysis of the
convergence properties of this latter form of the normalized LMS algorithm
alongside the conventional form of the LMS algorithm. Note that by
putting β equal to zero in Eq. (4.67), we get the first form of the normalized
LMS algorithm.
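A minimal sketch of one iteration of the recursion of Eq. (4.67) is given below; the constants alpha and beta are illustrative defaults chosen for the sketch, and setting beta to zero recovers the first form of the normalized LMS algorithm.

import numpy as np

def nlms_step(h, u_vec, d_n, alpha=0.5, beta=1e-6):
    """One iteration of the normalized LMS recursion, Eq. (4.67) (sketch)."""
    e = d_n - u_vec @ h                   # error signal e(n)
    norm_sq = u_vec @ u_vec               # squared norm ||u(n)||^2
    return h + (alpha / (beta + norm_sq)) * u_vec * e, e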
Implementations
The methods of implementing adaptive filters may be divided into two
broad categories: analog and digital.
The analog approach is primarily based on the use of charge-coupled
device (CCD) technology or switched-capacitor technology. The basic circuit
realization of the CCD is a row of field-effect transistors with drains and
sources connected in series, and the drains capacitively coupled to the gates.
The set of adjustable coefficients are stored in digital memory locations, and
the multiplications of the analog sample values by the digital coefficients
take place in analog fashion. This approach has significant potential in
applications where the sampling rate of the incoming data is too high for
digital implementation. Mavor et al. [21] review the operational features and
performance of a fully integrated programmable tapped-delay-line filter
using monolithic CCD technology. White and Mack [22] describe a 16-
coefficient adaptive filter, and Cowan and Mavor [23] describe a 256-coeffi-
cient adaptive filter, both based on the monolithic CCD technology. Also,
both implementations were based on the clipped LMS algorithm (Moschner
[24]).
In the clipped version of the LMS algorithm, the tap-input vector u(n)
in the correction term of the update recursion for the coefficient vector is
replaced by sgn[u(n)]:

h(n + 1) = h(n) + μ e(n) sgn[u(n)]

where, as before, μ is the step-size parameter and e(n) is the error signal.
To explain the meaning of the clipped tap-input vector sgn[u(n)], let u(i)
denote the ith element of the vector u(n). The ith element of sgn[u(n)] is
written mathematically as

sgn[u(i)] = { +1   if u(i) > 0
            { −1   if u(i) < 0
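A minimal sketch of one clipped-LMS update along these lines is given below; the step size is an illustrative value, and note that NumPy's sign function returns 0 when an element is exactly zero, a negligible departure from the definition above.

import numpy as np

def clipped_lms_step(h, u_vec, d_n, mu=0.005):
    """One iteration of the clipped (sign) LMS update (sketch)."""
    e = d_n - u_vec @ h                      # output and error use the unclipped u(n)
    return h + mu * e * np.sign(u_vec), e    # correction term uses sgn[u(n)]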
This clipping action is clearly a nonlinear operation. The purpose of the
clipping is to simplify the implementation of the LMS algorithm without
seriously affecting its performance. In the clipped LMS algorithm, the filter
output remains equal to u^T(n)h(n), so that the error signal e(n) is computed
in the same way as in the conventional form of the LMS algorithm.
Moschner also shows that if, for the clipped LMS algorithm, we have (using
our notation)
The set of adjustable coefficients are also stored in shift registers. Logic
circuits are used to perform the required digital arithmetic (e.g., multiply
and accumulate). In this approach, the circuitry may be hard-wired for the
sole purpose of performing adaptive filtering. Alternatively, it may be
implemented in programmable form on a microprocessor. The use of a
microprocessor also offers the possibility of integrating the adaptive filter
with other signal-processing operations, which can be attractive in some
applications. Soderstrand and Vigil [26] and Soderstrand et al. [27] discuss
the microprocessor implementation of an adaptive filter using the LMS
algorithm. Jenkins [28] discusses the use of a residue-number architecture
for implementing a microprocessor-based LMS algorithm. The use of this
architecture provides a structure for general stored-table multiplication,
distributed processing by means of multiple microprocessors, and a poten-
tial fault-tolerant capability. Lawrence and Tewksbury [29] discuss the
multiprocessor architectures and implementations of adaptive filters using
the LMS algorithm and other algorithms. A multiprocessor refers to an
array of interconnected processors where the emphasis is on memory speed
and size.
Clark et al. [30] discuss the use of block processing techniques for
implementing adaptive digital filters. By considering a performance criterion
based on the minimization of the block mean squared error (BMSE), a
gradient estimate is derived as the correlation (over a block of data) between
the error signals and the input signal. This gradient estimate leads to a
coefficient-adaptation algorithm that allows for block implementation with
either parallel processors or serial processors. Thus a block adaptive filter
adjusts the coefficient vector once per block of data. The conventional form
of the LMS algorithm may be viewed as a special case of the block adaptive
filter with a block length of one. The analysis of convergence properties and
computational complexity of the block adaptive filter, presented by Clark et
al. [30], shows that this filter permits fast implementation while maintaining
a performance equivalent to that of the conventional LMS algorithm.
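A rough sketch of the block-adaptation idea just described is given below: the gradient estimate is the correlation of the error with the tap inputs accumulated over a block of L samples, and the coefficients are updated once per block. The block length, step size, per-block scaling, and data layout are choices made for this sketch, not specifications from Clark et al.

import numpy as np

def block_lms(u, d, M=8, L=32, mu=0.01):
    """Block LMS sketch: one coefficient update per block of L samples."""
    h = np.zeros(M)
    for start in range(M - 1, len(u) - L, L):
        grad_est = np.zeros(M)
        for n in range(start, start + L):
            u_vec = u[n - M + 1:n + 1][::-1]   # tap-input vector at time n
            e = d[n] - u_vec @ h               # error with coefficients held fixed
            grad_est += e * u_vec              # accumulate error/input correlation
        h = h + (mu / L) * grad_est            # single update for the whole block
    return h

With a block length of one, the loop reduces to the conventional LMS recursion.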
A basic issue encountered in the digital implementation of an adaptive
filter is that of roundoff errors due to the use of finite-precision arithmetic.
Caraiscos and Liu [31] present a mathematical roundoff-error analysis of the
conventional form of the LMS algorithm, supported by computer simula-
tion. In such an implementation the steady-state output error consists of
three terms: (1) the error due to quantization of the input data, (2) the error
due to truncating the arithmetic operations in calculating the filter’s output;
and (3) the error due to the deviation of the filter’s coefficients from the
values they assume when infinite-precision arithmetic is used. Caraiscos and
Liu discuss these effects for both fixed-point arithmetic and floating-point
arithmetic. They report that the quantization error of the filter coefficients
results in an output quantization error whose mean-square value is, ap-
proximately, inversely proportional to the step-size parameter μ. In particu-
lar, the use of a small μ for the purpose of reducing the excess mean
squared error may result in a considerable quantization error. The excess
mean squared error is found to be larger than the quantization error, as long
as the value chosen for μ allows the algorithm to converge completely. It is
suggested that one way of combatting the quantization error is to use more
bits for the filter coefficients than for the input data.
Applications
Gersho [32] describes the application of the LMS algorithm to the adaptive
equalization of a highly dispersive communication channel (e.g., a voice-grade
telephone channel) for data transmission. Qureshi [33] presents a tutorial
review of adaptive equalization with emphasis on the LMS algorithm.
Nowadays, state-of-the-art adaptive equalizers for data transmission over a
telephone channel are digitally implemented. A major issue in the design of
an adaptive digital equalizer is the determination of the minimum number
of bits required to represent the adjustable equalizer coefficients, as well as
all the internal signal levels of the equalizer. Gitlin et al. [34] consider the
effect of digital implementation of an adaptive equalizer using the LMS
algorithm. They show that a digitally implemented LMS algorithm stops
adapting whenever the correction term in the update recursion for any
coefficient of the equalizer is smaller in magnitude than the least significant
digit to within which the coefficient has been quantized. In a subsequent
paper, Gitlin and Weinstein [35] develop a criterion for determining the
number of bits required to represent the coefficients of an adaptive digital
equalizer so that the mean squared error at the equalizer output is at an
acceptable level.
Widrow et al. [36] discuss the application of the LMS algorithm to the
adaptive line enhancer (ALE), a device that may be used to detect and track
narrow-band signals in wide-band noise. Zeidler et al. [37] evaluate the
steady-state behavior of the ALE for a stationary input consisting of
multiple sinusoids in additive white noise. Rickard and Zeidler [38] analyze
the second-order statistics of the ALE output in steady-state operation, for a
stationary input consisting of weak narrow-band signals in additive white
Gaussian noise. Treichler [39] uses an eigenvalue—eigenvector analysis of the
expected ALE impulse response to describe both the transient and conver-
gence behavior of the ALE. Nehorai and Malah [40] derive an improved
estimate of the misadjustment and a tight stability constraint for the ALE.
Dentino et al. [41] evaluate the performance of an ALE-augmented square-
law detector for a stationary input consisting of a narrow-band signal in
additive white Gaussian noise, and compare it with a conventional square-
law detector. All these papers on the ALE use the conventional form of the
LMS algorithm for adaptation.
The ALE using the LMS algorithm may develop some undesirable
long-term characteristics. Ahmed et al. [42] have examined this problem
both experimentally and theoretically. The /ong-term instability problem may
be summarized as follows. The adaptive predictor in the ALE first adapts to
the high-level components contained in the input, decorrelating the input as
much as possible within its limited capability, as determined by the number
of adjustable coefficients used. Then it adapts to low-level components at
other frequencies. Thus the ALE evolves until its amplitude response is near
unity for all the frequency components contained in the input, regardless of
their amplitude. The result is that, after continuous operation for a long
period of time, the ALE takes on an “all-pass” mode of operation, giving
the overall adaptive predictor the appearance of a “no-pass” filter. Ahmed
et al. propose a possible cure for the long-term instability problem by
modifying the LMS algorithm.
Sondhi [43] describes an adaptive echo canceller that synthesizes a
replica of the echo by means of an adaptive tapped-delay-line filter, and
then subtracts the replica from the return signal. The filter is designed to
adapt to the transmission characteristic of the echo path and thereby track
variations of the path that may occur during the course of a conversation.
Campanella et al. [44] and Duttweiler [45] describe digital implementations
of an adaptive echo canceller in which the normalized LMS algorithm is
used to adapt the tapped-delay-line filter. Duttweiler and Chen [46] describe
a single-chip VLSI (very large-scale integration) adaptive echo canceller
with 128-tap delay line. Gitlin et al. [47] have proposed and analyzed a
combined echo canceller and phase tracker, which uses the LMS algorithm
to adaptively compensate for the time variation in the channel caused by
carrier phase changes.
Gibson et al. [48], Cohn and Melsa [49], and Gibson [50] describe a
method of speech digitization by means of a residual encoder. This device is
a form of differential pulse-code modulation (DPCM), which uses both an
adaptive quantizer and an adaptive predictor. They used the normalized
LMS algorithm of Eqs. (4.65) and (4.66) for the design of the adaptive
predictor. For a detailed review paper on the subject, see Gibson [51].
Griffiths [52] and Keeler and Griffiths [53] use the LMS algorithm to
develop an adaptive autoregressive model for a nonstationary process, which
is exploited for frequency estimation.
REFERENCES
CHAPTER
FIVE
ADAPTIVE TAPPED-DELAY-LINE FILTERS
USING LEAST SQUARES
Suppose that we have two sets of data, namely, an input signal represented
by the samples u(1), u(2),..., u(n), and a desired response represented by
the samples d(1), d(2),..., d(n). The input signal {u(i)} is applied to a
tapped-delay-line filter whose impulse response is denoted by the sequence
h(1, n), h(2, n),..., h(M, n). Note that the filter length M must be less than
or equal to the data length n. Note also that the filter coefficients are
assumed constant for i = 1, 2,..., n. Let y(i) denote the resulting filter
output, and use the difference between the desired response d(i) and this
output to define an error signal or residue, as in Fig. 5.1:
e(i) = d(i) − y(i),     i = 1, 2,..., n     (5.1)
The requirement is to design the filter in such a way that it minimizes the
residual sum of squares, defined by
J(n) = \sum_{i=1}^{n} e^2(i)     (5.2)
The filter output y(/) is given by the convolution sum
y(i) = \sum_{k=1}^{M} h(k, n) u(i − k + 1),     i = 1, 2,..., n     (5.3)
Using Eqs. (5.1) and (5.3), we may therefore express the residual sum of
squares J(n) as follows
\sum_{k=1}^{M} h(k, n) φ(n; k, m) = θ(n; m − 1),     m = 1, 2,..., M     (5.10)
h(n) = [ h(1, n)   h(2, n)   ···   h(M, n) ]^T     (5.11)
θ(n) = [ θ(n; 0)   θ(n; 1)   ···   θ(n; M − 1) ]^T     (5.13)
Accordingly, we may rewrite the normal equations (5.10) in the following
matrix form:

Φ(n) h(n) = θ(n)     (5.14)

Assuming that Φ(n) is nonsingular, we may solve Eq. (5.14) for h(n) and so
obtain

h(n) = Φ^{−1}(n) θ(n)     (5.15)

where Φ^{−1}(n) is the inverse of the deterministic correlation matrix.
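Equation (5.15) can be evaluated directly once Φ(n) and θ(n) have been accumulated from the data. The sketch below builds them from time-ordered tap-input vectors (ignoring end effects at the start of the record, which is a simplification of this sketch) and solves the normal equations rather than forming the inverse explicitly.

import numpy as np

def least_squares_fit(u, d, M):
    """Solve Phi(n) h(n) = theta(n) for the tap coefficients (sketch)."""
    n = len(u)
    Phi = np.zeros((M, M))
    theta = np.zeros(M)
    for i in range(M - 1, n):
        u_vec = u[i - M + 1:i + 1][::-1]    # [u(i), u(i-1), ..., u(i-M+1)]
        Phi += np.outer(u_vec, u_vec)       # deterministic correlation matrix
        theta += d[i] * u_vec               # deterministic cross-correlation vector
    return np.linalg.solve(Phi, theta)      # h(n) = Phi^{-1}(n) theta(n), Eq. (5.15)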
The least squares estimate h(n) for the vector of tap coefficients has a strong
intuitive appeal that is reinforced by a number of important properties, as
discussed below:
\lim_{n→∞} (1/n) Φ(n) = R     (5.20)

and

\lim_{n→∞} (1/n) θ(n) = p     (5.21)
where R is the M-by-M ensemble-averaged correlation matrix of the tap
inputs, and p is the M-by-1 ensemble-averaged cross-correlation vector
between the desired response and the tap inputs.
Hence, under these conditions we find that the least-squares estimate
h(n) approaches the optimum Wiener value hy as n approaches infinity, as
shown by
\lim_{n→∞} h(n) = \lim_{n→∞} Φ^{−1}(n) θ(n) = h_0     (5.22)
e_0 = d − y_0     (5.23)
where e_0 is the n-by-1 optimum error vector

e_0 = [ e_0(1)   e_0(2)   ···   e_0(n) ]^T     (5.24)
and d is the n-by-1 desired response vector:
d = [ d(1)   d(2)   ···   d(n) ]^T     (5.25)
The optimum value of the n-by-1 output vector yp is itself defined by
y_0 = [ y_0(1)   y_0(2)   ···   y_0(n) ]^T = U h_0     (5.26)
where U is the n-by-M data matrix (which is Toeplitz):
U = [ u(1)    0        ···   0
      u(2)    u(1)     ···   0
      ⋮       ⋮              ⋮
      u(n)    u(n − 1) ···   u(n − M + 1) ]     (5.27)

and h_0 is the M-by-1 optimum (Wiener) coefficient vector

h_0 = [ h_0(1)   h_0(2)   ···   h_0(M) ]^T     (5.28)
Note that, in order to simplify the notation, we have omitted the
dependence on the data length n in the matrix definitions introduced
above. We will continue to follow this practice in the rest of the section,
since this dependence is not critical to the discussion. Thus, using Eqs.
where, in the last line, we have made use of Eq. (5.30). We have thus proved
that, if the error vector e) has zero mean and its elements are uncorrelated,
the covariance matrix of the least-squares estimate h, or equivalently the
correlation matrix of the coefficient-error vector h — ho, equals the inverse
of the deterministic correlation matrix of the tap inputs, except for the
scaling factor ε_min.
f(e_0) = (1 / (2πσ^2)^{n/2}) exp[ −(1/(2σ^2)) \sum_{i=1}^{n} e_0^2(i) ]     (5.39)
The summation in the exponent of Eq. (5.39) equals the residual sum of
squares, J(n). By expressing J(n) as a function of the tap coefficients
h(k),k =1,...,M, as in Eq. (5.8), we may view the joint probability
density function f(e_0) as a likelihood function. The maximum-likelihood
estimate of the coefficient vector is defined as that value of h for which this
likelihood function attains its maximum value. In effect, this maximum-like-
lihood estimate represents the most plausible value for the coefficient vector,
given the observations u(1), u(2),..., u(n). It is clear that the value of the
coefficient vector that maximizes the likelihood function is precisely the
value of the coefficient vector that minimizes the residual sum of squares.
We conclude therefore that when the elements of the zero-mean optimum
error vector e, are statistically independent and Gaussian-distributed, the
least-squares estimate and the maximum-likelihood estimate of the coeffi-
cient vector assume the same value.
δ_{mk} = { 1,   m = k
         { 0,   m ≠ k     (5.44)

The effect of this modification is to add the small positive constant c to each
element on the main diagonal of the deterministic correlation matrix ®(n)
and thereby ensure its positive definiteness. By so doing, we will have
prepared the way for the application of the matrix inversion lemma.
By separating the product u(m — m)u(n — k), corresponding to i = n,
from the summation term on the right-hand side of Eq. (5.43), we may
rewrite the expression for the correlation function $(n; k, m) as follows
φ(n; k, m) = u(n − m) u(n − k) + [ \sum_{i=1}^{n−1} u(i − m) u(i − k) + c δ_{mk} ]     (5.45)
By definition, the expression inside the square brackets on the right-hand
side of Eq. (5.45) equals @(n — 1;k, m). Accordingly, we may rewrite this
equation as
φ(n; k, m) = φ(n − 1; k, m) + u(n − m) u(n − k),     k, m = 1, 2,..., M     (5.46)
This is a recursive equation for updating the deterministic correlation
function of the tap inputs, with u(n — m)u(n — k) representing the correc-
tion term of the update. Note that this recursive equation is independent of
the constant c.
Define the M-by-1 tap-input vector

u(n) = [ u(n)   u(n − 1)   ···   u(n − M + 1) ]^T     (5.47)
Then, by using Eq. (5.46), we may write the following recursive equation for
updating the deterministic correlation matrix:

Φ(n) = Φ(n − 1) + u(n) u^T(n)     (5.48)

With P(n) denoting the inverse of Φ(n), application of the matrix inversion lemma to Eq. (5.48) yields an update for P(n) that involves the ratio

k(n) = P(n − 1) u(n) / [ 1 + u^T(n) P(n − 1) u(n) ]
Then, we may rewrite Eq. (5.49) as follows:
P(n) = P(n − 1) − k(n) u^T(n) P(n − 1)     (5.52)
The M-by-1 vector k(7) is called the gain vector.
Postmultiplying both sides of Eq. (5.52) by the tap-input vector u(n),
we get
Initial Conditions
Putting n = 0 in Eq. (5.43), we get

Φ(0) = c I     (5.67)
For the initial value of the coefficient vector, it is customary to use
h(0) = 0 (5.68)
where 0 is the M-by-1 null vector. This corresponds to setting all the tap
coefficients of the tapped-delay-line filter initially equal to zero.
2. Compute the gain vector

   k(n) = P(n − 1) u(n) / [ 1 + u^T(n) P(n − 1) u(n) ]

3. Compute the true estimation error

   η(n) = d(n) − u^T(n) h(n − 1)
We thus see that the RLS algorithm consists of first-order matrix difference
equations. Also, the inversion of the correlation matrix ®(7) is replaced by
the inversion of a scalar, namely, 1 + u^T(n) P(n − 1) u(n).
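Putting the recursions together, one pass of the RLS algorithm might be sketched as follows; since Φ(0) = cI, the initialization P(0) = (1/c)I follows, and the constant c and the data layout are illustrative choices of the sketch.

import numpy as np

def rls(u, d, M, c=0.01):
    """Recursive least squares (sketch), unweighted case."""
    P = np.eye(M) / c              # P(0) = Phi^{-1}(0) = (1/c) I, from Eq. (5.67)
    h = np.zeros(M)                # h(0) = 0, Eq. (5.68)
    for n in range(M - 1, len(u)):
        u_vec = u[n - M + 1:n + 1][::-1]       # tap-input vector
        Pu = P @ u_vec
        k = Pu / (1.0 + u_vec @ Pu)            # gain vector k(n)
        eta = d[n] - u_vec @ h                 # true estimation error eta(n)
        h = h + k * eta                        # coefficient update
        P = P - np.outer(k, u_vec @ P)         # P(n) = P(n-1) - k(n) u^T(n) P(n-1)
    return h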
The minimum value of the residual sum of squares, namely, J_min(n), results
when the coefficient vector of the tapped-delay-line filter is set equal to the
least-squares estimate h(n). To compute J_min(n) we may use Eq. (5.17). In
this section we will use Eq. (5.17) to develop a recursive relation for
computing J_min(n).
From Eq. (5.7) we deduce that

E_d(n) = E_d(n − 1) + d^2(n)     (5.69)
Therefore, substituting Eqs. (5.59), (5.62), and (5.69) in (5.17), we get

J_min(n) = E_d(n − 1) + d^2(n) − [ h^T(n − 1) + η(n) k^T(n) ][ θ(n − 1) + d(n) u(n) ]
         = [ E_d(n − 1) − h^T(n − 1) θ(n − 1) ] + d(n)[ d(n) − h^T(n − 1) u(n) ] − η(n) k^T(n) θ(n)     (5.70)

where in the last term we have restored θ(n) to its original form. For the
expression inside the first set of square brackets, we have

E_d(n − 1) − h^T(n − 1) θ(n − 1) = J_min(n − 1)

For the expression inside the second set of square brackets, we have

d(n) − h^T(n − 1) u(n) = η(n)

For the last term, we note that the vector product k^T(n) θ(n) equals u^T(n) h(n). Accordingly,

J_min(n) = J_min(n − 1) + d(n) η(n) − η(n) u^T(n) h(n)
         = J_min(n − 1) + η(n)[ d(n) − u^T(n) h(n) ]     (5.71)
Accordingly, we may use Eq. (5.64) to simplify Eq. (5.71) as

J_min(n) = J_min(n − 1) + η(n) e(n)

which is the desired recursion for updating the residual sum of squares.
Thus, the product of the true estimation error η(n) and the error signal
e(n) represents the correction term in this updating recursion.
Although the RLS algorithm does not attempt to minimize the mean
squared error (in the ensemble-averaged sense), nevertheless, the mean-square value of the true estimation error η(n) converges within less than
2M iterations, where M is the number of tap coefficients in the tapped-delay-line filter.
To prove this important property of the RLS algorithm, we first rewrite
the expression for η(n) by recognizing that, in the optimum Wiener
condition, the desired response d(n) equals

d(n) = u^T(n) h_0 + e_0(n)

where e_0(n) is the estimation error produced by the optimum Wiener filter.
*To be consistent with the statistical analysis presented in Section 5.2, we should evaluate
the average mean-square value of η(n) in two stages: (1) We assume that the tap-input vector
u(n) is known, and we average η^2(n) with respect to the estimate h(n) produced by the RLS
algorithm. (2) We average the resulting mean-square value of η^2(n) with respect to u(n). In the
analysis presented in Section 5.6, these two steps are combined into one. The final result is, of
course, the same.
ξ(n) = E{ tr[ (h_0 − h(n − 1))(h_0 − h(n − 1))^T u(n) u^T(n) ] }
     = tr{ E[ (h_0 − h(n − 1))(h_0 − h(n − 1))^T u(n) u^T(n) ] }     (5.81)
We now observe that the least-squares estimate h(n — 1) is determined
solely by data available up to and including time n — 1. We may thus
assume that h(n — 1) is independent of the tap-input vector u(”) measured
at time n, in accordance with the independence theory discussed in Chapter
4. Hence, we have
where we have used the definitions of Eqs. (5.78) and (5.79). We may thus
express ξ(n) as

ξ(n) = ε_min tr[ P(n − 1) R ]     (5.82)
Correspondingly, we may rewrite Eq. (5.77) as
For large n, we find from Eq. (5.20) that we may approximate the
ensemble-averaged correlation matrix R by the time-averaged correlation
matrix ®(n — 1)/(n — 1) as follows
R =~ =——1 O(n‘oi- 1)
Also, since P(n — 1) equals ®~'(n — 1), it follows that
|
n= |
where I is the M-by-M identity matrix. With each diagonal element of the
identity matrix equal to one, its trace equals M. Hence, we may approximate the
mean-square value of the true estimation error η(n), for large n, as follows:

E[η^2(n)] ≈ ε_min ( 1 + M/(n − 1) )
In part (b) of the figure we have included, for the sake of comparison,
the corresponding signal-flow graph representation of the LMS algorithm.
The LMS algorithm is expressed in a way that h(n — 1) represents the old
estimate of the coefficient vector, and h(n) represents the updated estimate,
so that it is consistent with the notation used for the RLS algorithm.
Based on the signal-flow graphs of Fig. 5.2 and the theory presented in
previous sections, we may point out the following basic differences between
the RLS and LMS algorithms:
1. In the LMS algorithm, the correction that is applied in updating the old
estimate of the coefficient vector is based on the instantaneous sample
value of the tap-input vector and the error signal. On the other hand, in
the RLS algorithm the computation of this correction utilizes all the past
available information.
2. In the LMS algorithm, the correction applied to the previous estimate
consists of the product of three factors: the (scalar) step-size parameter
μ, the error signal e(n − 1), and the tap-input vector u(n − 1). On the
Figure 5.2 Multidimensional signal-flow graph: (a) RLS algorithm, (b) LMS algorithm.
other hand, in the RLS algorithm this correction consists of the product
of two factors: the true estimation error η(n) and the gain vector k(n).
The gain vector itself consists of Φ^{−1}(n), the inverse of the deterministic
correlation matrix, multiplied by the tap-input vector u(n). The major
difference between the LMS and RLS algorithms is therefore the presence
of Φ^{−1}(n) in the correction term of the RLS algorithm that has the
effect of decorrelating the successive tap inputs, thereby making the RLS
algorithm self-orthogonalizing. Because of this property, we find that the
RLS algorithm is essentially independent of the eigenvalue spread of the
correlation matrix of the filter input.
3. The LMS algorithm requires approximately 20M iterations to converge
in mean square, where M is the number of tap coefficients contained in
the tapped-delay-line filter. On the other hand, the RLS algorithm
converges in mean square within less than 2M iterations. The rate of
convergence of the RLS algorithm is therefore, in general, faster than
that of the LMS algorithm by an order of magnitude.
4. Unlike the LMS algorithm, there are no approximations made in the
derivation of the RLS algorithm. Accordingly, as the number of itera-
tions approaches infinity, the least-squares estimate of the coefficient
vector approaches the optimum Wiener value, and correspondingly, the
mean-square error approaches the minimum value possible. In other
words, the RLS algorithm, in theory, exhibits zero misadjustment. On the
other hand, the LMS algorithm always exhibits a nonzero misadjust-
ment; however, this misadjustment may be made arbitrarily small by
using a sufficiently small step-size parameter μ.
5. The superior performance of the RLS algorithm compared to the LMS
algorithm, however, is attained at the expense of a large increase in
computational complexity. The complexity of an adaptive algorithm for
real-time operation is determined by two principal factors: (1) the
number of multiplications (with divisions counted as multiplications) per
iteration, and (2) the precision required to perform arithmetic operations.
The RLS algorithm requires a total of 3M(3 + M)/2 multiplications,
which increases as the square of M, the number of filter coefficients. On
the other hand, the LMS algorithm requires 2M + 1 multiplications,
increasing linearly with M. For example, for M = 31 the RLS algorithm
requires 1581 multiplications, whereas the LMS algorithm requires only
63.
where e(i) is the error signal defined in the same way as before, and w(n, i)
is a weighting factor with the property that

0 < w(n, i) ≤ 1,     i = 1, 2,..., n     (5.86)

The use of the weighting factor is intended to ensure that data in the distant
past is “forgotten,” in order to afford the possibility of following statistical
variations in the incoming data when the filter operates in a nonstationary
environment. One such form of weighting that is commonly used in practice
is the exponential weighting factor defined by

w(n, i) = λ^{n−i},     i = 1, 2,..., n     (5.87)
where A is a positive scalar equal to or less than one. The reciprocal of
1 — A is, roughly speaking, a measure of the memory of the exponentially
weighted RLS algorithm. Thus, for A = 1 all past data is weighted equally in
computing the updated coefficient vector h(n). On the other hand, for A < 1
the past data are attenuated exponentially, with the result that the present
data have a larger influence on the updating computation than the past
data. This, indeed, is the feature that we like to have in the adaptive process
when it is required to deal with the output of a time-varying channel or a
time-varying desired response.
Following a procedure similar to that described above, we may show
that the exponentially weighted RLS algorithm is described by the following
set of equations:
k(n) = λ^{−1} P(n − 1) u(n) / [ 1 + λ^{−1} u^T(n) P(n − 1) u(n) ]     (5.88)

P(n) = λ^{−1} P(n − 1) − λ^{−1} k(n) u^T(n) P(n − 1)     (5.89)

η(n) = d(n) − u^T(n) h(n − 1)     (5.90)

h(n) = h(n − 1) + k(n) η(n)     (5.91)
The derivations of these relations are left as an exercise for the reader. What
is really important to note, however, is the fact that the introduction of the
exponential weighting factor only affects the computations of the gain
vector k(n) and the estimation error correlation matrix P(n). The initial
conditions are chosen in the same way as before.
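Relative to the unweighted RLS sketch given earlier, only the gain and P(n) updates change. The sketch below shows one iteration of Eqs. (5.88) through (5.91); the default forgetting factor lam is an illustrative value, and setting it to one recovers the unweighted recursions.

import numpy as np

def ewrls_step(h, P, u_vec, d_n, lam=0.99):
    """One iteration of the exponentially weighted RLS algorithm, Eqs. (5.88)-(5.91) (sketch)."""
    Pu = P @ u_vec
    k = (Pu / lam) / (1.0 + (u_vec @ Pu) / lam)    # gain vector, Eq. (5.88)
    eta = d_n - u_vec @ h                          # true estimation error, Eq. (5.90)
    h = h + k * eta                                # coefficient update, Eq. (5.91)
    P = (P - np.outer(k, u_vec @ P)) / lam         # inverse-correlation update, Eq. (5.89)
    return h, P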
5.9 NOTES
Theory
Least-squares estimation has an old history, going back to Gauss [1] for its
origin in 1809. It is discussed in detail in many textbooks in mathematics,
e.g., Lawson and Hanson [2], Stewart [3], Miller [4], Draper and Smith [5],
and Weisberg [6]. See also the books by Ljung and Söderström [7], Hsia [8],
Goodwin and Payne [9], Franklin and Powell [10].
The recursive least squares (RLS) algorithm was apparently first derived
in 1950 by Plackett [11]. However, the algorithm has been derived indepen-
dently by several other authors; see, for example, Hastings-James [12]. The
book by Ljung and Söderström [7] presents a comprehensive treatment of
the algorithm. This book also contains an extensive list of references on the
subject.
The RLS algorithm represents a time-varying, nonlinear stochastic
difference equation. Accordingly, the analysis of its convergence behavior is
in general difficult. Ljung [13] has shown that a time-invariant deterministic
ordinary differential equation can be associated with the algorithm. The
stability properties of this equation are tied to the convergence properties of
the algorithm. This approach to convergence analysis is also described in
detail in the book by Ljung and Söderström [7].
Mueller [14] presents a theory for explaining the fast convergence of the
RLS algorithm, with particular reference to the use of this algorithm in
adaptive equalizers. Examining the solution of the deterministic normal
equations (5.10) after exactly M iterations (where M is the number of tap
coefficients), we find that in the noiseless case the tap-input vectors
u(1), u(2),..., u(M) are linearly independent of each other for the data
sequences that are usually used for equalizer startup. Accordingly, in the
special case where the transfer function of the channel is of the all-pole type
and of order M — 1, the residual sum of squares J(n) will remain zero for
n > M, and the least-squares estimate h(n)= h(M). Thus, after only M
iterations, the RLS algorithm yields a coefficient vector that is only asymp-
totically attainable by the LMS algorithm.
The RLS algorithm, as derived in the chapter, applies to real-valued
data. In [15], Mueller develops the complex form of the RLS algorithm for
dealing with complex-valued data.
The RLS algorithm is closely related to Kalman filter theory. The
derivation of the Kalman filter [16], [17] is based on modelling a linear
dynamical system by the following pair of equations:
1. A state equation, describing the motion of the system:
x(n + 1) = Φ(n + 1, n)x(n) + v(n)
where the M-by-1 vector x(n) is the state of the system and Φ(n + 1, n)
is a known M-by-M state transition matrix relating the states of the
system at times n + 1 and n. The M-by-1 vector v(n) represents errors
introduced in formulating the motion of the system.
2. A measurement equation, describing the observation process as follows:
y(n) = C(n)x(n) + e(n)
where the N-by-1 vector y(n) denotes the observed data, and C(n) is a
known N-by-M measurement matrix.
Fast Algorithms
As mentioned in Section 5.7, the major limitation of the RLS algorithm,
compared to the LMS algorithm, is that it requires a number of multiplica-
tions that increases as the square of M, the number of tap coefficients,
whereas the LMS algorithm requires a number of multiplications that
increases linearly with M. Morf et al. [26] describe a fast implementation of
the RLS algorithm, which requires a computational complexity that is linear
in M. Thus the fast RLS algorithm offers the improved convergence
properties of least-squares algorithms at a cost that is competitive with the
LMS algorithm. The computational efficiency of the fast RLS algorithm is
made possible by exploiting the so-called shifting property that is encoun-
tered in most sequential estimation problems. We may describe the shifting
property as follows. If at time n the tap-input vector u(n) is defined by
u(n) = [u(n), u(n − 1), ..., u(n − M + 1)]^T
then at time n + 1 it takes the form
u(n + 1) = [u(n + 1), u(n), ..., u(n − M + 2)]^T
That is, with the arrival of the new data sample u(n + 1), the oldest sample
u(n − M + 1) is discarded from the tap-input vector u(n), and the remain-
ing samples u(n), u(n − 1), ..., u(n − M + 2) are shifted back in time by one
sample duration, thereby making room for the newest sample u(n + 1). In
this way, the tap-input vector is updated.
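The shifting property amounts to nothing more than pushing the newest sample into a fixed-length window, as the small Python sketch below (illustrative only) makes concrete.

```python
from collections import deque

# Tap-input vector maintained as a length-M window: newest sample first,
# with the oldest sample u(n - M + 1) discarded when u(n + 1) arrives.
M = 4
u_vec = deque([0.0] * M, maxlen=M)

def shift_in(new_sample):
    u_vec.appendleft(new_sample)   # now [u(n+1), u(n), ..., u(n-M+2)]
    return list(u_vec)

for n, sample in enumerate([1.0, 2.0, 3.0, 4.0, 5.0]):
    print(n, shift_in(sample))
```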
The fast RLS algorithm is based on ideas originally introduced by Morf
[27]. An explicit application of these ideas to adaptive algorithms is devel-
oped by Ljung et al. [28], and the application to adaptive equalizers is
described by Falconer and Ljung [29]. The derivation of the fast RLS
algorithm uses an approach that parallels that of the Levinson-Durbin
recursion in that it relies on the use of forward and backward linear
predictors (that are optimum in the least-squares sense) to derive the gain
vector k(n) as a by-product. This is done without any manipulation or
storage of M-by-M matrices as required in the conventional form of the
RLS algorithm. The determination of the forward and backward predictors
is based on the matrix form of the normal equations (5.14) and the
minimum residual sum of squares given in Eq. (5.17). The derivation of the
(exponentially weighted) fast RLS algorithm for real-valued data is given by
Falconer and Ljung [29], and for complex-valued data it is given by Mueller
[15]. Falconer et al. [30] discuss the hardware implementation of the fast
RLS algorithm, and they show that the algorithm can be partitioned so as to
be realizable with an architecture based on multiple parallel processors. This
implementation issue is also discussed by Lawrence and Tewksbury [31].
A serious limitation of the fast RLS algorithm, however, is that it has a
tendency to become numerically unstable. Mueller [15] reports computer
simulations of an adaptive equalization problem that included the use of the
fast RLS algorithm with an exponential weighting factor. When single-preci-
sion floating-point arithmetic (i.e., 24 bits for the mantissa) was used,
unstable behavior of the fast RLS algorithm resulted. However, the use of
double-precision arithmetic (i.e., 56 bits for the mantissa) eliminated the
instability problem. Using computer simulation, Lin [32] found that the
instability and finite precision problems of the fast RLS algorithm are
closely related to an abnormal behavior of a quantity in this algorithm. In
particular, this quantity may be interpreted as a ratio of two autocorrela-
tions; therefore, it should always be positive. However, when the fast RLS
algorithm is implemented using finite-precision arithmetic, the computed
value of this quantity sometimes becomes negative, thereby causing the
performance of the algorithm to degrade seriously. Lin [32] describes a
method to re-initialize the algorithm periodically so as to overcome this
problem.
Cioffi and Kailath [33], and Cioffi [34] have developed another fast,
fixed-order, least-squares algorithm for adaptive tapped-delay-line filter ap-
plications which requires slightly fewer operations per iteration and exhibits
better numerical properties than the fast RLS algorithm. The approach
Applications
Early applications of the RLS algorithm to system identification were
reported by Hastings-James [12] and Åström and Eykhoff [38].
The book by Ljung and Söderström [7] describes the application of
recursive identification techniques (including the RLS algorithm) to off-line
identification, adaptive control, adaptive estimation, and adaptive signal
processing (e.g., adaptive equalization, adaptive noise cancelling).
Marple [39] presents a fast, efficient RLS algorithm for modelling an
unknown dynamic system as a finite-impulse response (FIR) or tapped-
delay-line filter. Marple and Rabiner [40] measure the performance of this
Figure 5.3 Characteristics of the voice channel used by Gitlin and Magee [42]; reproduced with
permission of the IEEE. (Gain in dB and delay in ms plotted versus frequency in Hz.)
Figure 5.4 Comparison of the output average squared errors produced by the LMS and the
RLS algorithms, as reported by Gitlin and Magee [42]; reproduced with permission of the
IEEE. (Output average squared error plotted versus the number of iterations.)
Gitlin and Magee [42] have compared the performance of the LMS and
RLS algorithms* for adaptive equalization in a noisy environment, using the
voice-channel characteristics shown in Fig. 5.3. For the 31-tap equalizer
used in the simulation, the largest-to-smallest eigenvalue ratio of the correla-
tion matrix of the tap inputs was 18.6. The data rate used in the simulation
was 13,200 bits per second. To transmit the digital data through the
channel, four-level vestigial sideband modulation was used, with the carrier
frequency at 3.455 kHz. Gaussian-noise samples were added to the channel
output to produce a signal-to-noise ratio of 31 dB. Figure 5.4 shows the
simulation results for the LMS and RLS algorithms. The conclusion to be
drawn from Fig. 5.4 is clear. The RLS algorithm reaches convergence within
60 iterations (data symbols), while the LMS algorithm requires about 900
iterations. This shows that for this application the rate of convergence of the
*Gitlin and Magee [42] refer to the RLS algorithm as Godard's algorithm, in recognition
of its derivation by Godard [18] using Kalman filter theory. They also refer to the LMS
algorithm as the simple gradient algorithm.
Figure 5.5 Comparison of the output average squared errors produced by the LMS and fast
RLS algorithms, as reported by Falconer and Ljung [29]; reproduced with permission of the
IEEE. (Output average squared error and the minimum mean squared error plotted versus the
number of iterations.)
RLS algorithm is faster than that of the LMS algorithm by more than an
order of magnitude.
Falconer and Ljung [29] present computer simulation results for the
same set of parameters as those considered by Gitlin and Magee [42]. The
fast RLS algorithm* was implemented with the weighting constant λ = 1.
The results of the experiment, presented in Fig. 5.5, show that the fast RLS
algorithm does indeed retain the fast convergence property of the conven-
tional RLS algorithm.
*Falconer and Ljung [29] refer to the fast RLS algorithm as the fast Kalman algorithm.
They also refer to the LMS algorithm as the simple gradient algorithm.
[Figure: decision-feedback equalizer, consisting of a feedforward section operating on the
input u(n), a threshold device that produces the equalized signal, and a feedback section.]
REFERENCES
26. M. Morf, L. Ljung, and T. Kailath, “Fast Algorithms for Recursive Identification,” Proc.
IEEE Conference on Decision and Control (Clearwater Beach, Florida, December 1976).
27. M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Stanford
University, Stanford, California, 1974.
28. L. Ljung, M. Morf, and D. D. Falconer, "Fast Calculation of Gain Matrices for Recursive
Estimation Schemes," International Journal of Control, vol. 27, pp. 1-19, 1978.
29. D. D. Falconer and L. Ljung, “Application of Fast Kalman Estimation to Adaptive
Equalization,” IEEE Trans. Communications, vol. COM-26, pp. 1439-1446, 1978.
30. D. D. Falconer, V. B. Lawrence, and S. K. Tewksbury, “Processor-Hardware Considera-
tions for Adaptive Digital Filter Algorithms,” Proceedings IEEE International Conference
on Communications (Seattle, Washington, June 1980), pp. 57.5.1-57.5.6.
31. V. B. Lawrence and S. K. Tewksbury, "Multiprocessor Implementation of Adaptive
Digital Filters," IEEE Trans. Communications, vol. COM-31, pp. 826-835, June 1983.
32. D. W. Lin, "On Digital Implementation of the Fast Kalman Algorithms," submitted to
IEEE Trans. Acoustics, Speech and Signal Processing.
33. J. M. Cioffi and T. Kailath, "Fast, Fixed-Order, Least-Squares Algorithms for Adaptive
Filtering," Proceedings IEEE International Conference on Acoustics, Speech, and Signal
Processing (Boston, Massachusetts, April 1983), pp. 679-682.
34. J. M. Cioffi, “Fast, Fixed-Order Least-Squares Algorithms for Communications Applica-
tions,” Ph.D. Dissertation, Stanford University, Stanford, California, 1984.
35. T. Kailath, "Time-Variant and Time-Invariant Lattice Filters for Nonstationary Processes,"
Proceedings Fast Algorithms for Linear Dynamical Systems (Aussois, France, 1981), pp.
417-464.
36. D. T. L. Lee, M. Morf, and B. Friedlander, "Recursive Least-Squares Ladder Estimation
Algorithms,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp.
627-641, June 1981.
37. G. Carayannis, D. G. Manolakis, and N. Kalouptsidis, “A Fast Sequential Algorithm for
Least-Squares Filtering and Prediction,’ IEEE Trans. Acoustics, Speech, and Signal
Processing, vol. ASSP-31, pp. 1394-1402, December 1983.
38. K. J. Åström and P. Eykhoff, "System Identification - A Survey," Automatica, vol. 7, pp.
123-162, 1971.
39. S. L. Marple, Jr., "Efficient Least Squares FIR System Identification," IEEE Trans.
Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp. 62-73, February 1981.
40. S. L. Marple, Jr., and L. R. Rabiner, “Performance of a Fast Algorithm for FIR System
Identification Using Least-Squares Analysis,” Bell Syst. Tech. J., vol. 62, pp. 717-742,
March 1983.
41. S. L. Marple, Jr., “Fast Algorithms for Linear Prediction and System Identification Filters
with Linear Phase,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-30,
pp. 942-952, December 1982.
42. R. D. Gitlin and F. R. Magee, Jr., "Self-Orthogonalizing Adaptive Equalization Algo-
rithms,” IEEE Trans. Communications, vol. COM-25, pp. 666-672, 1977.
43. M. Austin, “Decision Feedback Equalization for Digital Communication over Dispersive
Channels,” MIT Res. Lab. Electron., Tech. Rep 461, August 1967.
44. C. A. Belfiore and J. H. Park, Jr., “Decision Feedback Equalization,” Proc. IEEE, vol. 67,
pp. 1143-1156, August 1979.
45. V. U. Reddy, T. J. Shan, and T. Kailath, “Application of Modified Least-Squares
Algorithms to Adaptive Echo Cancellations,” Proceedings IEEE International Conference
on Acoustics Speech, and Signal Processing (Boston, Massachusetts, April 1983), pp.
53-56.
46. F. K. Soong and A. M. Peterson, “Fast Least-Squares (LS) in the Voice Echo Cancellation
Application,” Proceedings IEEE International Conference on Acoustics, Speech, and
Signal Processing (Paris, France, May 1982), pp. 1398-1403.
CHAPTER
SIX
ADAPTIVE LATTICE FILTERS
Figure 6.1 A single stage of the lattice filter used in the development of the forward-backward
method.
ε_m^{(f)} = E[f_m^2(i)]
         = E[f_{m−1}^2(i)] + 2 γ_m^{(f)} E[f_{m−1}(i) b_{m−1}(i − 1)] + [γ_m^{(f)}]^2 E[b_{m−1}^2(i − 1)]    (6.3)
Differentiating the mean squared error ε_m^{(f)} with respect to γ_m^{(f)} and setting
the result equal to zero, we find that the optimum value γ_{m,o}^{(f)} is defined by
γ_{m,o}^{(f)} = − E[f_{m−1}(i) b_{m−1}(i − 1)] / E[b_{m−1}^2(i − 1)]    (6.4)
Similarly, we may show that the optimum value of the backward reflection
coefficient, for which the mean-square value of the backward prediction
error is minimized, is defined by
γ_{m,o}^{(b)} = − E[f_{m−1}(i) b_{m−1}(i − 1)] / E[f_{m−1}^2(i)]    (6.5)
Time-Update Recursions
When the time series applied to the input of the lattice filter is stationary,
and the forward and backward reflection coefficients of each stage of the
lattice filter are set at their optimum values, we have
E[f_m^2(i)] = E[b_m^2(i)],    1 ≤ m ≤ M    (6.6)
where M is the order of the lattice filter. Under these conditions, we find
from Eqs. (6.4) and (6.5) that
γ_m^{(f)} = γ_m^{(b)},    1 ≤ m ≤ M    (6.7)
When, however, the filter input is nonstationary, we find that, in
general, the two reflection coefficients γ_m^{(f)} and γ_m^{(b)} have unequal values.
Also, there is no guarantee that they both will have a magnitude less than
one.
To use the formulas of Eqs. (6.4) and (6.5) to compute the forward and
backward reflection coefficients for stage m of the lattice filter, we need
estimators for the expectations in the numerators and denominators of these
formulas. Assuming a time series of n samples, we may use the estimators
tabulated below:
Expectation                          Estimate
E[f_{m−1}(i) b_{m−1}(i − 1)]         (1/n) Σ_{i=1}^{n} f_{m−1}(i) b_{m−1}(i − 1)
E[f_{m−1}^2(i)]                      (1/n) Σ_{i=1}^{n} f_{m−1}^2(i)
E[b_{m−1}^2(i − 1)]                  (1/n) Σ_{i=1}^{n} b_{m−1}^2(i − 1)
Substituting these estimates in Eqs. (6.4) and (6.5) (the factors 1/n cancel), we get the
estimates, at time n,
γ_m^{(f)}(n) = − k_{m−1}(n) / E_{m−1}^{(b)}(n − 1)    (6.8)
and
γ_m^{(b)}(n) = − k_{m−1}(n) / E_{m−1}^{(f)}(n)    (6.9)
where
k_{m−1}(n) = Σ_{i=1}^{n} f_{m−1}(i) b_{m−1}(i − 1)    (6.10)
E_{m−1}^{(f)}(n) = Σ_{i=1}^{n} f_{m−1}^2(i)    (6.11)
and
E_{m−1}^{(b)}(n) = Σ_{i=1}^{n} b_{m−1}^2(i)    (6.12)
Note that since the filter input equals zero for i ≤ 0, by assumption, we have
Σ_{i=1}^{n} b_{m−1}^2(i − 1) = E_{m−1}^{(b)}(n − 1)
We next modify Eqs. (6.10), (6.11), and (6.12), which define k_{m−1}(n),
E_{m−1}^{(f)}(n), and E_{m−1}^{(b)}(n), so that we may compute them recursively. Consider
first Eq. (6.10). By separating out the term corresponding to i = n from the
summation, we may rewrite this equation as follows:
k_{m−1}(n) = k_{m−1}(n − 1) + f_{m−1}(n) b_{m−1}(n − 1)    (6.13)
In the same way, Eqs. (6.11) and (6.12) yield the time-update recursions
E_{m−1}^{(f)}(n) = E_{m−1}^{(f)}(n − 1) + f_{m−1}^2(n)    (6.14)
E_{m−1}^{(b)}(n) = E_{m−1}^{(b)}(n − 1) + b_{m−1}^2(n)    (6.15)
Order-Update Recursions
Having computed the forward reflection coefficient γ_m^{(f)}(n) and the back-
ward reflection coefficient γ_m^{(b)}(n) for stage m of the lattice filter, we may
next compute for i = 1, 2, ..., n the forward prediction error f_m(i) and the
backward prediction error b_m(i) at the output of this stage by using the
order-update relations:
f_m(i) = f_{m−1}(i) + γ_m^{(f)}(n) b_{m−1}(i − 1),    i = 1, 2, ..., n    (6.16)
b_m(i) = b_{m−1}(i − 1) + γ_m^{(b)}(n) f_{m−1}(i),    i = 1, 2, ..., n    (6.17)
where γ_m^{(f)}(n) and γ_m^{(b)}(n) are treated as constants for the time interval
1 ≤ i ≤ n.
Extending the definition of Eq. (6.11) to filter order m, we may express
the estimate E_m^{(f)}(n) for the variance of the forward prediction error as follows:
E_m^{(f)}(n) = Σ_{i=1}^{n} f_m^2(i)
           = Σ_{i=1}^{n} f_{m−1}^2(i) + 2 γ_m^{(f)}(n) Σ_{i=1}^{n} f_{m−1}(i) b_{m−1}(i − 1)
             + [γ_m^{(f)}(n)]^2 Σ_{i=1}^{n} b_{m−1}^2(i − 1)    (6.18)
Substituting the definitions of Eqs. (6.8), (6.10), (6.11), and (6.12) in (6.18),
we get
E_m^{(f)}(n) = E_{m−1}^{(f)}(n) − 2 k_{m−1}(n) [k_{m−1}(n) / E_{m−1}^{(b)}(n − 1)]
              + [k_{m−1}(n) / E_{m−1}^{(b)}(n − 1)]^2 E_{m−1}^{(b)}(n − 1)
           = E_{m−1}^{(f)}(n) − k_{m−1}^2(n) / E_{m−1}^{(b)}(n − 1)    (6.19)
which is the desired order-update recursion. Similarly, using Eq. (6.12), we
may develop the companion order-update recursion for the estimate of the
variance of the backward prediction error:
E_m^{(b)}(n) = E_{m−1}^{(b)}(n − 1) − k_{m−1}^2(n) / E_{m−1}^{(f)}(n)    (6.20)
Version 1
1. Initialize the algorithm by setting
k_m(0) = 0,    m = 1, 2, ..., M
2. At each time n, compute for every stage m = 1, 2, ..., M the reflection coefficients
γ_m^{(f)}(n) = − k_{m−1}(n) / E_{m−1}^{(b)}(n − 1)
γ_m^{(b)}(n) = − k_{m−1}(n) / E_{m−1}^{(f)}(n)
with the prediction-error variances obtained from the time-update recursions of
Eqs. (6.14) and (6.15):
E_{m−1}^{(f)}(n) = E_{m−1}^{(f)}(n − 1) + f_{m−1}^2(n)
E_{m−1}^{(b)}(n) = E_{m−1}^{(b)}(n − 1) + b_{m−1}^2(n)
Version 2
1. Initialize the algorithm by setting
k_m(0) = 0,    m = 1, 2, ..., M
2. At each time n, compute for every stage m = 1, 2, ..., M the reflection coefficients
γ_m^{(f)}(n) = − k_{m−1}(n) / E_{m−1}^{(b)}(n − 1)
γ_m^{(b)}(n) = − k_{m−1}(n) / E_{m−1}^{(f)}(n)
with the prediction-error variances obtained from the order-update recursions of
Eqs. (6.19) and (6.20):
E_m^{(f)}(n) = E_{m−1}^{(f)}(n) − k_{m−1}^2(n) / E_{m−1}^{(b)}(n − 1)
E_m^{(b)}(n) = E_{m−1}^{(b)}(n − 1) − k_{m−1}^2(n) / E_{m−1}^{(f)}(n)
The basic difference between the two versions of the algorithm as sum-
marized above is that in version 1 the time-update recursions of Eqs. (6.14)
and (6.15) are used for updating the values of the forward and backward
prediction-error variances, whereas in version 2 these variances are updated
by using the order-update recursions of Eqs. (6.19) and (6.20).
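To make the recursions concrete, the following Python sketch processes one lattice stage sample by sample, accumulating k_{m−1}(n), E_{m−1}^{(f)}(n), and E_{m−1}^{(b)}(n − 1) and applying the reflection-coefficient estimates of Eqs. (6.8) and (6.9) together with the order updates. It is an illustrative sketch only; the small constant eps used to avoid division by zero is our own addition, and the stage operates on an already-available block of inputs rather than following either version's exact step ordering.

```python
import numpy as np

def lattice_stage(f_in, b_in, eps=1e-8):
    """One lattice stage driven by f_{m-1}(i), b_{m-1}(i), processed sample
    by sample with the estimates of Eqs. (6.8)-(6.9) and the order updates."""
    f_out = np.zeros(len(f_in))
    b_out = np.zeros(len(b_in))
    k = 0.0          # running k_{m-1}(n), Eq. (6.13)
    Ef = eps         # running E^(f)_{m-1}(n)
    Eb_del = eps     # running sum of b_{m-1}^2(i-1), i.e. E^(b)_{m-1}(n-1)
    b_prev = 0.0     # b_{m-1}(n-1), zero initial condition
    for n in range(len(f_in)):
        k += f_in[n] * b_prev
        Ef += f_in[n] ** 2
        Eb_del += b_prev ** 2
        gf = -k / Eb_del                    # Eq. (6.8)
        gb = -k / Ef                        # Eq. (6.9)
        f_out[n] = f_in[n] + gf * b_prev    # Eq. (6.16)
        b_out[n] = b_prev + gb * f_in[n]    # Eq. (6.17)
        b_prev = b_in[n]
    return f_out, b_out

# Stacking M such stages, with f_0(n) = b_0(n) = u(n) at the input, yields the
# complete lattice predictor:
#   f, b = u.copy(), u.copy()
#   for _ in range(M):
#       f, b = lattice_stage(f, b)
```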
Figure 6.2 A single stage of the lattice filter used in the development of the Burg method.
= (1 − γ_m^2) E[f_{m−1}^2(i)]
which is the desired result. Similarly, we may prove Eq. (6.27) by using Eqs.
(6.22) and (6.25).
2 / γ_m = 1 / γ_m^{(f)} + 1 / γ_m^{(b)}    (6.28)
This result follows directly from Eqs. (6.4) and (6.5) for the forward and
backward reflection coefficients. Accordingly, the Burg method is sometimes
referred to as the harmonic-mean method.
Let ρ_m denote the normalized correlation
ρ_m = − E[f_{m−1}(i) b_{m−1}(i − 1)] / sqrt(E[f_{m−1}^2(i)] E[b_{m−1}^2(i − 1)])
and let α_m denote the dimensionless constant
α_m = sqrt(E[b_{m−1}^2(i − 1)] / E[f_{m−1}^2(i)])
Then we may express the optimum forward reflection coefficient γ_{m,o}^{(f)} of Eq.
(6.4) and the optimum backward reflection coefficient γ_{m,o}^{(b)} of Eq. (6.5) in
terms of ρ_m and α_m as follows:
γ_{m,o}^{(f)} = ρ_m / α_m    (6.32)
and
γ_{m,o}^{(b)} = α_m ρ_m    (6.33)
Since we always have α_m + (1/α_m) ≥ 2, and |ρ_m| ≤ 1, it follows that
|γ_m| ≤ 1    for all m
6.3 DISCUSSION
γ_m = − E[f_{m−1}(i) b_{m−1}(i − 1)] / sqrt(E[f_{m−1}^2(i)] E[b_{m−1}^2(i − 1)])    (6.34)
We observe that, except for a minus sign, the reflection coefficient of Eq.
(6.34) equals the geometric mean of the forward reflection coefficient
defined by Eq. (6.4) and the backward reflection coefficient defined by
Eq. (6.5); hence the name of the method. We also observe that the
formula of Eq. (6.34), except for the minus sign, may be interpreted as
the statistical correlation between f_{m−1}(i) and b_{m−1}(i − 1). As with the
Burg method, we therefore have |γ_m| ≤ 1 for all m.
4. In the minimum method, we compute the forward and backward reflec-
tion coefficients defined by Eqs. (6.4) and (6.5), respectively, and choose
the one with the smaller magnitude. If it turns out that |γ_m^{(f)}| ≤ |γ_m^{(b)}|, we
assign γ_m^{(f)} to both the forward and backward reflection coefficients of
stage m. If, on the other hand, we find that |γ_m^{(b)}| < |γ_m^{(f)}|, we assign γ_m^{(b)}
to both the forward and backward reflection coefficients of stage m. It
turns out that if the magnitude of either γ_m^{(f)} or γ_m^{(b)} is greater than one,
the magnitude of the other is necessarily less than one. This follows from
Eqs. (6.32) and (6.33). Hence, the use of the minimum method will
always yield a minimum-phase lattice filter.
From the above discussion, we observe that the Burg method, the
forward method, the backward method, and the forward-backward method
are all the direct results of the minimization of an error criterion. However,
only the Burg method guarantees a minimum-phase condition for forward
prediction-error filtering. We also observe that, although both the geomet-
ric-mean method and the minimum method do guarantee a minimum-phase
forward prediction-error filtering operation, neither method can be derived
directly by minimizing some error criterion. We therefore conclude that the
Burg method is in the unique position of being the direct result of mini-
mizing an error criterion and always producing a minimum-phase design. It
should, however, be stressed that a minimum-phase filter design is necessary
only if the problem involves both analysis and synthesis, as, for example, in
linear predictive encoding of speech (see the discussion of Section 3.7). If,
on the other hand, the problem of interest only requires analysis, then
clearly the minimum-phase requirement of the forward prediction-error
filter is not necessary.
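The relationships among these estimates are easy to see when the expectations are replaced by time averages over a block of data. The following Python sketch (an illustration only; the function name is ours) computes the forward, backward, Burg, geometric-mean, and minimum estimates from the same pair of sums, so that, for example, the harmonic-mean identity of Eq. (6.28) and the bound |γ| ≤ 1 for the Burg and geometric-mean estimates can be checked numerically.

```python
import numpy as np

def reflection_estimates(f, b_del):
    """Block (time-average) estimates of the stage-m reflection coefficient,
    with f = f_{m-1}(i) and b_del = b_{m-1}(i-1) over a block of n samples."""
    c = np.dot(f, b_del)            # sum of f_{m-1}(i) b_{m-1}(i-1)
    Ef = np.dot(f, f)               # sum of f_{m-1}^2(i)
    Eb = np.dot(b_del, b_del)       # sum of b_{m-1}^2(i-1)
    g_fwd = -c / Eb                 # forward method, cf. Eq. (6.4)
    g_bwd = -c / Ef                 # backward method, cf. Eq. (6.5)
    g_burg = -2.0 * c / (Ef + Eb)   # Burg (harmonic-mean) method
    g_geo = -c / np.sqrt(Ef * Eb)   # geometric-mean method, cf. Eq. (6.34)
    g_min = g_fwd if abs(g_fwd) <= abs(g_bwd) else g_bwd   # minimum method
    return g_fwd, g_bwd, g_burg, g_geo, g_min
```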
γ_m(n) = − 2 Σ_{i=1}^{n} f_{m−1}(i) b_{m−1}(i − 1) / Σ_{i=1}^{n} [f_{m−1}^2(i) + b_{m−1}^2(i − 1)]
Correspondingly, the estimate of the reflection coefficient for stage m in
the lattice filter, and at time n, is given by
γ_m(n) = − 2 k_{m−1}(n) / E_{m−1}(n)    (6.37)
where
k_{m−1}(n) = Σ_{i=1}^{n} f_{m−1}(i) b_{m−1}(i − 1)    (6.38)
and
E_{m−1}(n) = Σ_{i=1}^{n} [f_{m−1}^2(i) + b_{m−1}^2(i − 1)]    (6.39)
Note that the computation of both k_{m−1}(n) and E_{m−1}(n) depends on
the forward and backward prediction errors produced at the output of
stage m − 1 in the lattice filter.
When the filter input is stationary, the individual stages in the lattice
filter are decoupled from each other, and the repeated use of Eqs. (6.37) to
(6.39) results in a globally optimum design for the complete filter. When,
method. The first procedure uses a minor modification of the Burg formula.
The second procedure uses an approach similar to that used for the
development of the LMS algorithm used for the adaptive operation of
tapped-delay-line filters.
γ_m(n + 1) = − 2 k_{m−1}(n) / E_{m−1}(n),    m = 1, 2, ..., M    (6.41)
We make this modification in order to make sure that the correction which
is applied to the old estimate γ_m(n) depends only on past values of the
forward and backward prediction errors that are available at time n. The
exact nature of this correction will be determined presently. It is clear,
however, that an estimation error is incurred in the use of Eq. (6.41) instead
of (6.37).
We may compute the quantities k_{m−1}(n) and E_{m−1}(n), recursively, as
follows:
k_{m−1}(n) = k_{m−1}(n − 1) + f_{m−1}(n) b_{m−1}(n − 1)    (6.42)
E_{m−1}(n) = E_{m−1}(n − 1) + f_{m−1}^2(n) + b_{m−1}^2(n − 1)    (6.43)
Figure 6.3 Signal-flow graphs for the computation of (a) k_{m−1}(n), (b) E_{m−1}(n).
Let δ_m(n) denote the correction applied to the old estimate, so that
γ_m(n + 1) = γ_m(n) + δ_m(n)    (6.44)
From Eq. (6.41) we then have
δ_m(n) = − 2 k_{m−1}(n) / E_{m−1}(n) + 2 k_{m−1}(n − 1) / E_{m−1}(n − 1)    (6.45)
Substituting the recursive relations of Eqs. (6.42) and (6.43) in (6.45), we get
δ_m(n) = − (1 / E_{m−1}(n)) {2 f_{m−1}(n) b_{m−1}(n − 1) + γ_m(n) [f_{m−1}^2(n) + b_{m−1}^2(n − 1)]}    (6.46)
       = − (1 / E_{m−1}(n)) [f_m(n) b_{m−1}(n − 1) + b_m(n) f_{m−1}(n)]    (6.47)
where in the last line we have used the order-update recursions for stage m
in the lattice filter. Thus, substituting Eq. (6.47) in (6.44), we get the desired
time-update recursion for the reflection-coefficient estimate for stage m of
the lattice filter:
γ_m(n + 1) = γ_m(n) − (1 / E_{m−1}(n)) [f_m(n) b_{m−1}(n − 1) + b_m(n) f_{m−1}(n)]    (6.48)
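A single stage of the resulting gradient-adaptive lattice can be sketched in a few lines of Python. The recursions for k_{m−1}(n) and E_{m−1}(n) follow Eqs. (6.42) and (6.43); the small starting value for E is our own choice, made only to avoid division by zero at start-up.

```python
import numpy as np

def gal_stage(f_in, b_in, E0=1e-3):
    """One gradient-adaptive lattice stage updated according to Eq. (6.48),
    with E_{m-1}(n) accumulated as in Eq. (6.43)."""
    gamma = 0.0
    E = E0
    b_prev = 0.0
    f_out = np.zeros(len(f_in))
    b_out = np.zeros(len(b_in))
    for n in range(len(f_in)):
        f_out[n] = f_in[n] + gamma * b_prev        # order update, forward
        b_out[n] = b_prev + gamma * f_in[n]        # order update, backward
        E += f_in[n] ** 2 + b_prev ** 2            # Eq. (6.43)
        gamma -= (f_out[n] * b_prev + b_out[n] * f_in[n]) / E   # Eq. (6.48)
        b_prev = b_in[n]
    return f_out, b_out
```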
ε_m(n) = {E[f_{m−1}^2(n)] + E[b_{m−1}^2(n − 1)]}[1 + γ_m^2(n)]
         + 4 γ_m(n) E[f_{m−1}(n) b_{m−1}(n − 1)]    (6.49)
where the forward prediction error f_{m−1}(n) and the delayed backward
prediction error b_{m−1}(n − 1) refer to the input of stage m of the filter. The
dependence of the cost function ε_m(n) on time n arises because of the
variation of the reflection coefficient γ_m(n) with time. By differentiating
ε_m(n) with respect to γ_m(n), we get the following expression for the
gradient:
∇_m(n) = ∂ε_m(n) / ∂γ_m(n)
       = 2 γ_m(n) {E[f_{m−1}^2(n)] + E[b_{m−1}^2(n − 1)]} + 4 E[f_{m−1}(n) b_{m−1}(n − 1)]    (6.50)
Using instantaneous values for the mean-square values in Eq. (6.50), we get
an instantaneous estimate for the gradient ∇_m(n):
∇̂_m(n) = 2 γ_m(n) [f_{m−1}^2(n) + b_{m−1}^2(n − 1)] + 4 f_{m−1}(n) b_{m−1}(n − 1)    (6.51)
Clearly, this is an unbiased estimate in that its expected value equals the
true value of the gradient ∇_m(n). Thus, by analogy with the LMS algorithm
for the adaptation of the coefficients of a tapped-delay-line filter, we may
write the following time-update recursion for the reflection coefficient of
stage m of the lattice filter:
γ_m(0) = 0 and E_{m−1}(0) = 0,    m = 1, 2, ..., M
where M is the order (i.e., the number of stages) of the filter. Accordingly,
we may utilize this sequence of backward prediction errors as inputs to a
corresponding set of tap coefficients, w(0),w(1),...,w(M), to produce the
minimum-mean-square estimate of a desired response d(n).
The transformation between the sequence of input samples u(n), u(n −
1), ..., u(n − M) and the sequence of backward prediction errors
b_0(n), b_1(n), ..., b_M(n) may be expressed as follows [see Eq. (3.27)]:
b(n) = Lu(n)    (6.56)
where the (M + 1)-by-1 vector b(n) is the backward prediction-error vector,
b(n) = [b_0(n), b_1(n), ..., b_M(n)]^T
the (M + 1)-by-1 vector u(n) is the input vector:
u(n) = [u(n), u(n − 1), ..., u(n − M)]^T    (6.57)
and the (M + 1)-by-(M + 1) transformation matrix L is a lower triangular
matrix defined by
L = [ 1         0          0          ...  0
      a_1(1)    1          0          ...  0
      a_2(2)    a_2(1)     1          ...  0
      ...
      a_M(M)    a_M(M−1)   a_M(M−2)   ...  1 ]    (6.58)
and the (M + 1)-by-1 tap-coefficient vector is
w = [w(0), w(1), ..., w(M)]^T    (6.59)
Then, with the backward prediction-error vector b(n) used as the input to
the tap-coefficient vector w, we may express the estimate of the desired
response as
S = LRL^T    (6.68)
where L^T is an upper triangular matrix that is the transpose of L. Similarly,
substituting Eq. (6.55) in (6.63) and using Eq. (6.67), we get
q = Lp    (6.69)
Thus, substituting Eqs. (6.68) and (6.69) in (6.64) and comparing the results
with Eq. (6.65), we get the desired relationship between the two optimum
vectors w_o and h_o, namely,
L^T w_o = h_o    (6.70)
Equivalently, we may write w_o = (L^T)^{−1} h_o.
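The relationship of Eq. (6.70) is easy to verify numerically. In the following Python sketch the correlation matrix R, the cross-correlation vector p, and the unit lower triangular matrix L are random placeholders chosen only for illustration; with S and q formed as in Eqs. (6.68) and (6.69), the optimum weight vectors obtained from the two formulations satisfy L^T w_o = h_o.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3
A = rng.standard_normal((M + 1, M + 1))
R = A @ A.T + (M + 1) * np.eye(M + 1)   # stand-in correlation matrix of the tap inputs
p = rng.standard_normal(M + 1)          # stand-in cross-correlation vector
L = np.tril(rng.standard_normal((M + 1, M + 1)), -1) + np.eye(M + 1)  # unit lower triangular

S = L @ R @ L.T                         # Eq. (6.68)
q = L @ p                               # Eq. (6.69)
h_o = np.linalg.solve(R, p)             # optimum tapped-delay-line weights
w_o = np.linalg.solve(S, q)             # optimum weights for the backward-error inputs

print(np.allclose(L.T @ w_o, h_o))      # checks Eq. (6.70): L^T w_o = h_o
```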
The cross-correlation between the desired response d(n) and the backward
Figure 6.5 Adaptive lattice joint-process estimator, version A.
where
and λ is a positive real constant that is less than or equal to one. Here again
a time-varying step size parameter is used in the adaptation process in order
to keep the overall rate of convergence of the adaptive filter in Fig. 6.5
insensitive to disparity in the eigenvalues of the correlation matrix of the
lattice predictor input.
e_0(n) = d(n) − w(0, n) b_0(n)
e_m(n) = e_{m−1}(n) − w(m, n) b_m(n),    m = 1, 2, ..., M    (6.88)
where d(n) is the desired response, and w(0, n), w(1, n), ..., w(M, n) are the
adjustable tap coefficients at time n.
The formula of Eq. (6.80) for the optimum value of the mth tap
coefficient represents the solution of the equation:
∂E[e_m^2(n)] / ∂w(m, n) = 0,    m = 0, 1, ..., M
∇_m(n) = ∂e_m^2(n) / ∂w(m, n)
       = 2 e_m(n) ∂e_m(n) / ∂w(m, n)
       = −2 e_m(n) b_m(n),    m = 0, 1, ..., M    (6.89)
Accordingly, we may write the update equation for the mth tap coefficient of the structure in
Fig. 6.6 as
w(m, n + 1) = w(m, n) − ½ β_m(n) ∇_m(n)
            = w(m, n) + β_m(n) e_m(n) b_m(n),    m = 0, 1, ..., M    (6.92)
where the time-varying step-size parameter β_m(n) is defined by Eqs. (6.86)
and (6.87). As with version A of the adaptive lattice joint-process estimator
of Fig. 6.5, a time-varying step-size parameter is used to update each tap
coefficient of the structure in Fig. 6.6 so as to make the overall rate of
convergence insensitive to eigenvalue spread.
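The following Python sketch carries out one time step of this tap-coefficient adaptation. The error signals follow Eq. (6.88) and the update follows Eq. (6.92); since the definition of β_m(n) in Eqs. (6.86) and (6.87) is not reproduced here, the sketch substitutes a simple normalized step size based on a running power estimate of b_m(n), which is our own stand-in.

```python
import numpy as np

def joint_process_step(b, d, w, p, mu=0.05, lam=0.99, eps=1e-8):
    """One time step of the tap-coefficient adaptation of the Fig. 6.6 structure.

    b : backward prediction errors b_0(n), ..., b_M(n) at time n
    d : desired response d(n)
    w : tap coefficients w(0, n), ..., w(M, n), updated in place
    p : running power estimates of the b_m(n), our stand-in for beta_m(n)
    """
    e = d
    for m in range(len(w)):
        e = e - w[m] * b[m]                           # e_m(n), Eq. (6.88)
        p[m] = lam * p[m] + (1.0 - lam) * b[m] ** 2   # power of b_m(n)
        w[m] = w[m] + (mu / (p[m] + eps)) * e * b[m]  # Eq. (6.92), assumed step size
    return e   # final error e_M(n)
```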
Discussion
The main difference between the two update algorithms of Eqs. (6.85) and
(6.92) is that the former algorithm (pertaining to the structure of Fig. 6.5)
only provides a single error signal e(n), whereas the second algorithm
(pertaining to the structure of Fig. 6.6) provides a set of individual error
signals, {e_m(n)}, m = 0, 1, ..., M, one for each tap.
Ignoring the effects of algorithm self-noise, the two adaptive lattice
joint-process estimators of Figs. 6.5 and 6.6 should provide identical results.
However, the results of computer simulation experiments indicate that the
algorithm self-noise produced by the structure of Fig. 6.5 may be consider-
ably greater than that of the structure in Fig. 6.6.
The adaptive lattice joint-process estimator of Fig. 6.6 has another
advantage over that of Fig. 6.5 in that it provides a mechanism for
determining the optimum number of stages in a time-varying environment.
Specifically, the mean-square value of the error signal e_m(n) attains a
minimum at some value of m (i.e., the number of stages involved in its computation),
because the time constant of the adaptive lattice is proportional to the
number of stages, and later stages have longer time constants which cannot
track the input.
For these reasons we find that, in practice, the second adaptive lattice
joint-process estimator of Fig. 6.6 is preferred to that of Fig. 6.5.
6.8 NOTES
Theory
Makhoul [1, 2] presents an integrated treatment of the forward method, the
backward method, the forward—backward method, the geometric-mean
method, the minimum method, and the Burg method for the design of a
lattice filter. The geometric-mean method was originated by Itakura and
Saito [3], who defined the reflection coefficient for stage m as the sample-based
normalized correlation
γ_m = − Σ_{i=1}^{n} f_{m−1}(i) b_{m−1}(i − 1) / sqrt(Σ_{i=1}^{n} f_{m−1}^2(i) Σ_{i=1}^{n} b_{m−1}^2(i − 1))
The derivation of the adaptive lattice algorithm of Eq. (6.48) presented in
Section 6.5 follows the approach described by Makhoul and Viswanathan
[8]. This paper also includes a description of the block estimation approach
presented in Section 6.5. A variation of the adaptive lattice algorithm of Eq.
(6.48) is also reported by Durrani and Murukutla [9].
Makhoul and Cosell [10] suggest the minimization of the following cost
function (for stage m of the lattice filter):
ε = E[(1 − a) f_m^2(i) + a b_m^2(i)],    0 ≤ a ≤ 1
as the basis of an adaptive lattice filter design. The constant a determines
the mix between the forward and backward prediction errors. The optimum
value of the reflection coefficient for this stage, for which e is minimum, is
defined by
γ_{o,m}(a) = − E[f_{m−1}(i) b_{m−1}(i − 1)] / E[a f_{m−1}^2(i) + (1 − a) b_{m−1}^2(i − 1)]    (6.93)
The time update for the parameter k_{m−1}(n) in the least-squares lattice (LSL) algorithm
takes the form
k_{m−1}(n) = λ k_{m−1}(n − 1) + f_{m−1}(n) b_{m−1}(n − 1) / α_{m−1}(n − 1)    (6.95)
The remarkable feature of the LSL algorithm summarized above is that the
forward and backward prediction errors, f_m(n) and b_m(n), obey a set of
order updates that are identical in structure to the corresponding order
updates derived for the forward—backward method (based on minimization
of the mean squared error). Basically, in the forward—backward method the
order updates arise because of the assumed Toeplitz structure of the
(ensemble-averaged) correlation matrix of the filter input. On the other
hand, no such assumption is made in the derivation of the LSL algorithm,
since, in general, the (deterministic) correlation matrix of the filter input is
non-Toeplitz (see Section 5.1).
The basic difference between the LSL algorithm and the forward—back-
ward lattice algorithm (discussed in Section 6.1) is that the former algorithm
includes a new parameter α_m(n) that only enters into the lattice recursions
through the time update for the parameter k_m(n). Indeed, if we were to set
α_{m−1}(n − 1) = 1 for all m in Eq. (6.95), the recursive formula for k_{m−1}(n)
in the LSL algorithm reduces to the same form as that in Eq. (6.13) in the
forward-backward lattice algorithm. The factor 1/α_{m−1}(n − 1) appears as
a gain factor, determining the rate of convergence of k_{m−1}(n). An im-
portant property of α_m(n) is that it is bounded by zero and one:
0 < α_m(n) ≤ 1
Comparisons of Algorithms
Gibson and Haykin [29] present a comparison of the performance of four
lattice-filter algorithms: (1) the forward—backward method, (2) the Burg
method, (3) the minimum method, and (4) the geometric-mean method,
using computer-simulated radar data. The data consisted of signals repre-
sentative of radar returns due to targets and weather disturbances. The
radar return due to the latter is commonly referred to as clutter, as it tends
to “clutter” up the radar display and thereby obscure the detection of a
moving target (e.g., aircraft). The generation of the weather-clutter data was
based on a generalized form of the autocorrelation function of radar clutter,
starting from a collection of randomly distributed scatterers. The target
signal was simulated by the product of a complex sine wave representing the
Doppler component related to target radial velocity, and a Gaussian en-
velope approximating the horizontal beam pattern of the radar antenna. For
a performance measure, the improvement factor was used, which is defined
as: “the signal-to-clutter ratio at the output of the system, divided by the
signal-to-clutter ratio at the input of the system, averaged uniformly over all
target radial velocities of interest.” Note that the ratio used in this calcula-
tion is a ratio of average powers. With the simulated data, it is a straightfor-
ward matter to average over all possible target radial velocities. Figure 6.7
shows the results of this computer simulation experiment, with the improve-
ment factor plotted versus the exponential weighting constant λ for the case
of a lattice filter consisting of five stages. A notable feature of Fig. 6.7 is that
the curves are not smooth, but show considerable local variation as the
weighting constant λ changes. The geometric-mean algorithm seems to be
particularly sensitive to this influence. On the other hand, the Burg algo-
rithm shows little of this influence. Also, on the whole, the Burg algorithm
gives the highest improvement.
Satorius and Alexander [30] have used computer simulation to compare
the performances of the two adaptive lattice joint-process estimators of Figs.
6.5 and 6.6 that were used for adaptive equalization of highly dispersive
communication channels. The results of this simulation showed that the
algorithm self-noise produced by the structure in Fig. 6.5 is considerably
greater than that of the second structure in Fig. 6.6.
In [23], Satorius and Alexander present computer simulation results on
a comparative evaluation of two adaptive equalizers for two channels
representing pure, heavy amplitude distortion. One equalizer was a tapped-
delay-line filter adapted with the LMS algorithm. The other equalizer was
the lattice joint-process estimator of Fig. 6.6 adapted by means of the
gradient-adaptive-lattice (GAL) algorithm. In all simulations, 11-tap
equalizers were used, and the data sequence applied to the equalizer input
was a random sequence with polar signaling [a_n = ±1 in Eq. (1.3)]. The
sampled impulse response g(k) of the channel used in all the simulations
Figure 6.7 Improvement factor plotted versus the exponential weighting constant λ for a
five-stage lattice filter, for the forward-backward (FB), Burg, minimum, and geometric-mean
methods.
g(k) = ½[1 + cos(2π(k − 2)/W)],    k = 1, 2, 3
g(k) = 0,    otherwise
where W was set equal to 3.1 or 3.3, corresponding to an eigenvalue spread
(i.e., ratio of maximum eigenvalue to minimum eigenvalue) of 11 or 21,
respectively. A Gaussian white-noise sequence of zero mean and variance
0.001 was added to the channel output to simulate the effect of receiver
noise. The results of this experiment showed that: (1) the adaptive lattice
joint-process estimator of Fig. 6.6 using the GAL algorithm has a faster rate
of convergence than the corresponding tapped-delay-line filter adapted with
the LMS algorithm, and (2) unlike this adaptive tapped-delay-line filter, the
adaptive lattice structure of Fig. 6.6 has a rate of convergence that is
practically insensitive to the eigenvalue disparity of the channel correlation
matrix.
In [24], Satorius and Pack present computer simulation results on a
comparative evaluation of the adaptive lattice joint-process estimator of Fig.
6.6 using the following two algorithms: (1) the least-squares lattice (LSL)
algorithm, and (2) the gradient-adaptive lattice (GAL) algorithm described
by Eqs. (6.48) and (6.43). The structure was again used as an adaptive
equalizer for two channels representing pure, heavy amplitude distortion, as
Figure 6.8 A comparison of the LMS, GAL, and LSL algorithms for case 1. Reproduced from
Friedlander [20] by permission of the IEEE.
Figure 6.9 A comparison of the LMS, GAL and LSL algorithms for case 2. Reproduced from
Friedlander [20] by permission of the IEEE.
Figures 6.8 and 6.9 show the results obtained for the LMS algorithm (with μ as the step-size
parameter), the GAL algorithm, and the LSL algorithm. The two lattice algo-
rithms were run with the exponential weighting constant λ = 0.99. The
LMS algorithm was run with the two step sizes μ = 0.005 and μ = 0.05.
Examination of these two figures leads to the following observations:
1. The two adaptive lattice algorithms converge considerably faster than the
LMS algorithm. The LSL algorithm is the fastest, reaching steady-state
conditions in about 10 iterations.
2. The rate of convergence of the LMS algorithm is highly sensitive to
variations in the eigenvalue spread, whereas the adaptive lattice algo-
rithms are practically insensitive to it.
Implementations
Lawrence and Tewksbury [34] discuss the issue of using multiprocessors (i.e.,
arrays of interconnected processors) for implementing adaptive lattice filter
algorithms. In this form of high-density digital hardware considerable
emphasis is placed on memory speed and size. With the multiprocessors
structured in a pipelined configuration, the total processing latency must be
kept to one sample period. The latency of a digital signal processing device
is defined as the time between the arrival of the first binary digit (bit) of the
input signal at the input port of the device and the time when the last bit of
the answer appears at the output port of the device. In the case of a lattice
structure, in particular, the b_m(n) samples used to compute the f_m(n) have
to be taken before the sample-period delay, allowing the computation of the
f_m(n) path one sample period earlier. An efficient approach to do this is to
use redundant storage, as illustrated in Fig. 6.10, where the number of
sample period delays per stage is doubled.
Fellman and Brodersen [35] describe the integration of an adaptive
lattice filter in MOS large-scale-integration (LSI) technology. The architec-
ture used in the implementation is designed to optimally exploit the
advantages of analog and digital approaches. In particular, switched-capaci-
tor circuitry is used to perform the filtering operation, and digital circuitry is
used to perform the adaptation.
Satorius et al. [36] present some preliminary results on the implementa-
tion of adaptive prediction-error filters with fixed-point arithmetic. Both
tapped-delay-line and lattice structures were considered. Their performance
is compared in terms of the number of bits required to reach a prescribed
level of steady-state performance. The main findings were as follows:
Figure 6.10 Illustrating the use of redundant storage for modifying the lattice structure.
(1) The steady-state error power for each filter increases dramatically below
a minimum wordlength. This occurs when the differences between
successive time updates of the filter coefficients are less than the quantiz-
ing level of the filter, at which point the filter coefficients stop adapting.
This is in agreement with an earlier finding reported by Gitlin et al. (see
reference 34 of Chapter 4).
(2) Different filter structures, which behave identically when implemented
with infinite precision, can perform quite differently when finite preci-
sion is used.
Applications
Satorius and Alexander [23], Satorius and Pack [24], and Mueller [19]
discuss the application of the adaptive lattice joint-process estimator of Fig.
6.6 to adaptive equalization for data transmission over telephone channels.
Griffiths [7] describes an adaptive filter structure for multichannel
noise-cancelling applications, which is a generalization of the adaptive
lattice joint-process estimator of Fig. 6.6. Reddy et al. [37] present a study
of the lattice form of implementing an adaptive line enhancer, using the
exact least-squares lattice algorithm.
Carter [38] presents a preliminary investigation of adaptive lattice filter
algorithms (based on the forward—backward method and the Burg method)
applied to both real and artificially generated data. Makhoul and Cosell [10]
have investigated the adaptive lattice analysis of speech, using the (gener-
alized) adaptive estimate of the reflection coefficient given in Eq. (6.93).
Makhoul and Cosell deal exclusively with real speech signals, and use the
subjective judgment of the human listener as the criterion of goodness. The
results of this investigation showed that, for applications in speech analysis
and synthesis, the convergence of the adaptive lattice predictor was fast and
efficient enough for its performance to be indistinguishable from that of the
optimum (but more expensive) adaptive autocorrelation method developed
by Barnwell [39,40]. According to this method an infinite real-pole time
window is used to compute the autocorrelation function of the speech signal
for lags equal to 0,1,..., M, recursively at each instant n, and then this set
of values is used to solve the normal equations (for forward linear predic-
tion) for the M tapped-delay-line predictor coefficients.
Morf and Lee [41] discuss the use of the exact least squares lattice
(LSL) algorithm as a tool for modelling speech signals. They exploit a novel
feature of the algorithm [namely, the variation of the parameter α_m(n) with
changes in the statistics of the input signal] as a sensitive pitch detector.* It
*Morf and Lee [41] define the gain factor as 1/[1 − γ_m(n)], where γ_m(n) = 1 − α_{m+1}(n).
is found that α_m(n) takes on low values (close to zero) for non-Gaussian
components in the input signal. This causes the gain factor 1/α_{m−1}(n − 1)
appearing in the time-update recursion of Eq. (6.95) to assume a large value
for non-Gaussian components in the input, which in turn causes the lattice
parameters k_{m−1}(n), E_{m−1}^{(f)}(n), and E_{m−1}^{(b)}(n) to change quickly. Accordingly,
the gain factor 1/α_{m−1}(n − 1)
statistics of the input signal.
Reddy et al. [42] and Soong and Peterson [43] discuss the use of lattice
algorithms for adaptive echo cancellation.
Porat and Kailath [44] present two normalized lattice algorithms for
least-squares identification of finite-impulse-response models. The one algo-
rithm, known as the growing-memory algorithm, is recursive in both time
and order. On the other hand, Marple’s algorithm, which basically solves
the same problem, is not recursive in time [45]. The second algorithm,
known as the sliding-memory algorithm, is suited for identifying time-vary-
ing models, for which neither Marple’s algorithm nor the growing-memory
algorithm is useful.
Gibson and Haykin [46, 47] present the results of an experimental study
(using real radar data) into the use of an adaptive lattice filter for the
improved detection of a moving target in the presence of clutter. It is
assumed that the target and clutter have different radial velocities. Use is
made of the normal condition that the clutter returns have constant statisti-
cal characteristics over a large area (and thus a large number of samples in
the time series). On the other hand, target returns normally cover a very
small area limited by the beamwidth of the radar antennas (typically, about
20 samples).
Metford and Haykin [48] describe a model-dependent detection algo-
rithm, and present experimental results (based on actual data obtained from
a coherent radar in an air traffic control environment). The results show an
improvement of 3 to 5 dB over the classical design of moving-target
detectors for radar surveillance, resulting from maximization of target to
noise plus clutter (power) ratio. Two different implementations of the
algorithm are considered, one using the LSL algorithm and the other using
the Kalman filtering algorithm that assumes a random walk state model.
The basic theory of this detection algorithm is described in [49].
REFERENCES
1. J. Makhoul, “New Lattice Methods for Linear Prediction,” Proceedings of the IEEE
International Conference on Acoustics, Speech, and Signal Processing 76 (Philadelphia,
April 1976).
2. J. Makhoul, "Stable and Efficient Lattice Methods for Linear Prediction," IEEE Trans.
Acoustics, Speech, and Signal Processing, vol. ASSP-25, pp. 423-428, October 1977.
3. F. Itakura and S. Saito, "Digital Filtering Techniques for Speech Analysis and Synthesis,"
paper 25-C-1, Proceedings of the 7th International Cong. Acoustics (Budapest, 1971), pp.
261-264.
4. J. P. Burg, "Maximum Entropy Spectral Analysis," Ph.D. Dissertation, Stanford Univer-
sity, Stanford, California, 1975.
5. M. D. Srinath and M. M. Viswanathan, "Sequential Algorithm for Identification of
Parameters of an Autoregressive Process,” IEEE Trans. Automatic Control, vol. AC-20,
pp. 542-546, August 1975.
6. L. J. Griffiths, "A Continuously-Adaptive Filter Implemented as a Lattice Structure,"
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing 77 (Hartford, Connecticut, May 1977), pp. 683-686.
7. L. J. Griffiths, "An Adaptive Lattice Structure for Noise-Cancelling Applications," Pro-
ceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
(Tulsa, Oklahoma, 1978), pp. 87—90.
8. J. Makhoul and R. Viswanathan, "Adaptive Lattice Methods for Linear Prediction,"
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing (Tulsa, Oklahoma, 1978), pp. 83-86.
9. T. S. Durrani and N. L. M. Murukutla, "Recursive Algorithm for Adaptive Lattices,"
Electronics Letters, vol. 15, pp. 831-833, December 1979.
10. J. Makhoul and L. K. Cosell, “Adaptive Lattice Analysis of Speech,” IEEE Trans. Circuits
and Systems, vol. CAS-28, pp. 494-499, June 1981.
11. M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Stanford
University, Stanford, California, 1974.
12. B. Friedlander, M. Morf, T. Kailath, and L. Ljung, "New Inversion Formulas for Matrices
Classified in Terms of Their Distance from Toeplitz Matrices,” Linear Algebra and Its
Applications, vol. 27, pp. 31-60, 1979.
13. M. Morf, A. Vieira, and D. T. Lee, "Ladder Forms for Identification and Speech
Processing,” Proc. 1977 IEEE Conference on Decision and Control (New Orleans, Decem-
ber 1977), pp. 1074-1078.
14. M. Morf, D. T. Lee, and A. Vieira, "Ladder Forms for Estimation and Detection,"
Abstracts of Papers, IEEE Int. Symp. Information Theory (Ithaca, New York, October
1977), pp. 111-112.
15. M. Morf and D. T. Lee, "Recursive Least Squares Ladder Forms for Fast Parameter
Tracking,” Proc. 1978 IEEE Conference on Decision and Control (San Diego, California,
January 1979), pp. 1362-1367.
16. D. T. L. Lee, M. Morf, and B. Friedlander, “Recursive Least Squares Ladder Estimation
Algorithms,” IEEE Trans. Circuits and Systems, vol. CAS-28, pp. 467-481, June 1981.
17. J. K. Pack and E. H. Satorius, "Least Squares, Adaptive Lattice Algorithms," Technical
Report 423, Naval Ocean Systems Center, San Diego, California, April 1979.
18. E. Shichor, “Fast Recursive Estimation Using the Lattice Structure,” Bell System Tech. J.,
vol. 61, pp. 97-115, January 1982.
19. M. S. Mueller, "Least-Squares Algorithms for Adaptive Equalizers," Bell System Tech. J.,
vol. 60, pp. 1905-1925, October 1981.
20. B. Friedlander, "Lattice Filters for Adaptive Processing," Proc. IEEE, vol. 70, pp.
829-867, August 1982.
21. J. Makhoul, "A Class of All-Zero Lattice Digital Filters: Properties and Applications,"
IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-26, pp. 304-314, August
1978. :
22. T. S. Durrani and N. L. M. Murukutla, "Convergence of Adaptive Lattice Filters,"
Electronics Letters, vol. 15, pp. 633-635, December 1979.
23. E. H. Satorius and S. T. Alexander, "Channel Equalization Using Adaptive Lattice
Algorithms,” IEEE Trans. Communications, vol. COM-27, pp. 899-905, June 1979.
Rq = λq    (A1.1)
for some constant λ. This condition states that the vector q is transformed
to the vector λq by the transformation R. Since λ is a constant, the vector q
therefore has special significance in that it is left invariant in direction (in
the M-dimensional space) by a linear transformation. For a typical M-by-M
matrix R there will be M such vectors. To show this, we first rewrite Eq.
(A1.1) in the form
(R − λI)q = 0    (A1.2)
where I is the identity matrix. Equation (A1.2) has a nonzero solution in the
vector q if and only if the determinant of the matrix R − λI equals zero:
det(R − λI) = 0    (A1.3)
If no such scalars exist, we say that the eigenvectors are linearly independent.
We will prove the validity of Property 1 by contradiction. Suppose that
Eq. (A1.4) holds for certain scalars v_i. Repeated multiplication of Eq. (A1.4)
by the matrix R, and the use of Eq. (A1.1), yield the following set of M
equations:
Σ_{i=1}^{M} v_i λ_i^k q_i = 0,    k = 0, 1, ..., M − 1    (A1.5)
This set of equations may be written in the form of a single matrix equation
as follows:
[v_1 q_1, v_2 q_2, ..., v_M q_M] S = O
The matrix S is called a Vandermonde matrix. When the λ_i are distinct, the
Vandermonde matrix S is nonsingular. Therefore, we may postmultiply Eq.
Rq_i = λ_i q_i    (A1.10)
and
Rq_j = λ_j q_j    (A1.11)
Premultiplying both sides of Eq. (A1.10) by the transposed vector q_j^T, we get
q_j^T R q_i = λ_i q_j^T q_i    (A1.12)
The matrix R is symmetric, by hypothesis. That is, R^T = R. Hence, taking
the transpose of both sides of Eq. (A1.11), we get
q_j^T R = λ_j q_j^T    (A1.13)
Postmultiplying both sides of Eq. (A1.13) by the vector q_i, we get
q_j^T R q_i = λ_j q_j^T q_i    (A1.14)
Subtracting Eq. (A1.14) from (A1.12), we thus get
(λ_i − λ_j) q_j^T q_i = 0    (A1.15)
Since the eigenvalues of the matrix R are distinct, by hypothesis, we have
λ_i ≠ λ_j. Accordingly, the condition of Eq. (A1.15) holds if and only if
q_j^T q_i = 0,    i ≠ j    (A1.16)
which is the desired result.
Note that both Property 1 and Property 2 apply only when the
eigenvalues of matrix R are distinct. Property 1 on the linear independence
of the associated eigenvectors applies to any square matrix R. On the other
hand, Property 2 on the orthogonality of the associated eigenvectors applies
only when the matrix R is symmetric. Note also that the orthogonality of
the eigenvectors for a symmetric matrix implies their linear independence.
norm of a vector q_i is defined as q_i^T q_i, the inner product of q_i with itself. The
orthogonality condition that the inner product q_i^T q_j = 0, i ≠ j, follows from
Property 2 when the matrix R is symmetric with distinct eigenvalues. When
both of these conditions are satisfied, that is,
q_i^T q_j = 1,    j = i
q_i^T q_j = 0,    j ≠ i    (A1.17)
Q^T Q = [q_1, q_2, ..., q_M]^T [q_1, q_2, ..., q_M] = I    (A1.22)
To simplify the trace of Q^T RQ, we use the following rule in matrix algebra.
Let A be an M-by-N matrix and B be an N-by-M matrix. Then the trace of
the matrix product AB equals the trace of BA. Thus, identifying Q^T with A
and RQ with B, we may write
tr[Q^T RQ] = tr[RQQ^T] = tr[R]
We have thus shown that the trace of a matrix R equals the sum of its
eigenvalues. In proving this result we used a property that requires the
matrix R to be symmetric with distinct eigenvalues; nevertheless, the result
applies to any square matrix.
To prove this property, we first use Eq. (A1.1) to express the condition
on the ith eigenvalue λ_i as
Rq_i = λ_i q_i,    i = 1, 2, ..., M    (A1.30)
Premultiplying both sides of this equation by q_i^T, the transpose of eigenvec-
tor q_i, we get
q_i^T R q_i = λ_i q_i^T q_i,    i = 1, 2, ..., M    (A1.31)
The inner product q_i^T q_i is a positive scalar, representing the squared length
of the eigenvector q_i, that is, q_i^T q_i > 0. We may therefore divide both sides
of Eq. (A1.31) by q_i^T q_i and so express the ith eigenvalue λ_i as the ratio
λ_i = q_i^T R q_i / q_i^T q_i,    i = 1, 2, ..., M    (A1.32)
When the matrix R is positive definite, the quadratic form q_i^T R q_i in the
numerator of this ratio is positive, that is, q_i^T R q_i > 0. Therefore, it follows
from Eq. (A1.32) that λ_i > 0 for all i. That is, all the eigenvalues of a
positive definite matrix are real and positive.
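The properties established in this appendix are easy to check numerically for a symmetric positive-definite matrix; the short Python sketch below (illustrative only, using a randomly generated test matrix) confirms the eigenvalue equation, the orthonormality of the eigenvectors, the trace identity, and the positivity of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
R = A @ A.T + 5.0 * np.eye(5)       # symmetric positive-definite test matrix

lam, Q = np.linalg.eigh(R)           # eigenvalues and orthonormal eigenvectors

print(np.all(lam > 0))                              # positive definite => all eigenvalues positive
print(np.allclose(Q.T @ Q, np.eye(5)))              # Q^T Q = I (orthonormal eigenvectors)
print(np.isclose(np.trace(R), lam.sum()))           # trace equals the sum of the eigenvalues
print(np.allclose(R @ Q[:, 0], lam[0] * Q[:, 0]))   # R q = lambda q for the first pair
```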
REFERENCES
APPENDIX
TWO
CONVOLUTION
where z^{−1} is the unit-delay operator. Similarly, we may define the one-sided
z-transforms of the sequences {u(n)} and {y(n)} as follows:
and
The inner summation on the right-hand side of Eq. (A2.6) represents the
z-transform of the sequence {u(n)} delayed by k samples. Hence, using the
time-shifting property of the z-transform, we have
REFERENCES
INDEX
Tapped-delay-line filter, 4
  adaptive, 90, 129
z-transform, one-sided, 212