
INTRODUCTION TO ADAPTIVE FILTERS

SIMON HAYKIN
Communications Research Laboratory
McMaster University

MACMILLAN PUBLISHING COMPANY


A Division of Macmillan, Inc.
NEW YORK
Collier Macmillan Publishers
LONDON
Copyright © 1984 by Macmillan Publishing Company
A division of Macmillan, Inc.
All rights reserved. No part of this book may be reproduced
or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or by any
information storage and retrieval system, without permission
in writing from the Publisher.

Macmillan Publishing Company


866 Third Avenue, New York, NY 10022

Collier Macmillan Canada, Inc.

Printed in the United States of America


Library of Congress Cataloging in Publication Data

Haykin, Simon S., 1931-


Introduction to adaptive filters.
Bibliography: p.
Includes index.
1. Adaptive filters. I. Title.
TK7872.F5H37 1984 621.3815'324 84-11234
ISBN 0-02-949460-5
To my wife
Nancy
CONTENTS

Preface

Chapter 1. Introduction
1.1 Filters
1.2 Adaptivity
1.3 Classifications of Sampled-Data and Digital Filters
1.4 Examples of Adaptivity
1.5 What Do These Examples of Adaptivity Have in Common?
1.6 Notes
References

Chapter 2. Wiener Filters


2.1 Discrete-Time Linear Estimation
2.2 Formulation of the Linear-Filtering Problem
2.3 Normal Equations
2.4 The Minimum Mean Squared Error
2.5 Principle of Orthogonality
2.6 Matrix Formulation of the Normal Equations
2.7 Properties of the Correlation Matrix
2.8 Representation of the Correlation Matrix in Terms of Its Eigenvalues and Eigenvectors
2.9 Canonical Form of Error-Performance Surface
2.10 Notes
References

Chapter 3. Linear Prediction


3.1 The Normal Equations for Forward Linear Prediction
3.2 The Normal Equations for Backward Linear Prediction
3.3 The Levinson-Durbin Recursion
3.4 Minimum-Phase Property of Forward Prediction-Error Filters
3.5 Whitening Property of Prediction-Error Filters
3.6 Autoregressive Model
3.7 Implications of the Whitening Property of a Prediction-Error Filter and the Autoregressive Modelling of a Random Process
3.8 The Lattice Predictor
3.9 Orthogonality of Backward Prediction Errors
3.10 Restriction on the Reflection Coefficients of a Lattice Prediction-Error Filter Resulting from the Positive Definiteness of the Correlation Matrix of the Lattice Input
3.11 Synthesis Structure Based on the Reflection Coefficients
3.12 Notes
References

Chapter 4. Adaptive Tapped-Delay-Line Filters Using the Gradient Approach

4.1 Some Preliminaries
4.2 The Method of Steepest Descent
4.3 Signal-Flow Graph Representation of the Steepest-Descent
Algorithm
4.4 Stability of the Steepest-Descent Algorithm
4.5 The Mean Squared Error
4.6 The Least-Mean-Square (LMS) Algorithm
4.7 Convergence of the Coefficient Vector in the LMS Algorithm
4.8 Average Mean Squared Error
4.9 Operation of the LMS Algorithm in a Nonstationary
Environment
4.10 Notes
References

Chapter 5. Adaptive Tapped-Delay-Line Filters Using Least Squares

5.1 The Deterministic Normal Equations
5.2 Properties of the Least-Squares Estimate
5.3 The Matrix-Inversion Lemma
5.4 The Recursive Least-Squares (RLS) Algorithm
5.5 Update Recursion for the Residual Sum of Squares
5.6 Speed of Convergence
5.7 Comparison of the RLS and LMS Algorithms
5.8 Operation in a Nonstationary Environment
5.9 Notes
References

Chapter 6. Adaptive Lattice Filters

6.1 The Forward-Backward Lattice Method
6.2 The Burg Method
6.3 Discussion
6.4 Block Implementation of the Burg Method
6.5 Adaptive Implementation of the Burg Method
6.6 Convergence Properties
6.7 Joint Process Estimation
6.8 Notes
References

Appendix 1. Eigenvalues and Eigenvectors

A1.1 Definitions of Eigenvalues and Eigenvectors
A1.2 Properties of Eigenvalues and Eigenvectors
References

Appendix 2. Convolution

References

Index
PREFACE

The subject of adaptive filters has established itself as an important part of statistical signal processing, with applications in various fields (particularly communications and control). The aim of this book is to serve as an introduction to the theory of adaptive tapped-delay-line and lattice filters. It is written at a mathematical level that should make it readable by an electrical engineer with a bachelor's degree or higher. The emphasis throughout the book is on discrete-time linear filters.
The book consists of six chapters and two appendices. In Chapter 1 we
briefly discuss the idea of adaptivity, followed by five examples that
illustrate the practical benefits of adaptive filters. In Chapter 2 we discuss
the idea of Wiener filtering and develop the normal equations for an
optimum tapped-delay-line filter operating in a stationary environment. In
Chapter 3 we apply this theory to study the linear prediction problem and
derive the lattice predictor. In Chapter 4 we first describe the method of
steepest descent and then use the results thus obtained to derive the widely
used least-mean-square (LMS) algorithm for the operation of an adaptive
tapped-delay-line filter. In Chapter 5 we derive the recursive least squares
(RLS) algorithm and compare it with the LMS algorithm. In Chapter 6 we
discuss procedures for the block and adaptive implementation of a lattice
predictor and the use of this structure in joint-process estimation. In the two
appendices we briefly review the eigenvalue problem and convolution.
Each chapter ends with a set of notes that are intended to help the
reader delve more deeply into the literature of adaptive filters, their theory,
implementations, and applications. These notes are supported with exten-
sive lists of references on the subject.


I have given credit to the originators of ideas (both theoretical and
practical) pertaining to adaptive filters and related issues. I invite readers of
the book to let me have their inputs if I have failed in any part of the book
to give proper credit where it is due.

SIMON HAYKIN
Hamilton, Ontario, Canada
CHAPTER ONE
INTRODUCTION

1.1 FILTERS

The term “filter” is often used to describe a device in the form of a piece of
physical hardware or computer software that is applied to a set of noisy
data in order to extract information about a prescribed quantity of interest.
The noise may arise from a variety of sources. For example, the data may
have been derived by means of noisy sensors, or may represent a useful
signal component that has been corrupted by transmission through a
communication channel. In any event, we may use a filter to perform three
basic information-processing operations:

1. Filtering, which means the extraction of information about a quantity of
   interest at time t by using data measured up to time t.
2. Smoothing, which differs from filtering in that information about the
   quantity of interest need not be available at time t, and data measured
   later than time t can be used in obtaining this information. This means
   that in the case of smoothing there is a delay in producing the result of
   interest. Since in the smoothing process we are able to use data obtained
   not only up to time t, but also data obtained after time t, we would
   expect it to be more accurate in some sense than the filtering process.
3. Prediction, which is the forecasting side of information processing. The
   aim here is to derive information about what the quantity of interest will
   be like at some time t + τ in the future for some τ > 0 by using data
   measured up to time t.

In this book, we will be principally concerned with filtering and prediction operations performed by a discrete-time linear filter.
In the statistical approach to the optimum design of filters, we assume
the availability of certain statistical characteristics of the useful signal and
unwanted additive noise, and the problem is to design a filter with the noisy
data as input so as to reduce the effects of noise as much as possible. Wiener
and Kolmogorov were the first, in the 1940s, to provide a solution to this
filter optimization problem for the case of stationary processes for which the
statistical properties of the pertinent signal and noise processes do not
change with time. The resulting solution, based on a minimum-mean-squared
error criterion, is commonly known as the Wiener filter.
The theory developed by Wiener and Kolmogorov, however, is inade-
quate for dealing with situations in which nonstationarity of the signal
and/or noise is intrinsic to the problem. To overcome this limitation of the
Wiener-Kolmogorov theory, Kalman developed in the 1960s a new filtering
theory that is applicable to nonstationary processes. The resulting solution
is in the form of a linear time-variable filter, commonly known as the
Kalman filter. The theory of Kalman filters is closely related to the classical
method of least squares, which dates back to Gauss in the 1800s.
In this book we will consider both Wiener filters and least-squares
filters.

1.2 ADAPTIVITY

The design of a Wiener filter requires a priori information about the
statistics of the data to be processed. This filter is optimum only when the
statistical characteristics of the input data match the a priori information on
which the design of the filter is based. When this information is not known
completely, however, the filter is no longer optimum. In such situations, we
may use an adaptive filter, which has come to play an increasingly im-
portant role in recent years in the fields of communications, control,
seismology, etc. By an adaptive filter we mean a device that is self-designing
in the sense that it contains a set of adjustable parameters, and that values
are automatically assigned to these parameters based upon estimated statis-
tical characteristics of the relevant signals. Thus the theory of adaptive
filters is closely related to the design of optimum filters. In the design
problem the requirement is to find the optimum set of filter parameters from
a knowledge of the relevant signal characteristics according to some crite-
rion. On the other hand, in the adaptive filtering problem the requirement is
to find an algorithm for adjusting the filter parameters in a situation where
complete knowledge of the relevant signal characteristics is not available, so
that the performance of the adaptive filter converges to that of the optimum
filter after a sufficiently large number of iterations of the algorithm.

The adaptive filter may be one of two possible kinds:


1. The adaptive filter has an open-loop or noniterative configuration, consist-
ing of a two-stage process whereby it first “learns” the statistics of the
relevant signals and then plugs the results so obtained into a nonrecursive
algorithm for computing the required filter parameters. For real-time
operation, this configuration usually has the disadvantage of requiring
excessively elaborate and costly hardware.
2. The adaptive filter has a closed-loop or iterative configuration, where the
pertinent statistics are not estimated explicitly; rather, the design of the
adaptive filter is accomplished in a single process by means of a recursive
algorithm that automatically updates the filter parameters with the arrival
of each new data sample. In each iteration of the algorithm, the adaptive
filter learns a little more about the statistics of the relevant signals, and
an improvement to the current set of values of the adjustable filter
parameters is computed using this new information. The adjustable filter
parameters are incremented, and the next learning iteration is based on
the operation of the filter with the improved set of values. In the
closed-loop configuration, the learning and computing operations are
thus combined into a single process, and as such, it is likely to have a
simpler implementation than the open-loop configuration.
In this book we will only consider the second type of adaptive filters.

1.3 CLASSIFICATIONS OF SAMPLED-DATA AND DIGITAL FILTERS

An adaptive filter may be implemented in continuous-time or discrete-time form, in which case the input-output relations of the filter may be described by differential equations or difference equations, respectively. In this book, we will be concerned exclusively with the discrete-time form of adaptive filters. Consider an analog signal ũ(t) which varies continuously with time t. For example, ũ(t) may represent the voltage from the voice signal of a telephone conversation, the output of a communication channel produced by digital computer data applied to the channel input, or the video output of a radar system. For technical reasons, we often find it more convenient to represent the analog signal ũ(t) by a sequence of uniformly spaced samples, {ũ(nT)}, where T is the sample period. The famous sampling theorem defines the conditions which have to be satisfied in order to justify this sampling process. In particular, provided that the sampling rate 1/T is equal to or greater than twice the highest frequency component of the analog signal ũ(t), there is a one-to-one correspondence between the original analog signal ũ(t) and the sequence {ũ(nT)}, such that ũ(t) may be recovered from {ũ(nT)}.
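As a small numerical check of this statement (illustrative only; the signal and rates are invented for this sketch and are not from the text), the following Python fragment samples a 3-kHz sine at 8 kHz, which exceeds twice its highest frequency, and reconstructs a value between sampling instants by sinc interpolation.

import numpy as np

f = 3000.0                 # highest (and only) frequency component, in Hz
T = 1.0 / 8000.0           # sample period; 1/T = 8 kHz >= 2 * 3 kHz
n = np.arange(200)
samples = np.sin(2 * np.pi * f * n * T)          # the sequence {u(nT)}

def reconstruct(t):
    # Ideal bandlimited reconstruction: u(t) = sum_n u(nT) sinc((t - nT)/T).
    # With only 200 samples the infinite sum is truncated, so the match is
    # close rather than exact.
    return np.sum(samples * np.sinc((t - n * T) / T))

t0 = 50.3 * T                                    # a point between sampling instants
print(reconstruct(t0), np.sin(2 * np.pi * f * t0))   # the two values agree closely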
A filter operating on the sequence of samples {ũ(nT)} to produce another sequence of samples {ỹ(nT)} is called a sampled-data filter. When the samples of the sequence {ũ(nT)} are quantized, it becomes possible to represent their amplitudes by means of a binary code word of finite length. A filter operating on a binary sequence of 1's and 0's to produce another binary sequence is called a digital filter. Thus roundoff errors are introduced when a filter is implemented in digital form.
For convenience of analysis, let u(n) = ũ(nT) and y(n) = ỹ(nT). Accordingly, we may use {u(n)} to denote the sequence of samples obtained by sampling the signal ũ(t). Let the sequence {y(n)} be computed by the formula

    y(n) = \sum_{k=0}^{M} a(k) u(n-k) + \sum_{k=1}^{N} b(k) y(n-k)    (1.1)

where the coefficients a(k) and b(k) are constants. The difference equation (1.1) defines a sampled-data or digital filter. We thus see that the output y(n) of such a filter, in its most general form, is merely a linear combination of the past and present input samples u(n), u(n-1), ..., u(n-M), plus a linear combination of the past output samples y(n-1), y(n-2), ..., y(n-N). We may distinguish two filter types, depending on the values of the coefficients in Eq. (1.1), as described below:
1. The coefficients b(k) are all zero, so that the output y(n) depends only
   on past and present sample values of the input, as shown by

       y(n) = \sum_{k=0}^{M} a(k) u(n-k)    (1.2)

   The corresponding filter structure is as shown in Fig. 1.1(a), where z^{-1}
   denotes the unit-delay operator. We refer to this structure as a finite-impulse
   response (FIR) filter. It is also referred to as a tapped-delay-line
   filter or transversal filter. An important property of the filter structure of
   Fig. 1.1(a) is that it is inherently stable.
2. One or more of the coefficients b(k) are nonzero, with the result that
   past sample values of the output influence the present sample value y(n)
   of the output. We refer to this structure as an infinite-impulse response
   (IIR) filter. There are several ways in which this structure may be
   realized in practice. Figure 1.1(b) shows one such realization for the case
   when N = M. Unlike an FIR filter, an IIR filter may be unstable, depending
   on the values assumed by the feedback coefficients b(k) (see the
   illustrative sketch following this list).
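As an illustration (not part of the original text), the following minimal Python sketch evaluates the difference equation (1.1) sample by sample; with all feedback coefficients b(k) set to zero it reduces to the FIR filter of Eq. (1.2). The coefficient values used in the example are arbitrary.

import numpy as np

def difference_equation(u, a, b):
    # Evaluate y(n) = sum_k a(k) u(n-k) + sum_k b(k) y(n-k), Eq. (1.1).
    # u : input samples u(0), u(1), ...
    # a : feedforward coefficients a(0), ..., a(M)
    # b : feedback coefficients b(1), ..., b(N); an empty list gives Eq. (1.2)
    y = np.zeros(len(u))
    for n in range(len(u)):
        # feedforward part: past and present input samples
        acc = sum(a[k] * u[n - k] for k in range(len(a)) if n - k >= 0)
        # feedback part: past output samples (absent for an FIR filter)
        acc += sum(b[k - 1] * y[n - k] for k in range(1, len(b) + 1) if n - k >= 0)
        y[n] = acc
    return y

# Example: a 3-tap FIR (moving-average) filter versus a first-order IIR filter
u = np.ones(10)                                               # step input
y_fir = difference_equation(u, a=[1/3, 1/3, 1/3], b=[])       # Eq. (1.2)
y_iir = difference_equation(u, a=[0.5], b=[0.5])              # Eq. (1.1) with feedback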

In the case of adaptive filters using an IIR structure, two major difficulties may arise due to the feedback paths: (1) the filter may become unstable, unless special precautions are taken, and (2) the presence of feedback may have an adverse effect on the accuracy with which the filter coefficients have to be specified. It is for these reasons that in practical applications requiring the use of adaptive filters, we find that adaptive FIR filters are used almost exclusively. In this book, we will concentrate on adaptive filters using FIR structures.

Figure 1.1 (a) FIR filter, (b) IIR filter.
Another filter structure that we will consider is the multistage lattice
filter, so called because each stage of the filter has a latticelike form. This
filter has some interesting properties, which make it an attractive alternative
structure to a tapped-delay-line structure for adaptive filter applications.

1.4 EXAMPLES OF ADAPTIVITY

Before proceeding with the development of the theory of adaptive filters, it is instructive to consider some examples that illustrate different applications of adaptive filters. In this section we will consider five such applications.

Example 1 Modeling of an Unknown Dynamic System


Suppose we have an unknown dynamic system, with a set of discrete-time
measurements defining variation of the output signal of the system in
response to a known stationary signal applied to the system input. We
assume that the system is time-invariant and linear. The requirement is to
develop a model for this system in the form of a tapped-delay-line filter
consisting of a set of delay-line elements (each one of which is represented
by the unit-delay operator z^{-1}) and a corresponding set of adjustable
coefficients, which are interconnected in the manner shown in Fig. 1.2. At
time n the available signal consists of a set of samples u(n),u(n —
1),...,u(n — M + 1). These samples are multiplied by a corresponding set
of adjustable coefficients, namely, h(1), h(2), ..., h(M), to produce an
output signal denoted by y(n). Let the actual output of the unknown system
be denoted by d(n). The adaptive filter output y(n) is compared with the
unknown system output d(n) to produce an error signal e(n), defined as the
difference between them.
We may now state the operation of the adaptive filter as follows. With
the unknown system output d(n) regarded as the desired response, the
requirement is to develop an adaptive procedure for adjusting the coeffi-
cients of the tapped-delay-line filter so as to minimize the error signal e(n)
in some sense. A criterion that is often used in practice for this minimization
is the mean-square-error criterion, according to which the adjustable filter
coefficients are chosen so as to minimize the mean-square value of the error
signal e(n). For the case when the input and output signals of the unknown
system are stationary (which, in effect, assumes that the system is time-
invariant) it turns out that the error signal is also stationary and its
mean-square value is precisely a second-order function of the filter coefficients. The dependence of the mean squared error on the unknown filter coefficients is thus in the form of a multidimensional paraboloid (punch bowl) with a uniquely defined bottom or minimum point. The adaptive filter has the task of continually seeking the bottom of this surface, so as to approach the optimum performance possible.

Figure 1.2 Modelling of an unknown dynamic system by a tapped-delay-line filter.
When the unknown dynamic system is time-varying, the resulting
system output (and therefore the desired response presented to the adaptive
tapped-delay-line filter) becomes nonstationary. Correspondingly, the orien-
tation of the error performance surface varies with time. In this case, the
adaptive algorithm used to adjust the coefficients of the tapped-delay-line
filter has the added task of continually tracking the bottom of the error-performance surface.
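As a concrete (and purely illustrative) sketch of this example, the following Python fragment models a hypothetical unknown FIR system with an adaptive tapped-delay-line filter. The coefficient update shown is the LMS rule derived later in Chapter 4; the step size mu, the filter length, and the "unknown" system are arbitrary choices made for this sketch.

import numpy as np

rng = np.random.default_rng(0)
M = 4                                          # number of adjustable coefficients
h_unknown = np.array([0.8, -0.4, 0.2, 0.1])    # hypothetical unknown system (same length, for simplicity)
h_hat = np.zeros(M)                            # adaptive filter coefficients, initialized to zero
mu = 0.05                                      # step size of the (LMS-type) update

u = rng.standard_normal(2000)                  # stationary input applied to both systems
for n in range(M, len(u)):
    u_vec = u[n - M + 1:n + 1][::-1]           # tap inputs u(n), u(n-1), ..., u(n-M+1)
    d = h_unknown @ u_vec                      # desired response: output of the unknown system
    y = h_hat @ u_vec                          # adaptive filter output
    e = d - y                                  # error signal e(n)
    h_hat += mu * e * u_vec                    # adjust coefficients so as to reduce the mean squared error

# After enough iterations h_hat approaches h_unknown (the bottom of the error surface).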

Example 2 Adaptive Equalization for Data Transmission


During the last twenty years a considerable effort has been devoted to
the study of data transmission systems that utilize the available channel
bandwidth efficiently. The objective here is to design the system so as to
accommodate the highest possible rate of data transmission, subject to a
specified reliability that is usually measured in terms of the error rate or
average probability of symbol error. The transmission of digital data through
a linear communication channel is limited by two factors:


1. Intersymbol interference (ISI): This is caused by the dispersion of the
transmitted pulse shape, which, in turn, results from deviations of the
channel frequency response from the ideal characteristics of constant
amplitude and linear phase (i.e., constant delay).
2. Additive thermal noise: This is generated at the front end of the receiver.
For bandwidth-limited channels (e.g., voice telephone channels) we usually
find that intersymbol interference is the chief determining factor in the
design of high-data-rate transmission systems.
Figure 1.3 shows the equivalent baseband model of a data communica-
tion system. The form of the data sequence {a_n} depends not only on the binary data to be transmitted but also on the type of modulation used. For example, in the case of M-ary phase-shift keying, a_n takes any one of the M values exp[j2π(m - 1)/M], where m = 1, 2, ..., M. Ordinarily, M equals an integer power of two. In the absence of noise, the output of the receiving filter in Fig. 1.3 equals

    u(t) = \sum_{n} a_n p(t - nT)    (1.3)

where T is the duration of the signalling interval, and p(t) is the impulse response of the cascade connection of the transmitting filter, the channel, and the receiving filter. By sampling u(t) synchronously with the transmitter, and defining u(n) = u(nT) and p(n) = p(nT), we get

    u(k) = \sum_{n} a_n p(k - n)
         = a_k p(0) + \sum_{n \neq k} a_n p(k - n)    (1.4)

The first term on the right-hand side of Eq. (1.4) defines the desired symbol,
whereas the remaining series represents the intersymbol interference caused
by the combined action of the transmitting filter, the channel, and the
receiving filter. This intersymbol interference, if left unchecked, can result in
erroneous decisions when the sampled signal at the receiving filter output is
compared with some preassigned threshold by means of a decision device.
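To make Eq. (1.4) concrete, the following illustrative Python sketch (the pulse samples and binary symbol alphabet are invented for this example, not taken from the text) separates a received sample into the desired term a_k p(0) and the residual intersymbol interference.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical overall pulse samples p(n) for the cascade of transmit filter,
# channel, and receive filter. p(0) carries the desired symbol; the rest cause ISI.
p = {-1: 0.2, 0: 1.0, 1: 0.3, 2: -0.1}
symbols = rng.choice([-1.0, +1.0], size=50)    # binary data sequence a_n

def received_sample(k):
    # u(k) = sum_n a_n p(k - n), Eq. (1.4)
    total = 0.0
    for n, a_n in enumerate(symbols):
        total += a_n * p.get(k - n, 0.0)
    return total

k = 25
desired = symbols[k] * p[0]                    # first term of Eq. (1.4)
isi = received_sample(k) - desired             # remaining series: intersymbol interference
print(f"u({k}) = {received_sample(k):.3f}, desired = {desired:.3f}, ISI = {isi:.3f}")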
To overcome the intersymbol interference problem, control of the time
function p(t) is required. In principle, if the characteristics of the channel
are known precisely, then it is virtually always possible to design a pair of
transmitting and receiving filters that will make the effect of intersymbol
interference (at sampling times) arbitrarily small, and at the same time limit
the effect of the additive receiver noise by minimizing the average probabil-
ity of symbol error. In practice, however, we find that a channel is random
in the sense that it is one of an ensemble of possible channels. Accordingly,
the use of a fixed pair of transmitting and receiving filters, designed on the
basis of average channel characteristics, may not adequately reduce intersymbol interference. This suggests the need for an adaptive equalizer that provides precise control over the time response of the channel, thereby realizing the full transmission capability of the channel. "Equalizer" is a term that is used to describe a filter used on telephone channels to flatten the amplitude and delay characteristics of the channel.

Figure 1.3 Block diagram of a baseband data transmission system.
Among the basic philosophies for equalization of data transmission
systems are pre-equalization at the transmitter and post-equalization at the
receiver. Since the former technique requires the use of a feedback path, we
will only consider equalization at the receiver, where the adaptive equalizer
is placed after the receiving filter, as in Fig. 1.3.
Figure 1.4 shows the block diagram of an adaptive equalizer, the
operation of which involves a training mode followed by a tracking mode.
During the training mode, a known test signal is transmitted to probe
the channel. A widely used test signal consists of a maximal-length shift
register or pseudo-noise (PN) sequence with a broad, even spectrum. The
test signal must obviously be at least as long as the equalizer in order to
make sure that the transmitted signal spectrum is adequately dense in the
bandwidth of the channel to be equalized. By generating a synchronized
version of the test signal in the receiver, the adaptive equalizer is supplied
with a desired response. The equalizer output is subtracted from this desired
response to produce an error signal, which is in turn used to adaptively
adjust the coefficients of the equalizer to their optimum values. The most
popular class of adaptive algorithms used for this adjustment involves
updating of each coefficient of the equalizer during each symbol period,
starting from prescribed initial values.
When the initial training period is completed, the coefficients of the
adaptive equalizer may be continually adjusted in a decision-directed mode.
In this mode the error signal is derived from the final (not necessarily
correct) receiver estimate of the transmitted sequence. The receiver estimate
is obtained by applying the adaptive equalizer output to a decision device,
as shown in Fig. 1.4. In normal operation, the receiver decisions are correct with a high probability, so that the estimate of the error signal is correct often enough to allow the adaptive equalizer to maintain proper adjustment of its coefficients. Another attractive feature of a decision-directed adaptive equalizer is the fact that it can track slow variations in the channel characteristics or perturbations in the receiver front end, such as slow jitter in the sampler phase.

Figure 1.4 Adaptive equalizer with a decision-directed mode of operation.
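The following illustrative Python fragment (the channel, equalizer length, and step size are assumptions of this sketch, not from the text) shows the decision-directed idea for binary symbols: the decision device is a simple sign slicer, and its output replaces the training signal as the desired response. In practice this mode follows an initial training period; here the equalizer is simply started from a centre-spike setting.

import numpy as np

rng = np.random.default_rng(2)
channel = np.array([0.1, 1.0, 0.25])       # hypothetical dispersive channel
a = rng.choice([-1.0, 1.0], size=5000)     # transmitted binary symbols
u = np.convolve(a, channel)[:len(a)]       # received (noise-free) samples with ISI

L = 11                                     # equalizer length (tapped delay line)
w = np.zeros(L); w[L // 2] = 1.0           # centre-spike initialization
mu = 0.01

for n in range(L, len(u)):
    u_vec = u[n - L + 1:n + 1][::-1]       # equalizer tap inputs
    y = w @ u_vec                          # equalizer output
    d = np.sign(y)                         # decision device supplies the "desired" symbol
    e = d - y                              # decision-directed error signal
    w += mu * e * u_vec                    # coefficient adjustment, symbol by symbol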

Example 3 Digital Representation of Speech


During the last twenty years there has been ever-increasing use of
digital methods for the efficient encoding and transmission of speech. Two
major factors are responsible for this trend. First, digital methods make it
possible to control the degrading effects of noise and interference picked up
during the course of transmission, thereby significantly improving the
reliability of the system. Second, the revolution in digital technology
[culminating in very-large-scale integration (VLSI)] has continually reduced
both the cost and size of the hardware.
The coders used for the digital representation of speech signals fall into
two broad classes: source coders and waveform coders. Source coders are
model-dependent in that they use a priori knowledge about how the speech
signal is generated at the source. Source coders for speech are generally
referred to as vocoders (a contraction for voice coders). They can operate at
4.8 kbit/s or below; however, they provide a synthetic quality, with the
speech signal having lost substantial naturalness. Waveform coders, on the
other hand, essentially strive for facsimile reproduction of the speech
waveform. In principle, these coders are signal-independent. They may be
designed to provide telephone-toll quality for speech at coding rates as low
as 16 kbit/s.
Model of the Speech Production Process
Figure 1.5 shows a simplified block diagram of the classical model for the
speech production process. It assumes that the sound-generating mechanism
(i.e., the source of excitation) is linearly separable from the intelligence-
modulating, vocal-tract filter. The precise form of the excitation depends on
whether the speech sound is voiced or unvoiced, as described below:
1. A voiced speech sound (such as* /i/ in eve) is generated from quasiperi-
odic vocal-cord sound. In the model of Fig. 1.5, the impulse-train
generator produces a sequence of impulses (i.e., very short pulses) which
are spaced by a fundamental period equal to the pitch period. This
signal, in turn, excites a linear filter whose impulse response equals the
vocal-cord sound pulse.
2. An unvoiced speech sound (such as /f/ in fish) is generated from random
sound produced by turbulent air flow. In this case the excitation consists

simply of a white (i.e., broad-spectrum) noise source. The probability distribution of the noise samples does not appear to be critical.

* The symbol /·/ is used to denote the phoneme, a basic linguistic unit.

The frequency response of the vocal-tract filter for unvoiced speech or that
of the vocal tract multiplied by the spectrum of the vocal-cord sound pulses
determines the short-time spectral envelope of the speech signal.
Linear Predictive Coding
The method of linear predictive coding (LPC) is an example of source coding. This method is important because it provides not only a powerful technique for the digital transmission of speech at low bit rates but also accurate estimates of basic speech parameters.
The development of LPC relies on the model of Fig. 1.5 for the speech production process. The frequency response of the vocal tract for unvoiced speech or that of the vocal tract multiplied by the spectrum of the vocal-cord sound pulse for voiced speech is described by the transfer function

    H(z) = \frac{G}{1 + \sum_{k=1}^{M} a(k) z^{-k}}    (1.5)

where G is a gain parameter and z^{-1} is the unit-delay operator. The form of excitation applied to this filter is changed by switching between voiced and

unvoiced sounds. Thus the filter with transfer function H(z) is excited by a sequence of impulses to generate voiced sounds or a white-noise sequence to generate unvoiced sounds.

Figure 1.5 Block diagram of simplified model for the speech production process.
In linear predictive coding, as the name implies, linear prediction is used
to estimate the speech parameters. Given a set of past samples of a speech
signal, namely, u(n — 1),u(n — 2),...,u(n — M), a linear prediction of
u(n), the present sample value of the signal, is defined by

    \hat{u}(n \mid n-1, \ldots, n-M) = \sum_{k=1}^{M} h(k) u(n-k)    (1.6)
The predictor coefficients, h(1), h(2),...,h(M), are optimized by minimiz-
ing the mean square value of the prediction error, e(n), defined as the
difference between u(n) and \hat{u}(n \mid n-1, \ldots, n-M). The use of the mini-
mum-mean-squared-error criterion for optimizing the predictor may be
justified for two basic reasons:

1. If the speech signal satisfies the model described by Eq. (1.5), and if the
mean square value of the error signal e(n) is minimized, then we find
that e(n) equals the excitation x(n) multiplied by the gain parameter G
in the model of Fig. 1.5, and a(k) = —h(k), k = 1,2,..., M.* Thus the
error signal e(n) consists of a train of impulses in the case of voiced
sounds or a white noise sequence in the case of unvoiced sounds. In
either case, the error signal e(n) would be small most of the time.
2. The use of the minimum-mean-squared-error criterion leads to tractable
mathematics.
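As an illustrative sketch (not from the text; the model order, coefficients, and pitch period are invented), the Python fragment below generates a synthetic "voiced" signal from an all-pole model of the form (1.5) and computes the prediction error of Eq. (1.6). With the ideal choice h(k) = -a(k), the error reduces to the scaled excitation, as stated in reason 1 above.

import numpy as np

# Hypothetical all-pole model of Eq. (1.5): H(z) = G / (1 + a(1) z^-1 + a(2) z^-2)
a = np.array([-0.9, 0.4])                 # coefficients a(1), a(2) (stable choice)
G = 0.5
excitation = np.zeros(200)
excitation[::20] = 1.0                    # impulse train with a pitch period of 20 samples (voiced case)

# Synthesize the signal: u(n) = G x(n) - a(1) u(n-1) - a(2) u(n-2)
u = np.zeros(200)
for n in range(len(u)):
    acc = G * excitation[n]
    for k in range(1, len(a) + 1):
        if n - k >= 0:
            acc -= a[k - 1] * u[n - k]
    u[n] = acc

# Linear prediction, Eq. (1.6), with the ideal predictor h(k) = -a(k)
h = -a
u_hat = np.zeros_like(u)
for n in range(len(h), len(u)):
    u_hat[n] = sum(h[k - 1] * u[n - k] for k in range(1, len(h) + 1))

e = u - u_hat                             # prediction error: G times the excitation (after start-up)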

Figure 1.6 shows the block diagram of an LPC vocoder. It consists of a transmitter and a receiver. The transmitter first applies a window (typically
10-30 ms long) to the input speech signal, thereby identifying a block of
speech samples for processing. This window is short enough for the vocal-
tract shape to be nearly stationary, so that the parameters of the speech-pro-
duction model in Fig. 1.5 may be treated as essentially constant for the
duration of the window. The transmitter then analyzes the input speech
signal in an adaptive manner, block by block, by performing a linear
prediction and pitch detection. Finally, it codes the parameters: (1) the set
of predictor coefficients, (2) the pitch period, (3) the gain parameter, and (4)
the voiced/unvoiced parameter, for transmission over the channel. The
receiver performs the inverse operations, by first decoding the incoming
parameters. In particular, it computes the values of the predictor coeffi-
cients, the pitch period, and the gain parameter, and determines whether
the segment of interest represents voiced or unvoiced sound. Finally, the receiver uses these parameters to synthesize the speech signal by utilizing the model of Fig. 1.5.

* The relationship between the set of predictor coefficients, {h(k)}, and the set of all-pole filter coefficients, {a(k)}, is derived in Chapter 3.

Figure 1.6 Block diagram of LPC vocoder: (a) transmitter, (b) receiver.
Waveform Coding
In waveform coding the operations performed on the speech signal are
designed to preserve the shape of the signal. Specifically, the operations
include sampling (time discretization) and quantization (amplitude discreti-
zation). The rationale for sampling follows from a basic property of all
speech signals, namely, they are bandlimited. This means that a speech
signal can be sampled in time at a finite rate in accordance with the
sampling theorem. For example, commercial telephone networks designed
to transmit speech signals occupy a bandwidth from 200 to 3200 Hz. To
satisfy the sampling theorem, a conservative sampling rate of 8 kHz is
commonly used in practice. Quantization is justified on the following
grounds. Although a speech signal has a continuous range of amplitudes
(and therefore its samples also have a continuous amplitude range), never-
theless, it is not necessary to transmit the exact amplitudes of the samples.
Basically, the human ear (as ultimate receiver) can only detect finite
amplitude differences.
Examples of waveform coding include pulse-code modulation (PCM)
and differential pulse-code modulation (DPCM). In PCM, as used in tele-
phony, the speech signal (after low-pass filtering) is sampled at the rate of
8 kHz, nonlinearly quantized, and then coded into 8-bit words, as in Fig.
1.7(a). The result is a good signal-to-quantization-noise ratio over a wide
dynamic range of input signal levels. DPCM involves the use of a predictor
as in Fig. 1.7(b). The predictor is designed to exploit the correlation that
exists between adjacent samples of the speech signal, in order to realize a
reduction in the number of bits required for the transmission of each sample of the speech signal and yet maintain a prescribed quality of performance. This is achieved by quantizing and then coding the prediction error that results from the subtraction of the predictor output from the input signal. If the prediction is optimized, the variance of the prediction error will be significantly smaller than that of the input signal, so that a quantizer with a given number of levels can be adjusted to produce a quantizing error with a smaller variance than would be possible if the input signal were quantized directly as in a standard PCM system. Equivalently, for a quantizing error of prescribed variance, DPCM requires a smaller number of quantizing levels (and therefore a smaller bit rate) than PCM.

Figure 1.7 Waveform coders: (a) PCM, (b) DPCM, (c) ADPCM.
Differential pulse-code modulation uses a fixed quantizer and a fixed
predictor. A further reduction in the transmission rate can be achieved by
using an adaptive quantizer and an adaptive predictor, as in Fig. 1.7(c). This
type of waveform coding is called adaptive differential pulse-code modulation
(ADPCM). An adaptive predictor is used in order to account for the
nonstationary nature of speech signals.
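The following minimal Python sketch (an illustration only, with an invented first-order predictor and a uniform quantizer) shows the DPCM idea of Fig. 1.7(b): only the quantized prediction error is transmitted, and the decoder rebuilds the signal by adding that error back to its own prediction.

import numpy as np

def quantize(x, step=0.1):
    # Uniform quantizer applied to the prediction error.
    return step * np.round(x / step)

rng = np.random.default_rng(3)
# A correlated, speech-like test signal (first-order lowpass of white noise).
u = np.zeros(500)
for n in range(1, len(u)):
    u[n] = 0.95 * u[n - 1] + 0.1 * rng.standard_normal()

alpha = 0.95                    # fixed first-order predictor coefficient (DPCM uses a fixed predictor)
u_rec = np.zeros_like(u)        # signal reconstructed at the receiver
for n in range(1, len(u)):
    prediction = alpha * u_rec[n - 1]       # predict from the previously reconstructed sample
    e_q = quantize(u[n] - prediction)       # quantized prediction error (the transmitted quantity)
    u_rec[n] = prediction + e_q             # decoder adds the error back to its own prediction

# The prediction error has a much smaller variance than u itself, so fewer
# quantizing levels (and hence fewer bits) are needed for a given quality.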

Example 4 Echo Cancellation


In telephone connections that involve the use of both four-wire and
two-wire transmissions, an echo is generated at the hybrid that connects a
four- to a two-wire transmission. When the telephone call is made over a
long distance (e.g., using geostationary satellites), an echo represents an
impairment that can be as annoying subjectively as the more obvious
impairments of low volume and noise. Figure 1.8 shows a satellite circuit
with no echo protection. The hybrids at both ends of the circuit convert the
two-wire transmissions used on customer loops and metallic trunks to the
four-wire transmission needed for carrier circuits. Due to the high altitude
of the satellite, a delay of 270 ms occurs in each four-wire path. Ideally,

when person A on the left speaks, his speech should follow the upper transmission path to the hybrid on the right and from there be directed to the two-wire circuit. In practice, however, not all the speech energy is directed to this two-wire circuit, with the result that some is returned along the lower four-wire path to be heard by the person on the left as an echo that is delayed by 540 ms.

Figure 1.8 Satellite circuit with no echo protection.
To overcome this problem, echo cancellers are installed in the network
in pairs, as illustrated in Fig. 1.9(a). The cancellation is achieved by making
an estimate of the echo and subtracting it from the return signal. The
underlying assumption here is that the echo return path, from the point
where the canceller bridges to the point where the echo estimate is sub-
tracted, is linear and time-invariant.
Thus, referring to the single canceller in Fig. 1.9(b) for definitions, the return signal at time n may be expressed as

    y(n) = \sum_{k=0}^{\infty} h(k) u(n-k) + v(n)

where u(n), u(n — 1),..., are samples of the far-end speech (from speaker
A), v(n) is the near-end speech (from speaker B) plus any additive noise at
time n, and {h(k)} is the impulse response of the echo path. The echo
canceller makes an estimate {ĥ(k)} of the impulse response of the echo path, and then estimates the echo as the convolution sum

    \hat{y}(n) = \sum_{k=0}^{M} \hat{h}(k) u(n-k)

which can be realized by means of a tapped-delay-line filter with coefficients ĥ(0), ĥ(1), ..., ĥ(M). An error signal e(n) is formed by subtracting the estimate ŷ(n) from the return signal y(n), as shown by

    e(n) = y(n) - \hat{y}(n)

The error signal e(n) is, in turn, used to adaptively control the canceller coefficients ĥ(0), ĥ(1), ..., ĥ(M), so that after a small number of iterations the effect of the echo is minimized in some sense.

Figure 1.9 (a) Satellite circuit with a pair of echo suppressors. (b) Signal definitions.
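The following Python sketch (an illustration only; the echo path, near-end signal, step size, and canceller length are invented) adapts the canceller coefficients ĥ(k) from the error signal, in the spirit of the description above.

import numpy as np

rng = np.random.default_rng(4)
echo_path = np.array([0.0, 0.6, 0.3, -0.1])      # hypothetical echo-path impulse response h(k)
M = 7                                            # canceller order: taps h_hat(0), ..., h_hat(M)
h_hat = np.zeros(M + 1)
mu = 0.02

u = rng.standard_normal(5000)                    # far-end speech (modelled as white noise here)
v = 0.01 * rng.standard_normal(5000)             # near-end signal plus additive noise

for n in range(M, len(u)):
    u_vec = u[n - M:n + 1][::-1]                 # u(n), u(n-1), ..., u(n-M)
    y = echo_path @ u_vec[:len(echo_path)] + v[n]   # return signal y(n)
    y_hat = h_hat @ u_vec                        # echo estimate: the convolution sum above
    e = y - y_hat                                # error signal sent back toward speaker A
    h_hat += mu * e * u_vec                      # adapt the canceller coefficients

# After convergence, h_hat approximates the echo path and e(n) is close to v(n).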

Example 5 Adaptive Line Enhancer


For our last example on adaptivity, we consider the adaptive line
enhancer (ALE). This device can be used to detect a low-level sine wave
embedded in a background of additive noise with a broad-band spectrum.
As illustrated in Fig. 1.10, the ALE consists of a delay element and a
linear predictor. The predictor output y(n) is subtracted from the input
signal u(n) to produce the error signal e(n). This error signal is, in turn,
used to adaptively control the coefficients of the predictor. The predictor
input equals u(n − Δ), the original input signal u(n) delayed by Δ seconds,
where Δ is equal to or greater than the sample period. The main function of the delay parameter Δ is to remove correlation that may exist between the noise component in the original input signal u(n) and the noise component in the delayed predictor input u(n − Δ). For this reason, the delay parameter Δ is called the decorrelation parameter of the ALE. An ALE may thus be viewed as an adaptive filter that is designed to suppress broad-band components (e.g., white noise) contained in the input while at the same time passing narrow-band components (e.g., sine waves) with little attenuation. In other words, it can be used to enhance the presence of sine waves (whose spectrum consists of harmonic lines) in an adaptive manner—hence the name.

Figure 1.10 Adaptive line enhancer.
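A minimal Python sketch of the ALE is given below (illustrative only; the sine frequency, delay Δ, predictor length, and step size are arbitrary choices, not from the text). The predictor input is the delayed signal u(n − Δ), the predictor output y(n) carries the enhanced sine wave, and e(n) = u(n) − y(n) retains mostly the broad-band noise.

import numpy as np

rng = np.random.default_rng(5)
N = 10000
n = np.arange(N)
u = 0.5 * np.sin(0.2 * np.pi * n) + rng.standard_normal(N)   # low-level sine buried in white noise

delta = 5          # decorrelation delay, in samples
L = 32             # predictor length
mu = 0.001
w = np.zeros(L)
y = np.zeros(N)    # predictor output: the enhanced narrow-band component

for k in range(delta + L, N):
    x = u[k - delta - L + 1:k - delta + 1][::-1]   # delayed predictor input u(n - delta), u(n - delta - 1), ...
    y[k] = w @ x
    e = u[k] - y[k]                                # error: mostly the broad-band noise component
    w += mu * e * x                                # adapt the predictor coefficients

# y now contains the sine wave with the broad-band noise largely suppressed.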

1.5 WHAT DO THESE EXAMPLES OF ADAPTIVITY HAVE IN COMMON?

We may answer this question by summarizing the common features of the last five examples as follows:
1. They represent different applications of the same basic adaptive filter
theory.
2. In each example, the adaptive operation of the filter relies on the
availability of a desired response. The form of this desired response is
identified for the different applications in Table 1.1.
3. In each application, an error signal is formed by subtracting the filter
output from the desired response. The error signal is used to adaptively
control the filter coefficients. This adjustment is carried out in an iterative
manner, sample by sample, starting from a prescribed set of initial values.
The LPC vocoder, as described in Example 3, differs from the other four
examples in that the speech samples are processed in blocks. A limitation
of block processing is that usually it requires a large amount of computa-

tion and a large amount of storage. This practical limitation is overcome by using an adaptive implementation.

Table 1.1

Application                                   Desired response
System identification                         Actual output of the unknown system
Adaptive equalization                         Replica of the transmitted test signal,
                                              regenerated in the receiver
Adaptive prediction (as used in the digital   Present value of the input signal
representation of speech signals, and
adaptive line enhancement)
Echo cancellation                             The return signal produced by the echo

1.6 NOTES

The origins of optimum filter theory go back to the pioneering work of Wiener [1] and Kolmogorov [2] in the 1940s; they formulated the mathe-
matics dealing with the statistical design of optimum linear filters with a
fixed structure. Bode and Shannon [3] reformulated the Wiener filter theory
in a language that was more easily understood by engineers of that time.
The Wiener theory of optimum filters was later extended and enhanced by
Kalman [4] and by Kalman and Bucy [5]. A detailed historical account of
the theory of optimum (linear) filters is given by Kailath [6].
The earliest work on adaptive filters may be traced back to the late
1950s, during which time a number of researchers were working indepen-
dently on different applications of adaptive filters. Widrow and Hoff [7, 8]
devised the least-mean-square (LMS) adaptive algorithm in their study of
adaptive switching circuits. Howells [9] designed and built an antenna
sidelobe canceller (a form of adaptive spatial filter) for suppressing an
interference impinging on the antenna from an unknown direction. Gabor
et al. [10] used nonlinearity to develop a filter with a learning capability.
Glaser [11] described a system capable of adapting and optimizing its
response to a certain class of pulse signals.
From early 1960 and onward, work on adaptive filter theory and its
applications intensified. Lucky [12-14] used adaptive filter theory to design
and build an adaptive equalizer to combat the effects of intersymbol
interference, a device that made the efficient transmission of digital data
over telephone channels at relatively high bit rates a practical reality.
Di Toro [15, 16] used adaptive equalization to mitigate the impairments
from dispersion, multipath reception, group-delay distortion, etc., resulting
from the transmission of digital data over high-frequency (HF) links. A
survey of the literature on communication theory (dealing with digital
transmission through linear dispersive channels, including adaptive equali-
zation) is presented by Lucky [17]. Proakis [18] presents a tutorial review of
various adaptive equalization techniques used in digital data transmission.
Another tutorial review of the subject, somewhat restricted in scope, is given
by Qureshi [19].
The first application of linear prediction to speech was made by Saito
and Itakura [20], and the first results on the predictive coding of speech
were published by Atal and Schroeder [21-24]. Gold [25] presents a tutorial
review of different waveform-coding and source-coding techniques for the
digital representation of speech, with an eye toward an all-digital communi-
cation network. Flanagan et al. [26] present a most detailed review of the
subject, emphasizing the many practical issues involved in speech-coder
design. Gibson [27] focuses on the analysis and design of adaptive predic-
tors for the differential encoding of speech. The books by Flanagan [28],
Markel and Gray [29], and Rabiner and Schafer [30] are devoted to the
various issues involved in the analysis and synthesis of speech, and its
coding.
The initial work on the use of adaptivity for echo cancellation started
around 1965. It appears that Kelly was the first to propose the use of an
adaptive filter (with the speech signal itself utilized in performing the
adaptation) for echo cancellation. Kelly’s contribution is recognized in the
paper by Sondhi [31]. This invention and its refinement are described in
the patents by Kelly and Logan [32] and Sondhi [33]. The description given
in Example 4 on echo cancellation is based on the paper by Duttweiler and
Chen [34].
The adaptive line enhancer was originated by Widrow and his co-
workers. An early version of this device was built in 1965 to cancel 60-Hz
interference at the output of an electrocardiographic amplifier and recorder.
This work is described in the paper by Widrow et al. [35]. The adaptive line
enhancer and its application as an adaptive detector are patented by
McCool et al. [36, 37].
It should be noted that although the echo canceller and the adaptive
line enhancer are intended for different applications, nevertheless, they
represent special forms of the adaptive noise canceller [35].
Falconer [38] presents an overview of adaptive-filter theory and its
applications to adaptive equalization, adaptive prediction in speech coding,
and echo cancellation.
The theory of adaptive filters (operating on a time series) is closely
related to that of adaptive antennas (operating on blocks of spatial samples).
For material on adaptive antennas, the reader is referred to the book by
Monzingo and Miller [39] and the collection of papers edited by Haykin
[40].

REFERENCES

1. N. Wiener, "Extrapolation, Interpolation, and Smoothing of Stationary Time Series" (MIT Press, 1949).
2. A. N. Kolmogorov, "Interpolation and Extrapolation of Stationary Random Sequences," Bull. de l'Academie des Sciences de U.S.S.R., Sér. Math., vol. 5, 1941, pp. 3-14. A translation of this paper from the Russian has been published by the Rand Corporation, Santa Monica, Memorandum RM-3090-PR, April 1962.
3. H. W. Bode and C. E. Shannon, "A Simplified Derivation of Linear Least Square Smoothing and Prediction Theory," Proc. IRE, vol. 38, pp. 417-425, April 1950.
4. R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," J. Basic Eng., Trans. ASME, Series D, vol. 82, pp. 35-45, 1960.
5. R. E. Kalman and R. J. Bucy, "New Results in Linear Filtering and Prediction Theory," J. Basic Eng., Trans. ASME, Series D, vol. 83, pp. 95-108, 1961.
6. T. Kailath, "A View of Three Decades of Linear Filtering Theory," IEEE Trans. Information Theory, vol. IT-20, pp. 146-181, March 1974.
7. B. Widrow and M. E. Hoff, Jr., "Adaptive Switching Circuits," IRE WESCON Conv. Rec., Pt. 4, pp. 96-104, 1960.
8. B. Widrow, "Adaptive Filters," in "Aspects of Network and System Theory," edited by R. E. Kalman and N. DeClaris (Holt, Rinehart, and Winston, 1970), pp. 565-587.
9. P. Howells, "Intermediate Frequency Side-lobe Canceller," U.S. Patent 3,202,990, August 24, 1965.
10. D. Gabor, W. P. L. Wilby, and R. Woodcock, "A Universal Nonlinear Filter Predictor and Simulator Which Optimizes Itself by a Learning Process," Proc. IEE (London), vol. 108, Part B, pp. 422-438, July 1960.
11. E. M. Glaser, "Signal Detection by Adaptive Filters," IRE Trans. Information Theory, vol. IT-7, pp. 87-98, April 1961.
12. R. W. Lucky, "Automatic Equalization for Digital Communication," Bell System Tech. J., vol. 44, pp. 547-588, 1965.
13. R. W. Lucky, "Automatic Equalization for Digital Communication," Bell System Tech. J., vol. 45, pp. 255-286, 1966.
14. R. W. Lucky, J. Salz, and E. J. Weldon, "Principles of Data Communication" (McGraw-Hill, 1968).
15. M. J. Di Toro, "A New Method of High Speed Adaptive Signal Communication Through Any Time-Variable and Dispersive Transmission Medium," 1st IEEE Ann. Communication Conference, 1965, pp. 763-767.
16. M. J. Di Toro, "Communication in Time-Frequency Spread Media Using Adaptive Equalization," Proc. IEEE, vol. 56, pp. 1653-1679, Oct. 1968.
17. R. W. Lucky, "A Survey of the Communication Theory Literature: 1968-1973," IEEE Trans. Information Theory, vol. IT-19, pp. 725-739, Nov. 1973.
18. J. G. Proakis, "Advances in Equalization for Intersymbol Interference," Advances in Communication Systems, vol. 4, pp. 123-198 (Academic Press, 1975).
19. S. Qureshi, "Adaptive Equalization," IEEE Communications Magazine, vol. 20, pp. 9-17, March 1982.
20. S. Saito and F. Itakura, "The Theoretical Consideration of Statistically Optimum Methods for Speech Spectral Density," Report no. 3107, Electrical Communication Laboratory, N.T.T., Tokyo (1966) (in Japanese).
21. B. S. Atal and M. R. Schroeder, "Predictive Coding of Speech Signals," Proc. 1967 Conf. Commun. and Processes, 1967, pp. 360-361.
22. B. S. Atal and M. R. Schroeder, "Predictive Coding of Speech Signals," 1968 WESCON Technical Papers, paper 8/2, 1968.
23. B. S. Atal, "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave," J. Acoust. Soc. Am., vol. 47, p. 65, 1970.
24. B. S. Atal and M. R. Schroeder, "Adaptive Predictive Coding for Speech Signals," Bell System Tech. J., vol. 49, pp. 1973-1986, 1970.
25. B. Gold, "Digital Speech Networks," Proc. IEEE, vol. 65, pp. 1636-1658, Dec. 1977.
26. J. L. Flanagan, M. R. Schroeder, B. S. Atal, R. E. Crochiere, N. S. Jayant, and J. M. Tribolet, "Speech Coding," IEEE Trans. Communications, vol. COM-27, pp. 710-737, April 1979.
27. J. D. Gibson, "Adaptive Prediction in Speech Differential Encoding Systems," Proc. IEEE, vol. 68, pp. 488-525, April 1980.
28. J. L. Flanagan, "Speech Analysis, Synthesis and Perception," second edition (Springer-Verlag, 1972).
29. J. D. Markel and A. H. Gray, Jr., "Linear Prediction of Speech" (Springer-Verlag, 1976).
30. L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals" (Prentice-Hall, 1978).
31. M. M. Sondhi, "An Adaptive Echo Canceller," Bell System Tech. J., vol. 46, pp. 497-511, March 1967.
32. J. L. Kelly, Jr. and R. F. Logan, Jr., "Self-Adaptive Echo Canceller," U.S. Patent 3,500,000, March 10, 1970.
33. M. M. Sondhi, "Closed Loop Adaptive Echo Canceller Using Generalized Filter Networks," U.S. Patent 3,499,999, March 10, 1970.
34. D. L. Duttweiler and Y. S. Chen, "A Single-Chip VLSI Echo Canceller," Bell System Tech. J., vol. 59, pp. 149-160, Feb. 1980.
35. B. Widrow, J. R. Glover, Jr., J. M. McCool, J. Kaunitz, C. S. Williams, R. H. Hearn, J. R. Zeidler, E. Dong, Jr., and R. C. Goodlin, "Adaptive Noise Cancelling: Principles and Applications," Proc. IEEE, vol. 63, pp. 1692-1716, 1975.
36. J. M. McCool, B. Widrow, J. R. Zeidler, R. Hearn, and D. Chabries, "Adaptive Line Enhancer," U.S. Patent 4,238,746, December 9, 1980.
37. J. M. McCool, B. Widrow, R. Hearn, J. R. Zeidler, D. Chabries, and R. H. Moore, "An Adaptive Detector," U.S. Patent 4,243,935, January 6, 1981.
38. D. D. Falconer, "Adaptive Filter Theory and Applications," Lecture Notes in Control and Information Sciences, edited by A. Bensoussan and J. L. Lions (Springer-Verlag, 1980).
39. R. A. Monzingo and T. W. Miller, "Introduction to Adaptive Arrays" (Wiley-Interscience, 1980).
40. S. Haykin (editor), "Array Processing: Applications to Radar" (Dowden, Hutchinson, and Ross, 1980).
CHAPTER TWO
WIENER FILTERS

Estimation theory deals with the intelligent use of information derived from
observations in order to make optimum decisions about physical parameters
of interest, with the decision being weighted by all available information. By
information we mean data of practical value in the decision-making process.
The subject of estimation theory is a vast one. However, our interest will be
limited to linear estimation performed by discrete-time devices whose im-
pulse response has a finite duration. In this chapter we will consider a
tapped-delay-line filter to perform the estimation, and use the classical
Wiener filter theory for the statistical characterization of the problem.

2.1 DISCRETE-TIME LINEAR ESTIMATION

A classic problem in communication theory is the estimation of a signal of interest, which can be observed only in the presence of some additive noise. In other words, the available information about the signal, denoted by s̃(t), is contained in the received signal:

    \tilde{u}(t) = \tilde{s}(t) + \tilde{w}(t)    (2.1)

where w̃(t) is the noise. We are interested in the use of discrete-time devices to process the received signal ũ(t). To accommodate this requirement, let the signal ũ(t) be sampled uniformly at a rate equal to 1/T samples per second, where T is the sample period. The result of this sampling process is a sequence of samples, defined by

    \tilde{u}(nT) = \tilde{s}(nT) + \tilde{w}(nT)    (2.2)

where n takes on integer values. For convenience of notation, we write

    u(n) = \tilde{u}(nT)    (2.3)
    s(n) = \tilde{s}(nT)    (2.4)
    w(n) = \tilde{w}(nT)    (2.5)

Accordingly, we may rewrite Eq. (2.2) in the form

    u(n) = s(n) + w(n)    (2.6)


The observable u(n) is a random variable whose statistics are determined by
the signal s(n) and the noise w(n). The set of observables {u(n)}, corre-
sponding to different values of the time variable n, is a random time series.
Figure 2.1 depicts the particular estimation problem we wish to address.
The black box represents a discrete-time linear filter whose impulse re-
sponse, {h(n)}, has a finite duration. This filter applies a linear transformation to the input {u(n)} whose duration matches that of the filter's impulse response. Correspondingly, the estimation is said to be linear.
The filter is designed to produce at its output an estimate of some
desired response d(n). Depending on how we define d(n), we may dis-
tinguish the following two types of estimation:
1. The desired response d(n) equals the signal s(n), and the estimation is
   referred to as filtering. The requirement here is to suppress the effect of
   the additive noise w(n).
2. The desired response d(n) equals s(n + a), where a > 0, and the
estimation is referred to as prediction. In this case, the signal of interest
s(n) can only be observed in the presence of additive noise w(n), and the
requirement is to predict the value of this signal looking a units of time
into the future. Alternatively, the desired response d(n) equals u(n + a),
where a > 0. This second situation arises, for example, when the ob-
served signal u(n) consists of a speech signal and the requirement is
simply to predict its value a units of time into the future.

Figure 2.1 Linear estimation.



2.2 FORMULATION OF THE LINEAR-FILTERING PROBLEM

Consider a linear tapped-delay-line filter whose impulse response is denoted


by the sequence of numbers h(1), h(2), ..., h(M). As shown in Fig. 2.2, the
filter consists of a set of delay elements (each being represented by the
unit-delay operator z^{-1}), a corresponding set of adjustable tap gains or
coefficients h(1), h(2), ..., h(M) connected to the tap inputs, and a set of
adders for summing the resultant outputs. This filter is also referred to as a
transversal filter. The filter is driven by a random time series producing the
random variables u(n), u(n-1), ..., u(n-M+1) as the M tap inputs of
the filter. We assume that this random time series is stationary, so that it
satisfies the following two conditions:
1. The mean value of the process, denoted by m, is independent of time n,
as shown by

m = E[u(n)] = constant    (2.7)

where E is the expectation operator.
2. The autocorrelation function of the process, defined by

r(n, m) = E[u(n)u(m)]    (2.8)

depends on the time difference n - m only:

r(n, m) = r(n - m)    (2.9)
Without loss of generality, we assume from here on that the mean value
m of the process is zero.
Denote the signal produced at the filter output by y(n). We may
express y(n) by the convolution sum:

y(n) = Σ_{k=1}^{M} h(k) u(n-k+1)    (2.10)

Figure 2.2 Tapped-delay-line filter.
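As an illustrative aside, the short Python sketch below evaluates the convolution sum of Eq. (2.10) for a single time instant; the tap-coefficient and input values are assumed purely for illustration and do not come from the text.

```python
import numpy as np

def tdl_filter_output(h, u, n):
    """Compute y(n) = sum_{k=1}^{M} h(k) u(n - k + 1) of Eq. (2.10).

    h : tap coefficients h(1), ..., h(M) (index 0 holds h(1)).
    u : input samples u(0), u(1), ..., indexed by time.
    n : time index at which the output is wanted (n >= M - 1).
    """
    M = len(h)
    # Tap inputs u(n), u(n-1), ..., u(n-M+1), most recent first
    taps = u[n - M + 1 : n + 1][::-1]
    return np.dot(h, taps)

# Example with M = 3 taps and an arbitrary short input record
h = np.array([0.5, 0.3, 0.2])            # h(1), h(2), h(3)
u = np.array([1.0, -0.4, 0.9, 0.1, 0.7])
print(tdl_filter_output(h, u, n=4))      # y(4) = 0.5*u(4) + 0.3*u(3) + 0.2*u(2)
```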



We wish to design this filter so that the difference between a desired


response, d(n), and the corresponding value of the actual filter output,
y(n), is minimized in some sense. Let this difference be denoted by

e(n) = d(n) - y(n)    (2.11)

The difference e(n) is called the error signal or residual.
Typically, the filter output y(n) is different from the desired response
d(n), with the result that the error signal e(n) is nonzero. The requirement
we therefore have to meet is to optimize the design of the tapped-delay-line
filter in Fig. 2.2 so as to maintain the error signal e(n) as small as possible
in some statistical sense. In the Wiener theory, the filter is optimized by
minimizing the mean-square value of the error signal e(n).

2.3 NORMAL EQUATIONS

Let the mean-square value of the error signal be denoted by

ε = E[e²(n)]    (2.12)

This mean-square value is a real and positive scalar quantity, representing
the average power of the error signal e(n) when it is developed across a
load of 1 Ω. Substituting Eq. (2.11) in (2.12), we get

ε = E[d²(n)] - 2E[d(n)y(n)] + E[y²(n)]    (2.13)

Next, substituting Eq. (2.10) in (2.13), and then interchanging the orders of
summation and expectation in the last two terms, we get

ε = E[d²(n)] - 2 Σ_{k=1}^{M} h(k) E[d(n)u(n-k+1)]
    + Σ_{k=1}^{M} Σ_{m=1}^{M} h(k)h(m) E[u(n-k+1)u(n-m+1)]    (2.14)

Assuming that the input signal u(n) and desired response d(n) are jointly
stationary, the three terms on the right-hand side of Eq. (2.14) may be
interpreted as follows:
1. The expectation E[d²(n)] is equal to the mean-square value of the
desired response d(n):

P_d = E[d²(n)]    (2.15)
2. The expectation E[d(n)u(n — k + 1)] is equal to the cross-correlation
function of the desired response d(n) and the input signal u(n) for a lag
of k-1:

p(k-1) = E[d(n)u(n-k+1)],   k = 1, 2, ..., M    (2.16)

We may therefore rewrite the single summation term on the right-hand
side of Eq. (2.14) as follows:

Σ_{k=1}^{M} h(k) E[d(n)u(n-k+1)] = Σ_{k=1}^{M} h(k) p(k-1)    (2.17)

3. Finally, the expectation E[u(n-k+1)u(n-m+1)] is equal to the
autocorrelation function of the input signal u(n) for a lag of m-k:

r(m-k) = E[u(n-k+1)u(n-m+1)]    (2.18)

Accordingly, we may rewrite the double summation term on the right-
hand side of Eq. (2.14) in the form

Σ_{k=1}^{M} Σ_{m=1}^{M} h(k)h(m) E[u(n-k+1)u(n-m+1)]
    = Σ_{k=1}^{M} Σ_{m=1}^{M} h(k)h(m) r(m-k)    (2.19)

Thus, substituting Eqs. (2.15), (2.17), and (2.19) in (2.14), we find that the
expression for the mean squared error may be rewritten in the form

ε = P_d - 2 Σ_{k=1}^{M} h(k) p(k-1) + Σ_{k=1}^{M} Σ_{m=1}^{M} h(k)h(m) r(m-k)    (2.20)

Equation (2.20) states that, for the case when the desired response
{d(n)} and the input signal {u(n)} are jointly stationary, the mean squared
error ε is precisely a second-order function of the tap coefficients
h(1), h(2), ..., h(M) of the tapped-delay-line filter. Accordingly, we may
visualize the dependence of the mean squared error ε on the tap coefficients
as a bowl-shaped surface with a unique minimum. We refer to this surface
as the error-performance surface of the tapped-delay-line filter. The require-
ment is to design this filter so that it operates at the bottom or minimum
point of the error-performance surface.
The mean squared error ε attains its minimum value when its deriva-
tives with respect to the tap coefficients h(k), for k = 1, 2, ..., M, are
simultaneously zero. Differentiating the expression for the mean squared error
ε, defined by Eq. (2.20), with respect to h(k), we get

∂ε/∂h(k) = -2p(k-1) + 2 Σ_{m=1}^{M} h(m) r(m-k)    (2.21)

Setting this result equal to zero, we obtain the optimum values of the tap
coefficients. Let these values be denoted by h_o(1), h_o(2), ..., h_o(M). They

are given as the solution of the set of equations

Σ_{m=1}^{M} h_o(m) r(m-k) = p(k-1),   k = 1, 2, ..., M    (2.22)

This is a system of M simultaneous equations, called the normal equations
for a tapped-delay-line filter; the reason for this name will be given in
Section 2.5. The known quantities in this system of equations are the
autocorrelations {r(m-k)} of the filter input and the cross-correlations
{p(k-1)} between the desired response and the filter input. The tap
coefficients of the optimum tapped-delay-line filter are the unknowns.

2.4 THE MINIMUM MEAN SQUARED ERROR

Let ε_min denote the minimum value of the mean squared error, which results
when the tapped-delay-line filter assumes its optimum condition. Using
h_o(1), h_o(2), ..., h_o(M) for the tap coefficients in Eq. (2.20), we get

ε_min = P_d - 2 Σ_{k=1}^{M} h_o(k) p(k-1) + Σ_{k=1}^{M} Σ_{m=1}^{M} h_o(k)h_o(m) r(m-k)

      = P_d - Σ_{k=1}^{M} h_o(k) [2p(k-1) - Σ_{m=1}^{M} h_o(m) r(m-k)]    (2.23)

By substituting Eq. (2.22) in (2.23), we may simplify the expression for the
minimum mean squared error as follows:

ε_min = P_d - Σ_{k=1}^{M} h_o(k) p(k-1)    (2.24)

We refer to a tapped-delay-line filter whose impulse response is defined
by the normal equations (2.22) as optimum in the mean-square sense. There
is no other linear filter that we can design which can produce a mean
squared error [between the desired response d(n) and the filter output y(n)]
smaller in value than the minimum mean squared error ε_min of Eq. (2.24).

2.5 PRINCIPLE OF ORTHOGONALITY

The normal equations (2.22) define the tap coefficients of the optimum
tapped-delay-line filter in the minimum-mean-square sense. We may rewrite
this set of equations by using the definitions of Eqs. (2.16) and (2.18) for the
cross-correlation function p(k — 1) and autocorrelation function r(k — m),

respectively, as follows:

Σ_{m=1}^{M} h_o(m) E[u(n-m+1)u(n-k+1)] = E[d(n)u(n-k+1)],
                                              k = 1, 2, ..., M    (2.25)

Interchanging the order of expectation and summation in Eq. (2.25), and
then transposing the left-hand term to the right-hand side of the equation,
we get

E{[d(n) - Σ_{m=1}^{M} h_o(m) u(n-m+1)] u(n-k+1)} = 0,   k = 1, 2, ..., M    (2.26)

However, the summation term inside the square brackets in Eq. (2.26) is
recognized as the signal y_o(n) resulting at the optimum filter output in
response to the set of input samples u(n), u(n-1), ..., u(n-M+1). We
may therefore view y_o(n) as the minimum-mean-square estimate of the
desired response d(n), based on an input signal consisting of the samples
u(n), u(n-1), ..., u(n-M+1). Let this estimate be denoted by
d̂(n|n, ..., n-M+1). We may thus write

y_o(n) = d̂(n|n, ..., n-M+1)
       = Σ_{m=1}^{M} h_o(m) u(n-m+1)    (2.27)

Accordingly, we may rewrite Eq. (2.26) in the form

E{[d(n) - y_o(n)] u(n-k+1)} = E[e_o(n)u(n-k+1)]
                            = 0,   k = 1, 2, ..., M    (2.28)

where e_o(n) = d(n) - y_o(n) is the error signal resulting from use of the
optimum filter. Equation (2.28) states that, for the optimum filter, the error
signal and any of the tap inputs are orthogonal. This result is known as the
principle of orthogonality. Hence, we conclude that the two criteria, "mini-
mum mean squared error" and "orthogonality between error and input,"
yield identical optimum filters.
As a corollary of the principle of orthogonality, we may also state that
the error signal e_o(n) and the optimum filter output y_o(n) are orthogonal,
as shown by

E[e_o(n) y_o(n)] = 0    (2.29)


This result is obtained by using Eq. (2.27), with k in place of m, to express
Figure 2.3 Geometric interpretation of the principle of orthogonality.

the expectation of the product e_o(n) y_o(n) as follows:

E[e_o(n) y_o(n)] = E[e_o(n) Σ_{k=1}^{M} h_o(k) u(n-k+1)]
                 = Σ_{k=1}^{M} h_o(k) E[e_o(n) u(n-k+1)]
                 = 0
where in the last line we have made use of Eq. (2.28).
Equation (2.29) has an interesting geometric interpretation. If we view
the random variables representing the filter output, the desired response,
and the error signal as vectors, and recall that, by definition, the desired
response equals the filter output plus the error signal, we see that these three
vector quantities are related geometrically as shown in Fig. 2.3. In particu-
lar, the vector e_o is drawn "normal" to the vector y_o; hence the name
"normal equations." Clearly, it is only when this condition is satisfied that
the vector representing the error signal attains its minimum length.
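The principle of orthogonality is easy to check numerically. The following Python sketch (the signal model, sample size, and filter length are assumptions made only for illustration) estimates R and p from synthetic data, solves the normal equations, and confirms that the resulting error signal has essentially zero correlation with every tap input, as required by Eq. (2.28).

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50_000, 4

# Assumed jointly stationary data: d(n) is a fixed filtering of u(n) plus noise.
u = rng.standard_normal(N)
d = 0.8 * u + 0.5 * np.roll(u, 1) + 0.1 * rng.standard_normal(N)

# Rows of U are the tap-input vectors [u(n), u(n-1), ..., u(n-M+1)].
U = np.column_stack([np.roll(u, k) for k in range(M)])[M:]
d_vec = d[M:]

# Estimate R and p by sample averages, then solve the normal equations (2.35).
R = U.T @ U / len(U)
p = U.T @ d_vec / len(U)
h_o = np.linalg.solve(R, p)

# Residual e_o(n) = d(n) - y_o(n); its sample correlation with every tap input
# is (numerically) zero, in agreement with Eq. (2.28).
e_o = d_vec - U @ h_o
print(U.T @ e_o / len(U))   # all entries close to 0
```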

2.6 MATRIX FORMULATION OF THE NORMAL EQUATIONS

The normal equations (2.22) consist of a system of M simultaneous equa-


tions with the optimum filter coefficients as the unknowns. We may express
this system of equations in a compact form by using matrix notation. For
this purpose, we introduce the following definitions:
1. The M-by-1 coefficient vector of the optimum filter is denoted by

h_o = [h_o(1), h_o(2), ..., h_o(M)]^T    (2.30)

2. The M-by-1 cross-correlation vector, whose elements consist of the corre-
lation between the desired response d(n) and the tap inputs u(n),
u(n-1), ..., u(n-M+1), is given by

p = [p(0), p(1), ..., p(M-1)]^T    (2.31)

where the kth element is

p(k-1) = E[d(n)u(n-k+1)],   k = 1, 2, ..., M    (2.32)

3. The M-by-M correlation matrix, whose elements consist of the mean-
square values of the individual tap inputs u(n), u(n-1), ...,
u(n-M+1), as well as the correlations between these tap inputs, is
given by

      | r(0)      r(-1)     ...   r(1-M)  |
  R = | r(1)      r(0)      ...   r(2-M)  |    (2.33)
      | ...       ...             ...     |
      | r(M-1)    r(M-2)    ...   r(0)    |

where

r(m-k) = E[u(n-k+1)u(n-m+1)],   k, m = 1, 2, ..., M    (2.34)

Note that r(m-k) is the mkth element located at the intersection of row
m and column k of the matrix R.
Thus, using the definitions of Eqs. (2.30), (2.31), and (2.33), we may
rewrite the normal equations (2.22) in matrix form, as follows:

R h_o = p    (2.35)
This equation represents the discrete-time version of the well-known
Wiener-Hopf equation.
To solve for the coefficient vector of the optimum filter, we premultiply
both sides of Eq. (2.35) by the inverse of the correlation matrix R. Denoting
this inverse by R^{-1}, we may thus write

h_o = R^{-1} p    (2.36)

For the inverse matrix R^{-1} to exist, the correlation matrix R has to be
nonsingular. The justification for this is given in the next section.
Correspondingly, we may rewrite the expression for the minimum mean
squared error, given in Eq. (2.24), as follows:

ε_min = P_d - p^T h_o
      = P_d - p^T R^{-1} p    (2.37)
where the 1-by-M vector p^T is the transpose of the vector p. Throughout the
book, we will use the superscript T to indicate matrix transposition.
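As a minimal numerical sketch of Eqs. (2.35) through (2.37), the Python fragment below builds R and p from assumed correlation values (chosen only for illustration), solves for h_o, and evaluates ε_min.

```python
import numpy as np

# Illustrative (assumed) second-order statistics for an M = 3 filter:
r = np.array([1.0, 0.5, 0.25])        # r(0), r(1), r(2)
p = np.array([0.5, 0.4, 0.2])         # p(0), p(1), p(2)
P_d = 1.0                             # E[d^2(n)], assumed

# Correlation matrix of Eq. (2.33)/(2.41): element (m, k) is r(|m - k|).
M = len(r)
R = np.array([[r[abs(m - k)] for k in range(M)] for m in range(M)])

# Optimum tap-coefficient vector, Eq. (2.36), and minimum MSE, Eq. (2.37).
h_o = np.linalg.solve(R, p)           # equivalent to R^{-1} p
eps_min = P_d - p @ h_o
print(h_o, eps_min)
```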

2.7 PROPERTIES OF THE CORRELATION MATRIX

The correlation matrix R of the filter input plays a key role in the solution
of the optimum filter, as evidenced by the matrix form of the normal
equations in (2.35). The development of efficient procedures for the
computation of this solution capitalizes on certain properties of the correla-
tion matrix, as we will see in subsequent chapters. It is therefore important
that we understand the properties of the correlation matrix R and their
implications.
Using the definition given in Section 2.6, we find that the correlation
matrix R of a stationary process has the following properties:

Property 1. The correlation matrix R is symmetric, that is,

R^T = R    (2.38)

where R^T is the transpose of R.


We express the M-by-M correlation matrix R in the form

R = E[u(n)u^T(n)]    (2.39)

where u(n) is the M-by-1 tap-input vector, defined by

u(n) = [u(n), u(n-1), ..., u(n-M+1)]^T    (2.40)

By substituting Eq. (2.40) in (2.39) and expanding, it is a straightforward
matter to show that the result is the same as in Eq. (2.33). Taking the
transpose of both sides of Eq. (2.39), we get the result given in Eq. (2.38).
The statement that R^T = R is equivalent to saying that the mkth
element and kmth element of the correlation matrix R are equal. Accord-
ingly, the expanded form for the correlation matrix R takes on the following
special structure:

      | r(0)      r(1)      ...   r(M-1)  |
  R = | r(1)      r(0)      ...   r(M-2)  |    (2.41)
      | ...       ...             ...     |
      | r(M-1)    r(M-2)    ...   r(0)    |

Property 2. The correlation matrix R is Toeplitz, that is, the elements on


its main diagonal are equal and so are the elements on any other diagonal
parallel to the main diagonal.
In Eq. (2.41), we see that the elements on the main diagonal of the
correlation matrix R have the common value r(0), the elements on the first
diagonal above or below the main diagonal have the common value r(1),
and so forth. A matrix having this property is called a Toeplitz matrix in
honor of the mathematician O. Toeplitz. It is important to recognize,
however, that the Toeplitz property of the correlation matrix R is a direct
consequence of the assumption that the random time series applied to the
input of the transversal filter is stationary. Indeed, we may state that if the
filter input is stationary, the correlation matrix R must be Toeplitz. Con-
versely, if the correlation matrix R is Toeplitz, the filter input must be
stationary.

Property 3. The correlation matrix R is almost always positive definite.


Let x be an arbitrary M-by-1 vector. Define the scalar random variable

a = x^T u(n) = u^T(n) x

where u(n) is the M-by-1 tap-input vector. The mean-square value of the
random variable a equals the quadratic form x^T R x, as shown by

E[a²] = E[x^T u(n) u^T(n) x]
      = x^T E[u(n)u^T(n)] x
      = x^T R x

Since

E[a²] ≥ 0

it follows that

x^T R x ≥ 0

A quadratic form that satisfies this condition is said to be nonnegative
definite or positive semidefinite. Accordingly, we may state that the correla-
tion matrix R is always nonnegative definite.
If the quadratic form x^T R x satisfies the condition

x^T R x > 0
for every x, we say that the correlation matrix R is positive definite. In
practice, we find that for a stationary time series this condition is satisfied
with a probability close to one, so that the correlation matrix R is almost
always positive definite.
The positive definiteness of a matrix has several implications. One
important implication is that a positive definite matrix is nonsingular. We

say that a matrix is nonsingular if its inverse does exist; otherwise, it is


singular. Other implications of a positive definite matrix are discussed later
in the next section and in Chapter 3.
In summary, the correlation matrix of a real-valued stationary time
series is symmetric, Toeplitz, always nonnegative definite, and almost al-
ways positive definite and nonsingular.
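These properties are readily verified numerically. In the sketch below, the correlation matrix is built from an assumed autocorrelation sequence r(k) = 0.9^|k| (an illustrative choice, not taken from the text), and the symmetric, Toeplitz, and positive definite properties are checked in turn.

```python
import numpy as np

# Assumed autocorrelation sequence r(k) = 0.9**|k| of a stationary process.
M = 5
r = 0.9 ** np.arange(M)
R = np.array([[r[abs(m - k)] for k in range(M)] for m in range(M)])

# Property 1: symmetric.
print(np.allclose(R, R.T))

# Property 2: Toeplitz -- every diagonal is constant.
is_toeplitz = all(np.allclose(np.diag(R, k), np.diag(R, k)[0])
                  for k in range(-M + 1, M))
print(is_toeplitz)

# Property 3: positive definite -- all eigenvalues strictly positive.
print(np.linalg.eigvalsh(R).min() > 0)
```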

2.8 REPRESENTATION OF THE CORRELATION MATRIX
IN TERMS OF ITS EIGENVALUES AND EIGENVECTORS

Computations with a positive definite matrix, such as the correlation matrix,


may be simplified by using two sets of parameters known as the eigenvalues
and eigenvectors of the matrix.
Let R be an M-by-M matrix. Then q is an eigenvector of R correspond-
ing to the eigenvalue λ if

q ≠ 0
Rq = λq    (2.42)

The requirement that q ≠ 0 is necessary, because if q equals the null vector
0, then any number λ satisfies the equation Rq = λq.
From the second line of Eq. (2.42) we observe that the eigenvalue λ is
measured in units of power, the same as the autocorrelation function. On
the other hand, the eigenvector is dimensionless. It is also important to
realize that an eigenvector may correspond to only one eigenvalue, but an
eigenvalue may have many eigenvectors. For example, if q is an eigenvector
with eigenvalue λ, then so is aq for any a ≠ 0.
From the second line of Eq. (2.42) we see that λ is an eigenvalue of R if
and only if the determinant

det(R - λI) = 0    (2.43)

where I is the M-by-M identity matrix. Thus the eigenvalues of R are
precisely those values of λ which satisfy Eq. (2.43). This equation is called
the characteristic equation. The function

f(λ) = det(R - λI)

is a polynomial in λ, whose leading term is (-1)^M λ^M, where M is the order
of the matrix R. It follows therefore that the characteristic equation has M
roots, which are the eigenvalues of R.
In Appendix 1, it is shown that if the matrix R is positive definite, the
eigenvalues of R are both real and positive. Since the correlation matrix R
of a stationary time series is (almost always) positive definite, it follows that
its eigenvalues are likewise both real and positive.
We next point out a notational device that often simplifies manipu-
lations with eigenvalues and eigenvectors. Let the eigenvectors q_1, q_2, ..., q_M

satisfy the set of equations

Rq_i = λ_i q_i,   i = 1, 2, ..., M    (2.44)

Define the M-by-M matrix

Q = [q_1, q_2, ..., q_M]    (2.45)

and the diagonal M-by-M matrix

Λ = diag(λ_1, λ_2, ..., λ_M)    (2.46)

Then it is a straightforward matter to show that the set of equations (2.44) is
equivalent to the single matrix equation:

RQ = QΛ    (2.47)

Let q_1, q_2, ..., q_M be the eigenvectors of R corresponding to the eigen-
values λ_1, λ_2, ..., λ_M, respectively. When the eigenvalues of the correlation
matrix are distinct, the associated eigenvectors are orthogonal, and the
matrix Q is nonsingular (see Appendix 1). Hence, upon premultiplying both
sides of Eq. (2.47) by the inverse matrix Q^{-1}, we get

Q^{-1} R Q = Λ    (2.48)

A matrix transformation of special interest is the unitary similarity trans-
formation, for which we have

Q^{-1} = Q^T    (2.49)

or equivalently

Q^T Q = I    (2.50)

where Q^T is the transpose of matrix Q. A matrix Q that satisfies this
condition is called a unitary matrix. Accordingly, we may rewrite Eq. (2.48)
in the form

Q^T R Q = Λ    (2.51)

Note that substitution of Eq. (2.45) in (2.50) yields

q_i^T q_k = { 1,   i = k
              0,   otherwise    (2.52)

In other words, in a unitary similarity transformation the eigenvectors that
constitute the individual columns of the unitary matrix Q satisfy two
conditions:
1. The eigenvectors are orthogonal to each other.
2. Each eigenvector is normalized to have a length of one, with the squared
length of the eigenvector q_k defined as q_k^T q_k.
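The following Python sketch illustrates Eqs. (2.44) through (2.52); the correlation matrix is again built from an assumed autocorrelation sequence, and the symmetric eigensolver in numpy supplies the unitary matrix Q and the diagonal matrix Λ.

```python
import numpy as np

# Assumed Toeplitz correlation matrix from r(k) = 0.9**|k|, M = 4.
M = 4
r = 0.9 ** np.arange(M)
R = np.array([[r[abs(m - k)] for k in range(M)] for m in range(M)])

# For a symmetric R, numpy returns real eigenvalues and orthonormal
# eigenvectors, i.e. the columns of Q satisfy Eq. (2.52).
lam, Q = np.linalg.eigh(R)
Lam = np.diag(lam)

print(np.allclose(Q.T @ Q, np.eye(M)))    # Eq. (2.50): Q^T Q = I
print(np.allclose(Q.T @ R @ Q, Lam))      # Eq. (2.51): Q^T R Q = Lambda
print(lam)                                # real and positive, as expected
```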
2.9 CANONICAL FORM OF ERROR-PERFORMANCE SURFACE

Having familiarized ourselves with the unitary similarity transformation, we
next use it to develop further insight into the error-performance surface.
Equation (2.20) defines the dependence of the mean squared error ε on the
coefficients of the tapped-delay-line filter. By using the definitions of Eqs.
(2.30), (2.31), and (2.33), we may reformulate this expression for the mean
squared error ε in matrix form as follows:

ε = P_d - 2p^T h + h^T R h    (2.53)

where P_d is a fixed term representing the mean-square value of the desired
response, p is the cross-correlation vector between the desired response and
the tap inputs of the filter, R is the correlation matrix of the tap inputs, and
h is the tap-coefficient vector of the filter. Taking the transpose of both sides
of the matrix form of the normal equations in (2.35) and using the
symmetric property of the correlation matrix R, we have

h_o^T R = p^T    (2.54)

where h_o is the optimum value of the tap-coefficient vector. Hence, sub-
stituting Eq. (2.54) in (2.53), we get

ε = P_d - 2h_o^T R h + h^T R h    (2.55)

The mean squared error attains its minimum value ε_min when the
tap-coefficient vector of the filter assumes its optimum value h_o. The
minimum mean squared error ε_min is defined by Eq. (2.37). Substituting Eq.
(2.54) in the first line of Eq. (2.37), we get

ε_min = P_d - h_o^T R h_o    (2.56)

Subtracting Eq. (2.56) from (2.55), we may thus rewrite the expression
for the mean squared error as follows:

ε = ε_min + (h - h_o)^T R (h - h_o)    (2.57)
This equation shows explicitly the unique optimality of the minimizing
filter-coefficient vector h_o.
Although the quadratic form on the right-hand side of Eq. (2.57) is
quite informative, nevertheless, it is desirable to change the basis on which it
is defined so that the representation of the error-performance surface is
simplified. In particular, we use the unitary similarity transformation of Eq.
(2.51). Thus, by substituting Eq. (2.51) in (2.57), we may rewrite the
expression for the mean squared error as follows:

ε = ε_min + (h - h_o)^T Q Λ Q^T (h - h_o)    (2.58)

Let the M-by-1 vector v denote the transformed version of the difference
between the filter coefficient vector h and its optimum value h_o:

v = Q^T (h - h_o)    (2.59)

Using Eq. (2.59) in (2.58), we may then express the mean squared error in
the new form

ε = ε_min + v^T Λ v    (2.60)

Since the matrix Λ is a diagonal matrix, the quadratic form v^T Λ v is in its
canonical form in that it contains no cross-product terms, as shown by

v^T Λ v = Σ_{k=1}^{M} λ_k v_k²    (2.61)

where λ_k is the kth eigenvalue of the correlation matrix R, and v_k is the
kth component of the transformed coefficient error vector v. The feature
that makes the canonical form of Eq. (2.61) a useful representation of the
error-performance surface is the fact that the components of the vector v
are uncoupled from each other. In other words, the M components of the
transformed coefficient error vector v constitute the principal axes of the
error-performance surface. The significance of this result will become
apparent in the next chapter.
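The canonical form is easily verified numerically. In the sketch below, the second-order statistics and the perturbation applied to the optimum coefficient vector are assumed purely for illustration; the point is that Eq. (2.53) and Eq. (2.60) give the same mean squared error.

```python
import numpy as np

# Assumed statistics: r(k) = 0.9**|k|, cross-correlations p, and P_d = 1.
M = 3
r = 0.9 ** np.arange(M)
R = np.array([[r[abs(m - k)] for k in range(M)] for m in range(M)])
p = np.array([0.5, 0.4, 0.3])
P_d = 1.0

h_o = np.linalg.solve(R, p)
eps_min = P_d - p @ h_o
lam, Q = np.linalg.eigh(R)

def mse(h):
    """Mean squared error of Eq. (2.53)."""
    return P_d - 2 * p @ h + h @ R @ h

# An arbitrary (non-optimum) coefficient vector, used to verify Eq. (2.60)/(2.61).
h = h_o + np.array([0.2, -0.1, 0.05])
v = Q.T @ (h - h_o)                            # Eq. (2.59)
print(mse(h), eps_min + np.sum(lam * v**2))    # the two values agree
```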

2.10 NOTES

In 1795, Gauss [1] used an estimation procedure called the method of least
squares in his efforts to determine the orbital parameters of the asteroid
Ceres. Accordingly, the method of least squares is credited to Gauss even
though it was first published by Legendre [2] in 1805. Since then, there has
been a vast literature on various aspects of the least-squares method. In
particular, Kolmogorov [3] in 1941 and Wiener [4] in 1942 reintroduced and
reformulated independently the linear least-squares problem for the filter-
ing, smoothing, and prediction of stochastic processes. Kolmogorov studied
discrete-time problems and solved them by using a recursive orthogonaliza-
tion procedure known as the Wold decomposition. Wiener, on the other
hand, studied the continuous-time problem and formulated the optimum
filter in terms of the famous Wiener—Hopf integral equation, which requires
knowledge of the correlation functions of the signal process. The solution of
this equation is rather difficult for all but the simplest problems. Equation
(2.35) is the matrix form of the discrete-time version of the Wiener—Hopf
equation.
When the filter input is a real-valued stationary process, the correlation
matrix contained in this equation is symmetric, Toeplitz, nonnegative defi-
nite, and almost always positive definite in practice. For a discussion of
Toeplitz matrices, see Grenander and Szegő [5] and Widom [6]. For a

discussion of the positive definiteness of the correlation matrix, see Ferguson


[9] and Feller [10].
The eigenvalue problem is discussed by Guillemin [7] and Hadley [8].
For a description of numerical methods used to compute eigenvalues and
eigenvectors, see Stewart [11]. The diagonalization of the correlation matrix
as a means of expressing the error performance surface in terms of its
principal coordinates is discussed by Widrow [12].

REFERENCES

1. C. F. Gauss, "Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium," Hamburg, 1809 [translation: Dover, 1963].
2. A. M. Legendre, "Méthode des Moindres Quarrés, pour Trouver le Milieu le Plus Probable entre les Résultats de Différentes Observations," Mem. Inst. France, pp. 149-154, 1810.
3. A. N. Kolmogorov, "Interpolation and Extrapolation of Stationary Random Sequences," Bull. Acad. Sci. USSR, Ser. Math., vol. 5, 1941.
4. N. Wiener, "Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications," MIT Press, 1949 (originally issued in February 1942 as a classified National Defense Research Council report).
5. U. Grenander and G. Szegő, "Toeplitz Forms and Their Applications" (University of California Press, 1958).
6. H. Widom, "Toeplitz Matrices," in "Studies in Real and Complex Analysis," edited by I. I. Hirschman, Jr., MAA Studies in Mathematics (Prentice-Hall, 1965).
7. E. A. Guillemin, "The Mathematics of Circuit Analysis" (John Wiley and Sons, 1949).
8. G. Hadley, "Linear Algebra" (Addison-Wesley, 1964).
9. T. S. Ferguson, "Mathematical Statistics" (Academic Press, 1967).
10. W. Feller, "An Introduction to Probability Theory and Its Applications," vol. II (Wiley, 1966).
11. G. W. Stewart, "Introduction to Matrix Computations" (Academic Press, 1973).
12. B. Widrow, "Adaptive Filters," in "Aspects of Network and System Theory," edited by R. E. Kalman and N. de Claris (Holt, Rinehart and Winston, 1970).
CHAPTER THREE

LINEAR PREDICTION

A problem of special interest in signal processing is that of prediction, where


the requirement is to use a finite set of sample values of a stationary process
to predict a sample value of the process some time into the future. We say
that the prediction is linear if it is made by performing a linear filtering
operation on the given set of samples of the process. Our interest in this
book is confined to the linear-prediction problem. We refer to the filter
designed to make the prediction as a predictor. The difference between the
actual sample value of the process at the time of interest and the predictor
output is referred to as the prediction error. According to the Wiener filter
theory, the predictor is designed to minimize the mean-square value of the
prediction error.
The prediction described above is said to be in the forward direction, as
the intention is to look into the future. We may also design a predictor that
operates on the finite set of sample values of the process to make a
prediction of a sample value of the process some time into the past. We
refer to this form of prediction as backward prediction.
The application of the Wiener filter theory to the linear-prediction
problem, be it in the forward or backward direction, results in a tapped-
delay-line structure for the predictor. It turns out that both the forward and
backward forms of prediction may indeed be combined into a single
structure, known as the lattice predictor, which has several interesting
properties that make it a useful signal-processing tool. The development of a
lattice predictor is considered later in the chapter.
We begin the study of linear prediction by reformulating the normal
equations (developed in Chapter 2) for forward linear prediction.


3.1 THE NORMAL EQUATIONS FOR FORWARD LINEAR PREDICTION

Consider the time series u(n-1), u(n-2), ..., u(n-M) obtained from a
stationary process. In the forward linear prediction (FLP) problem, this set
of samples is used to make a prediction of u(n). We refer to this special
form of forward prediction as one-step prediction, as we are looking exactly
one step into the future. Let û(n|n-1, ..., n-M) denote the value of
this prediction. Although this notation may at first sight appear cumber-
some, nevertheless, it properly describes the one-step prediction at time n,
given the sample values at times n-1, ..., n-M. With u(n) denoting the
actual sample value of the process at time n, we define the forward
prediction error as

f_M(n) = u(n) - û(n|n-1, ..., n-M)    (3.1)

In effect, u(n) plays the role of the desired response. The use of subscript M
in the symbol f_M(n) is intended to emphasize the fact that M sample values
of the process are used to make the prediction.
For the predictor we use a tapped-delay-line filter, as in Fig. 3.1, which
is optimized by minimizing the mean-square value of f_M(n) with respect to
the tap coefficients of the filter. We thus write

û(n|n-1, ..., n-M) = Σ_{k=1}^{M} h_o(k) u(n-k)    (3.2)

where u(n-1), u(n-2), ..., u(n-M) are the tap inputs, and h_o(1),
h_o(2), ..., h_o(M) are the predictor coefficients.
To solve for these predictor coefficients we use the normal equations
adapted to suit the one-step forward-prediction problem. To help with this
adaptation, we have set up a table of correspondences between the quanti-

Figure 3.1 Forward predictor.



ties appearing in Fig. 3.1 and those in Fig. 2.2; it is given in Table 3.1. Thus,
adapting the normal equations (2.22) to the one-step prediction problem, in
accordance with the correspondences indicated in the table, we may write

Σ_{m=1}^{M} h_o(m) r(m-k) = r(k),   k = 1, 2, ..., M    (3.3)

where r(m-k) is the correlation between the "tap inputs":

r(m-k) = E[u(n-k)u(n-m)],   k, m = 1, 2, ..., M    (3.4)

and r(k) is the correlation between the "desired response" and the "tap
inputs":

r(k) = E[u(n)u(n-k)],   k = 1, 2, ..., M

We see therefore that we only need to know the autocorrelation function of
the input process for different lags in order to solve the normal equations
(3.3) for the one-step predictor coefficients.
Similarly, using the correspondences given in Table 3.1 to adapt Eq.
(2.24) for the minimum mean squared error into a form suitable for the
one-step predictor problem, we may express the mean-square value of the
forward prediction error as

P_{f,M} = E[f_M²(n)]
        = r(0) - Σ_{m=1}^{M} h_o(m) r(m)    (3.5)

where r(0) is the mean-square value of the desired response u(n), given by

r(0) = E[u²(n)]

and r(m) is the correlation between the desired response and the tap inputs
for lag m = 1, 2, ..., M.
The normal equations (3.3) for one-step prediction and Eq. (3.5) for the
mean-square value of the forward prediction error are formulated in terms
of the predictor coefficients h_o(1), h_o(2), ..., h_o(M). We may combine these
equations into a single set by introducing a new set of filter coefficients

Table 3.1

Description         Tapped-delay-line filter of Fig. 3.1    Tapped-delay-line filter of Fig. 2.2
Tap inputs          u(n-1), u(n-2), ..., u(n-M)             u(n), u(n-1), ..., u(n-M+1)
Desired response    u(n)                                    d(n)
Error signal        f_M(n)                                  e(n)

defined by

a_M(m) = { 1,         m = 0
           -h_o(m),   m = 1, ..., M    (3.6)
           0,         m > M

Accordingly, we may reformulate the normal equations (3.3) by moving
r(k) inside the summation on the left-hand side, and so write

Σ_{m=0}^{M} a_M(m) r(m-k) = 0,   k = 1, 2, ..., M    (3.7)

Similarly, we may reformulate Eq. (3.5) by moving r(0) inside the summa-
tion, and so write

Σ_{m=0}^{M} a_M(m) r(m) = P_{f,M}    (3.8)

We may go one step further by recognizing that the summation on the
left-hand side of Eq. (3.8) equals the summation on the left-hand side of Eq.
(3.7) for k = 0. This suggests that we may combine Eqs. (3.7) and (3.8) into
a single set of M+1 simultaneous equations by allowing the variable k to
take on values defined by 0 ≤ k ≤ M. We may thus write

Σ_{m=0}^{M} a_M(m) r(m-k) = { P_{f,M},   k = 0
                              0,         k = 1, 2, ..., M    (3.9)

We refer to this set of equations as the augmented normal equations for
forward linear prediction.
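The augmented normal equations (3.9) can be checked numerically, as in the following Python sketch; the autocorrelation sequence used here is assumed only for illustration.

```python
import numpy as np

# Assumed autocorrelation sequence r(0), ..., r(M) of the input process.
r = np.array([1.0, 0.8, 0.5, 0.2])     # M = 3
M = len(r) - 1

# Solve the normal equations (3.3) for the one-step predictor coefficients h_o.
R = np.array([[r[abs(m - k)] for k in range(M)] for m in range(M)])
h_o = np.linalg.solve(R, r[1:M + 1])

# Prediction-error filter coefficients a_M(m) of Eq. (3.6).
a = np.concatenate(([1.0], -h_o))

# Left-hand side of the augmented normal equations (3.9) for k = 0, 1, ..., M.
lhs = np.array([sum(a[m] * r[abs(m - k)] for m in range(M + 1))
                for k in range(M + 1)])
print(lhs)   # first entry is P_{f,M}; the remaining entries are (numerically) zero
```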

Expression for the Forward Prediction Error in Terms of the
Prediction-Error-Filter Coefficients

One other reformulation that we wish to do is to define the forward
prediction error f_M(n) in terms of the new coefficients. With this in mind,
we substitute Eq. (3.2) in (3.1) and then use the definition of Eq. (3.6),
obtaining

f_M(n) = Σ_{k=0}^{M} a_M(k) u(n-k)    (3.10)

This input-output relation leads to the tapped-delay-line structure shown in
Fig. 3.2, where the output f_M(n) is produced in response to the tap inputs
u(n), u(n-1), ..., u(n-M). This new structure is called the prediction-
error filter.

Figure 3.2 Prediction-error filter.

The relationship between the one-step predictor and the prediction-
error filter is illustrated in Fig. 3.3. This figure clearly shows that the
impulse response of the prediction-error filter is longer than that of the
one-step predictor by one sample period. Nevertheless, we will use M to


denote the order of either device, where M is the number of input samples
used to make the one-step prediction.

3.2 THE NORMAL EQUATIONS FOR BACKWARD LINEAR PREDICTION

Consider next the backward linear-prediction (BLP) problem. In this case
we use the time series u(n), u(n-1), ..., u(n-M+1) to make a predic-
tion of u(n-M). Let û(n-M|n, ..., n-M+1) denote the result of this
prediction. With u(n-M) denoting the actual sample value of the process
at time n-M, we define the backward prediction error as

b_M(n) = u(n-M) - û(n-M|n, ..., n-M+1)    (3.11)

where u(n-M) plays the role of the desired response. Here again we have
used the subscript M in the symbol b_M(n) to indicate that M input
samples are used to make the backward one-step prediction.

Figure 3.3 Relationship between the predictor and the prediction-error filter.

Table 3.2

Description         Tapped-delay-line filter of Fig. 3.4    Tapped-delay-line filter of Fig. 2.2
Tap inputs          u(n), u(n-1), ..., u(n-M+1)             u(n), u(n-1), ..., u(n-M+1)
Desired response    u(n-M)                                  d(n)
Error signal        b_M(n)                                  e(n)

For the predictor we again use a linear tapped-delay-line structure, as in
Fig. 3.4. Thus with u(n), u(n-1), ..., u(n-M+1) acting as tap inputs,
we may express the backward one-step prediction as

û(n-M|n, ..., n-M+1) = Σ_{k=1}^{M} g_o(k) u(n-k+1)    (3.12)

where g_o(1), g_o(2), ..., g_o(M) are the predictor coefficients, optimized in
the mean-square sense.
To solve for these coefficients we use the normal equations adapted to
suit the one-step backward prediction problem. To help with this adapta-
tion, we have set up, in Table 3.2, the table of correspondences between the
quantities appearing in Fig. 3.4 and those in Fig. 2.2. Thus adapting the
normal equations (2.22) to the backward-prediction problem, in accordance
with the correspondences indicated in Table 3.2, we may write

Σ_{m=1}^{M} g_o(m) r(m-k) = r(M-k+1),   k = 1, 2, ..., M    (3.13)

where r(m-k) is the correlation between the "tap inputs," given by

r(m-k) = E[u(n-k+1)u(n-m+1)],   k, m = 1, 2, ..., M    (3.14)

Figure 3.4 Backward predictor.



and r(M-k+1) is the correlation between the "desired response" and
the "tap inputs,"

r(M-k+1) = E[u(n-k+1)u(n-M)]

Here again we see that only knowledge of the autocorrelation function of
the input process for different lags is needed to solve the normal equations
(3.13) for backward linear prediction.
Similarly, using the correspondences given in Table 3.2 to adapt Eq.
(2.24) for the minimum mean squared error into a form appropriate for the
backward linear-prediction problem, we may express the mean-square value
of the backward prediction error as follows:

P_{b,M} = E[b_M²(n)]
        = r(0) - Σ_{m=1}^{M} g_o(m) r(M-m+1)    (3.15)

where r(0) is the mean-square value of the "desired response" and
r(M-m+1) is the correlation between the "desired response" and the "tap
inputs".
If in Eq. (3.13) we replace m with M-m+1, replace k with M-k+1,
and also recognize that, for a stationary real-valued process, r(k-m)
equals r(m-k), we may rewrite this equation in the following equivalent
form:

Σ_{m=1}^{M} g_o(M-m+1) r(m-k) = r(k),   k = 1, 2, ..., M    (3.16)

Comparing Eqs. (3.16) and (3.3), we see that they have the same mathemati-
cal form with

h_o(m) = g_o(M-m+1),   m = 1, 2, ..., M    (3.17)

Equivalently, we may write

g_o(m) = h_o(M-m+1),   m = 1, 2, ..., M    (3.18)

Equation (3.18) suggests that we may use the forward predictor, with its
coefficients arranged in reverse order as in Fig. 3.5, to compute the back-
ward prediction error b_M(n).
If in Eq. (3.5) we replace m with M-m+1, and then use Eq. (3.18),
we find that

P_{b,M} = P_{f,M}    (3.19)

That is, for a stationary input process the backward prediction error b_M(n)
and the forward prediction error f_M(n) have exactly the same mean-square
value.

As with the forward linear prediction, the next manipulation we wish to
perform is to combine Eqs. (3.13) and (3.15) into a single set of M+1
simultaneous equations. To do this, we first rewrite the normal equations for
backward linear prediction in terms of the forward predictor coefficients by
substituting Eq. (3.18) in (3.13). The result of this substitution is

Σ_{m=1}^{M} h_o(M-m+1) r(m-k) = r(M-k+1),   k = 1, 2, ..., M    (3.20)

Next, we replace k-1 with j and replace m-1 with l, obtaining the
result

Σ_{l=0}^{M-1} h_o(M-l) r(l-j) = r(M-j),   j = 0, 1, ..., M-1

Clearly, the meaning of this equation is unaffected if we replace l with m,
replace j with k, and thus write

Σ_{m=0}^{M-1} h_o(M-m) r(m-k) = r(M-k),   k = 0, 1, ..., M-1    (3.21)

We could have indeed obtained Eq. (3.21) directly from Eq. (3.20) by
replacing k-1 with k and m-1 with m. The only reason for making the
substitutions in two stages was for the sake of clarity. In any event, we
observe in Eq. (3.21) that r(M-k) equals r(m-k) for m = M. Hence,
moving the term r(M-k) inside the summation on the left-hand side of
Eq. (3.21), and also using Eq. (3.6), we may rewrite the normal equations for
backward prediction in terms of the forward prediction-error filter coeffi-
cients as follows:

Σ_{m=0}^{M} a_M(M-m) r(m-k) = 0,   k = 0, 1, ..., M-1    (3.22)

Figure 3.5 Realization of the backward predictor using forward predictor coefficients in reverse
order.

Next, we reformulate the expression for the mean-square value of the
backward prediction error in terms of the forward predictor coefficients by
substituting Eq. (3.18) in (3.15). The result of this substitution is

P_{b,M} = r(0) - Σ_{m=1}^{M} h_o(M-m+1) r(M-m+1)

If now we replace m-1 with m, we get the result

P_{b,M} = r(0) - Σ_{m=0}^{M-1} h_o(M-m) r(M-m)    (3.23)

We observe that r(0) equals r(M-m) for m = M. Therefore, moving the
term r(0) inside the summation on the right-hand side of Eq. (3.23), and
also using Eq. (3.6), we may express the mean-square value of the backward
prediction error in terms of the forward prediction-error filter coefficients as
follows:

P_{b,M} = Σ_{m=0}^{M} a_M(M-m) r(M-m)

For a stationary real-valued process, r(M-m) equals r(m-M). Hence,
we may also write

P_{b,M} = Σ_{m=0}^{M} a_M(M-m) r(m-M)    (3.24)

Finally, we observe that the summation on the right-hand side of Eq.
(3.24) equals the summation on the left-hand side of Eq. (3.22) for k = M.
This suggests that we may combine Eqs. (3.22) and (3.24) into a single set of
M+1 simultaneous equations, as follows:

Σ_{m=0}^{M} a_M(M-m) r(m-k) = { 0,         k = 0, 1, ..., M-1
                                P_{b,M},   k = M            (3.25)

We refer to this set of equations as the augmented normal equations for
backward linear prediction.

Expression for the Backward Prediction Error in Terms of the Forward
Prediction-Error Filter Coefficients

Equations (3.11) and (3.12) define the backward prediction error b_M(n) in
terms of the backward predictor coefficients. To reformulate b_M(n) in terms
of the forward prediction-error filter coefficients, we first substitute Eq.

(3.12) in (3.11), obtaining

b_M(n) = u(n-M) - Σ_{k=1}^{M} g_o(k) u(n-k+1)

Since g_o(k) = h_o(M-k+1), from Eq. (3.18), we have

b_M(n) = u(n-M) - Σ_{k=1}^{M} h_o(M-k+1) u(n-k+1)    (3.26)

Adapting the definition of Eq. (3.6) to our present situation, we may write

a_M(M-k+1) = { 1,              k = M+1
               -h_o(M-k+1),    k = 1, ..., M

Hence, we may rewrite Eq. (3.26) as follows:

b_M(n) = Σ_{k=1}^{M+1} a_M(M-k+1) u(n-k+1)

Finally, replacing k-1 with k, we get the desired expression for the
backward prediction error in terms of the forward prediction-error filter
coefficients:

b_M(n) = Σ_{k=0}^{M} a_M(M-k) u(n-k)    (3.27)
This expression suggests the configuration of Fig. 3.6, based on the forward
prediction-error filter coefficients, for computing the backward prediction
error in response to the tap inputs u(n), u(n — 1),...,u(n — M) arranged
in the same way as for the forward prediction-error computation.

Figure 3.6 Backward prediction-error filter, based on forward prediction-error filter coefficients.
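Equations (3.10) and (3.27) amount to applying the same coefficient sequence once in natural order and once in reverse order, as the following Python sketch illustrates (the coefficient and input values are assumed for illustration).

```python
import numpy as np

def prediction_errors(a, u, n):
    """Forward and backward prediction errors of Eqs. (3.10) and (3.27).

    a : prediction-error filter coefficients a_M(0), ..., a_M(M).
    u : input samples indexed by time.
    n : current time index (n >= M).
    """
    M = len(a) - 1
    taps = u[n - M : n + 1][::-1]          # u(n), u(n-1), ..., u(n-M)
    f = np.dot(a, taps)                    # f_M(n) = sum_k a_M(k) u(n-k)
    b = np.dot(a[::-1], taps)              # b_M(n) = sum_k a_M(M-k) u(n-k)
    return f, b

a = np.array([1.0, -0.9, 0.2])             # an assumed a_2(0), a_2(1), a_2(2)
u = np.array([0.3, 1.1, -0.5, 0.7, 0.2])
print(prediction_errors(a, u, n=4))
```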

3.3 THE LEVINSON-DURBIN RECURSION

Suppose we know the solution to the set of M+ 1 augmented normal


equations (3.9) for the case of forward linear prediction of order M, and the
requirement is to utilize this solution in order to solve for the forward linear
prediction of order M + 1 (i.e. one order higher). If, indeed, we have the
recursive solution to this problem, then we can start with the elementary
case of order M equal to zero, for which the solution is trivial. We can then
use this result to compute the solution for the order M + 1 equal to one,
and continue in this fashion until we reach the desired value for the order of
the predictor.
To proceed with the development of this order update, define

Δ_M = Σ_{m=0}^{M} a_M(m) r(M+1-m)    (3.28)

Since the correlation matrix R is symmetric, we have r(M+1-m) = r(m-M-1).
Accordingly, we may also express Δ_M as follows:

Δ_M = Σ_{m=0}^{M} a_M(m) r(m-M-1)    (3.29)

We note that the summation on the right-hand side of Eq. (3.29) equals the
summation on the left-hand side of the augmented normal equations (3.9)
for the case of forward linear prediction of order M and with k = M+1.
We may therefore combine Eqs. (3.9) and (3.29) into a single set of M+2
simultaneous equations as follows:

Σ_{m=0}^{M} a_M(m) r(m-k) = { P_{f,M},   k = 0
                              0,         k = 1, ..., M    (3.30)
                              Δ_M,       k = M+1
In this set of equations the variable k takes on values inside the interval
0 ≤ k ≤ M+1, whereas the variable m takes on values inside the interval
0 ≤ m ≤ M. Our ultimate aim is to develop a set of augmented normal
equations for forward linear prediction of order M+1, which requires that
m take on the same range of values as k. To do this, we first recognize that
the prediction-error filter coefficient a_M(M+1) is zero, because for a filter
of order M this coefficient is nonexistent. This means that a_M(M+1)r(M+1-k)
is also zero, regardless of the value of r(M+1-k). The term
a_M(M+1)r(M+1-k) equals a_M(m)r(m-k) for m = M+1.
Accordingly, we may extend the summation on the left-hand side of Eq.
(3.30) up to M+1 without affecting the validity of this equation in any

way. We may thus write

Σ_{m=0}^{M+1} a_M(m) r(m-k) = { P_{f,M},   k = 0
                                0,         k = 1, ..., M    (3.31)
                                Δ_M,       k = M+1

where both k and m now lie inside the same range of values (0, M+1).
Consider next the augmented normal equations (3.25) for backward
linear prediction of order M. In order to combine Eq. (3.28) with (3.25), we
have to first make the summation on the right-hand side of Eq. (3.28) take
on a form compatible with the summation on the left-hand side of Eq.
(3.25). To do this, we replace m with M-m in Eq. (3.28) and so rewrite
this equation in the form

Δ_M = Σ_{m=0}^{M} a_M(M-m) r(m+1)    (3.32)

Now we see that the summation on the right-hand side of Eq. (3.32) equals
the summation on the left-hand side of Eq. (3.25) with k = -1. Hence, we
may combine Eqs. (3.25) and (3.32) into a single set of M+2 simultaneous
equations as follows:

Σ_{m=0}^{M} a_M(M-m) r(m-k) = { Δ_M,       k = -1
                                0,         k = 0, ..., M-1    (3.33)
                                P_{b,M},   k = M

Here again we see that the variables k and m have different ranges of
values; k lies inside the interval -1 ≤ k ≤ M, whereas m lies inside the
interval 0 ≤ m ≤ M. We may make both k and m take on the same range
of values by again recognizing that the prediction-error filter coefficient
a_M(M+1) is zero. This means that a_M(M+1)r(-1-k) is also zero,
regardless of the value of r(-1-k). Since a_M(M+1)r(-1-k) equals
a_M(M-m)r(m-k) for m = -1, it follows that we may extend the
summation on the left-hand side of Eq. (3.33) down to m = -1 without
affecting the validity of this equation in any way. We may thus write

Σ_{m=-1}^{M} a_M(M-m) r(m-k) = { Δ_M,       k = -1
                                 0,         k = 0, ..., M-1    (3.34)
                                 P_{b,M},   k = M

where both k and m now lie inside the same range of values (-1, M).
The next manipulation we wish to perform is to combine Eqs. (3.31)
and (3.34) together. However, before we can do this, we have to modify Eq.
(3.34) so that both k and m lie inside the range of values (0, M+1), as they
do in Eq. (3.31). To satisfy this requirement, we replace m with m-1, and
replace k with k-1 in Eq. (3.34), and thus rewrite these equations in the

equivalent form

Σ_{m=0}^{M+1} a_M(M-m+1) r(m-k) = { Δ_M,       k = 0
                                    0,         k = 1, ..., M    (3.35)
                                    P_{b,M},   k = M+1

We are now ready for the final step. Specifically, we multiply both sides of
Eq. (3.35) by a constant γ_{M+1}, and then add the resultant to Eq. (3.31),
thereby obtaining

Σ_{m=0}^{M+1} [a_M(m) + γ_{M+1} a_M(M-m+1)] r(m-k)

    = { P_{f,M} + γ_{M+1} Δ_M,     k = 0
        0,                         k = 1, ..., M    (3.36)
        Δ_M + γ_{M+1} P_{b,M},     k = M+1

The reason for introducing the constant γ_{M+1} is to give us the extra
degree of freedom we need, in order to ensure that this new set of M+2
simultaneous equations represents the augmented normal equations for
forward linear prediction of order M+1. Let a_{M+1}(0), a_{M+1}(1),
..., a_{M+1}(M+1) denote the coefficients of a prediction-error filter of order
M+1. Let P_{f,M+1} denote the mean-square value of the forward prediction
error f_{M+1}(n) produced at the output of this filter. Then, using the standard
form for the augmented normal equations for forward linear prediction of
order M+1, we may write

Σ_{m=0}^{M+1} a_{M+1}(m) r(m-k) = { P_{f,M+1},   k = 0
                                    0,           k = 1, ..., M+1    (3.37)

Accordingly, comparing Eqs. (3.36) and (3.37), we may make the following
deductions:

a_{M+1}(m) = a_M(m) + γ_{M+1} a_M(M-m+1),   m = 0, 1, ..., M+1    (3.38)

P_{f,M+1} = P_{f,M} + γ_{M+1} Δ_M    (3.39)

0 = Δ_M + γ_{M+1} P_{b,M}    (3.40)

However, earlier we showed that P_{b,M} = P_{f,M}, as in Eq. (3.19). Therefore,
using this equality, and using Eq. (3.40) to eliminate Δ_M from Eq. (3.39), we
get

P_{f,M+1} = P_{f,M} (1 - γ_{M+1}²)    (3.41)


There now only remains the problem of determining the identity of the
constant γ_{M+1}. If we put m = M+1 in Eq. (3.38), and recognize two facts:
(1) the prediction-error filter coefficient a_M(M+1) is zero, and (2) the
prediction-error filter coefficient a_M(0) equals one, we get the result

γ_{M+1} = a_{M+1}(M+1)    (3.42)

Let us now try to summarize the results we have obtained thus far, and
develop physical interpretations for them:
1. Equation (3.42), in effect, states that the constant γ_{M+1} simply equals the
last coefficient, a_{M+1}(M+1), of the prediction-error filter of order
M+1.
2. Equation (3.41) has the same mathematical form as the equation that
defines the transmission of power through a terminated two-port net-
work. Because of this analogy, the constant γ_{M+1} is referred to as the
reflection coefficient. Equation (3.41) states that, given the reflection
coefficient γ_{M+1} and the mean-square value of the forward prediction
error at the output of a filter of order M, we may compute the
mean-square value of the forward prediction error at the output of the
corresponding filter of order M+1. Note that if the mean-square value
of the forward prediction error is to decrease (or, at worst, remain the
same) as the filter order increases (that is, P_{f,M+1} ≤ P_{f,M}), then we
require that |γ_{M+1}| ≤ 1.
3. The recursive relation of Eq. (3.38) states that, given the reflection
coefficient γ_{M+1} and the coefficients of a prediction-error filter of order
M, we may compute the coefficients of the corresponding prediction-
error filter of order M+1. This recursive relation is called the Levinson-
Durbin recursion.
To initiate the recursion, we start with the elementary case of a prediction-
error filter of order M = 0. If we put M = 0 in Eq. (3.8), we immediately
find that, since a_0(0) equals one,

P_{f,0} = r(0)    (3.43)

where r(0) is the autocorrelation function of the filter input for a lag of zero,
that is, the mean-square value of the filter input.
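Collecting Eqs. (3.38) through (3.43) gives the complete order-recursive procedure. The Python sketch below is one possible coding of it; the autocorrelation sequence used in the example is assumed purely for illustration. Note that the reflection coefficient is obtained from Eq. (3.40) as γ_{M+1} = -Δ_M / P_{f,M}, using the equality P_{b,M} = P_{f,M} of Eq. (3.19).

```python
import numpy as np

def levinson_durbin(r):
    """Levinson-Durbin recursion, Eqs. (3.38)-(3.43).

    r : autocorrelation sequence r(0), r(1), ..., r(M).
    Returns the prediction-error filter coefficients a_M(0..M), the
    reflection coefficients gamma_1..gamma_M, and the prediction-error
    powers P_{f,0}, ..., P_{f,M}.
    """
    M = len(r) - 1
    a = np.array([1.0])                  # order-0 filter: a_0(0) = 1
    P = [r[0]]                           # Eq. (3.43): P_{f,0} = r(0)
    gammas = []
    for m in range(M):
        # Delta_m of Eq. (3.28): sum_k a_m(k) r(m+1-k)
        delta = np.dot(a, r[1:m + 2][::-1])
        gamma = -delta / P[-1]           # Eq. (3.40), with P_{b,m} = P_{f,m}
        a_ext = np.append(a, 0.0)        # a_m(m+1) = 0
        a = a_ext + gamma * a_ext[::-1]  # order update, Eq. (3.38)
        P.append(P[-1] * (1.0 - gamma**2))   # Eq. (3.41)
        gammas.append(gamma)
    return a, np.array(gammas), np.array(P)

# Example with an assumed autocorrelation sequence:
r = np.array([1.0, 0.8, 0.5, 0.2])
a, gammas, P = levinson_durbin(r)
print(a)        # agrees with solving the augmented normal equations (3.9) directly
print(gammas)   # reflection coefficients, all of magnitude < 1 here
print(P)        # non-increasing prediction-error powers
```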

Example 1
Consider a prediction-error filter of order 3, whose input is a stationary
ergodic process. With u(n), u(n-1), u(n-2), u(n-3) denoting the tap
inputs, we may use the time average

P_av = (1/(M+1)) Σ_{m=0}^{M} u²(n-m)
     = (1/4)[u²(n) + u²(n-1) + u²(n-2) + u²(n-3)]    (3.44)
as the estimate of P_{f,0}. Given P_av and the values of the reflection
coefficients γ_1, γ_2, γ_3, we may proceed as follows:
1. P_{f,0} = P_av

2. For the prediction-error filter of order 1, shown in Fig. 3.7(a), we have

   P_{f,1} = P_{f,0} (1 - γ_1²)
   a_1(0) = 1
   a_1(1) = γ_1

3. For the prediction-error filter of order 2, shown in Fig. 3.7(b), we have

   P_{f,2} = P_{f,1} (1 - γ_2²)
   a_2(0) = 1
   a_2(1) = a_1(1) + γ_2 a_1(1)
   a_2(2) = γ_2

4. For the prediction-error filter of order 3, shown in Fig. 3.7(c), we have

   P_{f,3} = P_{f,2} (1 - γ_3²)
   a_3(0) = 1
   a_3(1) = a_2(1) + γ_3 a_2(2)
   a_3(2) = a_2(2) + γ_3 a_2(1)
   a_3(3) = γ_3

Observations
Based on the results of this example, we may make the following observa-
tions:
1. Knowledge of the reflection coefficients γ_1, γ_2, ..., γ_M is sufficient to
completely determine the coefficients of the prediction-error filter of
order 1, those of the prediction-error filter of order 2, and so on, right up
to the prediction-error filter of order M.
2. For a stationary ergodic input, the mean-square value of the forward
prediction error at the output of the filter of order M is determined by

   P_{f,M} = P_av Π_{i=1}^{M} (1 - γ_i²)    (3.45)

where P_av is the average of the squared values of the tap inputs.
Figure 3.7 Prediction-error filter of (a) order 1, (b) order 2, (c) order 3.

3. The computation proceeds on a stage-by-stage basis, such that when the
coefficients of the prediction-error filter of order M have been computed,
we will have also computed the coefficients of the preceding prediction-
error filters of orders M-1, ..., 1.
Clearly, the reflection coefficients play a significant role in the characteriza-
tion of linear prediction of stationary ergodic processes. In Chapter 6 we
describe procedures for using a known time series to estimate the reflection
coefficients, and thereby supply the Levinson-Durbin recursion the parame-
ter values it needs.
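Used in the other direction, the order update of Eq. (3.38) converts a given set of reflection coefficients into prediction-error filter coefficients, exactly as in Example 1. The sketch below illustrates this use of the recursion; the reflection-coefficient values are assumed for illustration.

```python
import numpy as np

def step_up(gammas):
    """Build prediction-error filter coefficients from reflection coefficients
    by repeated application of the order update of Eq. (3.38)."""
    a = np.array([1.0])
    for gamma in gammas:
        a_ext = np.append(a, 0.0)
        a = a_ext + gamma * a_ext[::-1]
    return a

# Assumed reflection coefficients, following the structure of Example 1:
gammas = [0.5, 0.072, 0.2]
for order in range(1, 4):
    print(order, step_up(gammas[:order]))
# order 1: [1, gamma_1]
# order 2: [1, gamma_1 + gamma_2*gamma_1, gamma_2]
# order 3: [a_3(0), a_3(1), a_3(2), a_3(3)], with a_3(3) = gamma_3
```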

3.4 MINIMUM-PHASE PROPERTY OF FORWARD PREDICTION-ERROR FILTERS

The input-output relation of a prediction-error filter may be described in the
time domain or the frequency domain. Consider a forward prediction-error
filter whose coefficients are denoted by the sequence a_M(0), a_M(1),
..., a_M(M), as in Fig. 3.2. This sequence of numbers defines the impulse
response or unit-sample response of the filter, in that if we apply the
unit-sample sequence 1, 0, 0, ..., 0 to the filter input, the sequence
a_M(0), a_M(1), ..., a_M(M) is produced at the filter output. In effect, this
unit-sample response provides the time-domain description of the filter.
Correspondingly, the input-output relation of the filter is described by the
convolution sum of Eq. (3.10), reproduced here for convenience:

f_M(n) = Σ_{k=0}^{M} a_M(k) u(n-k)    (3.46)

where the sequence u(n), u(n-1), ..., u(n-M) denotes the filter input,
and f_M(n) is the forward prediction error produced at the filter output at
time n. Equation (3.46) states that the forward prediction error at the filter
output is produced by convolving the filter input with the impulse response
of the filter.
Taking the z-transform of both sides of Eq. (3.46), we may write

Z[f_M(n)] = Z[Σ_{k=0}^{M} a_M(k) u(n-k)]    (3.47)

where Z denotes the z-transform operator. Let F_M(z) denote the z-trans-
form of the forward prediction-error sequence:

F_M(z) = Z[f_M(n)]
       = Σ_{n=0}^{M} f_M(n) z^{-n}    (3.48)

Let U(z) denote the z-transform of the sequence at the filter input:

U(z) = Z[u(n)]
     = Σ_{n=0}^{M} u(n) z^{-n}    (3.49)

Let A_M(z) denote the z-transform of the sequence represented by the
prediction-error filter coefficients:

A_M(z) = Z[a_M(k)]
       = Σ_{k=0}^{M} a_M(k) z^{-k}    (3.50)

The only reason for using k as the time variable in Eq. (3.50) rather than n
is to conform to the notation used in Eq. (3.47). Then, using the linearity
and time-shifting properties of the z-transform, as well as the definitions
given in Eqs. (3.48), (3.49), and (3.50), it is shown in Appendix 2 that a
linear convolution sum as in Eq. (3.46) may be transformed as follows:

F_M(z) = A_M(z) U(z)    (3.51)

Equation (3.51) states that the convolution of two sequences in the time
domain is transformed into the product of their respective z-transforms.
The ratio of the z-transform of a filter output to the z-transform of the
filter input is called the transfer function of the filter. Except for a scaling
factor, the transfer function is uniquely defined by its poles and zeros. The
poles are obtained by solving for the roots of the denominator polynomial,
expressed as a function of z. The zeros are obtained by solving for the roots
of the numerator polynomial of the transfer function, expressed as a
function of z.
Accordingly, A_M(z) represents the transfer function of the prediction-
error filter. From Eq. (3.50) we see that, except for a pole of order M at the
origin, the transfer function A_M(z) consists only of zeros. The prediction-
error filter is therefore said to be an all-zero filter.
When the transfer function is evaluated for points on the unit circle,
that is, for z = e^{jω}, we get the frequency response of the filter. Thus, putting
z = e^{jω} in Eq. (3.50), we get the following expression for the frequency
response of a prediction-error filter:

A_M(e^{jω}) = Σ_{k=0}^{M} a_M(k) e^{-jωk}    (3.52)

The amplitude response of the filter is defined by the magnitude of A_M(e^{jω}),
and the phase response of the filter is defined by the argument of A_M(e^{jω}).
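Equation (3.52) is straightforward to evaluate numerically. The sketch below does so for a first-order filter with γ_1 = 0.5, the case treated in Example 2 that follows; the frequency grid is an arbitrary illustrative choice.

```python
import numpy as np

def frequency_response(a, omega):
    """Evaluate A_M(e^{j omega}) of Eq. (3.52) for a prediction-error filter."""
    k = np.arange(len(a))
    return np.sum(a * np.exp(-1j * np.outer(omega, k)), axis=1)

# Assumed first-order filter A_1(z) = 1 + 0.5 z^{-1} (reflection coefficient 0.5).
a = np.array([1.0, 0.5])
omega = np.linspace(0.0, np.pi, 5)
H = frequency_response(a, omega)
print(np.abs(H))      # amplitude response |A_1(e^{j omega})|
print(np.angle(H))    # phase response arg A_1(e^{j omega})
```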
58 INTRODUCTION TO ADAPTIVE FILTERS

Example 2
For the prediction-error filter of order 1, shown in Fig. 3.7(a), the
transfer function equals (using the results of Example 1)

A_1(z) = a_1(0) + a_1(1) z^{-1}
       = 1 + γ_1 z^{-1}

This transfer function has a single pole at z = 0 and a single zero at
z = -γ_1. For the case when the reflection coefficient has a magnitude less
than one, the zero lies inside the unit circle, as in Fig. 3.8(a). In this figure,
we have used the value γ_1 = 0.5. For z = e^{jω}, the frequency response of the
prediction-error filter of order one equals

A_1(e^{jω}) = 1 + γ_1 e^{-jω}
           = (1 + γ_1 cos ω) - j γ_1 sin ω

The amplitude response of the filter equals

|A_1(e^{jω})| = [(1 + γ_1 cos ω)² + (γ_1 sin ω)²]^{1/2}
             = (1 + 2γ_1 cos ω + γ_1² cos²ω + γ_1² sin²ω)^{1/2}
             = (1 + 2γ_1 cos ω + γ_1²)^{1/2}

The phase response of the filter equals

arg[A_1(e^{jω})] = -tan^{-1} [γ_1 sin ω / (1 + γ_1 cos ω)]

In Fig. 3.8(b) we have plotted the amplitude response and phase response of
the filter for γ_1 = 0.5.
Consider next the prediction-error filter of order 2, shown in Fig. 3.7(b).
The transfer function of this filter equals (using the results of Example 1)

A_2(z) = a_2(0) + a_2(1) z^{-1} + a_2(2) z^{-2}
       = 1 + (γ_1 + γ_2 γ_1) z^{-1} + γ_2 z^{-2}

This transfer function has a double pole at z = 0 and two zeros located at
the roots of the quadratic equation

z² + (γ_1 + γ_2 γ_1) z + γ_2 = 0

The roots of this equation are

z_1, z_2 = -(1/2)(γ_1 + γ_2 γ_1) ± (1/2)[(γ_1 + γ_2 γ_1)² - 4γ_2]^{1/2}

Three different situations can arise, depending on the values of the reflection
coefficients γ_1 and γ_2:
1. The two zeros are coincident, as shown by

   z_1, z_2 = -(1/2)(γ_1 + γ_2 γ_1)


Figure 3.8 Characteristics of prediction-error filter of order 1 for reflection coefficient γ_1 = 0.5:
(a) pole-zero pattern, (b) amplitude and phase responses.

Figure 3.9 Characteristics of prediction-error filter of order 2: (a) pole-zero pattern for
reflection coefficients γ_1 = 0.5 and γ_2 = 0.072; (b) pole-zero pattern for reflection coefficients
γ_1 = 0.5 and γ_2 = -0.6; (c) pole-zero pattern for reflection coefficients γ_1 = 0.5 and γ_2 = 0.2;
(d) amplitude and phase responses for (a), (b), and (c).

This occurs when γ_1 and γ_2 satisfy the condition

   (γ_1 + γ_2 γ_1)² = 4γ_2

This situation is illustrated in the pole-zero pattern of Fig. 3.9(a) for

   γ_1 = 0.5
   γ_2 = 0.072

for which the two zeros lie at z_1, z_2 = -0.268. The corresponding
amplitude and phase responses are shown as curves a in Fig. 3.9(d).
2. The two zeros are real and unequal. This occurs when γ_1 and γ_2 satisfy
the condition

   (γ_1 + γ_2 γ_1)² > 4γ_2

This situation is illustrated in the pole-zero pattern of Fig. 3.9(b) for

   γ_1 = 0.5
   γ_2 = -0.6

for which the two zeros lie at z_1 = -0.881 and z_2 = 0.681. The corre-
sponding amplitude and phase responses are shown as curves b in Fig.
3.9(d).
3. The two zeros are complex conjugates. This occurs when γ_1 and γ_2 satisfy
the condition

   (γ_1 + γ_2 γ_1)² < 4γ_2

This situation is illustrated in the pole-zero pattern of Fig. 3.9(c) for

   γ_1 = 0.5
   γ_2 = 0.2

for which the zeros lie at z_1, z_2 = -0.3 ± j0.332. The corresponding
amplitude and phase responses are shown as curves c in Fig. 3.9(d).

Conditions for Minimum-Phase Property


The first-order prediction-error filter and the three versions of the second-
order prediction-error filter considered above have the following common
features:
1. The reflection coefficients y, and y, have a magnitude that is less than
one.
2. The zeros of the pole—zero patterns lie inside the unit circle in the
z-plane.
3. The phase response associated with the amplitude response is the mini-
mum possible; any change in the amplitude response has a corresponding
effect on the phase response. In particular, the polarity of the phase
response at any frequency is the same as the polarity of the s/ope of the
amplitude response at that frequency.
A filter whose phase response, for a prescribed amplitude response, is the
minimum possible is said to be minimum-phase. We conclude therefore that
the forward prediction-error filters considered here are examples of a
minimum-phase structure.
Indeed, we may go one step further and state that if the reflection
coefficients y,, ¥2,---,Yy Of a forward prediction-error filter of order M all
LINEAR PREDICTION 63

have a magnitude less than one, then all the zeros of the transfer function of
the filter lie inside the unit circle, and the filter is minimum-phase. As a
corollary, we may state that if any one of the reflection coefficients has a
magnitude equal to or greater than one, the prediction-error filter is non-
minimum-phase.
On other point that is noteworthy is the fact that when the forward
prediction-error filter is designed to be minimum-phase, the corresponding
backward prediction-error filter (obtained by reversing the order of the
forward prediction-error filter coefficients, as in Fig. 3.6) is automatically
maximum-phase in that the phase response associated with its amplitude
respond is the maximum possible. In such a case, the zeros of the transfer
function of the backward prediction-error filter are all located outside the
unit circle in the z-plane.

3.5 WHITENING PROPERTY OF PREDICTION-ERROR


FILTERS

Consider a stationary process {w(n)}, n= 0, +1, +2,..., “where each


sample has zero mean and variance o*. We say that such a process is a
white-noise process if it consists of a sequence of uncorrelated random
variables, as shown by

E[w(k)w(n)] = ie se (3.53)

White noise has no information content, in the sense that the value of the
process at time n is uncorrelated with all past values up to and including
time n — 1 (and, indeed, with all future values of the process).
We may now state another important property of a prediction-error
filter. In theory, a prediction-error filter of order M can whiten any
stationary input process represented by the sequence u(n), u(n —
1),...,u(n — M), provided that the order of the filter, M, is sufficiently
large. For this reason, a prediction-error filter designed to whiten a sta-
tionary input process is called a whitening filter. Basically, prediction relies
on the presence of correlation between adjacent samples of the input
process. The implication of this is that as we increase the order of the
prediction-error filter, we successively reduce the correlation between adjac-
ent samples of the process applied to the filter input, until ultimately the
prediction-error process at the filter output consists of a sequence of
uncorrelated samples, and the whitening of the original process is thereby
accomplished.
64 INTRODUCTION TO ADAPTIVE FILTERS

3.6 AUTOREGRESSIVE MODEL

Forward linear prediction and autoregressive modelling of a random pro-


cess are intimately related to each other. We say that a random process
{u(n)} is an autoregressive (AR) process of order M if its sample u(n) at
time n is regressed on M past samples, u(n — 1), u(n — 2),...,u(n — M),
as shown by
M
u(n) = >, ho(k)u(n
— k) + w(n) (3.54)
k=1
where {h,(k)}, k = 1,2,..., M, are constants, and {w(1)} is a white-noise
process. In Eq. (3.54), w() serves as the input, producing the output u(7n).
The summation term on the right-hand side of this equation represents a
forward linear prediction of u(n), based on the M past samples u(n —
1), u(n — 2),...,u(n — M). Accordingly, we may generate an AR process
of order M by using the structure of Fig. 3.10, which includes a forward
linear predictor of order M in its feedback path.
The transfer function H,(z) of the linear forward predictor equals
M

Hy(z) = b) ho(k)z* (3.55)


k=1

Hence, the transfer function of the AR model of Fig. 3.10, viewed as a


linear feedback system, equals

Hag(z) = TAY
Aaa

SAM (3.56)
bao) elke
k=1
The prediction-error filter coefficients are related to the predictor coefficients
by Eq. (3.6). Hence, using Eq. (3.6) in (3.56), we get

Hag(z) = M

AOE) (3:57)

This shows that the transfer function of the AR model in Fig. 3.10 equals
the inverse of the transfer function of the prediction-error filter. Accord-
ingly, the AR model of Fig. 3.10 is often referred to as an inverse filter.
Earlier we indicated that a prediction-error filter is an all-zero filter,
From Eq. (3.57) it follows therefore that an AR model or inverse filter is an
all-pole filter in that its transfer function, except for a multiple zero at the
LINEAR PREDICTION 65

u(n)

wg?
= 7)

u(n
— M)

Predictor

Figure 3.10 Autoregressive model or inverse filter.

origin, consists only of poles. For the AR model or inverse filter of Fig. 3.10
to be stable, the transfer function Hyp(z) must have all of its poles inside
the unit circle in the z-plane. Equivalently, in view of Eq. (3.57), we may
state that A,,(z) the transfer function of the prediction-error filter must
have all of its zeros inside the unit circle. In other words, the prediction-
error filter, represented by the transfer function A,,(z), must be minimum-
phase.
This restriction on H,yp(z) or Ay(z) may be derived from statistical
considerations, as illustrated in the following two examples.

Example 3
Consider an AR process of order 1, described by
u(n)
= h,(1)u(n — 1) + w(n) (3.58a)
66 INTRODUCTION TO ADAPTIVE FILTERS

where /)(1) is a constant, and { w(1)} is a white noise process of zero mean
and variance o”. We wish to find the mean and autocorrelation function of
the process {u(n)}.
We start by rewriting Eq. (3.58a) in the form of a linear first-order
difference equation:
u(n)
— hy (1)u(n — 1) = w(n) (3.58b)
It is well known that, in the classical method of solving linear difference
equation with constant coefficients, the solution consists of the sum of two
parts: the complementary solution and the particular solution. Here, the
complementary solution is the solution of the homogeneous equation
u(n) — ho(1)u(n — 1) = 0
yielding an exponential function of the form Ch{(1), where C is a constant.
The particular solution is most conveniently obtained by using the unit-
delay operator z~' to relate the delayed sample u(n — 1) to u(n). Specifi-
cally, we may write

u(n — 1) = 2 '[u(n)]
where z | plays the role of an operator. Accordingly, we may rewrite Eq.
(3.58b) as

(1 — Ao(1)z-")[u(n)] = w(x)
Moving the operator et —h,(1)z_') to the right-hand side to operate on
w(n), we have

u(n) aero eee) (3:39)


Since

ea ieee ee
| | =
= —

ox —
N

we may use Eq. (3.59) to express the particular solution as follows:

Hn) ll

Oa =[48
1
> oe—es —
N aa)
Il
Ms > Or —— —
Ny
~~
= —= —

> ll oS

lI |
Ms = —

ox
=
~—”
=
o—~ = —

> ll io)
LINEAR PREDICTION 67

Summing the complementary function and the particular solution obtained


above, the general solution of Eq. (3.58a) is therefore
[oe]

u(n) = Ch§(1) + YAK) w(n - k) (3.60)


k=0
where the constant C is arbitrary. Its value is determined by the initial
conditions. Thus, for example, assuming that u(0) = 0, we find from Eq.
(3.60) that

~ hho 1)w(—-k) (3.61)

Substituting Eq. (3.61) in (3.60), we get

oe he) k=0
ncn “Ky + EC Aes
- : hi (1)w(n — k) + 2 hk (1)w(n — k)

I
Di AG(1)w(n — k) | (3.62)
k=0

Since E[w(n)] = 0 for all n, then taking the mathematical expectation of


both sides of Eq. (3.62), we find that u(n) has zero mean:
E[u(n)] =0 for all n.
The autocorrelation function of u(n) equals E[u(n)u(n — /)]. Thus, using
Eq. (3.62), we may write
iil
E[u(n)u(n-/)] =E DL AK) w(n - KY hy(1)w(n — 1 - |
k=0 i=0
ASM jal

J Bie bo (wor eik) wn |


k=0 i=0
(alba
=> VA O)E[w(n-k)w(n-1-i)] (3.63)
k=0 i=0

Since { w(n)} is a white-noise process, we have


D ‘
=, 25g CO ay tes ie Ie = | sey
AD AAALAC, ) i k#l+i
Accordingly, we may simplify Eq. (3.63) as follows:

E[u(n)u(n — 1)| = 07h 1) 5 wa


68 INTRODUCTION TO ADAPTIVE FILTERS

This is a geometric series with first term equal to 07h, ‘(1), geometric ratio
equal to A5(1), and number of terms equal to n. Hence, using the formula
for the sum of a geometric series, we may express the autocorrelation
function of u(n) as follows:

E[u(n)u(n
— 1)] = 07ho‘(1) 1 = h"Q) (3.64)
1 — A(1)

5) 4 3 2 1 0 1 2 3 4 5

E[u(n)u(n
— 1)]

Figure 3.11 Autocorrelation function of asym


hg(1) > 0, (b) ho(1) < 0. prowcally stationary AWW process 1 order I (2)
LINEAR PREDICTION 69

We thus see that the autocorrelation function of u(n) is a function of n,


indicating that the process {u(n)} is not stationary up to order 2. However,
if |ho(1)| < 1, then we may argue that for n sufficiently large

E[u(n)u(n = /)| =
o*hy (1)
ey (3.65)

The right-hand side of Eq. (3.65) is now a function of / only, and we may
say that the process {u(n)} is asymptotically stationary up to order 2.
Thus the condition for an autoregressive process of order 1, described
by Eq. (3.58), to be asymptotically stationary up to order 2 is that |A)(1)| < 1.
Remembering that the constant h,(1) may assume a positive or negative
value, we find that the dependence of the autocorrelation function of Eq.
(3.65) on the lag / may take on either one of the two forms shown in Fig.
3.11. If hj (1) > 0, the autocorrelation function decays to zero exponentially
as in Fig. 3.11(a). If, on the other hand, /,(1) < 0, it alternates in sign, as in
Pigs 3-00),
It is also of interest to note that if the general solution given in Eq.
(3.60) is to represent an asymptotically stationary process, then the comple-
mentary solution represented by the first term Ch5(1) must decay to zero as
n approaches infinity. This shows, once again, that the condition for
asymptotic stationary is |h,(1)| < 1. When this condition is satisfied, and
the complementary solution has effectively decayed to zero, we find that the
steady-state behavior of the process {u(n)} is described purely by the
second term of Eq. (3.60). This part of the general solution is therefore
called the stationary solution of Eq. (3.58).

Example 4
Consider next an AR process of order 2, described by

u(n) =ho(1)u(n — 1) + ho(2)u(n


— 2) + w(n) (3.66)
or equivalently
u(n) + a,(1)u(n — 1) + a,(2)u(n — 2) = w(n) (3.67)
where a,(1) = —h (1) and a,(2) = —A (2) are constants,* and {w(7)} is a
white-noise process of zero mean and variance o*. Here again we wish to
determine the conditions required for this AR process to be asymptotically
stationary up to order 2.
Using the unit-delay operator z~', we may rewrite Eq. (3.67) in the
form

(1 uy a,(1)z~° “i a,(2)z 7)[u(n)] = w(n)

*In this example, we find it convenient to work with a,(1) and a5(2) rather than /o(1)
and ho(2).
70 INTRODUCTION TO ADAPTIVE FILTERS

or
(1 Pia) — p,z~*)[u(n)] = w(n) (3.68)

where p, and p, are roots of the characteristic polynomial


Wnt) Las (lz aan (3.69)
A particular solution of Eq. (3.68) is given by
1
u(n) = [w(n)]
(1 BiSe wl vi brea}

aries tueeascuee?
=| Ete tS obese lo(
Pim aleDpy*z*[w(n)] — E eh ‘wn

“ ral,Dpy w(n — k) — a hw(n = 2)


otk
2- (a—-
— J =k)
Pi

The eo ante solution is the solution of the homogeneous equation


un) + as( lun aL) 4a (2) ular = 2) =O
The solution of this equation is of the form C,p) + C,e3, where C, and C,
are constants. The general solution of the linear second-order difference
equation (3.67) is therefore
ri aK
u(n) = C,p2 + C03 Deie. Poa (3.40)
Py
Following the same argument used in Example 3, dealing with an AR
process of order 1, it is clear that if Eq. (3.67) is to represent an asymptoti-
cally stationary process, the complementary solution represented by C, py +
C,p3 must decay to zero as n approaches infinity.
Accordingly, for asymptotic stationarity, we require that
(p\| < 1 ands |o5|/=1 (3¢71)
When these conditions are satisfied, and the complementary solution has
effectively decayed to zero, the steady-state behavior of the AR process
{u(n)} is described purely by the summation term of Eg. (3.70), which we
call the stationary solution. It is straightforward to verify that these condi-
LINEAR PREDICTION 71

tions also ensure that the autocorrelation function E[u(n)u(n — /)] con-
verges to a finite value as n approaches infinity.
To express the conditions of Eq. (3.71) for asymptotic stationarity in
terms of the coefficients a,(1) and a,(2), we consider the following cases:
1. The roots p, and p, are complex or coincident. This occurs when

4a,(2) = a3(1)
In this case, we have

[Pil = |e2| = /4>(2)


Correspondingly, the condition for asymptotic stationarity becomes
a,(2) < 1. We may illustrate this graphically by representing the pair of
coefficients a,(1), a,(2) as a point in the (a,(1), a,(2))-plane, as in Fig.
3.12. The points lying on the parabola correspond to coincident roots for
which 4a,(2) = a3(1). The points inside the shaded area correspond to
complex roots for which 4a,(2) > a3(1).
. The roots p, and p, are unequal and real. This occurs when

4a,(2) < a3(1)


The characteristic polynomial A,(z) of Eq. (3.69) attains its minimum
value when z = —a,(1)/2. Hence, for the roots of this polynomial to lie
between —1 and 1, we must have

or equivalently
|a,(1)|< 2

a,(2)

1.0

| |
| |
| |
| |
| |
| | |
/| a,(1)
= 10 0 1.0

Figure 3.12 Illustrating the conditions for the roots p, and p to be complex conjugates.
72 INTRODUCTION TO ADAPTIVE FILTERS

Also we must have 4,(1) > 0 and A,(—1) > 0, where 4,(1) and A,(—1)
are the values of A,(z) for z = 1 and z = —1, respectively. The require-
ment A,(1) > 0 yields

a,(1) + a,(2) > -1


The requirement A,(—1) > 0 yields

a,(1) — a,(2) <1


Thus, for the case when the roots p, and p, are real and unequal, we
require that for asymptotic stationarity the point (a,(1), a,(2)) lie inside
the triangle defined by

and

ax(l) =a, (2)


as shown by the shaded region in Fig. 3.13.

a,(1)

Figure 3.13 Illustrating the condition for the roots p, and > to be real and unequal.
LINEAR PREDICTION 73

Conditions for Asymptotic Stationarity of an AR Process of Order M


Having studied AR processes of order 1 and 2 in some detail, we may now
go back to Eq. (3.54), which defines an AR process of order M. We may
rewrite this equation in the form of linear difference equation of order M:
M
Y ay (k)u(n — k) = w(n) (3.72)
k=0
where the set of constants {a,,(k)}, k = 0,1,..., M, are related to {h,(k)}
by Eq. (3.6), and {w(n)} is a white-noise process. Define the characteristic
polynomial
M
Ay (2) = eran (eee
k=0
For the AR process {u(n)} defined by Eq. (3.72) to be asymptotically
stationary, the roots of the characteristic polynomial A,,(z) must all have
magnitude less than one. This condition is equivalent to the requirement
that all the roots of the characteristic polynomial A,,(z) lie inside the unit
circle in the z-plane. Under this condition, the complementary solution
decays to zero as n approaches infinity, and correspondingly the autocorre-
lation function of the AR process approaches a finite value.
We observe that the characteristic polynomial A,,(z) equals the trans-
fer function of a prediction-error filter whose coefficients are represented by
Ay (0), dy (1),...,@y(M). We conclude therefore that condition for the
asymptotic stationarity of the AR process {u(n)} is exactly the same as the
condition for the minimum-phase property of the corresponding
prediction-error filter.

3.7 IMPLICATIONS OF THE WHITENING PROPERTY OF A


PREDICTION-ERROR FILTER AND THE AUTOREGRESSIVE
MODELLING OF A RANDOM PROCESS

In Section 3.5 we discussed the ability of a prediction-error filter to whiten a


random process applied to the filter input. In Section 3.6 we discussed the
modelling of a random process as an autoregressive (AR) process. Indeed,
we may view these two operations as complementary, as explained below:
1. We may view the operation of prediction-error filtering, applied to a
random process {u(n)} of zero mean, as one of analysis. In particular,
we may use such an operation to whiten the process {u(n)} by choosing
the prediction-filter order M sufficiently large. Then the forward predic-
tion-error process {fy(n)}, produced at the filter output, consists of
uncorrelated samples. When this unique condition has been established,
u(n — M + 1)

Figure 3.14 (a) Analysis of a stationary process using prediction-error filtering. (b) Synthesis of
an asymptotically stationary process using an all-pole inverse filter.

74
LINEAR PREDICTION 75

the random process {u(n)} is represented by the set of prediction-error


filter coefficients, {a,,(k)}, k = 1,2,..., M, and the mean-square value
Py of the forward prediction error fy(). This prediction-error filter,
consisting of an all-zero filter, is illustrated in Fig. 3.14(a).
2. We may view the use of an all-pole inverse filter to generate an AR
process {u(m)} as one of synthesis. In particular, given the set of
constants {ay(k)}, k = 1,2,...,M, and a white-noise process { w(n)}
of zero mean and variance o* = Py, we may generate the AR process
{u(n)} by using the structure shown in Fig. 3.14(b).

Thus, the two-filter structures of Fig. 3.14 constitute a matched pair. The
prediction-error filter in part (a) of the figure is minimum-phase, with the
zeros of its transfer function located at exactly the same positions (inside the
unit circle in the z-plane) as the poles of the transfer function of the inverse
filter in part (b). This assures the stability of the inverse filter or, equiva-
lently, the asymptotic stationarity of the AR process generated at the output
of this filter. Note also that the impulse response of the prediction-error
filter has a finite duration, whereas the impulse response of the inverse filter
has infinite duration.
The principles described above provide the basics of linear predictive
coding (LPC) vocoders for the transmission and reception of digitized
speech (see Example 3 of Section 1.4).

3.8 THE LATTICE PREDICTOR

Consider a prediction-error filter of order M, which is operated in the


forward direction, as depicted in Fig. 3.2. The input-output relation of this
filter is defined by Eq. (3.10), reproduced here for convenience:
M
fu(n) = Le ay(m)u(n — m) (3.73)
m=0

This filter is designed to minimize the mean-square value of the forward


prediction error fy(n), which is equal to the difference between the actual
value of the input u(r) at time n and its predicted value based on the set of
past samples (mn —.1),u(n'— 2),....u(n — MM).
When the prediction-error filter is operated in the backward direction,
as in Fig. 3.6, the input-output relation of the filter is defined by Eq. (3.27),
reproduced here for convenience:
M
by (i= ay — a i) (3.74)
m=(0

In this case, the prediction-error filter is designed to minimize the mean-


square value of the backward prediction error by,(n), which is equal to the
76 INTRODUCTION TO ADAPTIVE FILTERS

difference between the actual value of the input u(m — M) at time n- M


and its predicted value based on the set of input samples u(n),
UO — eee UConn aes 1) .
Suppose we now increase the filter order to M + 1. Then, for operation
in the forward direction, we have
M+1
im) = a dy.\(m)u(n it) (3.75)
m=(0

Substituting the Levinson—Durbin recursion of Eq. (3.38) in (3.75), we get


M+1 M+1
Iuxi(t) = >» ay(m)u(n — M) + Yy41 3d ay(M+1—m)u(n-m)
m=0 m=0
M M+1

= Y ay(m)u(n—M) + yma. Le ay (M+ 1—-m)u(n— m)


m=(0 m=1

(3.76)
where in both terms of the last line we have used the fact that a,,(M + 1) is
zero. The first summation term in the right-hand side of Eq. (3.76) is
recognized as the forward prediction error produced by a prediction-error
filter of order M. For the second summation term, we substitute m for
m — 1, and so find that this term is equal to the backward prediction error
produced by a prediction-error filter of order M, but delayed by one sample
period. We may thus simplify Eq. (3.76) as follows

fuel”)
= fa") Fy by (" — 1) (3.17)
Next, we recognize that when a prediction-error filter of order M + 1 is
operated in the backward direction, we have the input-output relation
M+1
byai(n) = DY ayy;(M+1—-m)u(n—-m) (3.78)
m=0(0

From the Levinson—Durbin recursion of Eq. (3.38), we have


Ay4)(M + 1—m) =ay(M + 1l—m) + yy414y(m),
m=O a Ml (79)
Therefore, substituting Eq. (3.79) in (3.78), we get
M+1 M+1
bysi(n) = Lo ay(M+1-—m)u(n-m) +r, ¥ ady(m)u(n — m)
m=0 m=0
M+1 M
= )) ay(M+1—-m)u(n—-m) + Yu+1 0, @y(m)u(n — m)
m=1 m=(0)

(3.80)
LINEAR PREDICTION 77

where in both terms of the last line we have used the fact that a,,(M + 1) is
equal to zero. As before, the first summation term on the right-hand side of
Eq. (3.80) is equal to the backward prediction error produced by a predic-
tion-error filter of order M, but delayed by one time unit. The second
summation term is simply equal to the forward prediction error produced
by a prediction-error filter of order M. Hence, we may simplify Eq. (3.80) as
follows
byai(1)
= by(n — 1) + Yusitu(”) (3.81)

vu (li)

Stage 1

Figure 3.15 (a) Structure of a single-stage lattice predictor. (b) Structure of a multistage lattice
predictor.
78 INTRODUCTION TO ADAPTIVE FILTERS

The pair of recursive relations in Eqs. (3.77) and (3.81), involving the
forward and backward prediction errors, may be represented as in Fig.
3.15(a).
Note that for the elementary case of M = 0, Eqs. (3.73) and (3.74)
reduce to

fo(n) = bo(n) = u(x) (3.82)


Therefore, starting with M = 0, and increasing the filter order by one at a
time, we obtain the lattice equivalent model shown in Fig. 3.15(b) for a
prediction-error filter or order M + 1. The model is so called because each
stage, in its realization, has the appearance of a Jattice.
The lattice structure of Fig. 3.15(b) combines both the forward and
backward operations of a prediction-error filter into a single structure. It
consists of a number of stages equal to the order of the prediction-error
filter. Each stage of the model is characterized completely by specifying the
pertinent value of the reflection coefficient.
A noteworthy feature of the lattice structure is that, compared with the
tapped-delay-line realization shown in Fig. 3.2, it exhibits a relatively low
sensitivity to roundoff errors resulting from its implementation on a digital
computer operating with finite-word-length arithmetic. Other important
properties of the lattice predictor are discussed in Sections 3.9 and 3.10.

3.9 ORTHOGONALITY OF THE BACKWARD


PREDICTION ERRORS

An important property of the lattice predictor of Fig. 3.15(b) is the fact that
the backward prediction errors resulting at the various stages of the model
are orthogonal to each other. That is,

P i=k
E|\b(n)b,(n
[b,(7)b,(n)] 5 ke
\o. paeplesa (3.83 )

To prove this important property, we note that, by definition,

b(n) = )) a,(i- m)u(n— m) (3.84)


m=(0

and
k
b(n) = 2) a,(k — p)u(n
— p) (3.85)
p=0

where a,(m), m = 0.1,..., i, are the coefficients of a prediction-error filter


LINEAR PREDICTION 79

of order i, and u(n) is the input signal at time n. Therefore,

ALR OLA baa DB OEE Oa.Ihc


eme Cra
m=0 p=0

Dae
m=0 p=0
ete ONE EL alr =a up
Se rarerp= ni
m=0 p=0
(3.86)
where r( p — m) is the autocorrelation function of the predictor input for a
lag of p — m. However, from the augmented normal equations for back-
ward prediction, we have [see Eq. (3.25)]
k
Lak pr(o- male mak G8
Therefore, if i = k, we find that

E[b,(n)b,(n)] =P4;,(0)
=P, > 43788)
If, on the other hand, 7 < k — 1, we find that
E[b,(n)b,(n)] = 0 (3.89)
Hence, the backward prediction error b,(n) at stage i of the equivalent
lattice model and the backward prediction error b,(n) at stage k are
orthogonal for i # k.
The lattice structure of Fig. 3.15(b), in effect, transforms the input time
series u(n), u(n — 1),...,u(n — M) into another time series made up of the
backward prediction errors b(n), b,(n),...,5,,(n), which are orthogonal
to each other. No loss of information whatsoever is incurred in the course of
this transformation. The implications of this important property of the
lattice structure of Fig. 3.15(b) will be discussed in Chapter 6.

3.10 RESTRICTION ON THE REFLECTION COEFFICIENTS


OF A LATTICE STRUCTURE RESULTING FROM THE
POSITIVE DEFINITENESS OF THE CORRELATION
MATRIX OF THE LATTICE INPUT

In this section we will show that if the correlation matrix of the sequence of
samples applied to the input of a multistage lattice structure is positive
definite, then all the reflection coefficients of this filter have a magnitude less
than one, and vice versa.
80 INTRODUCTION TO ADAPTIVE FILTERS

Consider a sequence of backward prediction errors bo(n), )(7),


_.. by(n) produced by a multistage lattice structure of order M in response
to the sequence of samples u(n),u(n — 1),...,u(n — M) applied to the
lattice input. As indicated in the previous section, these two sequences are
related as follows:
k
bilnj= = alk — mun =m); b= 0; 1p 05M 1390)
m=0

where a,(k — m), m= 0,1,...,k, are the backward prediction-error filter


coefficients of order k. Define the (M + 1)-by-1 vector of backward predic-
tion errors:

by(n)
b,(n
b(n) = iC ) (3.91)
by (1)
Define the (M + 1)-by-1 input vector

u(n)
Uti h
u(n) = ( ' (3.92)

u(n — M)
Define the (M + 1)-by-(M + 1) lower triangular transformation matrix
1 0 0 ‘ae AD
a,(1) 1 0 0

L=| 4(2) a(1) 1 0] (3.93)

isn (Mi), - Aye Mish nay CMaec Fst al


Then we may rewrite the set of M + 1 equations (3.90) in the following
matrix form:
b(n) = Lu(n) (3.94)
Note that:
1. The nonzero elements of row k of the matrix L equal the coefficients of a
backward prediction-error filter of order k.
2. All the diagonal elements of the matrix L equal unity.
3. All the elements above the main diagonal are zero.
Since the determinant of a triangular matrix equals the product of its
diagonal elements, it follows from (2) above that the lower triangular matrix
L has a determinant equal to unity. Hence, the matrix L is nonsingular.
LINEAR PREDICTION 81

Let the (M + 1)-by-(M + 1) matrix S denote the correlation matrix of


the sequence of backward prediction errors {b,,(n)}. That is,

S = E|b(n)b7(n)] (3.95)
Since the backward prediction errors are orthogonal to each other, we have
[see Eq. (3.83)]

E[b(m)o(mi= (oe OTE


Hence, the correlation matrix S is a diagonal matrix with its diagonal
elements equal to the prediction-error powers Po, P,,..., Py that pertain to
predictor orders 0,1,..., M, respectively:
S =idiae(Pa ls). eri) (3.96)
Substituting Eq. (3.94) in (3.95), and noting that b’7(n) = u’(n)L’, we
get
S = E[Lu(n)u7(n)L|
= LE[u(n)u7(n)|L7
= LRL’ © (3.97)
where R = E[u(n)u’(n)] is the (M + 1)-by-(M + 1) correlation matrix of
the input sequence u(n),u(n — 1),...,u(m — M). The transposed matrix
L’ is an upper diagonal matrix.
Earlier we stated that the lower triangular transformation matrix L is
nonsingular. Accordingly, we may use a theorem in matrix algebra, which
may be stated as follows:*
Let R be an (M + 1)-by-(M + 1) positive definite matrix. Then LRL’ is
a positive definite matrix for any nonsingular (M + 1)-by-(M + 1) matrix L.
Therefore, if the correlation matrix R is positive definite, then from this
theorem it follows that S = LRL’ is also positive definite.
For the (M + 1)-by-(M + 1) diagonal matrix S of Eq. (3.96) to be
positive definite, we require that all of its diagonal elements be positive, that
iS,
Pee cal). m=0,1,...,M (3.98)
We note that P, = E[bé(n)} = E[u?(n)] equals the average power of the
filter input. Hence, P, > 0. Next, we recall from Eq. (3.41) that

P, = Pll - ¥2))
Therefore, with P, > 0, it is necessary that |y,| < 1 for P, > 0. Continuing

EAS Graybill, “Introduction to Matrices with Applications in Statistics’ (Wadsworth


Publishing Co., Inc., Belmont, California, 1969), p. 317.
82 INTRODUCTION TO ADAPTIVE FILTERS

in this fashion, we find that the condition of Eq. (3.98) is equivalent to


yall = lee a (3.99)
We may thus state that:
1. If the (M + 1)-by-(M + 1) correlation matrix R of the lattice input is
positive definite, then all the reflection coefficients y,,¥2,---.Yx of the
lattice have magnitude less than one.
2. Conversely, if all the reflection coefficients y,,¥,..-,Yy Of a lattice of
order M have magnitude less than one, then the correlation matrix of the
lattice input is positive definite.
It is of interest to note that when the correlation matrix R of the lattice
input is positive definite, and consequently the correlation matrix S of the
backward prediction errors is also positive definite, then Eq. (3.97) may be
rewritten in the form
R-' = DL
= (D'”7L)’(D!”L) (3.100)
where R ! is the inverse of the correlation matrix R, and
Dim Se diag PoP) Pe) (3.101)
DY =Miae( Pee eP ae | (3.102)
The transformation of Eq. (3.100) represents the Cholesky decomposition* of
the inverse matrix R-'.

3.11. SYNTHESIS STRUCTURE BASED ON THE


REFLECTION COEFFICIENTS

The multistage lattice filter of Fig. 3.15(b) may be viewed as an analyzer.


That is, it enables us to represent an autoregressive (AR) process {u(n)} by
a corresponding sequence of reflection coefficients {y,,}. By rewiring this
multistage lattice filter in the manner described in Fig. 3.16, we may use this
new structure as a synthesizer or inverse filter. That is, given the sequence of
reflection coefficients {y,,}, we may reproduce the original AR process
{u(n)} by applying a stationary white-noise process {w(7)} to the input of
the structure in Fig. 3.16. This lattice inverse filter differs from the inverse
filter of Fig. 3.10 (based on a tapped-delay-line structure) in that it produces
a truly stationary time series from the very first sample, whereas the inverse
filter of Fig. 3.10 produces transients due to nonstationary initial conditions,

*G. W. Stewart, “Introduction to Matrix Computations” (Academic Press, New York,


1973), De 134.
LINEAR PREDICTION 83

rite noise
w(n) AR
process
u(n)

Stage M Stage 1

Figure 3.16 Signal-flow graph of multistage lattice-inverse filter for synthesizing an AR process
of order M.

with the result that the time series produced at its output is only asymptoti-
cally stationary.
We will illustrate the operation of the lattice-inverse filter of Fig. 3.16
with an example.

Example 5
Figure 3.17(a) shows a single-stage lattice-inverse filter. There are two
possible paths in this figure that can contribute to the makeup of the sample
u(n) at the output. We may write

u(n) = w(n) — y,u(n - 1)

Since y, = a,(1), we may rewrite this equation as

u(n) + a,(1)u(n — 1) = w(n)

which is the same as Eq. (3.58b), with a,(1) = —A,(1), for describing an
AR process of order one.
Consider next the two-stage lattice-inverse filter of Fig. 3.17(b). In this
case there are four possible paths that can contribute to the makeup of the
sample u(n) at the output. Specifically, we may write

u(n)= win) = yu(n =) 2 Ny2uU(n <7 Nee y,u(n i)

w(n)— y,(1 + ¥2)u(m — 1) — y2,u(n — 2) (3.103)


84 INTRODUCTION TO ADAPTIVE FILTERS

White noise
w(n)

(a)
White noise
w(n)

Figure 3.17 Lattice-inverse filters for synthesizing AR processes: (a) process of order one, (b)
process of order two.

From Example 1, we recall that

a,(2) =
and
a3(1) = y(t ye)
We may therefore rewrite Eq. (3.103) as follows:
u(n) + a,(1)u(n — 1) + a,(n)u(n — 2) = w(n)
which is identical to Eq. (3.67), describing an AR process of order two.
LINEAR PREDICTION 85

3.12 NOTES

It appears that the first use of the term “linear predictor” was made by
Wiener [1] in his classic book on “Extrapolation, Interpolation, and
Smoothing of Stationary Time Series.” The title of the second chapter of
this book reads: “The Linear Predictor for a Single Time Series”.
Early applications of linear prediction to the analysis and synthesis of
speech were made by Itakura and Saito [2,3]. Atal and Schroeder [4], and
Atal [5]. The book by Markel and Gray [6] is devoted to an in-depth
treatment of linear prediction as applied to speech. This book also includes
an extensive list of references on the subject, up to and including 1975. A
detailed tutorial review of the linear prediction problem is given by Makhoul
[7].
The Levinson—Durbin recursion was first derived by Levinson [8] in
1947, and it was independently reformulated by Durbin [9] in 1960—hence
the name.
The idea of a minimum-phase network was originated by Bode [10]. For
a mathematical proof of the minimum-phase property of prediction-error
filters, see Burg [11], Pakula and Kay [12], and Haykin and Kesler [13]. A
filter that is minimum-phase also exhibits a minimum-delay property in the
sense that the energy contained in its unit-sample response is concentrated
as closely as possible at the front end of the response. Equivalently, we may
state that if the sequence a,,(0), a,,(1),..., @,,(M) denotes the unit-sample
response of a minimum-delay filter, then the coefficient a,,(0), located at
the front end of the response, is the largest one in magnitude. For a
discussion of minimum-delay filters, see Robinson and Treitel [14].
The books by Oppenheim and Schafer [15] and Jury [16] give detailed
expositions of z-transform theory; the first of these two books also presents
a detailed treatment of digital filters.
The idea of a whitening filter was proposed by Bode and Shannon [17]
in order to use linear-system concepts to redrive the Wiener filter theory.
The autoregressive modelling of a random process is discussed in detail
by Box and Jenkins [18], Koopmans [19], and Priestley [20]. These books
also discuss other models, namely the moving-average (MA) and autoregres-
sive moving-average (ARMA) models, for describing random processes.
The lattice filter is credited to Itakura and Saito [2], although many
other investigators (including Burg and Robinson) had also used the idea of
a lattice filter in one form or another. For a discussion of the properties of
lattice filters, see Makhoul [21], Griffiths [22], and Haykin and Kesler [13]. A
formulation of the lattice filter to deal with complex-valued data is given in
Haykin and Kesler [13].
The hardware implementation of a digital filter is ordinarily performed
using fixed-point arithmetic. However, in order to utilize the full dynamic
range of the multipliers used in this form of implementation, it is highly
86 INTRODUCTION TO ADAPTIVE FILTERS

desirable to make the coefficients of the filter and the-signals propagating


through it as large as possible. This can be achieved by appropriately scaling
or normalizing the filter. Gray and Markel [23] describe a normalized lattice
filter in which the forward and backward prediction errors at the various
stages of the filter are all normalized to have unit variance. Consider stage m
of the lattice filter, for which we define

Pal = ire

and

b,,(n) aa Bn hee

where f,,(n) and b,,(n) are the normalized forward and backward prediction
errors, respectively, and P,, is the variance of the forward prediction error
f,,(n) or that of the backward prediction error 5,,(n) at the output of stage
m. (Note that for a random variable of zero mean, the mean-square value
and the variance are the same.) We may thus describe the propagation of
signals through stage m in the lattice filter as

Pre as YmPm—1(1 a ie’


Sey ee = f,-1(n) m—1 m1

b(n) Py a Desay (tt > bP Pe i on tract Pe

Dividing through by P}/* and recognizing that


L = Pa * Yn)

we get the following relations for stage m of the normalized lattice filter:

In (0) = fin 1(1) + A by, (0 1) (3.104)

A 1 ges :
b(n) = ———b,,_,(n — 1) + Xf, (n= 1) (3.105)
mae V1 = Ym
where m = 1,,2,...,M, and |y,,| < 1 for all m. Let
Ym s) COS @,,

where w,, is an angle that lies between —7 and z. Accordingly, we may


simplify Eqs. (3.104) and (3.105) as follows:

fin(n) = csc(,) fn—(n) + cot(w,,)b,,;(n — 1)

b,,(1) = CSC( Wy) By y(n ae 1) si cot(,,) fn—y(1)

We may thus represent stage m of the normalized lattice filter by the


signal-flow graph shown in Fig. 3.18.
Markel and Gray [24] report that the normalized lattice filter has
superior roundoff noise properties to the unnormalized lattice filter. The
issue of roundoff noise arising from the digital implementation of normal-
LINEAR PREDICTION 87

(AGS)
Fin()

>! m( 1)

Figure 3.18 Signal-flow graph for stage m in the normalized lattice filter.

ized lattice filters with finite-word-length arithmetic is also discussed by


Mullis and Roberts [25]. f
In [26], Chu and Messerschmitt study the first-order zero sensitivities of
a lattice structure for small deviations of the reflection coefficients. It is
shown that although the reflection coefficients of the front-end as well as
subsequent stages of the lattice structure have a small effect on the radius of
a zero, nevertheless, the front-end stage coefficients may have relatively large
effects on the angle of a zero. This therefore suggests that the reflection
coefficients of the stages at the front end of the structure require a finer
quantization than the later stages if the angle of the zero is important.
Messerschmitt [27] discusses the generalization of a lattice filter by
replacing the unit-delay element z' by an arbitrary all-pass filter. Figure
3.19 shows a generalized lattice filter, in which H(z) is a discrete-time
single-pole all-pass filter, defined by

Az }-—1
H(z)=
(2) A= Ze

where

T+1
eee rca |
and T is normalized with respect to the sample period of the incoming data.
The transfer function H(z) ts stable if and only if T > 1, so that A isa
positive constant.
The tapped-delay-line and lattice filters represent the most widely used
structures for the realization of prediction-error filters. Ahmad and Youn
88 INTRODUCTION TO ADAPTIVE FILTERS

Sins")
fn(”)

Bin 4 (1)

Figure 3.19 Generalized lattice filter.

[28] use a factorization procedure (pertinent to Gram—Schmidt ortho-


gonalization) to derive another structure, termed the escalator, for realizing
prediction-error filters. The structure is so called because the final prediction
error f,,(n) is obtained in M stages, where M is the filter order.

REFERENCES

— _ N. Wiener, “Extrapolation, Interpolation, and Smoothing of Stationary Time Series” (MIT


Press, 1949).
. F. Itakura and S. Saito, “Digital Filtering Techniques for Speech Analysis and Synthesis,”
7th International Congress on Acoustics, Budapest, 1971.
. F. Itakura and S. Saito, “On the Optimum Quantization of Feature Parameters in the
PARCOR speech synthesizer,” Conference Record, IEEE 1972 Conf. Speech Communication
and Processing, (New York, 1972), pp. 434-437.
. B. S. Atal and M. R. Schroeder, “Predictive Coding of Speech Signals,” 1968 WESCON
Technical Papers, paper 8/2, 1968.
. B.S. Atal, “Speech Analysis and Synthesis by Linear Prediction of the Speech Wave,” J.
Acoust. Soc. America, vol. 47, p. 65, 1970.
. J. D. Markel and A. H. Gray, Jr., “Linear Prediction of Speech” (Springer-Verlag, 1976).
. J. Makhoul, “Linear Prediction: A Tutorial Review,” Proc. IEEE, vol. 63, pp. 561-580,
LOWS:
. N. Levinson, “The Wiener RMS (root-mean-square) Error Criterion in Filter Design and
Prediction”, J. Math. and Phys., vol. 25, pp. 261-278, 1947. This paper is reprinted as an
Appendix in Wiener’s book [1].
. J. Durbin, “The Fitting of Time-Series Models,” Rev. Intern. Statist. Inst., vol. 28, pp.
233-244, 1960.
. H. W. Bode, “Network Analysis and Feedback Amplifier Design” (Van Nostrand, 1945).
LINEAR PREDICTION 89

mk J. P. Burg, “Maximum Entropy Spectral Analysis,” Ph.D. Dissertation, Stanford Univer-


sity, Stanford, California, 1975.
Pe, L. Pakula and S. Kay, “Simple Proofs of the Minimum Phase Property of the Prediction
Error Filter,’ IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-31, p. 501,
April 1983.
18} S. Haykin and S. Kesler, “Prediction-Error Filtering and Maximum-Entropy Spectral
Estimation,” in book “Nonlinear Methods of Spectral Analysis,” edited by S. Haykin,
second edition (Springer-Verlag, 1983).
14. E. A. Robinson and S. Treitel, “Geophysical Signal Analysis” (Prentice-Hall, 1980).
USS, A. V. Oppenheim arid R. W. Schafer, “Digital Signal Processing” (Prentice-Hall, 1975).
16. E. I. Jury, “Theory and Application of the Z-Transform Method” (Wiley, 1964).
Wik. H. W. Bode and C. E. Shannon, “A Simplified Derivation of Linear Least Square
Smoothing and Prediction Theory,” Proc. IRE, vol. 38, pp. 417-425, 1950.
18. G. E. P. Box and G. M. Jenkins, “Time Series Analysis: Forecasting and Control”
(Holden-Day, 1976).
1:9) I. H. Koopmans, “The Spectral Analysis of Time Series” (Academic Press, 1974).
20. M. B. Priestely, “Spectral Analysis and Time Series,” vols. 1 and 2 (Academic Press, 1981).
2h J. Makhoul, “A Class of All-Zero Lattice Digital Filters: Properties and Applications,”
IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, pp. 304-314, August
1978.
IBD. L. J. Griffiths, “A Continuous-Adaptive Filter Implemented as a Lattice Structure,”
Proceedings “IEEE Intern. Conf. Acoustics, Speech, and Signal Processing” (Hartford,
Connecticut, 1977), pp. 683-686.
JB). A. H. Gray, Jr., and J. D. Markel, “A Normalized Digital Filter Structure,” IEEE Trans.
Acoustics, Speech, and Signal Processing, vol. ASSP-23, pp. 268-277, 1975.
24. J. D. Markel and A. H. Gray, Jr., “Roundoff Noise Characteristics of a Class of
Orthogonal Polynomial Structures,” IEEE Trans. Acoustics, Speech, and Signal Processing,
vol. ASSP-23, pp. 473-486, 1975.
75: C. T. Muilis and R. A. Roberts, ““Roundoff Noise in Digital Filters: Transformations and
Invariants,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-24, pp.
538-550, 1976.
26. P. L. Chu and D. G. Messerschmitt, “Zero Sensitivity Analysis of the Digital Lattice
Filter,” Proc. Intern. Conf. Acoustics, Speech, and Signal Processing (Denver, Colorado,
April 1980), pp. 89-93.
Dale D. G. Messerschmitt, “A Class of Generalized Lattice Filters,’ IEEE Trans. Acoustics,
Speech, and Signal Processing, vol. ASSP-28, pp. 198-204, April 1980.
28. N. Ahmed and D. H. Youn, “On a Realization and Related Algorithm for Adaptive
Prediction,’ IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-28, pp.
493-497, October 1980.
CHAPTER

FOUR
ADAPTIVE TAPPED-DELAY-LINE FILTERS
USING THE GRADIENT APPROACH

In Chapter 2 we showed that the coefficients of a tapped-delay-line filter,


optimized in the mean-square sense, are defined by the normal equations. In
particular, to solve the matrix form of the normal equations for the
optimum coefficient vector hy, we require knowledge of two quantities: (1)
the correlation matrix R of a signal vector whose elements are defined by
the filter-tap inputs, and (2) the cross-correlation vector p between this
signal vector and a desired response. Furthermore, the solution requires
inversion of the correlation matrix R, and then the multiplication of the
resultant inverse matrix R-' by the cross-correlation vector p. When
the filter operates in an environment for which the correlation matrix R and
the cross-correlation vector p are unknown, we may use all the data
collected up to and including time n to compute estimates R(n) and p(n),
pertaining to the correlation matrix R and the cross-correlation vector p,
respectively; then compute R '(n), the inverse of R(7); and finally multiply
R ‘(n) by p(n) in accordance with the matrix form of the normal equa-
tions. When, however, the tapped-delay-line filter contains a large number
of taps, this procedure is highly inefficient. A more efficient procedure is to
use an adaptive algorithm whereby, starting with a prescribed initial value
h(0) for the coefficient vector of the filter, the coefficient vector is updated
each time we receive new sample values for the desired response and the
signal vector. The adaptation is continued until we reach a state close
enough to the optimum Wiener solution for the coefficient vector.
In this chapter we derive a widely used algorithm known as the
least-mean-square (LMS) algorithm for implementing the adaptation of the

90
ADAPTIVE TAPPED-DELAY-LINE FILTERS USING THE GRADIENT APPROACH 91

coefficient vector of the tapped-delay-line filter when operating in an


unknown environment.

4.1 SOME PRELIMINARIES

Figure 4.1 shows the block diagram of an adaptive tapped-delay-line filter.


The basic elements of this filter are as follows:
1. A set of delay elements, each represented by the unit-delay operator z~',
that are used to store past values of the input sequence, namely,
Nh) tn = Mall):
2. A set of adjustable coefficients, represented at time n by A(1,n),
h(2,n),...,h(M,n), that are used to scale the tap inputs u(n),
Une Nh uti 1) respectively.
Ww Summers for adding the scaled versions of the tap inputs.
4. A control mechanism for adjusting the filter coefficients in an adaptive
manner.
Two kinds of processes take place in this adaptive filter:
1. The adaptive or training process, which is concerned with the automatic
adjustment of the filter coefficients.
2. The filtering or operating process, which uses the set of filter coefficients
from the adaptive process to produce an output signal by weighting the
signals at the delay-line taps.

Adaptive
algorithm

Figure 4.1 Block diagram of adaptive filter.


92 INTRODUCTION TO ADAPTIVE FILTERS

During the filtering process, a desired response, d(n), 1s supplied to the


control mechanism so as to provide a frame of reference for adjusting the
filter coefficients.
Let y(n) denote the output of the tapped-delay-line filter at time 1, as
shown by the convolution sum

y(n)=) hk, mr
— kat) (4.1)

By comparing this output with the desired response d(n), we produce the
error signal
e(n) =d(n) — y(n) (4.2)
The function of the control mechanism in Fig. 4.1 is to utilize the error
signal e(n) for generating corrections to be applied to the set of filter
coefficients {h(k,n)}, k = 1,2,..., M, in such a way that we move one step
closer to the optimum Wiener configuration defined by the normal equa-
tions.
With the coefficients of the tapped-delay-line filter assumed to have the
values h(1,n), h(2,n),...,h(M,n) at time n, we find that the correspond-
ing value of the mean squared error is [See Eq. (2.20)]
M M M
e(n) = P,-—2 ¥ hA(k,n)p(kK-1)+ ) DY h(k,n)h(m,n)r(m — k)
k=1 k=1 m=1

(4.3)
In Eq. (4.3) we recognize the following points:
1. The average power P,, is defined by
Pie Ela) (4.4)
The cross-correlation function p(k — 1) for a lag of (A — 1) is defined
by [see Eq. (2.16)]
p(k -1)=£[d(n)u(n—k
+1]; k= 142,..5 M (4.5)
The autocorrelation function r(m — k) for a lag of (m — k) is defined
by [see Eq. (2.18)]
r(m—k)=E[u(n—k + 1)u(n—m+ 1)], mm, koe, Doo,
(4.6)
The quantities P;, p(k — 1), and r(m — k) are the results of ensemble
averaging.
2. The filter coefficients h(1,n),h(2,n),...,4(M,n) are treated as con-
stants during the ensemble-averaging operation.
3. The dependence of the mean squared error e(n) on time n is intended to
show that its value depends on the values assigned to the filter coeffi-
cients at time n.
ADAPTIVE TAPPED-DELAY-LINE FILTERS USING THE GRADIENT APPROACH 93

From Eq. (4.3), we observe that the mean squared error e(7) is a second-order
function of the filter coefficients. Thus, we may visualize the dependence of
e(n) on the filter coefficients as a bowl-shaped surface with a unique
minimum. We refer to this surface as the error performance surface of the
adaptive filter. When the filter operates in a stationary environment, the
error performance surface has a constant shape as well as a constant
orientation. The adaptive process has the task of continually seeking the
bottom or minimum point of this surface, where the filter coefficients assume
their optimum values.

4.2 THE METHOD OF STEEPEST DESCENT

Equation (4.3) defines the value of the mean squared error at time n when
the filter coefficients have the values h(1,7n),h(2,n),...,4(M,n). We as-
sume that the point so defined on the multidimensional error performance
surface is some distance away from the minimum point of the surface. We
would like to develop a recursive procedure whereby appropriate correc-
tions are applied to these filter coefficients in such a way that we continually
move closer to the minimum point of the error performance surface after
each iteration. If such a procedure were available to us, then starting from
an arbitrary point on the error performance surface we can move in a
step-by-step fashion toward the minimum point and thereby ultimately
realize the optimum Wiener configuration. The answer to this problem is
provided by an old optimization technique known as the method of steepest
descent.
According to the method of steepest descent we proceed as follows:

1. We begin with a set of initial values for the filter coefficients, which
provides an initial guess as to where the minimum point of the error
performance surface may be located.
2. Using this initial or present guess, we compute the gradient vector, whose
individual elements equal the first derivatives of the mean squared error
e(n) with respect to the filter coefficients.
3. We compute the next guess at the filter coefficients by making a change
in the initial or present guess in a direction opposite to that of the
gradient vector.
4. We go back to step (2) and repeat the procedure.

It is intuitively reasonable that successive corrections to the filter coefficients


in the direction of the negative of the gradient vector (i.e., in the direction of
the steepest descent of the error performance surface) should eventually lead
to the minimum mean squared error €,,;,, at which point the filter coeffi-
cients assume their optimum values.
94 INTRODUCTION TO ADAPTIVE FILTERS

Let V(n) denote the M-by-1 gradient vector at time n, where M equals
the number of filter coefficients. The kth element of V(7), by definition,
equals the first derivative of the mean squared error e(7) with respect to the
filter coefficient h(k,n). Hence, differentiating both sides of Eq. (4.3) with
respect to h(k,n), we get

ae Gea CAGE Se k= 152 M


oh(k,n) P m=1
, ; Saas
(4.7)
We may simplify this expression in the following way. First, we
eliminate the filter output y(n) between Eqs. (4.1) and (4.2) and so express
the desired response d(n) in terms of the filter-tap inputs (with A replaced
by m) and the error signal as follows:
M A

d(n) = ) h(m,n)u(n
— m+ 1) + e(n) (4.8)
m=1

Hence, substituting Eq. (4.8) in (4.5), we get


M
Deal)ee |Y h(m,n)u(n — m+ 1) + e(n) u(n—'k + |
m=1

M
=E > hA(m,n)u(n — k + 1)u(n — m+ |
m=1

+El[e(n)u(n—k + 1)| (4.9)


Interchanging the order of summation and expectation in the second term
on the right-hand side of Eq. (4.9) and treating the filter coefficient h(m, n)
as a constant, we may rewrite this equation as follows:
M
pP(k-1)= ¥ hA(m,n)El[u(n—k + 1)u(n— m+ 1)]
m=1

+E[e(n)u(n-—k +1)]
M
= ¥ h(m,n)r(m—k) + E[e(n)u(n—k+1)] (4.10)
m=1

where we have made use of Eq. (4.6). Accordingly, substituting Eq. (4.10) in
(4.7) and simplifying, we get the desired expression
de(n)
= —2E[e(n)u(n—k
+ 1)], Ki Osae M (4.11)
dh(k,n)
Equation (4.11) states that, except for the scaling factor 2, the kth element
of the gradient vector V(n) is the negative of the cross-correlation between
the error signal e(n) and the signal u(n — k + 1) at the kth tap input.
ADAPTIVE TAPPED-DELAY-LINE FILTERS USING THE GRADIENT APPROACH 95

At the minimum point of the error performance surface, all the M elements
of the gradient vector V(n) are simultaneously zero. Accordingly, for
minimum mean squared error, the cross-correlation between the error signal
and each tap input of the filter is zero. This merely restates the principle of
orthogonality discussed in Chapter 2.
We may rewrite Eq. (4.11) in matrix form as follows:

de(n)/dh(1,n)
SHINE I

Lainie ears)

~2E[e(n)u(n)]
—2E[e( Bes 1)|

—2E[e(n)u een
Taking out the common factor —2, the expectation operator, and the error
signal e(n), we may thus write

V(n) = -2E[e(n)u(n)] (4.12)


The M-by-1 vector u(7) is the tap-input vector whose elements consist of the
M tap inputs of the filter:

u(n)
u(n) = oot . (4.13)
u(n — M + 1)

Equation 4.12 states that, except for the scaling factor 2, the gradient vector
V(n) is the negative of the cross-correlation vector between the error signal
e(n) and the tap-input vector u(7).
We are now ready to formulate the steepest-descent algorithm for
updating the filter coefficients. Define the M-by-1 coefficient vector of the
filter at time n as

h(i,n)

h(n) = ea (4.14)

WMP)

Then, according to the steepest-descent algorithm, the updated value of the


96 INTRODUCTION TO ADAPTIVE FILTERS

coefficient vector at time n + 1 is defined by


h(n +1) = h(n) + 3p[—V(n)] (4.15)
where p is a positive scalar and V(n) is the gradient vector at time n. The
factor + has been introduced for convenience. Substituting Eq. (4.12) in
(4.15), we get
h(n + 1) =h(n) + pE[e(n)u(n)] (4.16)
This shows that, in order to update the value of the coefficient vector, we
apply to the old estimate a correction equal to the scalar » multiplied by the
cross-correlation between the error signal e(n) and the tap-input vector
u(n). Thus, the scalar » controls the size of the correction as we proceed
from one iteration to next. For this reason, the scalar p 1s called the step-size
parameter. The significant feature of the steepest-descent algorithm is that
the gradient vector (and therefore the correction) may be conveniently
computed without knowledge of the error performance surface.
The error signal e(n) equals the difference between the desired response
d(n) and the filter output y(). Expressing the filter output y(7), defined in
Eq. (4.1), in matrix form, we may thus express the error signal as follows:
e(n) = d(n) — uw(n)h(n) (4.17)
where u/(n) is the transpose of the tap-input vector.
The combination of Eqs. (4.16) and (4.17) defines the steepest-descent
algorithm. The algorithm is initiated with an arbitrary guess h(Q). It is
customary to set all the tap-coefficients of the filter mitially equal to zero, so
that we may put h(0) equal to the M-by-1 null vector @.

4.3 SIGNAL-FLOW GRAPH REPRESENTATION OF THE


STEEPEST-DESCENT ALGORITHM

It is informative to represent the equations defining the steepest-descent


algorithm in the form of a multidimensional signal-flow graph that is
matrix-valued.
A signal-flow graph is made up of nodes and branches. When it is
multidimensional, the nodes of the graph consist of vectors. Correspond-
ingly, the ‘ransmittance of a branch is a square matrix or a scalar. The rules
for constructing a multidimensional signal-flow graph are as follows:
1. For each branch of the graph, the vector flowing out equals the vector
flowing in multiplied by the transmittance of the branch.
2. For two branches connected in parallel, the overall transmittance equals
the sum of the transmittances of the individual branches.
3. For two branches connected in cascade, the overall transmittance equals
the product of the individual transmittances arranged in the same order
as the pertinent branches.
ADAPTIVE TAPPED-DELAY-LINE FILTERS USING THE GRADIENT APPROACH 97

Figure 4.2 Multidimensional signal-flow graph representation of the steepest-descent algo-


rithm.

With these rules in mind, let us represent Eqs. (4.16) and (4.17) in the
form of the multidimensional signal-flow graph. For this representation, we
first eliminate the scalar-valued error signal e(n) by substituting Eq. (4.17)
in (4.16) and so write
h(n + 1) = h(n) + wE[u(n)(d(n) - u’(n)h(n))|

= h(n) + pE[u(n)d(n)] — wE[u(n)u(n)]h(n) (4.18)


where, in the last term, we have used the fact that the coefficient vector h(n)
is a constant vector and may therefore be taken outside the expectation
operator. We recognize that the expectation E[u(n)d(n)] equals the M-by-1
cross-correlation vector p [see Eqs. (2.31) and (2.32)], and the expectation
E{u(n)u’(n)] equals the M-by-M correlation matrix R [see Eq. (2.39)]. We
may therefore rewrite Eq. (4.18) in the form

h(n + 1) = h(n) + µp − µRh(n)
         = (I − µR)h(n) + µp     (4.19)

where I is the M-by-M identity matrix. We also recognize that h(n) may be obtained by applying the unit-delay operator z⁻¹ to h(n + 1), as shown by

h(n) = z⁻¹[h(n + 1)]     (4.20)

Accordingly, we may use Eqs. (4.19) and (4.20) to construct the multidimensional signal-flow graph shown in Fig. 4.2 for the steepest-descent algorithm. Note that, in this figure, z⁻¹I is the transmittance of a unit-delay branch representing a delay of one iteration cycle.

4.4 STABILITY OF THE STEEPEST-DESCENT ALGORITHM

The signal-flow graph of Fig. 4.2 reveals an important feature of the steepest-descent algorithm, namely, the presence of feedback in the operation of the algorithm. In particular, we have a feedback loop that consists of two branches: one with a transmittance equal to the matrix I − µR, and the other with a transmittance equal to z⁻¹I. Accordingly, the steepest-descent

algorithm is subject to the possibility of becoming unstable, depending on the product of these two transmittances. Furthermore, the stability performance of the algorithm is determined by two factors: (1) the correlation matrix R of the tap-input vector, and (2) the step-size parameter µ. The correlation matrix R is determined by the environment in which the filter operates. On the other hand, the step-size parameter µ is under the designer's control.
To carry out this stability analysis, we find it convenient to reformulate
the steepest-descent algorithm as follows:
1. We define a coefficient-error vector as

c(n) = h(n) − h_0     (4.21)

where h_0 is the optimum value of the coefficient vector. This optimum coefficient vector is defined by the matrix form of the normal equations, namely,

Rh_0 = p     (4.22)

Therefore, subtracting h_0 from both sides of Eq. (4.19), and using Eq. (4.22) to eliminate the cross-correlation vector p, we get

h(n + 1) − h_0 = (I − µR)h(n) + µRh_0 − h_0
               = (I − µR)h(n) − (I − µR)h_0
               = (I − µR)[h(n) − h_0]

The difference h(n) − h_0 equals the coefficient-error vector c(n) at time n. Correspondingly, the difference h(n + 1) − h_0 equals the updated value of the coefficient-error vector, namely, c(n + 1). We may thus write

c(n + 1) = (I − µR)c(n)     (4.23)
2. We represent the correlation matrix R in terms of its eigenvalues and associated eigenvectors, as shown by (see Section 2.8)

QᵀRQ = Λ     (4.24)

where the diagonal matrix Λ consists of the eigenvalues of R, and the columns of the unitary matrix Q are the associated eigenvectors. Thus, premultiplying both sides of Eq. (4.23) by Qᵀ, we get

Qᵀc(n + 1) = Qᵀ(I − µR)c(n)
           = Qᵀc(n) − µQᵀRc(n)     (4.25)

Define the transformed coefficient-error vector

v(n) = Qᵀc(n)     (4.26)

In a corresponding way we may express the updated value of this vector as

v(n + 1) = Qᵀc(n + 1)

Also, using the property of the unitary matrix, namely, the fact that QQᵀ = I, we may write

QᵀRc(n) = QᵀRIc(n)
        = QᵀRQQᵀc(n)
        = Λv(n)

Accordingly, we may rewrite Eq. (4.25) in the form

v(n + 1) = (I − µΛ)v(n)     (4.27)

This is the desired recursion for analyzing the stability of the steepest-descent algorithm.
Putting n = 0 in Eq. (4.26), and using the definition of Eq. (4.21), we find that the initial value of the transformed coefficient-error vector equals

v(0) = Qᵀ[h(0) − h_0]     (4.28)

For the case when all the coefficients of the filter are set initially equal to zero, Eq. (4.28) simplifies as

v(0) = −Qᵀh_0     (4.29)
We recognize that the multiplying factor I − µΛ on the right-hand side of Eq. (4.27) is a diagonal matrix. Hence, this equation represents a system of uncoupled scalar-valued first-order difference equations, the kth one of which may be written as

v_k(n + 1) = (1 − µλ_k)v_k(n),     k = 1, 2, ..., M     (4.30)

This equation defines the transient behavior of the kth natural mode of the steepest-descent algorithm.
Figure 4.3 shows the single-dimensional signal-flow graph representation of Eq. (4.30). In this graph we have also included a branch to represent the fact that the old estimate v_k(n) is obtained by applying the unit-delay operator z⁻¹ to the updated estimate v_k(n + 1). For each element of the transformed coefficient-error vector we have a signal-flow graph similar to that shown in Fig. 4.3. For a tapped-delay-line filter that has M taps, there will therefore be M such graphs. However, because these individual graphs
Figure 4.3 Signal-flow graph representation of the kth natural mode of the steepest-descent algorithm.

are uncoupled from each other, we have a new representation for the
steepest-descent algorithm that is much simpler than the multidimensional
signal-flow graph of Fig. 4.2. This simplification is the result of the unitary
similarity transformation applied to the correlation matrix R and the
corresponding transformation applied to the coefficient vector of the filter.
The solution of the homogeneous difference equation (4.30) is simply

v_k(n) = (1 − µλ_k)^n v_k(0),     n = 1, 2, ...     (4.31)

where v_k(0) is the initial value of the kth element of the transformed coefficient-error vector, determined in accordance with Eq. (4.28). As illustrated in Fig. 4.4, the numbers generated by this solution represent a geometric series having the geometric ratio

r_k = 1 − µλ_k     (4.32)

For stability or convergence of the steepest-descent algorithm, this geometric ratio must have magnitude less than one for all k. This ensures that, irrespective of the initial conditions, all natural modes of the algorithm die out with time. In other words, as the number of iterations, n, approaches infinity, the transformed coefficient-error vector v(n) approaches zero, and, correspondingly, the coefficient vector h(n) of the filter approaches the optimum value h_0. Therefore, the necessary and sufficient condition for the stability of the steepest-descent algorithm is that the step-size parameter µ satisfy the following condition:

−1 < 1 − µλ_k < 1,     for all k     (4.33)

Since all the eigenvalues of the correlation matrix R are real and almost always positive, we conclude that the steepest-descent algorithm is stable if
Figure 4.4 Illustrating the transient behavior of the kth natural mode of the steepest-descent algorithm.

and only if

0 < µ < 2/λ_max     (4.34)

where λ_max is the largest eigenvalue of the correlation matrix R.


Referring to Fig. 4.4, we see that an exponential envelope of time constant τ_k can be fitted to the geometric series by assuming the unit of time to be the duration of one iteration cycle and by choosing the time constant τ_k such that

1 − µλ_k = exp(−1/τ_k)     (4.35)

Hence, from Eqs. (4.32) and (4.35), we find that the kth time constant can be expressed in terms of the step-size parameter µ and the kth eigenvalue as follows:

τ_k = −1 / ln(1 − µλ_k)     (4.36)

The time constant τ_k defines the time required for the amplitude of the kth natural mode v_k(n) to decay to 1/e of its initial value v_k(0), where e is the base of the natural logarithm.
For the special case of slow adaptation, for which the step-size parameter µ is small, we may use the following approximation for the logarithm in the denominator of Eq. (4.36):

ln(1 − µλ_k) ≈ −µλ_k,     µλ_k ≪ 1

Correspondingly, we may approximate the time constant τ_k of Eq. (4.36) as

τ_k ≈ 1/(µλ_k)     (4.37)
Using Eq. (4.31), we may now formulate the solution for the original coefficient vector h(n). We premultiply both sides of Eq. (4.26) by Q, obtaining

Qv(n) = QQᵀc(n)
      = c(n)

where we have used the relation QQᵀ = I. Next, using Eq. (4.21) to eliminate c(n), and solving for h(n), we obtain

h(n) = h_0 + Qv(n)     (4.38)
The coefficient vector h(n) may also be written in the form

h(n) = h_0 + Σ_{k=1}^{M} v_k(n)q_k     (4.39)

where q_1, q_2, ..., q_M are the normalized eigenvectors associated with the eigenvalues λ_1, λ_2, ..., λ_M of the correlation matrix R, respectively. Thus from Eqs. (4.31) and (4.39) we find that the transient behavior of the ith coefficient of the tapped-delay-line filter is described by

h_i(n) = h_0(i) + Σ_{k=1}^{M} q_ki v_k(0)(1 − µλ_k)^n,     i = 1, 2, ..., M     (4.40)

where h_0(i) is the optimum value of the ith filter coefficient, and q_ki is the ith element of the eigenvector q_k.
Equation (4.40) shows that each coefficient of the filter in the steepest-descent algorithm converges as a weighted sum of exponentials of the form (1 − µλ_k)^n. The time τ_k required for each term to reach 1/e of its initial value is given by Eq. (4.36). However, the overall time constant, τ_a, defined as the time required for the summation term in Eq. (4.40) to decay to 1/e of its initial value, cannot be expressed in a simple closed form. We may, however, bound τ_a as follows. The slowest rate of convergence is attained when q_ki v_k(0) is zero for all k except for the one corresponding to the minimum eigenvalue λ_min. Then the upper bound on τ_a is defined by −1/ln(1 − µλ_min). The fastest rate of convergence is attained when all the q_ki v_k(0) are zero except for the one corresponding to the maximum eigenvalue λ_max. Then the lower bound on τ_a is defined by −1/ln(1 − µλ_max). Accordingly, the overall time constant τ_a for any coefficient of the tapped-delay-line filter is bounded as follows:

−1/ln(1 − µλ_max) ≤ τ_a ≤ −1/ln(1 − µλ_min)     (4.41)

This shows that when the eigenvalues of the correlation matrix R are widely
spread, the settling time of the steepest-descent algorithm is limited by the
smallest eigenvalues or the slowest modes.
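As a numerical check on Eqs. (4.34), (4.36), and (4.41), the sketch below (with an assumed correlation matrix and an assumed step size, chosen only for illustration) computes the permissible range of µ and the time constants of the natural modes:

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # assumed correlation matrix
lam = np.linalg.eigvalsh(R)         # eigenvalues (real and nonnegative)
lam_max, lam_min = lam.max(), lam.min()

print("stable range of mu: 0 <", 2.0 / lam_max)       # Eq. (4.34)

mu = 0.1                            # a choice inside the stable range
tau = -1.0 / np.log(1.0 - mu * lam) # Eq. (4.36): time constant of each mode
print("mode time constants:", tau)

# Bounds of Eq. (4.41) on the overall time constant of any coefficient
print("lower bound:", -1.0 / np.log(1.0 - mu * lam_max))
print("upper bound:", -1.0 / np.log(1.0 - mu * lam_min))
```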

4.5 THE MEAN SQUARED ERROR

We may develop further insight into the operation of the steepest-descent algorithm by examining the formula for the mean squared error. At time n, the value of the mean squared error is given by [see Eq. (2.61)]

ε(n) = ε_min + Σ_{k=1}^{M} λ_k v_k²(n)     (4.42)

where ε_min is the minimum mean squared error.



Substituting Eq. (4.31) in (4.42), we get

ε(n) = ε_min + Σ_{k=1}^{M} λ_k (1 − µλ_k)^{2n} v_k²(0)     (4.43)

When the steepest-descent algorithm is convergent, that is, when the step-size parameter µ is chosen within the bounds defined by Eq. (4.34), we see from Eq. (4.43) that, irrespective of the initial conditions,

lim_{n→∞} ε(n) = ε_min
The curve obtained by plotting the mean squared error ε(n) versus the number of iterations, n, is called a learning curve. From Eq. (4.43) we see that the learning curve of the steepest-descent algorithm consists of a sum of exponentials, each of which corresponds to a natural mode of the algorithm. The number of natural modes, in general, equals the number of filter coefficients, M. In going from the initial value ε(0) to the final value ε_min, the exponential decay for the kth mode has a time constant equal to

τ_{k,mse} = −1 / [2 ln(1 − µλ_k)]     (4.44)

For small values of µ, we may approximate this time constant as

τ_{k,mse} ≈ 1/(2µλ_k)     (4.45)
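The learning curve of Eq. (4.43) can be generated directly once the eigenvalues, the step-size parameter, and the initial transformed coefficient errors are specified; the following sketch does so for assumed values (an illustration only, not data from the text):

```python
import numpy as np

lam = np.array([1.5, 0.5])      # assumed eigenvalues of R
v0 = np.array([2.0, -4.0])      # assumed initial transformed coefficient errors v(0)
mu = 0.1                        # step-size parameter

n = np.arange(100)
# Eq. (4.43): excess error = sum_k lam_k (1 - mu lam_k)^(2n) v_k^2(0)
excess = sum(l * (1 - mu * l) ** (2 * n) * v * v for l, v in zip(lam, v0))

tau_mse = -1.0 / (2.0 * np.log(1.0 - mu * lam))   # Eq. (4.44)
print(excess[:5])                                 # first few points of the learning curve
print("time constants of the decay:", tau_mse)
```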
In the following two examples we will study the form of Eq. (4.43) for
two particular cases of interest: (1) the eigenvalues of the correlation matrix
R are equal, and (2) they are unequal.

Example 1
Consider a tap-input vector u(n) that consists of two uncorrelated samples. This assumption is satisfied when the sample period is equal to or greater than the decorrelation time of the input process. The decorrelation time is defined as the lag for which the autocorrelation function of the process decreases to a small fraction (e.g., one percent) of the mean-square value of the process. The vector u(n) is thus assumed to have a mean of zero and a correlation matrix

R = σ²I

where σ² is the variance of each sample and I is the 2-by-2 identity matrix. In this case, the two eigenvalues of R are equal:

λ_1 = λ_2 = σ²
At time n, the filter is characterized by two transformed coefficient errors v_1(n) and v_2(n). For a constant value of the mean squared error ε(n), the locus of possible values of v_1(n) and v_2(n) consists of a circle with center at the origin and a radius equal to the square root of [ε(n) − ε_min]/σ². Figure 4.5 shows a set of such concentric circular loci, corresponding to different values of ε(n). This figure also includes the trajectory obtained by joining the points represented by the values of the transformed coefficient-error vectors: v(0), v(1), v(2), ..., v(∞), where v(0) is the initial value, and v(1), v(2), ..., v(∞) are the values resulting from the application of the steepest-descent algorithm. The geometry shown in Fig. 4.5 assumes the following values:

v_1(0) = 2
v_2(0) = −4
We see that the trajectory, irrespective of the value of µ, consists of a straight line that is normal to the loci for constant values of ε(n). This trajectory represents the shortest possible path between the points v(0) and v(∞). We thus see that when the eigenvalues of the correlation matrix R are equal, the steepest-descent algorithm attains the fastest rate of convergence possible.
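A brief numerical check of this geometric picture, with an assumed step-size value, shows that the ratio v_2(n)/v_1(n) stays fixed from one iteration to the next, so the points v(n) indeed lie on a straight line toward the origin:

```python
import numpy as np

sigma2 = 1.0                       # assumed variance of each input sample
lam = np.array([sigma2, sigma2])   # equal eigenvalues
mu = 0.2                           # assumed step-size parameter
v = np.array([2.0, -4.0])          # v(0) as in Fig. 4.5

for n in range(5):
    print(n, v, v[1] / v[0])       # the ratio v2/v1 stays fixed: a straight-line path
    v = (1.0 - mu * lam) * v       # Eq. (4.30) applied element by element
```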
Figure 4.6 shows the learning curve of the steepest-descent algorithm obtained by using Eq. (4.43) to plot [ε(n) − ε_min]/σ² versus the number of iterations n for different values of the step-size parameter µ. In this case we see that the learning curve consists of a single exponential; the rate at which it decays increases with increasing µ.

Figure 4.5 The trajectory of the transformed coefficient-error vectors v(0), v(1), ..., v(∞) obtained by the steepest-descent algorithm for the special case of two equal eigenvalues [v_1(0) = 2, v_2(0) = −4].

Example 2
Consider next the case of a zero-mean tap-input vector u(n) that consists of two correlated samples. The correlation matrix of this vector is assumed to equal

R = | r(0)  r(1) |
    | r(1)  r(0) |

where r(0) = σ² is the variance of an input sample, and ρ = r(1)/r(0) is the correlation coefficient that lies in the interval 0 < ρ < 1. The two eigenvalues of R are given by the roots of the quadratic equation

[r(0) − λ]² − r²(1) = 0

Figure 4.6 The learning curve of the steepest-descent algorithm for the case of two equal
eigenvalues.

That is,

λ_1 = (1 + ρ)σ²

and

λ_2 = (1 − ρ)σ²

Hence, the eigenvalue spread equals

λ_max/λ_min = λ_1/λ_2 = (1 + ρ)/(1 − ρ)

This shows that as the adjacent samples of the filter input become highly correlated, the correlation coefficient ρ approaches unity, with the result that the eigenvalue spread increases.
For ρ = 0.5, the two eigenvalues have the following values: λ_1 = 1.5σ² and λ_2 = 0.5σ².
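These values, and the resulting eigenvalue spread, can be verified directly (a sketch assuming σ² = 1 for convenience):

```python
import numpy as np

sigma2 = 1.0        # assumed variance of an input sample
rho = 0.5           # correlation coefficient of this example
R = sigma2 * np.array([[1.0, rho],
                       [rho, 1.0]])

lam = np.linalg.eigvalsh(R)
print(lam)                      # [0.5, 1.5]: (1 - rho) sigma^2 and (1 + rho) sigma^2
print(lam.max() / lam.min())    # eigenvalue spread (1 + rho)/(1 - rho) = 3
```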

In this case we find that, for a constant value of the mean squared error ε(n), the locus of possible values of v_1(n) and v_2(n) consists of an ellipse with a minor axis equal to the square root of [ε(n) − ε_min]/λ_1 and a major axis equal to the square root of [ε(n) − ε_min]/λ_2. Figure 4.7 shows a set of such ellipsoidal loci corresponding to different values of ε(n) and the following values:

v_1(0) = 2
v_2(0) = −4
ρ = 0.5

Figure 4.7 The trajectory of the transformed coefficient-error vectors v(0), v(1), ..., v(∞) obtained by the steepest-descent algorithm for two unequal eigenvalues.
This figure also includes the trajectory obtained by joining the points represented by v(0), v(1), v(2), ..., v(∞). Here again the trajectory is normal to the loci for different values of ε(n). However, we now find that the trajectory is curved, leaning towards the v_2(n)-axis. Accordingly, the rate of convergence of the algorithm is slower than that of Example 1. Also, it is dominated by the eigenvalue λ_2, the smaller one of the two.
Figure 4.8 shows the learning curve of the steepest-descent algorithm obtained by using Eq. (4.43) to plot [ε(n) − ε_min]/σ² versus the number of iterations n for two different values of the step-size parameter µ. In this case we see that, since the correlation matrix has two unequal eigenvalues, the learning curve consists of the sum of two exponentials with different time

Figure 4.8 The learning curve of the steepest-descent algorithm for the case of two unequal eigenvalues.

constants. Here again, the overall rate at which the learning curve decays is limited by the smaller of the two eigenvalues.
4.6 THE LEAST-MEAN-SQUARE (LMS) ALGORITHM

In the last four sections we studied the operation of the steepest-descent algorithm in some detail. We showed that, without knowledge of the error-performance surface, the algorithm is capable of converging to the optimum Wiener solution, irrespective of the initial conditions. The main limitation of the steepest-descent algorithm, however, is that it requires exact measurements of the gradient vector at each iteration. In reality, exact measurements are not possible, and the gradient vector must be estimated from a limited number of input data samples, thereby introducing errors. There is, therefore, a need for an algorithm that derives estimates of the gradient vector from the available data. One such algorithm is the so-called least-mean-square (LMS) algorithm. The attractive feature of this algorithm is its relative simplicity; it does not require measurements of the pertinent correlation functions, nor does it require matrix inversion.
The LMS algorithm uses instantaneous estimates of the gradient vector, based on sample values of the tap-input vector u(n) and the error signal e(n). In particular, from Eq. (4.12) we deduce the following instantaneous estimate for the gradient vector:

∇̂(n) = −2e(n)u(n)     (4.46)

Note that this estimate is unbiased, because its expected value is exactly the same as the actual gradient vector of Eq. (4.12).
We are now ready to formulate the LMS algorithm, according to which
changes in the filter coefficient vector are made along the direction of the
gradient vector estimate as follows:

h(n + 1) = h(n) + ½µ[−∇̂(n)]
         = h(n) + µe(n)u(n)     (4.47)
where

h(n) = filter coefficient vector before adaptation (i.e., old estimate)
h(n + 1) = filter coefficient vector after adaptation (i.e., updated estimate)
µ = step-size parameter
e(n) = error signal at the nth iteration
u(n) = tap-input vector at the nth iteration



Equation (4.47) states that the updated estimate of the coefficient vector is
obtained by incrementing the old estimate of the coefficient vector by an
amount proportional to the product of the input vector and the error signal.
This equation constitutes the adaptive process.
The error signal itself is defined by Eq. (4.17), reproduced here for
convenience:
e(n) = d(n) − uᵀ(n)h(n)     (4.48)
This equation constitutes the filtering process.
Equations (4.47) and (4.48) completely describe the LMS algorithm.
Figure 4.9 shows a multidimensional signal-flow graph representation of the
LMS algorithm, based on these two equations. In this figure we have also
included a branch describing the fact that h(n) may be obtained by applying the matrix operator z⁻¹I to h(n + 1).
As with the steepest-descent algorithm, we initiate the LMS algorithm
by using an arbitrary value h(0) for the coefficient vector at time n = 0.
Here again, it is customary to set all the coefficients of the filter equal
initially to zero, so that h(0) equals the null vector.
With the initial conditions so determined, we then proceed as follows:
1. Given the following values at time n, the old estimate h(n) of the coefficient vector of the filter, the tap-input vector u(n), and the desired response d(n), compute the error signal

e(n) = d(n) − uᵀ(n)h(n)

where uᵀ(n) is the transpose of u(n).
2. Compute the updated estimate h(n + 1) of the coefficient vector of the filter by using the recursion

h(n + 1) = h(n) + µe(n)u(n)

where µ is the step-size parameter.

Figure 4.9 Multidimensional signal-flow graph representation of the LMS algorithm.



3. Increment the time index n by one, go back to step 1, and repeat the
procedure until a steady state is reached.
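The three steps translate directly into code. The sketch below is a minimal illustration of the LMS recursion of Eqs. (4.47) and (4.48); the system-identification setup (the vector h_opt and the white-noise input) is an assumption made for the example, not something taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 4                                      # number of taps
h_opt = np.array([0.8, -0.4, 0.2, 0.1])    # assumed "unknown" system to identify
h = np.zeros(M)                            # h(0): coefficients initially zero
mu = 0.05                                  # step-size parameter

x = rng.standard_normal(5000)              # filter input samples
for n in range(M, len(x)):
    u = x[n - M + 1:n + 1][::-1]           # tap-input vector u(n) = [u(n), ..., u(n-M+1)]
    d = h_opt @ u                          # desired response (noise-free for simplicity)
    e = d - u @ h                          # step 1: error signal, Eq. (4.48)
    h = h + mu * e * u                     # step 2: coefficient update, Eq. (4.47)
                                           # step 3: advance n and repeat

print(h)                                   # after adaptation, close to h_opt
```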
At first sight it may seem that, because the instantaneous estimate ∇̂(n) for the gradient vector has a large variance, the LMS algorithm is incapable of
good performance. However, we have to remember that the LMS algorithm
is recursive, effectively averaging out this coarse estimate during the course
of the adaptive process.

4.7 CONVERGENCE OF THE COEFFICIENT VECTOR IN THE LMS ALGORITHM

Although the initial value h(0) of the coefficient vector is usually a known
constant, the application of the LMS algorithm results in the propagation of
randomness into the filter coefficients. Accordingly, we have to treat the
coefficient vector h(n) as nonstationary. To simplify the statistical analysis
of the LMS algorithm it is customary to assume that the time between
successive iterations of the algorithm is sufficiently long for the following
two conditions to hold:
1. Each sample vector u(n) of the input signal is assumed to be uncorrelated with all previous sample vectors u(k) for k = 0, 1, ..., n − 1.
2. Each sample vector u(n) of the input signal is uncorrelated with all previous samples of the desired response d(k) for k = 0, 1, ..., n − 1.
Then from Eqs. (4.47) and (4.48), we observe that the coefficient vector
h(n + 1) at time n + 1 depends only on three inputs:
1. The previous sample vectors of the input signal, namely, u(n), u(n − 1), ..., u(0).
2. The previous samples of the desired response, namely, d(n), d(n − 1), ..., d(0).
3. The initial value h(0) of the coefficient vector.
Accordingly, in view of the assumptions made above, we find that the
coefficient vector h(n + 1) is independent of both u(n + 1) and d(n + 1).
There are many practical problems for which the tap-input vector and
the desired response do not satisfy the above assumptions. Nevertheless,
experience with the LMS algorithm has shown that sufficient information
about the structure of the adaptive process is retained for the results of the
analysis based on these assumptions to serve as reliable design guidelines
even for some problems having dependent data samples.
To proceed with the analysis, we eliminate the error signal e(n) by substituting Eq. (4.48) in (4.47), and so write

h(n + 1) = h(n) + µu(n)[d(n) − uᵀ(n)h(n)]
         = [I − µu(n)uᵀ(n)]h(n) + µu(n)d(n)     (4.49)

where I is the identity matrix. Next, using Eq. (4.21) to eliminate h(n) from the right-hand side of Eq. (4.49), we get

h(n + 1) = [I − µu(n)uᵀ(n)][c(n) + h_0] + µu(n)d(n)
         = [I − µu(n)uᵀ(n)]c(n) + h_0 + µ[u(n)d(n) − u(n)uᵀ(n)h_0]

where h_0 is the optimum coefficient vector, and c(n) is the coefficient-error vector. Transposing h_0 to the left-hand side, and recognizing that the difference h(n + 1) − h_0 equals the updated value of the coefficient-error vector, we may thus write

c(n + 1) = [I − µu(n)uᵀ(n)]c(n) + µ[u(n)d(n) − u(n)uᵀ(n)h_0]     (4.50)
As a consequence of the two assumptions made above, we observe that the coefficient vector h(n) is independent of the tap-input vector u(n). Correspondingly, the coefficient-error vector c(n) is independent of u(n). Hence, taking the expectation of both sides of Eq. (4.50) and using the independence of c(n) from u(n), we get

E[c(n + 1)] = E[(I − µu(n)uᵀ(n))c(n)] + µE[u(n)d(n) − u(n)uᵀ(n)h_0]
            = (I − µE[u(n)uᵀ(n)])E[c(n)] + µ(E[u(n)d(n)] − E[u(n)uᵀ(n)]h_0)
            = (I − µR)E[c(n)] + µ(p − Rh_0)     (4.51)

where we have used the definition of Eqs. (2.31) and (2.32) for the cross-correlation vector p, and that of Eq. (2.39) for the correlation matrix R, that is,

p = E[u(n)d(n)]

and

R = E[u(n)uᵀ(n)]

However, from the matrix form of the normal equations, we have

Rh_0 = p

Therefore, the second term on the right-hand side of Eq. (4.51) is zero, and so we may simplify this equation as follows:

E[c(n + 1)] = (I − µR)E[c(n)]     (4.52)
Comparing Eq. (4.52) with (4.23), we see that they are of exactly the
same mathematical form. That is, the average coefficient-error vector E[c(n)] in the LMS algorithm has the same mathematical role as the coefficient-error vector c(n) in the steepest-descent algorithm. From our study of the
steepest-descent algorithm in Section 4.4, we recall that it converges pro-
vided that Eq. (4.34) is satisfied. Correspondingly, the LMS algorithm

converges in the mean, that is, the average coefficient-error vector E[c(n)] approaches zero as n approaches infinity, provided that the step-size parameter µ satisfies the condition

0 < µ < 2/λ_max     (4.53)

where λ_max is the largest eigenvalue of the correlation matrix R. Thus when this condition is satisfied, the average value of the coefficient vector h(n) approaches the optimum Wiener solution h_0 as the number of iterations, n, approaches infinity.
Also, as with the steepest-descent algorithm, we find that when the eigenvalues of the correlation matrix R are widely spread, the time taken by the average coefficient vector E[h(n)] to converge to the optimum value h_0 is primarily limited by the smallest eigenvalues.

4.8 AVERAGE MEAN SQUARED ERROR

Ideally, the minimum mean squared error ε_min is realized when the coefficient vector h(n) of the tapped-delay-line filter approaches the optimum value h_0, defined by the matrix form of the normal equations. Indeed, as shown in Section 4.5, the steepest-descent algorithm does realize this idealized condition as the number of iterations, n, approaches infinity. The steepest-descent algorithm has the capability to do this because it uses exact measurements of the gradient vector at each iteration of the algorithm. On the other hand, the LMS algorithm relies on a noisy estimate for the gradient vector, with the result that the coefficient vector h(n) of the filter only approaches the optimum value h_0 after a large number of iterations and then executes small fluctuations about h_0. Consequently, use of the LMS algorithm, after a large number of iterations, results in a mean squared error ε(∞) that is greater than the minimum mean squared error ε_min. The amount by which the actual value of ε(∞) exceeds ε_min is called the excess mean squared error.
There is another basic difference between the steepest-descent algorithm
and the LMS algorithm. In Section 4.5 we showed that the steepest-descent
algorithm has a well-defined learning curve, obtained by plotting the mean
squared error versus the number of iterations. For this algorithm the
learning curve consists of the sum of decaying exponentials, the number of
which equals (in general) the number of tap coefficients. On the other hand,
in individual applications of the LMS algorithm we find that the learning
curve consists of noisy, decaying exponentials, as illustrated in Fig. 4.10(a). The amplitude of the noise usually becomes smaller as the step-size parameter µ is reduced.
Figure 4.10 (a) Individual learning curve. (b) Ensemble-averaged learning curve.

Imagine now an ensemble of adaptive tapped-delay-line filters. Each filter is assumed to use the LMS algorithm with the same step-size parameter µ and the same initial coefficient vector h(0). Also, each adaptive filter has individual stationary ergodic inputs that are selected at random from the same statistical population. If, at each time n, we compute the ensemble average of the noisy learning curves for this ensemble of adaptive filters, we find that the resultant consists of the sum of decaying exponentials, as illustrated in Fig. 4.10(b). In practice we usually find that this smooth ensemble-averaged learning curve is closely realized by averaging 50 to 200 independent trials of the LMS algorithm.
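The ensemble-averaged learning curve can be approximated numerically by repeating independent LMS runs and averaging the squared-error sequences; the sketch below follows that recipe under an assumed setup (a synthetic system, white-noise inputs, and 100 trials, all chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
M, mu, N, trials = 4, 0.05, 500, 100
h_opt = np.array([0.8, -0.4, 0.2, 0.1])            # assumed system to identify

curve = np.zeros(N)
for _ in range(trials):                            # independent trials with fresh random inputs
    h = np.zeros(M)
    x = rng.standard_normal(N + M)
    for n in range(N):
        u = x[n:n + M][::-1]                       # tap-input vector
        d = h_opt @ u + 0.01 * rng.standard_normal()   # desired response plus noise
        e = d - u @ h
        h = h + mu * e * u
        curve[n] += e * e
curve /= trials                                    # ensemble-averaged squared error

print(curve[:5], curve[-5:])                       # decays toward the error floor
```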
Thus we may use an average mean squared error, denoted by E[ε(n)], to describe the dynamic behavior of the LMS algorithm. The need for ensemble averaging arises because, as mentioned previously, the error signal {e(n)} is a nonstationary random process during the adaptation process, as the coefficient vector h(n) adapts toward the optimum value h_0.
The mathematical evaluation of the average mean squared error of the LMS algorithm is quite complicated, even when we make the simplifying assumptions described in Section 4.7. We content ourselves here with presenting a summary of the results of this evaluation:
1. When a small value is assigned to the step-size parameter µ, the adaptation is slow, which is equivalent to the LMS algorithm having a long “memory.” Correspondingly, the excess mean squared error (after adaptation) is small, on the average, because of the large amount of data used by the algorithm to estimate the gradient vector. On the other hand, when µ is large, the adaptation is relatively fast, but at the expense of an increase in the average excess mean squared error after adaptation. In this second situation, less data enter the estimation; hence, a degraded estimation-error performance is the result.
2. Unlike the average coefficient vector E[h(n)], the convergence properties of the average mean squared error E[ε(n)] depend on the number of taps, M. In particular, the necessary and sufficient condition for the LMS algorithm to converge in mean square, that is, for E[ε(n)] to be convergent, is

0 < µ < 2 / Σ_{k=1}^{M} λ_k     (4.54)

where the λ_k are the eigenvalues of the correlation matrix R. In Appendix 1 it is shown that the trace of the matrix R equals the sum of its eigenvalues, that is,

tr[R] = Σ_{k=1}^{M} λ_k

By definition, the trace of a square matrix equals the sum of its diagonal elements:

tr[R] = E[u²(n)] + E[u²(n − 1)] + ··· + E[u²(n − M + 1)]
      = total input power

Accordingly, we have

total input power = Σ_{k=1}^{M} λ_k     (4.55)

We may therefore restate the stability condition of Eq. (4.54) as follows:

0 < µ < 2 / (total input power)     (4.56)
On the other hand, the necessary and sufficient condition for the LMS algorithm to be convergent in the mean, that is, for E[h(n)] to be convergent, is (see Section 4.7)

0 < µ < 2/λ_max

where λ_max is the largest eigenvalue of R. We see therefore that, unlike the steepest-descent algorithm, the LMS algorithm has two different conditions for convergence: one for convergence in the mean and the other for convergence in mean square. Since we always have

λ_max ≤ Σ_{k=1}^{M} λ_k     (4.57)

we see that by choosing the adaptation constant µ to satisfy the condition (4.54) for E[ε(n)] to be convergent, we automatically satisfy the convergence condition for E[h(n)]. Note also that knowledge of the total input power is sufficient to apply the convergence condition of Eq. (4.56).
3. When the eigenvalues of the autocorrelation matrix R are widely spread, the average excess mean squared error produced by the LMS algorithm is primarily determined by the large eigenvalues.
4. As a measure of the cost of adaptivity, we may use the misadjustment, denoted by ℳ, which is defined by

ℳ = (average excess mean squared error) / (minimum mean squared error)
  = (E[ε(∞)] − ε_min) / ε_min     (4.58)

For example, a misadjustment of 10 percent means that the adaptive algorithm produces an average excess mean squared error (after adaptation) that is 10 percent greater than the minimum mean squared error.

When the step-size parameter µ is small, the misadjustment is approximately given by

ℳ = ½µ Σ_{k=1}^{M} λ_k     (4.59)

where the λ_k are the eigenvalues of the correlation matrix R. Let λ_av denote the average of the eigenvalues, that is,

λ_av = (1/M) Σ_{k=1}^{M} λ_k     (4.60)

Then we may rewrite Eq. (4.59) in terms of the average eigenvalue λ_av as follows:

ℳ = ½µMλ_av     (4.61)

Let the ensemble-averaged learning curve of the LMS algorithm be approximated by a single decaying exponential whose time constant is denoted by (τ_mse)_av. Then, based on Eq. (4.45), we may write

(τ_mse)_av ≈ 1/(2µλ_av)     (4.62)

Eliminating µλ_av between Eqs. (4.61) and (4.62), we thus get the following formula for the misadjustment of the LMS algorithm in terms of the number of filter coefficients and the average time constant of the adaptive process (a numerical sketch follows this list):

ℳ = M / [4(τ_mse)_av]     (4.63)

This formula shows that (1) the misadjustment increases linearly with the number of tap coefficients, and (2) the misadjustment may be made arbitrarily small by using a long adaptive time constant, which is in turn realized by using a small step-size parameter µ.
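The following sketch ties these results together for an assumed correlation matrix (the numbers are illustrative only): the total input power gives the mean-square stability bound of Eq. (4.56), and Eqs. (4.60) to (4.63) then give the misadjustment and the average learning-curve time constant for a chosen µ.

```python
import numpy as np

R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])        # assumed correlation matrix (M = 3 taps)
M = R.shape[0]

total_power = np.trace(R)              # equals the sum of the eigenvalues, Eq. (4.55)
print("mean-square stability bound:", 2.0 / total_power)   # Eq. (4.56)

mu = 0.05                              # chosen well inside the bound
lam_av = total_power / M               # average eigenvalue, Eq. (4.60)
misadjustment = 0.5 * mu * M * lam_av  # Eq. (4.61), equivalently (mu/2) * trace(R)
tau_av = 1.0 / (2.0 * mu * lam_av)     # Eq. (4.62)
print(misadjustment, M / (4.0 * tau_av))   # the two sides of Eq. (4.63) agree
```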

4.9 OPERATION OF THE LMS ALGORITHM IN A NONSTATIONARY ENVIRONMENT

The ability of the LMS algorithm to operate in a nonstationary environment has been demonstrated experimentally. The nonstationarity may arise in practice in one of two ways:
1. The frame of reference provided by the desired response may be time-
varying. Such a situation arises, for example, in system identification
when an adaptive tapped-delay-line filter is used to model a time-varying
system.

2. The sequence of tap-input vectors is nonstationary. This situation arises, for example, when an adaptive tapped-delay-line filter is used to equalize a time-varying channel.
In any event, when an adaptive tapped-delay-line filter operates in a
nonstationary environment, the optimum filter coefficient vector assumes a
time-varying form in that its value changes from one iteration to the next.
Then the LMS algorithm has the task of not only seeking the minimum
point of the error performance surface but also tracking the continually
changing position of this minimum point.
To emphasize the fact that the optimum coefficient vector of the filter is time-varying, we denote it by h_0(n), where n denotes the iteration number. Application of the LMS algorithm causes the filter coefficient vector h(n) of the adaptive filter to attempt to best match the unknown h_0(n). At the nth instant, the filter coefficient vector tracking error is h(n) − h_0(n), which may be expressed as

(coefficient-error vector)_n = h(n) − h_0(n)
                             = (h(n) − E[h(n)]) + (E[h(n)] − h_0(n))     (4.64)
where the expectations are averaged over the ensemble. Two components of
error are identified in Eq. (4.64):
1. Any difference between the individual coefficient vectors of the adaptive
filter and their ensemble mean is due to gradient noise; this difference is
called the coefficient vector noise. It is represented by the term h(n) —
E[h(n)] in Eq. (4.64).

Figure 4.11 Illustrating the choice of optimum step-size parameter for operation of the LMS algorithm in a nonstationary environment.

2. Any difference between the ensemble average of the coefficient vectors of the adaptive filter and the target value h_0(n) is due to lag in the adaptive process; this difference is called the coefficient vector lag. It is represented by the term E[h(n)] − h_0(n) in Eq. (4.64).
In using the LMS algorithm, we find that the misadjustment due to the coefficient vector lag is inversely proportional to the step-size parameter µ. On the other hand, the misadjustment due to the coefficient vector noise is directly proportional to µ, as in the case of stationary inputs. This, therefore, suggests that the optimum choice of µ (which results in the minimum overall misadjustment) occurs when these two contributions to misadjustment are equal. That is, the rate of adaptation is optimized when the loss of performance due to gradient-vector noise is equal to the loss of performance due to gradient-vector lag. This is illustrated in Fig. 4.11.

4.10 NOTES

Theory
The method of steepest descent is an old optimization technique. For a
discussion of the method, see Murray [1].
The least-mean-square (LMS) algorithm is also referred to in the literature as the stochastic gradient algorithm. It was originally developed by Widrow and Hoff [2] in 1960 in the study of adaptive switching circuits. In [3, 4], Widrow presents a detailed analysis of the steepest-descent algorithm and its heuristic relationship to the LMS algorithm. Sharpe and Nolte [5] present another approach for deriving the LMS algorithm; they start with the solution to the normal equations in matrix form, that is,

h_0 = R⁻¹p

and they use a finite summation to approximate the inverse of the correlation matrix.
The LMS algorithm, as described in Eqs. (4.47) and (4.48), is intended
for use with real-valued data. Widrow et al. [6] present the complex LMS
algorithm for dealing with complex-valued data. It has the following form:
h(n + 1) = h(n) + µu*(n)e(n)

where

e(n) = d(n) − uᵀ(n)h(n)
and the asterisk denotes complex conjugation.
A detailed mathematical analysis of the convergence behavior of the
LMS algorithm in a stationary environment is presented by Ungerboeck [7]
and Widrow et al. [8]. In both of these papers it is assumed that (1) the time

between successive iterations of the LMS algorithm is sufficiently long for the random sequence of tap-input vectors, u(n), u(n − 1), ..., u(n − M + 1), to be statistically independent, and (2) the frame of reference supplied by the desired response d(n) is time-invariant. The convergence analysis of the LMS algorithm based on such assumptions is referred to in the literature as the independence theory. Daniell [9], Davisson [10], and Kim and Davisson [11] present some useful results on the LMS algorithm for stationary dependent inputs. Daniell derives asymptotic properties for a general version
of the LMS algorithm. Davisson establishes bounds on the steady-state
mean-square value of the error signal, assuming that mean squared coeffi-
cient deviations from the optimal do converge to a steady-state value. In the
subsequent paper by Kim and Davisson [11], convergence of the mean
squared coefficient error is demonstrated, and a bound is found for it. Mazo
[12] and Jones et al. [13] present two exact theories, using entirely different
approaches, for the convergence analysis of the LMS algorithm. These exact
theories show explicitly that, for sufficiently small values of the step-size parameter µ, the results obtained by using the independence theory and first-order eigenvalue analysis are likely to differ little from conclusions obtained by using the more exact analysis.
An exact mathematical analysis of the operation of the LMS algorithm
in a nonstationary environment poses serious mathematical difficulties. Indeed, this remains an open problem. Nevertheless, some useful results have been reported in the literature. Widrow et al. [8] discuss the optimization of the step-size parameter µ such that, in a nonstationary
environment, the loss of performance due to gradient-vector noise equals
the loss of performance due to gradient-vector lag. Faden and Sayood [14]
present a study of the use of the LMS algorithm in tracking a time-varying
parameter.
In 1967, Nagumo and Noda [15] and Albert and Gardner [16] indepen-
dently suggested another stochastic algorithm that is applicable to adaptive
tapped-delay-line filters. According to this algorithm, the coefficient vector of the filter is adapted as follows:

h(n + 1) = h(n) + [α/‖u(n)‖²] e(n)u(n)     (4.65)

where α is a positive constant, u(n) is the tap-input vector at time n, and ‖u(n)‖² is its squared norm. As before, e(n) is the error signal defined by

e(n) = d(n) − uᵀ(n)h(n)     (4.66)
where d(n) is the desired response. Nagumo and Noda did not use any
special name for this algorithm, while Albert and Gardner referred to it as a
“quick and dirty regression” scheme. It appears that Bitmead and Anderson [17] coined the name normalized LMS (NLMS) algorithm for the adaptive algorithm described by Eqs. (4.65) and (4.66). The normalized

LMS algorithm differs from the conventional form of the LMS algorithm in that the step-size parameter µ is replaced by α/‖u(n)‖². Given that u(n) has u(n), u(n − 1), ..., u(n − M + 1) for its elements, we may express the squared norm of the vector u(n) as

‖u(n)‖² = uᵀ(n)u(n)
        = u²(n) + u²(n − 1) + ··· + u²(n − M + 1)

The normalization with respect to ‖u(n)‖² is used for mathematical convenience. Also, some implementations of adaptive filters do actually use this normalization, as will be mentioned later. Weiss and Mitra [18, 19] derive a
variety of theoretical results for the normalized LMS algorithm. These
results pertain to the conditions for convergence, rates of convergence, and
the effects of errors due to digital implementation of the algorithm. Hsia [20]
presents a unified treatment of the convergence for both the normalized
LMS algorithm and the conventional form of the LMS algorithm. If we
assume that the random tap-input vectors, u(n), u(n − 1), ..., u(n − M + 1), are statistically independent, and if the elements of u(n), denoted by u(i), are independent and identically distributed (iid) with

E[u(i)u(j)] = σ²,  i = j
            = 0,   i ≠ j

and

E[u(i)] = 0

then the necessary and sufficient condition for the normalized LMS algorithm to be convergent in mean square is (Hsia [20])

0 < α < 2
Hsia also shows that, under this set of conditions, the normalized LMS
algorithm converges faster than the conventional form of the LMS algo-
rithm, a fact that has been noticed by many investigators in computer
simulations, but never theoretically proven.
From Eq. (4.65) it is apparent that the normalized LMS algorithm alters
the magnitude of the correction term without change in its direction.
Accordingly, it bypasses the problem of noise amplification that is experienced in the LMS algorithm when u(n) is large. However, in so doing it introduces a problem of its own, which is experienced for small u(n). This problem may be overcome by using the alternate form of the normalized LMS algorithm (Bitmead and Anderson [17]):

h(n + 1) = h(n) + [α/(β + ‖u(n)‖²)] e(n)u(n)     (4.67)

where β is another positive constant, and the error signal e(n) is defined in the same way as before. Bitmead and Anderson present an analysis of the convergence properties of this latter form of the normalized LMS algorithm alongside the conventional form of the LMS algorithm. Note that by putting β equal to zero in Eq. (4.67), we get the first form of the normalized LMS algorithm.
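A minimal sketch of the normalized LMS update of Eqs. (4.66) and (4.67) is given below; the data, the system being identified, and the constants α and β are assumptions made for illustration (setting β = 0 recovers Eq. (4.65)):

```python
import numpy as np

rng = np.random.default_rng(2)
M, alpha, beta = 4, 0.5, 1e-3
h_opt = np.array([0.8, -0.4, 0.2, 0.1])        # assumed system to identify
h = np.zeros(M)

x = rng.standard_normal(5000)
for n in range(M, len(x)):
    u = x[n - M + 1:n + 1][::-1]               # tap-input vector
    e = h_opt @ u - u @ h                      # error signal, Eq. (4.66)
    h = h + (alpha / (beta + u @ u)) * e * u   # normalized LMS update, Eq. (4.67)

print(h)                                       # close to h_opt after adaptation
```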

Implementations
The methods of implementing adaptive filters may be divided into two
broad categories: analog and digital.
The analog approach is primarily based on the use of charge-coupled
device (CCD) technology or switched-capacitor technology. The basic circuit
realization of the CCD is a row of field-effect transistors with drains and
sources connected in series, and the drains capacitively coupled to the gates.
The set of adjustable coefficients are stored in digital memory locations, and
the multiplications of the analog sample values by the digital coefficients
take place in analog fashion. This approach has significant potential in
applications where the sampling rate of the incoming data is too high for
digital implementation. Mavor et al. [21] review the operational features and
performance of a fully integrated programmable tapped-delay-line filter
using monolithic CCD technology. White and Mack [22] describe a 16-
coefficient adaptive filter, and Cowan and Mavor [23] describe a 256-coeffi-
cient adaptive filter, both based on the monolithic CCD technology. Also,
both implementations were based on the clipped LMS algorithm (Moschner [24]).
In the clipped version of the LMS algorithm, the tap-input vector u(n) in the correction term of the update recursion for the coefficient vector is replaced by sgn[u(n)]:

h(n + 1) = h(n) + µe(n) sgn[u(n)]

where, as before, µ is the step-size parameter and e(n) is the error signal. To explain the meaning of the clipped tap-input vector sgn[u(n)], let u(i) denote the ith element of the vector u(n). The ith element of sgn[u(n)] is written mathematically as

sgn[u(i)] = +1  if u(i) > 0
          = −1  if u(i) < 0
This clipping action is clearly a nonlinear operation. The purpose of the clipping is to simplify the implementation of the LMS algorithm without seriously affecting its performance. In the clipped LMS algorithm, the filter output remains equal to uᵀ(n)h(n), so that the error signal e(n) is computed

as before in a linear manner, as shown by

e(n) = d(n) − uᵀ(n)h(n)


where d(n) is the desired response. Accordingly, the clipping affects only
the adaptive process and not the filtering process, with the result that an
adaptive filter based on the clipped LMS algorithm can achieve about the
same level of steady-state mean squared error as one based on the conven-
tional form of the LMS algorithm. Moschner [24] derives the conditions for
convergence of the clipped LMS algorithm, and compares its overall perfor-
mance with that of the conventional form of the LMS algorithm. Let (1) the
ith element of the tap-input vector u(n) satisfy, for all i,

E[u(i)] = 0
E[u²(i)] = σ²

and (2) the coefficient vector h(n) be independent of u(n), by the independent-samples assumption. For this set of conditions, Moschner shows that the conventional and clipped versions of the LMS algorithm achieve the same rate of convergence of the average coefficient vector, E[h(n)], when their step-size parameters µ_con and µ_clip are chosen in a fixed proportion to one another, the proportionality factor involving the input standard deviation σ.

Moschner also shows that if, for the clipped LMS algorithm, we have (using
our notation)
2 P2

where M is the number of coefficients used in the tapped-delay-line filter,


then the following results hold:
(1) We have

0 < BX max/ ee <2


10

where λ_max is the largest eigenvalue of the correlation matrix R of the tap-input vector u(n).
(2) The misadjustment equals

Martin and Sedra [25] describe several building blocks, based on switched-capacitor technology, for the implementation of adaptive filters.
In the digital implementation of an adaptive filter, the filter input is
sampled and quantized into a form suitable for storage in shift registers.

The set of adjustable coefficients are also stored in shift registers. Logic
circuits are used to perform the required digital arithmetic (e.g., multiply
and accumulate). In this approach, the circuitry may be hard-wired for the
sole purpose of performing adaptive filtering. Alternatively, it may be
implemented in programmable form on a microprocessor. The use of a
microprocessor also offers the possibility of integrating the adaptive filter
with other signal-processing operations, which can be attractive in some
applications. Soderstrand and Vigil [26] and Soderstrand et al. [27] discuss
the microprocessor implementation of an adaptive filter using the LMS
algorithm. Jenkins [28] discusses the use of a residue-number architecture
for implementing a microprocessor-based LMS algorithm. The use of this
architecture provides a structure for general stored-table multiplication,
distributed processing by means of multiple microprocessors, and a poten-
tial fault-tolerant capability. Lawrence and Tewksbury [29] discuss the
multiprocessor architectures and implementations of adaptive filters using
the LMS algorithm and other algorithms. A multiprocessor refers to an
array of interconnected processors where the emphasis is on memory speed
and size.
Clark et al. [30] discuss the use of block processing techniques for
implementing adaptive digital filters. By considering a performance criterion
based on the minimization of the block mean squared error (BMSE), a
gradient estimate is derived as the correlation (over a block of data) between
the error signals and the input signal. This gradient estimate leads to a
coefficient-adaptation algorithm that allows for block implementation with
either parallel processors or serial processors. Thus a block adaptive filter
adjusts the coefficient vector once per block of data. The conventional form
of the LMS algorithm may be viewed as a special case of the block adaptive
filter with a block length of one. The analysis of convergence properties and
computational complexity of the block adaptive filter, presented by Clark et
al. [30], shows that this filter permits fast implementation while maintaining
a performance equivalent to that of the conventional LMS algorithm.
A basic issue encountered in the digital implementation of an adaptive
filter is that of roundoff errors due to the use of finite-precision arithmetic.
Caraiscos and Liu [31] present a mathematical roundoff-error analysis of the
conventional form of the LMS algorithm, supported by computer simula-
tion. In such an implementation the steady-state output error consists of
three terms: (1) the error due to quantization of the input data, (2) the error
due to truncating the arithmetic operations in calculating the filter’s output;
and (3) the error due to the deviation of the filter’s coefficients from the
values they assume when infinite-precision arithmetic is used. Caraiscos and
Liu discuss these effects for both fixed-point arithmetic and floating-point
arithmetic. They report that the quantization error of the filter coefficients
results in an output quantization error whose mean-square value is, ap-
proximately, inversely proportional to the step-size parameter µ. In particular, the use of a small µ for the purpose of reducing the excess mean squared error may result in a considerable quantization error. The excess mean squared error is found to be larger than the quantization error, as long as the value chosen for µ allows the algorithm to converge completely. It is suggested that one way of combating the quantization error is to use more bits for the filter coefficients than for the input data.

Applications
Gersho [32] describes the application of the LMS algorithm to the adaptive
equalization of a highly dispersive communication channel (e.g., a voice-grade
telephone channel) for data transmission. Qureshi [33] presents a tutorial
review of adaptive equalization with emphasis on the LMS algorithm.
Nowadays, state-of-the-art adaptive equalizers for data transmission over a
telephone channel are digitally implemented. A major issue in the design of
an adaptive digital equalizer is the determination of the minimum number
of bits required to represent the adjustable equalizer coefficients, as well as
all the internal signal levels of the equalizer. Gitlin et al. [34] consider the
effect of digital implementation of an adaptive equalizer using the LMS
algorithm. They show that a digitally implemented LMS algorithm stops
adapting whenever the correction term in the update recursion for any
coefficient of the equalizer is smaller in magnitude than the least significant
digit to within which the coefficient has been quantized. In a subsequent
paper, Gitlin and Weinstein [35] develop a criterion for determining the
number of bits required to represent the coefficients of an adaptive digital
equalizer so that the mean squared error at the equalizer output is at an
acceptable level.
Widrow et al. [36] discuss the application of the LMS algorithm to the
adaptive line enhancer (ALE), a device that may be used to detect and track
narrow-band signals in wide-band noise. Zeidler et al. [37] evaluate the
steady-state behavior of the ALE for a stationary input consisting of
multiple sinusoids in additive white noise. Rickard and Zeidler [38] analyze
the second-order statistics of the ALE output in steady-state operation, for a
stationary input consisting of weak narrow-band signals in additive white
Gaussian noise. Treichler [39] uses an eigenvalue—eigenvector analysis of the
expected ALE impulse response to describe both the transient and conver-
gence behavior of the ALE. Nehorai and Malah [40] derive an improved
estimate of the misadjustment and a tight stability constraint for the ALE.
Dentino et al. [41] evaluate the performance of an ALE-augmented square-
law detector for a stationary input consisting of a narrow-band signal in
additive white Gaussian noise, and compare it with a conventional square-
law detector. All these papers on the ALE use the conventional form of the
LMS algorithm for adaptation.

The ALE using the LMS algorithm may develop some undesirable
long-term characteristics. Ahmed et al. [42] have examined this problem
both experimentally and theoretically. The long-term instability problem may
be summarized as follows. The adaptive predictor in the ALE first adapts to
the high-level components contained in the input, decorrelating the input as
much as possible within its limited capability, as determined by the number
of adjustable coefficients used. Then it adapts to low-level components at
other frequencies. Thus the ALE evolves until its amplitude response is near
unity for all the frequency components contained in the input, regardless of
their amplitude. The result is that, after continuous operation for a long
period of time, the ALE takes on an “all-pass” mode of operation, giving
the overall adaptive predictor the appearance of a “no-pass filter.”” Ahmed
et al. propose a possible cure for the long-term instability problem by
modifying the LMS algorithm.
Sondhi [43] describes an adaptive echo canceller that synthesizes a
replica of the echo by means of an adaptive tapped-delay-line filter, and
then subtracts the replica from the return signal. The filter is designed to
adapt to the transmission characteristic of the echo path and thereby track
variations of the path that may occur during the course of a conversation.
Campanella et al. [44] and Duttweiler [45] describe digital implementations
of an adaptive echo canceller in which the normalized LMS algorithm is
used to adapt the tapped-delay-line filter. Duttweiler and Chen [46] describe
a single-chip VLSI (very large-scale integration) adaptive echo canceller
with 128-tap delay line. Gitlin et al. [47] have proposed and analyzed a
combined echo canceller and phase tracker, which uses the LMS algorithm
to adaptively compensate for the time variation in the channel caused by
carrier phase changes.
Gibson et al. [48], Cohn and Melsa [49], and Gibson [50] describe a
method of speech digitization by means of a residual encoder. This device is
a form of differential pulse-code modulation (DPCM), which uses both an
adaptive quantizer and an adaptive predictor. They used the normalized
LMS algorithm of Eqs. (4.65) and (4.66) for the design of the adaptive
predictor. For a detailed review paper on the subject, see Gibson [51].
Griffiths [52] and Keeler and Griffiths [53] use the LMS algorithm to
develop an adaptive autoregressive model for a nonstationary process, which
is exploited for frequency estimation.

Fractionally Spaced Equalization

In a conventional equalizer (using a tapped-delay-line structure) the equalizer taps are spaced at the reciprocal of the symbol rate. Such an equalizer is said to be synchronous.

In a fractionally spaced equalizer (FSE), on the other hand, the equalizer taps are spaced closer than the reciprocal of the symbol rate.
Consequently, an FSE has the capability of compensating for delay distor-
tion much more effectively than a conventional synchronous equalizer.
Another advantage of the FSE is the fact that data transmission may begin
with an arbitrary sampling phase. Gitlin and Weinstein [54] describe the
performance and structure of an optimum FSE, including the use of the
LMS algorithm to adaptively control the tap-weight vector. As with the
conventional synchronous adaptive equalizer, the LMS algorithm is updated
once per symbol period. This paper also presents computer simulation
results that illustrate the advantages of fractionally spaced equalization over
conventional synchronous equalization. In an earlier paper, Gitlin and
Weinstein [35] give a detailed analytic and experimental treatment of the
rate of convergence and some of the dynamic aspects of an adaptive FSE.

REFERENCES

1. W. Murray (Ed.), “Numerical Methods for Unconstrained Optimization” (Academic Press, 1972).
2. B. Widrow and M. E. Hoff, Jr., “Adaptive Switching Circuits,” 1960 IRE WESCON Convention Record, Pt. 4, pp. 96-104.
3. B. Widrow, “Adaptive Filters I: Fundamentals,” Rept. SEL-66-126 (TR 6764-6), Stanford
Electronics Laboratories, Stanford, California, December 1966.
4. B. Widrow, “Adaptive Filters,” in book “Aspects of Network and System Theory,” edited
by R. E. Kalman and N. DeClaris (Holt, Rinehart and Winston, New York, 1971), pp.
563-587.
5. S. M. Sharpe and L. W. Nolte, “Adaptive MSE Estimation,” Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Atlanta, Georgia, April 1981).
6. B. Widrow, J. M. McCool, and M. Ball, “The Complex LMS Algorithm,” Proc. IEEE, vol.
63, pp. 719-720, April 1975.
7. G. Ungerboeck, “Theory on the Speed of Convergence in Adaptive Equalizers for Digital
Communication,” IBM J. Res. and Dev., vol. 16, pp. 546-555, Nov. 1972.
8. B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, “Stationary and
Nonstationary Learning Characteristics of the LMS Adaptive Filter,” Proc. IEEE, vol. 64,
pp. 1151-1162, Aug. 1976.
9. T. P. Daniell, “Adaptive Estimation with Mutually Correlated Training Sequences,” IEEE
Trans. Systems Science and Cybernetics, vol. SSC-6, pp. 12-19, January 1970.
10. L. D. Davisson, “Steady-State Error in Adaptive Mean-Square Minimization,” IEEE
Trans. Information Theory, vol. IT-16, pp. 382-385, July 1970.
11. J. K. Kim and L. D. Davisson, “Adaptive Linear Estimation for Stationary M-dependent
Processes,” IEEE Trans. Information Theory, vol. IT-21, pp. 23-31, January 1975.
12. J. E. Mazo, “On the Independence Theory of Equalizer Convergence,” Bell Syst. Tech. J.,
vol. 58, pp. 963-993, May 1979.
13. S. K. Jones, R. K. Cavin, III, and W. M. Reed, "Analysis of Error-Gradient Adaptive Linear Estimators for a Class of Stationary Dependent Processes," IEEE Trans. Information Theory, vol. IT-28, pp. 318-329, March 1982.
14. D. C. Farden and K. Sayood, "Tracking Properties of Adaptive Signal Processing Algorithms," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Denver, Colorado, April 1980), pp. 466-469.


15. J. I. Nagumo and A. Noda, "A Learning Method for System Identification," IEEE Trans.
Automatic Control, vol. AC-12, pp. 282-287, June 1967.
16. A. E. Albert and L. S. Gardner, Jr., “Stochastic Approximation and Nonlinear Regression”
(MIT Press, 1967).
17. R. R. Bitmead and B. D. O. Anderson, "Performance of Adaptive Estimation Algorithms
in Dependent Random Environments,” IEEE Trans. Automatic Control, vol. AC-25, pp.
788-794, August 1980.
18. A. Weiss and D. Mitra, “Some Mathematical Results on the Effects on Digital Adaptive
Filters of Implementation Errors and Noise,” Proceedings IEEE International Conference
on Acoustics, Speech, and Signal Processing (Tulsa, Oklahoma, April 1978), pp. 113-117.
19. A. Weiss and D. Mitra, “Digital Adaptive Filters: Conditions for Convergence, Rates of
Convergence, Effects of Noise and Errors Arising from the Implementation,” IEEE Trans.
Information Theory, vol. IT-25, pp. 637-652, November 1979.
20. T. C. Hsia, “Convergence Analysis of LMS and NLMS Adaptive Algorithms,” Proceed-
ings IEEE International Conference on Acoustics, Speech, and Signal Processing (Boston,
Massachusetts, April, 1983), pp. 667-670.
21. J. Mavor, P. B. Denyer, J. W. Arthur, and C. F. N. Cowan, "A Monolithic c.c.d.
Programmable Transversal Filter for Analogue Signal Processing,” The Radio and Elec-
tronic Engineer, vol. 50, pp. 213-225, May 1980.
22. M. H. White and I. A. C. Mack, "A CCD Monolithic LMS Adaptive Analog Signal Processor Integrated Circuit," Report AD-A092-510, Westinghouse Defense and Electronic Systems Center, Baltimore, Maryland, March 1980.
23. C. F. N. Cowan and J. Mavor, "Miniature CCD-based Analog Adaptive Filters," Proceed-
ings IEEE International Conference on Acoustics, Speech and Signal Processing (Denver,
Colorado, April 1980), pp. 474-477.
24. J. L. Moschner, “Adaptive Filter with Clipped Input Data,” Rept. 6796-1, Information
Systems Laboratory, Stanford University, June 1970.
25: K. Martin and A. S. Sedra, “Switched-Capacitor Building Blocks for Adaptive Systems,”
IEEE Trans. Circuits and Systems, vol. CAS-28, pp. 526-584, June 1981.
26. M. A. Soderstrand and M. C. Vigil, “Microprocessor Controlled Totally Adaptive Digital
Filter," Proceedings IEEE International Conference on Circuits and Computers, Port-
Chester, New York, pp. 1188-1191, 1980.
27. M. A. Soderstrand, C. Vernia, D. W. Paulson, and M. C. Vigil, "Microprocessor Con-
trolled Adaptive Digital Filter,” Proceedings IEEE International Symposium on Circuits
and Systems, Houston, Texas, pp. 142-146, 1980.
28. W. K. Jenkins, “Architectures for Microprocessor-Based Adaptive Digital Filters,” Pro-
ceedings 21st Midwest Symposium on Circuits and Systems, pp. 148-152, August 1978.
29. V. B. Lawrence and S. K. Tewksbury, "Multiprocessor Implementation of Adaptive Digital Filters," IEEE Trans. Communications, vol. COM-31, pp. 826-835, June 1983.
30. G. A. Clark, S. K. Mitra, and S. R. Parker, “Block Implementation of Adaptive Digital
Filters,” IEEE Trans. Circuits and Systems, vol. CAS-28, pp. 584-592, June 1981.
31. C. Caraiscos and B. Liu, "A Round-off Error Analysis of the LMS Adaptive Algorithm,"
Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing,
(Boston, Massachusetts, April 1983), pp. 29-32.
32. A. Gersho, "Adaptive Equalization of Highly Dispersive Channels for Data Transmission,"
Bell Syst. Tech. J., vol. 48, pp. 55-70, January 1969.
33. S. Qureshi, "Adaptive Equalization," IEEE Communications Society Magazine, vol. 20, pp.
9-16, March 1982.
34. R. D. Gitlin, J. E. Mazo, and M. G. Taylor, “On the Design of Gradient Algorithms for
Digitally Implemented Adaptive Filters," IEEE Trans. Circuit Theory, vol. CT-20, pp.
125-136, March 1973.
35. R. D. Gitlin and S. B. Weinstein, "On the Required Tap-Weight Precision for Digitally
Implemented, Mean-Squared Equalizers,” Bell Syst. Tech. J., vol. 58, pp. 301-321, February
1979.

36. B. Widrow, J. R. Glover, Jr., J. M. McCool, J. Kaunitz, C. S. Williams, R. H. Hearn, J. R. Zeidler, E. Dong, Jr., and R. C. Goodlin, "Adaptive Noise Cancelling: Principles and Applications," Proc. IEEE, vol. 63, pp. 1692-1716, December 1975.
37. J. R. Zeidler, E. H. Satorius, D. M. Chabries, and H. T. Wexler, "Adaptive Enhancement
of Multiple Sinusoids in Uncorrelated Noise,” IEEE Trans. Acoustics, Speech and Signal
Processing, vol. ASSP-26, pp. 240-254, June 1978.
38. J. T. Rickard and J. R. Zeidler, “Second-Order Output Statistics of the Adaptive Line
Enhancer,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-27, pp.
31-39, February 1979.
39. J. R. Treichler, "Transient and Convergence Behavior of the Adaptive Line Enhancer," IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-27, pp. 53-62, February 1979.
40. A. Nehorai and D. Malah, “On the Stability and Performance of the Adaptive Line
Enhancer,” Proceedings IEEE International Conference on Acoustics, Speech, and Signal
Processing (Denver, Colorado, April 1980), pp. 478-481.
41. M. J. Dentino, H. M. Huey, and J. R. Zeidler, "Comparative Performance of Adaptive and
Conventional Detectors for Finite Bandwidth Signals,” Proceedings IEEE International
Conference on Acoustics, Speech, and Signal Processing (Atlanta, Georgia, April 1981), pp.
397-400.
42. N. Ahmed, G. R. Elliott, and S. D. Stearns, "Long-Term Instability Problems in Adaptive Noise-Cancellers," Report SAND 78-1032, Sandia Laboratories, Albuquerque, New Mexico, August 1978.
43. M. M. Sondhi, "An Adaptive Echo Canceller," Bell Syst. Tech. J., vol. 46, pp. 497-511, March 1967.
44. S. J. Campanella, H. G. Suyderhond, and M. Onufry, "Analysis of an Adaptive Impulse Response Echo Canceller," COMSAT Tech. Rev., vol. 2, pp. 1-38, 1972.
45. D. L. Duttweiler, "A Twelve-Channel Digital Echo Canceller," IEEE Trans. Communications, vol. COM-26, pp. 647-653, May 1978.
46. D. L. Duttweiler and Y. S. Chen, "A Single-Chip VLSI Echo Canceller," Bell Syst. Tech. J., vol. 59, pp. 149-160, February 1980.
47. R. D. Gitlin and J. S. Thompson, "A Technique for Adaptive Phase Compensation in Echo Cancellation," IEEE National Telecommunication Conference, December 1977, pp. 04:6-1-04:6-7.
48. J. D. Gibson, S. K. Jones, and J. L. Melsa, "Sequentially Adaptive Prediction and Coding of Speech Signals," IEEE Trans. Communications, vol. COM-22, pp. 1789-1797, November 1974.
49. D. L. Cohn and J. L. Melsa, "The Residual Encoder—An Improved ADPCM System for Speech Digitization," IEEE Trans. Communications, vol. COM-23, pp. 935-941, September 1975.
50. J. D. Gibson, “Sequentially Adaptive Backward Prediction in ADPCM Speech Coders,”
IEEE Trans. Communications, vol. COM-26, pp. 145-150, January 1978.
all. J. D. Gibson, “Adaptive Prediction in Speech Differential Encoding Systems,” Proc. IEEE,
vol. 68, pp. 488-525, April 1980.
52. L. J. Griffiths, "Rapid Measurement of Digital Instantaneous Frequency," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-23, pp. 207-222, April 1975.
53. R. J. Keeler and L. J. Griffiths, "Acoustic Doppler Extraction by Adaptive Linear Prediction Filtering," J. Acoust. Soc. Amer., vol. 61, pp. 1218-1227, May 1977.
54. R. D. Gitlin and S. B. Weinstein, "Fractionally Spaced Equalization: An Improved Digital Transversal Equalizer," Bell Syst. Tech. J., vol. 60, pp. 275-296, February 1981.
CHAPTER

FIVE
ADAPTIVE TAPPED-DELAY-LINE FILTERS
USING LEAST SQUARES

Consider a set of observations u(1), u(2),...,u(n) made at times t_1, t_2,...,t_n, respectively, and suppose that we have the requirement to approximate this set of data by a "smooth curve" x(t) for which, ideally, x(t_i) = u(i), i = 1,2,...,n. It is rare to find that the curve will pass exactly through all the points, and so we try to draw the "best" curve in the least-squares sense. That is, we minimize the sum of the squares of the differences between the observed values u(i) and the curve values x(t_i) for i = 1,2,...,n. This is the basic idea of least squares. Note that in such an analysis, the notion of random variables need not enter into the discussion.
In this chapter, we use the method of least squares to derive a recursive
algorithm for automatically adjusting the coefficients of a tapped-delay-line
filter, without invoking assumptions on the statistics of the input signals.
This procedure, which we call the recursive least-squares (RLS) algorithm,
is capable of realizing a rate of convergence that is much faster than the
LMS algorithm, because the RLS algorithm utilizes all the information
contained in the input data from the start of the adaptation up to the
present. The price that we pay for this improvement, however, is increased
complexity.

5.1 THE DETERMINISTIC NORMAL EQUATIONS

Suppose that we have two sets of data, namely, an input signal represented by the samples u(1), u(2),...,u(n), and a desired response represented by the samples d(1), d(2),...,d(n). The input signal {u(i)} is applied to a tapped-delay-line filter whose impulse response is denoted by the sequence h(1,n), h(2,n),...,h(M,n). Note that the filter length M must be less than or equal to the data length n. Note also that the filter coefficients are assumed constant for i = 1,2,...,n. Let y(i) denote the resulting filter output, and use the difference between the desired response d(i) and this output to define an error signal or residue, as in Fig. 5.1:

e(i) = d(i) - y(i),   i = 1, 2, ..., n    (5.1)

The requirement is to design the filter in such a way that it minimizes the residual sum of squares, defined by

J(n) = Σ_{i=1}^{n} e^2(i)    (5.2)
The filter output y(i) is given by the convolution sum

y(i) = Σ_{k=1}^{M} h(k,n) u(i-k+1),   i = 1, 2, ..., n    (5.3)

Using Eqs. (5.1) and (5.3), we may therefore express the residual sum of squares J(n) as follows:

J(n) = Σ_{i=1}^{n} d^2(i) - 2 Σ_{k=1}^{M} h(k,n) Σ_{i=1}^{n} d(i) u(i-k+1)
     + Σ_{k=1}^{M} Σ_{m=1}^{M} h(k,n) h(m,n) Σ_{i=1}^{n} u(i-k+1) u(i-m+1)    (5.4)
where M < n.

Figure 5.1 Tapped-delay-line filter.



We may now introduce the following definitions for the summation terms in Eq. (5.4) that involve the variable i:
1. We define the deterministic correlation between the input signals at taps k and m, summed over the data length n, as

   φ(n; k, m) = Σ_{i=1}^{n} u(i-k) u(i-m),   k, m = 0, 1, ..., M-1    (5.5)

   Note that this sum is the same as the inner sum of the third term on the right-hand side of Eq. (5.4) with k-1 replaced by k and m-1 replaced by m.
2. We define the deterministic correlation between the desired response and the input signal at tap k, summed over the data length n, as

   θ(n; k) = Σ_{i=1}^{n} d(i) u(i-k),   k = 0, 1, ..., M-1    (5.6)

   Note that this sum is the same as the inner sum of the second term on the right-hand side of Eq. (5.4) with k-1 replaced by k.
3. We define the energy of the desired response as

   E_d(n) = Σ_{i=1}^{n} d^2(i)    (5.7)


Accordingly, using these three definitions, we may rewrite the expression for the residual sum of squares in Eq. (5.4) as

J(n) = E_d(n) - 2 Σ_{k=1}^{M} h(k,n) θ(n; k-1)
     + Σ_{k=1}^{M} Σ_{m=1}^{M} h(k,n) h(m,n) φ(n; k-1, m-1)    (5.8)

We are interested in evaluating the set of tap coefficients that minimizes the residual sum of squares J(n). We may treat the tap coefficients as constants for the duration of the input data, from 1 to n. Hence, differentiating Eq. (5.8) with respect to h(k,n), we get

∂J(n)/∂h(k,n) = -2 θ(n; k-1) + 2 Σ_{m=1}^{M} h(m,n) φ(n; k-1, m-1),   k = 1, 2, ..., M    (5.9)

Let h(k,n) denote the value of the kth tap coefficient for which the derivative ∂J(n)/∂h(k,n) is zero at time n. Thus, using this definition in

Eq. (5.9), we get

Σ_{m=1}^{M} h(m,n) φ(n; k-1, m-1) = θ(n; k-1),   k = 1, 2, ..., M    (5.10)

This set of M simultaneous equations constitutes the deterministic normal equations. Their solution determines the least-squares filter whose tap coefficients are denoted by h(1,n), h(2,n),..., h(M,n).
The normal equations (2.22), derived for the Wiener filter in Chapter 2,
and the normal equations (5.10), derived for the deterministic least-squares
filter here, have similar mathematical form. The basic difference between
them is that in the normal equations (2.22) the autocorrelation function of
the tap inputs and the cross-correlation function between the desired
response and the tap inputs are ensemble averages. On the other hand, in
the normal equations (5.10) the corresponding correlation functions are
time-averaged over the available observation interval. For a finite observa-
tion interval these two sets of correlation functions are different.
We may rewrite the normal equations (5.10) in a compact form by using the following matrix definitions:
1. We use the M-by-1 vector h(n) to represent the least-squares estimates of the tap coefficients:

   h(n) = [h(1,n), h(2,n), ..., h(M,n)]^T    (5.11)

2. We use the M-by-M matrix Φ(n) to represent the deterministic correlation matrix of the tap inputs:

   Φ(n) = [ φ(n;0,0)      φ(n;0,1)      ...  φ(n;0,M-1)
            φ(n;1,0)      φ(n;1,1)      ...  φ(n;1,M-1)
            ...
            φ(n;M-1,0)    φ(n;M-1,1)    ...  φ(n;M-1,M-1) ]    (5.12)
Note that the deterministic correlation matrix Φ(n) is both symmetric and nonnegative definite. However, it is non-Toeplitz in that the elements along its main diagonal (and for that matter along any other diagonal parallel to the main diagonal) are unequal, in general.

3. We use the M-by-1 vector θ(n) to represent the deterministic cross-correlation vector of the desired response and the tap inputs:

   θ(n) = [θ(n;0), θ(n;1), ..., θ(n;M-1)]^T    (5.13)

Accordingly, we may rewrite the normal equations (5.10) in the following matrix form:

Φ(n) h(n) = θ(n)    (5.14)

Assuming that Φ(n) is nonsingular, we may solve Eq. (5.14) for h(n) and so obtain

h(n) = Φ^{-1}(n) θ(n)    (5.15)

where Φ^{-1}(n) is the inverse of the deterministic correlation matrix.
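As an illustration of Eqs. (5.5), (5.6), and (5.15), the following Python sketch (our own construction, not part of the original text; it uses NumPy and adopts the prewindowing convention u(i) = 0 for i ≤ 0, which is consistent with the data matrix introduced later in Eq. (5.27)) assembles the deterministic correlation matrix Φ(n) and cross-correlation vector θ(n) from a finite data record and solves the deterministic normal equations for the least-squares coefficient vector.

    import numpy as np

    def block_least_squares(u, d, M):
        # Solve the deterministic normal equations (5.14) for a record of n
        # samples of the input u and desired response d; u(i) = 0 for i <= 0.
        u = np.asarray(u, dtype=float)
        d = np.asarray(d, dtype=float)
        n = len(u)
        U = np.zeros((n, M))                 # data matrix (prewindowed)
        for k in range(M):
            U[k:, k] = u[:n - k]             # column k holds u(i - k), i = 1..n
        Phi = U.T @ U                        # deterministic correlation matrix, Eq. (5.5)
        theta = U.T @ d                      # deterministic cross-correlation vector, Eq. (5.6)
        h = np.linalg.solve(Phi, theta)      # least-squares coefficients, Eq. (5.15)
        return h, Phi, theta

Note that with this construction Φ(n) = U^T U and θ(n) = U^T d are exactly the time-averaged (unnormalized) correlations of Eqs. (5.5) and (5.6).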

Minimum Residual Sum of Squares


When the tap coefficients satisfy the normal equations (5.10), the residual sum of squares attains the minimum value:

J_min(n) = E_d(n) - Σ_{k=1}^{M} h(k,n) θ(n; k-1)    (5.16)

Using the matrix definitions of Eqs. (5.11) and (5.13), we may rewrite this expression for the minimum residual sum of squares in the matrix form

J_min(n) = E_d(n) - h^T(n) θ(n)    (5.17)

where h^T(n) is the transpose of the least-squares estimate h(n) defined by Eq. (5.15).

5.2 PROPERTIES OF THE LEAST-SQUARES ESTIMATE

The least-squares estimate h(n) for the vector of tap coefficients has a strong intuitive appeal that is reinforced by a number of important properties, as discussed below:

Property 1. The least-squares estimate of the coefficient vector approaches the optimum Wiener solution as the data length n approaches infinity, if the filter input and the desired response are jointly stationary ergodic processes.
When the filter input {u(i)} and the desired response {d(i)} are jointly stationary ergodic processes, we may substitute for the ensemble-averaged

autocorrelation function r(m-k) of the tap inputs and the cross-correlation function p(k) between the tap inputs and the desired response the corresponding time averages:

r(m-k) = lim_{n→∞} (1/n) Σ_{i=1}^{n} u(i-k) u(i-m),   k, m = 0, 1, ..., M-1    (5.18)

and

p(k) = lim_{n→∞} (1/n) Σ_{i=1}^{n} d(i) u(i-k),   k = 0, 1, ..., M-1    (5.19)

Correspondingly, we may write

R = lim_{n→∞} (1/n) Φ(n)    (5.20)

and

p = lim_{n→∞} (1/n) θ(n)    (5.21)

where R is the M-by-M ensemble-averaged correlation matrix of the tap inputs, and p is the M-by-1 ensemble-averaged cross-correlation vector between the desired response and the tap inputs.
Hence, under these conditions we find that the least-squares estimate h(n) approaches the optimum Wiener value h_o as n approaches infinity, as shown by

lim_{n→∞} h(n) = lim_{n→∞} Φ^{-1}(n) θ(n)
               = lim_{n→∞} n Φ^{-1}(n) · lim_{n→∞} (1/n) θ(n)
               = R^{-1} p
               = h_o    (5.22)

Property 2. The least-squares estimate of the coefficient vector is unbiased if the error signal e(i) has zero mean for all i.
To prove this property, we will first reformulate the expression for the least-squares estimate given in Eq. (5.15). We do this by introducing some new matrix definitions, as described below:
1. The optimum value of the error signal e_o(i) denotes the difference between the desired response d(i) and the optimum filter output y_o(i) for i = 1, 2, ..., n. We express this relationship in matrix form by writing

   e_o = d - y_o    (5.23)

where e_o is the optimum value of the n-by-1 error vector:

   e_o = [e_o(1), e_o(2), ..., e_o(n)]^T    (5.24)

and d is the n-by-1 desired response vector:

   d = [d(1), d(2), ..., d(n)]^T    (5.25)

The optimum value of the n-by-1 output vector y_o is itself defined by

   y_o = [y_o(1), y_o(2), ..., y_o(n)]^T = U h_o    (5.26)

where U is the n-by-M data matrix (which is Toeplitz):

   U = [ u(1)      0         ...  0
         u(2)      u(1)      ...  0
         ...
         u(M)      u(M-1)    ...  u(1)
         u(M+1)    u(M)      ...  u(2)
         ...
         u(n)      u(n-1)    ...  u(n-M+1) ]    (5.27)

and h_o is the optimum (Wiener) value of the M-by-1 coefficient vector:

   h_o = [h_o(1), h_o(2), ..., h_o(M)]^T    (5.28)
Note that, in order to simplify the notation, we have omitted the dependence on the data length n in the matrix definitions introduced above. We will continue to follow this practice in the rest of the section, since this dependence is not critical to the discussion. Thus, using Eqs. (5.25) and (5.26), we may express the desired response vector d as

   d = U h_o + e_o    (5.29)

2. We write the M-by-M deterministic correlation matrix of the tap inputs as Φ, with its element in row k+1 and column m+1 given by Eq. (5.5). Accordingly, we may express Φ as the product of two Toeplitz matrices, represented by the data matrix U and its transpose U^T, as follows:

   Φ = U^T U    (5.30)

3. We write the M-by-1 deterministic cross-correlation vector between the tap inputs and the desired response as θ, with its element on row k+1 given by Eq. (5.6). Accordingly, we may express θ in terms of U as follows:

   θ = U^T d    (5.31)

   Substituting Eq. (5.29) in (5.31), we get

   θ = U^T (U h_o + e_o)
     = U^T U h_o + U^T e_o
     = Φ h_o + U^T e_o    (5.32)
where in the last line we have made use of Eq. (5.30).
4. Substituting Eq. (5.32) in (5.15), we may express the least-squares estimate of the coefficient vector as

   h = Φ^{-1} θ
     = Φ^{-1} (Φ h_o + U^T e_o)
     = h_o + Φ^{-1} U^T e_o    (5.33)

   Hence, taking the mathematical expectation of both sides of Eq. (5.33), we get

   E[h] = h_o + Φ^{-1} U^T E[e_o]

   where we have treated the optimum value h_o of the coefficient vector as a constant. We have also treated Φ^{-1} and U as constants, since they are based on known values of the tap inputs. If the error vector has zero mean, then E[e_o] equals zero, and the least-squares estimate h is unbiased in that its expected value equals the optimum Wiener value h_o:

   E[h] = h_o

Property 3. The covariance matrix of the least-squares estimate h equals Φ^{-1}, except for a scaling factor, if the error vector e_o has zero mean and its elements are uncorrelated.

Using Property 2, the covariance matrix of the least-squares estimate of the coefficient vector equals the expectation

E[(h - E[h])(h - E[h])^T] = E[(h - h_o)(h - h_o)^T]

This expectation may also be viewed as the correlation matrix of the coefficient-error vector h - h_o. From Eq. (5.33), the coefficient-error vector equals

h - h_o = Φ^{-1} U^T e_o

Hence, we may write

E[(h - h_o)(h - h_o)^T] = E[(Φ^{-1} U^T e_o)(Φ^{-1} U^T e_o)^T]
                        = E[Φ^{-1} U^T e_o e_o^T U Φ^{-1}]

where we have used the fact that Φ is symmetric, that is, Φ^T = Φ. Treating Φ^{-1} and U as known quantities, we may take the expectation operator inside the square brackets, operating only on e_o e_o^T:

E[(h - h_o)(h - h_o)^T] = Φ^{-1} U^T E[e_o e_o^T] U Φ^{-1}    (5.34)

Let the elements of the error vector e_o be uncorrelated:

E[e_o(i) e_o(j)] = ε_min if j = i, and 0 if j ≠ i    (5.35)

where the constant ε_min is the minimum mean squared error that results when the tapped-delay-line filter assumes the optimum Wiener structure. Hence,

E[e_o e_o^T] = ε_min I    (5.36)

where I is the n-by-n identity matrix. Accordingly, we may simplify Eq. (5.34) as follows:

E[(h - h_o)(h - h_o)^T] = Φ^{-1} U^T (ε_min I) U Φ^{-1}
                        = ε_min Φ^{-1} U^T U Φ^{-1}
                        = ε_min Φ^{-1}    (5.37)

where, in the last line, we have made use of Eq. (5.30). We have thus proved that, if the error vector e_o has zero mean and its elements are uncorrelated, the covariance matrix of the least-squares estimate h, or equivalently the correlation matrix of the coefficient-error vector h - h_o, equals the inverse of the deterministic correlation matrix of the tap inputs, except for the scaling factor ε_min.
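Properties 2 and 3 are easy to check numerically. The sketch below is our own and not from the text; the dimensions, the value of ε_min, and the use of NumPy's random generator are arbitrary choices. It builds a fixed data matrix U, generates many independent realizations of the desired response according to Eq. (5.29) with white zero-mean Gaussian e_o of variance ε_min, and compares the sample mean and covariance of the least-squares estimates with h_o and ε_min Φ^{-1}.

    import numpy as np

    rng = np.random.default_rng(0)
    M, n, eps_min, trials = 4, 200, 0.01, 2000
    h_o = rng.standard_normal(M)                 # optimum (Wiener) coefficients
    u = rng.standard_normal(n)                   # one fixed input record
    U = np.zeros((n, M))
    for k in range(M):
        U[k:, k] = u[:n - k]                     # prewindowed data matrix, Eq. (5.27)
    Phi = U.T @ U                                # deterministic correlation matrix

    est = np.empty((trials, M))
    for t in range(trials):
        e_o = np.sqrt(eps_min) * rng.standard_normal(n)   # white, zero-mean error vector
        d = U @ h_o + e_o                                 # desired response, Eq. (5.29)
        est[t] = np.linalg.solve(Phi, U.T @ d)            # least-squares estimate, Eq. (5.15)

    print(np.max(np.abs(est.mean(axis=0) - h_o)))         # close to zero: Property 2
    print(np.max(np.abs(np.cov(est, rowvar=False)
                        - eps_min * np.linalg.inv(Phi)))) # close to zero: Property 3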

Property 4. If the elements of the error vector e_o are statistically independent and Gaussian-distributed, then the least-squares estimate is the same as the maximum-likelihood estimate.

The Gaussian distribution of the element e_o(i) of the zero-mean error vector e_o (with variance σ^2 = ε_min) is described by the probability density function

f(e_o(i)) = (1/√(2πσ^2)) exp(-e_o^2(i)/(2σ^2))    (5.38)

With the elements of the error vector e_o assumed statistically independent of each other, the joint probability density function of e_o equals the product of the probability density functions of its individual elements:

f(e_o) = f(e_o(1), e_o(2), ..., e_o(n))
       = c exp(-(1/(2σ^2)) Σ_{i=1}^{n} e_o^2(i))    (5.39)

where c is a constant defined by

c = 1/(2πσ^2)^{n/2}    (5.40)
The summation in the exponent of Eq. (5.39) equals the residual sum of
squares, J(n). By expressing J(n) as a function of the tap coefficients
h(k),k =1,...,M, as in Eq. (5.8), we may view the joint probability
density function f(e_o) as a likelihood function. The maximum-likelihood
estimate of the coefficient vector is defined as that value of h for which this
likelihood function attains its maximum value. In effect, this maximum-like-
lihood estimate represents the most plausible value for the coefficient vector,
given the observations u(1), u(2),...,u(n). It is clear that the value of the
coefficient vector that maximizes the likelihood function is precisely the
value of the coefficient vector that minimizes the residual sum of squares.
We conclude therefore that when the elements of the zero-mean optimum
error vector e_o are statistically independent and Gaussian-distributed, the
least-squares estimate and the maximum-likelihood estimate of the coeffi-
cient vector assume the same value.

5.3 THE MATRIX-INVERSION LEMMA

Our next goal is to develop a recursive algorithm for computing the least-squares estimate h(n) of the coefficient vector, based on Eq. (5.15). This development, in part, relies on a result in linear algebra known as the matrix-inversion lemma. Let A and B be two positive definite, M-by-M matrices related by

A = B^{-1} + C D^{-1} C^T    (5.41)

where D is another positive definite, N-by-N matrix, and C is an M-by-N matrix. According to the matrix-inversion lemma, we may express the inverse of the matrix A as follows:

A^{-1} = B - B C [D + C^T B C]^{-1} C^T B    (5.42)

The proof of this lemma is established simply by multiplying Eq. (5.41) by (5.42) and recognizing that the product of a square matrix and its inverse is equal to the identity matrix. This is left as an exercise for the reader.
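The lemma is also easy to verify numerically. The short sketch below is our own; the dimensions M = 4 and N = 2 are arbitrary. It forms A from randomly generated positive definite B and D and an arbitrary C according to Eq. (5.41), and confirms that the right-hand side of Eq. (5.42) reproduces the inverse of A.

    import numpy as np

    rng = np.random.default_rng(1)
    M, N = 4, 2
    X = rng.standard_normal((M, M)); B = X @ X.T + np.eye(M)   # positive definite B
    Y = rng.standard_normal((N, N)); D = Y @ Y.T + np.eye(N)   # positive definite D
    C = rng.standard_normal((M, N))

    A = np.linalg.inv(B) + C @ np.linalg.inv(D) @ C.T                # Eq. (5.41)
    A_inv = B - B @ C @ np.linalg.inv(D + C.T @ B @ C) @ C.T @ B     # Eq. (5.42)
    print(np.allclose(A_inv, np.linalg.inv(A)))                      # prints True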

5.4 THE RECURSIVE LEAST-SQUARES (RLS) ALGORITHM

Earlier we mentioned that the deterministic correlation matrix Φ(n) is nonnegative definite. In order to make sure that this correlation matrix is always positive definite and therefore nonsingular, we modify the definition of the deterministic autocorrelation function φ(n; k, m) in a minor way as follows:

φ(n; k, m) = Σ_{i=1}^{n} u(i-m) u(i-k) + c δ_{mk}    (5.43)

where c is a small positive constant, and δ_{mk} is the Kronecker delta, namely,

δ_{mk} = 1 if m = k, and 0 if m ≠ k    (5.44)

The effect of this modification is to add the small positive constant c to each element on the main diagonal of the deterministic correlation matrix Φ(n) and thereby ensure its positive definiteness. By so doing, we will have prepared the way for the application of the matrix-inversion lemma.
By separating the product u(n-m)u(n-k), corresponding to i = n, from the summation term on the right-hand side of Eq. (5.43), we may rewrite the expression for the correlation function φ(n; k, m) as follows:

φ(n; k, m) = u(n-m) u(n-k) + [ Σ_{i=1}^{n-1} u(i-m) u(i-k) + c δ_{mk} ]    (5.45)

By definition, the expression inside the square brackets on the right-hand side of Eq. (5.45) equals φ(n-1; k, m). Accordingly, we may rewrite this equation as

φ(n; k, m) = φ(n-1; k, m) + u(n-m) u(n-k),   k, m = 0, 1, ..., M-1    (5.46)

This is a recursive equation for updating the deterministic correlation function of the tap inputs, with u(n-m)u(n-k) representing the correction term of the update. Note that this recursive equation is independent of the constant c.

Define the M-by-1 tap-input vector

u(n) = [u(n), u(n-1), ..., u(n-M+1)]^T    (5.47)

Then, by using Eq. (5.46), we may write the following recursive equation for updating the deterministic correlation matrix:

Φ(n) = Φ(n-1) + u(n) u^T(n)    (5.48)

where the M-by-M matrix u(n)u^T(n) represents the correction term of the update.
Comparing Eqs. (5.41) and (5.48), with the knowledge that Φ(n) is positive definite for all n, we may make the following identifications:

A = Φ(n)
B^{-1} = Φ(n-1)
C = u(n)
D = 1

Accordingly, we may use the matrix-inversion lemma described by Eqs. (5.41) and (5.42), and so express the inverse of the deterministic correlation matrix in the following recursive form:

Φ^{-1}(n) = Φ^{-1}(n-1) - [Φ^{-1}(n-1) u(n) u^T(n) Φ^{-1}(n-1)] / [1 + u^T(n) Φ^{-1}(n-1) u(n)]    (5.49)
For convenience of computation, let

P(n) = Φ^{-1}(n)    (5.50)

and

k(n) = P(n-1) u(n) / [1 + u^T(n) P(n-1) u(n)]    (5.51)

Then, we may rewrite Eq. (5.49) as follows:

P(n) = P(n-1) - k(n) u^T(n) P(n-1)    (5.52)

The M-by-1 vector k(n) is called the gain vector.
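The recursion of Eqs. (5.50)-(5.52) may be checked against brute-force inversion. In the sketch below (our own; c plays the role of the small regularizing constant of Eq. (5.43), the input samples are arbitrary white noise, and the initial values Φ(0) = cI, P(0) = c^{-1}I anticipate the initial conditions given later in this section), P(n) is propagated by the gain-vector update and compared at every step with Φ^{-1}(n).

    import numpy as np

    rng = np.random.default_rng(2)
    M, c = 3, 1e-3
    Phi = c * np.eye(M)                    # regularized initial correlation matrix
    P = np.eye(M) / c                      # its inverse
    u_vec = np.zeros(M)                    # tap-input vector u(n), Eq. (5.47)

    for n in range(1, 101):
        u_vec = np.concatenate(([rng.standard_normal()], u_vec[:-1]))   # shift in the new sample
        Phi = Phi + np.outer(u_vec, u_vec)                              # Eq. (5.48)
        k = P @ u_vec / (1.0 + u_vec @ P @ u_vec)                       # gain vector, Eq. (5.51)
        P = P - np.outer(k, u_vec @ P)                                  # Eq. (5.52)
        assert np.allclose(P, np.linalg.inv(Phi))                       # the lemma at work, Eq. (5.49)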
Postmultiplying both sides of Eq. (5.52) by the tap-input vector u(n), we get

P(n) u(n) = P(n-1) u(n) - k(n) u^T(n) P(n-1) u(n)    (5.53)

Rearranging Eq. (5.51), we find that

k(n) u^T(n) P(n-1) u(n) = P(n-1) u(n) - k(n)    (5.54)

Therefore, substituting Eq. (5.54) in (5.53) and simplifying, we get the simple result

k(n) = P(n) u(n)    (5.55)


The result of Eq. (5.55) may be considered as a definition of the gain vector k(n). As for the matrix P(n) itself, we may view it as Φ^{-1}(n), the inverse of the deterministic correlation matrix Φ(n), in accordance with Eq. (5.50). Equivalently, except for the scaling factor ε_min, we may view P(n) as the correlation matrix of the coefficient-error vector h(n) - h_o, where h_o is the optimum Wiener value of the coefficient vector, in accordance with Property 3 of the least-squares estimate. That is,

P(n) = (1/ε_min) E[(h(n) - h_o)(h(n) - h_o)^T]    (5.56)

This result, however, only holds if the error vector e_o has zero mean and its elements are uncorrelated.
As mentioned earlier, our aim is to develop a recursive algorithm for computing the least-squares estimate of the coefficient vector, which equals Φ^{-1}(n)θ(n), as in Eq. (5.15). To satisfy this requirement, we need update recursions for the inverse matrix Φ^{-1}(n) = P(n) and the cross-correlation vector θ(n). The update recursion for P(n) is given in Eq. (5.52). There only remains the need for an update recursion for θ(n).
The deterministic cross-correlation function θ(n; k) is defined by Eq. (5.6). Separating the product d(n)u(n-k), corresponding to i = n, from the summation on the right-hand side of Eq. (5.6), we may write

θ(n; k) = d(n) u(n-k) + Σ_{i=1}^{n-1} d(i) u(i-k)    (5.57)

By definition, the sum on the right-hand side of Eq. (5.57) equals θ(n-1; k). Hence, we have

θ(n; k) = θ(n-1; k) + d(n) u(n-k)    (5.58)

This is a recursive equation for updating the deterministic cross-correlation function, with d(n)u(n-k) representing the correction term of the update.
We are now ready to write down the recursion for updating the deterministic cross-correlation vector, as shown by

θ(n) = θ(n-1) + d(n) u(n)    (5.59)

where the M-by-1 vector d(n)u(n) represents the correction term of the update.

We note that, by definition, Φ^{-1}(n) = P(n). Hence, substituting this definition and Eq. (5.59) in (5.15), we get

h(n) = P(n) θ(n)
     = P(n) [θ(n-1) + u(n) d(n)]
     = P(n) θ(n-1) + P(n) u(n) d(n)
     = P(n) θ(n-1) + k(n) d(n)    (5.60)

where we have used the fact that the matrix product P(n)u(n) equals the gain vector k(n), as in Eq. (5.55). We next substitute the expression for P(n), given in Eq. (5.52), in (5.60), obtaining

h(n) = [P(n-1) - k(n) u^T(n) P(n-1)] θ(n-1) + k(n) d(n)
     = P(n-1) θ(n-1) + k(n) [d(n) - u^T(n) P(n-1) θ(n-1)]    (5.61)

We now recognize that

P(n-1) θ(n-1) = Φ^{-1}(n-1) θ(n-1) = h(n-1)

Accordingly, we may rewrite Eq. (5.61) as follows:

h(n) = h(n-1) + k(n) [d(n) - u^T(n) h(n-1)]
     = h(n-1) + k(n) η(n)    (5.62)

where η(n) is a "true" estimation error defined by

η(n) = d(n) - u^T(n) h(n-1)    (5.63)

Equation (5.62) represents the desired update recursion for the coefficient vector. The correction term is represented by the product of the gain vector k(n) and the true estimation error η(n). Note that, since during the adaptive process the old estimate h(n-1) of the coefficient vector is different from the updated estimate h(n), the true estimation error η(n) is different from the error signal e(n), defined by

e(n) = d(n) - u^T(n) h(n)    (5.64)


Equations (5.62) and (5.63) constitute the recursive least-squares (RLS)
algorithm, with Eq. (5.62) representing the adaptive operation and Eq.
(5.63) representing the filtering operation of the algorithm.

Initial Conditions
Putting n = 0 in Eq. (5.43), we get

φ(0; k, m) = c δ_{mk}    (5.65)

Correspondingly, the initial value of the correlation matrix Φ(n) is

Φ(0) = c I    (5.66)

where I is the M-by-M identity matrix. We thus see that the introduction of the (small positive) constant c in Eq. (5.43) only affects the initial value of the correlation matrix Φ(n).
Since, by definition, P(n) equals the inverse of the correlation matrix, we have

P(0) = c^{-1} I    (5.67)

For the initial value of the coefficient vector, it is customary to use

h(0) = 0    (5.68)

where 0 is the M-by-1 null vector. This corresponds to setting all the tap coefficients of the tapped-delay-line filter initially equal to zero.

Summary of the RLS Algorithm


Starting with the initial conditions

Φ(0) = c I
P(0) = c^{-1} I
h(0) = 0

proceed as follows:
1. Put n = 1.
2. Compute the gain vector

   k(n) = P(n-1) u(n) / [1 + u^T(n) P(n-1) u(n)]

3. Compute the true estimation error

   η(n) = d(n) - u^T(n) h(n-1)

4. Update the estimate of the coefficient vector:

   h(n) = h(n-1) + k(n) η(n)

5. Update the error correlation matrix:

   P(n) = P(n-1) - k(n) u^T(n) P(n-1)

6. Increment n by 1, go back to step 2, and repeat the procedure.

We thus see that the RLS algorithm consists of first-order matrix difference equations. Also, the inversion of the correlation matrix Φ(n) is replaced by the inversion of a scalar, namely, 1 + u^T(n) P(n-1) u(n).
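The summary translates directly into code. The following Python sketch is our own and not from the text; array indices start at zero, the input is prewindowed with zeros, and the default value of c is arbitrary.

    import numpy as np

    def rls(u, d, M, c=1e-2):
        # Recursive least-squares algorithm of Section 5.4.
        #   u, d : input and desired-response records of equal length
        #   M    : number of tap coefficients
        #   c    : small positive constant, so that P(0) = (1/c) I
        # Returns the final coefficient vector h and the true estimation errors eta(n).
        P = np.eye(M) / c                   # P(0) = c^{-1} I
        h = np.zeros(M)                     # h(0) = 0
        u_vec = np.zeros(M)                 # tap-input vector u(n)
        eta = np.zeros(len(u))
        for n in range(len(u)):
            u_vec = np.concatenate(([u[n]], u_vec[:-1]))      # shift the tapped delay line
            k = P @ u_vec / (1.0 + u_vec @ P @ u_vec)         # gain vector, Eq. (5.51)
            eta[n] = d[n] - u_vec @ h                         # true estimation error, Eq. (5.63)
            h = h + k * eta[n]                                # coefficient update, Eq. (5.62)
            P = P - np.outer(k, u_vec @ P)                    # inverse-correlation update, Eq. (5.52)
        return h, eta

As a simple test, one might generate d(n) by passing white noise u(n) through a known M-tap filter and adding a small amount of noise; the returned h then agrees closely with that filter after only a few times M iterations, in line with the convergence behavior discussed in Section 5.6.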

5.5 UPDATE RECURSION FOR THE RESIDUAL SUM OF SQUARES

The minimum value of the residual sum of squares, namely, J_min(n), results when the coefficient vector of the tapped-delay-line filter is set equal to the least-squares estimate h(n). To compute J_min(n) we may use Eq. (5.17). In this section we will use Eq. (5.17) to develop a recursive relation for computing J_min(n).
From Eq. (5.7) we deduce that

E_d(n) = E_d(n-1) + d^2(n)    (5.69)

Therefore, substituting Eqs. (5.59), (5.62), and (5.69) in (5.17), we get

J_min(n) = E_d(n-1) + d^2(n) - [h(n-1) + k(n) η(n)]^T [θ(n-1) + d(n) u(n)]
         = [E_d(n-1) - h^T(n-1) θ(n-1)] + d(n)[d(n) - h^T(n-1) u(n)] - η(n) k^T(n) θ(n)    (5.70)

where in the last term we have restored θ(n) to its original form. For the expression inside the first set of square brackets, we have

E_d(n-1) - h^T(n-1) θ(n-1) = J_min(n-1)

For the expression inside the second set of square brackets, we have

d(n) - h^T(n-1) u(n) = η(n)

For the last term we note that the vector product

k^T(n) θ(n) = [Φ^{-1}(n) u(n)]^T θ(n)
            = u^T(n) Φ^{-1}(n) θ(n)
            = u^T(n) h(n)

where in the second line we have used the symmetric property of Φ(n), namely, Φ^T(n) = Φ(n), and in the last line we have used the fact that Φ^{-1}(n)θ(n) equals the least-squares estimate h(n). Hence, we may simplify Eq. (5.70) as follows:

J_min(n) = J_min(n-1) + d(n) η(n) - η(n) u^T(n) h(n)
         = J_min(n-1) + η(n)[d(n) - u^T(n) h(n)]    (5.71)

Accordingly, we may use Eq. (5.64) to simplify Eq. (5.71) as

J_min(n) = J_min(n-1) + η(n) e(n)    (5.72)

which is the desired recursion for updating the residual sum of squares. Thus, the product of the true estimation error η(n) and the error signal e(n) represents the correction term in this updating recursion.
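In code, Eq. (5.72) costs one extra multiplication and addition per iteration. The fragment below is our own sketch, intended as an addition to the loop of the rls routine given after the algorithm summary, with J_min initialized to zero before the loop; it is not standalone code.

    # Inside the RLS loop, after h has been updated at time n:
    e = d[n] - u_vec @ h                # a posteriori error signal e(n), Eq. (5.64)
    J_min = J_min + eta[n] * e          # residual sum of squares, Eq. (5.72)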

5.6 SPEED OF CONVERGENCE

Although the RLS algorithm does not attempt to minimize the mean-squared error (in the ensemble-averaged sense), nevertheless, the mean-square value of the true estimation error η(n) converges within less than 2M iterations, where M is the number of tap coefficients in the tapped-delay-line filter.
To prove this important property of the RLS algorithm, we first rewrite the expression for η(n) by recognizing that, in the optimum Wiener condition, the desired response d(n) equals

d(n) = u^T(n) h_o + e_o(n)    (5.73)

where u(n) is the tap-input vector, h_o is the optimum Wiener value of the coefficient vector, and e_o(n) is the corresponding value of the error signal. Hence, substituting Eq. (5.73) in (5.63), we may express the true estimation error η(n) as

η(n) = u^T(n)[h_o - h(n-1)] + e_o(n)    (5.74)

Hence, the mean-square value of η(n) equals*

E[η^2(n)] = E[(u^T(n)[h_o - h(n-1)] + e_o(n))^2]    (5.75)

From the principle of orthogonality, we know that (see Chapter 2)

E[e_o(n) u(n)] = 0    (5.76)

Accordingly, the variables u^T(n)[h_o - h(n-1)] and e_o(n) are orthogonal, and so we may simplify Eq. (5.75) as follows:

E[η^2(n)] = E[(h_o - h(n-1))^T u(n) u^T(n) (h_o - h(n-1))] + ε_min    (5.77)

where ε_min = E[e_o^2(n)] denotes the minimum mean squared error.
Let R denote the M-by-M ensemble-averaged correlation matrix of the tap inputs:

R = E[u(n) u^T(n)]    (5.78)

Also, from Eq. (5.56) we have (assuming that the error vector e_o has zero mean and its elements are uncorrelated)

P(n-1) = (1/ε_min) E[(h_o - h(n-1))(h_o - h(n-1))^T]    (5.79)

*To be consistent with the statistical analysis presented in Section 5.2, we should evaluate the mean-square value of η(n) in two stages: (1) We assume that the tap-input vector u(n) is known, and we average η^2(n) with respect to the estimate h(n) produced by the RLS algorithm. (2) We average the resulting mean-square value of η^2(n) with respect to u(n). In the analysis presented in Section 5.6, these two steps are combined into one. The final result is, of course, the same.

To express the mean-square value of η(n) in terms of R and P(n-1) we use the trace of a matrix and its properties. By definition, the trace of a square matrix equals the sum of its elements on the main diagonal. In the case of a scalar the trace equals the scalar itself. Accordingly, the scalar-valued expectation on the right-hand side of Eq. (5.77), denoted by ξ(n), is the same as its own trace, as shown by

ξ(n) = E[(h_o - h(n-1))^T u(n) u^T(n) (h_o - h(n-1))]
     = tr E[(h_o - h(n-1))^T u(n) u^T(n) (h_o - h(n-1))]

Since the operations tr[·] and E[·] are both linear, we may interchange them. Therefore we may express ξ(n) in the equivalent form

ξ(n) = E[tr[(h_o - h(n-1))^T u(n) u^T(n) (h_o - h(n-1))]]    (5.80)

Now we use the property that if A is an N-by-M matrix and B is an M-by-N matrix, then the trace of the matrix product AB is the same as the trace of the second possible matrix product BA. Thus, let

A = [h_o - h(n-1)]^T u(n) u^T(n)

and

B = h_o - h(n-1)

Then, we may write

tr[(h_o - h(n-1))^T u(n) u^T(n) (h_o - h(n-1))] = tr[AB]
                                                = tr[BA]
                                                = tr[(h_o - h(n-1))(h_o - h(n-1))^T u(n) u^T(n)]

Hence, we may rewrite Eq. (5.80) as

ξ(n) = E[tr[(h_o - h(n-1))(h_o - h(n-1))^T u(n) u^T(n)]]
     = tr[E[(h_o - h(n-1))(h_o - h(n-1))^T u(n) u^T(n)]]    (5.81)

We now observe that the least-squares estimate h(n-1) is determined solely by data available up to and including time n-1. We may thus assume that h(n-1) is independent of the tap-input vector u(n) measured at time n, in accordance with the independence theory discussed in Chapter 4. Hence, we have

E[(h_o - h(n-1))(h_o - h(n-1))^T u(n) u^T(n)] = E[(h_o - h(n-1))(h_o - h(n-1))^T] E[u(n) u^T(n)]
                                              = ε_min P(n-1) R

where we have used the definitions of Eqs. (5.78) and (5.79). We may thus express ξ(n) as

ξ(n) = ε_min tr[P(n-1) R]    (5.82)

Correspondingly, we may rewrite Eq. (5.77) as

E[η^2(n)] = ε_min tr[P(n-1) R] + ε_min    (5.83)

For large n, we find from Eq. (5.20) that we may approximate the ensemble-averaged correlation matrix R by the time-averaged correlation matrix Φ(n-1)/(n-1) as follows:

R ≈ (1/(n-1)) Φ(n-1)

Also, since P(n-1) equals Φ^{-1}(n-1), it follows that

tr[P(n-1) R] ≈ (1/(n-1)) tr[Φ^{-1}(n-1) Φ(n-1)]
             = (1/(n-1)) tr[I]

where I is the M-by-M identity matrix. With each diagonal element of the identity matrix I equal to one, its trace equals M. Hence, we may approximate the mean-square value of the true estimation error η(n), for large n, as follows:

E[η^2(n)] ≈ ε_min (1 + M/(n-1))

For n large compared to one, we may further approximate this result as

E[η^2(n)] ≈ ε_min (1 + M/n)    (5.84)


Equation (5.84) shows that mean-square convergence of the RLS algorithm
is attained, in theory, within less than 2M iterations, where M is the
number of tap coefficients in the tapped-delay-line filter.

5.7 COMPARISON OF THE RLS AND LMS ALGORITHMS

Part (a) of Fig. 5.2 shows a multidimensional signal-flow graph representation of the RLS algorithm. In particular, the following two relations, constituting the adaptive and filtering operations of the RLS algorithm, respectively, are represented in this graph:

h(n) = h(n-1) + k(n) η(n)
η(n) = d(n) - u^T(n) h(n-1)

In addition, we have included a branch representing the fact that the old estimate h(n-1) may be obtained by applying the operator z^{-1}I to the updated estimate h(n).

In part (b) of the figure we have included, for the sake of comparison,
the corresponding signal-flow graph representation of the LMS algorithm.
The LMS algorithm is expressed in a way that h(n — 1) represents the old
estimate of the coefficient vector, and h(n) represents the updated estimate,
so that it is consistent with the notation used for the RLS algorithm.
Based on the signal-flow graphs of Fig. 5.2 and the theory presented in
previous sections, we may point out the following basic differences between
the RLS and LMS algorithms:
1. In the LMS algorithm, the correction that is applied in updating the old
estimate of the coefficient vector is based on the instantaneous sample
value of the tap-input vector and the error signal. On the other hand, in
the RLS algorithm the computation of this correction utilizes all the past
available information.
2. In the LMS algorithm, the correction applied to the previous estimate consists of the product of three factors: the (scalar) step-size parameter μ, the error signal e(n-1), and the tap-input vector u(n-1).

Figure 5.2 Multidimensional signal-flow graph: (a) RLS algorithm, (b) LMS algorithm.

On the other hand, in the RLS algorithm this correction consists of the product of two factors: the true estimation error η(n) and the gain vector k(n). The gain vector itself consists of Φ^{-1}(n), the inverse of the deterministic correlation matrix, multiplied by the tap-input vector u(n). The major difference between the LMS and RLS algorithms is therefore the presence of Φ^{-1}(n) in the correction term of the RLS algorithm, which has the effect of decorrelating the successive tap inputs, thereby making the RLS algorithm self-orthogonalizing. Because of this property, we find that the RLS algorithm is essentially independent of the eigenvalue spread of the correlation matrix of the filter input.
3. The LMS algorithm requires approximately 20M iterations to converge in mean square, where M is the number of tap coefficients contained in the tapped-delay-line filter. On the other hand, the RLS algorithm converges in mean square within less than 2M iterations. The rate of convergence of the RLS algorithm is therefore, in general, faster than that of the LMS algorithm by an order of magnitude.
4. Unlike the LMS algorithm, there are no approximations made in the derivation of the RLS algorithm. Accordingly, as the number of iterations approaches infinity, the least-squares estimate of the coefficient vector approaches the optimum Wiener value, and correspondingly, the mean-square error approaches the minimum value possible. In other words, the RLS algorithm, in theory, exhibits zero misadjustment. On the other hand, the LMS algorithm always exhibits a nonzero misadjustment; however, this misadjustment may be made arbitrarily small by using a sufficiently small step-size parameter μ.
5. The superior performance of the RLS algorithm compared to the LMS algorithm, however, is attained at the expense of a large increase in computational complexity. The complexity of an adaptive algorithm for real-time operation is determined by two principal factors: (1) the number of multiplications (with divisions counted as multiplications) per iteration, and (2) the precision required to perform arithmetic operations. The RLS algorithm requires a total of 3M(3 + M)/2 multiplications, which increases as the square of M, the number of filter coefficients. On the other hand, the LMS algorithm requires 2M + 1 multiplications, increasing linearly with M. For example, for M = 31 the RLS algorithm requires 1581 multiplications, whereas the LMS algorithm requires only 63 (see the short calculation following this list).
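The multiplication counts quoted in point 5 are easily tabulated (a trivial sketch of our own; the formulas are those given above):

    for M in (8, 16, 31, 64):
        rls_mults = 3 * M * (3 + M) // 2     # RLS multiplications per iteration
        lms_mults = 2 * M + 1                # LMS multiplications per iteration
        print(M, rls_mults, lms_mults)
    # For M = 31 this prints 1581 for the RLS algorithm and 63 for the LMS algorithm.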

5.8 OPERATION IN A NONSTATIONARY ENVIRONMENT

The RLS algorithm may be modified for operation in a nonstationary environment by adopting a residual sum of weighted squares as the cost function:

J(n) = Σ_{i=1}^{n} w(n,i) e^2(i)    (5.85)

where e(i) is the error signal defined in the same way as before, and w(n,i) is a weighting factor with the property that

0 < w(n,i) ≤ 1,   i = 1, 2, ..., n    (5.86)

The use of the weighting factor is intended to ensure that data in the distant past is "forgotten," in order to afford the possibility of following statistical variations in the incoming data when the filter operates in a nonstationary environment. One such form of weighting that is commonly used in practice is the exponential weighting factor defined by

w(n,i) = λ^{n-i},   i = 1, 2, ..., n    (5.87)

where λ is a positive scalar equal to or less than one. The reciprocal of 1 - λ is, roughly speaking, a measure of the memory of the exponentially weighted RLS algorithm. Thus, for λ = 1 all past data is weighted equally in computing the updated coefficient vector h(n). On the other hand, for λ < 1 the past data are attenuated exponentially, with the result that the present data have a larger influence on the updating computation than the past data. This, indeed, is the feature that we like to have in the adaptive process when it is required to deal with the output of a time-varying channel or a time-varying desired response.
Following a procedure similar to that described above, we may show that the exponentially weighted RLS algorithm is described by the following set of equations:

k(n) = λ^{-1} P(n-1) u(n) / [1 + λ^{-1} u^T(n) P(n-1) u(n)]    (5.88)
P(n) = λ^{-1} P(n-1) - λ^{-1} k(n) u^T(n) P(n-1)    (5.89)
η(n) = d(n) - u^T(n) h(n-1)    (5.90)
h(n) = h(n-1) + k(n) η(n)    (5.91)

The derivations of these relations are left as an exercise for the reader. What is really important to note, however, is the fact that the introduction of the exponential weighting factor only affects the computations of the gain vector k(n) and the estimation-error correlation matrix P(n). The initial conditions are chosen in the same way as before.
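In code, the exponential weighting changes only the gain-vector and P-matrix updates of the rls sketch given in Section 5.4; the error and coefficient updates, Eqs. (5.90) and (5.91), are untouched. The routine below is our own sketch, with lam denoting the forgetting factor λ (0 < λ ≤ 1); the default values of lam and c are arbitrary choices.

    import numpy as np

    def rls_exp(u, d, M, lam=0.98, c=1e-2):
        # Exponentially weighted RLS algorithm, Eqs. (5.88)-(5.91).
        P = np.eye(M) / c
        h = np.zeros(M)
        u_vec = np.zeros(M)
        for n in range(len(u)):
            u_vec = np.concatenate(([u[n]], u_vec[:-1]))                  # shift the tapped delay line
            k = (P @ u_vec / lam) / (1.0 + u_vec @ P @ u_vec / lam)       # gain vector, Eq. (5.88)
            eta = d[n] - u_vec @ h                                        # true estimation error, Eq. (5.90)
            h = h + k * eta                                               # coefficient update, Eq. (5.91)
            P = (P - np.outer(k, u_vec @ P)) / lam                        # Eq. (5.89)
        return h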

5.9 NOTES

Theory
Least-squares estimation has an old history, going back to Gauss [1] for its
origin in 1809. It is discussed in detail in many textbooks in mathematics,
e.g., Lawson and Hanson [2], Stewart [3], Miller [4], Draper and Smith [5],

and Weisberg [6]. See also the books by Ljung and Söderström [7], Hsia [8],
Goodwin and Payne [9], Franklin and Powell [10].
The recursive least squares (RLS) algorithm was apparently first derived
in 1950 by Plackett [11]. However, the algorithm has been derived indepen-
dently by several other authors; see, for example, Hastings-James [12]. The
book by Ljung and Söderström [7] presents a comprehensive treatment of
the algorithm. This book also contains an extensive list of references on the
subject.
The RLS algorithm represents a time-varying, nonlinear stochastic
difference equation. Accordingly, the analysis of its convergence behavior is
in general difficult. Ljung [13] has shown that a time-invariant deterministic
ordinary differential equation can be associated with the algorithm. The
stability properties of this equation are tied to the convergence properties of
the algorithm. This approach to convergence analysis is also described in
detail in the book by Ljung and Söderström [7].
Mueller [14] presents a theory for explaining the fast convergence of the
RLS algorithm, with particular reference to the use of this algorithm in
adaptive equalizers. Examining the solution of the deterministic normal
equations (5.10) after exactly M iterations (where M is the number of tap
coefficients), we find that in the noiseless case the tap-input vectors
u(1), u(2),..., u(M) are linearly independent of each other for the data
sequences that are usually used for equalizer startup. Accordingly, in the
special case where the transfer function of the channel is of the all-pole type
and of order M — 1, the residual sum of squares J(n) will remain zero for
n > M, and the least-squares estimate h(n)= h(M). Thus, after only M
iterations, the RLS algorithm yields a coefficient vector that is only asymp-
totically attainable by the LMS algorithm.
The RLS algorithm, as derived in the chapter, applies to real-valued
data. In [15], Mueller develops the complex form of the RLS algorithm for
dealing with complex-valued data.
The RLS algorithm is closely related to Kalman filter theory. The
derivation of the Kalman filter [16], [17] is based on modelling a linear
dynamical system by the following pair of equations:
1. A state equation, describing the motion of the system:
   x(n+1) = Φ(n+1,n) x(n) + v(n)

   where the M-by-1 vector x(n) is the state of the system and Φ(n+1,n) is a known M-by-M state transition matrix relating the states of the system at times n+1 and n. The M-by-1 vector v(n) represents errors introduced in formulating the motion of the system.
2. A measurement equation, describing the observation process as follows:
   y(n) = C(n) x(n) + e(n)

   where the N-by-1 vector y(n) denotes the observed data, and C(n) is a

known N-by-M measurement matrix. The N-by-1 vector e(n) represents


measurement errors.
To apply the Kalman filter theory to an adaptive tapped-delay-line filter, we choose the coefficient vector of the optimum Wiener structure as the state vector. When the filter operates in a stationary environment, the error-performance surface has a fixed orientation, so that the optimum condition is invariant with time. We thus write for the state equation

h_o(n+1) = h_o(n)

or

x(n+1) = x(n)

Accordingly, under these conditions, the state transition matrix Φ(n+1,n) equals the identity matrix, and the error vector v(n) is zero. For the measurement equation, we write

d(n) = u^T(n) h_o(n) + e_o(n)

where d(n) is the desired response, u(n) is the tap-input vector, and e_o(n) is the error signal that results from the use of the optimum Wiener structure. We may thus identify d(n), u^T(n), and e_o(n) with the observation vector y(n), the measurement matrix C(n), and the error vector e(n) in the Kalman filter theory, respectively.
Godard [18] was the first to use the model described above for the
application of the Kalman filter theory to an adaptive tapped-delay-line
filter. The result is an algorithm that is essentially identical to the RLS
algorithm. Sorenson [19], Berkhout and Zaanen [20], Ljung and Söderström
[7], Goodwin and Payne [9] give detailed expositions of the connections
between the Kalman filter theory and recursive least-squares estimation.
Falconer [21] presents a survey of RLS algorithms and other adaptive
algorithms, including applications.
In order to deal with a nonstationary environment, an exponential
weighting factor is introduced into the mechanization of the RLS algorithm,
as described in Section 5.8. On the other hand, to apply the Kalman filter
theory to design an adaptive tapped-delay-line filtering algorithm for opera-
tion in a nonstationary environment, we model the state equation as
[18, 22, 23]

h_o(n+1) = h_o(n) + Δh(n)

where Δh(n) is a white-noise process. According to this state equation, the
optimum coefficient vector executes a “random walk” from one iteration to
the next, so as to model the physical fact that in a nonstationary environ-
ment the error performance surface is continually in a random state of
motion.
Friedlander and Morf [24] present a recursive least-squares algorithm
for adjusting the coefficients of an adaptive linear phase filter with a

symmetric impulse response. The filter has a tapped-delay-line structure


with (1) an odd number of tap coefficients, (2) the center coefficient
constrained to be unity, and (3) the remaining tap coefficients having even
symmetry about the center point. The result is that the filtering process may
be viewed as one of symmetrically smoothing the input signal. For this
reason, the filter is termed the symmetric smoother. Amin and Griffiths [25]
discuss the use of such a filter in the tracking of time-varying sinusoids.
They present results illustrating the properties of the symmetric smoother in
both a stationary and a nonstationary environment, using two algorithms:
the exact least-squares algorithm [24] and the LMS algorithm. Very little
difference (if any) was observed between these two algorithms. Also the
study shows that care must be taken when interpreting the spectral esti-
mates obtained with this spectral estimation procedure.

Fast Algorithms
As mentioned in Section 5.7, the major limitation of the RLS algorithm,
compared to the LMS algorithm, is that it requires a number of multiplica-
tions that increases as the square of M, the number of tap coefficients,
whereas the LMS algorithm requires a number of multiplications that
increases linearly with M. Morf et al. [26] describe a fast implementation of
the RLS algorithm, which requires a computational complexity that is linear
in M. Thus the fast RLS algorithm offers the improved convergence
properties of least-squares algorithms at a cost that is competitive with the
LMS algorithm. The computational efficiency of the fast RLS algorithm is
made possible by exploiting the so-called shifting property that is encoun-
tered in most sequential estimation problems. We may describe the shifting
property as follows. If at time n the tap-input vector u(n) is defined by

u(n) = [u(n), u(n-1), ..., u(n-M+1)]^T

then incrementing the time index n by 1, we have

u(n+1) = [u(n+1), u(n), ..., u(n-M+2)]^T

That is, with the arrival of the new data sample u(n+1), the oldest sample u(n-M+1) is discarded from the tap-input vector u(n), and the remaining samples u(n), u(n-1),...,u(n-M+2) are shifted back in time by 1 sample duration, thereby making room for the newest sample u(n+1). In this way, the tap-input vector is updated.
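This shifting property is exactly what the tap-input update in the earlier sketches exploits: the new sample enters at the top of the vector and the oldest sample drops off the bottom. A minimal illustration (our own, not from the text):

    import numpy as np

    def shift_in(u_vec, new_sample):
        # Discard u(n-M+1), shift the remaining samples, and insert u(n+1) at the top.
        return np.concatenate(([new_sample], u_vec[:-1]))

    u_vec = np.array([3.0, 2.0, 1.0])      # [u(n), u(n-1), u(n-2)]
    print(shift_in(u_vec, 4.0))            # [4. 3. 2.], i.e. [u(n+1), u(n), u(n-1)]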
The fast RLS algorithm is based on ideas originally introduced by Morf
[27]. An explicit application of these ideas to adaptive algorithms is devel-
oped by Ljung et al. [28], and the application to adaptive equalizers is
described by Falconer and Ljung [29]. The derivation of the fast RLS
algorithm uses an approach that parallels that of the Levinson-Durbin
recursion in that it relies on the use of forward and backward linear
predictors (that are optimum in the least-squares sense) to derive the gain
vector k(n) as a by-product. This is done without any manipulation or
storage of M-by-M matrices as required in the conventional form of the
RLS algorithm. The determination of the forward and backward predictors
is based on the matrix form of the normal equations (5.14) and the
minimum residual sum of squares given in Eq. (5.17). The derivation of the
(exponentially weighted) fast RLS algorithm for real-valued data is given by
Falconer and Ljung [29], and for complex-valued data it is given by Mueller
[15]. Falconer et al. [30] discuss the hardware implementation of the fast
RLS algorithm, and they show that the algorithm can be partitioned so as to
be realizable with an architecture based on multiple parallel processors. This
implementation issue is also discussed by Lawrence and Tewksbury [31].
A serious limitation of the fast RLS algorithm, however, is that it has a
tendency to become numerically unstable. Mueller [15] reports computer
simulations of an adaptive equalization problem that included the use of the
fast RLS algorithm with an exponential weighting factor. When single-preci-
sion floating-point arithmetic (i.e., 24 bits for the mantissa) was used,
unstable behavior of the fast RLS algorithm resulted. However, the use of
double-precision arithmetic (i.e., 56 bits for the mantissa) eliminated the
instability problem. Using computer simulation, Lin [32] found that the
instability and finite precision problems of the fast RLS algorithm are
closely related to an abnormal behavior of a quantity in this algorithm. In
particular, this quantity may be interpreted as a ratio of two autocorrela-
tions; therefore, it should always be positive. However, when the fast RLS
algorithm is implemented using finite-precision arithmetic, the computed
value of this quantity sometimes becomes negative, thereby causing the
performance of the algorithm to degrade seriously. Lin [32] describes a
method to re-initialize the algorithm periodically so as to overcome this
problem.
Cioffi and Kailath [33], and Cioffi [34] have developed another fast,
fixed-order, least-squares algorithm for adaptive tapped-delay-line filter ap-
plications which requires slightly fewer operations per iteration and exhibits
better numerical properties than the fast RLS algorithm. The approach

taken by Cioffi and Kailath is based on a vector-space interpretation of the


least-squares problem. The residual sum of squares, defined as J(n) in Eq.
(5.2), is interpreted as the squared length of an appropriately defined error
vector. Elementary geometrical concepts are then used to minimize this
length, and the solution to the least-squares problem is thereby obtained.
Additionally, extensive use is made of a special vector called a pinning
vector [35] or time annihilator [36] to solve the least-squares problem
efficiently and recursively. The algorithm thus developed involves the use of
four tapped-delay-line filters that act on the input data. The algorithm is
said to be "fixed-order" because the length of the tapped-delay-line filters is
determined prior to the adaptive filtering operation. Thus a set of algorith-
mic quantities is derived in terms of the tapped-delay-line filters and their
outputs. Another distinctive feature of the algorithm is a special investi-
gation that is made of these algorithmic quantities during the initialization
period 1 ≤ n ≤ M (where M is the number of filter coefficients). In particular, an exact solution to the least-squares problem is determined for this period as a prefix to the desired steady-state solution for n ≥ M. Special
characteristics of the solution during the initialization period are exploited
to obtain a 70-percent reduction in computational complexity for n > M.
Cioffi and Kailath [33] and Cioffi [34] describe unnormalized and normal-
ized versions of the algorithm, with the normalized one offering excellent
numerical performance.
Carayannis et al. [37] have also independently derived a modified fast
RLS algorithm that requires fewer operations than the conventional form of
the fast RLS algorithm. This algorithm has the same computational require-
ments as the unnormalized version of the fast transversal filter algorithm
described by Cioffi and Kailath [33]. The computational efficiency of the
new algorithm developed by Carayannis et al. is mainly due to an alterna-
tive definition of the gain vector, which exploits the relationship between
forward and backward linear prediction more efficiently than the fast RLS
algorithm.

Applications
Early applications of the RLS algorithm to system identification were
reported by Hastings-James [12] and Åström and Eykhoff [38].
The book by Ljung and Söderström [7] describes the application of
recursive identification techniques (including the RLS algorithm) to off-line
identification, adaptive control, adaptive estimation, and adaptive signal
processing (e.g., adaptive equalization, adaptive noise cancelling).
Marple [39] presents a fast, efficient RLS algorithm for modelling an
unknown dynamic system as a finite-impulse response (FIR) or tapped-
delay-line filter. Marple and Rabiner [40] measure the performance of this
system identification algorithm by using it to estimate a variety of FIR
systems excited by either white noise or a speech signal. For the case of
white-noise inputs, close to ideal performance was achieved. However, for
the case of speech signals rather poor performance was obtained; this was
attributed to the lack of certain frequency bands in the excitation. Marple
[41] presents efficient algorithms for system identification filters with linear
phase.

Figure 5.3 Characteristics (gain in dB and delay in ms versus frequency in Hz) of the voice channel used by Gitlin and Magee [42]; reproduced with permission of the IEEE.

Figure 5.4 Comparison of the output average squared errors (versus number of iterations) produced by the LMS and the RLS algorithms, as reported by Gitlin and Magee [42]; reproduced with permission of the IEEE.

Gitlin and Magee [42] have compared the performance of the LMS and
RLS algorithms* for adaptive equalization in a noisy environment, using the
voice-channel characteristics shown in Fig. 5.3. For the 31-tap equalizer
used in the simulation, the largest-to-smallest eigenvalue ratio of the correla-
tion matrix of the tap inputs was 18.6. The data rate used in the simulation
was 13,200 bits per second. To transmit the digital data through the
channel, four-level vestigial sideband modulation was used, with the carrier
frequency at 3.455 kHz. Gaussian-noise samples were added to the channel
output to produce a signal-to-noise ratio of 31 dB. Figure 5.4 shows the
simulation results for the LMS and RLS algorithms. The conclusion to be
drawn from Fig. 5.4 is clear. The RLS algorithm reaches convergence within
60 iterations (data symbols), while the LMS algorithm requires about 900
iterations. This shows that for this application the rate of convergence of the

*Gitlin and Magee [42] refer to the RLS algorithm as Godard's algorithm, in recognition
of its derivation by Godard [18] using Kalman filter theory. They also refer to the LMS
algorithm as the simple gradient algorithm.
Figure 5.5 Comparison of the output average squared errors (versus number of iterations) produced by the LMS and fast RLS algorithms, as reported by Falconer and Ljung [29]; reproduced with permission of the IEEE.

RLS algorithm is faster than that of the LMS algorithm by more than an
order of magnitude.
Falconer and Ljung [29] present computer simulation results for the
same set of parameters as those considered by Gitlin and Magee [42]. The
fast RLS algorithm* was implemented with the weighting constant λ = 1.
The results of the experiment, presented in Fig. 5.5, show that the fast RLS
algorithm does indeed retain the fast convergence property of the conven-
tional RLS algorithm.

*Falconer and Ljung [29] refer to the fast RLS algorithm as the fast Kalman algorithm.
They also refer to the LMS algorithm as the simple gradient algorithm.
Further computer simulation results on the comparative evaluation of
the LMS, RLS, and fast RLS algorithms, applied to adaptive equalization
for data transmission using quadrature amplitude modulation over a voice
channel, may be found in Mueller [15]. The results reported here indicate
that, for the voice channel used, the RLS and fast RLS algorithms behave
very similarly (the difference in the output mean squared errors produced by
the two algorithms being always smaller than 0.01 dB) and converge (to
within 3 dB of the minimum mean squared error) about 3 times faster than
the LMS algorithm.
Hsu [23] describes an algorithm based on the Kalman/Godard algo-
rithm for updating the tap coefficients of a decision-feedback equalizer for
fading dispersive HF channels. Figure 5.6 shows a block diagram of the
equalizer. It uses two sections, a feedforward section and a feedback section
in the form of tapped-delay-line filters. The number of feedforward taps and
the number of feedback taps are set to be greater than and equal to the
number of interfering symbols contained in the discrete-time model of the
channel, respectively. The receiver makes decisions on a symbol-by-symbol
basis. When a decision is made on the kth transmitted symbol, the feedback
section forms a weighted linear combination of the previous symbol deci-
sions, assumed to be correct. The output of the feedback section is sub-
tracted from that of the feedforward section, and the result is then applied
to a threshold device to determine the current symbol decision. This has the
effect of cancelling the ISI caused at the output of the feedforward section
by previous transmitted symbols. The idea of using previous decisions to
cope with the ISI problem was first described by Austin [43]. For a review
paper on decision feedback equalizers see Belfiore and Park [44].
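As an illustration of the decision-feedback operation just described, the following Python fragment sketches one symbol-by-symbol decision. It is a minimal sketch, not taken from Hsu [23]: the function name, the tap vectors, and the four-level slicer are hypothetical choices made only for illustration.

import numpy as np

# One decision of a decision-feedback equalizer: the feedback section's weighted
# combination of previous decisions is subtracted from the feedforward output,
# and the result is applied to a threshold device (here, a nearest-level slicer).
def dfe_decide(received, past_decisions, ff_taps, fb_taps, levels=(-3.0, -1.0, 1.0, 3.0)):
    ff_out = np.dot(ff_taps, received)         # feedforward section output
    fb_out = np.dot(fb_taps, past_decisions)   # feedback section output (ISI estimate)
    z = ff_out - fb_out                        # cancel ISI due to past symbols
    return min(levels, key=lambda s: abs(z - s))   # threshold device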
Reddy et al. [45] discuss the use of a modified RLS algorithm (and a
lattice algorithm) for adaptive echo cancellation, where the modification is
made to overcome the effects of double talking. The modified algorithm
detects the start of the double-talking interval and effectively freezes the
adaptation during such intervals. Simulations are presented to verify the
validity of the algorithm. Soong and Peterson [46] use computer simulation
to compare the LMS and fast RLS algorithms applied to adaptive echo
cancellation.

Figure 5.6 Block diagram of decision feedback equalizer (feedforward section, feedback section, and threshold device).
REFERENCES

1. C. F. Gauss, "Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium" (Hamburg, 1809; translation: Dover, 1963).
2. C. L. Lawson and R. J. Hanson, "Solving Least-Squares Problems" (Prentice-Hall, 1974).
3. G. W. Stewart, "Introduction to Matrix Computations" (Academic Press, 1973).
4. K. S. Miller, "Complex Stochastic Processes: An Introduction to Theory and Application" (Addison-Wesley, 1974).
5. N. R. Draper and H. Smith, "Applied Regression Analysis" (Wiley, 1966).
6. S. Weisberg, "Applied Linear Regression" (Wiley, 1980).
7. L. Ljung and T. Söderström, "Theory and Practice of Recursive Identification" (MIT Press, 1983).
8. T. C. Hsia, "Identification: Least Squares Methods" (Lexington Books, 1977).
9. G. C. Goodwin and R. L. Payne, "Dynamic System Identification: Experiment Design and Data Analysis" (Academic Press, 1977).
10. G. F. Franklin and J. D. Powell, "Digital Control of Dynamic Systems" (Addison-Wesley, 1980).
11. R. L. Plackett, "Some Theorems in Least Squares," Biometrika, vol. 37, p. 149, 1950.
12. R. Hastings-James, "Recursive Generalized Least-Squares Procedure for Online Identification of Process Parameters," Proc. IEE (London), vol. 116, pp. 2057-2062, December 1969.
13. L. Ljung, "Analysis of Recursive Stochastic Algorithms," IEEE Trans. Automatic Control, vol. AC-22, pp. 551-575, 1977.
14. M. S. Mueller, "On the Rapid Initial Convergence of Least-Squares Equalizer Adjustment," Bell Syst. Tech. J., vol. 60, pp. 2345-2358, December 1981.
15. M. S. Mueller, "Least-Squares Algorithms for Adaptive Equalizers," Bell Syst. Tech. J., vol. 60, pp. 1905-1925, October 1981.
16. R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Trans. ASME, Journal of Basic Engineering, vol. 82D, pp. 35-45, March 1960.
17. S. A. Tretter, "Introduction to Discrete-Time Signal Processing" (Wiley, 1976).
18. D. Godard, "Channel Equalization Using a Kalman Filter for Fast Data Transmission," IBM J. Research and Development, vol. 18, pp. 267-273, 1974.
19. H. W. Sorenson, "Kalman Filtering Techniques," Advances in Control Systems, vol. 3 (Academic Press, 1968).
20. A. J. Berkhout and P. R. Zaanen, "A Comparison between Wiener Filtering, Kalman Filtering, and Deterministic Least Squares Estimation," Geophysical Prospecting, vol. 24, pp. 141-197, 1976.
21. D. D. Falconer, "Adaptive Filter Theory and Applications," in "Lecture Notes in Control and Information Sciences," edited by A. Bensoussan and J. L. Lions (Springer-Verlag, 1980), pp. 163-188.
22. Qi-tu Zhang and S. Haykin, "Tracking Characteristics of the Kalman Filter in a Nonstationary Environment for Adaptive Filter Applications," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Boston, Massachusetts, April 1983), pp. 671-674.
23. F. M. Hsu, "Square Root Kalman Filtering for High-Speed Data Received Over Fading Dispersive HF Channels," IEEE Trans. Information Theory, vol. IT-28, pp. 753-763, September 1982.
24. B. Friedlander and M. Morf, "Least Squares Algorithm for Adaptive Linear-Phase Filtering," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-30, pp. 381-390, June 1982.
25. M. Amin and L. J. Griffiths, "Time-Varying Spectral Estimation Using Symmetric Smoothing," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Boston, Massachusetts, April 1983), pp. 9-12.
26. M. Morf, L. Ljung, and T. Kailath, "Fast Algorithms for Recursive Identification," Proc. IEEE Conference on Decision and Control (Clearwater Beach, Florida, December 1976).
27. M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Stanford University, Stanford, California, 1974.
28. L. Ljung, M. Morf, and D. D. Falconer, "Fast Calculation of Gain Matrices for Recursive Estimation Schemes," International Journal of Control, vol. 27, pp. 1-19, 1978.
29. D. D. Falconer and L. Ljung, "Application of Fast Kalman Estimation to Adaptive Equalization," IEEE Trans. Communications, vol. COM-26, pp. 1439-1446, 1978.
30. D. D. Falconer, V. B. Lawrence, and S. K. Tewksbury, "Processor-Hardware Considerations for Adaptive Digital Filter Algorithms," Proceedings IEEE International Conference on Communications (Seattle, Washington, June 1980), pp. 57.5.1-57.5.6.
31. V. B. Lawrence and S. K. Tewksbury, "Multiprocessor Implementation of Adaptive Digital Filters," IEEE Trans. Communications, vol. COM-31, pp. 826-835, June 1983.
32. D. W. Lin, "On Digital Implementation of the Fast Kalman Algorithms," submitted to IEEE Trans. Acoustics, Speech, and Signal Processing.
33. J. M. Cioffi and T. Kailath, "Fast, Fixed-Order, Least-Squares Algorithms for Adaptive Filtering," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Boston, Massachusetts, April 1983), pp. 679-682.
34. J. M. Cioffi, "Fast, Fixed-Order Least-Squares Algorithms for Communications Applications," Ph.D. Dissertation, Stanford University, Stanford, California, 1984.
35. T. Kailath, "Time-Variant and Time-Invariant Lattice Filters for Nonstationary Processes," in "Proceedings Fast Algorithms for Linear Dynamical Systems" (Aussois, France, 1981), pp. 417-464.
36. D. T. L. Lee, M. Morf, and B. Friedlander, "Recursive Least-Squares Ladder Estimation Algorithms," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp. 627-641, June 1981.
37. G. Carayannis, D. G. Manolakis, and N. Kalouptsidis, "A Fast Sequential Algorithm for Least-Squares Filtering and Prediction," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-31, pp. 1394-1402, December 1983.
38. K. J. Åström and P. Eykhoff, "System Identification - A Survey," Automatica, vol. 7, pp. 123-162, 1971.
39. S. L. Marple, Jr., "Efficient Least Squares FIR System Identification," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp. 62-73, February 1981.
40. S. L. Marple, Jr., and L. R. Rabiner, "Performance of a Fast Algorithm for FIR System Identification Using Least-Squares Analysis," Bell Syst. Tech. J., vol. 62, pp. 717-742, March 1983.
41. S. L. Marple, Jr., "Fast Algorithms for Linear Prediction and System Identification Filters with Linear Phase," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-30, pp. 942-952, December 1982.
42. R. D. Gitlin and F. R. Magee, Jr., "Self-Orthogonalizing Adaptive Equalization Algorithms," IEEE Trans. Communications, vol. COM-25, pp. 666-672, 1977.
43. M. Austin, "Decision Feedback Equalization for Digital Communication over Dispersive Channels," MIT Res. Lab. Electron., Tech. Rep. 461, August 1967.
44. C. A. Belfiore and J. H. Park, Jr., "Decision Feedback Equalization," Proc. IEEE, vol. 67, pp. 1143-1156, August 1979.
45. V. U. Reddy, T. J. Shan, and T. Kailath, "Application of Modified Least-Squares Algorithms to Adaptive Echo Cancellation," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Boston, Massachusetts, April 1983), pp. 53-56.
46. F. K. Soong and A. M. Peterson, "Fast Least-Squares (LS) in the Voice Echo Cancellation Application," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (Paris, France, May 1982), pp. 1398-1403.
CHAPTER SIX

ADAPTIVE LATTICE FILTERS

In Chapter 3 we showed that the operation of a multistage lattice filter is


completely described by specifying the sequence of reflection coefficients
that characterize the individual stages of the filter. For a given input, the
values assigned to these reflection coefficients depend on the criterion used
to optimize each stage of the filter. In this chapter we consider least-mean-
square (LMS) lattice algorithms that result from attempting to minimize the
mean-square value of the forward prediction error or the backward predic-
tion error, or the sum of their mean-square values, at the output of each
stage. When the filter input is stationary, the backward prediction errors are
orthogonal to each other, with the result that the successive stages of the
lattice filter are decoupled from each other. This means that the global
optimization of a multistage lattice filter may indeed be accomplished as a
sequence of local optimization problems, one at each stage of the lattice
filter. Accordingly, it is a straightforward matter to increase the order of the
lattice filter by simply adding one or more stages without affecting the
earlier design computations.

6.1 THE FORWARD-BACKWARD LATTICE METHOD

Consider a lattice filter of order M. For stage m of the filter, m = 1, 2, ..., M, the flow of signals is described by the following pair of equations (see Fig. 6.1):

f_m(i) = f_{m-1}(i) + γ_m^{(f)} b_{m-1}(i-1)     (6.1)

b_m(i) = b_{m-1}(i-1) + γ_m^{(b)} f_{m-1}(i)     (6.2)
Figure 6.1 A single stage of the lattice filter used in the development of the forward-backward method.

where m = 1, 2, ..., M. For stage m, the forward reflection coefficient is denoted by γ_m^{(f)}, and the backward reflection coefficient is denoted by γ_m^{(b)}. We have purposely used different symbols for these two reflection coefficients for reasons that will become apparent presently.
In the so-called forward-backward method, we choose the forward reflection coefficient γ_m^{(f)} to minimize the mean-square value of the forward prediction error f_m(i), and the backward reflection coefficient γ_m^{(b)} to minimize the mean-square value of the backward prediction error b_m(i). Both of these prediction errors are measured at the output of stage m. We assume that both γ_m^{(f)} and γ_m^{(b)} are nonrandom parameters. Using Eq. (6.1), we find that the mean-square value of f_m(i) is given by

ε_m^{(f)} = E[f_m^2(i)]
         = E[f_{m-1}^2(i)] + [γ_m^{(f)}]^2 E[b_{m-1}^2(i-1)]
           + 2γ_m^{(f)} E[f_{m-1}(i) b_{m-1}(i-1)]     (6.3)

Differentiating the mean squared error ε_m^{(f)} with respect to γ_m^{(f)} and setting the result equal to zero, we find that the optimum value of γ_m^{(f)} is defined by

γ_m^{(f)} = - E[f_{m-1}(i) b_{m-1}(i-1)] / E[b_{m-1}^2(i-1)]     (6.4)

Similarly, we may show that the optimum value of the backward reflection coefficient, for which the mean-square value of the backward prediction error b_m(i) is minimized, is given by

γ_m^{(b)} = - E[f_{m-1}(i) b_{m-1}(i-1)] / E[f_{m-1}^2(i)]     (6.5)

Time-Update Recursions
When the time series applied to the input of the lattice filter is stationary, and the forward and backward reflection coefficients of each stage of the lattice filter are set at their optimum values, we have

E[f_m^2(i)] = E[b_m^2(i)],   1 ≤ m ≤ M     (6.6)

where M is the order of the lattice filter. Under these conditions, we find from Eqs. (6.4) and (6.5) that

γ_m^{(f)} = γ_m^{(b)},   1 ≤ m ≤ M     (6.7)

When, however, the filter input is nonstationary, we find that, in general, the two reflection coefficients γ_m^{(f)} and γ_m^{(b)} have unequal values. Also, there is no guarantee that they both will have a magnitude less than one.
To use the formulas of Eqs. (6.4) and (6.5) to compute the forward and backward reflection coefficients for stage m of the lattice filter, we need estimators for the expectations in the numerators and denominators of these formulas. Assuming a time series of n samples, we may use the estimators tabulated below:

Expectation                        Estimate
E[f_{m-1}(i) b_{m-1}(i-1)]         (1/n) Σ_{i=1}^{n} λ^{n-i} f_{m-1}(i) b_{m-1}(i-1)
E[b_{m-1}^2(i-1)]                  (1/n) Σ_{i=1}^{n} λ^{n-i} b_{m-1}^2(i-1)
E[f_{m-1}^2(i)]                    (1/n) Σ_{i=1}^{n} λ^{n-i} f_{m-1}^2(i)

In each case, the estimate is in the form of an exponentially weighted time average. The positive weighting constant λ included in the time average is confined to the interval 0 < λ ≤ 1. For a stationary input, we put λ = 1.
Accordingly, we may express the estimates of the forward and backward reflection coefficients as follows, respectively:

γ_m^{(f)}(n) = - k_{m-1}(n) / E_{m-1}^{(b)}(n-1)     (6.8)
and

γ_m^{(b)}(n) = - k_{m-1}(n) / E_{m-1}^{(f)}(n)     (6.9)

where

k_{m-1}(n) = Σ_{i=1}^{n} λ^{n-i} f_{m-1}(i) b_{m-1}(i-1)     (6.10)

E_{m-1}^{(f)}(n) = Σ_{i=1}^{n} λ^{n-i} f_{m-1}^2(i)     (6.11)

and

E_{m-1}^{(b)}(n) = Σ_{i=1}^{n} λ^{n-i} b_{m-1}^2(i)     (6.12)

Note that since the filter input equals zero for i ≤ 0, by assumption, we have

Σ_{i=1}^{n} λ^{n-i} b_{m-1}^2(i-1) = Σ_{i=1}^{n-1} λ^{n-1-i} b_{m-1}^2(i) = E_{m-1}^{(b)}(n-1)

We next modify Eqs. (6.10), (6.11), and (6.12), which define k_{m-1}(n), E_{m-1}^{(f)}(n), and E_{m-1}^{(b)}(n), so that we may compute them recursively. Consider first Eq. (6.10). By separating out the term corresponding to i = n from the summation, we may rewrite this equation as follows:

k_{m-1}(n) = Σ_{i=1}^{n-1} λ^{n-i} f_{m-1}(i) b_{m-1}(i-1) + f_{m-1}(n) b_{m-1}(n-1)
           = λ Σ_{i=1}^{n-1} λ^{n-1-i} f_{m-1}(i) b_{m-1}(i-1) + f_{m-1}(n) b_{m-1}(n-1)
           = λ k_{m-1}(n-1) + f_{m-1}(n) b_{m-1}(n-1)     (6.13)

Similarly, we may show that

E_{m-1}^{(f)}(n) = λ E_{m-1}^{(f)}(n-1) + f_{m-1}^2(n)     (6.14)

and

E_{m-1}^{(b)}(n) = λ E_{m-1}^{(b)}(n-1) + b_{m-1}^2(n)     (6.15)

Equations (6.13)-(6.15) are the desired time-update recursions.
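As a quick numerical illustration (not from the text), the following Python fragment checks that the recursion of Eq. (6.13) reproduces the exponentially weighted sum of Eq. (6.10); the sequences f and b are arbitrary stand-ins for the stage-(m-1) prediction errors, and lam stands for the weighting constant λ.

import numpy as np

rng = np.random.default_rng(0)
N, lam = 50, 0.98
f = rng.standard_normal(N + 1)   # f[i] plays the role of f_{m-1}(i), i = 1..N
b = rng.standard_normal(N + 1)   # b[i] plays the role of b_{m-1}(i)
b[0] = 0.0                       # the filter input, and hence b, is zero for i <= 0

# direct evaluation of the exponentially weighted sum, Eq. (6.10)
k_direct = sum(lam ** (N - i) * f[i] * b[i - 1] for i in range(1, N + 1))

# recursive evaluation, Eq. (6.13), starting from k_{m-1}(0) = 0
k = 0.0
for n in range(1, N + 1):
    k = lam * k + f[n] * b[n - 1]

print(abs(k - k_direct) < 1e-12)   # True: the two computations agree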

Order-Update Recursions
Having computed the forward reflection coefficient γ_m^{(f)}(n) and the backward reflection coefficient γ_m^{(b)}(n) for stage m of the lattice filter, we may next compute, for i = 1, 2, ..., n, the forward prediction error f_m(i) and the backward prediction error b_m(i) at the output of this stage by using the order-update relations:

f_m(i) = f_{m-1}(i) + γ_m^{(f)}(n) b_{m-1}(i-1),   i = 1, 2, ..., n     (6.16)

b_m(i) = b_{m-1}(i-1) + γ_m^{(b)}(n) f_{m-1}(i),   i = 1, 2, ..., n     (6.17)

where γ_m^{(f)}(n) and γ_m^{(b)}(n) are treated as constants for the time interval 1 ≤ i ≤ n.
Extending the definition of Eq. (6.11) to filter order m, we may express the estimate E_m^{(f)}(n) for the variance of the forward prediction error as follows:

E_m^{(f)}(n) = Σ_{i=1}^{n} λ^{n-i} f_m^2(i)
             = Σ_{i=1}^{n} λ^{n-i} [f_{m-1}(i) + γ_m^{(f)}(n) b_{m-1}(i-1)]^2
             = Σ_{i=1}^{n} λ^{n-i} f_{m-1}^2(i) + 2γ_m^{(f)}(n) Σ_{i=1}^{n} λ^{n-i} f_{m-1}(i) b_{m-1}(i-1)
               + [γ_m^{(f)}(n)]^2 Σ_{i=1}^{n} λ^{n-i} b_{m-1}^2(i-1)     (6.18)

Substituting the definitions of Eqs. (6.8), (6.10), (6.11), and (6.12) in (6.18), we get

E_m^{(f)}(n) = E_{m-1}^{(f)}(n) - 2 k_{m-1}^2(n) / E_{m-1}^{(b)}(n-1) + k_{m-1}^2(n) / E_{m-1}^{(b)}(n-1)
             = E_{m-1}^{(f)}(n) - k_{m-1}^2(n) / E_{m-1}^{(b)}(n-1)     (6.19)

which is the desired order-update recursion. Similarly, using Eq. (6.12), we may develop the companion order-update recursion for the estimate of the variance of the backward prediction error:

E_m^{(b)}(n) = E_{m-1}^{(b)}(n-1) - k_{m-1}^2(n) / E_{m-1}^{(f)}(n)     (6.20)

Thus, given the quantities k_{m-1}(n), E_{m-1}^{(f)}(n), and E_{m-1}^{(b)}(n-1) that pertain to filter order m - 1, we may use the order-update recursions of Eqs. (6.19) and (6.20) to compute E_m^{(f)}(n) and E_m^{(b)}(n) for filter order m.

Summary of the Forward—Backward Lattice Algorithm


There are two different ways of implementing the forward-backward lattice algorithm, depending on how we update the forward and backward prediction-error variances. For this updating we have the choice of using either the time-update recursions of Eqs. (6.14) and (6.15), or the order-update recursions of Eqs. (6.19) and (6.20), as summarized below.

Version 1
1. Initialize the algorithm by setting
   k_m(0) = 0,   m = 0, 1, ..., M-1
   E_m^{(f)}(0) = E_m^{(b)}(0) = E_m^{(b)}(-1) = c,   m = 0, 1, ..., M-1
   where c is an a priori estimate of the prediction-error variance.
2. At each step, compute the zeroth-order (i.e., m = 0) variables
   f_0(n) = b_0(n) = u(n)
   where u(n) is the filter input at time n.
3. For filter order m = 1, ..., M compute
   k_{m-1}(n) = λ k_{m-1}(n-1) + f_{m-1}(n) b_{m-1}(n-1)
   γ_m^{(f)}(n) = - k_{m-1}(n) / E_{m-1}^{(b)}(n-1)
   γ_m^{(b)}(n) = - k_{m-1}(n) / E_{m-1}^{(f)}(n)
   f_m(n) = f_{m-1}(n) + γ_m^{(f)}(n) b_{m-1}(n-1)
   b_m(n) = b_{m-1}(n-1) + γ_m^{(b)}(n) f_{m-1}(n)
   E_{m-1}^{(f)}(n) = λ E_{m-1}^{(f)}(n-1) + f_{m-1}^2(n)
   E_{m-1}^{(b)}(n) = λ E_{m-1}^{(b)}(n-1) + b_{m-1}^2(n)

Version 2
1. Initialize the algorithm by setting
   k_m(0) = 0,   m = 0, 1, ..., M-1
   E_m^{(f)}(0) = E_m^{(b)}(0) = E_m^{(b)}(-1) = c,   m = 0, 1, ..., M-1
   where c is an a priori estimate of the prediction-error variance.
2. For each instant of time n, compute the various zeroth-order (m = 0) variables:
   f_0(n) = b_0(n) = u(n)
   E_0^{(f)}(n) = E_0^{(b)}(n) = λ E_0^{(f)}(n-1) + u^2(n)
   where u(n) is the value of the filter input at time n.
3. For filter order m = 1, 2, ..., M, compute
   k_{m-1}(n) = λ k_{m-1}(n-1) + f_{m-1}(n) b_{m-1}(n-1)
   γ_m^{(f)}(n) = - k_{m-1}(n) / E_{m-1}^{(b)}(n-1)
   γ_m^{(b)}(n) = - k_{m-1}(n) / E_{m-1}^{(f)}(n)
   f_m(n) = f_{m-1}(n) + γ_m^{(f)}(n) b_{m-1}(n-1)
   b_m(n) = b_{m-1}(n-1) + γ_m^{(b)}(n) f_{m-1}(n)
   E_m^{(f)}(n) = E_{m-1}^{(f)}(n) - k_{m-1}^2(n) / E_{m-1}^{(b)}(n-1)
   E_m^{(b)}(n) = E_{m-1}^{(b)}(n-1) - k_{m-1}^2(n) / E_{m-1}^{(f)}(n)

The basic difference between the two versions of the algorithm as summarized above is that in version 1 the time-update recursions of Eqs. (6.14) and (6.15) are used for updating the values of the forward and backward prediction-error variances, whereas in version 2 these variances are updated by using the order-update recursions of Eqs. (6.19) and (6.20).
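As an illustration, here is a minimal Python sketch of version 2 of the algorithm summarized above. The class name, the layout of the state variables, and the default values of c and lam (the weighting constant λ) are choices made for the sketch only; they are not prescribed by the text.

import numpy as np

class ForwardBackwardLattice:
    """Version 2 of the forward-backward lattice algorithm (sketch)."""
    def __init__(self, M, c=1.0, lam=0.99):
        self.M, self.lam = M, lam
        self.k = np.zeros(M)              # k_{m-1}(n),   m = 1..M
        self.Ef = np.full(M + 1, c)       # E^(f)_m(n),   m = 0..M
        self.Eb = np.full(M + 1, c)       # E^(b)_m(n)
        self.Eb_old = np.full(M + 1, c)   # E^(b)_m(n-1)
        self.b_old = np.zeros(M + 1)      # b_m(n-1)

    def step(self, u_n):
        M, lam = self.M, self.lam
        f = np.zeros(M + 1)
        b = np.zeros(M + 1)
        f[0] = b[0] = u_n                                   # zeroth-order variables
        self.Eb_old[0] = self.Eb[0]
        self.Ef[0] = self.Eb[0] = lam * self.Ef[0] + u_n ** 2
        for m in range(1, M + 1):
            self.k[m-1] = lam * self.k[m-1] + f[m-1] * self.b_old[m-1]   # Eq. (6.13)
            gf = -self.k[m-1] / self.Eb_old[m-1]            # Eq. (6.8)
            gb = -self.k[m-1] / self.Ef[m-1]                # Eq. (6.9)
            f[m] = f[m-1] + gf * self.b_old[m-1]            # Eq. (6.16)
            b[m] = self.b_old[m-1] + gb * f[m-1]            # Eq. (6.17)
            self.Eb_old[m] = self.Eb[m]                     # save E^(b)_m(n-1) for stage m+1
            self.Ef[m] = self.Ef[m-1] - self.k[m-1] ** 2 / self.Eb_old[m-1]   # Eq. (6.19)
            self.Eb[m] = self.Eb_old[m-1] - self.k[m-1] ** 2 / self.Ef[m-1]   # Eq. (6.20)
        self.b_old = b.copy()
        return f, b

For a stationary input one would set lam = 1, in which case the recursions reduce to growing-memory averages.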

6.2 THE BURG METHOD

In the forward—backward method described above there is no guarantee


that both the forward and the backward reflection coefficients will always
have magnitude less than one. This can be a serious limitation when the
signal-processing application of interest includes both analysis and synthesis
(as, for example, in linear-predictive encoding of speech). This limitation
may be overcome by using the Burg method. For stationary inputs, the Burg
method is usually considered to be the standard against which the perfor-
mance of other estimates is judged.
In the Burg method the forward and backward reflection coefficients assume equal values. Indeed, development of the lattice-filter theory in Chapter 3 led us to this conclusion for stationary inputs. For stage m in the filter, we thus write (see Fig. 6.2)

f_m(i) = f_{m-1}(i) + γ_m b_{m-1}(i-1)     (6.21)

b_m(i) = b_{m-1}(i-1) + γ_m f_{m-1}(i)     (6.22)

where m = 1, 2, ..., M. In the Burg method we choose the reflection coefficient γ_m so as to minimize the cost function

ε_m = E[f_m^2(i)] + E[b_m^2(i)]     (6.23)

which equals the sum of the mean-square values of the forward and backward prediction errors. Substituting Eqs. (6.21) and (6.22) in (6.23), we get

ε_m = {E[f_{m-1}^2(i)] + E[b_{m-1}^2(i-1)]}(1 + γ_m^2) + 4γ_m E[f_{m-1}(i) b_{m-1}(i-1)]     (6.24)

Differentiating ε_m with respect to γ_m, and setting the result equal to zero, we find that the optimum value of this reflection coefficient equals

γ_{m,0} = - 2 E[f_{m-1}(i) b_{m-1}(i-1)] / {E[f_{m-1}^2(i)] + E[b_{m-1}^2(i-1)]}     (6.25)

Equation (6.25) is known as the Burg formula.

Figure 6.2 A single stage of the lattice filter used in the development of the Burg method.

Properties of the Burg Formula

Property 1. When the reflection coefficient of stage m equals the optimum value γ_{m,0}, the mean-square values of the forward and backward prediction errors at the output of stage m are as follows, respectively:

E[f_m^2(i)] = (1 - γ_{m,0}^2) E[f_{m-1}^2(i)]     (6.26)

and

E[b_m^2(i)] = (1 - γ_{m,0}^2) E[b_{m-1}^2(i-1)]     (6.27)

To prove Eq. (6.26), we use Eq. (6.21), with γ_m replaced by γ_{m,0}, to write

E[f_m^2(i)] = E[f_{m-1}^2(i)] + γ_{m,0}^2 E[b_{m-1}^2(i-1)] + 2γ_{m,0} E[f_{m-1}(i) b_{m-1}(i-1)]

We next use Eq. (6.25) to eliminate E[f_{m-1}(i) b_{m-1}(i-1)] from this equation, obtaining

E[f_m^2(i)] = E[f_{m-1}^2(i)] + γ_{m,0}^2 E[b_{m-1}^2(i-1)] - γ_{m,0}^2 {E[f_{m-1}^2(i)] + E[b_{m-1}^2(i-1)]}
            = (1 - γ_{m,0}^2) E[f_{m-1}^2(i)]

which is the desired result. Similarly, we may prove Eq. (6.27) by using Eqs. (6.22) and (6.25).

Property 2. The optimum reflection coefficient γ_{m,0} equals the harmonic mean of the forward reflection coefficient and the backward reflection coefficient:

2/γ_{m,0} = 1/γ_m^{(f)} + 1/γ_m^{(b)}     (6.28)

This result follows directly from Eqs. (6.4) and (6.5) for the forward and backward reflection coefficients. Accordingly, the Burg method is sometimes referred to as the harmonic-mean method.

Property 3. The optimum reflection coefficient γ_{m,0} always satisfies the requirement

|γ_{m,0}| ≤ 1   for all m     (6.29)

In other words, the Burg formula always yields a minimum-phase condition for the use of the lattice filter as a forward prediction-error filter. To prove this property, we use the statistical correlation coefficient of the prediction errors f_{m-1}(i) and b_{m-1}(i-1), defined by

ρ_m = E[f_{m-1}(i) b_{m-1}(i-1)] / {E[f_{m-1}^2(i)] E[b_{m-1}^2(i-1)]}^{1/2}     (6.30)
which is always less than or equal to one in magnitude. Define a dimensionless constant

α_m = {E[f_{m-1}^2(i)] / E[b_{m-1}^2(i-1)]}^{1/2}     (6.31)

Then we may express the optimum forward reflection coefficient γ_m^{(f)} of Eq. (6.4) and the optimum backward reflection coefficient γ_m^{(b)} of Eq. (6.5) in terms of ρ_m and α_m as follows:

γ_m^{(f)} = - ρ_m α_m     (6.32)

and

γ_m^{(b)} = - ρ_m / α_m     (6.33)

Substituting Eqs. (6.32) and (6.33) in (6.28), we get

1/γ_{m,0} = - (1/(2ρ_m)) (α_m + 1/α_m)

Since we always have α_m + 1/α_m ≥ 2, and |ρ_m| ≤ 1, it follows that

|γ_{m,0}| ≤ |ρ_m|   for all m

from which Eq. (6.29) follows immediately.
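As a simple numerical illustration of Properties 2 and 3 (not from the text), the following Python fragment replaces the expectations in Eqs. (6.4), (6.5), and (6.25) by sample averages over synthetic prediction errors and checks the harmonic-mean relation and the bound on the magnitude of the Burg reflection coefficient.

import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(10_000)                     # stands in for f_{m-1}(i)
b_delayed = 0.5 * f + rng.standard_normal(10_000)   # stands in for b_{m-1}(i-1)

cross = np.mean(f * b_delayed)
gamma_f = -cross / np.mean(b_delayed ** 2)          # sample version of Eq. (6.4)
gamma_b = -cross / np.mean(f ** 2)                  # sample version of Eq. (6.5)
gamma_burg = -2.0 * cross / np.mean(f ** 2 + b_delayed ** 2)   # Eq. (6.25)

# Property 2: the Burg value is the harmonic mean of the forward and backward values
print(np.isclose(2.0 / gamma_burg, 1.0 / gamma_f + 1.0 / gamma_b))
# Property 3: the Burg reflection coefficient never exceeds one in magnitude
print(abs(gamma_burg) <= 1.0)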

6.3 DISCUSSION

In addition to the forward—backward method and the Burg method de-


scribed above, there are several other methods that may be used to design
lattice filters. These other methods include:
1. The forward method, in which the forward reflection coefficient and the backward reflection coefficient for stage m are assigned a common value equal to γ_m^{(f)} defined by Eq. (6.4).
2. The backward method, in which the forward reflection coefficient and the backward reflection coefficient for stage m are assigned a common value equal to γ_m^{(b)} defined by Eq. (6.5).
We thus see that the forward-backward method described in Section 6.1
is a hybridization of the forward and backward methods. All three methods
have a common theoretical problem: they do not guarantee a minimum-phase
condition for the lattice structure used as a forward prediction-error filter.

3. In the geometric-mean method the value assigned to both the forward reflection coefficient and the backward reflection coefficient equals

   γ_m = - E[f_{m-1}(i) b_{m-1}(i-1)] / {E[f_{m-1}^2(i)] E[b_{m-1}^2(i-1)]}^{1/2}     (6.34)

   We observe that, except for a minus sign, the reflection coefficient of Eq. (6.34) equals the geometric mean of the forward reflection coefficient defined by Eq. (6.4) and the backward reflection coefficient defined by Eq. (6.5); hence the name of the method. We also observe that the formula of Eq. (6.34), except for the minus sign, may be interpreted as the statistical correlation between f_{m-1}(i) and b_{m-1}(i-1). As with the Burg method, we therefore have |γ_m| ≤ 1 for all m.
4. In the minimum method, we compute the forward and backward reflection coefficients defined by Eqs. (6.4) and (6.5), respectively, and choose the one with the smaller magnitude. If it turns out that |γ_m^{(f)}| < |γ_m^{(b)}|, we assign γ_m^{(f)} to both the forward and backward reflection coefficients of stage m. If, on the other hand, we find that |γ_m^{(b)}| < |γ_m^{(f)}|, we assign γ_m^{(b)} to both the forward and backward reflection coefficients of stage m. It turns out that if the magnitude of either γ_m^{(f)} or γ_m^{(b)} is greater than one, the magnitude of the other is necessarily less than one. This follows from Eqs. (6.32) and (6.33). Hence, the use of the minimum method will always yield a minimum-phase lattice filter.
From the above discussion, we observe that the Burg method, the
forward method, the backward method, and the forward-backward method
are all the direct results of the minimization of an error criterion. However,
only the Burg method guarantees a minimum-phase condition for forward
prediction-error filtering. We also observe that, although both the geomet-
ric-mean method and the minimum method do guarantee a minimum-phase
forward prediction-error filtering operation, neither method can be derived
directly by minimizing some error criterion. We therefore conclude that the
Burg method is in the unique position of being the direct result of mini-
mizing an error criterion and always producing a minimum-phase design. It
should, however, be stressed that a minimum-phase filter design is necessary
only if the problem involves both analysis and synthesis, as, for example, in
linear predictive encoding of speech (see the discussion of Section 3.7). If,
on the other hand, the problem of interest only requires analysis, then
clearly the minimum-phase requirement of the forward prediction-error
filter is not necessary.

6.4 BLOCK IMPLEMENTATION OF THE BURG METHOD

In a block estimation procedure, one value of the reflection coefficient for


each stage of the lattice filter is estimated for a whole block of input data.
Thus for stage m in the lattice filter, we describe the flow of signals by the
pair of relations
f_m(i) = f_{m-1}(i) + γ_m(n) b_{m-1}(i-1)     (6.35)

b_m(i) = b_{m-1}(i-1) + γ_m(n) f_{m-1}(i)     (6.36)
where i = 1, 2, ..., n, and γ_m(n) is an estimate of the reflection coefficient that is made at time n. Every time a new sample is added to the block of input data, the analysis is repeated. In this section we describe a block estimation procedure based on the Burg formula of Eq. (6.25).
Consider the general case when the sequence {u(i)}, i = 1, 2, ..., n, is applied to the filter input. We use exponentially weighted time averages as estimates of the expectations in the numerator and denominator of Eq. (6.25). The exponential form of weighting is included to take account of nonstationarities that may be present in the input data. In particular, we use the following estimates:
1. For the expectation E[f_{m-1}(i) b_{m-1}(i-1)] in the numerator of Eq. (6.25) we use the estimate

   (1/n) Σ_{i=1}^{n} λ^{n-i} f_{m-1}(i) b_{m-1}(i-1)

   where λ is a positive weighting constant that is less than or equal to one. For stationary input data, we put λ = 1.
2. For the expectation E[f_{m-1}^2(i) + b_{m-1}^2(i-1)] in the denominator of Eq. (6.25), we use the estimate

   (1/n) Σ_{i=1}^{n} λ^{n-i} [f_{m-1}^2(i) + b_{m-1}^2(i-1)]

Correspondingly, the estimate of the reflection coefficient for stage m in the lattice filter, and at time n, is given by

γ_m(n) = - k_{m-1}(n) / E_{m-1}(n),   m = 1, 2, ..., M     (6.37)

where

k_{m-1}(n) = 2 Σ_{i=1}^{n} λ^{n-i} f_{m-1}(i) b_{m-1}(i-1),   m = 1, 2, ..., M     (6.38)

and

E_{m-1}(n) = Σ_{i=1}^{n} λ^{n-i} [f_{m-1}^2(i) + b_{m-1}^2(i-1)],   m = 1, 2, ..., M     (6.39)

Note that the computation of both k_{m-1}(n) and E_{m-1}(n) depends on the forward and backward prediction errors produced at the output of stage m - 1 in the lattice filter.
When the filter input is stationary, the individual stages in the lattice filter are decoupled from each other, and the repeated use of Eqs. (6.37) to (6.39) results in a globally optimum design for the complete filter. When,
however, the filter input is nonstationary, the decoupling property of the
lattice filter is only approximately satisfied, and the resultant design of the
lattice filter is suboptimal. Nevertheless, this design is still of practical
utility.

Summary of the Block Estimation Procedure Based on the Burg Formula
1. Start with the initial condition:

   f_0(i) = b_0(i) = u(i),   i = 1, 2, ..., n     (6.40)

   where u(i) is the filter input at time i.
2. With m = 1, compute the reflection coefficient estimate γ_1(n) for stage 1 in the lattice filter.
3. Given γ_1(n), compute the forward prediction error f_1(i) and the backward prediction error b_1(i) at the output of stage 1 for all values of time i up to n.
4. Repeat the computation for m = 2, ..., M, to account for all stages in the lattice filter.
5. Repeat the whole computation for all values of time i up to n + 1, and so on.

A drawback of the block estimation approach is that it requires a large amount of computation and a large amount of storage. Also, for any stage in the filter an estimate of the reflection coefficient at time n + 1 does not depend in a simple way on the previous estimate at time n. These limitations may be overcome by using an adaptive estimation procedure, as described in the next section.
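A minimal Python sketch of this block procedure is given below. The function name burg_block and its calling conventions are hypothetical; for each stage the whole block of prediction errors is recomputed, exactly as in steps 1 to 5 above, using Eqs. (6.35)-(6.39).

import numpy as np

def burg_block(u, M, lam=1.0):
    """Return the reflection-coefficient estimates gamma_1(n), ..., gamma_M(n)."""
    n = len(u)
    f = np.asarray(u, dtype=float).copy()    # f_0(i) = u(i), Eq. (6.40)
    b = f.copy()                             # b_0(i) = u(i)
    w = lam ** np.arange(n - 1, -1, -1)      # weights lambda^(n-i) for i = 1..n
    gammas = []
    for m in range(1, M + 1):
        b_delayed = np.concatenate(([0.0], b[:-1]))   # b_{m-1}(i-1), zero for i <= 0
        k = 2.0 * np.sum(w * f * b_delayed)           # Eq. (6.38)
        E = np.sum(w * (f ** 2 + b_delayed ** 2))     # Eq. (6.39)
        gamma = -k / E                                # Eq. (6.37)
        gammas.append(gamma)
        # Eqs. (6.35) and (6.36); the tuple assignment uses the old f in both updates
        f, b = f + gamma * b_delayed, b_delayed + gamma * f
    return gammas

Because the analysis is repeated every time a new sample is appended to the block, the cost of this procedure grows with n, which is the computational drawback noted above.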

6.5 ADAPTIVE IMPLEMENTATION OF THE BURG METHOD

We may formulate the adaptive estimation procedure for a lattice filter as follows. Given (1) the old estimate of the reflection coefficient γ_m(n) of stage m in the lattice filter, m = 1, 2, ..., M, and (2) the forward and backward prediction errors for stage m and the previous stages up to and including time n, the requirement is to compute the updated estimate of the reflection coefficient γ_m(n+1), m = 1, 2, ..., M, and the updated values of the forward and backward prediction errors at time n + 1. In carrying out this adaptive estimation, the forward and backward prediction errors are computed only once for each time instant, thereby reducing the computational burden (in comparison with the block estimation procedure).
In this section we will describe two different (but equivalent) procedures for the derivation of an adaptive lattice algorithm based on the Burg
method. The first procedure uses a minor modification of the Burg formula.
The second procedure uses an approach similar to that used for the
development of the LMS algorithm used for the adaptive operation of
tapped-delay-line filters.

Derivation of the Adaptive Lattice Algorithm Using a Minor Modification of the Burg Formula
To compute the updated estimate of the reflection coefficient γ_m(n+1), we use the formula of Eq. (6.37), but with a minor modification. In particular, we substitute γ_m(n+1) for γ_m(n) in this formula, and thus write

γ_m(n+1) = - k_{m-1}(n) / E_{m-1}(n),   m = 1, 2, ..., M     (6.41)

We make this modification in order to make sure that the correction which is applied to the old estimate γ_m(n) depends only on past values of the forward and backward prediction errors that are available at time n. The exact nature of this correction will be determined presently. It is clear, however, that an estimation error is incurred in the use of Eq. (6.41) instead of (6.37).
We may compute the quantities k_{m-1}(n) and E_{m-1}(n), recursively, as follows:

k_{m-1}(n) = λ k_{m-1}(n-1) + 2 f_{m-1}(n) b_{m-1}(n-1)     (6.42)

E_{m-1}(n) = λ E_{m-1}(n-1) + f_{m-1}^2(n) + b_{m-1}^2(n-1)     (6.43)

Figure 6.3 Signal-flow graphs for the computation of (a) k_{m-1}(n), (b) E_{m-1}(n).
Equation (6.42) is obtained from (6.38) by singling out the term 2f_{m-1}(n) b_{m-1}(n-1), corresponding to i = n, and then expressing the remainder as k_{m-1}(n-1) multiplied by λ. Similarly, Eq. (6.43) is obtained from (6.39) by singling out the term f_{m-1}^2(n) + b_{m-1}^2(n-1), corresponding to i = n, and then expressing the remainder as E_{m-1}(n-1) multiplied by λ. The signal-flow graphs shown in Fig. 6.3 illustrate the recursive computations of k_{m-1}(n) and E_{m-1}(n).
We may express the time-update recursion for the reflection-coefficient estimate of stage m in the lattice filter as follows:

γ_m(n+1) = γ_m(n) + δ_m(n)     (6.44)

where δ_m(n) is a correction term to be determined. Using the definition of Eq. (6.41) for γ_m(n+1), and also rewriting this expression into the corresponding form for γ_m(n), we may express the correction term δ_m(n) as follows:

δ_m(n) = γ_m(n+1) - γ_m(n)
       = - k_{m-1}(n)/E_{m-1}(n) + k_{m-1}(n-1)/E_{m-1}(n-1)
       = - (1/E_{m-1}(n)) [k_{m-1}(n) - (E_{m-1}(n)/E_{m-1}(n-1)) k_{m-1}(n-1)]     (6.45)

Substituting the recursive relations of Eqs. (6.42) and (6.43) in (6.45), we get

δ_m(n) = - (1/E_{m-1}(n)) {λ k_{m-1}(n-1) + 2 f_{m-1}(n) b_{m-1}(n-1)
         - [k_{m-1}(n-1)/E_{m-1}(n-1)][λ E_{m-1}(n-1) + f_{m-1}^2(n) + b_{m-1}^2(n-1)]}
       = - (1/E_{m-1}(n)) {2 f_{m-1}(n) b_{m-1}(n-1)
         + γ_m(n)[f_{m-1}^2(n) + b_{m-1}^2(n-1)]}     (6.46)

where in the last line we have expressed - k_{m-1}(n-1)/E_{m-1}(n-1) as the reflection-coefficient estimate γ_m(n). Next we rewrite Eq. (6.46) as

follows:

δ_m(n) = - (1/E_{m-1}(n)) {f_{m-1}(n)[b_{m-1}(n-1) + γ_m(n) f_{m-1}(n)]
         + b_{m-1}(n-1)[f_{m-1}(n) + γ_m(n) b_{m-1}(n-1)]}
       = - (1/E_{m-1}(n)) [f_{m-1}(n) b_m(n) + b_{m-1}(n-1) f_m(n)]     (6.47)

where in the last line we have used the order-update recursions for stage m in the lattice filter. Thus, substituting Eq. (6.47) in (6.44), we get the desired time-update recursion for the reflection-coefficient estimate for stage m of the lattice filter:

γ_m(n+1) = γ_m(n) - (1/E_{m-1}(n)) [f_{m-1}(n) b_m(n) + b_{m-1}(n-1) f_m(n)]     (6.48)

where E_{m-1}(n) is itself computed recursively by means of Eq. (6.43). Note that whereas the computation of E_{m-1}(n) requires knowledge of the forward prediction error f_{m-1}(n) and the delayed backward prediction error b_{m-1}(n-1) at the input of stage m, the correction term in Eq. (6.48) requires, in addition to these two variables, knowledge of the forward prediction error f_m(n) and the backward prediction error b_m(n) at the output of this stage.
Figure 6.4 shows a signal-flow graph representation of the time-update recursion of Eq. (6.48). For the special case when the input of the lattice filter is stationary, the exponential weighting constant λ = 1. In this case we find that E_{m-1}(n) increases in a continuous manner, iteration after iteration. Correspondingly, the correction term in Eq. (6.47) goes to zero as n goes to infinity.
The main advantage of this adaptive estimation procedure over the block estimation procedure of Section 6.4 is the reduced computation and storage requirements. However, this improvement is attained at the cost of a noisy estimate for the reflection coefficient, because the adaptive estimation procedure is an approximation to the block estimation procedure.

Figure 6.4 Signal-flow graph representation of the adaptive estimation procedure.

Interpretation of the Adaptive Lattice Algorithm as a Stochastic Gradient Algorithm

The adaptive lattice algorithm of Eq. (6.48) may also be derived in another way. Let ε_m(n) denote the cost function defined as the sum of the mean-square values of the forward and backward prediction errors at the output of stage m of the lattice filter at time n, the reflection coefficient of which has the value γ_m(n). Then, following the procedure described in Section 6.2, we may express ε_m(n) as follows [in particular, see Eq. (6.24)]:

ε_m(n) = {E[f_{m-1}^2(n)] + E[b_{m-1}^2(n-1)]}[1 + γ_m^2(n)] + 4γ_m(n) E[f_{m-1}(n) b_{m-1}(n-1)]     (6.49)

where the forward prediction error f_{m-1}(n) and the delayed backward prediction error b_{m-1}(n-1) refer to the input of stage m of the filter. The dependence of the cost function ε_m(n) on time n arises because of the variation of the reflection coefficient γ_m(n) with time. By differentiating ε_m(n) with respect to γ_m(n), we get the following expression for the gradient:

∇_m(n) = ∂ε_m(n)/∂γ_m(n)
       = 2γ_m(n){E[f_{m-1}^2(n)] + E[b_{m-1}^2(n-1)]} + 4E[f_{m-1}(n) b_{m-1}(n-1)]     (6.50)

Using instantaneous values for the mean-square values in Eq. (6.50), we get an instantaneous estimate for the gradient ∇_m(n):

∇̂_m(n) = 2γ_m(n)[f_{m-1}^2(n) + b_{m-1}^2(n-1)] + 4f_{m-1}(n) b_{m-1}(n-1)     (6.51)

Clearly, this is an unbiased estimate in that its expected value equals the true value of the gradient ∇_m(n). Thus, by analogy with the LMS algorithm for the adaptation of the coefficients of a tapped-delay-line filter, we may write the following time-update recursion for the reflection coefficient of stage m of the lattice filter:

γ_m(n+1) = γ_m(n) - (1/2) μ_m(n) ∇̂_m(n)
         = γ_m(n) - μ_m(n){γ_m(n)[f_{m-1}^2(n) + b_{m-1}^2(n-1)] + 2f_{m-1}(n) b_{m-1}(n-1)}     (6.52)

where μ_m(n) is a time-varying step-size parameter. The correction term equals the difference between the updated estimate γ_m(n+1) and the old estimate γ_m(n). According to Eq. (6.52), this correction term is exactly the same as that of Eq. (6.46), provided that we choose the time-varying step-size parameter μ_m(n) equal to the reciprocal of E_{m-1}(n):

μ_m(n) = 1/E_{m-1}(n)     (6.53)

where E_{m-1}(n) is itself defined by the recursive relation of Eq. (6.43).
We conclude therefore that the adaptive estimation procedure, described by Eqs. (6.48) and (6.43), is equivalent to a stochastic gradient or normalized LMS algorithm in which the step-size parameter μ_m(n) is varied with time in accordance with Eqs. (6.53) and (6.43). This procedure is thus commonly referred to as the gradient adaptive lattice (GAL) algorithm.*
It is important to appreciate that in this algorithm a variance estimate is required at each stage of the lattice filter because both the forward and backward prediction-error variances ordinarily decrease with increasing number of stages in the filter. Thus, by varying the step-size parameter μ_m(n) inversely with the variance estimate E_{m-1}(n) for each stage m of the filter, we are able to maintain the same adaptive time constant and misadjustment at each stage of the lattice filter.
Finally, by following a procedure similar to that described above, we may also develop the stochastic gradient version of the forward-backward method for the adaptation of a multistage lattice filter.

Summary of the Gradient Adaptive Lattice Algorithm

Starting with the initial conditions

γ_m(0) = 0
E_{m-1}(0) = c

where m = 1, 2, ..., M, and c is an a priori estimate of the prediction-error variance, proceed as follows:

1. At each time instant n, set

   f_0(n) = b_0(n) = u(n)

2. For m = 1, 2, ..., M, compute

   γ_m(n+1) = γ_m(n) - [f_{m-1}(n) b_m(n) + b_{m-1}(n-1) f_m(n)] / E_{m-1}(n)
   f_m(n+1) = f_{m-1}(n+1) + γ_m(n+1) b_{m-1}(n)
   b_m(n+1) = b_{m-1}(n) + γ_m(n+1) f_{m-1}(n+1)
   E_{m-1}(n) = λ E_{m-1}(n-1) + f_{m-1}^2(n) + b_{m-1}^2(n-1)

*The GAL algorithm is sometimes formulated with the correction term in the time-update recursion of Eq. (6.48) multiplied by a scalar. This issue is discussed near the beginning of Section 6.8 below.
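A minimal Python sketch of the GAL recursions summarized above is given below; the class name and the default values of c and lam (the weighting constant λ) are illustrative choices rather than part of the text.

import numpy as np

class GradientAdaptiveLattice:
    """Gradient adaptive lattice (GAL) predictor of order M (sketch)."""
    def __init__(self, M, c=1.0, lam=0.99):
        self.M, self.lam = M, lam
        self.gamma = np.zeros(M)        # gamma_m(0) = 0
        self.E = np.full(M, c)          # E_{m-1}(0) = c
        self.b_old = np.zeros(M + 1)    # b_m(n-1)

    def step(self, u_n):
        f = np.zeros(self.M + 1)
        b = np.zeros(self.M + 1)
        f[0] = b[0] = u_n
        for m in range(1, self.M + 1):
            # order updates with the current reflection coefficient, Eqs. (6.21), (6.22)
            f[m] = f[m-1] + self.gamma[m-1] * self.b_old[m-1]
            b[m] = self.b_old[m-1] + self.gamma[m-1] * f[m-1]
            # time update of the normalizing variance estimate, Eq. (6.43)
            self.E[m-1] = self.lam * self.E[m-1] + f[m-1] ** 2 + self.b_old[m-1] ** 2
            # reflection-coefficient update, Eq. (6.48)
            self.gamma[m-1] -= (f[m-1] * b[m] + self.b_old[m-1] * f[m]) / self.E[m-1]
        self.b_old = b.copy()
        return f, b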
6.6 CONVERGENCE PROPERTIES

Determination of the convergence behavior of a multistage lattice filter is


rather complex because, at the initiation of the adaptation procedure, the
various stages are not decoupled from each other. In particular, each
successive stage must decouple from the preceding stages before it can begin
to converge to its optimum value.
The decoupling action proceeds qualitatively as follows. Each stage in
the lattice filter attempts to minimize the mean-square value of the error
output by adapting to the most prominent spectral component of the input
signal for that stage. Thus, when a new signal is presented to the filter, the
filter starts to adapt to the most prominent spectral component of that
signal. At the same time, because the first stage has not yet started to filter
out this component, the second (and higher-order) stages are also attempt-
ing to adapt to the same component of the signal. However, owing to the
improving action of the first stage, we find that this spectral component at
the output of the first stage is continually changing, with the result that
proper adaptation of the second stage in the lattice filter is impaired. This
process continues until the first stage has adapted sufficiently to that signal
component, so that it is no longer the most prominent spectral component
at the output of the first stage. As further adaptation by this stage has little
overall importance with respect to the signal as perceived by the second
stage, the first stage may be said to have effectively “adapted” at that point.
The second stage then “decouples” from the first stage (i.e., its filtering
action becomes independent of it), and begins adapting to the next largest
spectral component remaining in the signal after filtering by the first stage.
Higher-order stages in the lattice filter decouple in a similar fashion.
An adaptive lattice filter has the useful property that the rates of
convergence of its reflection coefficients to their optimum values are essen-
tially independent of the eigenvalue spread of the correlation matrix of the
filter input. Accordingly, in those applications where the input signal is ill
conditioned (in the sense that its correlation matrix has a large eigenvalue
spread), the use of an adaptive lattice filter is preferred to an adaptive
tapped-delay-line predictor based on the simple LMS algorithm.

6.7 JOINT PROCESS ESTIMATION

When a stationary input is applied to a multistage lattice filter, the sequence


of backward prediction errors produced at the various stages in the filter are
orthogonal to each other. Furthermore, there is a one-to-one correspon-
dence between the sequence of input samples u(n), u(n-1), ..., u(n-M) and the sequence of backward prediction errors b_0(n), b_1(n), ..., b_M(n), where M is the order (i.e., the number of stages) of the filter. Accordingly, we may utilize this sequence of backward prediction errors as inputs to a corresponding set of tap coefficients, w(0), w(1), ..., w(M), to produce the minimum-mean-square estimate of a desired response d(n).
The transformation between the sequence of input samples u(n), u(n-1), ..., u(n-M) and the sequence of backward prediction errors b_0(n), b_1(n), ..., b_M(n) may be expressed as follows [see Eq. (3.27)]:

b_m(n) = Σ_{k=0}^{m} a_m(m-k) u(n-k),   m = 0, 1, ..., M     (6.54)

where a_m(0), a_m(1), ..., a_m(m) are the coefficients of a prediction-error filter of order m. These coefficients are in turn uniquely defined by the set of reflection coefficients γ_1, γ_2, ..., γ_m of the corresponding lattice filter. We may rewrite Eq. (6.54) in matrix form as follows (see Section 3.10):
b(n) = L u(n)     (6.55)

where b(n) is the (M+1)-by-1 vector of backward prediction errors:

b(n) = [b_0(n), b_1(n), ..., b_M(n)]^T     (6.56)

the (M+1)-by-1 vector u(n) is the input vector:

u(n) = [u(n), u(n-1), ..., u(n-M)]^T     (6.57)

and the (M+1)-by-(M+1) transformation matrix L is a lower triangular matrix defined by

L = [ 1          0          0          ...   0
      a_1(1)     1          0          ...   0
      a_2(2)     a_2(1)     1          ...   0
      .          .          .                .
      a_M(M)     a_M(M-1)   a_M(M-2)   ...   1 ]     (6.58)

Note that (1) all the elements on the main diagonal of the matrix L equal one, (2) all the elements above this diagonal are zero, and (3) the matrix L is nonsingular.
Define the (M+1)-by-1 vector of tap coefficients:

w = [w(0), w(1), ..., w(M)]^T     (6.59)

Then with the backward prediction-error vector b(n) used as the input to the tap-coefficient vector w, we may express the estimate of the desired response as

d̂(n) = w^T b(n)     (6.60)

The error signal equals the difference between the desired response d(n) and the estimate d̂(n), that is,

e(n) = d(n) - d̂(n)     (6.61)

Let S denote the (M+1)-by-(M+1) correlation matrix of the backward prediction-error vector b(n):

S = E[b(n) b^T(n)]     (6.62)

Let q denote the (M+1)-by-1 cross-correlation vector between the desired response d(n) and the backward prediction-error vector b(n):

q = E[d(n) b(n)]     (6.63)

Then we may use the Wiener filter theory to define the optimum value of the tap-coefficient vector by the matrix equation

S w_o = q     (6.64)

It is informative to relate the optimum tap-coefficient vector w_o to the optimum Wiener solution h_o that is obtained by using a conventional tapped-delay-line filter. From Eq. (2.35) we have

R h_o = p     (6.65)

where R is the (M+1)-by-(M+1) correlation matrix of the input vector u(n):

R = E[u(n) u^T(n)]     (6.66)

and p is the (M+1)-by-1 cross-correlation vector between the desired response d(n) and the input vector u(n):

p = E[d(n) u(n)]     (6.67)
Substituting Eq. (6.55) in (6.62) and using the definition of Eq. (6.66), we get (see Section 3.10)

S = L R L^T     (6.68)

where L^T is an upper triangular matrix that is the transpose of L. Similarly, substituting Eq. (6.55) in (6.63) and using Eq. (6.67), we get

q = L p     (6.69)

Thus, substituting Eqs. (6.68) and (6.69) in (6.64) and comparing the results with Eq. (6.65), we get the desired relationship between the two optimum vectors w_o and h_o, namely,

L^T w_o = h_o     (6.70)

Equivalently, we may write

w_o = (L^{-1})^T h_o     (6.71)

where L^{-1} is the inverse of the transformation matrix L.
Since the backward prediction errors are mutually orthogonal for a stationary input, the correlation matrix S is a diagonal matrix, as shown by

S = diag(P_0, P_1, ..., P_M)     (6.72)

where

P_m = E[b_m^2(n)],   m = 0, 1, ..., M

Hence, the inverse of the matrix S is also a diagonal matrix:

S^{-1} = diag(1/P_0, 1/P_1, ..., 1/P_M)     (6.73)

From Eq. (6.64) the optimum tap-coefficient vector equals

w_o = S^{-1} q     (6.74)

Accordingly, from Eqs. (6.63), (6.73), and (6.74) we find that the optimum value of the mth tap coefficient equals

w_o(m) = E[d(n) b_m(n)] / E[b_m^2(n)],   m = 0, 1, ..., M     (6.75)

We may express this formula in a different but equivalent form by using the fact that the backward prediction errors are mutually orthogonal. Define

e_{m-1}(n) = d(n),                                   m = 0
e_{m-1}(n) = d(n) - Σ_{k=0}^{m-1} w_o(k) b_k(n),     m = 1, 2, ..., M     (6.76)

The cross-correlation between the desired response d(n) and the backward
prediction error b_m(n) may now be expressed as

E[d(n) b_m(n)] = E[e_{m-1}(n) b_m(n)] + Σ_{k=0}^{m-1} w_o(k) E[b_k(n) b_m(n)]     (6.77)

Since the backward prediction errors are mutually orthogonal, that is,

E[b_k(n) b_m(n)] = 0,   k ≠ m     (6.78)

we may simplify Eq. (6.77) as

E[d(n) b_m(n)] = E[e_{m-1}(n) b_m(n)],   m = 0, 1, ..., M     (6.79)

Therefore, substituting Eq. (6.79) in (6.75), we may rewrite the formula for the optimum value of the mth tap coefficient as follows:

w_o(m) = E[e_{m-1}(n) b_m(n)] / E[b_m^2(n)],   m = 0, 1, ..., M     (6.80)

where e_{-1}(n) = d(n).
We next address the issue of making the lattice-based joint-process
estimator adaptive. In particular, we wish to develop an algorithm for
adjusting the tap coefficients of the estimator so that they approach their
optimum values defined by Eq. (6.64), starting from some predetermined
initial conditions. Basically, there are two methods for performing this
adaptation. In the first method, based on the formula of Eq. (6.75), the tap
coefficients are adjusted to maintain orthogonality between each of the tap
inputs (i.e., the backward prediction errors) and the error signal e(n),
defined as the difference between the desired response and the output of the
(M + 1)-tap adaptive filter. In the second method, based on the formula of
Eq. (6.80), the tap coefficients are adjusted to maintain orthogonality
between b_m(n) and the error signal e_m(n), defined as the difference between the desired response and the output of the m-tap adaptive filter with the last M + 1 - m taps set equal to zero, where m = 1, 2, ..., M. We refer to these two methods as adaptive lattice joint-process estimators A and B, respectively.

Adaptive Lattice Joint-Process Estimator A


The structure of this adaptive filter is shown in Fig. 6.5. It is similar to the
conventional form of adaptive tapped-delay-line filter in that it uses a single
error signal, e(n), to adjust the set of tap coefficients. However, it differs
from the conventional adaptive tapped-delay-line filter in an important
respect. In a conventional tapped-delay-line filter the tap inputs consist
of successive (correlated) samples of the input signal, that is, u(n),
u(n-1), ..., u(n-M). On the other hand, in the adaptive filter of Fig. 6.5 the tap inputs consist of (orthogonalized) backward prediction errors, namely, b_0(n), b_1(n), ..., b_M(n), that are generated by the multistage lattice predictor.

Figure 6.5 The structure of the adaptive lattice joint-process estimator, version A.
In Fig. 6.5 the error signal is defined by

e(n) = d(n) - Σ_{m=0}^{M} w(m,n) b_m(n)     (6.81)

where w(0,n), w(1,n), ..., w(M,n) are the tap coefficients at time n.
The formula of Eq. (6.75) for the optimum value of the mth tap coefficient represents the solution of the equation

∂E[e^2(n)]/∂w(m,n) = 0,   m = 0, 1, ..., M

We may therefore express the instantaneous estimate of the gradient (with respect to the mth tap coefficient) as

∇̂_m(n) = ∂e^2(n)/∂w(m,n) = 2e(n) ∂e(n)/∂w(m,n)     (6.82)

From Eq. (6.81) we have

∂e(n)/∂w(m,n) = - b_m(n)     (6.83)

Hence, substituting Eq. (6.83) in (6.82), we get

∇̂_m(n) = - 2e(n) b_m(n)     (6.84)

Accordingly, by analogy with the normalized version of the LMS algorithm used to update the coefficients of a tapped-delay-line filter, we may express the algorithm for updating the mth tap coefficient in the adaptive filter of Fig. 6.5 as follows:

w(m,n+1) = w(m,n) - (1/2) β_m(n) ∇̂_m(n)
         = w(m,n) + β_m(n) e(n) b_m(n),   m = 0, 1, ..., M     (6.85)

where e(n) is the error signal and b_m(n) is the backward prediction error applied to the input of the mth tap coefficient. The time-varying step-size parameter β_m(n) is defined by

β_m(n) = 1 / E_m^{(b)}(n)     (6.86)

where

E_m^{(b)}(n) = Σ_{i=1}^{n} λ^{n-i} b_m^2(i)     (6.87)

and λ is a positive real constant that is less than or equal to one. Here again
a time-varying step size parameter is used in the adaptation process in order
to keep the overall rate of convergence of the adaptive filter in Fig. 6.5
insensitive to disparity in the eigenvalues of the correlation matrix of the
lattice predictor input.

Adaptive Lattice Joint-Process Estimator B


Figure 6.6 shows the structure of this second lattice-based adaptive filter. It differs from the structure of Fig. 6.5 in that it uses for the adaptation of the tap coefficients a corresponding set of error signals, e_0(n), e_1(n), ..., e_M(n), with each error signal responsible for the adaptation of its respective tap coefficient. Thus the structure of Fig. 6.6 operates in the same spirit as the lattice predictor in that the adaptation is performed on a stage-by-stage basis. The error signal e_m(n) is defined by

e_m(n) = d(n) - w(0,n) b_0(n),          m = 0
e_m(n) = e_{m-1}(n) - w(m,n) b_m(n),    m = 1, 2, ..., M     (6.88)

where d(n) is the desired response, and w(0,n), w(1,n), ..., w(M,n) are the adjustable tap coefficients at time n.
The formula of Eq. (6.80) for the optimum value of the ‘mth tap
coefficient represents the solution of the equation:

a
@ ee
; = 0, SS Oem, rasa,
rarer [ez(n)]} ue

where e,,(7) is itself defined by Eq. (6.88) for m = 0,1,2,..., M. We may


therefore define the instantaneous estimate of the gradient (with respect to
the mth tap coefficient) as follows

; de? (n)
Vimn(n) = dw(m,n)

de, (n)
= 2d ewan:
cea COSTS Sa m = 0 Mec don AY! (6.89)
6.89

From Eq. (6.88), we have

eA) S —b(n), m=0,1,...,M (6.90)


dw(m,n)
Hence, substituting Eq. (6.90) in (6.89), we get
V("2) = —2e,(2)b,(n), m=0,1,...,M (6.91)
Thus, by analogy with the normalized LMS algorithm, we may express
the algorithm for updating the mth tap coefficient of the adaptive filter in
Fig. 6.6 as

    w(m, n+1) = w(m, n) - \tfrac{1}{2} \beta_m(n) \hat{\nabla}_m(n)
              = w(m, n) + \beta_m(n) e_m(n) b_m(n),      m = 0, 1, ..., M               (6.92)

where the time-varying step-size parameter \beta_m(n) is defined by Eqs. (6.86)
and (6.87). As with version A of the adaptive lattice joint-process estimator
of Fig. 6.5, a time-varying step-size parameter is used to update each tap
coefficient of the structure in Fig. 6.6 so as to make the overall rate of
convergence insensitive to eigenvalue spread.

Figure 6.6 The structure of the adaptive lattice joint-process estimator, version B.
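The stage-by-stage operation of version B can be sketched in the same way; again the backward prediction errors are assumed to come from a separate lattice predictor, the energy recursion is the recursive form of Eq. (6.87), and the names are illustrative only.

```python
import numpy as np

def update_taps_version_b(w, b, d, E, lam):
    """One time step of the stage-by-stage update of Eqs. (6.88)-(6.92).

    w, b, E : arrays of length M+1 (tap coefficients, backward prediction
              errors, and running energies E_m^(b)(n-1), respectively)
    d       : desired response d(n)
    lam     : exponential weighting constant, 0 < lam <= 1
    """
    M = len(w) - 1
    e = np.empty(M + 1)
    prev = d
    for m in range(M + 1):
        # Error of stage m, Eq. (6.88): stage 0 uses d(n), later stages use e_{m-1}(n)
        e[m] = prev - w[m] * b[m]
        # Step-size normalization, Eqs. (6.86)-(6.87)
        E[m] = lam * E[m] + b[m] ** 2
        beta = 1.0 / max(E[m], 1e-12)
        # Update of the mth tap coefficient, Eq. (6.92)
        w[m] = w[m] + beta * e[m] * b[m]
        prev = e[m]
    return w, E, e
```

Note that each stage sees its own error e_m(n), so the last element of the returned error array plays the role of the overall estimation error.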

Discussion

The main difference between the two update algorithms of Eqs. (6.85) and
(6.92) is that the former algorithm (pertaining to the structure of Fig. 6.5)
only provides a single error signal e(n), whereas the second algorithm
(pertaining to the structure of Fig. 6.6) provides a set of individual error
signals, {e_m(n)}, m = 0, 1, ..., M, one for each tap.
Ignoring the effects of algorithm self-noise, the two adaptive lattice
joint-process estimators of Figs. 6.5 and 6.6 should provide identical results.
However, the results of computer simulation experiments indicate that the
algorithm self-noise produced by the structure of Fig. 6.5 may be consider-
ably greater than that of the structure in Fig. 6.6.
The adaptive lattice joint-process estimator of Fig. 6.6 has another
advantage over that of Fig. 6.5 in that it provides a mechanism for
determining the optimum number of stages in a time-varying environment.
Specifically, the mean-square value of the error signal e_m(n) must be a
minimum in m (i.e., the number of stages involved in its computation),
because the time constant of the adaptive lattice is proportional to the
number of stages, and later stages have longer time constants which cannot
track the input.
For these reasons we find that, in practice, the second adaptive lattice
joint-process estimator of Fig. 6.6 is preferred to that of Fig. 6.5.

6.8 NOTES

Theory
Makhoul [1, 2] presents an integrated treatment of the forward method, the
backward method, the forward—backward method, the geometric-mean
method, the minimum method, and the Burg method for the design of a
lattice filter. The geometric-mean method was originated by Itakura and

Saito [3]. The Burg method (harmonic-mean method) was originated by


Burg [4].
It appears that the development of adaptive lattice filter algorithms was
originated first by Srinath and Viswanathan [5], and independently by
Griffiths [6]. Srinath and Viswanathan developed a recursive algorithm [a
variation of that in Eq. (6.48)] for the identification of parameters of an
autoregressive process. The gradient adaptive lattice algorithms proposed by
Griffiths in [6] were unnormalized in that their formulation involved the use
of fixed step-size parameters as in the conventional LMS algorithm. This
was corrected later by Griffiths in [7]. Using our notation, the gradient
adaptive lattice (GAL) algorithm proposed by Griffiths is as follows:

    \gamma_m(n+1) = \gamma_m(n) - \frac{a_1}{\sigma_{m-1}^2(n)}
                    \left[ f_m(n) b_{m-1}(n-1) + b_m(n) f_{m-1}(n) \right]

where the prediction-error variance \sigma_{m-1}^2(n) is defined by

    \sigma_{m-1}^2(n) = a_2 \sigma_{m-1}^2(n-1) + (1 - a_2)\left[ f_{m-1}^2(n) + b_{m-1}^2(n-1) \right]

and a_1 and a_2 are positive constants. Griffiths's algorithm is identical to that


of Eq. (6.48) upon making the following identifications:
    Griffiths gradient adaptive        Gradient adaptive lattice
    lattice algorithm                  algorithm of Eq. (6.48)

    \sigma_{m-1}^2(n)                  E_{m-1}(n)
    a_2                                \lambda
The derivation of the adaptive lattice algorithm of Eq. (6.48) presented in
Section 6.5 follows the approach described by Makhoul and Viswanathan
[8]. This paper also includes a description of the block estimation approach
presented in Section 6.5. A variation of the adaptive lattice algorithm of Eq.
(6.48) is also reported by Durrani and Murukutla [9].
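For illustration, the following rough sketch adapts a single lattice stage in the spirit of Griffiths's normalized gradient algorithm described above; the sign convention of the lattice recursions and the exact placement of the constants a_1 and a_2 are assumptions of the sketch, not a transcription of Eq. (6.48).

```python
def gal_stage_update(gamma, sigma2, f_prev, b_prev_delayed, a1, a2):
    """One time step of a single lattice stage adapted with a normalized
    stochastic-gradient (GAL-style) rule.

    gamma          : current reflection coefficient of the stage
    sigma2         : smoothed prediction-error variance of the previous stage
    f_prev         : forward prediction error f_{m-1}(n) entering the stage
    b_prev_delayed : delayed backward prediction error b_{m-1}(n-1)
    a1, a2         : positive adaptation and smoothing constants (assumed roles)
    """
    # Lattice recursions for the stage outputs (one common sign convention)
    f_out = f_prev + gamma * b_prev_delayed
    b_out = b_prev_delayed + gamma * f_prev
    # Smoothed input variance used for normalization
    sigma2 = a2 * sigma2 + (1.0 - a2) * (f_prev**2 + b_prev_delayed**2)
    # Normalized instantaneous-gradient update of the reflection coefficient
    gamma = gamma - (a1 / max(sigma2, 1e-12)) * (f_out * b_prev_delayed + b_out * f_prev)
    return gamma, sigma2, f_out, b_out
```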
Makhoul and Cosell [10] suggest the minimization of the following cost
function (for stage m of the lattice filter):

    \epsilon = E\left[ (1 - a) f_m^2(i) + a b_m^2(i) \right],      0 \le a \le 1

as the basis of an adaptive lattice filter design. The constant a determines
the mix between the forward and backward prediction errors. The optimum
value of the reflection coefficient for this stage, for which \epsilon is minimum, is
defined by

    \gamma_{0,m}(a) = - \frac{E[ f_{m-1}(i) b_{m-1}(i-1) ]}
                             {E[ a f_{m-1}^2(i) + (1 - a) b_{m-1}^2(i-1) ]}

Three special cases of this result are of interest:


1. When a = 0, we get \gamma_{0,m}(0) = \gamma_{0,m}^{(f)}, which is the optimum reflection
   coefficient for the forward method.
2. When a = 1, we get \gamma_{0,m}(1) = \gamma_{0,m}^{(b)}, which is the optimum reflection
   coefficient for the backward method.
3. When a = 1/2, we get \gamma_{0,m}(1/2) = \gamma_{0,m}, which is the optimum reflection
   coefficient for the Burg method.
The adaptive estimate of the reflection coefficient of stage m in the lattice
filter is further generalized by using a window, w(n), as shown by

    \gamma_m(n+1, a) = - \frac{\sum_{i=1}^{n} w(n-i) f_{m-1}(i) b_{m-1}(i-1)}
                              {\sum_{i=1}^{n} w(n-i)\left[ a f_{m-1}^2(i) + (1 - a) b_{m-1}^2(i-1) \right]}        (6.93)
The window w(n) must satisfy two conditions:
1. The window must be causal:
w(n) = 0, n<0
Otherwise, the estimate of Eq. (6.93) cannot be evaluated, because future
values of the forward and (delayed) backward prediction errors are not avail-
able.
2. The window must be positive definite:
w(n) > 0, n>=0
For a = 1/2, this condition is sufficient for the magnitude of the reflection
coefficient in Eq. (6.93) to be less than or equal to one for all signals.
The window w(n) may be viewed as the impulse response of a causal filter.
Makhoul and Cosell [10] suggest real-pole filters of the form

    W(z) = \frac{1}{(1 - \lambda z^{-1})^N},      0 < \lambda < 1                       (6.94)
where W(z) is the z-transform of w(n). The transfer function of Eq. (6.94)
represents a multiple-pole filter that is uniquely characterized by two
parameters: N, the order of the filter, and A, the pole location. For example,
when N = 1, we have

    W(z) = \frac{1}{1 - \lambda z^{-1}}

the inverse z-transform of which equals

    w(n) = \lambda^n
This window is recognized as the exponential window.
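With the exponential window, the numerator and denominator of Eq. (6.93) can be accumulated recursively, as in the following sketch; the overall sign follows the convention assumed above, and the function name is illustrative.

```python
def reflection_estimate_exponential_window(f_hist, b_hist, a, lam):
    """Evaluate the windowed estimate of Eq. (6.93) for one lattice stage,
    using the exponential window w(n) = lam**n, accumulated recursively.

    f_hist : sequence f_{m-1}(1), ..., f_{m-1}(n)
    b_hist : sequence b_{m-1}(0), ..., b_{m-1}(n-1)  (already delayed by one sample)
    a      : mixing constant between forward and backward errors, 0 <= a <= 1
    lam    : pole location of the window, 0 < lam < 1
    """
    num = 0.0
    den = 1e-12                      # small positive start to avoid division by zero
    for f, b in zip(f_hist, b_hist):
        num = lam * num + f * b
        den = lam * den + a * f * f + (1.0 - a) * b * b
    return -num / den
```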

Both the forward—backward method discussed in Section 6.1 and the


Burg method discussed in Sections 6.3—6.5 assume that the correlation
matrix of the filter input is Toeplitz. This assumption is perfectly justified
when the filter input is stationary. When, however, the filter input is
nonstationary, the results obtained by the use of these two methods are
somewhat inexact. For nonstationary inputs exact solutions are obtained by
using the method of least squares to design the multistage lattice filter. The
development of the exact least squares lattice (LSL) algorithm can be
traced back to the work of Morf [11] on efficient solutions to the linear
prediction problem (formulated in terms of least squares) for correlation
matrices that are non-Toeplitz but have a special structure. The class of
matrices that are non-Toeplitz and yet “close enough” to being Toeplitz in
some sense is discussed in detail by Friedlander et al. [12]. Details of the
LSL algorithm itself are given by Morf et al. [13,14], Morf and Lee [15], and
Lee et al. [16]. In this algorithm, the sum of squared forward prediction
errors and the sum of squared backward prediction errors are minimized
simultaneously at the output of each stage in the lattice filter and for all
instants of time—hence the name of the algorithm. No approximations are
made in the derivation of this algorithm. The algorithm consists of a set of
order-update and a set of time-update recursions for the forward and
backward reflection coefficients and prediction errors. However, the
mathematical derivation of these recursions is somewhat difficult. A reada-
ble account of the derivation is given by Pack and Satorius [17], Shichor
[18], and Mueller [19].
A summary of the LSL algorithm is given below (see Fig. 6.1):

1. Start with the initial conditions:

       k_m(0) = 0,      m = 0, 1, ..., M - 1

       E_m^{(f)}(0) = E_m^{(b)}(0) = E_m^{(b)}(-1) = c,      m = 0, 1, ..., M

   where c is an a priori estimate of the prediction-error variance.

2. For each instant of time, n = 1, 2, ..., compute the various zeroth-order
   (m = 0) variables:

       f_0(n) = b_0(n) = u(n)

       E_0^{(f)}(n) = E_0^{(b)}(n) = \lambda E_0^{(b)}(n-1) + u^2(n)

       a_0(n - 1) = 1

   where a_0(n - 1) is the zeroth-order value of a new parameter (defined
   below).

3. Compute the various order updates in the following sequence (m =
   1, 2, ..., M):

       k_{m-1}(n) = \lambda k_{m-1}(n-1) + \frac{f_{m-1}(n) b_{m-1}(n-1)}{a_{m-1}(n-1)}         (6.95)

       f_m(n) = f_{m-1}(n) - \frac{k_{m-1}(n)}{E_{m-1}^{(b)}(n-1)} b_{m-1}(n-1)

       b_m(n) = b_{m-1}(n-1) - \frac{k_{m-1}(n)}{E_{m-1}^{(f)}(n)} f_{m-1}(n)

       E_m^{(f)}(n) = E_{m-1}^{(f)}(n) - \frac{k_{m-1}^2(n)}{E_{m-1}^{(b)}(n-1)}

       E_m^{(b)}(n) = E_{m-1}^{(b)}(n-1) - \frac{k_{m-1}^2(n)}{E_{m-1}^{(f)}(n)}

       a_m(n) = a_{m-1}(n) - \frac{b_m^2(n)}{E_m^{(b)}(n)}
The remarkable feature of the LSL algorithm summarized above is that the
forward and backward prediction errors, f_m(n) and b_m(n), obey a set of
order updates that are identical in structure to the corresponding order
updates derived for the forward—backward method (based on minimization
of the mean squared error). Basically, in the forward—backward method the
order updates arise because of the assumed Toeplitz structure of the
(ensemble-averaged) correlation matrix of the filter input. On the other
hand, no such assumption is made in the derivation of the LSL algorithm,
since, in general, the (deterministic) correlation matrix of the filter input is
non-Toeplitz (see Section 5.1).
The basic difference between the LSL algorithm and the forward—back-
ward lattice algorithm (discussed in Section 6.1) is that the former algorithm
includes a new parameter a_m(n) that only enters into the lattice recursions

through the time update for the parameter k_m(n). Indeed, if we were to set
a_{m-1}(n - 1) = 1 for all m in Eq. (6.95), the recursive formula for k_{m-1}(n)
in the LSL algorithm reduces to the same form as that in Eq. (6.13) in the
forward-backward lattice algorithm. The factor 1/a_{m-1}(n - 1) appears as
a gain factor, determining the rate of convergence of k_{m-1}(n). An im-
portant property of a_m(n) is that it is bounded by zero and one:

    0 \le a_m(n) \le 1      for all m and n

Therefore, when a_{m-1}(n - 1) approaches its minimum value of zero, the
gain factor 1/a_{m-1}(n - 1) becomes very large, thereby amplifying the
correction term in the time update of Eq. (6.95) for k_{m-1}(n).
Friedlander [20] presents a tutorial review of the different algorithms
that have been developed for the design of adaptive lattice filters, and their
numerous applications. The paper also includes an extensive list of refer-
ences on the subject.
The adaptive lattice joint-process estimator of Fig. 6.5 was first pro-
posed by Makhoul [21]. The adaptive lattice joint-process estimator of Fig.
6.6 was first proposed by Griffiths [7].
A mathematical analysis of the convergence behavior of an adaptive
multistage lattice filter is made difficult by the highly nonlinear nature of the
adaptive process, which is the result of interaction between successive stages
of the filter. Nevertheless, some results have been reported on the conver-
gence properties of an adaptive lattice filter. Griffiths [6, 7] first conjectured
the insensitivity of the rate of convergence of an adaptive lattice filter to
eigenvalue spread. This conjecture was later confirmed by computer simula-
tion results reported by Durrani and Murukutla [22], Satorius and Alexander
[23], Satorius and Pack [24], Friedlander [20], and others.
Gibson and Haykin [25] present some results that illustrate the nature
of the decoupling action between the stages of an adaptive lattice filter for:
(1) a noiseless input consisting of a single sine wave, and (2) radar clutter data.
For the case of stationary inputs, Honig and Messerschmitt [26] and Honig
[27] present a detailed mathematical analysis of the convergence properties
of adaptive lattice predictors and adaptive lattice joint-process estimators
using least-mean-square (LMS) and least-squares (LS) algorithms. Com-
puter simulation results are presented, which appear to confirm the validity
of the mathematical models proposed for single-stage and multistage lattice
structures.
A detailed study of the learning characteristics of adaptive lattice
predictors and joint-process estimators operating in a nonstationary en-
vironment has not received much attention in the literature, because it is
mathematically difficult. Indeed, this is a subject for future research. Some
preliminary results (based on computer simulation) have been reported by
Gibson and Haykin [28].

Comparisons of Algorithms
Gibson and Haykin [29] present a comparison of the performance of four
lattice-filter algorithms: (1) the forward—backward method, (2) the Burg
method, (3) the minimum method, and (4) the geometric-mean method,
using computer-simulated radar data. The data consisted of signals repre-
sentative of radar returns due to targets and weather disturbances. The
radar return due to the latter is commonly referred to as clutter, as it tends
to “clutter” up the radar display and thereby obscure the detection of a
moving target (e.g., aircraft). The generation of the weather-clutter data was
based on a generalized form of the autocorrelation function of radar clutter,
starting from a collection of randomly distributed scatterers. The target
signal was simulated by the product of a complex sine wave representing the
Doppler component related to target radial velocity, and a Gaussian en-
velope approximating the horizontal beam pattern of the radar antenna. For
a performance measure, the improvement factor was used, which is defined
as: “the signal-to-clutter ratio at the output of the system, divided by the
signal-to-clutter ratio at the input of the system, averaged uniformly over all
target radial velocities of interest.” Note that the ratio used in this calcula-
tion is a ratio of average powers. With the simulated data, it is a straightfor-
ward matter to average over all possible target radial velocities. Figure 6.7
shows the results of this computer simulation experiment, with the improve-
ment factor plotted versus the exponential weighting constant \lambda for the case
of a lattice filter consisting of five stages. A notable feature of Fig. 6.7 is that
the curves are not smooth, but show considerable local variation as the
weighting constant \lambda changes. The geometric-mean algorithm seems to be
particularly sensitive to this influence. On the other hand, the Burg algo-
rithm shows little of this influence. Also, on the whole, the Burg algorithm
gives the highest improvement.
Satorius and Alexander [30] have used computer simulation to compare
the performances of the two adaptive lattice joint-process estimators of Figs.
6.5 and 6.6 that were used for adaptive equalization of highly dispersive
communication channels. The results of this simulation showed that the
algorithm self-noise produced by the structure in Fig. 6.5 is considerably
greater than that of the second structure in Fig. 6.6.
In [23], Satorius and Alexander present computer simulation results on
a comparative evaluation of two adaptive equalizers for two channels
representing pure, heavy amplitude distortion. One equalizer was a tapped-
delay-line filter adapted with the LMS algorithm. The other equalizer was
the lattice joint-process estimator of Fig. 6.6 adapted by means of the
gradient-adaptive-lattice (GAL) algorithm. In all simulations, 11-tap
equalizers were used, and the data sequence applied to the equalizer input
was a random sequence with polar signaling [a_n = \pm 1 in Eq. (1.3)]. The
sampled impulse response g(k) of the channel used in all the simulations
Figure 6.7 Comparison of four different methods of designing a lattice filter (improvement
factor versus exponential weighting constant \lambda): FB, forward-backward method; B, Burg
(harmonic-mean) method; M, minimum method; G, geometric-mean method.

was of a raised-cosine form, defined by

    g(k) = \tfrac{1}{2}\left[ 1 + \cos\left( \frac{2\pi(k - 2)}{W} \right) \right],      k = 1, 2, 3

    g(k) = 0,      otherwise
where W was set equal to 3.1 or 3.3, corresponding to an eigenvalue spread
(i.e., ratio of maximum eigenvalue to minimum eigenvalue) of 11 or 21,
respectively. A Gaussian white-noise sequence of zero mean and variance
0.001 was added to the channel output to simulate the effect of receiver
noise. The results of this experiment showed that: (1) the adaptive lattice
joint-process estimator of Fig. 6.6 using the GAL algorithm has a faster rate
of convergence than the corresponding tapped-delay-line filter adapted with
the LMS algorithm, and (2) unlike this adaptive tapped-delay-line filter, the
adaptive lattice structure of Fig. 6.6 has a rate of convergence that is
practically insensitive to the eigenvalue disparity of the channel correlation
matrix.
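As a numerical aside, the quoted eigenvalue spreads can be checked roughly with a short script; the construction of the 11-by-11 correlation matrix from the channel response and the noise variance is an assumption of this sketch.

```python
import numpy as np

def channel_eigenvalue_spread(W, noise_var=0.001, num_taps=11):
    """Rough check of the eigenvalue spread seen by an 11-tap equalizer fed by
    the raised-cosine channel g(k) defined above, driven by unit-variance
    polar data plus additive white noise."""
    g = np.array([0.5 * (1.0 + np.cos(2.0 * np.pi * (k - 2) / W)) for k in (1, 2, 3)])
    r = np.zeros(num_taps)
    for lag in range(min(len(g), num_taps)):
        r[lag] = np.sum(g[:len(g) - lag] * g[lag:])    # autocorrelation of the channel
    r[0] += noise_var                                   # receiver-noise contribution at lag 0
    R = np.array([[r[abs(i - j)] for j in range(num_taps)] for i in range(num_taps)])
    eig = np.linalg.eigvalsh(R)
    return eig.max() / eig.min()

print(channel_eigenvalue_spread(3.1))   # roughly 11
print(channel_eigenvalue_spread(3.3))   # roughly 21
```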
In [24], Satorius and Pack present computer simulation results on a
comparative evaluation of the adaptive lattice joint-process estimator of Fig.
6.6 using the following two algorithms: (1) the least-squares lattice (LSL)
algorithm, and (2) the gradient-adaptive lattice (GAL) algorithm described
by Eqs. (6.48) and (6.43). The structure was again used as an adaptive
equalizer for two channels representing pure, heavy amplitude distortion, as
in [23].

Figure 6.8 A comparison of the LMS, GAL, and LSL algorithms for case 1 (average squared
output error versus number of iterations). Reproduced from Friedlander [20] by permission of
the IEEE.

This experiment showed that:


1. For both adaptive lattice algorithms, the rate of convergence was practi-
cally insensitive to the eigenvalue spread of the channel correlation
matrix.
2. The LSL algorithm converged in approximately 40 iterations for both
channels, whereas the GAL algorithm required approximately 120 itera-
tions.

Friedlander [20] has also used computer simulations to make a comparison


of three adaptive predictors using (1) the LMS algorithm, (2) the GAL
algorithm, and (3) the LSL algorithm. Stationary data were generated by
passing a white-noise sequence through a second-order all-pole filter with
the following parameters:*

    case 1:   A(z) = 1 - 1.6 z^{-1} + 0.95 z^{-2}

    case 2:   A(z) = 1 - 1.9 z^{-1} + 0.95 z^{-2}
where 1/A(z) denotes the transfer function of the filter. The output data
were normalized to have unit variance. For case 1 the eigenvalue spread (the
ratio of the largest to the smallest eigenvalue) equals 10, and for case 2 it
equals 77. Figures 6.8 and 6.9 show the learning curves (i.e., the variation of
the average squared error with time) for the two cases. Each curve was
obtained by averaging 200 independent trials. The curves in each figure
correspond to the LMS algorithm (with two different values for the step-size
parameter), the GAL algorithm, and LSL algorithm. The two lattice algo-
rithms were run with the exponential weighting constant \lambda = 0.99. The
LMS algorithm was run with the two step sizes \mu = 0.005 and \mu = 0.05.

*There is an error in Friedlander's paper in the coefficient of z^{-1} in case 2; this coefficient
should have the value 1.9 (as indicated above) for an eigenvalue ratio of 77.

Figure 6.9 A comparison of the LMS, GAL, and LSL algorithms for case 2 (average squared
output error versus number of iterations). Reproduced from Friedlander [20] by permission of
the IEEE.
Examination of these two figures leads to the following observations:

1. The two adaptive lattice algorithms converge considerably faster than the
LMS algorithm. The LSL algorithm is the fastest, reaching steady-state
conditions in about 10 iterations.
2. The rate of convergence of the LMS algorithm is highly sensitive to
variations in the eigenvalue spread, whereas the adaptive lattice algo-
rithms are practically insensitive to it.
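The quoted eigenvalue spreads of about 10 and 77 for the two cases can be checked numerically from the definitions of A(z) given above; the following sketch estimates the two-by-two correlation matrix of the predictor input from a long simulated record, and the record length and seed are arbitrary choices of the sketch.

```python
import numpy as np

def ar2_eigenvalue_spread(a1, a2, n=200000, seed=0):
    """Estimate the eigenvalue spread of the 2-by-2 correlation matrix of the
    output of the all-pole filter 1/A(z), with A(z) = 1 + a1*z^-1 + a2*z^-2,
    driven by unit-variance white noise."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(2, n):
        x[i] = w[i] - a1 * x[i - 1] - a2 * x[i - 2]   # autoregressive (all-pole) filtering
    x = x[1000:]                    # discard the start-up transient
    x = x / x.std()                 # normalize to unit variance
    r0 = np.mean(x * x)
    r1 = np.mean(x[:-1] * x[1:])
    R = np.array([[r0, r1], [r1, r0]])
    eig = np.linalg.eigvalsh(R)
    return eig.max() / eig.min()

print(ar2_eigenvalue_spread(-1.6, 0.95))   # case 1: roughly 10
print(ar2_eigenvalue_spread(-1.9, 0.95))   # case 2: roughly 77
```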

Medaugh and Griffiths [31,32] present a comparison of two adaptive


lattice predictors that were driven by an autoregressive input. One predictor
used the LSL algorithm, and the other used the forward—backward lattice
algorithm with order updates for the forward and backward prediction-error
variances. Except for a time shift of one iteration in some quantities used in
the order-update recursions, the second algorithm used by Medaugh and
Griffiths is identical to version 2 of the forward—backward lattice algorithm
described in Section 6.1. The results reported by Medaugh and Griffiths
show close conformity of estimated mean reflection coefficient and squared
prediction error for the exact least-squares and the forward—backward
lattice algorithms. Similar findings are reported by Honig [27] and Honig
and Messerschmitt [33].

Implementations
Lawrence and Tewksbury [34] discuss the issue of using multiprocessors (i.e.,
arrays of interconnected processors) for implementing adaptive lattice filter
algorithms. In this form of high-density digital hardware considerable
emphasis is placed on memory speed and size. With the multiprocessors
structured in a pipelined configuration, the total processing latency must be
kept to one sample period. The latency of a digital signal processing device
is defined as the time between the arrival of the first binary digit (bit) of the
input signal at the input port of the device and the time when the last bit of
the answer appears at the output port of the device. In the case of a lattice
structure, in particular, the b_m(n) samples used to compute the f_m(n) have
to be taken before the sample-period delay, allowing the computation of the
f_m(n) path one sample period earlier. An efficient approach to do this is to
use redundant storage, as illustrated in Fig. 6.10, where the number of
sample period delays per stage is doubled.
Fellman and Brodersen [35] describe the integration of an adaptive
lattice filter in MOS large-scale-integration (LSI) technology. The architec-
ture used in the implementation is designed to optimally exploit the
advantages of analog and digital approaches. In particular, switched-capaci-
tor circuitry is used to perform the filtering operation, and digital circuitry is
used to perform the adaptation.
Satorius et al. [36] present some preliminary results on the implementa-
tion of adaptive prediction-error filters with fixed-point arithmetic. Both
tapped-delay-line and lattice structures were considered. Their performance
is compared in terms of the number of bits required to reach a prescribed


Figure 6.10 Illustrating the use of redundant storage for modifying the lattice structure.

steady-state error power. The results of fixed-point simulations (assuming


2’s-complement arithmetic) show that:

(1) The steady-state error power for each filter increases dramatically below
a minimum wordlength. This occurs when the differences between
successive time updates of the filter coefficients are less than the quantiz-
ing level of the filter, at which point the filter coefficients stop adapting.
This is in agreement with an earlier finding reported by Gitlin et al. (see
reference 34 of Chapter 4).
(2) Different filter structures, which behave identically when implemented
with infinite precision, can perform quite differently when finite preci-
sion is used.

Applications
Satorius and Alexander [23], Satorius and Pack [24], and Mueller [19]
discuss the application of the adaptive lattice joint-process estimator of Fig.
6.6 to adaptive equalization for data transmission over telephone channels.
Griffiths [7] describes an adaptive filter structure for multichannel
noise-cancelling applications, which is a generalization of the adaptive
lattice joint-process estimator of Fig. 6.6. Reddy et al. [37] present a study
of the lattice form of implementing an adaptive line enhancer, using the
exact least-squares lattice algorithm.
Carter [38] presents a preliminary investigation of adaptive lattice filter
algorithms (based on the forward—backward method and the Burg method)
applied to both real and artificially generated data. Makhoul and Cosell [10]
have investigated the adaptive lattice analysis of speech, using the (gener-
alized) adaptive estimate of the reflection coefficient given in Eq. (6.93).
Makhoul and Cosell deal exclusively with real speech signals, and use the
subjective judgment of the human listener as the criterion of goodness. The
results of this investigation showed that, for applications in speech analysis
and synthesis, the convergence of the adaptive lattice predictor was fast and
efficient enough for its performance to be indistinguishable from that of the
optimum (but more expensive) adaptive autocorrelation method developed
by Barnwell [39,40]. According to this method an infinite real-pole time
window is used to compute the autocorrelation function of the speech signal
for lags equal to 0,1,..., M, recursively at each instant n, and then this set
of values is used to solve the normal equations (for forward linear predic-
tion) for the M tapped-delay-line predictor coefficients.
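The general idea behind such a recursive autocorrelation method can be sketched as follows, using a plain exponential window in place of Barnwell's window and the Levinson-Durbin recursion to solve the normal equations; the window choice and the function names are assumptions of the sketch.

```python
import numpy as np

def update_autocorrelation(r, u_buf, u_new, lam):
    """Recursively update exponentially windowed autocorrelation estimates
    r[l], l = 0, ..., M, when a new sample u_new arrives.
    u_buf holds at least the M most recent past samples, newest first."""
    for l in range(len(r)):
        past = u_new if l == 0 else u_buf[l - 1]
        r[l] = lam * r[l] + u_new * past
    u_buf = np.roll(u_buf, 1)
    u_buf[0] = u_new
    return r, u_buf

def levinson_durbin(r):
    """Solve the normal equations for the forward predictor coefficients,
    given the autocorrelation values r[0], ..., r[M]."""
    M = len(r) - 1
    a = np.zeros(M + 1)
    a[0] = 1.0
    E = r[0]
    for m in range(1, M + 1):
        k = -np.dot(a[:m], r[m:0:-1]) / E          # reflection coefficient of stage m
        a[1:m+1] = a[1:m+1] + k * a[m-1::-1]       # order update of the predictor
        E *= (1.0 - k * k)                         # prediction-error variance update
    return a, E
```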
Morf and Lee [41] discuss the use of the exact least squares lattice
(LSL) algorithm as a tool for modelling speech signals. They exploit a novel
feature of the algorithm [namely, the variation of the parameter a_m(n) with
changes in the statistics of the input signal] as a sensitive pitch detector.* It

*Morf and Lee [41] define the gain factor as 1/[1 - \gamma_m(n)], where \gamma_m(n) = 1 - a_{m+1}(n).

is found that a_m(n) takes on low values (close to zero) for non-Gaussian
components in the input signal. This causes the gain factor 1/a_{m-1}(n - 1)
appearing in the time-update recursion of Eq. (6.95) to assume a large value
for non-Gaussian components in the input, which in turn causes the lattice
parameters k_{m-1}(n), E_m^{(f)}(n), and E_m^{(b)}(n) to change quickly. Accordingly,
the gain factor 1/a_{m-1}(n - 1) may be used to track fast changes in the
statistics of the input signal.
Reddy et al. [42] and Soong and Peterson [43] discuss the use of lattice
algorithms for adaptive echo cancellation.
Porat and Kailath [44] present two normalized lattice algorithms for
least-squares identification of finite-impulse-response models. The one algo-
rithm, known as the growing-memory algorithm, is recursive in both time
and order. On the other hand, Marple’s algorithm, which basically solves
the same problem, is not recursive in time [45]. The second algorithm,
known as the sliding-memory algorithm, is suited for identifying time-vary-
ing models, for which neither Marple’s algorithm nor the growing-memory
algorithm is useful.
Gibson and Haykin [46, 47] present the results of an experimental study
(using real radar data) into the use of an adaptive lattice filter for the
improved detection of a moving target in the presence of clutter. It is
assumed that the target and clutter have different radial velocities. Use is
made of the normal condition that the clutter returns have constant statisti-
cal characteristics over a large area (and thus a large number of samples in
the time series). On the other hand, target returns normally cover a very
small area limited by the beamwidth of the radar antennas (typically, about
20 samples).
Metford and Haykin [48] describe a model-dependent detection algo-
rithm, and present experimental results (based on actual data obtained from
a coherent radar in an air traffic control environment). The results show an
improvement of 3 to 5 dB over the classical design of moving-target
detectors for radar surveillance, resulting from maximization of target to
noise plus clutter (power) ratio. Two different implementations of the
algorithm are considered, one using the LSL algorithm and the other using
the Kalman filtering algorithm that assumes a random walk state model.
The basic theory of this detection algorithm is described in [49].

REFERENCES

1. J. Makhoul, “New Lattice Methods for Linear Prediction,” Proceedings of the IEEE
International Conference on Acoustics, Speech, and Signal Processing 76 (Philadelphia,
April 1976).
2. J. Makhoul, “Stable and Efficient Lattice Methods for Linear Prediction,” IEEE Trans.
Acoustics, Speech, and Signal Processing, vol. ASSP-25, pp. 423-428, October 1977.

3. F. Itakura and S. Saito, “Digital Filtering Techniques for Speech Analysis and Synthesis,”
   paper 25-C-1, Proceedings of the 7th International Cong. Acoustics (Budapest, 1971), pp.
   261-264.
4. J. P. Burg, “Maximum Entropy Spectral Analysis,” Ph.D. Dissertation, Stanford Univer-
   sity, Stanford, California, 1975.
5. M. D. Srinath and M. M. Viswanathan, “Sequential Algorithm for Identification of
   Parameters of an Autoregressive Process,” IEEE Trans. Automatic Control, vol. AC-20,
   pp. 542-546, August 1975.
6. L. J. Griffiths, “A Continuously-Adaptive Filter Implemented as a Lattice Structure,”
   Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
   Processing 77 (Hartford, Connecticut, May 1977), pp. 683-686.
7. L. J. Griffiths, “An Adaptive Lattice Structure for Noise-Cancelling Applications,” Pro-
   ceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
   (Tulsa, Oklahoma, 1978), pp. 87-90.
8. J. Makhoul and R. Viswanathan, “Adaptive Lattice Methods for Linear Prediction,”
   Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
   Processing (Tulsa, Oklahoma, 1978), pp. 83-86.
9. T. S. Durrani and N. L. M. Murukutla, “Recursive Algorithm for Adaptive Lattices,”
   Electronics Letters, vol. 15, pp. 831-833, December 1979.
10. J. Makhoul and L. K. Cosell, “Adaptive Lattice Analysis of Speech,” IEEE Trans. Circuits
and Systems, vol. CAS-28, pp. 494-499, June 1981.
11. M. Morf, “Fast Algorithms for Multivariable Systems,” Ph.D. Dissertation, Stanford
    University, Stanford, California, 1974.
12. B. Friedlander, M. Morf, T. Kailath, and L. Ljung, “New Inversion Formulas for Matrices
    Classified in Terms of Their Distance from Toeplitz Matrices,” Linear Algebra and Its
    Applications, vol. 27, pp. 31-60, 1979.
13. M. Morf, A. Vieira, and D. T. Lee, “Ladder Forms for Identification and Speech
    Processing,” Proc. 1977 IEEE Conference on Decision and Control (New Orleans, Decem-
    ber 1977), pp. 1074-1078.
14. M. Morf, D. T. Lee, and A. Vieira, “Ladder Forms for Estimation and Detection,”
    Abstracts of Papers, IEEE Int. Symp. Information Theory (Ithaca, New York, October
    1977), pp. 111-112.
15. M. Morf and D. T. Lee, “Recursive Least Squares Ladder Forms for Fast Parameter
    Tracking,” Proc. 1978 IEEE Conference on Decision and Control (San Diego, California,
    January 1979), pp. 1362-1367.
16. D. T. L. Lee, M. Morf, and B. Friedlander, “Recursive Least Squares Ladder Estimation
    Algorithms,” IEEE Trans. Circuits and Systems, vol. CAS-28, pp. 467-481, June 1981.
17. J. K. Pack and E. H. Satorius, “Least Squares Adaptive Lattice Algorithms,” Technical
    Report 423, Naval Ocean Systems Center, San Diego, California, April 1979.
18. E. Shichor, “Fast Recursive Estimation Using the Lattice Structure,” Bell System Tech. J.,
vol. 61, pp. 97-115, January 1982.
19. M. S. Mueller, “Least-Squares Algorithms for Adaptive Equalizers,” Bell System Tech. J.,
    vol. 60, pp. 1905-1925, October 1981.
20. B. Friedlander, “Lattice Filters for Adaptive Processing,” Proc. IEEE, vol. 70, pp.
    829-867, August 1982.
21. J. Makhoul, “A Class of All-Zero Lattice Digital Filters: Properties and Applications,”
    IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-26, pp. 304-314, August
    1978.
22. T. S. Durrani and N. L. M. Murukutla, “Convergence of Adaptive Lattice Filters,”
    Electronics Letters, vol. 15, pp. 633-635, December 1979.
23. E. H. Satorius and S. T. Alexander, “Channel Equalization Using Adaptive Lattice
    Algorithms,” IEEE Trans. Communications, vol. COM-27, pp. 899-905, June 1979.
24. E. H. Satorius and J. D. Pack, “Application of Least Squares Lattice Algorithms to
    Adaptive Equalization,” IEEE Trans. Communications, vol. COM-29, pp. 136-142,
    February 1981.
25. C. Gibson and S. Haykin, “Learning Characteristics of Adaptive Lattice Filtering Algo-
    rithms,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-28, pp. 681-691,
    1980.
26. M. L. Honig and D. G. Messerschmitt, “Convergence Properties of an Adaptive Digital
    Lattice Filter,” IEEE Trans. Circuits and Systems, vol. CAS-28, pp. 482-493, June 1981.
27. M. L. Honig, “Convergence Models for Joint Process Estimators and Least Squares
    Algorithms,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-31, pp.
    415-425, April 1983.
28. C. Gibson and S. Haykin, “Nonstationary Learning Characteristics of Adaptive Lattice
    Filters,” Proceedings IEEE International Conference on Acoustics, Speech, and Signal
    Processing (Paris, May 1982), pp. 671-674.
29. C. Gibson and S. Haykin, “A Comparison of Algorithms for the Calculation of Adaptive
    Lattice Filters,” Proceedings IEEE International Conference on Acoustics, Speech, and
    Signal Processing 80 (Denver, Colorado, April 1980), pp. 978-983.
30. E. H. Satorius and S. T. Alexander, “Rapid Equalization of Highly Dispersive Channels
    Using Adaptive Lattice Algorithms,” Technical Report 249, Naval Ocean Systems Center,
    San Diego, California, April 1978.
31. R. S. Medaugh and L. J. Griffiths, “A Comparison of Two Fast Linear Predictors,”
    Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing
    (Atlanta, Georgia, April 1981).
32. R. S. Medaugh and L. J. Griffiths, “Further Results of a Least Squares and Gradient
    Adaptive Lattice Algorithm Comparison,” Proceedings IEEE International Conference on
    Acoustics, Speech and Signal Processing (Paris, May 1982), pp. 1412-1415.
33. M. L. Honig and D. G. Messerschmitt, “Comparison of LS and LMS Lattice Predictor
    Algorithms Using Two Performance Criteria,” submitted for publication in IEEE Trans.
    Acoustics, Speech, and Signal Processing.
34. V. B. Lawrence and S. K. Tewksbury, “Multiprocessor Implementation of Adaptive
    Digital Filters,” IEEE Trans. Communications, vol. COM-31, pp. 826-835, June 1983.
35. R. D. Fellman and R. E. Brodersen, “A Switched-Capacitor Adaptive Lattice Filter,”
    IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-31, pp. 294-304, February
    1983.
36. E. H. Satorius, S. W. Larisch, S. C. Lee, and L. J. Griffiths, “Fixed-Point Implementation
of Adaptive Digital Filters,” Proceedings IEEE International Conference on Acoustics,
Speech and Signal Processing 83 (Boston, April 1983), pp. 33-36.
37. V. U. Reddy, B. Egardt, and T. Kailath, “Optimized Lattice-Form Adaptive Line En-
hancer for a Sinusoidal Signal in Broad-Band Noise,” IEEE Trans. Circuits and Systems,
vol. CAS-28, pp. 542-550, June 1981.
38. T. E. Carter, “Study of an Adaptive Lattice Structure for Linear Prediction Analysis of
Speech,” Proceedings IEEE International Conference on Acoustics, Speech, and Signal
Processing (Tulsa, Oklahoma, April 1978), pp. 27-30.
39. T. P. Barnwell, “Recursive Autocorrelation Computation for LPC Analysis,” Proceedings
IEEE International Conference on Acoustics, Speech, and Signal Processing 77 (Hartford,
Connecticut, May 1977), pp. 1-3.
40. T. P. Barnwell, “Recursive Windows for Generating Autocorrelation Coefficients for LPC
Analysis,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp.
1062-1066, October 1981.
41. M. Morf and D. T. L. Lee, “Fast Algorithms for Speech Modeling,” Technical Report No.
M303-1, Stanford University, Stanford, December 15, 1978.
42. V. U. Reddy, T. J. Shan, and T. Kailath, “Application of Modified Least-Squares
    Algorithms to Adaptive Echo Cancellation,” Proceedings IEEE International Conference
    on Acoustics, Speech and Signal Processing 82 (Boston, April 1983), pp. 53-56.
43. F. K. Soong and A. M. Peterson, “Fast Least-Squares (LS) in the Voice Echo Cancellation
Application,” Proceedings IEEE International Conference on Acoustics, Speech, and
Signal Processing 82 (Paris, May 1982), pp. 1398-1403.
44. B. Porat and T. Kailath, “Normalized Lattice Algorithms for Least-Squares FIR System
Identification,” IEEE Trans. Acoustics. Speech, and Signal Processing, vol. ASSP-31, pp.
122-128, February 1983.
45. S. L. Marple, Jr., “Efficient Least Squares FIR System Identification,” IEEE Trans.
Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp. 62-73, February 1981.
46. C. Gibson and S. Haykin, “Performance Studies of Adaptive Lattice Prediction-Error
Filters for Target Detection in a Radar Environment Using Real Data,” Proceedings IEEE
International Conference on Acoustics, Speech and Signal Processing (Atlanta, Georgia,
1981), pp. 1054-1057.
47. C. Gibson and S. Haykin, “Radar Performance Studies of Adaptive Lattice Clutter-Sup-
pression Filters,” Proc. IEE. (London), vol. 130, Part F, August 1983.
48. P. A. S. Metford and S. Haykin, “Experimental Analysis of an Innovations-Based
Detection Algorithm for Radar Surveillance,” to be published in Proc. IEE (London).
49. P. A. S. Metford, S. Haykin, and D. P. Taylor, “An Innovations Approach to Discrete-Time
Detection Theory,” IEEE Trans. Information Theory, vol. IT-28, pp. 376-380, 1982.
APPENDIX
ONE
EIGENVALUES AND EIGENVECTORS

The eigenvalues and eigenvectors of a square matrix are of fundamental


importance in much of matrix theory [1, 2, 3, 4]. In adaptive-filter theory the
eigenvalue problem arises in matters relating to the correlation matrix of a
stationary process. In this appendix we discuss the definitions of eigenvalues
and eigenvectors and their properties from first principles.

Al.1 DEFINITIONS OF EIGENVALUES AND EIGENVECTORS

Let R be an M-by-M matrix with real elements. We wish to find an M-by-1


vector q that satisfies the condition

    Rq = \lambda q                                                      (A1.1)

for some constant \lambda. This condition states that the vector q is transformed
to the vector \lambda q by the transformation R. Since \lambda is a constant, the vector q
therefore has special significance in that it is left invariant in direction (in
the M-dimensional space) by a linear transformation. For a typical M-by-M
matrix R there will be M such vectors. To show this, we first rewrite Eq.
(A1.1) in the form

    (R - \lambda I)q = 0                                                (A1.2)

where I is the identity matrix. Equation (A1.2) has a nonzero solution in the
vector q if and only if the determinant of the matrix R - \lambda I equals zero:

    det(R - \lambda I) = 0                                              (A1.3)

This determinant, when expanded, is clearly a polynomial in \lambda of degree M.
We thus find that, in general, Eq. (A1.3) has M distinct roots, real or
complex (for it to have multiple roots is a restriction on the square
matrix R). Correspondingly, Eq. (A1.1) has M solutions in the vector q.
    Equation (A1.3) is called the characteristic equation of the square matrix
R. Let \lambda_1, \lambda_2, ..., \lambda_M denote the M distinct roots of this equation, and let
the M-by-1 vectors q_1, q_2, ..., q_M denote the M corresponding solutions of Eq.
(A1.1). The values \lambda_i are called the eigenvalues of the square matrix R, and
the vectors q_i are called the associated eigenvectors.

Al.2 PROPERTIES OF EIGENVALUES AND EIGENVECTORS

Property 1. Let q_1, q_2, ..., q_M be the eigenvectors corresponding to the
distinct eigenvalues \lambda_1, \lambda_2, ..., \lambda_M of an M-by-M matrix R, respectively.
Then the eigenvectors q_1, q_2, ..., q_M are linearly independent.
    We say that the eigenvectors q_1, q_2, ..., q_M are linearly dependent if
there are scalars v_1, v_2, ..., v_M, not all zero, such that

    \sum_{i=1}^{M} v_i q_i = 0                                          (A1.4)

If no such scalars exist, we say that the eigenvectors are linearly independent.
    We will prove the validity of Property 1 by contradiction. Suppose that
Eq. (A1.4) holds for certain scalars v_i. Repeated multiplication of Eq. (A1.4)
by the matrix R, and the use of Eq. (A1.1), yield the following set of M
equations:

    \sum_{i=1}^{M} v_i \lambda_i^{k-1} q_i = 0,      k = 1, 2, ..., M   (A1.5)

This set of equations may be written in the form of a single matrix equation
as follows:

    [v_1 q_1, v_2 q_2, ..., v_M q_M] S = 0                              (A1.6)


where

    S = \begin{bmatrix}
            1 & \lambda_1 & \cdots & \lambda_1^{M-1} \\
            1 & \lambda_2 & \cdots & \lambda_2^{M-1} \\
            \vdots & \vdots &        & \vdots \\
            1 & \lambda_M & \cdots & \lambda_M^{M-1}
        \end{bmatrix}                                                   (A1.7)

The matrix S is called a Vandermonde matrix. When the \lambda_i are distinct, the
Vandermonde matrix S is nonsingular. Therefore, we may postmultiply Eq.
(A1.6) by the inverse matrix S^{-1}, obtaining

    [v_1 q_1, v_2 q_2, ..., v_M q_M] = 0

Hence, each column v_i q_i = 0. Since the eigenvectors q_i are not zero, this
condition can only be satisfied if the v_i are all zero. This shows that the
eigenvectors q_1, q_2, ..., q_M cannot be linearly dependent if the correspond-
ing eigenvalues \lambda_1, \lambda_2, ..., \lambda_M are distinct. In other words, they are linearly
independent.
We may put this property to an important use by having the linearly
independent eigenvectors q_1, q_2, ..., q_M serve as a basis for the representa-
tion of an arbitrary vector h with the same dimension as the eigenvectors
themselves. In particular, we may express the arbitrary vector h as a linear
combination of the eigenvectors q_1, q_2, ..., q_M as follows:

    h = \sum_{i=1}^{M} v_i q_i                                          (A1.8)

where v_1, v_2, ..., v_M are constants. Suppose now we apply a linear transfor-
mation to the vector h by premultiplying it by the matrix R, obtaining

    Rh = \sum_{i=1}^{M} v_i R q_i

By definition, we have Rq_i = \lambda_i q_i. Therefore, we may express the result of
this linear transformation in the equivalent form

    Rh = \sum_{i=1}^{M} v_i \lambda_i q_i
Thus, in the eigenvector representation, we see that when a linear transfor-
mation is applied to an arbitrary vector, the eigenvectors remain indepen-
dent of each other, and the effect of the transformation is simply to multiply
each eigenvector by its respective eigenvalue.

Property 2. Let q_1, q_2, ..., q_M be the eigenvectors corresponding to the
distinct eigenvalues \lambda_1, \lambda_2, ..., \lambda_M of an M-by-M symmetric matrix R, re-
spectively. Then the eigenvectors q_1, q_2, ..., q_M are orthogonal to each other.
    Let q_i and q_j denote any two eigenvectors of the M-by-M symmetric
matrix R. We say that these two eigenvectors are orthogonal to each other if
their inner product is zero, that is,

    q_i^T q_j = 0,      i \ne j                                         (A1.9)

Using Eq. (A1.1), we may express the conditions on the eigenvectors q_i
and q_j as follows:

    R q_i = \lambda_i q_i                                               (A1.10)

and

    R q_j = \lambda_j q_j                                               (A1.11)
Premultiplying both sides of Eq. (A1.10) by the transposed vector q_j^T, we get

    q_j^T R q_i = \lambda_i q_j^T q_i                                   (A1.12)

The matrix R is symmetric, by hypothesis. That is, R^T = R. Hence, taking
the transpose of both sides of Eq. (A1.11), we get

    q_j^T R = \lambda_j q_j^T                                           (A1.13)

Postmultiplying both sides of Eq. (A1.13) by the vector q_i, we get

    q_j^T R q_i = \lambda_j q_j^T q_i                                   (A1.14)

Subtracting Eq. (A1.14) from (A1.12), we thus get

    (\lambda_i - \lambda_j) q_j^T q_i = 0                               (A1.15)

Since the eigenvalues of the matrix R are distinct, by hypothesis, we have
\lambda_i \ne \lambda_j. Accordingly, the condition of Eq. (A1.15) holds if and only if

    q_j^T q_i = 0,      i \ne j                                         (A1.16)

which is the desired result.
Note that both Property 1 and Property 2 apply only when the
eigenvalues of matrix R are distinct. Property 1 on the linear independence
of the associated eigenvectors applies to any square matrix R. On the other
hand, Property 2 on the orthogonality of the associated eigenvectors applies
only when the matrix R is symmetric. Note also that the orthogonality of
the eigenvectors for a symmetric matrix implies their linear independence.

Property 3. Let q_1, q_2, ..., q_M be the eigenvectors corresponding to the
distinct eigenvalues \lambda_1, \lambda_2, ..., \lambda_M of an M-by-M symmetric matrix R, re-
spectively. Define the M-by-M matrix

    Q = [q_1, q_2, ..., q_M]

where

    q_i^T q_i = 1,      i = 1, 2, ..., M

Define the M-by-M diagonal matrix

    \Lambda = diag(\lambda_1, \lambda_2, ..., \lambda_M)

Then the original matrix R may be diagonalized as follows:

    Q^T R Q = \Lambda
    The condition that q_i^T q_i = 1, i = 1, 2, ..., M, requires that each eigen-
vector be normalized to have a length of one. The squared length or squared
norm of a vector q_i is defined as q_i^T q_i, the inner product of q_i with itself. The
orthogonality condition that the inner product q_i^T q_j = 0, i \ne j, follows from
Property 2 when the matrix R is symmetric with distinct eigenvalues. When
both of these conditions are satisfied, that is,

    q_i^T q_j = \begin{cases} 1, & j = i \\ 0, & j \ne i \end{cases}    (A1.17)

we say the eigenvectors q_1, q_2, ..., q_M form an orthonormal set. By defini-
tion, the eigenvectors q_1, q_2, ..., q_M satisfy the equations [see Eq. (A1.1)]

    R q_i = \lambda_i q_i,      i = 1, 2, ..., M                        (A1.18)

The M-by-M matrix Q has as its columns the orthonormal set of eigenvec-
tors q_1, q_2, ..., q_M:

    Q = [q_1, q_2, ..., q_M]                                            (A1.19)

The M-by-M diagonal matrix \Lambda has the eigenvalues \lambda_1, \lambda_2, ..., \lambda_M for the
elements of its main diagonal:

    \Lambda = diag(\lambda_1, \lambda_2, ..., \lambda_M)                (A1.20)

Accordingly, we may rewrite the set of M equations (A1.18) as a single
matrix equation as follows:

    R Q = Q \Lambda                                                     (A1.21)

The matrix Q has the following property:

    Q^T Q = \begin{bmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_M^T \end{bmatrix} [q_1, q_2, ..., q_M]
          = \begin{bmatrix}
                q_1^T q_1 & q_1^T q_2 & \cdots & q_1^T q_M \\
                q_2^T q_1 & q_2^T q_2 & \cdots & q_2^T q_M \\
                \vdots    & \vdots    &        & \vdots \\
                q_M^T q_1 & q_M^T q_2 & \cdots & q_M^T q_M
            \end{bmatrix}                                               (A1.22)

Substituting the conditions of Eq. (A1.17) in (A1.22), we find that the
matrix product Q^T Q equals the identity matrix:

    Q^T Q = I                                                           (A1.23)

Equivalently, we may write

    Q^{-1} = Q^T                                                        (A1.24)

That is, the matrix Q is nonsingular with inverse Q^{-1} equal to the transpose
of Q. A matrix that has this property is called a unitary matrix.

Thus, premultiplying both sides of Eq. (A1.21) by the transposed matrix
Q^T and using the property of Eq. (A1.23), we get the desired result

    Q^T R Q = \Lambda                                                   (A1.25)

This transformation is called the unitary similarity transformation.
    We have thus proved an important result: a symmetric matrix R (with
distinct eigenvalues) may be diagonalized by a unitary similarity transfor-
mation. Furthermore, the matrix Q that is used to diagonalize R has as its
columns an orthonormal set of eigenvectors for R. The resulting diagonal
matrix \Lambda has as its diagonal elements the eigenvalues of R.

Property 4. Let \lambda_1, \lambda_2, ..., \lambda_M be the eigenvalues of an M-by-M matrix
R. Then the sum of these eigenvalues equals the trace of matrix R.
    The trace of a square matrix is defined as the sum of the diagonal
elements of the matrix. Taking the trace of both sides of Eq. (A1.25), we
may write

    tr[Q^T R Q] = tr[\Lambda]                                           (A1.26)

The diagonal matrix \Lambda has as its diagonal elements the eigenvalues of R.
Hence, we have

    tr[\Lambda] = \sum_{i=1}^{M} \lambda_i                              (A1.27)

To simplify the trace of Q^T R Q, we use the following rule in matrix algebra.
Let A be an M-by-N matrix and B be an N-by-M matrix. Then the trace of
the matrix product AB equals the trace of BA. Thus, identifying Q^T with A
and RQ with B, we may write

    tr[Q^T R Q] = tr[R Q Q^T]

However, Q Q^T equals the identity matrix I [this follows from premultiplying
both sides of Eq. (A1.24) by Q]. Hence, we have

    tr[Q^T R Q] = tr[R]                                                 (A1.28)

Accordingly, substituting Eqs. (A1.27) and (A1.28) in (A1.26), we get

    tr[R] = \sum_{i=1}^{M} \lambda_i                                    (A1.29)

We have thus shown that the trace of a matrix R equals the sum of its
eigenvalues. In proving this result we used a property that requires the
matrix R to be symmetric with distinct eigenvalues; nevertheless, the result
applies to any square matrix.

Property 5. Let \lambda_1, \lambda_2, ..., \lambda_M be the eigenvalues of a positive definite
M-by-M matrix R. Then all these eigenvalues are real and positive.
    To prove this property, we first use Eq. (A1.1) to express the condition
on the ith eigenvalue \lambda_i as

    R q_i = \lambda_i q_i,      i = 1, 2, ..., M                        (A1.30)

Premultiplying both sides of this equation by q_i^T, the transpose of eigenvec-
tor q_i, we get

    q_i^T R q_i = \lambda_i q_i^T q_i,      i = 1, 2, ..., M            (A1.31)

The inner product q_i^T q_i is a positive scalar, representing the squared length
of the eigenvector q_i, that is, q_i^T q_i > 0. We may therefore divide both sides
of Eq. (A1.31) by q_i^T q_i and so express the ith eigenvalue \lambda_i as the ratio

    \lambda_i = \frac{q_i^T R q_i}{q_i^T q_i},      i = 1, 2, ..., M    (A1.32)

When the matrix R is positive definite, the quadratic form q_i^T R q_i in the
numerator of this ratio is positive, that is, q_i^T R q_i > 0. Therefore, it follows
from Eq. (A1.32) that \lambda_i > 0 for all i. That is, all the eigenvalues of a
positive definite matrix are real and positive.
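The properties proved in this appendix are easy to verify numerically; the following short script checks Properties 2 through 5 for an arbitrary symmetric, positive definite (correlation-like) matrix, which is itself an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
R = A @ A.T + 4 * np.eye(4)                    # symmetric and positive definite

lam, Q = np.linalg.eigh(R)                     # eigenvalues and orthonormal eigenvectors

print(np.allclose(Q.T @ Q, np.eye(4)))         # orthonormal eigenvectors (Property 2)
print(np.allclose(Q.T @ R @ Q, np.diag(lam)))  # diagonalization (Property 3)
print(np.isclose(lam.sum(), np.trace(R)))      # trace equals sum of eigenvalues (Property 4)
print(bool(np.all(lam > 0)))                   # positive eigenvalues (Property 5)
```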

REFERENCES

1. S. Perlis, “Theory of Matrices” (Addison-Wesley, 1952).


2. G. Hadley, “Linear Algebra” (Addison-Wesley, 1961).
3. F. A. Graybill, “Introduction to Matrices with Applications in Statistics” (Wadsworth,
1969).
4. G. W. Stewart, “Introduction to Matrix Computations” (Academic Press, 1973).
APPENDIX

TWO
CONVOLUTION

Consider the two sequences {h(n)} and {u(n)}, n=0,1,2,.... Both


sequences are assumed to be zero for negative values of time n. The
convolution of these two sequences is defined by the sum

    y(n) = \sum_{k=0}^{n} h(k) u(n-k),      n = 0, 1, 2, ...            (A2.1)

For example, the sequence {h(n)} may represent the unit-sample or impulse
response of a linear filter, and the sequence {u(n)} may represent the
excitation applied to the filter input. The sequence {y(n)} thus represents
the response produced at the filter output.
    In this appendix we show that the z-transform of {y(n)} is equal to the
product of the z-transforms of {h(n)} and {u(n)}.
    The one-sided z-transform of the sequence {h(n)} is defined by [1, 2]

    H(z) = \sum_{n=0}^{\infty} h(n) z^{-n}                              (A2.2)

where z^{-1} is the unit-delay operator. Similarly, we may define the one-sided
z-transforms of the sequences {u(n)} and {y(n)} as follows:

    U(z) = \sum_{n=0}^{\infty} u(n) z^{-n}                              (A2.3)

and

    Y(z) = \sum_{n=0}^{\infty} y(n) z^{-n}                              (A2.4)

    Multiplying both sides of the convolution sum of Eq. (A2.1) by z^{-n},
and summing with respect to n, we get

    \sum_{n=0}^{\infty} y(n) z^{-n} = \sum_{n=0}^{\infty} \sum_{k=0}^{n} h(k) u(n-k) z^{-n}        (A2.5)

The summation on the left-hand side of Eq. (A2.5) is recognized as the
z-transform of the sequence {y(n)}. Interchanging the order of the double
summation on the right-hand side of this equation, we may thus write

    Y(z) = \sum_{k=0}^{\infty} h(k) \sum_{n=0}^{\infty} u(n-k) z^{-n}                              (A2.6)

The inner summation on the right-hand side of Eq. (A2.6) represents the
z-transform of the sequence {u(n)} delayed by k samples. Hence, using the
time-shifting property of the z-transform, we have

    \sum_{n=0}^{\infty} u(n-k) z^{-n} = z^{-k} U(z)                                                (A2.7)

This result is readily proved simply by substituting l for n - k in the
summation on the left-hand side of Eq. (A2.7). Accordingly, we may rewrite
Eq. (A2.6) in the simplified form

    Y(z) = \sum_{k=0}^{\infty} h(k) z^{-k} U(z)
         = U(z) H(z)                                                                               (A2.8)

We have thus proved that the z-transform of the convolution of two


sequences is equal to the product of their individual z-transforms.
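This result is easy to verify numerically, since multiplying two polynomials in z^{-1} is the same operation as convolving their coefficient sequences; the example sequences below are arbitrary.

```python
import numpy as np

# Numerical check of Eq. (A2.8): convolving two causal sequences corresponds
# to multiplying their z-transforms, viewed as polynomials in z^{-1}.
h = np.array([1.0, 0.5, 0.25])          # impulse response {h(n)}, arbitrary example
u = np.array([1.0, -1.0, 2.0, 0.5])     # input sequence {u(n)}, arbitrary example

y = np.convolve(h, u)                   # the convolution sum of Eq. (A2.1)
product = np.polymul(h, u)              # coefficients of U(z) H(z)

print(np.allclose(y, product))          # True: Y(z) = U(z) H(z)
```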

REFERENCES

1. E. L. Jury, “Theory and Application of the z-Transform Method” (Wiley, 1964).


2. A. V. Oppenheim and R. W. Schafer, “Digital Signal Processing” (Prentice-Hall, 1975).
INDEX

Adaptivity, 2 Burg method of designing lattice predictor,


Adaptive differential pulse-code modulation, 168
16 adaptive implementation, 174
Adaptive equalization, 7, 20, 124, 157, 200 block implementation, 172
tracking mode, 10
training mode, 10 Characteristic equation, 35, 206
Adaptive lattice predictor, 162 Cholesky decomposition, 82
Adaptive line enhancer (ALE), 18, 21, 124 Clipped LMS algorithm, 121
Adaptive noise canceller, 21 Coefficient vector, 95
All-pole filter, 64 Coefficient-error vector, 98
All-zero filter, 75 transformed, 98
Amplitude response, 57 Coefficient vector lag, 118
Augmented normal equations, Coefficient vector noise, 117
backward linear prediction, 48 Correlation matrix,
forward linear prediction, 43 deterministic, 132
Autocorrelation function, 26 ensemble-averaged, 32
Average mean squared error, 112 Convolution sum, 26, 57, 212
Autoregressive model, 64, 73 Cross-correlation function, 27
adaptive, 125 Cross-correlation vector,
asymptotic stationarity, 69, 73 deterministic, 133
order one, 65, 83 ensemble-averaged, 31
order two, 69, 84
Decision feedback equalizer, 159
Backward linear predictor, 44 Digital representation of speech, 11
Backward method of designing lattice LPC Vocoder, 13
predictor, 171 waveform coders, 14
Backward prediction error, 44
Burg formula, 169 Echo canceller, 16, 21, 125, 201
properties, 170 Eigenvalues, 35, 205


Eigenvalue spread, 106 complex form, 118


Eigenvectors, 35, 205 convergence in the mean, 110
Error performance surface, 28 convergence in mean square, 114
canonical form, 37 implementations, 121
Error signal, 27 misadjustment, 115
Excess mean squared error, 112 normalized, 119
Expectation operator, 26 operation in nonstationary environment,
116
Filters time constant, 116
digital, 4 Levinson-Durbin recursion, 50, 53, 85
infinite-impulse response (IIR), 4 Linear filtering problem, 26
sampled-data, 4 Linear prediction, 40
tapped-delay-line, 4 Linear predictive coding (LPC), 12, 20,
Filtering, 1, 25 75
Forward-backward method of designing
lattice predictor, 162 Matrix inversion lemma, 138
Forward linear predictor, 41 Maximum likelihood estimate, 137
Forward method of designing lattice Mean value, 26
predictor, 171 Method of steepest descent, 93
Fractionally spaced equalizer, 125 convergence, 97, 100
Frequency response, 57 time constant, 101
Minimum mean-squared error, 29
Gain vector, 140 Minimum method of designing lattice
Geometric-mean method of designing predictor, 172
lattice predictor, 171 Minimum residual sum of squares, 133
Gradient adaptive lattice (GAL) algorithm

Harmonic mean method, see Burg method Nonnegative definite matrix, 34


Nonsingular matrix, 34
Inverse filter, 64 Normal equations,
deterministic, 129, 132
Joint-process estimation using lattice ensemble averaged, 27, 29
predictor, 180
One-step prediction, 41
Kalman filter, 151 Orthogonality of backward prediction
assuming random-walk model, 152 errors, 78

Latency, 199 Partial correlation (PARCOR) coefficient,


Lattice predictor, 78 see reflection coefficient
adaptive, 162 Phase response, 57
generalized, 87 Positive definite matrix, 34
implementations, 199 Prediction, 1, 25
normalized, 86 Predictor, 40
synthesis structure based on lattice, 82 Prediction-error filter, 43
Learning curve, 103 backward operation, 49
Least squares estimator, 133 minimum phase property, 56, 62
properties, 133 whitening property, 63, 73
Least squares lattice (LSL) algorithm, 192 Principle of orthogonality, 29
comparison with LMS algorithm, 196
LMS algorithm, 20, 108 Reflection coefficient, 53
average mean squared error, 112 backward, 163
clipped, 121 forward, 163

Recursive least squares (RLS) algorithm, Toeplitz matrix, 34


139, 142 Trace of a matrix, 210
comparison with LMS algorithm, 147 Transfer function, 57
fast, 153 True estimation error, 142
operation in nonstationary environment,
149
Unit-delay operator, 4
speed of convergence, 145
Unitary matrix, 36, 209
Unitary similarity transformation, 36, 210
Singular matrix, 35
Unvoiced speech sound, 11
Smoothing, 1
Speech production process, model, 11
Step-size parameter, 96 Vandermonde matrix, 206
Stochastic gradient algorithm, see LMS Vocoder, 11
algorithm Voiced speech sound, 11
Symmetric matrix, 33
Symmetric smoother, 153
System identification, 6, 155
Wiener filter, 2, 24
Tap input vector, 33 Wiener- ion,
iener-Hopf equation, dis
discrete form,
form, 3 32

Tapped-delay-line filter, 4
adaptive, 90, 129 z-transform, one-sided, 212
Also available

OPTIMUM SIGNAL PROCESSING


An Introduction
by Sophocles Orfanidis, Rutgers University
In this authoritative text/reference on optimum signal processing
concepts and procedures, Sophocles Orfanidis covers block
processing methods as well as real-time adaptive processing
techniques for optimum filtering, linear prediction, and high-
resolution spectral analysis. The appendix features a library of
FORTRAN 77 subroutines for performing many of the operations
required in designing optimal filters. With its practical orientation,
clear prose, and unified presentation, Optimum Signal Proces-
sing will be the definitive resource on the algorithmic structures
one must understand to implement and use optimum signal
processing methods.

“The purpose of this book is to provide an introduction to signal


processing methods that are based on optimum Wiener filtering and —
least-squares estimation concepts. Such methods have a remarkably
broad range of applications, ranging from the analysis and synthesis of |
speech, data compression, image processing and modeling, channel ©
equalization and echo cancellation in digital data transmission, geo- —
physical signal processing in oil exploration, linear predictive analysis
of EEG signals, modern methods of high-resolution spectrum estima- —
tion and superresolution array processing, to adaptive signal process-
ing for sonar, radar, system identification, and adaptive control applica-
tions. The structure of the book is to present the Wiener filtering concept -
as the basic unifying theme that ties together the various signal —
processing algorithms and techniques currently used in the above |
applications.”
‘ —from the Preface
288 pages ISBN: 0-02-949860-0 ; i

ISBN 0-02-949460-5
