
MIT OpenCourseWare

http://ocw.mit.edu

2.161 Signal Processing: Continuous and Discrete


Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

DEPARTMENT OF MECHANICAL ENGINEERING

2.161 Signal Processing - Continuous and Discrete

Introduction to Recursive-Least-Squares (RLS) Adaptive Filters

D. Rowell, December 9, 2008

1 Introduction
In this handout we introduce the concepts of adaptive recursive-least-squares (RLS) FIR filters,
where the coefficients are continually adjusted on a step-by-step basis during the filtering operation.
The filter structure shown in Fig. 1 is similar to the least-mean-squares (LMS) filter described in
the handout Introduction to Least-Squares Adaptive Filters, but differs in the internal algorithmic
structure.
Like the LMS filter, the RLS filter is an FIR filter of length M with coefficients b_k, k = 0, 1, 2, ..., M−1.
The input stream {f(n)} is passed through the filter to produce the sequence {y(n)}. At each time-step
the filter coefficients are updated using the error e(n) = d(n) − y(n), where d(n) is the desired
response (usually based on {f(n)}).
The LMS filter is implicitly designed around ensemble statistics, and uses a gradient descent
method based on expected values of the waveform statistics to seek optimal values for the filter
coefficients. On the other hand, the RLS filter computes the temporal statistics directly at each
time-step to determine the optimal filter coefficients. The RLS filter is adaptive and can adjust
to time-varying input statistics. Under most conditions the RLS filter will converge faster than an
LMS filter.
[Figure 1 shows the RLS filter structure: the input f(n) drives a causal linear FIR filter H(z) whose output y(n) is subtracted from the desired response d(n) to form the error e(n); a recursive-least-squares algorithm uses e(n) to update the filter coefficients.]

Figure 1: The recursive-least-squares (RLS) filter structure.

2 The Recursive-Least-Squares Filter Algorithm


For a filter as shown in Fig. 1, the total-squared-error E(n) at the nth iteration is defined as
$$E(n) = \sum_{i=0}^{n} e^2(i) = \sum_{i=0}^{n} \left(d(i) - y(i)\right)^2 \qquad (1)$$

We modify the standard least-squares approach by applying an exponential "forgetting factor"
λ^{n−i}, (0 < λ ≤ 1), to each error term, and modify Eq. (1) as follows:

$$E'(n) = \sum_{i=0}^{n} \lambda^{n-i} e^2(i) = \sum_{i=0}^{n} \lambda^{n-i} \left(d(i) - y(i)\right)^2 = \sum_{i=0}^{n} \left(d'(i) - y'(i)\right)^2 \qquad (2)$$

where $d'(i) = \sqrt{\lambda^{n-i}}\,d(i)$ and $y'(i) = \sqrt{\lambda^{n-i}}\,y(i)$. The purpose of the factor λ is to weight recent
data points more heavily, and thus allow the filter to track changing statistics in the input data.
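As a concrete illustration (not part of the original derivation), the weighted error in Eq. (2) can be evaluated directly from the error sequence; the variable names below are illustrative:

% Direct evaluation of the exponentially weighted squared error E'(n) in Eq. (2)
% d_seq and y_seq are assumed to hold d(0)...d(n) and y(0)...y(n); lambda is the forgetting factor
n = length(d_seq) - 1;
w = lambda.^(n:-1:0);                    % weights lambda^(n-i) for i = 0,...,n
Eprime = sum(w .* (d_seq - y_seq).^2);   % E'(n)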
The FIR filter output is given by the convolution sum
$$y(n) = \sum_{k=0}^{M-1} b_k f(n-k) \qquad (3)$$

and for stationary waveforms, Eq. (2) at time step n reduces to

$$\begin{aligned}
E'(n) &= \sum_{i=0}^{n} d'^2(i) + \sum_{i=0}^{n} y'^2(i) - 2\sum_{i=0}^{n} d'(i)\,y'(i) \\
      &= \sum_{i=0}^{n} \lambda^{n-i} d^2(i) + \sum_{i=0}^{n} \lambda^{n-i} \sum_{k=0}^{M-1}\sum_{m=0}^{M-1} b_k b_m f(i-k) f(i-m) \\
      &\qquad - 2 \sum_{i=0}^{n} \lambda^{n-i} \sum_{k=0}^{M-1} b_k f(i-k)\, d(i) \qquad (4)
\end{aligned}$$

2.1 The RLS FIR Filter Coefficients


As we have seen previously, the optimal FIR filter coefficients b_k(n), k = 0, ..., M−1, that minimize
E'(n) are found by setting the derivatives with respect to each of the filter coefficients b_k(n) equal
to zero, that is

$$\frac{\partial E'(n)}{\partial b_k(n)} = 0, \qquad k = 0, 1, \ldots, M-1, \qquad (5)$$

which generates a set of linear equations in the optimal coefficients b_k(n). Using Eq. (4), in matrix
form these equations may be written

$$R(n)\,b(n) = P(n) \qquad (6)$$

or

$$b(n) = R^{-1}(n)\,P(n), \qquad (7)$$
where

$$b(n) = [\,b_0(n)\;\; b_1(n)\;\; b_2(n)\;\; \ldots\;\; b_{M-1}(n)\,]^T$$

is an M × 1 column vector of the filter coefficients,

$$R(n) = \sum_{i=0}^{n} \lambda^{n-i}\, f(i)\, f^T(i) \qquad (8)$$

is an M × M matrix, where

$$f(i) = [\,f(i)\;\; f(i-1)\;\; f(i-2)\;\; \ldots\;\; f(i-(M-1))\,]^T$$

is a column vector of the recent input history, and

$$P(n) = \sum_{i=0}^{n} \lambda^{n-i}\, f(i)\, d(i) \qquad (9)$$

is an M × 1 column vector.

Notice the similarity of the definitions of R(n) and P(n) in Eqs. (8) and (9) to the
correlation matrix and cross-correlation vector in least-mean-squares (LMS) filter design. In this
case, however, the use of the weighting factor λ destroys the Toeplitz nature of the matrix R(n),
and efficient numerical inversion methods are not available.
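For reference, R(n) and P(n) can be accumulated directly (non-recursively) from the data under the pre-windowing convention of Section 2.3.2; the sketch below uses illustrative variable names (f_data, d_data) for the input and desired sequences and assumes M and lambda are defined. This is exactly the brute-force computation that the recursions developed next avoid.

% Direct (non-recursive) accumulation of R(n) and P(n) from Eqs. (8) and (9)
% f_data and d_data hold f(1)...f(n) and d(1)...d(n)
n = length(f_data);
R = zeros(M,M);  P = zeros(M,1);
for i = 1:n
   fi = zeros(M,1);                     % input history vector f(i), pre-windowed with zeros
   for k = 0:M-1
      if i-k >= 1, fi(k+1) = f_data(i-k); end
   end
   R = R + lambda^(n-i)*(fi*fi');
   P = P + lambda^(n-i)*fi*d_data(i);
end
b = R\P;                                % optimal coefficients from Eq. (7), if R is non-singular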
The RLS design algorithm does not attempt to solve Eq. (7) at each time-step, which would be
impractical, requiring repeated inversion of R(n). Instead, the method uses an iterative algebraic
procedure to find the updated inverse of R(n) using the result from the previous step. Equation
(8) shows that R(n) may be computed recursively:

$$R(n) = \lambda R(n-1) + f(n)\, f^T(n). \qquad (10)$$

However, we need a recursive definition of R^{-1}(n) based on R^{-1}(n−1) to compute the filter
coefficients in Eq. (7).
The matrix inversion lemma (see the Appendix) states that if a square non-singular n × n
matrix A, with known inverse A^{-1}, is updated with an additive term, the new inverse is given by

$$\left(A + BCB^T\right)^{-1} = A^{-1} - A^{-1}B\left(C^{-1} + B^T A^{-1} B\right)^{-1} B^T A^{-1} \qquad (11)$$

where B is n × k, and C is k × k and non-singular.

From Eq. (10), if we let A = R(n−1), B = f(n), and C = λ^{-1}, Eq. (11) gives

$$\begin{aligned}
R^{-1}(n) &= \lambda^{-1}\left(R(n-1) + f(n)\,\lambda^{-1} f^T(n)\right)^{-1} \\
          &= \lambda^{-1}\left(R^{-1}(n-1) - \frac{R^{-1}(n-1)\, f(n)\, f^T(n)\, R^{-1}(n-1)}{\lambda + f^T(n)\, R^{-1}(n-1)\, f(n)}\right), \qquad (12)
\end{aligned}$$

which is an algebraic recursion relationship that allows computation of R^{-1}(n) from the result of
the previous time-step and the current input history vector f(n). Note that no matrix inversion is
required.
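As a quick numerical sanity check (not part of the original handout), the recursion of Eq. (12) can be compared against a direct inversion of Eq. (10) for arbitrary data; the values of M, lambda, and the random matrices below are purely illustrative:

% Numerical check of the recursive inverse update, Eq. (12)
M = 4;  lambda = 0.95;
A = randn(M);  Rold = A*A' + eye(M);       % an arbitrary positive-definite R(n-1)
f = randn(M,1);                            % current input history vector f(n)
Rnew = lambda*Rold + f*f';                 % Eq. (10)
Rinv = inv(Rold);                          % R^{-1}(n-1)
Rinv_new = (Rinv - (Rinv*f)*(f'*Rinv)/(lambda + f'*Rinv*f))/lambda;   % Eq. (12)
max(max(abs(Rinv_new - inv(Rnew))))        % should be at machine-precision level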
In addition, from Eq. (9), P(n) may be defined recursively as

$$P(n) = \lambda P(n-1) + d(n)\, f(n) \qquad (13)$$

so that Eqs. (12) and (13), when combined with Eq. (7), form the basis for the RLS filter design.
If we define an M-dimensional vector of Kalman gains

$$k(n) = \frac{R^{-1}(n-1)\, f(n)}{\lambda + f^T(n)\, R^{-1}(n-1)\, f(n)} \qquad (14)$$

we can rewrite Eq. (12) as

$$R^{-1}(n) = \lambda^{-1}\left(R^{-1}(n-1) - k(n)\, f^T(n)\, R^{-1}(n-1)\right). \qquad (15)$$

The filter update equation, Eq. (7), may then be written as a recursion:

$$\begin{aligned}
b(n) &= R^{-1}(n)\,P(n) \\
     &= \lambda^{-1}\left(R^{-1}(n-1) - k(n)\, f^T(n)\, R^{-1}(n-1)\right)\left(\lambda P(n-1) + d(n)\, f(n)\right) \\
     &= R^{-1}(n-1)P(n-1) + \lambda^{-1} d(n)\, R^{-1}(n-1)\, f(n) \\
     &\qquad - k(n)\, f^T(n)\, R^{-1}(n-1)\, P(n-1) - \lambda^{-1} d(n)\, k(n)\, f^T(n)\, R^{-1}(n-1)\, f(n) \\
     &= b(n-1) + k(n)\left(d(n) - f^T(n)\, b(n-1)\right), \qquad (16)
\end{aligned}$$

where the last step uses Eq. (14) in the form $R^{-1}(n-1)f(n) = k(n)\left(\lambda + f^T(n) R^{-1}(n-1) f(n)\right)$ to combine the second and fourth terms into d(n)k(n).
We also note that y(n) = f^T(n) b(n−1) is the convolution sum generating the filter output y(n)
using the previous set of filter coefficients, and therefore

$$\begin{aligned}
b(n) &= b(n-1) + k(n)\left(d(n) - y(n)\right) \\
     &= b(n-1) + k(n)\, e(n) \qquad (17)
\end{aligned}$$

are the recursive filter update equations in terms of the Kalman gains and the filter output error.

2.2 Summary of the RLS Filter Coefficient Algorithm


With a new input sample f(n) and desired output value d(n):

1. Update the input history vector f(n).

2. Compute the filter output using the previous set of filter coefficients b(n−1):

   $$y(n) = f^T(n)\, b(n-1)$$

3. Compute the error:

   $$e(n) = d(n) - y(n)$$

4. Compute the Kalman gain vector:

   $$k(n) = \frac{R^{-1}(n-1)\, f(n)}{\lambda + f^T(n)\, R^{-1}(n-1)\, f(n)}$$

5. Update the matrix R^{-1}(n) for the next iteration:

   $$R^{-1}(n) = \lambda^{-1}\left(R^{-1}(n-1) - k(n)\, f^T(n)\, R^{-1}(n-1)\right)$$

6. Update the filter coefficients for the next iteration:

   $$b(n) = b(n-1) + k(n)\, e(n)$$

2.3 Filter Initialization


2.3.1 Initial Filter Coefficients b(0)
If a priori knowledge of suitable coefficient values is available, it may be used to define the initial
coefficients b(0); otherwise it is usual to set the initial filter coefficients to b(0) = 0.

2.3.2 Initial Input Data History f (0)


In the absence of prior knowledge of the input history, the pre-windowing convention sets the input
f (n) = 0 for n ≤ 0.

2.3.3 The Matrix R^{-1}(0)


We note from Eq. (8) that

$$R(n) = \sum_{i=0}^{n} \lambda^{n-i}\, f(i)\, f^T(i),$$

and therefore true initialization of R^{-1}(0) requires knowledge of the input history for n < 0.
Further, for small values of n, when the statistics are unreliable, there is a risk that R(n) may
become singular. The usual convention is to define

$$R^{-1}(0) = \delta I$$

where δ is a positive scalar, to ensure that R(n) is well behaved for small n. As n increases, the
effect of this initialization error will decrease because of the weighting factor λ. It
may be necessary to determine a suitable value of δ by experimentation, balancing stability against
convergence rate, but a guideline that is often given is

$$\delta > 100\,\sigma_f^2$$

where σ_f² is the variance of the input.

2.3.4 The Weighting Factor λ


The value of λ defines the system memory, and affects (1) the convergence rate and the ability of the
filter to track time-varying statistics in the input sequence, and, conversely, (2) the stability of the
filter coefficients.
The effective memory-length (time-constant) N, the number of iterations before the effect of a
disturbance has decayed to e^{-1} of its initial value, is

$$N = \frac{\sum_{k=0}^{\infty} k\,\lambda^k}{\sum_{k=0}^{\infty} \lambda^k} = \frac{\lambda}{1-\lambda}.$$

Common values used are 0.95 < λ < 0.995, giving 19 < N < 199.

2.4 A Potential Problem


Unlike the LMS filter algorithm, which does not use the auto-correlation matrix directly, the RLS
algorithm requires that R(n) be non-singular. With deterministic input signals, if the filter length
M is greater than that required to reduce the error to zero, R(n) will become singular. For example,
when used as a one-step linear predictor with a sinusoidal input sequence, a filter length M = 2 is
sufficient to predict the input. An RLS filter with M = 2 is well-behaved, but a filter with M = 3
fails because there is no unique solution for the filter coefficients b(n), and R(n) is singular. Even
though R(n) is not inverted explicitly within the algorithm, the recursive estimation of R−1 (n)
also fails.
This is explored further in Example 1 below.

3 A MATLAB Demonstration Recursive-Least-Squares Filter


% ------------------------------------------------------------------------
% 2.161 Classroom Example - RLSFilt - Demonstration Recursive Least-Squares FIR filter
% Usage : 1) Initialization:
%            y = RLSFilt('initial', lambda, M, delta)
%          where lambda is the "forgetting" exponential weight factor,
%                M is the filter length,
%                delta is the initial diagonal element of R^{-1}(0).
%          Example:
%            y = RLSFilt('initial', .95, 51, 0.01);
%          Note: RLSFilt returns y = 0 for initialization
%         2) Filtering:
%            [y, b] = RLSFilt(f, d);
%          where f is a single input value,
%                d is the desired value, and
%                y is the computed output value,
%                b is the coefficient vector.
%
% Version: 1.0
% Author: D. Rowell 12/9/07
% ------------------------------------------------------------------------
%
function [y, Bout] = RLSFilt(f, d, FIR_M, delta_n)
persistent F B lambda delta M Rinv
% The following is initialization, and is executed once
if (ischar(f) && strcmp(f,'initial'))
   lambda = d;
   M      = FIR_M;
   delta  = delta_n;
   F      = zeros(M,1);
   Rinv   = delta*eye(M);
   B      = zeros(M,1);
   y      = 0;
else
% Filtering:
   for J = M:-1:2
      F(J) = F(J-1);
   end
   F(1) = f;
% Perform the convolution
   y = F'*B;
   error = d - y;
% Kalman gains
   K = Rinv*F/(lambda + F'*Rinv*F);
% Update Rinv
   Rinvn = (Rinv - K*F'*Rinv)/lambda;
% Update the filter coefficients
   B = B + K*error;
   Bout = B;
   Rinv = Rinvn;
end

4 Examples
4.1 Example 1: Demonstration of Convergence with a Sinusoidal Input
In the handout MATLAB Examples of Least-Squares FIR Filter Design we examined a
static least-squares filter design for a case described by Stearns and Hush: a one-step predictor of
a sine wave with filter lengths M = 2 and M = 3. We also examined this example in the
handout Introduction to Least-Squares Adaptive Filters. The input is a noise-free sinusoid

$$f(n) = \sin\left(\frac{2\pi n}{12}\right).$$

The one-step linear predictor structure is shown in Fig. 2. The following MATLAB script was
used to examine the ability of the RLS algorithm to act as a one-step predictor:

[Figure 2 shows the one-step linear predictor: the input f(n) is delayed by one sample (z^{-1}) to form the filter input f(n−1); the RLS linear FIR filter H(z) produces y(n), which is subtracted from the desired response d(n) = f(n) to form the error e(n).]

Figure 2: The one-step linear predictor.

% Example - Demonstration of convergence with a sinusoidal input
% See Stearns & Hush p. 246
L = 5000;
f = zeros(1,L);
for J = 1:L
   f(J) = sin(2*pi*J/12);
end
% Optionally add a small random component to allow a non-singular R(n)
%f = f + 0.001*randn(1,length(f));
% Initialize the filter with M = 2, delta = 0.01 (the predictor delay Delta = 1 is set below)
% Choose the forgetting factor lambda = 0.99
delta = .01; lambda = .99; M = 2;
x = RLSFilt('initial',lambda,M, delta);
% Filter the data
Delta = 1;
f_delay = zeros(1,Delta+1);
% Filter
for J = 1:length(f)
   for K = Delta+1:-1:2
      f_delay(K) = f_delay(K-1);
   end
   f_delay(1) = f(J);
   d = f_delay(Delta+1);
   [y,b] = RLSFilt(f_delay(Delta+1),f(J));
end;
% Report the final filter coefficients
b

The script was modified and run with M = 2 and 3, and with various values of λ.
The values reported for the filter coefficients with M = 2 were

b(0) = 1.73117,   b(1) = −1,

which are in agreement with the solution b(0) = √3 and b(1) = −1 as reported by Stearns and Hush.

For M = 3 the RLS algorithm failed and no sensible values were returned. As Stearns and Hush
note, there is no unique solution for the coefficients for M = 3, and the optimal filter coefficients
must satisfy the conditions

$$b(0) - b(2) = \sqrt{3}, \qquad b(0) + \sqrt{3}\, b(1) + 2\, b(2) = 0.$$

Under these conditions the matrix R(n) is singular. However, when a small random component
was added to the sinusoidal input sequence (see the script), the matrix R(n) remains invertible,
and the values returned were

b(0) = 1.1232,   b(1) = 0.0545,   b(2) = −0.6088,
which satisfy the above conditions. The LMS filter algorithm does not suffer from this problem.
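The quoted M = 3 coefficients can be checked against these conditions with a short computation (MATLAB indices are offset by one from the b(0), b(1), b(2) notation used above):

b = [1.1232; 0.0545; -0.6088];   % coefficients returned for M = 3
b(1) - b(3)                      % should be close to sqrt(3) = 1.7321
b(1) + sqrt(3)*b(2) + 2*b(3)     % should be close to 0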

4.2 Example 2: Suppression of a Sinusoid in Noise


For the second example we look at the rejection of a sinusoid of unknown frequency in white noise.
The filter structure is as shown in Fig. 3. A delay of Δ time steps is used to remove the correlation
between s(n − Δ) and d(n), so that only the narrow-band interference r(n) is predicted by the
filter, and y(n) ≈ r(n). The filter output is taken as the error signal, e(n) = d(n) − y(n) ≈ s(n).
The following MATLAB script demonstrates the efficacy of the method.
[Figure 3 shows the interference-rejection structure: the wide-band signal s(n) plus the narrow-band interference r(n) form d(n); a delay of Δ samples (z^{-Δ}) produces the filter input f(n); the causal linear FIR filter H(z), adapted by the RLS algorithm, produces y(n) ≈ r(n), and the error e(n) = d(n) − y(n) ≈ s(n) is taken as the output.]

Figure 3: Narrow-band interference rejection filter.

% Create the input as white noise with a strong sinusoidal component
f = 0.1*randn(1,10000);
y = zeros(1,length(f));
e = zeros(1,length(f));
M = 15; delta = .001; lambda = .95;
x = RLSFilt('initial',lambda, M, delta);
L = length(f);
Delta = 1;       % decorrelation delay (not set in the original listing; 1 is an assumed value)
f_delay = zeros(1,Delta+1);
for J = 1:L
   f(J) = f(J) + 3*sin(2*pi*J/12);
   for K = Delta+1:-1:2
      f_delay(K) = f_delay(K-1);
   end
   f_delay(1) = f(J);
   [y(J),b] = RLSFilt(f_delay(Delta+1),f(J));
   e(J) = f(J) - y(J);
end;
Figure 4 shows the input and output spectra of the last 1000 samples of the data record.
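The plotting commands are not included in the handout; a minimal sketch that produces comparable spectra, assuming the vectors f and e from the script above are in the workspace, is:

% Plot magnitude spectra of the last 1000 input and output samples
N = 1000;
w = linspace(-pi, pi, N);                      % normalized frequency axis
F_spec = abs(fftshift(fft(f(end-N+1:end))));   % input spectrum
E_spec = abs(fftshift(fft(e(end-N+1:end))));   % output (error) spectrum
subplot(1,2,1); plot(w, F_spec); title('Input spectrum');
xlabel('Normalized frequency'); ylabel('Magnitude');
subplot(1,2,2); plot(w, E_spec); title('Output spectrum');
xlabel('Normalized frequency'); ylabel('Magnitude');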

[Figure 4: two panels, "Input spectrum" (left) and "Output spectrum" (right), plotting magnitude against normalized frequency from −π to π.]

Figure 4: Input and output spectra for the filter in Example 2.

4.3 Example 3: Frequency Domain Characteristics of an RLS Narrow-Band Suppression Filter

This example is similar to Example 2. It uses the narrow-band interference suppression structure
shown in Fig. 3. The interfering signal is composed of 100 sinusoids with random phases and
random normalized frequencies 0.3 < Ω < 0.6. The "signal" is white noise. The filter used has
M = 31, and is initialized with δ = 1. Because the system input is stationary, the weighting
factor is set to λ = 1. The decorrelation delay is Δ = 1. The following MATLAB script runs
a simulation with 10,000 steps, determines the overall frequency response magnitude of the
system H(z) = E(z)/D(z) = 1 − z^{−Δ} H_n(z), and plots the system pole-zero plot.
% Create the interference as a closely packed sum of sinusoids
% with random frequencies 0.3 < Omega < 0.6 and random phases
phase = 2*pi*rand(1,100);
freq = 0.3 + 0.3*rand(1,100);
f = zeros(1,10000);
for J=1:10000
   f(J) = 0;
   for k = 1:100
      f(J) = f(J) + sin(freq(k)*J + phase(k));
   end
end
% The "signal" is white noise
signal = randn(1,10000);
f = .005*f + 0.01*signal;
% Initialize the filter
lambda = 1; M = 31; delta = 1;
x = RLSFilt('initial',lambda, M, delta);
% Filter the data
Delta = 1;
f_delay = zeros(1,Delta+1);
y = zeros(1,length(f));
e = zeros(1,length(f));
for J = 1:length(f)
   for K = Delta+1:-1:2
      f_delay(K) = f_delay(K-1);
   end
   f_delay(1) = f(J);
   [y(J),b] = RLSFilt(f_delay(Delta+1),f(J));
   e(J) = f(J) - y(J);
end;
% Compute the overall filter coefficients
% H(z) = 1 - z^{-Delta}H_{RLS}(z)
b_overall = [1 zeros(1,Delta-1) -b'];
% Find the frequency response
[H,w] = freqz(b_overall,1);
figure(1);
plot(w,20*log10(abs(H)));
figure(2);
zplane(b_overall,1)

The input and output spectra are shown in Fig. 5. The filter frequency response magnitude and
the pole-zero plot of the filter are shown in Figs. 6 and 7. The adaptive algorithm has clearly generated a
notch filter covering the bandwidth of the interference.

[Figure 5: two panels, "Spectrum of input signal f(n)" (left) and "Spectrum of output signal e(n)" (right), plotting magnitude against normalized angular frequency from 0 to π.]

Figure 5: Input and output spectra from an RLS suppression filter with interference in the band
0.3 < Ω < 0.6.


[Figure 6: "RLS Filter Frequency Response (lambda = 1, M = 31, delta = 1)": frequency response magnitude (dB) versus normalized frequency from 0 to π.]

Figure 6: Frequency response magnitude plot of an RLS suppression filter with interference in the
band 0.3 < Ω < 0.6.

[Figure 7: "RLS Filter Pole-Zero Plot (lambda = 1, M = 31, delta = 1)": pole-zero plot in the z-plane (real vs. imaginary part), showing the zeros of the overall filter and a pole of multiplicity 31 at the origin.]

Figure 7: Pole-zero plot of an RLS suppression filter with interference in the band 0.3 < Ω < 0.6.


Appendix: The Matrix Inversion Lemma

The matrix inversion lemma, also known as the Woodbury matrix identity, states that for a square
n × n matrix A with a known inverse A^{-1}, a new updated inverse may be formed as

$$\left(A + BCB^T\right)^{-1} = A^{-1} - A^{-1}B\left(C^{-1} + B^T A^{-1} B\right)^{-1} B^T A^{-1}$$

where B is n × k and C is k × k and non-singular.

First, using the identity (PQ)^{-1} = Q^{-1}P^{-1}, the right-hand side may be written

$$A^{-1} - A^{-1}B\left(C^{-1} + B^T A^{-1} B\right)^{-1} B^T A^{-1} = A^{-1} - A^{-1}B\left(I + CB^T A^{-1} B\right)^{-1} CB^T A^{-1}.$$

Now form the product

$$\begin{aligned}
&\left(A + BCB^T\right)\left(A^{-1} - A^{-1}B\left(I + CB^T A^{-1} B\right)^{-1} CB^T A^{-1}\right) \\
&\quad = I + BCB^T A^{-1} - B\left(I + CB^T A^{-1} B\right)^{-1} CB^T A^{-1} - BCB^T A^{-1} B\left(I + CB^T A^{-1} B\right)^{-1} CB^T A^{-1} \\
&\quad = I + BCB^T A^{-1} - B\left(I + CB^T A^{-1} B\right)\left(I + CB^T A^{-1} B\right)^{-1} CB^T A^{-1} \\
&\quad = I + BCB^T A^{-1} - BCB^T A^{-1} \\
&\quad = I,
\end{aligned}$$

thus proving the lemma.


Notice that if k = 1, as in the case of the RLS filter where C = λ^{-1}, the updated inverse does
not require explicit matrix inversion.
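In particular, with the assignments used in Section 2 (A = R(n−1), B = f(n), C = λ^{-1}, so k = 1), the bracketed term becomes the scalar λ + f^T(n)R^{-1}(n−1)f(n), and the lemma reduces to

$$\left(R(n-1) + \lambda^{-1} f(n)\, f^T(n)\right)^{-1} = R^{-1}(n-1) - \frac{R^{-1}(n-1)\, f(n)\, f^T(n)\, R^{-1}(n-1)}{\lambda + f^T(n)\, R^{-1}(n-1)\, f(n)},$$

which, after multiplication by λ^{-1} (since R(n) = λ[R(n−1) + λ^{-1}f(n)f^T(n)]), is Eq. (12).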

