
Wiener Filters and Stochastic Gradient Based Algorithms

Adaptive Filter Theory

Advanced Digital Signal Processing

September 20, 2020
General Introduction

• The design of a Wiener filter requires a priori information about the statistics of the data
• The filter is optimum when the statistical characteristics of the input data match the a priori information required for designing the filter
• A non-optimum design will be obtained if this information is not available
Adaptive Linear Combiner

• It is the most basic element of learning systems and adaptive signal processing
• Also known as a tapped delay line filter
• An input vector and a set of adjustable weights are supplied as inputs
• It is called linear since, for a fixed set of weights, the output is a linear combination of the inputs

Figure: Adaptive linear combiner (input vector, weight vector, output signal)
Filtering

Input vector: x_k = [x_k, x_{k−1}, ···, x_{k−L}]^T
Weight vector: w_k = [w_{0k}, w_{1k}, ···, w_{Lk}]^T
Output:
y_k = Σ_{l=0}^{L} w_{lk} x_{k−l}    (1)

Matrix form: y_k = x_k^T w_k = w_k^T x_k
Estimation error: e_k = d_k − y_k
where d_k is the desired signal
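The filtering equations above can be sketched numerically; every signal value below is a made-up illustration, not data from the text:

```python
import numpy as np

L = 2                                # filter order: L + 1 = 3 taps
w_k = np.array([0.5, 0.3, 0.2])      # weight vector [w_0k, w_1k, w_2k]
x_k = np.array([1.0, -2.0, 4.0])     # input vector [x_k, x_{k-1}, x_{k-2}]

# Output y_k = sum_{l=0}^{L} w_lk * x_{k-l}, i.e. the inner product x_k^T w_k
y_k = x_k @ w_k

d_k = 1.0                            # assumed desired signal sample
e_k = d_k - y_k                      # estimation error e_k = d_k - y_k
```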
Adaptive Linear Combiner

• The adaptive linear combiner (ALC) can be used in both closed loop and open loop adaptation
• With closed loop systems, the weight vector depends on the output as well as on other data
• The other data is the desired response or training signal
• We will mostly be studying closed loop performance feedback systems

Figure: Adaptive linear combiner with desired response and error computation
Performance Surface

Figure: Adaptive transversal filter with output computation

Computation of estimation error: e_k = d_k − x_k^T w_k
Instantaneous squared error: e_k^2 = d_k^2 + w_k^T x_k x_k^T w_k − 2 d_k x_k^T w_k
Performance Surface contd...

Taking the expectation operator and assuming e_k, d_k and x_k to be stationary:
E[e_k^2] = E[d_k^2] + w_k^T E[x_k x_k^T] w_k − 2 E[d_k x_k^T] w_k

Let R be defined as the input correlation matrix:

R = E[x_k x_k^T] = E ⎡ x_{0k}^2       x_{0k}x_{1k}  ···  x_{0k}x_{Lk} ⎤
                     ⎢ x_{1k}x_{0k}   x_{1k}^2      ···  x_{1k}x_{Lk} ⎥
                     ⎢      ⋮              ⋮         ⋱        ⋮       ⎥
                     ⎣ x_{Lk}x_{0k}   x_{Lk}x_{1k}  ···  x_{Lk}^2     ⎦

Let p be defined as the column vector
p = E[d_k x_k] = E[d_k x_{0k}, d_k x_{1k}, ···, d_k x_{Lk}]^T
This vector is the set of cross correlations between the desired response and the input components.
R and p represent second order statistics when x_k and d_k are stationary.
Performance Surface Contd..

Let the mean square error be denoted by
ξ = E[e_k^2] = E[d_k^2] + w^T R w − 2 p^T w

MSE ξ is a quadratic function of the weight vector w
Performance Surface Contd...

• The mean square error ξ is a quadratic function of the weight vector when the input and desired response are stationary
• The vertical axis represents the mean square error and the horizontal axes the values of two weights
• The bowl shaped quadratic error function, or performance surface, formed in this manner is a paraboloid
• Contours of constant mean square error are elliptical, as can be seen by setting ξ constant
• The point at the bottom of the bowl is projected onto the weight-vector plane as w*, the optimal weight vector or point of minimum mean-square error
Gradient and Minimum Mean Square Error

The gradient of the mean square-error performance surface, designated ∇ξ, is obtained as
∇ξ = ∂ξ/∂w = [∂ξ/∂w_0, ∂ξ/∂w_1, ···, ∂ξ/∂w_L]^T = 2Rw − 2p    (2)

To obtain the optimal value, the mean square error is minimized by equating the gradient to zero:
∇ξ = 0 = 2Rw* − 2p    (3)

The optimal Wiener-Hopf solution for the weight vector is given by
w* = R^{−1} p    (4)
Minimum Mean Square Error

The minimum mean-square error is now obtained as
ξ_min = E[d_k^2] + w*^T R w* − 2 p^T w*
      = E[d_k^2] + [R^{−1}p]^T R R^{−1} p − 2 p^T R^{−1} p    (5)

ξ_min = E[d_k^2] − p^T R^{−1} p = E[d_k^2] − p^T w*    (6)
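Eqs. (4) and (6) can be verified numerically on a small hypothetical problem; R, p and E[d_k^2] below are assumed values, not derived from any real signal:

```python
import numpy as np

# Assumed second-order statistics of a stationary 2-tap problem
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])     # input correlation matrix E[x_k x_k^T]
p = np.array([0.7, 0.5])       # cross correlation vector E[d_k x_k]
E_d2 = 1.0                     # E[d_k^2]

# Wiener-Hopf solution, eq. (4): w* = R^{-1} p
w_star = np.linalg.solve(R, p)

# Minimum mean square error, eq. (6): xi_min = E[d_k^2] - p^T w*
xi_min = E_d2 - p @ w_star

# Gradient at the optimum, eq. (3), should vanish: 2Rw* - 2p = 0
grad = 2 * R @ w_star - 2 * p
```

Using `np.linalg.solve` rather than forming R^{−1} explicitly is the standard numerically preferable way to evaluate R^{−1}p.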
Gradient Based Adaptation

• The requirement of an adaptive filter is to find a solution for the tap weight vector that satisfies the Wiener-Hopf equation
• Using analytical means, this system of equations can be solved
• The problem has computational difficulties when either the number of tap weights or the input data rate is high
• An alternative is to use the method of steepest descent, an optimization method
Method of Steepest Descent-Gradient based adaptation

• It is derived on the basis of gradient based adaptation
• It is recursive as it starts from some initial value and improves with an increasing number of iterations
• The final value of the tap weight vector converges to the Wiener solution

Figure: Adaptive transversal filter with desired response d(n), output y(n), error e(n) and weight update mechanism
Gradient based adaptation

The estimation error is denoted as: e(n) = d(n) − y(n) = d(n) − x^T(n)w(n)
Weight vector: w(n) = [w_0(n), w_1(n), ···, w_{M−1}(n)]^T
Input vector: x(n) = [x(n), x(n − 1), ···, x(n − M + 1)]^T
The cost function (mean squared error) J(n) is denoted as
J(n) = σ_d^2 − w^T(n)p − p^T w(n) + w^T(n)Rw(n)    (7)
where σ_d^2 is the variance of the desired response,
p is the cross correlation vector between the input vector x(n) and the desired response d(n), and
R is the correlation matrix of the tap input vector x(n).
Gradient based adaptation contd...

Let ∇J(n) denote the gradient vector at time n.
According to the method of steepest descent, the updated value of the tap weight vector at time n + 1 is computed as
w(n + 1) = w(n) + (1/2) µ[−∇J(n)]    (8)

∇J(n) = [∂J(n)/∂w_0(n), ∂J(n)/∂w_1(n), ···, ∂J(n)/∂w_{M−1}(n)]^T = −2p + 2Rw(n)    (9)

The weight update rule of the steepest descent method is obtained by substituting the value of the gradient vector in (8):
w(n + 1) = w(n) + µ[p − Rw(n)]    (10)
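A minimal sketch of recursion (10), assuming a small 2-tap problem (R and p are made-up values); the iterates approach the Wiener solution R^{−1}p:

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # assumed input correlation matrix
p = np.array([0.7, 0.5])            # assumed cross correlation vector
w_opt = np.linalg.solve(R, p)       # Wiener solution, for comparison

mu = 0.5                            # step size, well inside 0 < mu < 2/lambda_max
w = np.zeros(2)                     # initial weight vector w(0)
for n in range(200):
    # eq. (10): w(n+1) = w(n) + mu [p - R w(n)]
    w = w + mu * (p - R @ w)
```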
Signal Flow Graph Diagram for Steepest Descent Algorithm

Figure: Signal flow graph of the steepest descent recursion w(n + 1) = w(n) + µ[p − Rw(n)], with feedback through a unit delay z^{−1}I and gain −µR
Stability of Steepest Descent Algorithm

• The steepest descent update involves a feedback loop, hence the possibility of the algorithm becoming unstable
• The stability performance of steepest descent depends on the following two parameters:
1 µ (convergence factor): also known as the step size parameter
2 R: input correlation matrix
• These two parameters control the transfer function of the feedback loop
Stability of Steepest Descent Algorithm contd....

Dening the weight error vector c(n) as the deviation between the
desired and estimated weight vectors
c(n) = w(n) − w0 (11)
where, w0 is the optimal weight vector according to the
Wiener-Hopf equation (w0 = R−1 p) Using eq.(10), (11) and
(w0 = R−1 p)

w(n + 1) − w0 = w(n) + µ[p − Rw(n)] − w0 (12)


c(n + 1) = c(n) + µ[w0 R − Rw(n)] (13)
c(n + 1) = [I − µR]c(n) (14)
where, I is the identity matrix.

18/60
Advanced Digital Signal Processing
Signal Flow Graph Diagram for Weight Error Vector

Figure: Signal flow graph of the weight error recursion c(n + 1) = [I − µR]c(n)
The square input correlation matrix R can be represented as R = QSQ^T using the unitary similarity transformation.
The unitary matrix Q contains the orthogonal set of eigenvectors of the matrix R as its columns. S is the diagonal matrix containing the eigenvalues of R as its diagonal elements, [λ_1, λ_2, ···, λ_M].
c(n + 1) = [I − µQSQ^T]c(n)    (15)
Stability of Steepest Descent Algorithm contd....

Premultiplying both sides by Q^T and using the property of the unitary matrix (Q^T Q = I):
Q^T c(n + 1) = [I − µS]Q^T c(n)    (16)
A new vector v(n) is defined as
v(n) = Q^T c(n) = Q^T[w(n) − w_0]    (17)
Hence, the previous expression can be simplified as
v(n + 1) = [I − µS]v(n)    (18)
This can be further simplified by considering the kth natural mode of the decomposition:
v_k(n + 1) = v_k(n)[1 − µλ_k]    (19)
Stability of Steepest Descent Algorithm contd....

Denoting the initial value of v_k(n) by v_k(0), we can write
v_k(n) = v_k(0)[1 − µλ_k]^n    (20)
For the convergence of the steepest descent algorithm, the geometric ratio 1 − µλ_k of the above geometric series must have magnitude less than 1:
−1 < 1 − µλ_k < 1    (21)
Therefore, the necessary and sufficient condition for the convergence of the steepest descent algorithm is
0 < µ < 2/λ_max    (22)
where λ_max is the maximum eigenvalue of the input correlation matrix R.
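The bound (22) can be checked numerically: the weight error recursion c(n + 1) = [I − µR]c(n) decays if and only if the spectral radius of I − µR is below 1. A sketch with an assumed 2×2 R:

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])               # assumed input correlation matrix
lam_max = np.linalg.eigvalsh(R).max()    # largest eigenvalue of R

def spectral_radius(mu):
    # largest |eigenvalue| of the recursion matrix I - mu*R from eq. (14)
    return np.abs(np.linalg.eigvals(np.eye(2) - mu * R)).max()

mu_inside = 0.9 * 2 / lam_max     # step size inside the bound (22)
mu_outside = 1.1 * 2 / lam_max    # step size violating the bound

converges = spectral_radius(mu_inside) < 1    # c(n) -> 0
diverges = spectral_radius(mu_outside) >= 1   # c(n) grows without bound
```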
Least Mean Square Algorithm

Least Mean Square Algorithm (LMS)

• Least Mean Square is the most widely used algorithm, given by Widrow and Hoff in 1960
• It belongs to the family of stochastic gradient algorithms
• The steepest descent method uses a deterministic gradient in a recursive computation of the Wiener filter for stochastic inputs
• Simplicity is one of the main features of the LMS algorithm
• Does not require measurements of the pertinent correlation functions or matrix inversion
Structure of LMS algorithm

LMS is a linear filtering adaptive algorithm that consists of two basic processes:
• Filtering: It involves computation of the filter output using the input and weight vectors. The estimation error is obtained by comparing the desired and filtered outputs.
• Adaptive process: It involves adjustment of the taps of the weight vector according to the estimation error
• These two processes form the feedback loop facilitating the adaptive weight control mechanism
Block Diagram of Adaptive Filtering

Figure: Block diagram of adaptive filtering: a transversal adaptive filter produces y(n) from x(n); the error e(n) = d(n) − y(n) drives the adaptive weight control mechanism
Adaptive Weight Control

Figure: Adaptive weight control mechanism: each weight increment δw_k(n) is formed from the step size µ, the error e(n) and the delayed input x(n − k)
LMS Algorithm

For the tap weight vector obtained by steepest descent to converge to the Wiener solution, the following points are to be considered:
• Exact measurement of the gradient vector
• A proper value of the step size µ is to be chosen
Exact measurement of the gradient vector is not always possible, as it would require information about the input correlation matrix R and the cross correlation vector p.
Hence an estimate of the gradient vector is utilized for this purpose.
Derivation of LMS algorithm

• The estimate of the gradient vector is obtained by substituting estimates of the correlation matrix R and the cross correlation vector p in eq.(10)
• We start by using instantaneous estimates of R and p based on sample values of the tap input vector and desired response:
R̂ = x(n)x^T(n)    (23)
p̂ = x(n)d(n)    (24)
Substituting the above estimates in eq.(10):
ŵ(n + 1) = ŵ(n) + µ[p̂ − R̂ŵ(n)]    (25)
ŵ(n + 1) = ŵ(n) + µx(n)[d(n) − x^T(n)ŵ(n)]    (26)
LMS contd....

The operation of the LMS algorithm consists of the following steps:
1 Filter Output: y(n) = x^T(n)ŵ(n)
2 Estimation Error: e(n) = d(n) − y(n)
3 Weight Update: ŵ(n + 1) = ŵ(n) + µx(n)e(n)
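The three steps can be sketched as a complete LMS run on synthetic data; the unknown 4-tap system h, the white input, and the noise-free desired response are assumptions of this example (a system identification setup):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                                   # number of taps
h = np.array([0.8, -0.4, 0.2, 0.1])     # assumed unknown system to identify

N = 5000
x = rng.standard_normal(N)              # white input signal
d = np.convolve(x, h)[:N]               # desired response (noise-free for clarity)

mu = 0.01                               # step size
w = np.zeros(M)                         # initial tap weight vector
for n in range(M - 1, N):
    x_n = x[n::-1][:M]                  # tap input vector [x(n), ..., x(n-M+1)]
    y_n = x_n @ w                       # 1) filter output
    e_n = d[n] - y_n                    # 2) estimation error
    w = w + mu * x_n * e_n              # 3) weight update
```

With no measurement noise, J_min = 0 and the weights converge to h itself; adding noise to d would leave a steady-state excess error proportional to µ, as analyzed below.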
Signal Flow Diagram of LMS

Figure: Signal flow graph of the LMS algorithm, ŵ(n + 1) = ŵ(n) + µx(n)[d(n) − x^T(n)ŵ(n)]
Stability Analysis of LMS Algorithm

The stability and performance analysis of the LMS algorithm is carried out using the mean squared value of the estimation error.
The weight error vector can be written as
c(n) = ŵ(n) − w_0    (27)
Subtracting the optimal weight vector w_0 from both sides of eq.(26):

ŵ(n + 1) − w_0 = ŵ(n) + µx(n)[d(n) − x^T(n)ŵ(n)] − w_0    (28)

c(n + 1) = c(n) + µx(n)d(n) − µx(n)x^T(n)[c(n) + w_0]
         = [I − µx(n)x^T(n)]c(n) + µx(n)e_0(n)    (29)

where e_0(n) is the estimation error produced by the optimum Wiener solution, e_0(n) = d(n) − x^T(n)w_0
Direct Averaging Method

Eq.(29) is of the form of stochastic dierence equation in the


weight error vector c(n)
For very small value of the convergence factor µ, the system matrix
[I − µx(n)xT (n)] approaches the identity matrix I.
By using the method of direct averaging, the solution of eq.(29) for
small values of µ is similar to the solution of another stochastic
dierence equation whose system matrix is equal to the ensemble
average as
E[I − µx(n)xT (n)] = I − µR (30)
Hence the weight vector can be written as
c(n + 1) = [I − µR]c(n) + µx(n)e0 (n) (31)
The direct averaging method applies well for small values of µ
assuming randomness of c(n) will tend to average out.
33/60
Advanced Digital Signal Processing
Independence Theory

The statistical analysis of the LMS algorithm is carried out by utilizing the following independence assumptions:
• The input vectors x(n), x(n − 1), ···, x(1) form a sequence of statistically independent vectors
• At time n, the input vector x(n) is statistically independent of the previous values of the desired response, d(n − 1), d(n − 2), ···, d(1)
• At time n, the desired response d(n) is dependent on the current input vector x(n), but statistically independent of all the previous desired responses
• The current input vector x(n) and desired response d(n) consist of mutually Gaussian distributed random variables
Statistical Analysis of LMS

The statistical analysis of LMS is based on the so-called independence theory. The updated weight vector ŵ(n + 1) obtained by LMS depends on the following three quantities:
• The previous sample input vectors x(n), x(n − 1), ···, x(1)
• The previous samples of the desired response d(n), ···, d(1)
• The initial value of the weight vector ŵ(1)
The tap weight vector ŵ(n + 1), and hence c(n + 1), is independent of x(n + 1) and d(n + 1).
In many applications, however, the successive input vectors x(n + 1) and x(n) are statistically dependent
Stability Analysis contd..

x(n) = [x(n), x(n − 1), · · · x(n − M + 1)]T


x(n + 1) = [x(n + 1), x(n), · · · x(n − M )]T However, we ignore the
statistical dependence among successive tap input vectors at
certain times
E[x(n)xT (n)c(n)cT (n)]=E[x(n)xT (n)]E[c(n)cT (n)]

36/60
Advanced Digital Signal Processing
Convergence Criteria

The necessary condition for the convergence of mean; that is given


as E[c(n)] → 0 as n → ∞
or equivalently E[ŵ(n)] → w0 as n → ∞
A stronger criterion is convergence in the mean given as
E[||c(n)||] → 0 as n → ∞.
where, E[||c(n)||] is the Euclidean norm of the weight error vector
c(n).
Convergence in the mean square
The LMS algorithm is convergent in the mean square if
D(n) = E[||c(n)||2 ] → constant as n → ∞ (32)
where, D(n) is called the squared error deviation.

37/60
Advanced Digital Signal Processing
Weight Error Correlation Matrix

Another way of describing the convergence of LMS in the mean square is to require that
J(n) = E[|e(n)|^2] → constant as n → ∞
where e(n) is the estimation error and J(n) is the mean-squared error.
The correlation matrix of the weight error vector c(n) is K(n) = E[c(n)c^T(n)]. Using the independence assumptions in K(n + 1) = E[c(n + 1)c^T(n + 1)], we get
K(n + 1) = (I − µR)K(n)(I − µR) + µ^2 J_min R    (33)
Weight Error Correlation Matrix Contd...

K(n + 1) = (I − µR)K(n)(I − µR) + µ^2 J_min R    (34)

Some observations about the previous equation:
• The first term (I − µR)K(n)(I − µR) is the result of evaluating the expectation of the outer product of (I − µR)c(n) with itself
• The expectation of the cross product term µe_0(n)(I − µR)c(n)x^T(n) is zero, due to the implied independence of c(n) and x(n)
• The last term µ^2 J_min R is obtained by applying the Gaussian factorization theorem to the product µ^2 e_0(n)x(n)x^T(n)e_0(n)
The last term µ^2 J_min R prevents K(n) = 0 from being a solution of this equation
Excess Mean Squared Error

e(n) = d(n) − ŵ^T(n)x(n) = d(n) − w_0^T x(n) − c^T(n)x(n)
     = e_0(n) − c^T(n)x(n)    (35)

The mean squared error of the LMS algorithm can be calculated as
J(n) = E[|e(n)|^2] = E[(e_0(n) − c^T(n)x(n))(e_0(n) − c^T(n)x(n))^T]    (36)
J(n) = J_min + E[c^T(n)x(n)x^T(n)c(n)]    (37)
The last term of the above expression can be simplified, since it involves a triple vector product whose result is a scalar:
E[c^T(n)x(n)x^T(n)c(n)] = E[tr{c^T(n)x(n)x^T(n)c(n)}]    (38)
Excess Mean Squared Error Contd..

E[c^T(n)x(n)x^T(n)c(n)] = E[tr{c^T(n)x(n)x^T(n)c(n)}]
= E[tr{x(n)x^T(n)c(n)c^T(n)}] = tr{E[x(n)x^T(n)c(n)c^T(n)]}    (39)

The above is simplified using tr[AB] = tr[BA], here with A = c^T(n) and B = x(n)x^T(n)c(n), so that
tr[c^T(n)x(n)x^T(n)c(n)] = tr[x(n)x^T(n)c(n)c^T(n)]
Factoring the expectation by the independence assumption, E[x(n)x^T(n)c(n)c^T(n)] = RK(n), gives
J(n) = J_min + tr[RK(n)]    (40)
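The relation J(n) = J_min + tr[RK(n)] can be exercised by iterating recursion (33) to steady state; R, J_min, µ and K(0) below are assumed values. The steady-state excess error agrees with the closed form J_min Σ_i µλ_i/(2 − µλ_i) derived later in the analysis:

```python
import numpy as np

R = np.array([[1.0, 0.3],
              [0.3, 1.0]])      # assumed input correlation matrix
J_min = 0.1                     # assumed minimum mean squared error
mu = 0.05                       # step size
I = np.eye(2)

# Iterate the weight error correlation recursion, eq. (33), to steady state
K = np.eye(2)                   # assumed initial value K(0)
for n in range(5000):
    K = (I - mu * R) @ K @ (I - mu * R) + mu**2 * J_min * R

# Steady-state excess mean squared error, eq. (40): J_ex = tr[R K]
J_ex = np.trace(R @ K)

# Closed form in terms of the eigenvalues of R
lam = np.linalg.eigvalsh(R)
J_ex_theory = J_min * np.sum(mu * lam / (2 - mu * lam))
```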
Excess Mean Squared Error Contd..

J(n) = J_min + tr[RK(n)]    (41)

The mean square value of the estimation error consists of two components:
• The minimum mean squared error J_min
• A component depending on the transient behaviour of the weight error correlation matrix K(n)
The excess mean-squared error can be written as
J_ex(n) = J(n) − J_min = tr[RK(n)]    (42)
Using the unitary transformation of the input correlation matrix, R = QSQ^T, so that
Q^T RQ = S
Excess Mean Squared Error Contd..

Let us define Q^T K(n)Q = X(n). Therefore the term tr[RK(n)] can be simplified as
tr[RK(n)] = tr[QSQ^T QX(n)Q^T] = tr[QSX(n)Q^T] = tr[Q^T QSX(n)]    (43)
tr[RK(n)] = tr[SX(n)]    (44)
J_ex(n) = tr[RK(n)] = tr[SX(n)] = Σ_{i=1}^{M} λ_i x_i(n)    (45)
Substituting R = QSQ^T (so that Q^T RQ = S) into
K(n + 1) = (I − µR)K(n)(I − µR) + µ^2 J_min R    (46)
gives
X(n + 1) = (I − µS)X(n)(I − µS) + µ^2 J_min S    (47)
x_i(n + 1) = (1 − µλ_i)^2 x_i(n) + µ^2 J_min λ_i;  i = 1, 2, ···, M    (48)
Define the M × 1 vectors x(n) and λ as follows:
x(n) = [x_1(n), x_2(n), ···, x_M(n)]^T
λ = [λ_1, λ_2, ···, λ_M]^T
Based on the definition of these two vectors, we can re-write eq.(48) as
x(n + 1) = Bx(n) + µ^2 J_min λ    (49)
where B is the M × M matrix with elements
b_ij = (1 − µλ_i)^2 for i = j;  b_ij = µ^2 λ_i λ_j for i ≠ j    (50)
The matrix B can be represented in terms of its eigenvalues and eigenvectors as B = GCG^T, where C is the diagonal matrix of eigenvalues, C = diag[c_1, c_2, ···, c_M], and G = [g_1, ···, g_M]. Using this, the solution of eq.(49) can be represented as
x(n) = Σ_{i=1}^{M} c_i^n g_i g_i^T [x(0) − x(∞)] + x(∞)    (51)
where x(0) and x(∞) are the initial and final values of x(n).
The excess mean squared error can be represented as
J_ex(n) = Σ_{i=1}^{M} λ_i x_i(n) = λ^T x(n)    (52)
J_ex(n) = Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + λ^T x(∞)    (53)
J_ex(n) = Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + J_ex(∞)    (54)

The term Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] denotes the transient behavior of the excess mean squared error, whereas the second term denotes its final value
Transient Behavior of the Mean Squared Error

J(n) = J_min + J_ex(n)    (55)

J(n) = J_min + Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + J_ex(∞)    (56)

J(n) = J_min + Σ_{i=1}^{M} γ_i c_i^n + J_ex(∞)    (57)

where γ_i is defined as γ_i = λ^T g_i g_i^T [x(0) − x(∞)]
Transient Behavior contd...

Property 1: The transient behavior of the mean squared error does not exhibit oscillations. The transient component of J(n) corresponds to Σ_{i=1}^{M} γ_i c_i^n.
Property 2: The transient component of the mean-squared error J(n) dies out; that is, the LMS algorithm is convergent in the mean square, if and only if the step size parameter satisfies the condition
0 < µ < 2/λ_max    (58)
For property 2 to hold, all the eigenvalues of the matrix B must have magnitude less than 1. Then, by the definition of the eigenvalues of the matrix B,
B = GCG^T
Bg = cg
Σ_{j=1}^{M} b_ij g_j = c g_i;  i = 1, 2, ···, M    (59)
Using the elements of the M × M matrix B given as
b_ij = (1 − µλ_i)^2 for i = j;  b_ij = µ^2 λ_i λ_j for i ≠ j    (60)
(1 − µλ_i)^2 g_i + µ^2 λ_i Σ_{j=1, j≠i}^{M} λ_j g_j = c g_i;  i = 1, 2, ···, M    (61)

Solving for g_i, we may thus write
g_i = [µ^2 λ_i / (c − (1 − µλ_i)^2)] Σ_{j=1, j≠i}^{M} λ_j g_j    (62)

Since B is a positive square matrix, there is one largest eigenvalue; setting c = 1 in the above expression gives
g_i = [µ / (2 − µλ_i)] Σ_{j=1, j≠i}^{M} λ_j g_j    (63)

From this it can be concluded that for g_i to be positive for all i, the step size parameter µ has to be upper bounded as 0 < µ < 2/λ_max.
Property 3: The final value of the excess mean squared error is less than the minimum mean squared error if the step size parameter µ satisfies the condition
Σ_{i=1}^{M} µλ_i/(2 − µλ_i) ≤ 1    (64)

As the number of iterations approaches ∞, J_ex(∞) = J(∞) − J_min.
Putting n = ∞ in eq.(48) gives
x_i(∞) = (1 − µλ_i)^2 x_i(∞) + µ^2 J_min λ_i;  i = 1, 2, ···, M    (65)
Solving for x_i(∞):
x_i(∞) = µJ_min/(2 − µλ_i)    (66)
J_ex(n) = Σ_{i=1}^{M} λ_i x_i(n) = λ^T x(n)    (67)
J_ex(∞) = Σ_{i=1}^{M} λ_i x_i(∞) = J_min Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (68)
J_ex(∞) = Σ_{i=1}^{M} λ_i x_i(∞) = J_min Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (69)

For J_ex(∞) to be less than J_min, the step-size parameter µ has to satisfy the condition given in (64)
Property 4: The misadjustment, defined as the ratio of the steady state value J_ex(∞) of the excess mean squared error to the minimum mean squared error J_min, equals
Φ = J_ex(∞)/J_min = Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (70)
which is less than unity if the step size parameter µ satisfies the condition given in (64)
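Formula (70) and its small-step approximation (derived below) can be evaluated directly; the eigenvalues of R below are assumed values:

```python
import numpy as np

lam = np.array([0.2, 0.5, 1.0, 1.5])    # assumed eigenvalues of R
mu = 0.1                                # step size parameter

# eq. (70): misadjustment Phi = sum_i mu*lam_i / (2 - mu*lam_i)
Phi = np.sum(mu * lam / (2 - mu * lam))

# small-step approximation: Phi ~= (mu/2) * sum_i lam_i
Phi_approx = (mu / 2) * lam.sum()
```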
0 < µ < 2/λ_max    (71)
The condition for the LMS algorithm to be convergent in the mean square requires knowledge of the largest eigenvalue λ_max of the correlation matrix R. As λ_max may not always be available, tr[R], which upper bounds λ_max, is used in its place:
0 < µ < 2/tr[R]    (72)

R is seen to be a Toeplitz matrix whose diagonal elements all equal r(0), i.e. the mean square value of the input at each of the M taps of the transversal filter, so that
tr[R] = M r(0) = Σ_{k=0}^{M−1} E[|x(n − k)|^2]    (73)

The tap input power can be defined as the sum of the mean square values of the input elements [x(n), x(n − 1), ···, x(n − M + 1)]. Therefore the condition on µ can be further specified as
0 < µ < 2/(tap input power)    (74)
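In practice the bound (74) can be estimated from data; a sketch, where the input signal is an assumed stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 8                                # number of taps
x = rng.standard_normal(10_000)      # stand-in input signal

# Tap input power = M * r(0), with r(0) estimated by the sample mean square value
r0_hat = np.mean(x**2)
tap_input_power = M * r0_hat

# Conservative step size bound, eq. (74): 0 < mu < 2 / (tap input power)
mu_max = 2 / tap_input_power
```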
If the step size parameter µ is small (i.e. µλ_i ≪ 2 for all i), the misadjustment Φ can be approximated as
Φ = J_ex(∞)/J_min = Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (75)

Φ ≈ (µ/2) Σ_{i=1}^{M} λ_i = (µ/2)(tap input power)    (76)
Defining an average eigenvalue as
λ_av = (1/M) Σ_{i=1}^{M} λ_i    (77)

If the ensemble average learning curve of LMS is approximated by a single exponential with time constant τ_av, then
τ_av = 1/(2µλ_av)    (78)

Using this approximation, the misadjustment Φ can be written as
Φ ≈ (µ/2) Σ_{i=1}^{M} λ_i = µMλ_av/2 = M/(4τ_av)    (79)
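Relation (79) ties adaptation speed to steady-state accuracy; a quick numeric check with assumed eigenvalues:

```python
import numpy as np

lam = np.array([0.5, 1.0, 1.5, 2.0])   # assumed eigenvalues of R
M = lam.size
mu = 0.02                              # small step size

lam_av = lam.mean()                    # eq. (77): average eigenvalue
tau_av = 1 / (2 * mu * lam_av)         # eq. (78): single-exponential time constant
Phi = (mu / 2) * lam.sum()             # eq. (76): small-mu misadjustment

# eq. (79): Phi = M / (4 tau_av) -- faster adaptation (smaller tau_av) costs accuracy
```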
Comparison of LMS Algorithm with Steepest Descent

• The steepest descent algorithm converges to the Wiener-Hopf solution as the number of iterations approaches infinity
• The LMS algorithm uses a noisy estimate of the gradient vector, with the result that the tap weight vector estimate only approaches the optimum solution w_0
• After a large number of iterations, the LMS algorithm results in a mean squared error J(∞) greater than the minimum mean squared error J_min
Comparison of LMS Algorithm with Steepest Descent Contd...

• The steepest descent algorithm has a well defined learning curve obtained by plotting the mean squared error versus the number of iterations
• The learning curve consists of a sum of decaying exponentials, the number of which equals the number of taps of the adaptive filter
• For the LMS algorithm, the learning curve consists of noisy decaying exponentials
• The amplitude of the noise usually becomes smaller as the step-size parameter µ is reduced
Comparison of LMS Algorithm with Steepest Descent Contd...

• In the steepest descent algorithm, the correlation matrix R and the cross correlation vector p are obtained through ensemble average operations applied to statistical populations of the tap input and desired response; these values are used to obtain the learning curve of the algorithm
• For the LMS algorithm, noisy learning curves are obtained for an ensemble of adaptive LMS filters with identical parameters; the learning curve is then smoothed by averaging over the ensemble of noisy learning curves
