
Wiener Filters and Stochastic Gradient Based Algorithms

Adaptive Filter Theory

Advanced Digital Signal Processing

September 20, 2020
General Introduction

• The design of a Wiener filter requires a priori information about the statistics of the data
• The filter is optimum when the statistical characteristics of the input data match the a priori information required for designing the filter
• A non-optimum design will be obtained if this information is not available
Adaptive Linear Combiner

• It is the most basic element of learning systems and adaptive signal processing
• Also known as a tapped delay line filter
• An input vector and a set of adjustable weights are supplied as inputs
• It is called linear since, for a fixed set of weights, the output is a linear combination of the inputs

Figure: Adaptive linear combiner (input vector, weight vector, output signal)
Filtering

Input vector: x_k = [x_k, x_{k−1}, ···, x_{k−L}]^T
Weight vector: w_k = [w_{0k}, w_{1k}, ···, w_{Lk}]^T
Output:
y_k = Σ_{l=0}^{L} w_{lk} x_{k−l}    (1)

Matrix form: y_k = x_k^T w_k = w_k^T x_k
Estimation error: e_k = d_k − y_k
where d_k is the desired signal
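The filtering equations above can be sketched numerically; every signal value below is a made-up illustration, not data from the text:

```python
import numpy as np

L = 2                                # filter order: L + 1 = 3 taps
w_k = np.array([0.5, 0.3, 0.2])      # weight vector [w_0k, w_1k, w_2k]
x_k = np.array([1.0, -2.0, 4.0])     # input vector [x_k, x_{k-1}, x_{k-2}]

# Output y_k = sum_{l=0}^{L} w_lk * x_{k-l}, i.e. the inner product x_k^T w_k
y_k = x_k @ w_k

d_k = 1.0                            # assumed desired signal sample
e_k = d_k - y_k                      # estimation error e_k = d_k - y_k
```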
Adaptive Linear Combiner

• The adaptive linear combiner (ALC) can be used in both closed loop and open loop adaptation
• With closed loop systems, the weight vector depends on the output as well as on other data
• The other data is the desired response or training signal
• We will mostly be studying closed loop performance feedback systems

Figure: Adaptive linear combiner with desired response and error computation
Performance Surface

Figure: Adaptive transversal filter with output computation

Computation of estimation error: e_k = d_k − x_k^T w_k
Instantaneous squared error: e_k^2 = d_k^2 + w_k^T x_k x_k^T w_k − 2 d_k x_k^T w_k
Performance Surface contd...

Taking the expectation operator and assuming e_k, d_k and x_k to be stationary:
E[e_k^2] = E[d_k^2] + w_k^T E[x_k x_k^T] w_k − 2 E[d_k x_k^T] w_k

Let R be defined as the input correlation matrix:

R = E[x_k x_k^T] = E ⎡ x_{0k}^2       x_{0k}x_{1k}  ···  x_{0k}x_{Lk} ⎤
                     ⎢ x_{1k}x_{0k}   x_{1k}^2      ···  x_{1k}x_{Lk} ⎥
                     ⎢      ⋮              ⋮         ⋱        ⋮       ⎥
                     ⎣ x_{Lk}x_{0k}   x_{Lk}x_{1k}  ···  x_{Lk}^2     ⎦

Let p be defined as the column vector
p = E[d_k x_k] = E[d_k x_{0k}, d_k x_{1k}, ···, d_k x_{Lk}]^T
This vector is the set of cross correlations between the desired response and the input components.
R and p represent second order statistics when x_k and d_k are stationary.
Performance Surface Contd..

Let the mean square error be denoted by
ξ = E[e_k^2] = E[d_k^2] + w^T R w − 2 p^T w

MSE ξ is a quadratic function of the weight vector w
Performance Surface Contd...

• The mean square error ξ is a quadratic function of the weight vector when the input and desired response are stationary
• The vertical axis represents the mean square error and the horizontal axes the values of two weights
• The bowl shaped quadratic error function, or performance surface, formed in this manner is a paraboloid
• Contours of constant mean square error are elliptical, as can be seen by setting ξ constant
• The point at the bottom of the bowl is projected onto the weight-vector plane as w*, the optimal weight vector or point of minimum mean-square error
Gradient and Minimum Mean Square Error

The gradient of the mean square-error performance surface, designated ∇ξ, is obtained as
∇ξ = ∂ξ/∂w = [∂ξ/∂w_0, ∂ξ/∂w_1, ···, ∂ξ/∂w_L]^T = 2Rw − 2p    (2)

To obtain the optimal value, the mean square error is minimized by equating the gradient to zero:
∇ξ = 0 = 2Rw* − 2p    (3)

The optimal Wiener-Hopf solution for the weight vector is given by
w* = R^{−1} p    (4)
Minimum Mean Square Error

The minimum mean-square error is now obtained as
ξ_min = E[d_k^2] + w*^T R w* − 2 p^T w*
      = E[d_k^2] + [R^{−1}p]^T R R^{−1} p − 2 p^T R^{−1} p    (5)

ξ_min = E[d_k^2] − p^T R^{−1} p = E[d_k^2] − p^T w*    (6)
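Eqs. (4) and (6) can be verified numerically on a small hypothetical problem; R, p and E[d_k^2] below are assumed values, not derived from any real signal:

```python
import numpy as np

# Assumed second-order statistics of a stationary 2-tap problem
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])     # input correlation matrix E[x_k x_k^T]
p = np.array([0.7, 0.5])       # cross correlation vector E[d_k x_k]
E_d2 = 1.0                     # E[d_k^2]

# Wiener-Hopf solution, eq. (4): w* = R^{-1} p
w_star = np.linalg.solve(R, p)

# Minimum mean square error, eq. (6): xi_min = E[d_k^2] - p^T w*
xi_min = E_d2 - p @ w_star

# Gradient at the optimum, eq. (3), should vanish: 2Rw* - 2p = 0
grad = 2 * R @ w_star - 2 * p
```

Using `np.linalg.solve` rather than forming R^{−1} explicitly is the standard numerically preferable way to evaluate R^{−1}p.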
Gradient Based Adaptation

• The requirement of an adaptive filter is to find a solution for the tap weight vector that satisfies the Wiener-Hopf equation
• Using analytical means, this system of equations can be solved
• The problem has computational difficulties when either the number of tap weights or the input data rate is high
• An alternative is to use the method of steepest descent, an optimization method
Method of Steepest Descent-Gradient based adaptation

• It is derived on the basis of gradient based adaptation
• It is recursive as it starts from some initial value and improves with an increasing number of iterations
• The final value of the tap weight vector converges to the Wiener solution

Figure: Adaptive transversal filter with desired response d(n), output y(n), error e(n) and weight update mechanism
Gradient based adaptation

The estimation error is denoted as: e(n) = d(n) − y(n) = d(n) − x^T(n)w(n)
Weight vector: w(n) = [w_0(n), w_1(n), ···, w_{M−1}(n)]^T
Input vector: x(n) = [x(n), x(n − 1), ···, x(n − M + 1)]^T
The cost function (mean squared error) J(n) is denoted as
J(n) = σ_d^2 − w^T(n)p − p^T w(n) + w^T(n)Rw(n)    (7)
where σ_d^2 is the variance of the desired response,
p is the cross correlation vector between the input vector x(n) and the desired response d(n), and
R is the correlation matrix of the tap input vector x(n).
Gradient based adaptation contd...

Let ∇J(n) denote the gradient vector at time n.
According to the method of steepest descent, the updated value of the tap weight vector at time n + 1 is computed as
w(n + 1) = w(n) + (1/2) µ[−∇J(n)]    (8)

∇J(n) = [∂J(n)/∂w_0(n), ∂J(n)/∂w_1(n), ···, ∂J(n)/∂w_{M−1}(n)]^T = −2p + 2Rw(n)    (9)

The weight update rule of the steepest descent method is obtained by substituting the value of the gradient vector in (8):
w(n + 1) = w(n) + µ[p − Rw(n)]    (10)
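A minimal sketch of recursion (10), assuming a small 2-tap problem (R and p are made-up values); the iterates approach the Wiener solution R^{−1}p:

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # assumed input correlation matrix
p = np.array([0.7, 0.5])            # assumed cross correlation vector
w_opt = np.linalg.solve(R, p)       # Wiener solution, for comparison

mu = 0.5                            # step size, well inside 0 < mu < 2/lambda_max
w = np.zeros(2)                     # initial weight vector w(0)
for n in range(200):
    # eq. (10): w(n+1) = w(n) + mu [p - R w(n)]
    w = w + mu * (p - R @ w)
```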
Signal Flow Graph Diagram for Steepest Descent Algorithm

Figure: Signal flow graph of the steepest descent recursion w(n + 1) = w(n) + µ[p − Rw(n)], with feedback through a unit delay z^{−1}I and gain −µR
Stability of Steepest Descent Algorithm

• The steepest descent update involves a feedback loop, hence the possibility of the algorithm becoming unstable
• The stability performance of steepest descent depends on the following two parameters:
1 µ (convergence factor): also known as the step size parameter
2 R: input correlation matrix
• These two parameters control the transfer function of the feedback loop
Stability of Steepest Descent Algorithm contd....

Dening the weight error vector c(n) as the deviation between the
desired and estimated weight vectors
c(n) = w(n) − w0 (11)
where, w0 is the optimal weight vector according to the
Wiener-Hopf equation (w0 = R−1 p) Using eq.(10), (11) and
(w0 = R−1 p)

w(n + 1) − w0 = w(n) + µ[p − Rw(n)] − w0 (12)


c(n + 1) = c(n) + µ[w0 R − Rw(n)] (13)
c(n + 1) = [I − µR]c(n) (14)
where, I is the identity matrix.

18/60
Advanced Digital Signal Processing
Signal Flow Graph Diagram for Weight Error Vector

Figure: Signal flow graph of the weight error recursion c(n + 1) = [I − µR]c(n)
The square input correlation matrix R can be represented as R = QSQ^T using the unitary similarity transformation.
The unitary matrix Q contains the orthogonal set of eigenvectors of the matrix R as its columns. S is the diagonal matrix containing the eigenvalues of R as its diagonal elements, [λ_1, λ_2, ···, λ_M].
c(n + 1) = [I − µQSQ^T]c(n)    (15)
Stability of Steepest Descent Algorithm contd....

Premultiplying both sides by Q^T and using the property of the unitary matrix (Q^T Q = I):
Q^T c(n + 1) = [I − µS]Q^T c(n)    (16)
A new vector v(n) is defined as
v(n) = Q^T c(n) = Q^T[w(n) − w_0]    (17)
Hence, the previous expression can be simplified as
v(n + 1) = [I − µS]v(n)    (18)
This can be further simplified by considering the kth natural mode of the decomposition:
v_k(n + 1) = v_k(n)[1 − µλ_k]    (19)
Stability of Steepest Descent Algorithm contd....

Denoting the initial value of v_k(n) by v_k(0), we can write
v_k(n) = v_k(0)[1 − µλ_k]^n    (20)
For the convergence of the steepest descent algorithm, the geometric ratio 1 − µλ_k of the above geometric series must have magnitude less than 1:
−1 < 1 − µλ_k < 1    (21)
Therefore, the necessary and sufficient condition for the convergence of the steepest descent algorithm is
0 < µ < 2/λ_max    (22)
where λ_max is the maximum eigenvalue of the input correlation matrix R.
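The bound (22) can be checked numerically: the weight error recursion c(n + 1) = [I − µR]c(n) decays if and only if the spectral radius of I − µR is below 1. A sketch with an assumed 2×2 R:

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])               # assumed input correlation matrix
lam_max = np.linalg.eigvalsh(R).max()    # largest eigenvalue of R

def spectral_radius(mu):
    # largest |eigenvalue| of the recursion matrix I - mu*R from eq. (14)
    return np.abs(np.linalg.eigvals(np.eye(2) - mu * R)).max()

mu_inside = 0.9 * 2 / lam_max     # step size inside the bound (22)
mu_outside = 1.1 * 2 / lam_max    # step size violating the bound

converges = spectral_radius(mu_inside) < 1    # c(n) -> 0
diverges = spectral_radius(mu_outside) >= 1   # c(n) grows without bound
```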
Least Mean Square Algorithm

Least Mean Square Algorithm (LMS)

• Least Mean Square is the most widely used algorithm, given by Widrow and Hoff in 1960
• It belongs to the family of stochastic gradient algorithms
• The steepest descent method uses a deterministic gradient in a recursive computation of the Wiener filter for stochastic inputs
• Simplicity is one of the main features of the LMS algorithm
• Does not require measurements of the pertinent correlation functions or matrix inversion
Structure of LMS algorithm

LMS is a linear filtering adaptive algorithm that consists of two basic processes:
• Filtering: It involves computation of the filter output using the input and weight vectors. The estimation error is obtained by comparing the desired and filtered outputs.
• Adaptive process: It involves adjustment of the taps of the weight vector according to the estimation error
• These two processes form the feedback loop facilitating the adaptive weight control mechanism
Block Diagram of Adaptive Filtering

Figure: Block diagram of adaptive filtering: a transversal adaptive filter produces y(n) from x(n); the error e(n) = d(n) − y(n) drives the adaptive weight control mechanism
Adaptive Weight Control

Figure: Adaptive weight control mechanism: each weight increment δw_k(n) is formed from the step size µ, the error e(n) and the delayed input x(n − k)
LMS Algorithm

For the tap weight vector obtained by steepest descent to converge to the Wiener solution, the following points are to be considered:
• Exact measurement of the gradient vector
• A proper value of the step size µ is to be chosen
Exact measurement of the gradient vector is not always possible, as it would require information about the input correlation matrix R and the cross correlation vector p.
Hence an estimate of the gradient vector is utilized for this purpose.
Derivation of LMS algorithm

• The estimate of the gradient vector is obtained by substituting estimates of the correlation matrix R and the cross correlation vector p in eq.(10)
• We start by using instantaneous estimates of R and p based on sample values of the tap input vector and desired response:
R̂ = x(n)x^T(n)    (23)
p̂ = x(n)d(n)    (24)
Substituting the above estimates in eq.(10):
ŵ(n + 1) = ŵ(n) + µ[p̂ − R̂ŵ(n)]    (25)
ŵ(n + 1) = ŵ(n) + µx(n)[d(n) − x^T(n)ŵ(n)]    (26)
LMS contd....

The operation of the LMS algorithm consists of the following steps:
1 Filter Output: y(n) = x^T(n)ŵ(n)
2 Estimation Error: e(n) = d(n) − y(n)
3 Weight Update: ŵ(n + 1) = ŵ(n) + µx(n)e(n)
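The three steps can be sketched as a complete LMS run on synthetic data; the unknown 4-tap system h, the white input, and the noise-free desired response are assumptions of this example (a system identification setup):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                                   # number of taps
h = np.array([0.8, -0.4, 0.2, 0.1])     # assumed unknown system to identify

N = 5000
x = rng.standard_normal(N)              # white input signal
d = np.convolve(x, h)[:N]               # desired response (noise-free for clarity)

mu = 0.01                               # step size
w = np.zeros(M)                         # initial tap weight vector
for n in range(M - 1, N):
    x_n = x[n::-1][:M]                  # tap input vector [x(n), ..., x(n-M+1)]
    y_n = x_n @ w                       # 1) filter output
    e_n = d[n] - y_n                    # 2) estimation error
    w = w + mu * x_n * e_n              # 3) weight update
```

With no measurement noise, J_min = 0 and the weights converge to h itself; adding noise to d would leave a steady-state excess error proportional to µ, as analyzed below.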
Signal Flow Diagram of LMS

Figure: Signal flow graph of the LMS algorithm, ŵ(n + 1) = ŵ(n) + µx(n)[d(n) − x^T(n)ŵ(n)]
Stability Analysis of LMS Algorithm

The stability and performance analysis of the LMS algorithm is carried out using the mean squared value of the estimation error.
The weight error vector can be written as
c(n) = ŵ(n) − w_0    (27)
Subtracting the optimal weight vector w_0 from both sides of eq.(26):

ŵ(n + 1) − w_0 = ŵ(n) + µx(n)[d(n) − x^T(n)ŵ(n)] − w_0    (28)

c(n + 1) = c(n) + µx(n)d(n) − µx(n)x^T(n)[c(n) + w_0]
         = [I − µx(n)x^T(n)]c(n) + µx(n)e_0(n)    (29)

where e_0(n) is the estimation error produced by the optimum Wiener solution, e_0(n) = d(n) − x^T(n)w_0
Direct Averaging Method

Eq.(29) is of the form of stochastic dierence equation in the


weight error vector c(n)
For very small value of the convergence factor µ, the system matrix
[I − µx(n)xT (n)] approaches the identity matrix I.
By using the method of direct averaging, the solution of eq.(29) for
small values of µ is similar to the solution of another stochastic
dierence equation whose system matrix is equal to the ensemble
average as
E[I − µx(n)xT (n)] = I − µR (30)
Hence the weight vector can be written as
c(n + 1) = [I − µR]c(n) + µx(n)e0 (n) (31)
The direct averaging method applies well for small values of µ
assuming randomness of c(n) will tend to average out.
33/60
Advanced Digital Signal Processing
Independence Theory

The statistical analysis of the LMS algorithm is carried out by utilizing the following independence assumptions:
• The input vectors x(n), x(n − 1), ···, x(1) form a sequence of statistically independent vectors
• At time n, the input vector x(n) is statistically independent of the previous values of the desired response, d(n − 1), d(n − 2), ···, d(1)
• At time n, the desired response d(n) is dependent on the current input vector x(n), but statistically independent of all the previous desired responses
• The current input vector x(n) and desired response d(n) consist of mutually Gaussian distributed random variables
Statistical Analysis of LMS

The statistical analysis of LMS is based on the so-called independence theory. The updated weight vector ŵ(n + 1) obtained by LMS depends on the following three quantities:
• The previous sample input vectors x(n), x(n − 1), ···, x(1)
• The previous samples of the desired response d(n), ···, d(1)
• The initial value of the weight vector ŵ(1)
The tap weight vector ŵ(n + 1), and hence c(n + 1), is independent of x(n + 1) and d(n + 1).
In many applications, however, the successive input vectors x(n + 1) and x(n) are statistically dependent
Stability Analysis contd..

x(n) = [x(n), x(n − 1), · · · x(n − M + 1)]T


x(n + 1) = [x(n + 1), x(n), · · · x(n − M )]T However, we ignore the
statistical dependence among successive tap input vectors at
certain times
E[x(n)xT (n)c(n)cT (n)]=E[x(n)xT (n)]E[c(n)cT (n)]

36/60
Advanced Digital Signal Processing
Convergence Criteria

The necessary condition for the convergence of mean; that is given


as E[c(n)] → 0 as n → ∞
or equivalently E[ŵ(n)] → w0 as n → ∞
A stronger criterion is convergence in the mean given as
E[||c(n)||] → 0 as n → ∞.
where, E[||c(n)||] is the Euclidean norm of the weight error vector
c(n).
Convergence in the mean square
The LMS algorithm is convergent in the mean square if
D(n) = E[||c(n)||2 ] → constant as n → ∞ (32)
where, D(n) is called the squared error deviation.

37/60
Advanced Digital Signal Processing
Weight Error Correlation Matrix

Another way of describing the convergence of LMS in the mean square is to require that
J(n) = E[|e(n)|^2] → constant as n → ∞
where e(n) is the estimation error and J(n) is the mean-squared error.
The correlation matrix of the weight error vector c(n) is K(n) = E[c(n)c^T(n)]. Using the independence assumptions in K(n + 1) = E[c(n + 1)c^T(n + 1)], we get
K(n + 1) = (I − µR)K(n)(I − µR) + µ^2 J_min R    (33)
Weight Error Correlation Matrix Contd...

K(n + 1) = (I − µR)K(n)(I − µR) + µ^2 J_min R    (34)

Some observations about the previous equation:
• The first term (I − µR)K(n)(I − µR) is the result of evaluating the expectation of the outer product of (I − µR)c(n) with itself
• The expectation of the cross product term µe_0(n)(I − µR)c(n)x^T(n) is zero, due to the implied independence of c(n) and x(n)
• The last term µ^2 J_min R is obtained by applying the Gaussian factorization theorem to the product µ^2 e_0(n)x(n)x^T(n)e_0(n)
The last term µ^2 J_min R prevents K(n) = 0 from being a solution of this equation
Excess Mean Squared Error

e(n) = d(n) − ŵ^T(n)x(n) = d(n) − w_0^T x(n) − c^T(n)x(n)
     = e_0(n) − c^T(n)x(n)    (35)

The mean squared error of the LMS algorithm can be calculated as
J(n) = E[|e(n)|^2] = E[(e_0(n) − c^T(n)x(n))(e_0(n) − c^T(n)x(n))^T]    (36)
J(n) = J_min + E[c^T(n)x(n)x^T(n)c(n)]    (37)
The last term of the above expression can be simplified, since it involves a triple vector product whose result is a scalar:
E[c^T(n)x(n)x^T(n)c(n)] = E[tr{c^T(n)x(n)x^T(n)c(n)}]    (38)
Excess Mean Squared Error Contd..

E[c^T(n)x(n)x^T(n)c(n)] = E[tr{c^T(n)x(n)x^T(n)c(n)}]
= E[tr{x(n)x^T(n)c(n)c^T(n)}] = tr{E[x(n)x^T(n)c(n)c^T(n)]}    (39)

The above is simplified using tr[AB] = tr[BA], here with A = c^T(n) and B = x(n)x^T(n)c(n), so that
tr[c^T(n)x(n)x^T(n)c(n)] = tr[x(n)x^T(n)c(n)c^T(n)]
Factoring the expectation by the independence assumption, E[x(n)x^T(n)c(n)c^T(n)] = RK(n), gives
J(n) = J_min + tr[RK(n)]    (40)
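The relation J(n) = J_min + tr[RK(n)] can be exercised by iterating recursion (33) to steady state; R, J_min, µ and K(0) below are assumed values. The steady-state excess error agrees with the closed form J_min Σ_i µλ_i/(2 − µλ_i) derived later in the analysis:

```python
import numpy as np

R = np.array([[1.0, 0.3],
              [0.3, 1.0]])      # assumed input correlation matrix
J_min = 0.1                     # assumed minimum mean squared error
mu = 0.05                       # step size
I = np.eye(2)

# Iterate the weight error correlation recursion, eq. (33), to steady state
K = np.eye(2)                   # assumed initial value K(0)
for n in range(5000):
    K = (I - mu * R) @ K @ (I - mu * R) + mu**2 * J_min * R

# Steady-state excess mean squared error, eq. (40): J_ex = tr[R K]
J_ex = np.trace(R @ K)

# Closed form in terms of the eigenvalues of R
lam = np.linalg.eigvalsh(R)
J_ex_theory = J_min * np.sum(mu * lam / (2 - mu * lam))
```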
Excess Mean Squared Error Contd..

J(n) = J_min + tr[RK(n)]    (41)

The mean square value of the estimation error consists of two components:
• The minimum mean squared error J_min
• A component depending on the transient behaviour of the weight error correlation matrix K(n)
The excess mean-squared error can be written as
J_ex(n) = J(n) − J_min = tr[RK(n)]    (42)
Using the unitary transformation of the input correlation matrix, R = QSQ^T, so that
Q^T RQ = S
Excess Mean Squared Error Contd..

Let us define Q^T K(n)Q = X(n). Therefore the term tr[RK(n)] can be simplified as
tr[RK(n)] = tr[QSQ^T QX(n)Q^T] = tr[QSX(n)Q^T] = tr[Q^T QSX(n)]    (43)
tr[RK(n)] = tr[SX(n)]    (44)
J_ex(n) = tr[RK(n)] = tr[SX(n)] = Σ_{i=1}^{M} λ_i x_i(n)    (45)
Substituting R = QSQ^T (so that Q^T RQ = S) into
K(n + 1) = (I − µR)K(n)(I − µR) + µ^2 J_min R    (46)
gives
X(n + 1) = (I − µS)X(n)(I − µS) + µ^2 J_min S    (47)
x_i(n + 1) = (1 − µλ_i)^2 x_i(n) + µ^2 J_min λ_i;  i = 1, 2, ···, M    (48)
Define the M × 1 vectors x(n) and λ as follows:
x(n) = [x_1(n), x_2(n), ···, x_M(n)]^T
λ = [λ_1, λ_2, ···, λ_M]^T
Based on the definition of these two vectors, we can re-write eq.(48) as
x(n + 1) = Bx(n) + µ^2 J_min λ    (49)
where B is the M × M matrix with elements
b_ij = (1 − µλ_i)^2 for i = j;  b_ij = µ^2 λ_i λ_j for i ≠ j    (50)
The matrix B can be represented in terms of its eigenvalues and eigenvectors as B = GCG^T, where C is the diagonal matrix of eigenvalues, C = diag[c_1, c_2, ···, c_M], and G = [g_1, ···, g_M]. Using this, the solution of eq.(49) can be represented as
x(n) = Σ_{i=1}^{M} c_i^n g_i g_i^T [x(0) − x(∞)] + x(∞)    (51)
where x(0) and x(∞) are the initial and final values of x(n).
The excess mean squared error can be represented as
J_ex(n) = Σ_{i=1}^{M} λ_i x_i(n) = λ^T x(n)    (52)
J_ex(n) = Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + λ^T x(∞)    (53)
J_ex(n) = Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + J_ex(∞)    (54)

The term Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] denotes the transient behavior of the excess mean squared error, whereas the second term denotes its final value
Transient Behavior of the Mean Squared Error

J(n) = J_min + J_ex(n)    (55)

J(n) = J_min + Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + J_ex(∞)    (56)

J(n) = J_min + Σ_{i=1}^{M} γ_i c_i^n + J_ex(∞)    (57)

where γ_i is defined as γ_i = λ^T g_i g_i^T [x(0) − x(∞)]
Transient Behavior contd...

Property 1: The transient behavior of the mean squared error does not exhibit oscillations. The transient component of J(n) corresponds to Σ_{i=1}^{M} γ_i c_i^n.
Property 2: The transient component of the mean-squared error J(n) dies out; that is, the LMS algorithm is convergent in the mean square, if and only if the step size parameter satisfies the condition
0 < µ < 2/λ_max    (58)
For property 2 to hold, all the eigenvalues of the matrix B must have magnitude less than 1. Then, by the definition of the eigenvalues of the matrix B,
B = GCG^T
Bg = cg
Σ_{j=1}^{M} b_ij g_j = c g_i;  i = 1, 2, ···, M    (59)
Using the elements of the M × M matrix B given as
b_ij = (1 − µλ_i)^2 for i = j;  b_ij = µ^2 λ_i λ_j for i ≠ j    (60)
(1 − µλ_i)^2 g_i + µ^2 λ_i Σ_{j=1, j≠i}^{M} λ_j g_j = c g_i;  i = 1, 2, ···, M    (61)

Solving for g_i, we may thus write
g_i = [µ^2 λ_i / (c − (1 − µλ_i)^2)] Σ_{j=1, j≠i}^{M} λ_j g_j    (62)

Since B is a positive square matrix, there is one largest eigenvalue; setting c = 1 in the above expression gives
g_i = [µ / (2 − µλ_i)] Σ_{j=1, j≠i}^{M} λ_j g_j    (63)

From this it can be concluded that for g_i to be positive for all i, the step size parameter µ has to be upper bounded as 0 < µ < 2/λ_max.
Property 3: The final value of the excess mean squared error is less than the minimum mean squared error if the step size parameter µ satisfies the condition
Σ_{i=1}^{M} µλ_i/(2 − µλ_i) ≤ 1    (64)

As the number of iterations approaches ∞, J_ex(∞) = J(∞) − J_min.
Putting n = ∞ in eq.(48) gives
x_i(∞) = (1 − µλ_i)^2 x_i(∞) + µ^2 J_min λ_i;  i = 1, 2, ···, M    (65)
Solving for x_i(∞):
x_i(∞) = µJ_min/(2 − µλ_i)    (66)
J_ex(n) = Σ_{i=1}^{M} λ_i x_i(n) = λ^T x(n)    (67)
J_ex(∞) = Σ_{i=1}^{M} λ_i x_i(∞) = J_min Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (68)
J_ex(∞) = Σ_{i=1}^{M} λ_i x_i(∞) = J_min Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (69)

For J_ex(∞) to be less than J_min, the step-size parameter µ has to satisfy the condition given in (64)
Property 4: The misadjustment, defined as the ratio of the steady state value J_ex(∞) of the excess mean squared error to the minimum mean squared error J_min, equals
Φ = J_ex(∞)/J_min = Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (70)
which is less than unity if the step size parameter µ satisfies the condition given in (64)
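Formula (70) and its small-step approximation (derived below) can be evaluated directly; the eigenvalues of R below are assumed values:

```python
import numpy as np

lam = np.array([0.2, 0.5, 1.0, 1.5])    # assumed eigenvalues of R
mu = 0.1                                # step size parameter

# eq. (70): misadjustment Phi = sum_i mu*lam_i / (2 - mu*lam_i)
Phi = np.sum(mu * lam / (2 - mu * lam))

# small-step approximation: Phi ~= (mu/2) * sum_i lam_i
Phi_approx = (mu / 2) * lam.sum()
```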
0 < µ < 2/λ_max    (71)
The condition for the LMS algorithm to be convergent in the mean square requires knowledge of the largest eigenvalue λ_max of the correlation matrix R. As λ_max may not always be available, tr[R], which upper bounds λ_max, is used in its place:
0 < µ < 2/tr[R]    (72)

R is seen to be a Toeplitz matrix whose diagonal elements all equal r(0), i.e. the mean square value of the input at each of the M taps of the transversal filter, so that
tr[R] = M r(0) = Σ_{k=0}^{M−1} E[|x(n − k)|^2]    (73)

The tap input power can be defined as the sum of the mean square values of the input elements [x(n), x(n − 1), ···, x(n − M + 1)]. Therefore the condition on µ can be further specified as
0 < µ < 2/(tap input power)    (74)
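In practice the bound (74) can be estimated from data; a sketch, where the input signal is an assumed stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 8                                # number of taps
x = rng.standard_normal(10_000)      # stand-in input signal

# Tap input power = M * r(0), with r(0) estimated by the sample mean square value
r0_hat = np.mean(x**2)
tap_input_power = M * r0_hat

# Conservative step size bound, eq. (74): 0 < mu < 2 / (tap input power)
mu_max = 2 / tap_input_power
```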
If the step size parameter µ is small (i.e. µλ_i ≪ 2 for all i), the misadjustment Φ can be approximated as
Φ = J_ex(∞)/J_min = Σ_{i=1}^{M} µλ_i/(2 − µλ_i)    (75)

Φ ≈ (µ/2) Σ_{i=1}^{M} λ_i = (µ/2)(tap input power)    (76)
Defining an average eigenvalue as
λ_av = (1/M) Σ_{i=1}^{M} λ_i    (77)

If the ensemble average learning curve of LMS is approximated by a single exponential with time constant τ_av, then
τ_av = 1/(2µλ_av)    (78)

Using this approximation, the misadjustment Φ can be written as
Φ ≈ (µ/2) Σ_{i=1}^{M} λ_i = µMλ_av/2 = M/(4τ_av)    (79)
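Relation (79) ties adaptation speed to steady-state accuracy; a quick numeric check with assumed eigenvalues:

```python
import numpy as np

lam = np.array([0.5, 1.0, 1.5, 2.0])   # assumed eigenvalues of R
M = lam.size
mu = 0.02                              # small step size

lam_av = lam.mean()                    # eq. (77): average eigenvalue
tau_av = 1 / (2 * mu * lam_av)         # eq. (78): single-exponential time constant
Phi = (mu / 2) * lam.sum()             # eq. (76): small-mu misadjustment

# eq. (79): Phi = M / (4 tau_av) -- faster adaptation (smaller tau_av) costs accuracy
```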
Comparison of LMS Algorithm with Steepest Descent

• The steepest descent algorithm converges to the Wiener-Hopf solution as the number of iterations approaches infinity
• The LMS algorithm uses a noisy estimate of the gradient vector, with the result that the tap weight vector estimate only approaches the optimum solution w_0
• After a large number of iterations, the LMS algorithm results in a mean squared error J(∞) greater than the minimum mean squared error J_min
Comparison of LMS Algorithm with Steepest Descent Contd...

• The steepest descent algorithm has a well defined learning curve obtained by plotting the mean squared error versus the number of iterations
• The learning curve consists of a sum of decaying exponentials, the number of which equals the number of taps of the adaptive filter
• For the LMS algorithm, the learning curve consists of noisy decaying exponentials
• The amplitude of the noise usually becomes smaller as the step-size parameter µ is reduced
Comparison of LMS Algorithm with Steepest Descent Contd...

• In the steepest descent algorithm, the correlation matrix R and the cross correlation vector p are obtained through ensemble average operations applied to statistical populations of the tap input and desired response; these values are used to obtain the learning curve of the algorithm
• For the LMS algorithm, noisy learning curves are obtained for an ensemble of adaptive LMS filters with identical parameters; the learning curve is then smoothed by averaging over the ensemble of noisy learning curves
