Statistical Signal Processing
REVIEW OF RANDOM VARIABLES & RANDOM PROCESS
REVIEW OF RANDOM VARIABLES
1.1 Introduction
1.2 Discrete and Continuous Random Variables
1.3 Probability Distribution Function
1.4 Probability Density Function
1.5 Joint Random Variables
1.6 Marginal Density Functions
1.7 Conditional Density Function
1.8 Bayes' Rule for Mixed Random Variables
1.9 Independent Random Variables
1.10 Moments of Random Variables
1.11 Uncorrelated Random Variables
1.12 Linear Prediction of Y from X
1.13 Vector Space Interpretation of Random Variables
1.14 Linear Independence
1.15 Statistical Independence
1.16 Inner Product
1.17 Schwarz Inequality
1.18 Orthogonal Random Variables
1.19 Orthogonality Principle
1.20 Chebyshev Inequality
1.21 Markov Inequality
1.22 Convergence of a Sequence of Random Variables
1.23 Almost Sure (a.s.) Convergence or Convergence with Probability 1
1.24 Convergence in Mean Square Sense
1.25 Convergence in Probability
1.26 Convergence in Distribution
1.27 Central Limit Theorem
1.28 Jointly Gaussian Random Variables
REVIEW OF RANDOM VARIABLES
1.1 Introduction
• Mathematically, a random variable is neither random nor a variable.
• It is a mapping from the sample space into the real line (a "real-valued" random variable) or the complex plane (a "complex-valued" random variable).
Suppose we have a probability space {S, ℑ, P}. Let X : S → ℜ be a function mapping the sample space S into the real line such that for each s ∈ S there exists a unique X(s) ∈ ℜ. Then X is called a random variable. Thus a random variable associates the points in the sample space with real numbers.
[Figure: the mapping s → X(s) from the sample space S to the real line ℜ]
Notations:
• Random variables are represented by upper-case letters.
• Values of a random variable are denoted by lower-case letters.
• X = x means that x is the value of the random variable X.
Example 1: Consider the example of tossing a fair coin twice. The sample space is S= {
HH,HT,TH,TT} and all four outcomes are equally likely. Then we can define a random
variable X as follows
Sample point    Value of the random variable X = x    P{X = x}
HH              0                                      1/4
HT              1                                      1/4
TH              2                                      1/4
TT              3                                      1/4
Example 2: Consider the sample space associated with the single toss of a fair die. The
sample space is given by S = {1, 2,3, 4,5,6} . If we define the random variable X that
associates a real number equal to the number in the face of the die, then X = {1, 2,3, 4,5,6}
1.3 Probability Distribution Function
The probability distribution function of X is FX(x) = P{X ≤ x}. It satisfies:
• FX(−∞) = 0
• FX(∞) = 1
• P{x1 < X ≤ x2} = FX(x2) − FX(x1)
Example 3: Consider the random variable defined in Example 1. The distribution
function FX ( x) is as given below:
FX(x) = 0      for x < 0
      = 1/4    for 0 ≤ x < 1
      = 1/2    for 1 ≤ x < 2
      = 3/4    for 2 ≤ x < 3
      = 1      for x ≥ 3

[Figure: staircase plot of FX(x) versus x]
1.4 Probability Density Function
The probability density function fX(x) = dFX(x)/dx satisfies:
• ∫_{−∞}^{∞} fX(x) dx = 1
• P(x1 < X ≤ x2) = ∫_{x1}^{x2} fX(x) dx
Remark: Using the Dirac delta function we can also define the density function for a discrete random variable.
1.5 Joint random variable
X and Y are two random variables defined on the same sample space S.
P{ X ≤ x, Y ≤ y} is called the joint distribution function and denoted by FX ,Y ( x, y ).
• FX(x) = FX,Y(x, +∞).
  To prove this, note that (X ≤ x) = (X ≤ x) ∩ (Y ≤ +∞), so
  FX(x) = P(X ≤ x) = P(X ≤ x, Y ≤ +∞) = FX,Y(x, +∞).
The joint probability density function is given by
  fX,Y(x, y) = ∂²FX,Y(x, y)/∂x∂y, provided it exists.
• fX,Y(x, y) is always a non-negative quantity.
• FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(u, v) dv du
1.7 Conditional density function
fY/X(y / X = x) = fY/X(y/x) is called the conditional density of Y given X.
Let us define the conditional distribution function. We cannot define the conditional distribution function for continuous random variables X and Y by the relation
  FY/X(y/x) = P(Y ≤ y / X = x) = P(Y ≤ y, X = x) / P(X = x)
as both the numerator and the denominator are zero for the above expression. The conditional distribution function is therefore defined in the limiting sense as follows:
  FY/X(y/x) = lim_{∆x→0} P(Y ≤ y / x < X ≤ x + ∆x)
            = lim_{∆x→0} P(Y ≤ y, x < X ≤ x + ∆x) / P(x < X ≤ x + ∆x)
            = lim_{∆x→0} ∫_{−∞}^{y} fX,Y(x, u) ∆x du / ( fX(x) ∆x )
            = ∫_{−∞}^{y} fX,Y(x, u) du / fX(x)
Differentiating with respect to y,
  ∴ fY/X(y/x) = fX,Y(x, y) / fX(x)        (2)
Similarly, we have
  ∴ fX/Y(x/y) = fX,Y(x, y) / fY(y)        (3)
From (2) and (3) we get Bayes' rule
  fX/Y(x/y) = fX,Y(x, y) / fY(y)
            = fX(x) fY/X(y/x) / fY(y)
            = fX,Y(x, y) / ∫_{−∞}^{∞} fX,Y(x, y) dx        (4)
            = fX(x) fY/X(y/x) / ∫_{−∞}^{∞} fX(u) fY/X(y/u) du
Given the joint density function we can find out the conditional density function.
Example 4:
For random variables X and Y, the joint probability density function is given by
  fX,Y(x, y) = (1 + xy)/4,   |x| ≤ 1, |y| ≤ 1
             = 0             otherwise
Find the marginal densities fX(x), fY(y) and the conditional density fY/X(y/x). Are X and Y independent?
  fX(x) = ∫_{−1}^{1} (1 + xy)/4 dy = 1/2,   −1 ≤ x ≤ 1
Similarly,
  fY(y) = 1/2,   −1 ≤ y ≤ 1
and
  fY/X(y/x) = fX,Y(x, y)/fX(x) = (1 + xy)/2,   |x| ≤ 1, |y| ≤ 1
            = 0 otherwise
Since fY/X(y/x) ≠ fY(y), X and Y are not independent.
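The following Python sketch (not part of the original notes) verifies Example 4 by numerical integration; the grid spacing and the test values of x are arbitrary choices.

    import numpy as np

    # Numerical check of Example 4: f_{X,Y}(x, y) = (1 + xy)/4 on |x| <= 1, |y| <= 1.
    dy = 0.0001
    y = np.arange(-1.0, 1.0, dy) + dy / 2          # midpoints of a fine grid on [-1, 1]

    for x0 in (-0.5, 0.0, 0.7):
        joint = (1.0 + x0 * y) / 4.0
        f_x = np.sum(joint) * dy                   # marginal f_X(x0), should be 1/2
        f_y_given_x = joint / f_x                  # conditional (1 + x0*y)/2
        print(x0, f_x, np.sum(f_y_given_x) * dy)   # the conditional density integrates to 1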
1.8 Bayes' rule for mixed random variables
Let X be a discrete random variable and Y a continuous random variable. Then
  PX/Y(x/y) = lim_{∆y→0} PX/Y(x / y < Y ≤ y + ∆y)
            = lim_{∆y→0} PX,Y(x, y < Y ≤ y + ∆y) / PY(y < Y ≤ y + ∆y)
            = lim_{∆y→0} PX(x) fY/X(y/x) ∆y / ( fY(y) ∆y )
            = PX(x) fY/X(y/x) / fY(y)
            = PX(x) fY/X(y/x) / Σ_x PX(x) fY/X(y/x)
Example 5:
A binary random variable X is transmitted through a channel with additive Gaussian noise of variance σ², so that Y = X + N. Then
  PX/Y(x = 1/y) = PX(x) fY/X(y/x) / Σ_x PX(x) fY/X(y/x)
               = e^{−(y−1)²/2σ²} / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} )
Recall also that fX,Y(x, y) = ∂²FX,Y(x, y)/∂x∂y relates the joint distribution and density functions.
1.10 Moments of Random Variables
• Expectation provides a description of the random variable in terms of a few
parameters instead of specifying the entire distribution function or the density
function
• It is far easier to estimate the expectation of a R.V. from data than to estimate its
distribution
First moment or mean
The mean µX of a random variable X is defined by
  µX = EX = Σ_i x_i P(x_i)                for a discrete random variable X
          = ∫_{−∞}^{∞} x fX(x) dx         for a continuous random variable X
The expectation of a function Y = g(X) is given by
  EY = Eg(X) = ∫_{−∞}^{∞} g(x) fX(x) dx
Second moment
  EX² = ∫_{−∞}^{∞} x² fX(x) dx
Variance
  σX² = ∫_{−∞}^{∞} (x − µX)² fX(x) dx
The ratio
  ρ = E(X − µX)(Y − µY) / √( E(X − µX)² E(Y − µY)² ) = σXY / (σX σY)
is called the correlation coefficient. The correlation coefficient measures how similar two random variables are.
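A minimal Python sketch (added here for illustration) of estimating these moments and the correlation coefficient from samples; the generating model and the correlation value 0.6 are assumptions for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    n, rho = 100_000, 0.6
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # correlated test data

    mu_x, mu_y = x.mean(), y.mean()                  # first moments
    var_x, var_y = x.var(), y.var()                  # second central moments
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()        # covariance sigma_XY
    rho_hat = cov_xy / np.sqrt(var_x * var_y)        # correlation coefficient
    print(mu_x, var_x, rho_hat)                      # approximately 0, 1, 0.6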
1.11 Uncorrelated random variables
Random variables X and Y are uncorrelated if covariance
Cov ( X , Y ) = 0
Two random variables may be dependent and still be uncorrelated. If there is correlation between two random variables, one may be represented as a linear regression of the other. We discuss this point in the next section.

1.12 Linear prediction of Y from X
  Ŷ = aX + b   (regression)
Prediction error: Y − Ŷ
Mean square prediction error:
  E(Y − Ŷ)² = E(Y − aX − b)²
Minimizing the error with respect to a and b gives the optimal values. At the optimum,
  ∂/∂a E(Y − aX − b)² = 0
  ∂/∂b E(Y − aX − b)² = 0
Solving for a and b,
  Ŷ − µY = (σX,Y/σX²)(x − µX)
so that
  Ŷ − µY = ρX,Y (σY/σX)(x − µX),  where ρX,Y = σXY/(σXσY) is the correlation coefficient.
If ρX,Y = 0, then X and Y are uncorrelated,
  ⇒ Ŷ − µY = 0
  ⇒ Ŷ = µY is the best prediction.
Note that independence ⇒ uncorrelatedness. But uncorrelatedness generally does not imply independence (except for jointly Gaussian random variables).
Example 6:
Let Y = X², where X is uniformly distributed on (−1, 1). X and Y are dependent, but they are uncorrelated, because
  Cov(X, Y) = E(X − µX)(Y − µY) = EXY − EX EY = EX³ − 0 = 0   (since EX = 0)
In fact, for any zero-mean symmetric distribution of X, X and X² are uncorrelated.
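A quick Monte Carlo check of Example 6 (added for illustration; the sample size is arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1.0, 1.0, 1_000_000)
    y = x**2                                   # Y is a deterministic function of X (dependent)

    cov = np.mean(x * y) - x.mean() * y.mean() # EXY - EX*EY = EX^3 = 0 for symmetric X
    print(cov)                                 # close to 0: uncorrelated despite dependence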
1.13 Vector space Interpretation of Random Variables
The set of all random variables defined on a sample space form a vector space with
respect to addition and scalar multiplication. This is very easy to verify.
The norm induced by the inner product is ||x||² = <x, x>.
• The set of random variables, with the inner product defined through the joint expectation <X, Y> = EXY and the corresponding norm, defines a Hilbert space.
1.17 Schwarz Inequality
For any two vectors x and y belonging to a Hilbert space V,
  |<x, y>| ≤ ||x|| ||y||
For random variables X and Y,
  E²(XY) ≤ EX² EY²
Proof:
Consider the random variable Z = aX + Y. Then
  EZ² = E(aX + Y)² ≥ 0
  ⇒ a²EX² + 2aEXY + EY² ≥ 0
Non-negativity of the left-hand side implies that its minimum over a must also be non-negative. At the minimum,
  dEZ²/da = 0  ⇒  a = −EXY/EX²
so the corresponding minimum value is
  EY² − E²(XY)/EX²
Since the minimum is non-negative,
  EY² − E²(XY)/EX² ≥ 0
  ⇒ E²(XY) ≤ EX² EY²
  ρ(X, Y) = Cov(X, Y)/(σX σY) = E(X − µX)(Y − µY) / √( E(X − µX)² E(Y − µY)² )
Suppose we wish to estimate X by a function X̂(Y) of the observation Y so as to minimize the mean square error E(X − X̂(Y))². The corresponding estimation principle is called the minimum mean square error principle. For finding the minimum, we have
  ∂/∂X̂ E(X − X̂(Y))² = 0
  ⇒ ∂/∂X̂ ∫∫ (x − X̂(y))² fX,Y(x, y) dy dx = 0
  ⇒ ∂/∂X̂ ∫∫ (x − X̂(y))² fY(y) fX/Y(x/y) dy dx = 0
  ⇒ ∂/∂X̂ ∫ fY(y) ( ∫ (x − X̂(y))² fX/Y(x/y) dx ) dy = 0
  ⇒ X̂(y) = E(X / Y = y)
Thus the minimum mean-square error estimator involves the conditional expectation, which is difficult to obtain numerically.
Let us consider a simpler version of the problem. We assume that X̂(y) = ay and the estimation problem is to find the optimal value of a. Thus we have the linear minimum mean-square error criterion, which minimizes E(X − aY)²:
  d/da E(X − aY)² = 0
  ⇒ E d/da (X − aY)² = 0
  ⇒ E(X − aY)Y = 0
  ⇒ E eY = 0
where e = X − aY is the estimation error.
The above result shows that for the linear minimum mean-square error criterion,
estimation error is orthogonal to data. This result helps us in deriving optimal filters to
estimate a random signal buried in noise.
The mean and variance also give some quantitative information about the bounds of a random variable. The following inequalities are extremely useful in many practical problems.

1.20 Chebyshev Inequality
  P{ |X − µX| ≥ ε } ≤ σX²/ε²
Proof:
  σX² = ∫_{−∞}^{∞} (x − µX)² fX(x) dx
      ≥ ∫_{|x − µX| ≥ ε} (x − µX)² fX(x) dx
      ≥ ∫_{|x − µX| ≥ ε} ε² fX(x) dx
      = ε² P{ |X − µX| ≥ ε }
  ∴ P{ |X − µX| ≥ ε } ≤ σX²/ε²

1.21 Markov Inequality
For a non-negative random variable X and a > 0,
  E(X) = ∫_{0}^{∞} x fX(x) dx ≥ ∫_{a}^{∞} x fX(x) dx ≥ a ∫_{a}^{∞} fX(x) dx = a P{X ≥ a}
  ∴ P{X ≥ a} ≤ E(X)/a
Result: P{(X − k)² ≥ a} ≤ E(X − k)²/a
1.22 Convergence of a sequence of random variables
Consider a sequence of independent and identically distributed random variables. Suppose we want to estimate the mean of the random variable on the basis of the observed data by means of the relation
  µ̂X = (1/N) Σ_{i=1}^{N} X_i
How closely does µ̂X represent µX as N is increased? How do we measure the closeness of µ̂X to µX?
Consider a deterministic sequence x1, x2, ..., xn, .... The sequence converges to a limit x if, corresponding to any ε > 0, we can find a positive integer m such that |x − xn| < ε for n > m.
1.23 Almost sure (a.s.) convergence or convergence with probability 1
{Xn → X} is an event. If
  P{ s | Xn(s) → X(s) } = 1 as n → ∞,
equivalently
  P{ s : |Xn(s) − X(s)| < ε for all n ≥ m } → 1 as m → ∞,
then the sequence is said to converge to X almost surely or with probability 1.
1.24 Convergence in mean square sense
If E ( X n − X ) 2 → 0 as n → ∞, we say that the sequence converges to X in mean
square (M.S).
Example 7:
If X1, X2, ..., Xn, ... are iid random variables, then
  (1/n) Σ_{i=1}^{n} X_i → µX in the mean square sense as n → ∞.
We have to show that
  lim_{n→∞} E( (1/n) Σ_{i=1}^{n} X_i − µX )² = 0
Now,
  E( (1/n) Σ_{i=1}^{n} X_i − µX )² = E( (1/n) Σ_{i=1}^{n} (X_i − µX) )²
    = (1/n²) Σ_{i=1}^{n} E(X_i − µX)² + (1/n²) Σ_{i=1}^{n} Σ_{j≠i} E(X_i − µX)(X_j − µX)
    = nσX²/n² + 0      (because of independence)
    = σX²/n
  ∴ lim_{n→∞} E( (1/n) Σ_{i=1}^{n} X_i − µX )² = 0
1.25 Convergence in probability
The sequence {Xn} converges to X in probability if, for every ε > 0,
  P{ |Xn − X| > ε } → 0 as n → ∞.
Example 8:
Suppose {Xn} is a sequence of random variables with
  P(Xn = 1) = 1 − 1/n   and   P(Xn = −1) = 1/n
Clearly
  P{ |Xn − 1| > ε } = P{Xn = −1} = 1/n → 0 as n → ∞.
Therefore {Xn} converges in probability to X = 1.
1.26 Convergence in distribution
The sequence {Xn} converges to X in distribution if
  FXn(x) → FX(x) as n → ∞
at every point of continuity of FX.

1.27 Central Limit Theorem
Let Y = X1 + X2 + ... + Xn. Then
  µY = µX1 + µX2 + ... + µXn
and
  σY² = σX1² + σX2² + ... + σXn²
The central limit theorem states that under very general conditions the distribution of Y converges to N(µY, σY²) as n → ∞. The conditions are:
1. The random variables X1, X2, ..., Xn are independent with the same mean and the same variance.
1.28 Jointly Gaussian Random Variables
Two random variables X and Y are called jointly Gaussian if their joint density function is
  fX,Y(x, y) = A exp{ −1/(2(1 − ρ²X,Y)) [ (x − µX)²/σX² − 2ρX,Y (x − µX)(y − µY)/(σXσY) + (y − µY)²/σY² ] }
where
  A = 1 / ( 2π σX σY √(1 − ρ²X,Y) )
Properties:
(1) If X and Y are jointly Gaussian, then for any constants a and b the random variable Z = aX + bY is Gaussian with mean µZ = aµX + bµY and variance
  σZ² = a²σX² + b²σY² + 2ab σX σY ρX,Y
(2) If two jointly Gaussian random variables are uncorrelated (ρX,Y = 0), then they are statistically independent:
  fX,Y(x, y) = fX(x) fY(y) in this case.
(4) If X and Y are jointly Gaussian random variables, then the optimum nonlinear estimator X̂ of X that minimizes the mean square error ξ = E[(X − X̂)²] is a linear estimator X̂ = aY.
REVIEW OF RANDOM PROCESS
2.1 Introduction
Recall that a random variable maps each sample point in the sample space to a point in
the real line. A random process maps each sample point to a waveform.
• A random process can be defined as an indexed family of random variables
{ X (t ), t ∈ T } where T is an index set which may be discrete or continuous usually
denoting time.
• The random process is defined on a common probability space {S , ℑ, P}.
• A random process is a function of the sample point ξ and index variable t and
may be written as X (t , ξ ).
• For a fixed t (= t 0 ), X (t 0 , ξ ) is a random variable.
[Figure: sample functions X(t, s1), X(t, s2), X(t, s3) of a random process plotted against t]
Figure: Random Process
2.2 How to describe a random process?
At any finite set of time instants t1, t2, ..., tn, the samples X(t1), X(t2), ..., X(tn) are n random variables. Thus a random process can be described by the joint distribution function
  F_{X(t1), X(t2), ..., X(tn)}(x1, x2, ..., xn) = F(x1, x2, ..., xn, t1, t2, ..., tn),  ∀n ∈ N and ∀t_i ∈ T
Example 1:
A random process X(t) is Gaussian if, for every n, X = [X(t1), X(t2), ..., X(tn)]' is jointly Gaussian with the joint density function
  f_{X(t1), X(t2), ..., X(tn)}(x1, x2, ..., xn) = exp( −(1/2)(x − µX)' C_X^{−1} (x − µX) ) / ( (2π)^{n/2} √det(C_X) )
where C_X = E(X − µX)(X − µX)'.
For a (strict-sense) stationary process, the joint distribution is invariant to a shift of the time origin.
For n = 1,
  F_{X(t1)}(x1) = F_{X(t1 + t0)}(x1)  ∀t0 ∈ T
In particular,
  F_{X(t1)}(x1) = F_{X(0)}(x1)
  ⇒ EX(t1) = EX(0) = µX(0) = constant
For n = 2,
  F_{X(t1), X(t2)}(x1, x2) = F_{X(t1 + t0), X(t2 + t0)}(x1, x2)
Putting t0 = −t2,
  F_{X(t1), X(t2)}(x1, x2) = F_{X(t1 − t2), X(0)}(x1, x2)
  ⇒ RX(t1, t2) = RX(t1 − t2)
A random process X(t) is called wide-sense stationary (WSS) if
  µX(t) = constant
and
  RX(t1, t2) = RX(t1 − t2), i.e. the autocorrelation is a function of the time lag only.
For a Gaussian random process, WSS implies strict-sense stationarity, because such a process is completely described by the mean and the autocorrelation functions.
The autocorrelation function R X (τ ) = EX (t + τ ) X (t ) is a crucial quantity for a WSS
process.
• RX (0) = EX 2 (t ) is the mean-square value of the process.
• |RX(τ)| ≤ RX(0). By the Schwarz inequality,
  ( EX(t + τ)X(t) )² ≤ EX²(t) EX²(t + τ) = RX(0) RX(0) = RX²(0)
  ∴ |RX(τ)| ≤ RX(0)
• RX(τ) is a positive semi-definite function, in the sense that for any positive integer n and real a1, ..., an,
  Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j RX(t_i, t_j) ≥ 0
• If X(t) is periodic (in the mean square sense), then RX(τ) is also periodic.
Power spectral density. Let X_T(ω) be the Fourier transform of the process truncated to |t| ≤ T. Then
  E[ X_T(ω) X_T*(ω) ]/2T = E|X_T(ω)|²/2T
    = (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} EX(t1)X(t2) e^{−jωt1} e^{+jωt2} dt1 dt2
    = (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} RX(t1 − t2) e^{−jω(t1 − t2)} dt1 dt2
    = (1/2T) ∫_{−2T}^{2T} RX(τ) e^{−jωτ} (2T − |τ|) dτ
    = ∫_{−2T}^{2T} RX(τ) e^{−jωτ} (1 − |τ|/2T) dτ
If RX(τ) is absolutely integrable, then as T → ∞
  lim_{T→∞} E|X_T(ω)|²/2T = ∫_{−∞}^{∞} RX(τ) e^{−jωτ} dτ
This limit is defined as the power spectral density S_X(ω) of the WSS process X(t).
Properties
• EX²(t) = RX(0) = (1/2π) ∫_{−∞}^{∞} S_X(ω) dω = average power of the process.
• The average power in the band (ω1, ω2) is (1/2π) ∫_{ω1}^{ω2} S_X(ω) dω.

The cross-correlation function of two jointly WSS processes X(t) and Y(t) is defined as
  R_{X,Y}(τ) = E X(t + τ) Y(t)
so that
  R_{Y,X}(τ) = E Y(t + τ) X(t) = E X(t) Y(t + τ) = R_{X,Y}(−τ)
  ∴ R_{Y,X}(τ) = R_{X,Y}(−τ)
Cross power spectral density
  S_{X,Y}(ω) = ∫_{−∞}^{∞} R_{X,Y}(τ) e^{−jωτ} dτ
with
  S_{X,Y}(ω) = S*_{Y,X}(ω)

For a discrete-time WSS process X[n], the power spectral density S_X(ω) is defined over −π ≤ ω ≤ π and the autocorrelation sequence is recovered by
  R_X[m] = (1/2π) ∫_{−π}^{π} S_X(ω) e^{jωm} dω
For a discrete sequence the generalized PSD is defined in the z-domain as
  S_X(z) = Σ_{m=−∞}^{∞} R_X[m] z^{−m}
Example (2):
  R_X[m] = a^{|m|},  a > 0
  S_X(ω) = (1 − a²)/(1 − 2a cos ω + a²),  −π ≤ ω ≤ π
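The transform pair above can be checked numerically; the following sketch (added for illustration, with a = 0.6 chosen arbitrarily) compares a truncated DTFT of a^{|m|} with the closed-form spectrum.

    import numpy as np

    a, M = 0.6, 200
    m = np.arange(-M, M + 1)
    r = a ** np.abs(m)                                   # R_X[m] = a^|m|

    w = np.linspace(-np.pi, np.pi, 1001)
    # Truncated DTFT of the autocorrelation sequence
    S_numeric = np.array([np.sum(r * np.exp(-1j * wk * m)).real for wk in w])
    S_closed = (1 - a**2) / (1 - 2 * a * np.cos(w) + a**2)

    print(np.max(np.abs(S_numeric - S_closed)))          # small truncation error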
[Figure: white noise passed through a linear system produces a WSS random signal]

2.7 White Noise Sequence
For a white noise sequence x[n],
  S_X(ω) = N/2,  −π ≤ ω ≤ π
Therefore
  R_X[m] = (N/2) δ[m]
where δ[m] is the unit impulse sequence.
[Figure: R_X[m] is an impulse of height N/2 at m = 0; S_X(ω) is flat at the level N/2 over −π ≤ ω ≤ π]

2.8 Linear Shift Invariant System with Random Inputs
Consider a discrete-time linear shift-invariant system with impulse response h[n], input x[n] and output y[n].
• Note that though the input may be an uncorrelated process, the output is a correlated process.
For a WSS random input sequence x[n], the output autocorrelation is
  R_Y[m] = R_X[m] * h[m] * h[−m]
Taking the z-transform, we get
  S_Y(z) = S_X(z) H(z) H(z^{−1})
Example 4:
If H(z) = 1/(1 − αz^{−1}) and x[n] is a unity-variance white-noise sequence (S_X(z) = 1), then
  S_Y(z) = H(z) H(z^{−1}) = ( 1/(1 − αz^{−1}) )( 1/(1 − αz) )
By partial fraction expansion and inverse z-transform, we get
  R_Y[m] = α^{|m|} / (1 − α²)
2.9 Spectral Factorization Theorem
A stationary random signal X[n] that satisfies the Paley-Wiener condition
  ∫_{−π}^{π} | ln S_X(ω) | dω < ∞
can be considered as the output of a linear filter fed by a white noise sequence.
If S_X(ω) is an analytic function of ω and ∫_{−π}^{π} | ln S_X(ω) | dω < ∞, then
  S_X(z) = σ_v² H_c(z) H_a(z)
where H_c(z) is the causal minimum-phase transfer function and H_a(z) = H_c(z^{−1}) is the corresponding anticausal (maximum-phase) factor.

[Figure: white noise v[n] filtered by H_c(z) produces X[n]; 1/H_c(z) acts as the whitening filter]

Define the cepstral coefficients
  c[k] = (1/2π) ∫_{−π}^{π} ln S_XX(ω) e^{jωk} dω,   the kth-order cepstral coefficient.
For a real signal c[k] = c[−k], and
  c[0] = (1/2π) ∫_{−π}^{π} ln S_XX(ω) dω
Then
  S_XX(z) = exp( Σ_{k=−∞}^{∞} c[k] z^{−k} )
          = exp( Σ_{k=1}^{∞} c[k] z^{−k} ) · e^{c[0]} · exp( Σ_{k=−∞}^{−1} c[k] z^{−k} )
Let
  H_C(z) = exp( Σ_{k=1}^{∞} c[k] z^{−k} ),   |z| > ρ
         = 1 + h_c[1] z^{−1} + h_c[2] z^{−2} + ...
( since h_c[0] = lim_{z→∞} H_C(z) = 1 ). H_C(z) and ln H_C(z) are both analytic outside |z| = ρ, so H_C(z) is a minimum-phase filter. Similarly,
  exp( Σ_{k=−∞}^{−1} c[k] z^{−k} ) = exp( Σ_{k=1}^{∞} c[k] z^{k} ) = H_C(z^{−1}),   |z| < 1/ρ
Therefore,
  S_XX(z) = σ_V² H_C(z) H_C(z^{−1})
where σ_V² = e^{c[0]}.
Salient points
• S_XX(z) can be factorized into a minimum-phase factor H_C(z) and a maximum-phase factor H_C(z^{−1}).
• In general spectral factorization is difficult; however, for a signal with a rational power spectrum it can be done easily.
• Since H_C(z) is a minimum-phase filter, 1/H_C(z) exists and is stable; therefore we can pass the given signal through the whitening filter 1/H_C(z) to obtain the innovation sequence.
• X[n] and v[n] are related through an invertible transform, so they contain the same information.
• A regular random process can thus be generated by a linear filter with a white noise sequence as input; a predictable process X_p[n], in contrast, can be predicted from its own past with zero error.
3.1 Introduction
The spectral factorization theorem enables us to model a regular random process as
an output of a linear filter with white noise as input. Different models are developed
using different forms of linear filters.
• These models are mathematically described by linear constant coefficient
difference equations.
• In statistics, random-process modeling using difference equations is known as
time series analysis.
[Figure: for white noise v[n], R_V[m] is an impulse of height σ_V² at m = 0 and S_V(ω) = σ_V²/2π is flat]

Moving average (MA) model: the white noise v[n] is passed through an FIR filter to generate
  X[n] = Σ_{i=0}^{q} b_i v[n − i]
  µ_v = 0 ⇒ µ_X = 0
and, since v[n] is an uncorrelated sequence,
  σ_X² = Σ_{i=0}^{q} b_i² σ_V²
The autocorrelations are given by
  R_X[m] = E X[n] X[n − m]
         = Σ_{i=0}^{q} Σ_{j=0}^{q} b_i b_j E v[n − i] v[n − m − j]
         = Σ_{i=0}^{q} Σ_{j=0}^{q} b_i b_j R_V[m − i + j]
Since R_V[m − i + j] = σ_V² only when m − i + j = 0, i.e. i = m + j,
  R_X[m] = σ_V² Σ_{j=0}^{q−|m|} b_{|m|+j} b_j,   |m| ≤ q
         = 0                                      otherwise
and R_X[−m] = R_X[m].
Notice that R_X[m] is related to the model parameters by a nonlinear relationship; thus finding the model parameters is not simple.
The power spectral density is given by
  S_X(ω) = (σ_V²/2π) |B(ω)|²,  where B(ω) = b_0 + b_1 e^{−jω} + ... + b_q e^{−jqω}
An FIR system contributes only zeros, so if the spectrum has valleys then an MA model will fit well.
[Figure: autocorrelation function R_X[m] of an MA process, zero beyond lag q]
[Figure: power spectrum S_X(ω) of an MA process]
From above b0 and b1 can be calculated using the variance and autocorrelation at lag 1 of
the signal.
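For an MA(1) model with unit-variance noise, R_X[0] = b_0² + b_1² and R_X[1] = b_0 b_1; the following sketch (added for illustration, with assumed autocorrelation values) inverts these two equations for b_0 and b_1.

    import numpy as np

    # Assumed (illustrative) values for an MA(1) model X[n] = b0 v[n] + b1 v[n-1],
    # unit-variance white noise: R_X[0] = b0^2 + b1^2, R_X[1] = b0*b1.
    R0, R1 = 1.25, 0.5

    # Eliminate b1 = R1/b0:  b0^4 - R0*b0^2 + R1^2 = 0  (quadratic in b0^2)
    b0_sq = (R0 + np.sqrt(R0**2 - 4 * R1**2)) / 2   # root with |b0| >= |b1|
    b0 = np.sqrt(b0_sq)
    b1 = R1 / b0
    print(b0, b1)                                    # 1.0 and 0.5 for these values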
Autoregressive (AR) model: the white noise v[n] is passed through an all-pole IIR filter,
  X[n] = Σ_{i=1}^{p} a_i X[n − i] + v[n]
with a_0 = 1 (all-pole model) and
  S_X(ω) = σ_V² / ( 2π |A(ω)|² )
If there are sharp peaks in the spectrum, the AR(p) model may be suitable.

[Figure: power spectrum S_X(ω) and autocorrelation R_X[m] of an AR process]

Multiplying the model equation by X[n − m] and taking expectations gives
  R_X[m] = Σ_{i=1}^{p} a_i R_X[m − i] + σ_V² δ[m],   m ≥ 0
The above relation gives a set of linear equations which can be solved to find the a_i's.
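A minimal sketch (added for illustration) of solving these Yule-Walker equations from estimated autocorrelations; the AR(2) coefficients used to generate the test data are assumed values.

    import numpy as np

    rng = np.random.default_rng(3)
    a_true, N = np.array([0.75, -0.5]), 200_000        # assumed AR(2) coefficients
    v = rng.standard_normal(N)
    x = np.zeros(N)
    for n in range(2, N):
        x[n] = a_true @ x[n-2:n][::-1] + v[n]          # x[n] = a1 x[n-1] + a2 x[n-2] + v[n]

    # Biased autocorrelation estimates R_X[0..p]
    p = 2
    r = np.array([np.dot(x[:N-m], x[m:]) / N for m in range(p + 1)])

    # Yule-Walker equations: R a = r, R being the Toeplitz matrix of R_X[0..p-1]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a_hat = np.linalg.solve(R, r[1:])
    sigma_v2 = r[0] - a_hat @ r[1:]
    print(a_hat, sigma_v2)      # close to [0.75, -0.5] and 1.0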
ARMA model: more generally, the white noise v[n] drives a pole-zero filter
  H(ω) = B(ω)/A(ω)
so that
  S_X(ω) = (σ_V²/2π) |B(ω)|² / |A(ω)|²
The model can also be written in state-space form with
  z[n] = [X[n] X[n − 1] ... X[n − p]]′
  A the companion matrix whose first row is [a_1 a_2 ... a_p]
  B = [1 0 ... 0]′  and
  C = [b_0 b_1 ... b_q]
4.1 Introduction
• For speech, we have LPC (linear predictive code) model, the LPC-parameters are
to be estimated from observed data.
• We may have to estimate the correct value of a signal from the noisy observation.
[Figure: an array of sensors receiving signals generated by the mechanical movements of a submarine]
An estimator θ̂(X) is a rule by which we guess the value of an unknown parameter θ on the basis of the observations X.
θ̂(X) is random, being a function of random variables. For a particular observation x1, x2, ..., xN we get what is known as an estimate (not an estimator).
Let X1, X2, ..., XN be a sequence of independent and identically distributed (iid) random variables. Then
  µ̂_X = (1/N) Σ_{i=1}^{N} X_i is an estimator for µ_X
and
  σ̂_X² = (1/N) Σ_{i=1}^{N} (X_i − µ̂_X)² is an estimator for σ_X².
An estimator is a function of the random sequence X1, X2, ..., XN that does not involve any unknown parameters. Such a function is generally called a statistic.
Consider also the estimator
  σ̂_2² = (1/(N − 1)) Σ_{i=1}^{N} (X_i − µ̂_X)²
Now
  E Σ_{i=1}^{N} (X_i − µ̂_X)² = E Σ_{i=1}^{N} { (X_i − µ_X)² + (µ_X − µ̂_X)² + 2(X_i − µ_X)(µ_X − µ̂_X) }
with
  E(X_i − µ_X)² = σ²
and
  E(µ_X − µ̂_X)² = E( µ_X − (1/N)Σ X_i )²
                = (1/N²) E( Nµ_X − Σ X_i )²
                = (1/N²) E( Σ (X_i − µ_X) )²
                = (1/N²) [ Σ E(X_i − µ_X)² + Σ_i Σ_{j≠i} E(X_i − µ_X)(X_j − µ_X) ]
                = (1/N²) Σ E(X_i − µ_X)²   (because of independence)
                = σ_X²/N
Also
  E(X_i − µ_X)(µ_X − µ̂_X) = −E(X_i − µ_X)²/N = −σ²/N
  ∴ E Σ_{i=1}^{N} (X_i − µ̂_X)² = Nσ² + σ² − 2σ² = (N − 1)σ²
so that
  E σ̂_2² = (1/(N − 1)) E Σ (X_i − µ̂_X)² = σ²
∴ σ̂_2² is an unbiased estimator of σ².
Similarly, the sample mean is an unbiased estimator:
  µ̂_X = (1/N) Σ_{i=1}^{N} X_i
  E µ̂_X = (1/N) Σ_{i=1}^{N} E X_i = Nµ_X/N = µ_X
The mean square error (MSE) bounds the estimation error probability through the inequality
  P( |θ̂ − θ| ≥ ε ) ≤ E(θ̂ − θ)²/ε²
so an estimator whose bias and variance both tend to zero as the sample size grows is consistent.
Let X1, ..., XN be iid with mean µ_X and variance σ_X², and let µ̂_X = (1/N) Σ_{i=1}^{N} X_i be an estimator for µ_X. We have already shown that µ̂_X is unbiased, and
  var(µ̂_X) = σ_X²/N
Is it a consistent estimator? Clearly
  lim_{N→∞} var(µ̂_X) = lim_{N→∞} σ_X²/N = 0.
Therefore µ̂_X is a consistent estimator of µ_X.
Sufficient statistic: let X1, ..., XN be iid N(µ_X, 1). Then µ̂_X = (1/N) Σ_{i=1}^{N} X_i is a sufficient statistic for µ_X:
  f_{X1, X2, ..., XN / µ_X}(x1, x2, ..., xN) = Π_{i=1}^{N} (1/√(2π)) e^{−(x_i − µ_X)²/2}
    = (1/(√(2π))^N) e^{ −(1/2) Σ_{i=1}^{N} (x_i − µ_X)² }
    = (1/(√(2π))^N) e^{ −(1/2) Σ_{i=1}^{N} (x_i − µ̂_X + µ̂_X − µ_X)² }
    = (1/(√(2π))^N) e^{ −(1/2) Σ_{i=1}^{N} [ (x_i − µ̂_X)² + (µ̂_X − µ_X)² + 2(x_i − µ̂_X)(µ̂_X − µ_X) ] }
    = (1/(√(2π))^N) e^{ −(1/2) Σ_{i=1}^{N} (x_i − µ̂_X)² } e^{ −(N/2)(µ̂_X − µ_X)² } · e^0   (why?)
The dependence on µ_X enters only through µ̂_X, so µ̂_X is a sufficient statistic for µ_X.
The joint density f_X(x1, ..., xN /θ), viewed as a function of θ, is also called the likelihood function. θ may also be random; in that case the likelihood function represents the conditional joint density function.
  L(x/θ) = ln f_X(x1, ..., xN /θ) is called the log-likelihood function.
4.9 Statement of the Cramer Rao theorem
Let θ̂ be an unbiased estimator of θ based on the observation X with density f_X(x/θ). Then
  var(θ̂) ≥ 1/I(θ),   where I(θ) = E(∂L/∂θ)² = −E(∂²L/∂θ²)
is the Fisher information.
Proof: unbiasedness gives E(θ̂ − θ) = 0; differentiating with respect to θ,
  ∫ (θ̂ − θ) (∂/∂θ) f_X(x/θ) dx = ∫ f_X(x/θ) dx = 1        (1)
Note that
  (∂/∂θ) f_X(x/θ) = { (∂/∂θ) ln f_X(x/θ) } f_X(x/θ) = (∂L/∂θ) f_X(x/θ)
so that
  { ∫ (θ̂ − θ) √f_X(x/θ) · (∂L(x/θ)/∂θ) √f_X(x/θ) dx }² = 1        (2)
since f_X(x/θ) ≥ 0. Recall the Cauchy-Schwarz inequality
  <a, b>² ≤ ||a||² ||b||²
where equality holds when a = c b for some scalar c. Applying this inequality to the left-hand side of (2),
  1 ≤ ∫ (θ̂ − θ)² f_X(x/θ) dx · ∫ ( ∂L(x/θ)/∂θ )² f_X(x/θ) dx = var(θ̂) I(θ)
which is the Cramer-Rao bound. Also, from ∫ f_X(x/θ) dx = 1 we get
  ∫ (∂/∂θ) f_X(x/θ) dx = 0   ⇒   ∫ (∂L/∂θ) f_X(x/θ) dx = 0
Taking the partial derivative with respect to θ again,
  ∫ { (∂²L/∂θ²) f_X(x/θ) + (∂L/∂θ)(∂/∂θ) f_X(x/θ) } dx = 0
  ∴ ∫ { (∂²L/∂θ²) f_X(x/θ) + (∂L/∂θ)² f_X(x/θ) } dx = 0
  ⇒ E(∂L/∂θ)² = −E(∂²L/∂θ²)
If θ̂ satisfies the CR bound with equality, then θ̂ is called an efficient estimator; equality holds when ∂L/∂θ = c(θ)(θ̂ − θ).
Remarks:
(1) The larger the information I(θ), the smaller the variance of the estimator θ̂ can be.
(2) Suppose X1, ..., XN are iid. Then
  I_1(θ) = E( (∂/∂θ) ln f_{X1/θ}(x) )²
and
  I_N(θ) = E( (∂/∂θ) ln f_{X1, X2, ..., XN /θ}(x1, x2, ..., xN) )² = N I_1(θ)
Example 3:
Let X1, ..., XN be an iid Gaussian random sequence with known variance σ² and unknown mean µ. The estimator
  µ̂ = (1/N) Σ_{i=1}^{N} X_i
is unbiased. Here
  f_X(x1, x2, ..., xN /µ) = ( 1/((√(2π))^N σ^N) ) e^{ −(1/2σ²) Σ_{i=1}^{N} (x_i − µ)² }
so that
  L(X/µ) = −ln( (√(2π))^N σ^N ) − (1/2σ²) Σ_{i=1}^{N} (x_i − µ)²
Now
  ∂L/∂µ = (1/σ²) Σ_{i=1}^{N} (X_i − µ)
  ∂²L/∂µ² = −N/σ²
so that
  E(∂²L/∂µ²) = −N/σ²
  ∴ CR bound = 1/I(µ) = σ²/N
Further,
  ∂L/∂µ = (1/σ²) Σ_{i=1}^{N} (X_i − µ) = (N/σ²)( (1/N) Σ_{i=1}^{N} X_i − µ ) = (N/σ²)( µ̂ − µ )
which is of the form c(θ)(θ̂ − θ); hence µ̂ is an efficient estimator.
and µˆ is an efficient estimator.
The maximum likelihood estimate (MLE) θ̂_MLE maximizes the likelihood function; it satisfies
  ∂/∂θ f_X(x1, ..., xN /θ) |_{θ̂_MLE} = 0
or
  ∂L(x|θ)/∂θ |_{θ̂_MLE} = 0
Thus the MLE is given by the solution of the likelihood equation above.
If we have a number of unknown parameters, θ = [θ1 θ2 ... θM]′, then the MLE is given by the set of conditions
  ∂L/∂θ1 |_{θ1 = θ̂1,MLE} = ∂L/∂θ2 |_{θ2 = θ̂2,MLE} = ... = ∂L/∂θM |_{θM = θ̂M,MLE} = 0
Example 4:
Let X1, ..., XN be an independent and identically distributed sequence of N(µ, σ²) random variables. The log-likelihood is
  L(x|µ, σ) = −(N/2) ln(2πσ²) − (1/2σ²) Σ_{i=1}^{N} (x_i − µ)²
Setting ∂L/∂µ = 0 and
  ∂L/∂σ = −N/σ̂_MLE + Σ_{i=1}^{N} (x_i − µ̂_MLE)² / σ̂³_MLE = 0
and solving, we get
  µ̂_MLE = (1/N) Σ_{i=1}^{N} x_i
and
  σ̂²_MLE = (1/N) Σ_{i=1}^{N} (x_i − µ̂_MLE)²
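The following Python sketch (added for illustration) computes these two MLEs from simulated data; the assumed true values µ = 2 and σ = 3 and the sample size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(2.0, 3.0, 10_000)          # iid N(mu, sigma^2) samples (assumed values)

    mu_mle = x.mean()                         # (1/N) sum x_i
    sigma2_mle = np.mean((x - mu_mle) ** 2)   # (1/N) sum (x_i - mu_mle)^2
    print(mu_mle, sigma2_mle)                 # close to 2 and 9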
Example 5:
Let X1, ..., XN be an independent and identically distributed sequence with
  f_{X/θ}(x) = (1/2) e^{−|x−θ|},  −∞ < x < ∞
Show that the median of X1, ..., XN is the MLE of θ.
  f_{X1, X2, ..., XN /θ}(x1, x2, ..., xN) = (1/2^N) e^{ −Σ_{i=1}^{N} |x_i − θ| }
  L(X/θ) = ln f_{X/θ}(x1, ..., xN) = −N ln 2 − Σ_{i=1}^{N} |x_i − θ|
Σ_{i=1}^{N} |x_i − θ| is minimized by θ = median(X1, ..., XN), so θ̂_MLE is the sample median.
Some properties of MLE (without proof)
• MLE may be biased or unbiased, but it is asymptotically unbiased.
• MLE is a consistent estimator.
• If an efficient estimator exists, it is the MLE. An efficient estimator θ̂ exists ⇒
    ∂L(x/θ)/∂θ = c(θ̂ − θ)
  so at θ = θ̂,
    ∂L(x/θ)/∂θ |_{θ̂} = c(θ̂ − θ̂) = 0,
  i.e. θ̂ satisfies the likelihood equation and is therefore the MLE.
• The MLE of an invertible function of θ is that function of θ̂_MLE (invariance property).
[Figure: the parameter θ, with prior density f_Θ(θ), gives rise to the observation x with conditional density f(x/θ)]
In the Bayesian approach the unknown parameter Θ is itself random with a priori density f_Θ(θ), and the joint density of the observation and the parameter is
  f_{X,Θ}(x, θ) = f_Θ(θ) f_{X/Θ}(x/θ)
We associate a cost function C(θ̂, θ) with every estimator θ̂; it represents the positive penalty for each wrong estimate.
The Bayes risk of an estimator is the average cost
  C̄ = E C(θ̂, θ) = ∫∫ C(θ̂, θ) f_{X,Θ}(x, θ) dx dθ
and the estimator seeks to minimize this Bayesian risk.

Case I: Quadratic cost function C(θ̂, θ) = (θ̂ − θ)²
[Figure: quadratic cost as a function of the error ε = θ̂ − θ]
Minimize
  ∫∫ (θ − θ̂)² f_{X,Θ}(x, θ) dx dθ
with respect to θ̂. This is equivalent to minimizing
  ∫∫ (θ − θ̂)² f(θ|x) f(x) dθ dx = ∫ ( ∫ (θ − θ̂)² f(θ|x) dθ ) f(x) dx
Since f(x) is always positive, the above integral will be minimum if the inner integral is minimum. This results in the problem: minimize
  ∫ (θ − θ̂)² f(θ|x) dθ
with respect to θ̂.
  ⇒ (∂/∂θ̂) ∫ (θ̂ − θ)² f_{Θ/X}(θ) dθ = 0
  ⇒ 2 ∫ (θ̂ − θ) f_{Θ/X}(θ) dθ = 0
  ⇒ θ̂ ∫ f_{Θ/X}(θ) dθ = ∫ θ f_{Θ/X}(θ) dθ
  ⇒ θ̂ = ∫ θ f_{Θ/X}(θ) dθ
∴ θ̂ is the conditional mean, i.e. the mean of the a posteriori density. Since we are minimizing a quadratic cost, it is also called the minimum mean square error (MMSE) estimator.
Salient points
• Information about the distribution of θ is available.
• The a priori density f_Θ(θ) is available, and f_{X/Θ}(x/θ) describes how the observed data depend on θ.
• We have to determine the a posteriori density f_{Θ/X}(θ); this is obtained from the Bayes rule.
Case II: Hit-or-Miss Cost Function
Here the cost is zero when |θ̂ − θ| < ∆/2 and one otherwise.
  Risk C̄ = E C(θ̂, θ) = ∫∫ C(θ̂, θ) f_{X,Θ}(x, θ) dx dθ
As before, we have to minimize
  ∫ C(θ̂, θ) f_{Θ/X}(θ) dθ
with respect to θ̂, which is equivalent to maximizing
  ∫_{θ̂−∆/2}^{θ̂+∆/2} f_{Θ/X}(θ) dθ ≅ ∆ f_{Θ/X}(θ̂)   when ∆ is very small.
This is maximum if f_{Θ/X}(θ̂) is maximum. That means we select the value of θ̂ that maximizes the a posteriori density, so this is known as the maximum a posteriori (MAP) estimation principle. This estimator is denoted by θ̂_MAP.
Case III: Absolute-Error Cost Function
  C(θ̂, θ) = |θ̂ − θ|
[Figure: absolute-error cost as a function of ε = θ̂ − θ]
  C̄ = average cost = E|θ̂ − θ|
    = ∫∫ |θ̂ − θ| f_{Θ,X}(θ, x) dθ dx
    = ∫ ( ∫ |θ̂ − θ| f_{Θ/X}(θ|x) dθ ) f_X(x) dx
Minimizing the inner integral,
  (∂/∂θ̂) ∫ |θ̂ − θ| f_{Θ/X}(θ|x) dθ = 0
  (∂/∂θ̂) { ∫_{−∞}^{θ̂} (θ̂ − θ) f_{Θ/X}(θ|x) dθ + ∫_{θ̂}^{∞} (θ − θ̂) f_{Θ/X}(θ|x) dθ } = 0
At θ̂ = θ̂_MAE,
  ∫_{−∞}^{θ̂_MAE} f_{Θ/X}(θ|x) dθ − ∫_{θ̂_MAE}^{∞} f_{Θ/X}(θ|x) dθ = 0
so θ̂_MAE is the median of the a posteriori density.
Example 6:
Suppose X1, ..., XN are iid Gaussian observations, each with mean θ and unit variance given θ. Further, θ is known to be a zero-mean Gaussian random variable with unity variance. Find the MAP estimator for θ.
Here
  f_{X/Θ}(x) = ( 1/(√(2π))^N ) e^{ −(1/2) Σ_{i=1}^{N} (x_i − θ)² }
and
  f_{Θ/X}(θ) = f_Θ(θ) f_{X/Θ}(x) / f_X(x)
is maximum when ln f_Θ(θ) f_{X/Θ}(x) is maximum, i.e. when
  −θ²/2 − Σ_{i=1}^{N} (x_i − θ)²/2   is maximum.
Setting the derivative to zero,
  [ θ − Σ_{i=1}^{N} (x_i − θ) ]_{θ = θ̂_MAP} = 0
  ⇒ θ̂_MAP = (1/(N + 1)) Σ_{i=1}^{N} x_i
Example 7:
Consider a single observation X that depends on a random parameter θ. Suppose θ has the prior density
  f_θ(θ) = λ e^{−λθ},  θ ≥ 0, λ > 0
and
  f_{X/Θ}(x) = θ e^{−θx},  x > 0
Then
  f_{Θ/X}(θ) = f_Θ(θ) f_{X/Θ}(x) / f_X(x)
  ln f(θ|x) = ln f_Θ(θ) + ln f_{X/Θ}(x) − ln f_X(x) = ln λ − λθ + ln θ − θx − ln f_X(x)
Setting the derivative with respect to θ to zero, 1/θ − (λ + x) = 0, so
  θ̂_MAP = 1/(λ + X)
Example: consider a binary signal X taking the values +1 and −1 with equal probability, observed through an additive Gaussian noise channel Y = X + V with V ~ N(0, σ²). Then
  f_X(x) = (1/2)[ δ(x − 1) + δ(x + 1) ]
  f_{Y/X}(y/x) = ( 1/(√(2π)σ) ) e^{−(y − x)²/2σ²}
  f_{X/Y}(x/y) = f_X(x) f_{Y/X}(y/x) / ∫_{−∞}^{∞} f_X(x) f_{Y/X}(y/x) dx
              = e^{−(y − x)²/2σ²} [ δ(x − 1) + δ(x + 1) ] / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} )
Hence
  X̂_MMSE = E(X/Y) = ∫_{−∞}^{∞} x e^{−(y − x)²/2σ²} [ δ(x − 1) + δ(x + 1) ] / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} ) dx
          = ( e^{−(y−1)²/2σ²} − e^{−(y+1)²/2σ²} ) / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} )
          = tanh( y/σ² )
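The following simulation sketch (added for illustration; the noise level σ = 0.8 and the sample size are arbitrary) compares the tanh MMSE estimator derived above with a hard sign decision.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma, N = 0.8, 200_000
    x = rng.choice([-1.0, 1.0], size=N)          # equiprobable binary signal
    y = x + sigma * rng.standard_normal(N)       # observation through additive Gaussian noise

    x_mmse = np.tanh(y / sigma**2)               # E(X|Y=y) = tanh(y / sigma^2)
    x_sign = np.sign(y)                          # hard decision for comparison

    print(np.mean((x - x_mmse) ** 2), np.mean((x - x_sign) ** 2))
    # the MMSE estimator attains the lower mean-square error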
To summarise:
MLE: the simplest; it requires only the likelihood f_{X/Θ}(x), and θ̂_MLE maximizes it over θ.
MMSE: θ̂_MMSE = E(Θ/X).
• Find the a posteriori density.
• Find its average value by integration.
• It involves a lot of calculation and hence is computationally exhaustive.
MAP: θ̂_MAP maximizes the a posteriori density f_{Θ/X}(θ). From
  f_{Θ/X}(θ) = f_Θ(θ) f_{X/Θ}(x) / f_X(x),
θ̂_MAP is given by
  ∂/∂θ ln f_Θ(θ) + ∂/∂θ ln f_{X/Θ}(x) = 0
Relation between MAP and MLE when the prior is uniform over [θ_MIN, θ_MAX]:
  If θ_MIN ≤ θ̂_MLE ≤ θ_MAX, then θ̂_MAP = θ̂_MLE.
  If θ̂_MLE ≤ θ_MIN, then θ̂_MAP = θ_MIN.
  If θ̂_MLE ≥ θ_MAX, then θ̂_MAP = θ_MAX.
CHAPTER – 5: OPTIMAL LINEAR FILTER: WIENER
FILTER
X Y
+
Maximum likelihood estimation for X [n ] determines that value of X [n ] for which the
sequence Y [i ], i = 1, 2,..., n is most likely. Let us represent the random sequence
Y[n] = [Y [n], Y [n − 1],.., Y [1]]'
Y [i ], i = 1, 2,..., n by the random vector and the value sequence y[1], y[2],..., y[n] by
y[n] = [ y[n], y[n − 1],.., y[1]]'.
The likelihood function f Y[ n ] / X [ n ] ( y[ n] / x[ n]) will be Gaussian with mean x[n ]
n
( y [ i ]− x[ n ])2
1 − ∑
f Y[ n ] / X [ n ] ( y[n] / x[n]) = i =1
2
e
( )
n
2π
1 n
⇒ χˆ MLE [n] = ∑ y[i]
n i =1
Similarly, to find χˆ MAP [n ] and χˆ MMSE [n] we have to find a posteriori density
f X [ n ] ( x[n]) f Y[ n ]/ X [ n ] (y[n] / x[n])
f X [ n ]/ Y[ n ] ( x[n] / y[n]) =
f Y[ n ] (y[n])
n
( y [ i ]− x[ n ])2
− x [ n ]− ∑
1 2
1 2 i =1
2
= e
f Y[ n ] (y[n])
Taking logarithm
1 n
( y[i] − x[n])2
log e f X [ n ]/ Y[ n ] ( x[n]) = − x 2 [n] − ∑ − log e f Y[ n ] (y[n])
2 i =1 2
log e f X [ n ]/ Y[ n ] ( x[ n]) is maximum at xˆMAP [n]. Therefore, taking partial derivative of
∑ y[i]
xˆMAP [n] = i =1
n +1
Similarly the minimum mean-square error estimator is given by
n
∑ y[i]
xˆMMSE [n] = E ( X [n]/ y[n]) = i =1
n +1
• For MMSE we have to know the joint probability structure of the channel and the
source and hence the a posteriori pdf.
• Finding pdf is computationally very exhaustive and nonlinear.
• Normally we may be having the estimated values first-order and second-order
statistics of the data
We look for a simpler estimator.
The answer is Optimal filtering or Wiener filtering
We have seen that we can estimate an unknown signal (desired signal) x[ n] from an
observed signal y[ n] on the basis of the known joint distributions of y[ n] and x[ n]. We
could have used the criteria like MMSE or MAP that we have applied for parameter es
timations. But such estimations are generally non-linear, require the computation of a
posteriori probabilities and involves computational complexities.
The approach taken by Wiener is to specify a form for the estimator that depends on a
number of parameters. The minimization of errors then results in determination of an
optimal set of estimator parameters. A mathematically sample and computationally easier
estimator is obtained by assuming a linear structure for the estimator.
5.2 Linear Minimum Mean Square Error Estimator
x[ n] y[ n ] xˆ[ n]
Syste +
Filter
m
Noise
[Figure: the estimate x̂[n] is formed from the observations y[n − M + 1], ..., y[n], ..., y[n + N]]
The linear minimum mean square error criterion is illustrated in the above figure. The problem can be stated as follows: choose the filter coefficients h[i] so that the estimate
  x̂[n] = Σ_{i=−N}^{M−1} h[i] y[n − i]
makes the mean square error E( x[n] − x̂[n] )² a minimum with respect to each h[i].
We have to minimize Ee²[n] with respect to each h[i] to get the optimal estimator. The corresponding minimization is given by
  ∂E{e²[n]}/∂h[j] = 0,  j = −N, ..., 0, ..., M − 1
Since E is a linear operator, E and ∂/∂h[j] can be interchanged, giving
  E e[n] y[n − j] = 0,  j = −N, ..., 0, 1, ..., M − 1        (1)
or
  E ( x[n] − Σ_{i=−N}^{M−1} h[i] y[n − i] ) y[n − j] = 0,  j = −N, ..., 0, 1, ..., M − 1        (2)
so that
  R_XY[j] = Σ_{i=−N}^{M−1} h[i] R_YY[j − i],  j = −N, ..., 0, 1, ..., M − 1        (3)
This set of equations in (3) is called the Wiener-Hopf equations or normal equations.
• The result in (1) is the orthogonality principle, which implies that the error is orthogonal to the observed data.
• x̂[n] is the projection of x[n] onto the subspace spanned by the observations { y[n − i], i = −N, ..., M − 1 }.
For an FIR Wiener filter of length M using the current and past samples, the orthogonality condition becomes
  E ( x[n] − Σ_{i=0}^{M−1} h[i] y[n − i] ) y[n − j] = 0,  j = 0, 1, ..., M − 1
so that
  Σ_{i=0}^{M−1} h[i] R_YY[j − i] = R_XY[j],  j = 0, 1, ..., M − 1
In matrix form, R_YY h = r_XY, with
  r_XY = [ R_XY[0]  R_XY[1]  ...  R_XY[M − 1] ]′
  h = [ h[0]  h[1]  ...  h[M − 1] ]′
and R_YY the M × M Toeplitz autocorrelation matrix of y[n]. Therefore,
  h = R_YY^{−1} r_XY
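A minimal Python sketch (added for illustration) of solving these normal equations numerically; the numerical values used correspond to the sinusoid-in-noise example that follows (A = 5, w0 = π/4), under that example's assumptions.

    import numpy as np

    def fir_wiener(Ryy, rxy):
        """Solve the Wiener-Hopf equations R_YY h = r_XY for an M-tap FIR filter.
        Ryy: autocorrelation values R_YY[0..M-1];  rxy: cross-correlations R_XY[0..M-1]."""
        M = len(Ryy)
        R = np.array([[Ryy[abs(i - j)] for j in range(M)] for i in range(M)])  # Toeplitz R_YY
        return np.linalg.solve(R, np.asarray(rxy))

    A, w0, M = 5.0, np.pi / 4, 3
    Rxx = [A**2 / 2 * np.cos(w0 * m) for m in range(M)]            # R_XX[m]
    Ryy = [Rxx[m] + (1.0 if m == 0 else 0.0) for m in range(M)]    # R_YY[m] = R_XX[m] + delta[m]
    h = fir_wiener(Ryy, Rxx)                                        # r_XY[m] = R_XX[m]
    print(h)        # approximately [0.707, 0.34, -0.226], as in the worked example below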
The minimum mean square error is
  E e²[n] = E e[n] ( x[n] − Σ_{i=0}^{M−1} h[i] y[n − i] )
          = E e[n] x[n]                                   (∵ the error is orthogonal to the data)
          = E ( x[n] − Σ_{i=0}^{M−1} h[i] y[n − i] ) x[n]
          = R_XX[0] − Σ_{i=0}^{M−1} h[i] R_XY[i]

[Figure: FIR Wiener filter realized as a tapped delay line with coefficients h[0], h[1], ..., h[M − 1] acting on y[n] to produce x̂[n]]
Example1: Noise Filtering
Consider the case of a carrier signal in presence of white Gaussian noise
π
x[n] = A cos[ w0 n + φ ], w0 =
4
y[n] = x[n] + v[n]
here φ is uniformly distributed in (1, 2π ).
v[ n] is white Gaussian noise sequence of variance 1 and is independent of x[ n]. Find the
parameters for the FIR Wiener filter with M=3.
A2
RXX [m] = cos w 0 m
2
RYY [m] = E y[n] y[n − m]
= E ( x[n] + v[m])( x[n − m] + v[n − m])
= RXX [m] + RVV [m] + 0 + 0
A2
= cos( w0 m) + δ [m]
2
RXY [m] = E x[n] y[n − m]
= E x[n] ( x[n − m] + v[n − m])
= RXX [m]
Hence the Wiener Hopf equations are
  [ R_YY[0]  R_YY[1]  R_YY[2] ] [h[0]]   [ R_XX[0] ]
  [ R_YY[1]  R_YY[0]  R_YY[1] ] [h[1]] = [ R_XX[1] ]
  [ R_YY[2]  R_YY[1]  R_YY[0] ] [h[2]]   [ R_XX[2] ]
i.e.
  [ A²/2 + 1          (A²/2)cos(π/4)   (A²/2)cos(π/2) ] [h[0]]   [ A²/2           ]
  [ (A²/2)cos(π/4)    A²/2 + 1         (A²/2)cos(π/4) ] [h[1]] = [ (A²/2)cos(π/4) ]
  [ (A²/2)cos(π/2)    (A²/2)cos(π/4)   A²/2 + 1       ] [h[2]]   [ (A²/2)cos(π/2) ]
Suppose A = 5 V. Then
  [ 13.5       12.5/√2   0       ] [h[0]]   [ 12.5    ]
  [ 12.5/√2    13.5      12.5/√2 ] [h[1]] = [ 12.5/√2 ]
  [ 0          12.5/√2   13.5    ] [h[2]]   [ 0       ]
Solving,
  h[0] = 0.707,  h[1] = 0.34,  h[2] = −0.226
Plot the filter performance for the above values of h[0], h[1] and h[2]. The following
figure shows the performance of the 20-tap FIR wiener filter for noise filtering.
Example 2 : Active Noise Control
Suppose we have the observation signal y[ n] is given by
y[n] = 0.5cos( w0 n + φ ) + v1 [n]
noise. We want to control v1[n] with the help of another correlated noise v2 [n] given by
v2 [n] = 0.8v[n − 1] + v[n]
v2 [n]
2-tap FIR
filter
and
⎡1.64 0.8 ⎤
RV2V2 = ⎢ ⎥ and
⎣0.8 1.64 ⎦
⎡1.48 ⎤
rV1V2 = ⎢ ⎥
⎣0.6 ⎦
⎡ h[0]⎤ ⎡ 0.9500 ⎤
∴⎢ ⎥=⎢ ⎥
⎣ h[1] ⎦ ⎣-0.0976 ⎦
Example 3:
(Continuous time prediction) Suppose we want to predict the continuous-time process
X (t ) at time (t + τ ) by
Xˆ (t + τ ) = aX (t )
Then by orthogonality principle
E ( X (t + τ ) − aX (t )) X (t ) = 0
RXX (τ )
⇒a=
RXX (0)
RXX (τ )
∴a = = e − Aτ
RXX (0)
Observe that for such a process
E ( X (t + τ ) − aX (t )) X (t − τ 1 ) = 0
= RXX (τ + τ 1 ) − aRXX (τ 1 )
= RXX (0)e − A(τ +τ1 ) − e − A(τ ) RXX (0)e − A(τ1 )
=0
Therefore, the linear prediction of such a process based on any past value is same as the
linear prediction based on current value.
y[n] xˆ[n]
h[n]
We have to minimize Ee 2 [n] with respect to each h[i ] to get the optimal estimation.
  Σ_{i=0}^{∞} h[i] R_YY[j − i] = R_XY[j],  j = 0, 1, ...
• We have to find h[i], i = 0, 1, ..., ∞ by solving the above infinite set of equations. The spectral factorization
  S_YY(z) = σ_v² H_c(z) H_c(z^{−1})
allows the problem to be split into two stages: a whitening filter
  H_1(z) = 1/H_c(z)
which converts y[n] into the innovation sequence v[n], followed by a Wiener filter H_2(z) that estimates x[n] from v[n].
[Figure: y[n] → whitening filter 1/H_c(z) → v[n] → Wiener filter H_2(z) → x̂[n]]
Here h_2[n] is the coefficient of the Wiener filter estimating x[n] from the innovation sequence v[n]. Applying the orthogonality principle to
  x̂[n] = Σ_{i=0}^{∞} h_2[i] v[n − i]
results in the Wiener-Hopf equation
  E { x[n] − Σ_{i=0}^{∞} h_2[i] v[n − i] } v[n − j] = 0
  ∴ Σ_{i=0}^{∞} h_2[i] R_VV[j − i] = R_XV[j],  j = 0, 1, ...
Since R_VV[j − i] = σ_V² δ[j − i],
  h_2[j] = R_XV[j]/σ_V²,  j ≥ 0
so that
  H_2(z) = [ S_XV(z) ]_+ / σ_V²
where [·]_+ denotes the causal part of the power series expansion of S_XV(z). Since
  v[n] = Σ_{i=0}^{∞} h_1[i] y[n − i],
  R_XV[j] = E x[n] v[n − j] = Σ_{i=0}^{∞} h_1[i] E x[n] y[n − j − i] = Σ_{i=0}^{∞} h_1[i] R_XY[j + i]
so that
  S_XV(z) = H_1(z^{−1}) S_XY(z) = S_XY(z)/H_c(z^{−1})
  ∴ H_2(z) = (1/σ_V²) [ S_XY(z)/H_c(z^{−1}) ]_+
Therefore,
  H(z) = H_1(z) H_2(z) = ( 1/(σ_V² H_c(z)) ) [ S_XY(z)/H_c(z^{−1}) ]_+
We have to
• find the power spectrum of the data and the cross power spectrum of the desired signal and the data from the available model, or estimate them from the data;
• factorize the power spectrum of the data using the spectral factorization theorem.
1 π 1 π
= ∫πS ( w)dw − ∫ π H (w)S
*
( w)dw
2π 2π
− X − XY
1 π
= ∫ π (S ( w) − H ( w) S XY
*
( w))dw
2π − X
1
=
2π ∫ (S
C
X ( z ) − H ( z ) S XY ( z −1 )) z −1dz
Example 4:
  y[n] = x[n] + v1[n]  (observation model), with
  x[n] = 0.8 x[n − 1] + w[n]
where v1[n] is an additive zero-mean Gaussian white noise with variance 1 and w[n] is zero-mean white noise with variance 0.68. Signal and noise are uncorrelated. Find the optimal causal Wiener filter to estimate x[n].
Solution:
[Figure: w[n] passed through 1/(1 − 0.8z^{−1}) generates x[n]]
  S_XX(z) = 0.68 / ( (1 − 0.8z^{−1})(1 − 0.8z) )
  R_YY[m] = E y[n] y[n − m] = R_XX[m] + R_VV[m]   ⇒   S_YY(z) = S_XX(z) + 1
Factorize:
  S_YY(z) = 0.68/( (1 − 0.8z^{−1})(1 − 0.8z) ) + 1 = 2(1 − 0.4z^{−1})(1 − 0.4z) / ( (1 − 0.8z^{−1})(1 − 0.8z) )
  ∴ H_c(z) = (1 − 0.4z^{−1})/(1 − 0.8z^{−1})   and   σ_V² = 2
Also
  R_XY[m] = E x[n] y[n − m] = R_XX[m]   ⇒   S_XY(z) = S_XX(z)
Therefore
  H(z) = ( 1/(σ_V² H_c(z)) ) [ S_XY(z)/H_c(z^{−1}) ]_+
       = (1/2) · (1 − 0.8z^{−1})/(1 − 0.4z^{−1}) · [ 0.68/( (1 − 0.8z^{−1})(1 − 0.4z) ) ]_+
       = (1/2) · (1 − 0.8z^{−1})/(1 − 0.4z^{−1}) · 1/(1 − 0.8z^{−1})
       = 0.5/(1 − 0.4z^{−1})
so that h[n] = 0.5 (0.4)^n,  n ≥ 0.
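The causal Wiener filter derived above, H(z) = 0.5/(1 − 0.4z^{−1}), can be applied as a first-order recursion. The sketch below (added for illustration; seeds and sample length are arbitrary) simulates the model and compares the filtered and unfiltered errors.

    import numpy as np

    rng = np.random.default_rng(6)
    N = 200_000
    w = rng.normal(0.0, np.sqrt(0.68), N)   # process noise, variance 0.68
    v = rng.standard_normal(N)              # observation noise, variance 1

    x = np.zeros(N)
    for n in range(1, N):                   # signal model x[n] = 0.8 x[n-1] + w[n]
        x[n] = 0.8 * x[n - 1] + w[n]
    y = x + v                               # noisy observation

    # Causal Wiener filter H(z) = 0.5/(1 - 0.4 z^-1):  xhat[n] = 0.4 xhat[n-1] + 0.5 y[n]
    xhat = np.zeros(N)
    for n in range(1, N):
        xhat[n] = 0.4 * xhat[n - 1] + 0.5 * y[n]

    print(np.mean((x - y) ** 2), np.mean((x - xhat) ** 2))  # filtering reduces the MSE well below 1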
For the noncausal (unrealizable) IIR Wiener filter the estimate uses the entire observation sequence,
  x̂[n] = Σ_{i=−∞}^{∞} h[i] y[n − i],
and the orthogonality principle gives
  Σ_{i=−∞}^{∞} h[i] R_YY[j − i] = R_XY[j],  j = −∞, ..., 0, 1, ..., ∞
Taking the z-transform,
  H(z) S_YY(z) = S_XY(z)
so that
  H(z) = S_XY(z)/S_YY(z),   or   H(ω) = S_XY(ω)/S_YY(ω)
5.9 Mean Square Estimation Error – IIR Filter (Noncausal)
The mean square error of estimation is given by
⎛ ∞
⎞
E ( e 2 [n ] = Ee[n ] ⎜ x[n ] − ∑ h[i ] y[n − i ] ⎟
⎝ i =−∞ ⎠
=Ee[n ] x[n ] ∵ error isorthogonal to data
⎛ ∞
⎞
=E ⎜ x[n ] −
⎝
∑
i =−∞
h[i ] y[n − i ] ⎟ x[n ]
⎠
∞
=RXX [0] − ∑
i =−∞
h[i ]RXY [i ]
1 π 1 π
= ∫π S X ( w)dw − ∫ π H ( w) S
*
( w)dw
2π 2π
XY
− −
1 π
= ∫ π (S ( w) − H ( w) S XY
*
( w))dw
2π
X
−
1
=
2π ∫ (S
C
X ( z ) − H ( z ) S XY ( z −1 )) z −1dz
where v[ n] is and additive zero-mean Gaussian white noise with variance σ V2 . Signal
and noise are uncorrelated
SYY ( w) = S XX ( w) + SVV ( w)
and
S XY ( w) = S XX ( w)
  ∴ H(ω) = S_XX(ω) / ( S_XX(ω) + S_VV(ω) )
         = ( S_XX(ω)/S_VV(ω) ) / ( S_XX(ω)/S_VV(ω) + 1 )
Suppose the SNR is very high. Then
  H(ω) ≅ 1
(i.e. the signal is passed unattenuated). When the SNR is low,
  H(ω) ≅ S_XX(ω)/S_VV(ω)
(i.e. if the noise is high, the corresponding signal component is attenuated in proportion to the estimated SNR).
[Figure: H(ω) versus ω; the gain is near 1 in the band where the signal dominates and small where the noise dominates]
Example 7:
Consider the signal in presence of white noise given by
x[ n] = 0.8 x [ n -1] + w[ n]
where v[ n] is and additive zero-mean Gaussian white noise with variance 1 and w[ n]
is zero-mean white noise with variance 0.68. Signal and noise are uncorrelated.
Find the optimal noncausal Wiener filter to estimate x[ n].
  H(z) = S_XY(z)/S_YY(z)
       = [ 0.68/( (1 − 0.8z^{−1})(1 − 0.8z) ) ] / [ 2(1 − 0.4z^{−1})(1 − 0.4z)/( (1 − 0.8z^{−1})(1 − 0.8z) ) ]
       = 0.34 / ( (1 − 0.4z^{−1})(1 − 0.4z) )        (one pole inside and one pole outside the unit circle)
By partial fraction expansion and inverse z-transform,
  h[n] = 0.4048 (0.4)^n u(n) + 0.4048 (0.4)^{−n} u(−n − 1) = 0.4048 (0.4)^{|n|}
[Figure: the two-sided impulse response h[n] of the noncausal Wiener filter]
CHAPTER – 6: LINEAR PREDICTION OF SIGNAL
6.1 Introduction
Given a sequence of observation
y[ n - 1], y[ n - 2], …. y[ n - M ], what is the best prediction for y[ n]?
(one-step ahead prediction)
The minimum mean square error prediction yˆ[ n] for y[ n] is given by
is the prediction error and the corresponding filter is called prediction error filter.
Linear Minimum Mean Square error estimates for the prediction parameters are given by
the orthogonality relation
E e[n] y[n - j ] = 0 for j = 1, 2 ,… , M
  ∴ E ( y[n] − Σ_{i=1}^{M} h[i] y[n − i] ) y[n − j] = 0,  j = 1, 2, ..., M
  ⇒ R_YY[j] − Σ_{i=1}^{M} h[i] R_YY[j − i] = 0
  ⇒ R_YY[j] = Σ_{i=1}^{M} h[i] R_YY[j − i],  j = 1, 2, ..., M
which is the Wiener Hopf equation for the linear prediction problem and same as the Yule
Walker equation for AR (M) Process.
In Matrix notation
⎡ RYY [0] RYY [1] .... RYY [ M − 1] ⎤ ⎡ h[1] ⎤ ⎡ RYY [1] ⎤
⎢ R [1] R [o] ... R [ M - 2] ⎥ ⎢ h[2] ⎥ ⎢ R [2] ⎥
⎢ YY YY YY ⎥ ⎢ ⎥ ⎢ YY ⎥
⎢. ⎥ ⎢. ⎥ ⎢. ⎥
⎢ ⎥ ⎢ ⎥ = ⎢ ⎥
⎢. ⎥ ⎢. ⎥ ⎢. ⎥
⎢. ⎥ ⎢. ⎥ ⎢. ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢⎣ RYY [ M -1] RYY [ M - 2] ...... RYY [0]⎥⎦ ⎢⎣ h[ M ]⎥⎦ ⎢⎣ RYY [ M ]⎦⎥
R YY h = rYY
∴ h = ( R YY ) −1 rYY
The minimum mean square prediction error is
  E e²[n] = E y[n] e[n]
          = E y[n] ( y[n] − Σ_{i=1}^{M} h[i] y[n − i] )
          = R_YY[0] − Σ_{i=1}^{M} h[i] R_YY[i]
6.4 Forward Prediction Problem
The above linear prediction problem is the forward prediction problem. For notational
simplicity let us rewrite the prediction equation as
  ŷ[n] = Σ_{i=1}^{M} h_M[i] y[n − i]
Example 1:
Find the second-order predictor for y[n] given y[n] = x[n] + v[n], where v[n] is a zero-mean white noise with variance 1 uncorrelated with x[n], and x[n] = 0.8 x[n − 1] + w[n], w[n] being zero-mean white noise with variance 0.68.
Here R_XX[m] = ( 0.68/(1 − 0.8²) ) (0.8)^{|m|}, so that
  R_YY[0] = 2.89,  R_YY[1] = 1.51  and  R_YY[2] = 1.21
Writing the last equation of the (m + 1)th-order normal equations,
  Σ_{i=1}^{m} h_{m+1}[i] R_YY[m + 1 − i] + h_{m+1}[m + 1] R_YY[0] = R_YY[m + 1]        (4)
Substituting h_{m+1}[i] = h_m[i] + k_{m+1} h_m[m + 1 − i], i = 1, ..., m, and h_{m+1}[m + 1] = −k_{m+1} gives
  Σ_{i=1}^{m} { h_m[i] + k_{m+1} h_m[m + 1 − i] } R_YY[m + 1 − i] − k_{m+1} R_YY[0] = R_YY[m + 1]
so that the reflection coefficient is
  k_{m+1} = Σ_{i=0}^{m} h_m[i] R_YY[m + 1 − i] / ε[m]
where
  ε[m] = R_YY[0] − Σ_{i=1}^{m} h_m[i] R_YY[i]
is the mean-square prediction error. Here we have used the convention h_m[0] = −1. Similarly,
  ε[m + 1] = R_YY[0] − Σ_{i=1}^{m+1} h_{m+1}[i] R_YY[i]
• If |k_m| < 1, the prediction error filter is minimum-phase, and hence the corresponding synthesis filter is stable.
Here
  e_m^f[n] = forward prediction error = y[n] − Σ_{i=1}^{m} h_m[i] y[n − i]
and
  e_m^b[n] = backward prediction error = y[n − m] − Σ_{i=1}^{m} h_m[m + 1 − i] y[n + 1 − i]
The recursion is summarized as follows:
For m = 0:  ε[0] = R_YY[0]
For m = 1, 2, 3, ...:
  k_m = Σ_{i=0}^{m−1} h_{m−1}[i] R_YY[m − i] / ε[m − 1]
  h_m[i] = h_{m−1}[i] + k_m h_{m−1}[m − i],  i = 1, 2, ..., m − 1
  h_m[m] = −k_m
  ε_m = ε_{m−1}(1 − k_m²)
Go on computing up to the given final order m.
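A Python sketch of this order recursion (added for illustration). It uses the standard predictor form ŷ[n] = Σ a[i] y[n − i], whose sign convention differs slightly from the error-filter convention (h_m[0] = −1) used above; the autocorrelation values are taken from Example 1 of this chapter.

    import numpy as np

    def levinson_durbin(r, p):
        """Levinson-Durbin recursion for the one-step predictor yhat[n] = sum_i a[i] y[n-i]."""
        a = np.zeros(p + 1)
        eps = r[0]
        for m in range(1, p + 1):
            k = (r[m] - np.dot(a[1:m], r[m-1:0:-1])) / eps   # reflection coefficient
            a_new = a.copy()
            a_new[m] = k
            a_new[1:m] = a[1:m] - k * a[m-1:0:-1]
            a, eps = a_new, eps * (1.0 - k * k)
        return a[1:], eps

    # R_YY[0..2] from Example 1: 2.89, 1.51, 1.21
    r = np.array([2.89, 1.51, 1.21])
    a, eps = levinson_durbin(r, 2)
    R = np.array([[r[0], r[1]], [r[1], r[0]]])
    print(a, np.linalg.solve(R, r[1:3]))   # agrees with the direct normal-equation solution
    print(eps)                             # final mean-square prediction error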
The backward prediction error is
  e_m^b[n] = y[n − m] − Σ_{i=1}^{m} h_m[m + 1 − i] y[n + 1 − i]

[Figure: lattice realization of the prediction error filter; each stage uses the reflection coefficient k_m and a unit delay z^{−1}, producing e_m^f[n] and e_m^b[n] from e_{m−1}^f[n] and e_{m−1}^b[n − 1]]

• For a WSS signal the backward prediction errors at different orders are uncorrelated:
  E e_i^b[n] e_m^b[n] = 0 for 0 ≤ i < m.
Thus the lattice filter can be used to whiten a sequence.
With this result, it can be shown that
  k_m = − E( e_{m−1}^f[n] e_{m−1}^b[n − 1] ) / E( e_{m−1}^b[n − 1] )²        (i)
and
  k_m = − E( e_{m−1}^f[n] e_{m−1}^b[n − 1] ) / E( e_{m−1}^f[n] )²            (ii)
Proof (outline): the mean square prediction error is
  E( e_m^f[n] )² = E( e_{m−1}^f[n] + k_m e_{m−1}^b[n − 1] )²
and minimizing this with respect to k_m gives the expressions above.
Example 2:
Consider the random signal model y[ n] = x[ n] + v[ n], where v[n ] is a 0-mean white noise
with variance 1 and uncorrelated with x[n ] and x[ n] = 0.8 x[ n - 1] + w[ n] , w[n] is a 0-
mean random variable with variance 0.68
a) Find the second –order linear predictor for y[n ]
b) Obtain the lattice structure for the prediction error filter
c) Use the above structure to design a second-order FIR Wiener filter to estimate
x[ n] from y[ n].
Adaptive Filters
7.1 Introduction
In practical situations, the system is operating in an uncertain environment where the
input condition is not clear and/or the unexpected noise exists. Under such circumstances,
the system should have the flexible ability to modify the system parameters and makes
the adjustments based on the input signal and the other relevant signal to obtain optimal
performance.
A system that searches for improved performance guided by a computational algorithm
for adjustment of the parameters or weights is called an adaptive system. The adaptive
system is time-varying.
How to do this?
• Assume stationarity within certain data length. Buffering of data is required and
may work in some applications.
• The time-duration over which stationarity is a valid assumption, may be short so
that accurate estimation of the model parameters is difficult.
• One solution is adaptive filtering. Here the filter coefficients are updated as a
function of the filtering error. The basic filter structure is as shown in Fig. 1.
[Figure 1: adaptive FIR filter driven by y[n]; its output x̂[n] is compared with x[n] to form the error e[n], which drives the adaptive algorithm]
The filter structure is FIR of known tap-length, because the adaptation algorithm updates
each filter coefficient individually.
7.2 Method of Steepest Descent
Consider the FIR Wiener filter of length M. We want to compute the filter coefficients
iteratively.
Let us denote the time-varying filter parameters by
hi [n], i = 0,1, ... M - 1
and define the filter parameter vector by
⎡ h0 [n ] ⎤
⎢ h [n ] ⎥
h[n ] = ⎢ 1 ⎥
⎢# ⎥
⎢ ⎥
⎣ hM −1[n ]⎦
We want to find the filter coefficients so as to minimize the mean-square error Ee 2 [n]
where
e[n ] = x[n ] − xˆ[n ]
M −1
= x[n] - ∑ hi [n ] y[n − i ]
i =0
⎡ y[ n ] ⎤
⎢ y[n − 1] ⎥
where y[n ] = ⎢ ⎥
⎢# ⎥
⎢ ⎥
⎣ y[n − M + 1]⎦
Therefore
Ee 2 [n] = E ( x[n] − h ′[n]y[n]) 2
= R XX [0] − 2h ′[n]rxy + h ′[n]R YY h[n]
⎡ R XY [0] ⎤
⎢ R [1] ⎥
where rXY = ⎢ XY ⎥
⎢# ⎥
⎢ ⎥
⎣ R XY [ M − 1]⎦
and
⎡ RYY [0] RYY [−1] .... RYY [1 − M ]⎤
⎢ R [1] RYY [0] .... RYY [2 − M ]⎥⎥
R YY = ⎢ YY
⎢... ⎥
⎢ ⎥
⎣⎢ RYY [ M − 1] RYY [ M − 2] .... RYY [0] ⎦⎥
Many of the adaptive filter algorithms are obtained by simple modifications of the
algorithms for deterministic optimization. Most of the popular adaptation algorithms are
based on gradient-based optimization techniques, particularly the steepest descent
technique.
The optimal Wiener filter can be obtained iteratively by the method of steepest descent.
The optimum is found by updating the filter parameters by the rule
  h[n + 1] = h[n] + (µ/2)( −∇Ee²[n] )
where
  ∇Ee²[n] = [ ∂Ee²[n]/∂h_0   ∂Ee²[n]/∂h_1   ...   ∂Ee²[n]/∂h_{M−1} ]′
          = −2 r_XY + 2 R_YY h[n]
and µ is the step-size parameter.
So the steepest descent rule will now give
h[n + 1] = h[n ] + µ (rXY − R YY h[n ] )
7.3 Convergence of the steepest descent method
We have
h[n + 1] = h[n ] + µ (rXY − R YY h[n ] )
= h[n ] − µR YY h[n ] + µrXY
= (I − µR YY )h[n ] + µrXY
where I is the MxM identity matrix.
This is a coupled set of linear difference equations.
R YY = QΛQ′
where Q is the orthogonal matrix of the eigenvectors of R YY .
Λ is a diagonal matrix with the corresponding eigen values as the diagonal elements.
Also I = QQ ′ = Q ′Q
Therefore
h[n + 1] = (QQ′ − µQΛQ′)h[n] + µ ⋅ rXY
Multiply by Q′
Q′h[n + 1] = (I − µΛ)Q′h[n] + µQ′rXY
Define a new variable
h[n ] = Q′h[n ] and rXY = Q′rXY
Then
h[n + 1] = (I − µΛ)h[n ] + µ rXY
and can be easily solved for stability. The stability condition is given by
1 − µλi < 1
⇒ −1 < 1 − µλi < 1
⇒ 0 < µ < 2 / λi , i = 1,.......M
if the step size µ is within the range of specified by the above relation.]
Thus the rate of convergence depends on the statistics of data and is related to the eigen
value spread for the autocorrelation matrix. This rate is expressed using the condition
λmax
number of R YY , defined as k = where λmax and λmin are respectively the maximum
λmin
and the minimum eigen values of R YY . The fastest convergence of this system occurs
when k = 1, corresponding to white noise.
In the LMS algorithm the gradient of the mean square error is replaced by the gradient of the instantaneous squared error:
  ∇e²[n] = 2 e[n] [ ∂e[n]/∂h_0   ...   ∂e[n]/∂h_{M−1} ]′
Now consider
  e[n] = x[n] − Σ_{i=0}^{M−1} h_i[n] y[n − i]
so that
  ∂e[n]/∂h_j = −y[n − j],  j = 0, 1, ..., M − 1
and therefore
  [ ∂e[n]/∂h_0   ...   ∂e[n]/∂h_{M−1} ]′ = −[ y[n]  y[n − 1]  ...  y[n − M + 1] ]′ = −y[n]
  ∴ ∇e²[n] = −2 e[n] y[n]
The steepest descent update now becomes
  h[n + 1] = h[n] + µ e[n] y[n]
This modification is due to Widrow and Hoff, and the corresponding adaptive filter is known as the LMS filter.
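A minimal LMS sketch (added for illustration) in a system-identification setting; the "unknown" 4-tap system, the step size µ and the noise level are assumptions for the demo.

    import numpy as np

    def lms(x, y, M, mu):
        """LMS adaptive FIR filter: h[n+1] = h[n] + mu * e[n] * y[n],
        with y[n] the vector of the M most recent inputs and e[n] = x[n] - h'[n] y[n]."""
        h = np.zeros(M)
        e = np.zeros(len(x))
        for n in range(M, len(x)):
            yv = y[n:n - M:-1]            # [y[n], y[n-1], ..., y[n-M+1]]
            e[n] = x[n] - h @ yv          # filtering error
            h = h + mu * e[n] * yv        # stochastic-gradient update
        return h, e

    rng = np.random.default_rng(7)
    h_true = np.array([1.0, 0.5, -0.3, 0.1])                     # assumed unknown system
    y = rng.standard_normal(20_000)
    x = np.convolve(y, h_true, mode="full")[: len(y)] + 0.01 * rng.standard_normal(len(y))

    h_hat, e = lms(x, y, M=4, mu=0.01)
    print(h_hat)       # converges close to h_true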
LMS algorithm: for n ≥ 0,
  filter output:   x̂[n] = h′[n] y[n]
  error:           e[n] = x[n] − x̂[n]
  update:          h[n + 1] = h[n] + µ e[n] y[n]

[Figure: LMS adaptive FIR filter h_i[n], i = 0, 1, ..., M − 1, with input y[n], output x̂[n], desired signal x[n] and error e[n] driving the LMS algorithm]

• A sufficient condition for convergence in the mean is 0 < µ < 2/Trace(R_YY).
• Also note that Trace(R_YY) = M R_YY[0] = total tap-input power of the LMS filter.
Generally, a too small value of µ results in slower convergence where as big values of µ
will result in larger fluctuations from the mean. Choosing a proper value of µ is very
important for the performance of the LMS algorithm.
In addition, the rate of convergence depends on the statistics of data and is related to the
eigenvalue spread for the autocorrelation matrix. This is defined using the condition
λmax
number of R YY , defined as k = where λmin is the minimum eigenvalue of R YY . The
λmin
fastest convergence of this system occurs when k = 1, corresponding to white noise. This
states that the fastest way to train a LMS adaptive system is to use white noise as the
training input. As the noise becomes more and more colored, the speed of the training
will decrease.
The average of each filter tap –weight converges to the corresponding optimal filter tap-
weight. But this does not ensure that the coefficients converge to the optimal values.
∆h[ n] = h[ n] − h opt
The LMS algorithm is said to converge in the mean-square sense provided the step-length
parameter satisfies the relations
M
µλi
µ∑ <1
i =1 2 − µλi
2
and 0<µ <
λ max
M
µλ i
If ∑ 2 − µλ
i =1
<< 1
i
M
µλi
then ε excess = ε min ∑
i =1 2 − µλi
Further, if
µ << 1
1
Trace(R YY )
ε excess = ε min µ 2
1− 0
1
ε min µ Trace(R YY )
2
ε excess M µλi
The factor =∑ is called the misadjustment factor for the LMS filter.
ε min i =1 2 − µλi
Solution:
From the given model
x[ n] y[ n ] xˆ[ n]
Chan +
Equqlizer
+
+
nel
+
Noise
σW 2
RXX [m] = (0.8)|m|
1 − 0.8 2
where h[n] is the modulus of the LMS weight vector and α is a positive quantity.
The convergence condition can be written in terms of the instantaneous tap-input power as
  0 < µ < 2 / Σ_{i=0}^{M−1} y²[n − i] = 2/||y[n]||²
Then we can take
  µ = β / ||y[n]||²
Notice that the NLMS algorithm does not change the direction of the update used in the steepest descent algorithm.
If y[n] is close to zero, the denominator term ||y[n]||² in the NLMS equation becomes very small and
  h[n + 1] = h[n] + ( β/||y[n]||² ) e[n] y[n]
may diverge. To overcome this drawback a small positive number ε is added to the denominator term of the NLMS equation. Thus
  h[n + 1] = h[n] + ( β/(ε + ||y[n]||²) ) e[n] y[n]
For computational efficiency, other modifications are suggested to the LMS algorithm.
Some of the modified algorithms are blocked-LMS algorithm, signed LMS algorithm etc.
LMS algorithm can be obtained for IIR filter to adaptively update the parameters of the
filter
M −1 N −1
y[n] = ∑ ai [n] y[ n − i ] + ∑ bi [ n] x[ n − i ]
i =1 i =0
How ever, IIR LMS algorithm has poor performance compared to FIR LMS filter.
7.12 Recursive Least Squares (RLS) Adaptive Filter
• LMS convergence slow
• Step size parameter is to be properly chosen
• Excess mean-square error is high
• LMS minimizes the instantaneous square error e 2 [n ]
• Where e[n ] = x[n ] - h′[n ] y[n ] = x[n ] - y ′[n ] h[n ]
The RLS algorithm considers all the available data for determining the filter parameters.
The filter should be optimum with respect to all the available data in certain sense.
⎡h0 [n ] ⎤
with respect to the filter parameter vector h[n ] = ⎢h1 [n ] ⎥
⎢ ⎥
⎢⎣hM −1 [n ]⎥⎦
k =0
(
ˆ XY [n ] −1 rˆXY [n ]
Hence h[n ] = R )
Matrix inversion is involved which makes the direct solution difficult. We look forward
for a recursive solution.
= λR
ˆ YY [n − 1] + y[n ] y ′[n ]
This shows that the autocorrelation matrix can be recursively computed from its previous
values and the present data vector.
Similarly rˆXY [n ] = λrˆXY [n − 1] + x[n ]y[n ]
h[n ] = R [
ˆ [n ] −1 rˆ [n ]
YY XY ]
= (λR )−1
ˆ YY [n − 1] + y[n ]y[n]′ rˆXY [n ]
For the matrix inversion above the matrix inversion lemma will be useful.
Taking A = λR
ˆ [n − 1], B = y[n ], C = 1 and D = y ′[n ]
YY
we will have
  ( R̂_YY[n] )^{−1} = (1/λ) R̂_YY^{−1}[n − 1]
      − (1/λ) R̂_YY^{−1}[n − 1] y[n] ( y′[n] (1/λ) R̂_YY^{−1}[n − 1] y[n] + 1 )^{−1} y′[n] (1/λ) R̂_YY^{−1}[n − 1]
    = (1/λ) ( R̂_YY^{−1}[n − 1] − R̂_YY^{−1}[n − 1] y[n] y′[n] R̂_YY^{−1}[n − 1] / ( λ + y′[n] R̂_YY^{−1}[n − 1] y[n] ) )
Rename P[n] = R̂_YY^{−1}[n]. Then
  P[n] = (1/λ)( P[n − 1] − k[n] y′[n] P[n − 1] )
where k[n] is called the 'gain vector', given by
  k[n] = P[n − 1] y[n] / ( λ + y′[n] P[n − 1] y[n] )
k[n] important to interpret adaptation is also related to the current data vector y[n]
by
k[ n] = P[ n]y[ n]
To establish the above relation consider
P[n] =
1
(P[n − 1] − k[n]y ′[n]P[n − 1])
λ
Multiplying by λ and post-multiplying by y[n] and simplifying we get
λP[n]y[n ] = (P[n − 1] − k[n ]y ′[n ]P[n − 1])y[n ]
= P[n − 1]y[n ] − k[n ]y ′[n ]P[n − 1]y[n ]
= λk[n ]
Therefore

$$\begin{aligned}
\mathbf{h}[n] &= \left(\hat{\mathbf{R}}_{YY}[n]\right)^{-1}\hat{\mathbf{r}}_{XY}[n] \\
&= \mathbf{P}[n]\left(\lambda\,\hat{\mathbf{r}}_{XY}[n-1] + x[n]\,\mathbf{y}[n]\right) \\
&= \lambda\,\mathbf{P}[n]\,\hat{\mathbf{r}}_{XY}[n-1] + x[n]\,\mathbf{P}[n]\,\mathbf{y}[n] \\
&= \lambda\,\tfrac{1}{\lambda}\left(\mathbf{P}[n-1] - \mathbf{k}[n]\,\mathbf{y}'[n]\,\mathbf{P}[n-1]\right)\hat{\mathbf{r}}_{XY}[n-1] + x[n]\,\mathbf{P}[n]\,\mathbf{y}[n] \\
&= \mathbf{h}[n-1] - \mathbf{k}[n]\,\mathbf{y}'[n]\,\mathbf{h}[n-1] + x[n]\,\mathbf{k}[n] \\
&= \mathbf{h}[n-1] + \mathbf{k}[n]\left(x[n] - \mathbf{y}'[n]\,\mathbf{h}[n-1]\right)
\end{aligned}$$
Choose the forgetting factor $\lambda$, $0 < \lambda \le 1$ (usually close to 1).
Operation:
For n = 1 to Final do
1. Get $x[n]$, $\mathbf{y}[n]$.
2. Get $e[n] = x[n] - \mathbf{h}'[n-1]\,\mathbf{y}[n]$.
3. Calculate the gain vector $\mathbf{k}[n] = \dfrac{\mathbf{P}[n-1]\,\mathbf{y}[n]}{\lambda + \mathbf{y}'[n]\,\mathbf{P}[n-1]\,\mathbf{y}[n]}$.
4. Update the filter parameters: $\mathbf{h}[n] = \mathbf{h}[n-1] + \mathbf{k}[n]\,e[n]$.
5. Update the P matrix: $\mathbf{P}[n] = \dfrac{1}{\lambda}\left(\mathbf{P}[n-1] - \mathbf{k}[n]\,\mathbf{y}'[n]\,\mathbf{P}[n-1]\right)$.
end do
A code sketch of this recursion is given below.
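A compact Python/NumPy sketch of the RLS recursion listed above; the filter order, the forgetting factor λ, and the δ used to initialize P are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def rls(x, y, M=8, lam=0.99, delta=100.0):
    """RLS adaptive FIR filter: h is updated so that h'y[n] tracks x[n].
    P is initialized as delta*I, i.e. R_YY^{-1}[-1] = delta*I."""
    N = len(x)
    h = np.zeros(M)
    P = delta * np.eye(M)
    e = np.zeros(N)
    for n in range(M, N):
        yv = y[n:n-M:-1]                                 # data vector
        e[n] = x[n] - h @ yv                             # a-priori error
        k = P @ yv / (lam + yv @ P @ yv)                 # gain vector
        h = h + k * e[n]                                 # filter update
        P = (P - np.outer(k, yv) @ P) / lam              # P-matrix update
    return h, e
```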
Dividing by $n+1$,

$$\frac{\hat{\mathbf{R}}_{YY}[n]}{n+1} = \frac{\sum_{k=0}^{n}\lambda^{\,n-k}\,\mathbf{y}[k]\,\mathbf{y}'[k]}{n+1}$$

If we consider the elements of $\dfrac{\hat{\mathbf{R}}_{YY}[n]}{n+1}$, we see that each is an estimator for the autocorrelation at a specific lag, and

$$\lim_{n\to\infty}\frac{\hat{\mathbf{R}}_{YY}[n]}{n+1} = \mathbf{R}_{YY}$$
Corresponding to the initialization $\hat{\mathbf{R}}_{YY}^{-1}[-1] = \delta\,\mathbf{I}$ we have $\hat{\mathbf{R}}_{YY}[-1] = \dfrac{1}{\delta}\,\mathbf{I}$.
With this initial condition the matrix difference equation has the solution

$$\hat{\mathbf{R}}_{YY}[n] = \lambda^{\,n+1}\,\hat{\mathbf{R}}_{YY}[-1] + \sum_{k=0}^{n}\lambda^{\,n-k}\,\mathbf{y}[k]\,\mathbf{y}'[k] = \lambda^{\,n+1}\,\frac{\mathbf{I}}{\delta} + \sum_{k=0}^{n}\lambda^{\,n-k}\,\mathbf{y}[k]\,\mathbf{y}'[k]$$

so the effect of the initialization decays as $\lambda^{\,n+1}$ and becomes negligible for large $n$.
8.1 Introduction
The basic mechanism of the Kalman filter (R. E. Kalman, 1960) is to estimate the signal recursively by a relation of the form

$$\hat{x}[n] = A_n\,\hat{x}[n-1] + K_n\,y[n]$$

The Kalman filter is also based on the innovation representation of the signal; we used this model earlier to develop the causal IIR Wiener filter.
Example 1:
Consider the AR($M$) model

$$x[n] = a_1 x[n-1] + a_2 x[n-2] + \cdots + a_M x[n-M] + w[n]$$

Defining the state vector $\mathbf{x}[n] = \begin{bmatrix} x_1[n] & x_2[n] & \cdots & x_M[n]\end{bmatrix}'$ with $x_i[n] = x[n-i+1]$, the model can be written in state-space form as $\mathbf{x}[n] = \mathbf{A}\,\mathbf{x}[n-1] + \mathbf{b}\,w[n]$, where

$$\mathbf{A} = \begin{bmatrix} a_1 & a_2 & \cdots & a_{M-1} & a_M \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \qquad\text{and}\qquad \mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Our analysis will include only the simple (scalar) Kalman filter. The Kalman filter uses the innovation representation of the stationary signal, as does the IIR Wiener filter. The innovation representation is shown in the following diagram (the observed data $y[n]$ is whitened to produce the innovation sequence $\tilde{y}[n]$).
Let $\hat{x}[n]$ be the LMMSE estimate of $x[n]$ based on the data $y[0], y[1], \ldots, y[n]$. In the above representation $\tilde{y}[n]$ is the innovation of $y[n]$ and contains the same information as $y[n]$. The LMMSE estimation of $x[n]$ based on $y[0], y[1], \ldots, y[n]$ is the same as the estimation based on the innovation sequence $\tilde{y}[0], \tilde{y}[1], \ldots, \tilde{y}[n]$. Therefore,

$$\hat{x}[n] = \sum_{i=0}^{n} k_i\,\tilde{y}[i]$$
Similarly,

$$x[n-1] = \hat{x}[n-1] + e[n-1] = \sum_{i=0}^{n-1} k_i'\,\tilde{y}[i] + e[n-1]$$
Or, in recursive form (block diagram: the innovation, formed by subtracting the prediction fed back through the delay $z^{-1}$ from $y[n]$, is scaled by the gain $k_n$ and added to the prediction to produce $\hat{x}[n]$).
$e[n]$ is orthogonal to the current and past data. First consider the condition that $e[n]$ is orthogonal to the current data:

$$\begin{aligned}
& E\,e[n]\,y[n] = 0 \\
\Rightarrow\;& E\,e[n]\left(x[n] + v[n]\right) = 0 \\
\Rightarrow\;& E\,e[n]\,x[n] + E\,e[n]\,v[n] = 0 \\
\Rightarrow\;& E\,e[n]\left(\hat{x}[n] + e[n]\right) + E\,e[n]\,v[n] = 0 \\
\Rightarrow\;& E\,e^2[n] + E\,e[n]\,v[n] = 0 \\
\Rightarrow\;& \varepsilon^2[n] + E\left(x[n] - A_n\,\hat{x}[n-1] - k_n\,y[n]\right)v[n] = 0 \\
\Rightarrow\;& \varepsilon^2[n] - k_n\,\sigma_V^2 = 0 \\
\Rightarrow\;& k_n = \frac{\varepsilon^2[n]}{\sigma_V^2}
\end{aligned}$$
Hence

$$\varepsilon^2[n] = \sigma_V^2\,\frac{\sigma_W^2 + a^2\,\varepsilon^2[n-1]}{\sigma_W^2 + \sigma_V^2 + a^2\,\varepsilon^2[n-1]}$$
We still have to find $\varepsilon^2[0]$. For this, assume $x[-1] = \hat{x}[-1] = 0$. Hence we get

$$\varepsilon^2[0] = \frac{\sigma_X^2\,\sigma_V^2}{\sigma_X^2 + \sigma_V^2}$$
Step 2: Calculate $k_n = \dfrac{\varepsilon^2[n]}{\sigma_V^2}$.
Step 3: Input $y[n]$. Estimate $\hat{x}[n]$ by
Predict: $\hat{x}[n/n-1] = a\,\hat{x}[n-1]$
Correct: $\hat{x}[n] = \hat{x}[n/n-1] + k_n\left(y[n] - \hat{y}[n/n-1]\right)$
$\therefore\;\hat{x}[n] = a\,\hat{x}[n-1] + k_n\left(y[n] - a\,\hat{x}[n-1]\right)$
Step 4: $n = n+1$. Calculate

$$\varepsilon^2[n] = \sigma_V^2\,\frac{\sigma_W^2 + a^2\,\varepsilon^2[n-1]}{\sigma_W^2 + \sigma_V^2 + a^2\,\varepsilon^2[n-1]}$$

Step 5: Go to Step 2.
A code sketch of these steps is given below.
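A minimal Python sketch of the scalar Kalman filter steps above, for the model x[n] = a·x[n-1] + w[n], y[n] = x[n] + v[n]; the function and variable names are mine, and the initial estimate x̂[-1] = 0 follows the assumption made in the text.

```python
import numpy as np

def scalar_kalman(y, a, sigma_w2, sigma_v2, sigma_x2):
    """Scalar Kalman filter for x[n] = a*x[n-1] + w[n], y[n] = x[n] + v[n]."""
    N = len(y)
    x_hat = np.zeros(N)
    eps2 = sigma_x2 * sigma_v2 / (sigma_x2 + sigma_v2)   # eps^2[0]
    k = eps2 / sigma_v2                                  # Kalman gain k_0
    x_hat[0] = k * y[0]                                  # since x_hat[-1] = 0
    for n in range(1, N):
        eps2 = sigma_v2 * (sigma_w2 + a**2 * eps2) / (sigma_w2 + sigma_v2 + a**2 * eps2)
        k = eps2 / sigma_v2                              # Step 2: gain
        x_pred = a * x_hat[n-1]                          # Step 3: predict
        x_hat[n] = x_pred + k * (y[n] - x_pred)          #         correct
    return x_hat
```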
For example, with $a = 0.6$, $\sigma_W^2 = 0.25$ and $\sigma_V^2 = 0.5$, at steady state ($\varepsilon^2[n] = \varepsilon^2[n-1] = \varepsilon^2$) we get

$$\varepsilon^2 = 0.5\,\frac{0.25 + 0.6^2\,\varepsilon^2}{0.25 + 0.5 + 0.6^2\,\varepsilon^2}$$

Solving and taking the positive root, $\varepsilon^2 \approx 0.195$, so that

$$k_n = \frac{\varepsilon^2}{\sigma_V^2} \approx 0.390$$
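A quick numerical check of this steady-state value (assuming a = 0.6, σ_W² = 0.25 and σ_V² = 0.5, as implied by the equation above): iterating the error-variance recursion from any positive starting value converges to the same fixed point.

```python
# fixed-point iteration of eps^2[n] = sigma_v2*(sigma_w2 + a^2*eps2)/(sigma_w2 + sigma_v2 + a^2*eps2)
a, sigma_w2, sigma_v2 = 0.6, 0.25, 0.5
eps2 = 1.0                                    # any positive starting value works
for _ in range(200):
    eps2 = sigma_v2 * (sigma_w2 + a**2 * eps2) / (sigma_w2 + sigma_v2 + a**2 * eps2)
print(round(eps2, 3), round(eps2 / sigma_v2, 3))   # ~0.195 and gain ~0.390
```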
Here $\mathbf{w}[n]$ is the zero-mean Gaussian noise vector of the state equation, with covariance matrix $\mathbf{Q}_W$, and $\mathbf{v}[n]$ is the zero-mean Gaussian noise vector of the observation equation, with covariance matrix $\mathbf{Q}_V$.
Denote $\hat{\mathbf{x}}[n/n] = \hat{\mathbf{x}}[n] =$ best linear estimate of $\mathbf{x}[n]$ given $\mathbf{y}[0], \mathbf{y}[1], \ldots, \mathbf{y}[n]$.
With these definitions and notations the vector Kalman filter algorithm is as follows:
(b) Observation parameter matrix $\mathbf{c}[n]$, $n = 0, 1, 2, \ldots$, and the observation noise covariance matrix $\mathbf{Q}_V$.
9.1 Introduction
The aim of spectral analysis is to determine the spectral content of a random process from a finite set of observed data.
Spectral analysis is a very old problem. It started with the Fourier series (1807), introduced to solve the heat equation; Sturm generalized the approach to arbitrary functions (1837), and Schuster devised the periodogram (1897) to determine frequency content numerically.
Consider the definition of the power spectrum of a random sequence $\{x[n], -\infty < n < \infty\}$:

$$S_{XX}(w) = \sum_{m=-\infty}^{\infty} R_{XX}[m]\,e^{-jwm}$$
Note that

$$E\,\hat{R}_{XX}[m] = \frac{1}{N}\sum_{n=0}^{N-1-|m|} E\,x[n]\,x[n+m] = \frac{N-|m|}{N}\,R_{XX}[m] = R_{XX}[m] - \frac{|m|}{N}\,R_{XX}[m] = \left(\frac{N-|m|}{N}\right)R_{XX}[m]$$
Hence $\hat{R}_{XX}[m]$ is a biased estimator of $R_{XX}[m]$. Had we divided the sum by $N-|m|$ instead of $N$, we would have obtained the unbiased estimator $\hat{R}'_{XX}[m]$. Moreover, the estimated autocorrelation values at different lags are highly correlated.
The variance of $\hat{R}_{XX}[m]$ is obtained approximately as

$$\mathrm{var}\left(\hat{R}_{XX}[m]\right) \cong \frac{1}{N}\sum_{n=-\infty}^{\infty}\left(R_{XX}^2[n] + R_{XX}[n-m]\,R_{XX}[n+m]\right)$$

Note that the variance of $\hat{R}_{XX}[m]$ is large for large lag $m$, especially as $m$ approaches $N$.
Also, as $N \to \infty$, $\mathrm{var}\left(\hat{R}_{XX}[m]\right) \to 0$, provided $\sum_{n=-\infty}^{\infty} R_{XX}^2[n] < \infty$. Thus, although the sample autocorrelation function is a consistent estimator, its Fourier transform is not, and here lies the problem of spectral estimation.
Though $\hat{R}'_{XX}[m]$ is an unbiased and consistent estimator of $R_{XX}[m]$, it is not normally used for spectral estimation.
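For concreteness, a small Python sketch of the two autocorrelation estimators discussed here (biased: divide by N; unbiased: divide by N - |m|); the function and variable names are mine.

```python
import numpy as np

def autocorr_estimates(x, max_lag):
    """Return biased and unbiased sample autocorrelation estimates for lags 0..max_lag."""
    N = len(x)
    r_biased = np.zeros(max_lag + 1)
    r_unbiased = np.zeros(max_lag + 1)
    for m in range(max_lag + 1):
        s = np.dot(x[:N - m], x[m:])          # sum over n of x[n]*x[n+m]
        r_biased[m] = s / N                   # biased estimator R_hat
        r_unbiased[m] = s / (N - m)           # unbiased estimator R_hat'
    return r_biased, r_unbiased
```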
$\hat{S}^{\,p}_{XX}(w)$ gives the power output of a bank of band-pass filters with impulse responses

$$h_i[n] = \frac{1}{N}\,e^{jw_i n}\,\mathrm{rect}\!\left(\frac{n}{N}\right)$$

$h_i[n]$ is a very poor band-pass filter.
Also,

$$\hat{S}^{\,p}_{XX}(w) = \sum_{m=-(N-1)}^{N-1}\hat{R}_{XX}[m]\,e^{-jwm}, \qquad\text{where}\quad \hat{R}_{XX}[m] = \frac{1}{N}\sum_{n=0}^{N-1-|m|} x[n]\,x[n+m]$$

Therefore

$$E\,\hat{S}^{\,p}_{XX}(w) = \sum_{m=-(N-1)}^{N-1} E\!\left[\hat{R}_{XX}[m]\right]e^{-jwm} = \sum_{m=-(N-1)}^{N-1}\left(1 - \frac{|m|}{N}\right)R_{XX}[m]\,e^{-jwm}$$
As $N \to \infty$, the right-hand side approaches the true power spectral density $S_{XX}(w)$. Thus the periodogram is an asymptotically unbiased estimator of the power spectral density. Proving consistency of the periodogram is a difficult problem; we consider the simple case of a Gaussian white noise sequence in the following example.
Example 1:
The periodogram of a zero-mean white Gaussian sequence $x[n]$, $n = 0, \ldots, N-1$.
The power spectral density is

$$S_{XX}(w) = \sigma_X^2, \qquad -\pi < w \le \pi$$

The periodogram is given by

$$\hat{S}^{\,p}_{XX}(w) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\,e^{-jwn}\right|^2$$

Let us examine the periodogram only at the DFT frequencies $w_k = \dfrac{2\pi k}{N}$, $k = 0, 1, \ldots, N-1$.
$$\hat{S}^{\,p}_{XX}(k) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\,e^{-j\frac{2\pi}{N}kn}\right|^2 = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\,e^{-jw_k n}\right|^2 = \left(\sum_{n=0}^{N-1}\frac{x[n]\cos w_k n}{\sqrt{N}}\right)^{2} + \left(\sum_{n=0}^{N-1}\frac{x[n]\sin w_k n}{\sqrt{N}}\right)^{2} = C_X^2(w_k) + S_X^2(w_k)$$
$$\mathrm{var}\left(C_X(w_k)\right) = \frac{\sigma_X^2}{N}\sum_{n=0}^{N-1}\cos^2 w_k n = \frac{\sigma_X^2}{N}\sum_{n=0}^{N-1}\frac{1 + \cos 2w_k n}{2} = \frac{\sigma_X^2}{N}\left(\frac{N}{2} + \frac{\sin Nw_k}{2\sin w_k}\cos(N-1)w_k\right) = \sigma_X^2\left(\frac{1}{2} + \cos(N-1)w_k\,\frac{\sin Nw_k}{2N\sin w_k}\right)$$

For $w_k = \dfrac{2\pi}{N}k$,

$$\frac{\sin Nw_k}{2\sin w_k} = 0 \qquad\text{for } k \ne 0,\; k \ne \frac{N}{2} \;(\text{assuming } N \text{ even})$$

$$\therefore\;\mathrm{var}\left(C_X(w_k)\right) = \frac{\sigma_X^2}{2} \qquad\text{for } k \ne 0,\; k \ne \frac{N}{2}$$

Again, $\dfrac{\sin Nw_k}{N\sin w_k} = 1$ for $k = 0$.

$$\therefore\;\mathrm{var}\left(C_X(w_k)\right) = \sigma_X^2 \qquad\text{for } k = 0,\; k = \frac{N}{2}$$
Similarly, considering the sine part,

$$S_X(w_k) = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x[n]\sin w_k n$$

which has variance $\dfrac{\sigma_X^2}{2}$ for $k \ne 0$, $k \ne \dfrac{N}{2}$; moreover $C_X(w_k)$ and $S_X(w_k)$ are uncorrelated (hence independent) Gaussian random variables.
Recall that if $X_1, X_2, \ldots, X_N$ are independent zero-mean Gaussian random variables of variance $\sigma_X^2$ and $Y = X_1^2 + X_2^2 + \cdots + X_N^2$, then $EY = N\sigma_X^2$ and $\mathrm{var}(Y) = 2N\sigma_X^4$.
Now $\hat{S}_{XX}[k] = C_X^2[k] + S_X^2[k]$ has a (scaled) $\chi_2^2$ distribution. Hence

$$E\,\hat{S}_{XX}[k] = \frac{\sigma_X^2}{2} + \frac{\sigma_X^2}{2} = \sigma_X^2 = S_{XX}[k] \;\Rightarrow\; \hat{S}_{XX}[k] \text{ is unbiased}$$

$$\mathrm{var}\left(\hat{S}_{XX}[k]\right) = 2\times 2\left(\frac{\sigma_X^2}{2}\right)^{2} = S_{XX}^2[k], \quad\text{which is independent of } N$$

For $k = 0$ (and $k = N/2$), $\hat{S}_{XX}[0]$ is likewise unbiased, and $\mathrm{var}\left(\hat{S}_{XX}[0]\right) = 2\sigma_X^4 = 2S_{XX}^2[0]$.
It can be shown that for the Gaussian independent white noise sequence, at any frequency $w$ with $0 < w < \pi$,

$$\mathrm{var}\left(\hat{S}^{\,p}_{XX}(w)\right) \cong S_{XX}^2(w)$$
(Figure: a typical periodogram $\hat{S}^{\,p}_{XX}(w)$ plotted against $w$ over $-\pi < w < \pi$.)
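A short Python illustration of this behaviour: the periodogram of white Gaussian noise is computed at the DFT frequencies with the FFT, and one can check numerically that its mean is about σ_X² while its spread does not shrink as N grows (the data lengths below are arbitrary).

```python
import numpy as np

def periodogram(x):
    """Periodogram at the DFT frequencies w_k = 2*pi*k/N."""
    N = len(x)
    return np.abs(np.fft.fft(x))**2 / N

rng = np.random.default_rng(1)
for N in (128, 1024, 8192):
    x = rng.standard_normal(N)                # sigma_X^2 = 1
    P = periodogram(x)[1:N//2]                # exclude k = 0 and k = N/2
    print(N, round(P.mean(), 3), round(P.std(), 3))   # mean ~ 1, spread ~ 1 regardless of N
```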
For the general case,

$$\hat{S}^{\,p}_{XX}(w) = \sum_{m=-(N-1)}^{N-1}\hat{R}_{XX}[m]\,e^{-jwm}$$

where $\hat{R}_{XX}[m] = \dfrac{1}{N}\sum_{n=0}^{N-1-|m|} x[n]\,x[n+m]$ is the biased estimator of the autocorrelation function. Hence

$$\hat{S}^{\,p}_{XX}(w) = \sum_{m=-(N-1)}^{N-1}\left(1 - \frac{|m|}{N}\right)\hat{R}'_{XX}[m]\,e^{-jwm} = \sum_{m=-(N-1)}^{N-1} w_B[m]\,\hat{R}'_{XX}[m]\,e^{-jwm}$$

so that

$$E\,\hat{S}^{\,p}_{XX}(w) = W_B(w)\ast \mathrm{FT}\!\left\{E\,\hat{R}'_{XX}[m]\right\} = W_B(w)\ast S_{XX}(w) = \int_{-\pi}^{\pi} W_B(w-\xi)\,S_{XX}(\xi)\,d\xi$$

As $N \to \infty$, $E\,\hat{S}^{\,p}_{XX}(w) \to S_{XX}(w)$.
Now $\mathrm{var}\left(\hat{S}^{\,p}_{XX}(w)\right)$ cannot be found exactly (there is no simple analytical tool), but an approximate expression is

$$\mathrm{Cov}\left(\hat{S}^{\,p}_{XX}(w_1),\hat{S}^{\,p}_{XX}(w_2)\right) \sim S_{XX}(w_1)\,S_{XX}(w_2)\left[\left(\frac{\sin\frac{N(w_1+w_2)}{2}}{N\sin\frac{w_1+w_2}{2}}\right)^{2} + \left(\frac{\sin\frac{N(w_1-w_2)}{2}}{N\sin\frac{w_1-w_2}{2}}\right)^{2}\right]$$

so that

$$\mathrm{var}\left(\hat{S}^{\,p}_{XX}(w)\right) \sim S_{XX}^2(w)\left[1 + \left(\frac{\sin Nw}{N\sin w}\right)^{2}\right]$$

$$\therefore\;\mathrm{var}\left(\hat{S}^{\,p}_{XX}(w)\right) \sim 2S_{XX}^2(w)\;\text{ for } w = 0, \pi, \qquad \cong S_{XX}^2(w)\;\text{ for } 0 < w < \pi$$
Consider $w_1 = 2\pi\dfrac{k_1}{N}$ and $w_2 = 2\pi\dfrac{k_2}{N}$, with $k_1, k_2$ integers and $k_1 \ne k_2$. Then

$$\mathrm{Cov}\left(\hat{S}^{\,p}_{XX}(w_1),\hat{S}^{\,p}_{XX}(w_2)\right) \cong 0$$

This means that there will be no correlation between two neighboring spectral estimates.
Therefore the periodogram is not a reliable estimator of the power spectrum, for the following two reasons:
(1) The periodogram is not a consistent estimator, in the sense that $\mathrm{var}\left(\hat{S}^{\,p}_{XX}(w)\right)$ does not decrease as the data length $N$ increases.
(2) The periodogram values at neighboring frequencies are uncorrelated, so the estimate fluctuates erratically with frequency.
We have to modify $\hat{S}^{\,p}_{XX}(w)$ to get a consistent estimator for $S_{XX}(w)$.
One remedy is the averaged periodogram (Bartlett's method): divide the data into $K$ non-overlapping segments of length $L = N/K$, compute the periodogram $\hat{S}^{(k)}_{XX}(w)$ of each segment, and average:

$$\hat{S}^{(av)}_{XX}(w) = \frac{1}{K}\sum_{k=0}^{K-1}\hat{S}^{(k)}_{XX}(w)$$
As shown earlier,

$$E\,\hat{S}^{(k)}_{XX}(w) = \int_{-\pi}^{\pi} W_B(w-\xi)\,S_{XX}(\xi)\,d\xi$$

where

$$w_B[m] = \begin{cases} 1 - \dfrac{|m|}{L}, & |m| \le L-1 \\ 0, & \text{otherwise} \end{cases} \qquad\text{and}\qquad W_B(w) = \frac{1}{L}\left(\frac{\sin\frac{wL}{2}}{\sin\frac{w}{2}}\right)^{2}$$

with

$$\hat{S}^{(k)}_{XX}(w) = \sum_{m=-(L-1)}^{L-1}\hat{R}^{(k)}_{XX}[m]\,e^{-jwm}$$
Hence

$$E\,\hat{S}^{(av)}_{XX}(w) = \frac{1}{K}\sum_{k=0}^{K-1} E\,\hat{S}^{(k)}_{XX}(w) = E\left\{\hat{S}^{(k)}_{XX}(w)\right\} = \int_{-\pi}^{\pi} W_B(w-\xi)\,S_{XX}(\xi)\,d\xi$$
Thus the mean of the averaged periodogram is the true spectrum convolved with the Fourier transform $W_B(w)$ of the Bartlett window. The effect of reducing the data length from $N$ points to $L = N/K$ is a window whose spectral width is increased by a factor $K$; consequently the frequency resolution is reduced by a factor $K$.
(Figure: $W_B(w)$ for the original data length $N$ and for the modified, shorter segment length $L$, showing the broader main lobe of the latter.)
$\mathrm{var}\left(\hat{S}^{(av)}_{XX}(w)\right)$ is not simple to evaluate, as fourth-order moments are involved.
Simplification: assume the $K$ data segments are independent. Then

$$\mathrm{var}\left(\hat{S}^{(av)}_{XX}(w)\right) = \mathrm{var}\left(\frac{1}{K}\sum_{k=0}^{K-1}\hat{S}^{(k)}_{XX}(w)\right) = \frac{1}{K^2}\sum_{k=0}^{K-1}\mathrm{var}\left(\hat{S}^{(k)}_{XX}(w)\right) \sim \frac{1}{K^2}\times K\left(1 + \left(\frac{\sin wL}{L\sin w}\right)^{2}\right)S_{XX}^2(w) \sim \frac{1}{K}\times(\text{original variance of the periodogram})$$

So the variance will be reduced, although by a factor somewhat less than $K$, because in practical situations the data segments will not be independent.
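A sketch of the averaged periodogram in Python, following the segmentation described above; the number of segments and the test data are illustrative.

```python
import numpy as np

def bartlett_psd(x, K):
    """Average the periodograms of K non-overlapping segments of length L = N//K."""
    N = len(x)
    L = N // K
    P = np.zeros(L)
    for k in range(K):
        seg = x[k*L:(k+1)*L]
        P += np.abs(np.fft.fft(seg))**2 / L   # periodogram of the k-th segment
    return P / K                              # estimate at w = 2*pi*k/L

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)
print(round(bartlett_psd(x, K=16).std(), 3))  # spread shrinks roughly as 1/sqrt(K)
```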
In the Welch method the segments are windowed before the periodogram is computed, giving the so-called modified periodogram. Each segment periodogram is normalized by the window power

$$\frac{1}{L}\sum_{n=0}^{L-1} w^2[n]$$

The window $w[n]$ need not be an even function and is used to control spectral leakage. The quantity $\sum_{n=0}^{L-1} x[n]\,w[n]\,e^{-jwn}$ is the DTFT of $x[n]\,w[n]$; for the rectangular window, $w[n] = 1$ for $n = 0, \ldots, L-1$ and $0$ otherwise, and the modified periodogram reduces to the ordinary periodogram.
(3) Compute

$$\hat{S}^{(\mathrm{Welch})}_{XX}(w) = \frac{1}{K}\sum_{k=0}^{K-1}\hat{S}^{(\mathrm{mod})}_{XX,k}(w)$$

where $\hat{S}^{(\mathrm{mod})}_{XX,k}(w)$ is the modified periodogram of the $k$-th segment.
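A minimal Welch-style sketch assuming the steps just listed: window each segment (here non-overlapping, although overlapping segments are commonly used), normalize by the window power, and average. The Hann window and segment length are arbitrary illustrative choices.

```python
import numpy as np

def welch_psd(x, L=256):
    """Averaged modified periodograms over non-overlapping segments of length L."""
    w = np.hanning(L)                         # data window (any suitable window may be used)
    U = np.mean(w**2)                         # window power (1/L) * sum of w^2[n]
    K = len(x) // L
    P = np.zeros(L)
    for k in range(K):
        seg = x[k*L:(k+1)*L] * w              # windowed segment
        P += np.abs(np.fft.fft(seg))**2 / (L * U)
    return P / K
```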
In the Blackman-Tukey method a lag window $w[m]$ is applied to the estimated autocorrelation function, so as to de-emphasize the unreliable values of the autocorrelation function at large lags. The window function $w[m]$ has the following properties:

$$0 \le w[m] \le 1, \qquad w[0] = 1, \qquad w[-m] = w[m], \qquad w[m] = 0 \;\text{ for } |m| > M$$

$w[0] = 1$ is a consequence of the fact that the smoothed periodogram should not modify a smooth spectrum, and so $\int_{-\pi}^{\pi} W(w)\,dw = 1$.
Issues of concern:
1. How to select $w[m]$? There are a large number of windows available; use a window with small side-lobes. This will reduce bias and improve resolution.
2. How to select $M$? Normally $M \sim \dfrac{N}{5}$ or $M \sim 2\sqrt{N}$ (mostly based on experience), for data lengths of the order of $N \sim 1000$ to $10{,}000$.
$\hat{S}^{BT}_{XX}(w)$ = convolution of $\hat{S}^{\,p}_{XX}(w)$ and $W(w)$, the Fourier transform of the window sequence
= smoothing of the periodogram, thus decreasing the variance of the estimate at the expense of reduced resolution.

$$E\left(\hat{S}^{BT}_{XX}(w)\right) = E\,\hat{S}^{\,p}_{XX}(w)\ast W(w)$$

where

$$E\,\hat{S}^{\,p}_{XX}(w) = \int_{-\pi}^{\pi} S_{XX}(\theta)\,W_B(w-\theta)\,d\theta$$

so that

$$E\left(\hat{S}^{BT}_{XX}(w)\right) = \sum_{m=-(M-1)}^{M-1} w[m]\,E\,\hat{R}_{XX}[m]\,e^{-jwm}$$

$\hat{S}^{BT}_{XX}(w)$ can be shown to be asymptotically unbiased, and its variance is approximately

$$\mathrm{var}\left(\hat{S}^{BT}_{XX}(w)\right) \sim \frac{S_{XX}^2(w)}{N}\sum_{k=-(M-1)}^{M-1} w^2[k]$$

Some of the popular windows are the rectangular window, the Bartlett window, the Hamming window, the Hanning window, etc.
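A sketch of the Blackman-Tukey estimate in Python: estimate the autocorrelation, apply a lag window of support 2M-1, and take the discrete-time Fourier transform on a frequency grid. The Bartlett lag window used here is just one of the window choices listed above.

```python
import numpy as np

def blackman_tukey_psd(x, M, n_freq=512):
    """Blackman-Tukey estimate: windowed biased autocorrelation, then DTFT on a frequency grid."""
    N = len(x)
    lags = np.arange(-(M - 1), M)
    r = np.array([np.dot(x[:N - abs(m)], x[abs(m):]) / N for m in lags])  # biased R_hat[m]
    w_lag = 1.0 - np.abs(lags) / M             # Bartlett lag window, w[0] = 1
    w_grid = np.linspace(-np.pi, np.pi, n_freq)
    # S(w) = sum_m w[m] * R_hat[m] * exp(-j*w*m); real-valued by symmetry
    S = np.real(np.exp(-1j * np.outer(w_grid, lags)) @ (w_lag * r))
    return w_grid, S
```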
Procedure:
1. Given the data $x[n]$, $n = 0, 1, \ldots, N-1$,
2. find the periodogram

$$\hat{S}^{\,p}_{XX}(2\pi k/N) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\,e^{-j2\pi kn/N}\right|^2$$
In model-based (parametric) spectral estimation the data is assumed to be generated by a rational model

$$x[n] = \sum_{i=1}^{p} a_i\,x[n-i] + \sum_{i=0}^{q} b_i\,v[n-i]$$

i.e. $x[n]$ is the output of a system with transfer function

$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{i=0}^{q} b_i\,z^{-i}}{1 - \sum_{i=1}^{p} a_i\,z^{-i}}$$

driven by white noise $v[n]$ of variance $\sigma_V^2$, and

$$S_{XX}(w) = \left|H(w)\right|^{2}\sigma_V^2$$
• Model signal as AR/MA/ARMA process
• Estimate model parameters from the given data
• Find the power spectrum by substituting the values of the model parameters into the expression for the power spectrum of the model.
For an AR($p$) model the spectral estimate is

$$\hat{S}_{XX}(w) = \frac{\sigma_V^2}{\left|1 - \sum_{i=1}^{p} a_i\,e^{-jwi}\right|^{2}}$$

where the model parameters are obtained from estimated autocorrelations; in the covariance method

$$\hat{R}_{XX}[k,l] = \frac{1}{N-p}\sum_{n=p}^{N} x[n-k]\,x[n-l]$$

is used as an estimate of the autocorrelation function.
Note that the autocorrelation matrix in the covariance method is not Toeplitz, so the resulting equations cannot be solved by efficient algorithms like the Levinson-Durbin recursion.
The flow chart for AR spectral estimation is given below:
1. Given the data $x[n]$, $n = 0, 1, \ldots, N$, estimate the autocorrelations $\hat{R}_{XX}[m]$, $m = 0, 1, \ldots, p$.
2. Select an order $p$.
3. Solve for $a_i$, $i = 1, 2, \ldots, p$, and $\sigma_V^2$.
4. Find

$$\hat{S}_{XX}(w) = \frac{\sigma_V^2}{\left|1 - \sum_{i=1}^{p} a_i\,e^{-jwi}\right|^{2}}$$

A code sketch of this procedure is given below.
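A sketch of this flow using the autocorrelation (Yule-Walker) method in Python; the model order and the frequency grid are placeholders, and the Toeplitz system is solved directly rather than by the Levinson-Durbin recursion for brevity.

```python
import numpy as np

def ar_psd(x, p, n_freq=512):
    """Yule-Walker AR(p) spectral estimate: S(w) = sigma_v^2 / |1 - sum_i a_i e^{-jwi}|^2."""
    N = len(x)
    r = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(p + 1)])    # biased autocorrelation
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])   # Toeplitz matrix
    a = np.linalg.solve(R, r[1:p + 1])                                    # AR coefficients a_1..a_p
    sigma_v2 = r[0] - a @ r[1:p + 1]                                      # prediction-error variance
    w = np.linspace(-np.pi, np.pi, n_freq)
    denom = np.abs(1.0 - np.exp(-1j * np.outer(w, np.arange(1, p + 1))) @ a)**2
    return w, sigma_v2 / denom
```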
Some questions:
Can an AR($p$) process model a band-pass signal?
• If we use an AR(1) model, it will never be able to model a band-pass process. If one sinusoid is present, an AR(2) process will be able to discern it; if there are more sinusoids, a correspondingly higher model order is required.
How should the model order $p$ be selected?
• Final prediction error (FPE) criterion: minimize

$$FPE(p) = \hat{\sigma}_P^2\,\frac{N+p+1}{N-p-1}$$

where $N$ = number of data points and $\hat{\sigma}_P^2$ = mean-square prediction error (the prediction-error variance).
• Akaike information criterion (AIC): minimize

$$AIC(p) = \ln\left(\hat{\sigma}_v^2\right) + \frac{2p}{N}$$
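An illustrative loop that applies the AIC rule above to select the AR order; it re-implements the same Yule-Walker fit as the earlier sketch so that the snippet is self-contained, and p_max is an arbitrary search limit.

```python
import numpy as np

def yule_walker(x, p):
    """AR(p) coefficients and prediction-error variance from the biased autocorrelation."""
    N = len(x)
    r = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:p + 1])
    return a, r[0] - a @ r[1:p + 1]

def select_order_aic(x, p_max=20):
    """Pick the AR order minimizing AIC(p) = ln(sigma_p^2) + 2p/N."""
    N = len(x)
    aic = []
    for p in range(1, p_max + 1):
        _, s2 = yule_walker(x, p)
        aic.append(np.log(s2) + 2 * p / N)
    return 1 + int(np.argmin(aic))
```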