
Chapter 13 Mean Square Estimation

Po-Ning Chen, Professor

Institute of Communications Engineering

National Chiao Tung University

Hsin Chu, Taiwan 30050, R.O.C.


13-1 Introduction 13-1

Concern:
• To estimate the random process s(t) in terms of another related process x(ξ)
for a ≤ ξ ≤ b.

Theorem 13-1 The best linear estimator of s(t) in terms of {x(ξ) : a ≤ ξ ≤ b},
which is of the form

    ŝ(t) = ∫_a^b h(α, t) x(α) dα

and which minimizes the MS error Pt = E[(s(t) − ŝ(t))²], satisfies

    Rsx(t, s) = ∫_a^b h(α, t) Rxx(α, s) dα   for a ≤ s ≤ b.

Proof:

    Pt = E[(s(t) − ŝ(t))²]
       = E[s²(t)] + ∫_a^b ∫_a^b h(α, t) h(β, t) E[x(α)x(β)] dα dβ − 2 ∫_a^b h(α, t) E[s(t)x(α)] dα
       = Rss(0) + ∫_a^b ∫_a^b h(α, t) h(β, t) Rxx(α, β) dα dβ − 2 ∫_a^b h(α, t) Rsx(t, α) dα.
13-1 Introduction 13-2

Under a Riemann-integrability assumption,

    ∂Pt/∂h(s, t) = ∫_{a, β≠s}^b h(β, t) Rxx(s, β) dβ + ∫_{a, α≠s}^b h(α, t) Rxx(α, s) dα + 2 h(s, t) Rxx(s, s) − 2 Rsx(t, s)   (conceptually)
                 = 2 ∫_a^b h(α, t) Rxx(s, α) dα − 2 Rsx(t, s).

Setting ∂Pt/∂h(s, t) = 0 for every s ∈ [a, b] yields the stated integral equation.  □

Remark
• The minimum MS error is given by:

    Pt = Rss(0) + ∫_a^b h(α, t) [∫_a^b h(β, t) Rxx(α, β) dβ] dα − 2 ∫_a^b h(α, t) Rsx(t, α) dα
       = Rss(0) + ∫_a^b h(α, t) Rsx(t, α) dα − 2 ∫_a^b h(α, t) Rsx(t, α) dα
       = Rss(0) − ∫_a^b h(α, t) Rsx(t, α) dα.
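The integral equation of Theorem 13-1 can be solved numerically by discretizing the data interval, which turns it into a linear system. The sketch below is an illustration only (not from the slides): it assumes a hypothetical Gaussian-shaped Rss and an observation x(ξ) = s(ξ) + v(ξ) with white noise v of power sigma_v², so that Rsx = Rss and the noise contributes a diagonal term on the grid.

# Minimal numerical sketch (not from the slides): solve
#   Rsx(t, s) = ∫_a^b h(α, t) Rxx(α, s) dα,   a ≤ s ≤ b,
# on a grid.  Assumed model: x = s + white noise, Gaussian-shaped Rss.
import numpy as np

a, b, t = 0.0, 1.0, 1.2            # data interval [a, b]; estimate s(t) with t > b
M = 201                            # grid points on [a, b]
grid = np.linspace(a, b, M)
d = grid[1] - grid[0]
sigma_v2 = 0.1                     # assumed white-noise power

Rss = lambda tau: np.exp(-0.5 * tau ** 2)      # assumed signal correlation

# Discretized equation:  Rss(t − s_i) = Σ_j h_j Rss(α_j − s_i) Δα + sigma_v² h_i
A = Rss(grid[None, :] - grid[:, None]) * d + sigma_v2 * np.eye(M)
h = np.linalg.solve(A, Rss(t - grid))

# Minimum MS error  Pt = Rss(0) − ∫ h(α, t) Rsx(t, α) dα
Pt = Rss(0.0) - np.sum(h * Rss(t - grid)) * d
print("Pt ≈ %.4f" % Pt)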
Orthogonality of Optimal MS Estimation 13-3

Theorem 13-1 Following Theorem 13-1, we also have:


E[(s(t) − ŝ(t))x(ξ)] = 0 for a ≤ ξ ≤ b.

Proof:

    E[(s(t) − ŝ(t)) x(ξ)] = E[s(t)x(ξ)] − ∫_a^b h(α, t) E[x(α)x(ξ)] dα
                          = Rsx(t, ξ) − ∫_a^b h(α, t) Rxx(α, ξ) dα
                          = Rsx(t, ξ) − Rsx(t, ξ) = 0.  □
Orthogonality principle
• The linear estimator ŝ(t) that minimizes ⟨s(t) − ŝ(t), s(t) − ŝ(t)⟩ = E[(s(t) − ŝ(t))²] must satisfy ⟨s(t) − ŝ(t), ŝ(t)⟩ = E[(s(t) − ŝ(t)) ŝ(t)] = 0.
• This may not be true for a non-linear estimator! (Note that the linear combinations of {x(ξ), a ≤ ξ ≤ b} span a hyperplane in an inner product space.)
Terminologies 13-4

Terminologies. With [a, b] = data interval,

• If t ∈ [a, b], the estimation of ŝ(t) is called smoothing.
• If t > b and x(ξ) = s(ξ), ŝ(t) is called a forward predictor.
• If t < a and x(ξ) = s(ξ), ŝ(t) is called a backward predictor.
• If t ∉ [a, b] and x(ξ) ≠ s(ξ), the estimation of ŝ(t) is called filtering and prediction.
Forward Prediction Under Stationarity 13-5

Theorem 13-1 The best linear estimator of s(t) in terms of {s(ξ) : a ≤ ξ ≤ b},
which is of the form

    ŝ(t) = ∫_a^b h(α, t) s(α) dα

and which minimizes the MS error Pt = E[(s(t) − ŝ(t))²], satisfies

    Rss(t, s) = ∫_a^b h(α, t) Rss(α, s) dα   for a ≤ s ≤ b.

In addition,

    Pt = Rss(0) − ∫_a^b h(α, t) Rss(t, α) dα.

• If s(t) is stationary, a = b (i.e., ξ = a = b) and t = a + λ, we have s = a and

    Rss(λ) = h(a, a + λ) Rss(0)

  ⇒ h(a, a + λ) = Rss(λ)/Rss(0),   ŝ(a + λ) = [Rss(λ)/Rss(0)] s(a),   and   Pa+λ = Rss(0) − Rss²(λ)/Rss(0).
Forward Prediction Under Stationarity 13-6

Theorem 13-1
    E[(s(t) − ŝ(t)) s(ξ)] = 0   for a ≤ ξ ≤ b.

If E[(s(t) − ŝ(t)) s(ξ)] = 0 also holds for ξ < a, then ŝ(t) is the best linear predictor of s(t) in terms of {s(ξ), ξ ≤ b}, although it only uses the information of {s(ξ), a ≤ ξ ≤ b}.

If Rss(v)/Rss(u) = Rss(0)/Rss(u − v) for any u ≥ v, then for ξ < a,

    E[(s(t) − ŝ(t)) s(ξ)] = Rss(t − ξ) − ∫_a^b h(α, t) Rss(α − ξ) dα
                          = Rss(t − ξ) [1 − ∫_a^b h(α, t) (Rss(α − ξ)/Rss(t − ξ)) dα]
                          = Rss(t − ξ) [1 − ∫_a^b h(α, t) (Rss(α)/Rss(t)) dα]
                          = 0.

If, in addition, a = b in the above case, s(t) is named wide-sense Markov of order 1 (i.e., the best linear prediction based on one point is the best prediction based on the entire past).
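A quick numerical illustration (a sketch, not from the slides): the exponential correlation Rss(τ) = e^{−c|τ|} satisfies the ratio condition above, so the single-point predictor ŝ(a + λ) = [Rss(λ)/Rss(0)] s(a) leaves an error that is uncorrelated with every sample s(ξ), ξ ≤ a. The rate c, the point a and the lead λ below are arbitrary choices.

# Sketch (assumed exponential correlation, chosen for illustration): check that
# E[(s(a+λ) − h·s(a)) s(ξ)] = Rss(a+λ−ξ) − h·Rss(a−ξ) = 0 for ξ ≤ a.
import numpy as np

c, a, lam = 0.7, 2.0, 0.5
Rss = lambda tau: np.exp(-c * np.abs(tau))

h = Rss(lam) / Rss(0.0)                      # h(a, a+λ) from the slide
xi = np.linspace(-3.0, a, 50)                # sample points up to (and including) a
residual_corr = Rss(a + lam - xi) - h * Rss(a - xi)
print("max |E[(s(a+λ) − ŝ(a+λ)) s(ξ)]| =", np.abs(residual_corr).max())   # ≈ 0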
Theorem 13-1 Revisited 13-7

Theorem 13-1 The best linear estimator of s(t) in terms of {xi(ξ) : a ≤ ξ ≤ b}, i = 1, …, k,
which is of the form

    ŝ(t) = Σ_{i=1}^{k} ∫_a^b hi(α, t) xi(α) dα

and which minimizes the MS error Pt = E[(s(t) − ŝ(t))²], satisfies

    Rsxi(t, s) = Σ_{ℓ=1}^{k} ∫_a^b hℓ(α, t) Rxℓxi(α, s) dα   for a ≤ s ≤ b and 1 ≤ i ≤ k.

Proof: A different proof is used here. The optimal estimator should satisfy:

    E[(s(t) − ŝ(t)) xi(ξ)] = 0   for a ≤ ξ ≤ b and 1 ≤ i ≤ k.

Hence,

    Rsxi(t, ξ) = Σ_{ℓ=1}^{k} ∫_a^b hℓ(α, t) Rxℓxi(α, ξ) dα.  □
Theorem 13-1 Revisited 13-8

Example. If s(t) is stationary, a = b, t = a + λ, x1(t) = s(t) and x2(t) = s′(t), then

    Rss(a + λ, a)  = h1(a, a + λ) Rss(a, a) + h2(a, a + λ) Rss′(a, a)
    Rss′(a + λ, a) = h1(a, a + λ) Rss′(a, a) + h2(a, a + λ) Rs′s′(a, a),

which in turn implies:

    Rss(λ)   = h1(a, a + λ) Rss(0) − h2(a, a + λ) R′ss(0)
    −R′ss(λ) = −h1(a, a + λ) R′ss(0) − h2(a, a + λ) R″ss(0),

since

    Rss′(t1, t2) = ∂Rss(t1 − t2)/∂t2 = −R′ss(t1 − t2)

and

    Rs′s′(t1, t2) = ∂Rss′(t1, t2)/∂t1 = −R″ss(t1 − t2).
Theorem 13-1 Revisited 13-9

⇒  h1(a, a + λ) = h1(λ) = [Rss(λ) R″ss(0) + R′ss(λ) R′ss(0)] / [Rss(0) R″ss(0) + (R′ss(0))²] = Rss(λ)/Rss(0)

    h2(a, a + λ) = h2(λ) = [R′ss(λ) Rss(0) − R′ss(0) Rss(λ)] / [Rss(0) R″ss(0) + (R′ss(0))²] = R′ss(λ)/R″ss(0),

where it is reasonable to assume that R′ss(0) = 0.

⇒  ŝ(a + λ) = [Rss(λ)/Rss(0)] s(a) + [R′ss(λ)/R″ss(0)] s′(a).
Theorem 13-1 Revisited 13-10

⇒  Pt = E[(s(a + λ) − ŝ(a + λ))²]
      = E[s²(a + λ)] + (Rss(λ)/Rss(0))² E[s²(a)] + (R′ss(λ)/R″ss(0))² E[(s′(a))²]
        − 2 (Rss(λ)/Rss(0)) E[s(a + λ)s(a)] − 2 (R′ss(λ)/R″ss(0)) E[s(a + λ)s′(a)]
        + 2 (Rss(λ)/Rss(0)) (R′ss(λ)/R″ss(0)) E[s(a)s′(a)]
      = Rss(0) + Rss²(λ)/Rss(0) − (R′ss(λ))²/R″ss(0)
        − 2 Rss²(λ)/Rss(0) + 2 (R′ss(λ))²/R″ss(0) − 2 (Rss(λ)/Rss(0)) (R′ss(λ)/R″ss(0)) R′ss(0)   [last term vanishes since R′ss(0) = 0]
      = Rss(0) − Rss²(λ)/Rss(0) + (R′ss(λ))²/R″ss(0).
Filtering Under Stationarity 13-11

Theorem 13-1 The best linear estimator of s(t) in terms of {x(ξ) : a ≤ ξ ≤ b},
which is of the form

    ŝ(t) = ∫_a^b h(α, t) x(α) dα

and which minimizes the MS error Pt = E[(s(t) − ŝ(t))²], satisfies

    Rsx(t, s) = ∫_a^b h(α, t) Rxx(α, s) dα   for a ≤ s ≤ b.

• If s(t) and x(t) are jointly stationary and t = a = b, we have s = t and

    Rsx(0) = h(t, t) Rxx(0)

  ⇒ h(t, t) = h = Rsx(0)/Rxx(0),   ŝ(t) = [Rsx(0)/Rxx(0)] x(t),   and   Pt = Rss(0) − R²sx(0)/Rxx(0).
Interpolation 13-12

Concern
• To estimate, in the MS sense, s(t + λ) in terms of {s(t + kT)}_{k=−N}^{N} with the form

    ŝ(t + λ) = Σ_{k=−N}^{N} ak s(t + kT).

By using the orthogonality principle:

    E[(s(t + λ) − Σ_{k=−N}^{N} ak s(t + kT)) s(t + nT)] = 0,

we obtain

    Σ_{k=−N}^{N} ak Rss(kT − nT) = Rss(λ − nT)   for −N ≤ n ≤ N.

In addition,

    Pt = E[(s(t + λ) − Σ_{k=−N}^{N} ak s(t + kT)) s(t + λ)] = Rss(0) − Σ_{k=−N}^{N} ak Rss(λ − kT).
Interpolation 13-13

Theorem 10-9 (Stochastic sampling theorem) If s(t) is BL (band-limited) with bandwidth σ, then

    s(t + λ) = Σ_{k=−∞}^{∞} s(t + kT) · sin[σ(λ − kT)]/[σ(λ − kT)]   (in the MS sense),

where T = π/σ.

Example. Let Sss(ω) = 1 for |ω| < σ and zero otherwise. Then,

    Rss(τ) = (1/2π) ∫_{−∞}^{∞} Sss(ω) e^{jωτ} dω = (1/2π) ∫_{−σ}^{σ} e^{jωτ} dω = sin(τσ)/(πτ).

We thus derive for Tσ = π that

    Σ_{k=−N}^{N} ak · sin((kT − nT)σ)/(π(kT − nT)) = sin((λ − nT)σ)/(π(λ − nT))   for −N ≤ n ≤ N

⇔  Σ_{k=−N}^{N} ak · sin(π(k − n))/(π(k − n)) = T sin(σ(λ − nT))/(π(λ − nT))   for −N ≤ n ≤ N

⇔  an = sin(σ(λ − nT))/(σ(λ − nT))   for −N ≤ n ≤ N.  □
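The example can be confirmed numerically: build the (2N+1) orthogonality equations for the band-limited correlation and verify that the solution coincides with the sinc coefficients. A minimal sketch (illustration only; the specific σ, N and λ are arbitrary choices):

# Sketch, not from the slides: verify the interpolation coefficients for the
# band-limited example Sss(ω) = 1 for |ω| < σ, where Rss(τ) = sin(στ)/(πτ).
import numpy as np

sigma, N, lam = np.pi, 8, 0.3          # bandwidth, number of samples each side, offset
T = np.pi / sigma                      # sampling period T = π/σ

def Rss(tau):
    tau = np.asarray(tau, dtype=float)
    return sigma / np.pi * np.sinc(sigma * tau / np.pi)   # np.sinc(x) = sin(πx)/(πx)

k = np.arange(-N, N + 1)
A = Rss((k[None, :] - k[:, None]) * T)         # [Rss(kT − nT)]
rhs = Rss(lam - k * T)                         # [Rss(λ − nT)]
a = np.linalg.solve(A, rhs)

a_closed = np.sinc(sigma * (lam - k * T) / np.pi)   # sin(σ(λ − nT))/(σ(λ − nT))
print("max |a_n − sinc formula| =", np.abs(a - a_closed).max())   # ≈ 0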
Example for Smoothing 13-14

Concern
• To estimate (real WSS) s(t) in terms of {x(ξ), −∞ < ξ < ∞}, where x(t) = s(t) + v(t) is WSS.

Using the orthogonality principle:

    E[(s(t) − ∫_{−∞}^{∞} h(α, t) x(t − α) dα) x(t − ξ)] = 0   for −∞ < ξ < ∞,

or equivalently,

    Rsx(ξ) = ∫_{−∞}^{∞} h(α, t) Rxx(ξ − α) dα.

This gives

    H(ω; t) = H(ω) = Ssx(ω)/Sxx(ω),

which is named the noncausal Wiener filter.

Assume v(t) is zero-mean and independent of s(t). Then,

    Rsx(τ) = E[s(t + τ)(s(t) + v(t))] = Rss(τ)
    Rxx(τ) = E[(s(t + τ) + v(t + τ))(s(t) + v(t))] = Rss(τ) + Rvv(τ).

Example for Smoothing 13-15

In such a case, the best filter is

    H(ω) = Sss(ω) / (Sss(ω) + Svv(ω)),

which is real and symmetric (because Rss(τ) is real and symmetric, and Rvv(τ) is real and symmetric). And

    Pt = E[(s(t) − ∫_{−∞}^{∞} h(α, t) x(t − α) dα) s(t)]
       = Rss(0) − ∫_{−∞}^{∞} h(α) Rss(α) dα
       = (1/2π) ∫_{−∞}^{∞} Sss(ω) dω − ∫_{−∞}^{∞} [(1/2π) ∫_{−∞}^{∞} H(ω) e^{jωα} dω] [(1/2π) ∫_{−∞}^{∞} Sss(ω′) e^{jω′α} dω′] dα
       = (1/2π) ∫_{−∞}^{∞} Sss(ω) dω − (1/2π) ∫_{−∞}^{∞} H(−ω′) Sss(ω′) dω′
       = (1/2π) ∫_{−∞}^{∞} Sss(ω) Svv(ω) / (Sss(ω) + Svv(ω)) dω.

Conclusion: As long as there is no overlap between Sss(ω) and Svv(ω), Pt is zero!
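The noncausal Wiener filter and its error are easy to evaluate numerically for any given pair of spectra. Below is a sketch (the Lorentzian signal spectrum and flat noise spectrum are hypothetical choices, not from the slides):

# Sketch with assumed spectra: evaluate H(ω) = Sss/(Sss + Svv) and the MS error
#   Pt = (1/2π) ∫ Sss·Svv/(Sss + Svv) dω
# by a simple Riemann sum over a wide frequency grid.
import numpy as np

Sss = lambda w: 4.0 / (1.0 + w ** 2)        # assumed signal spectrum
Svv = lambda w: 0.5 * np.ones_like(w)       # assumed flat noise spectrum

w = np.linspace(-200.0, 200.0, 400001)
dw = w[1] - w[0]

H = Sss(w) / (Sss(w) + Svv(w))              # noncausal Wiener filter (real, symmetric)
Pt = np.sum(Svv(w) * H) * dw / (2.0 * np.pi)       # since Sss·Svv/(Sss+Svv) = Svv·H
Rss0 = np.sum(Sss(w)) * dw / (2.0 * np.pi)         # ≈ Rss(0) = 2 for this spectrum
print("Rss(0) ≈ %.3f,  Pt ≈ %.3f" % (Rss0, Pt))    # Pt ≈ 2/3 here, well below Rss(0)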

The end of Section 13-1 Introduction


13-2 Prediction 13-16

Prediction of s[n] in terms of:


• Entire past: {s[n − k]}k≥1
• r-step away past: {s[n − k]}k≥r
• Finite past: {s[n − k]}r≤k≤N
• ...

Assume throughout Section 13-2 that s[n] is stationary.


Prediction Based on Entire Past 13-17

• ŝ[n] = Σ_{k=1}^{∞} h[k, n] s[n − k].
• Orthogonality principle: For all m ≥ 1,

    0 = E[(s[n] − ŝ[n]) s[n − m]] = Rss[m] − Σ_{k=1}^{∞} h[k, n] Rss[m − k].

  Hence, again under the stationarity assumption, h[k, n] = h[k] is invariant in n.

• Therefore, the best prediction filter satisfies:

    Rss[m] = Σ_{k=1}^{∞} h[k] Rss[m − k]   for m ≥ 1.

  This is called the Wiener-Hopf equation (in digital form).


Property of Error Process 13-18

• ŝ[n] is the response of the predictor filter

    H[z] = h[1] z⁻¹ + h[2] z⁻² + · · · + h[k] z^{−k} + · · ·

  due to the input s[n].
• Hence, the error process, defined as e[n] = s[n] − ŝ[n] = s[n] − Σ_{k=1}^{∞} h[k] s[n − k], is the response of the filter

    E[z] = 1 − H[z]

  due to the input s[n].
• Claim: The error process e[n] is white.
  Proof:
  – e[n] is orthogonal to s[n − m] for all m ≥ 1.
  – e[n − m] is a linear combination of s[n − m − ℓ] for all ℓ ≥ 0.
  – Hence, e[n] is orthogonal to e[n − m] for all m ≥ 1, and Ree[m] = P δ[m], where P = E[e²[n]] = E[e[n]s[n]] is the minimum MS power.  □
Property of Error Process 13-19

Theorem 13-2 All zeros of E[z] satisfy |z| ≤ 1.

Proof: If there exists a zi such that E[zi] = 0 and |zi| > 1, then form a new error filter

    E0[z] = E[z] · (1 − z⁻¹/zi*) / (1 − zi z⁻¹).

Then, by letting zi = |zi| e^{jθi} and ω′ = ω − θi, we have

    |E0[e^{jω}]|² = |E[e^{jω}]|² · |e^{jω} − e^{jθi}/|zi||² / |e^{jω} − |zi| e^{jθi}|²
                  = |E[e^{jω}]|² · |e^{jω′} − 1/|zi||² / |e^{jω′} − |zi||²
                  = |E[e^{jω}]|² · [1 − (2/|zi|) cos(ω′) + 1/|zi|²] / [1 − 2|zi| cos(ω′) + |zi|²]
                  = |E[e^{jω}]|² / |zi|²  <  |E[e^{jω}]|².

However,

    P = (1/2π) ∫_{−π}^{π} |E[e^{jω}]|² Sss[ω] dω

is the minimum MS error that can be achieved, and E0[z] improves (i.e., reduces) it. Thus, the desired contradiction is obtained.  □
Solving Wiener-Hopf Equation Under Regularity 13-20

The z-transform technique cannot be applied directly to solve the Wiener-Hopf equation.

If

    Rsx[m] = Σ_{k=1}^{∞} h[k] Rxx[m − k]   for all integers m,

then H[z] = Ssx[z]/Sxx[z], where

    Sxx[z] = Σ_{k=−∞}^{∞} Rxx[k] z^{−k}   and   Ssx[z] = Σ_{k=−∞}^{∞} Rsx[k] z^{−k}.

However,

    Rsx[m] = Σ_{k=1}^{∞} h[k] Rxx[m − k]   only for m ≥ 1.
k=1
Solving Wiener-Hopf Equation Under Regularity 13-21

Solving the Wiener-Hopf equation under the assumption that s[n] is stationary and regular
• A regular process can be represented as the response of a causal finite-energy system due to a unit-power white-noise process i[n]. So,

    s[n] = Σ_{k=0}^{∞} l[k] i[n − k].

• Then, ŝ[n] = Σ_{k=1}^{∞} h[k] s[n − k] can be written as

    ŝ[n] = Σ_{k=1}^{∞} g[k] i[n − k],

  for some {g[k]}_{k=1}^{∞} that minimizes the MS error.

• The orthogonality principle then gives that for all m ≥ 1,

    0 = E[(s[n] − Σ_{k=1}^{∞} g[k] i[n − k]) i[n − m]]
      = Rsi[m] − Σ_{k=1}^{∞} g[k] Rii[m − k] = Rsi[m] − g[m],

  which implies g[m] = Rsi[m].
Solving Wiener-Hopf Equation Under Regularity 13-22

• By regularity,

    Rsi[m] = E[s[n] i[n − m]] = Σ_{k=0}^{∞} l[k] E[i[n − k] i[n − m]] = l[m].

This leads to the first important result:

    ŝ[n] = Σ_{k=1}^{∞} l[k] i[n − k]

is the best linear predictor for a regular and stationary process

    s[n] = Σ_{k=0}^{∞} l[k] i[n − k],   and   P = l²[0].

• By noting that i[n] is the response of the system 1/L[z] due to input s[n], and ŝ[n] is the response of the system L[z] − l[0] due to input i[n], we obtain:

    H[z] = (1/L[z]) (L[z] − l[0]) = 1 − l[0]/L[z] = 1 − (lim_{z→∞} L[z]) / L[z].  □
Solving Wiener-Hopf Equation Under Regularity 13-23

Realization of H[z]:   s[n] → [ 1/L[z] ] → i[n] → [ L[z] − l[0] ] → ŝ[n]

If S[ω] is a rational spectrum, then L[z] can be obtained as follows.

• Write S[z] = A((z + z⁻¹)/2) / B((z + z⁻¹)/2).
• Then the roots of S[z] are symmetric with respect to the unit circle, so we can separate them into two groups: the inside group, consisting of all roots with |z| < 1, and the outside group, consisting of all roots with |z| > 1.
• Form L[z] as the ratio of the two polynomials built from the inside roots of S[z].
Solving Wiener-Hopf Equation Under Regularity 13-24

Example 13-3 (Slide 11-16) Sss[ω] = (5 − 4 cos ω)/(10 − 6 cos ω).

Then, L[z] = (2z − 1)/(3z − 1).

In this case,

    H[z] = 1 − (lim_{z→∞} L[z]) / L[z] = 1 − (2/3) · (3z − 1)/(2z − 1) = 1 − (2z − 2/3)/(2z − 1) = −(1/6) z⁻¹ / (1 − (1/2) z⁻¹).

Consequently,

    ŝ[n] − (1/2) ŝ[n − 1] = −(1/6) s[n − 1],

or equivalently,

    ŝ[n] = −(1/6) s[n − 1] + (1/2) ŝ[n − 1].
Kolmogorov-Szego MS Error Formula 13-25

Appendix 12A A minimum-phase system L[z] satisfies

    log l²[0] = (1/2π) ∫_{−π}^{π} log |L[e^{jω}]|² dω.

Kolmogorov and Szego noted from the above result and Sss[ω] = |L[e^{jω}]|² that

    P = l²[0] = exp( (1/2π) ∫_{−π}^{π} log |L[e^{jω}]|² dω ) = exp( (1/2π) ∫_{−π}^{π} log Sss[ω] dω ).

This is named the Kolmogorov-Szego MS error formula.
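A quick numerical sanity check of the formula (a sketch, not part of the slides), using the spectrum of Example 13-3, for which l[0] = 2/3 and hence P should equal 4/9:

# Sketch: evaluate the Kolmogorov-Szego formula for Sss[ω] = (5 − 4cos ω)/(10 − 6cos ω)
# and compare with l²[0] = (2/3)² = 4/9.
import numpy as np

w = np.linspace(-np.pi, np.pi, 2_000_001)
Sss = (5.0 - 4.0 * np.cos(w)) / (10.0 - 6.0 * np.cos(w))
P = np.exp(np.mean(np.log(Sss)))           # mean over a length-2π interval = (1/2π)∫ ... dω
print("P ≈ %.6f   (4/9 ≈ %.6f)" % (P, 4.0 / 9.0))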
Wide-Sense Markov of Order N 13-26

• If s[n] is an autoregressive (AR) process, then (Slide 11-46)

    L[z] = b0 / (1 + a1 z⁻¹ + · · · + aN z^{−N}).

• Then,

    H[z] = 1 − (lim_{z→∞} L[z]) / L[z] = 1 − b0/L[z] = 1 − (1 + a1 z⁻¹ + · · · + aN z^{−N}) = −a1 z⁻¹ − · · · − aN z^{−N},

  which implies

    ŝ[n] = −a1 s[n − 1] − · · · − aN s[n − N].

• Then, s[n] is called wide-sense Markov of order N.
  – The best linear prediction based on the past N points is the best prediction based on the entire past.
r-Step Predictor 13-27

Concern
• To find the best linear estimator of s[n], in the MS sense, in terms of the r-step-away entire past, i.e., {s[n − k]}k≥r.

    ŝ[n] = Σ_{k=r}^{∞} l[k] i[n − k]

is the best linear r-step predictor for a regular and stationary process

    s[n] = Σ_{k=0}^{∞} l[k] i[n − k],   and   P = Σ_{k=0}^{r−1} l²[k].

Proof:
– A regular process can be represented as the response of a causal finite-energy system due to a unit-power white-noise process i[n]. So,

    s[n] = Σ_{k=0}^{∞} l[k] i[n − k].
r-Step Predictor 13-28

– Then, ŝ[n] = Σ_{k=r}^{∞} h[k] s[n − k] can be written as

    ŝ[n] = Σ_{k=r}^{∞} g[k] i[n − k],

  for some {g[k]}_{k=r}^{∞} that minimizes the MS error.

– The orthogonality principle then gives that for all m ≥ r,

    0 = E[(s[n] − Σ_{k=r}^{∞} g[k] i[n − k]) i[n − m]]
      = Rsi[m] − Σ_{k=r}^{∞} g[k] Rii[m − k] = Rsi[m] − g[m],

  which implies g[m] = Rsi[m] for m ≥ r.

– By regularity,

    Rsi[m] = E[s[n] i[n − m]] = Σ_{k=0}^{∞} l[k] E[i[n − k] i[n − m]] = l[m].  □
r-Step Predictor 13-29

• In addition, it can be derived that

    Hr[z] = 1 − (1/L[z]) Σ_{k=0}^{r−1} l[k] z^{−k}.

Example 13-4 Suppose Rss[m] = a^{|m|} for 0 < a < 1. Then,

    Sss[z] = Σ_{m=−∞}^{∞} Rss[m] z^{−m} = Σ_{m=0}^{∞} a^m (z^{−m} + z^m) − 1
           = 1/(1 − az⁻¹) + 1/(1 − az) − 1 = (1 − a²) / [(1 − az⁻¹)(1 − az)]

⇒  L[z] = b/(1 − az⁻¹) = b (1 + az⁻¹ + a²z⁻² + · · · ),   where b = √(1 − a²)  (so l[k] = b a^k)

⇒  Hr[z] = 1 − (1/L[z]) Σ_{k=0}^{r−1} l[k] z^{−k}
         = 1 − [(1 − az⁻¹)/b] Σ_{k=0}^{r−1} b a^k z^{−k}
         = 1 − (1 − az⁻¹)(1 + az⁻¹ + a²z⁻² + · · · + a^{r−1} z^{−(r−1)}) = a^r z^{−r}.
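So the r-step predictor is simply ŝ[n] = a^r s[n − r], with MS error P = Σ_{k=0}^{r−1} l²[k] = 1 − a^{2r}. A simulation sketch (illustration only; a, r and the run length are arbitrary choices):

# Sketch: Example 13-4 with Rss[m] = a^|m|.  The r-step predictor is ŝ[n] = a^r s[n − r]
# and its MS error should be  P = 1 − a^(2r).
import numpy as np

a, r, M = 0.8, 3, 500_000
b = np.sqrt(1.0 - a ** 2)               # l[k] = b a^k
rng = np.random.default_rng(1)
i = rng.standard_normal(M)

s = np.zeros(M)
for n in range(1, M):
    s[n] = a * s[n - 1] + b * i[n]      # AR(1) with unit variance, Rss[m] = a^|m|

shat = a ** r * s[:-r]                  # ŝ[n] = a^r s[n − r], predicting s[r:]
P_emp = np.mean((s[r:] - shat) ** 2)
print("empirical P ≈ %.4f   (theory 1 − a^{2r} = %.4f)" % (P_emp, 1.0 - a ** (2 * r)))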
Analog Wiener-Hopf Equation 13-30

Concern:
• To linearly estimate the random process s(t + λ) in terms of its entire past {s(t − τ), τ ≥ 0} in the MS sense.

Analog Wiener-Hopf equation
• Orthogonality principle:

    E[(s(t + λ) − ∫_0^∞ h(α) s(t − α) dα) s(t − τ)] = 0   for all τ ≥ 0
⇔  Rss(λ + τ) = ∫_0^∞ h(α) Rss(τ − α) dα   for all τ ≥ 0.

The solution of the (analog) Wiener-Hopf equation is named the causal Wiener filter.
Solving Wiener-Hopf Equation Under Regularity 13-31

Solving the Wiener-Hopf equation under the assumption that s(t) is stationary and regular.
• A regular process can be represented as the response of a causal finite-energy system due to a unit-power white-noise process i(t). So,

    s(t + λ) = ∫_0^∞ l(α) i(t + λ − α) dα.

• Then, ŝ(t + λ) = ∫_0^∞ h(α) s(t − α) dα can be written as

    ŝ(t + λ) = ∫_0^∞ g(α) i(t − α) dα,

  for some {g(t)}t≥0 that minimizes the MS error.

• The orthogonality principle then gives that for all τ ≥ 0,

    0 = E[(s(t + λ) − ∫_0^∞ g(α) i(t − α) dα) i(t − τ)]
      = Rsi(λ + τ) − ∫_0^∞ g(α) Rii(τ − α) dα,

  which implies g(τ) = Rsi(λ + τ).
Solving Wiener-Hopf Equation Under Regularity 13-32

• By regularity,

    Rsi(λ + τ) = E[s(t) i(t − λ − τ)] = ∫_0^∞ l(α) E[i(t − α) i(t − λ − τ)] dα = l(λ + τ).

This leads to the first important result:

    ŝ(t + λ) = ∫_0^∞ l(λ + α) i(t − α) dα = ∫_λ^∞ l(α) i(t + λ − α) dα

is the best linear predictor for a regular and stationary process

    s(t + λ) = ∫_0^∞ l(α) i(t + λ − α) dα,   and   P = ∫_0^λ l²(α) dα.

• By noting that i(t) is the response of the system 1/L(s) due to input s(t), and ŝ(t + λ) is the response of the system l(τ + λ)1{τ ≥ 0} due to input i(t), we obtain:

    H(ω) = (1/L(ω)) ∫_0^∞ l(τ + λ) e^{−jωτ} dτ.  □
Solving Wiener-Hopf Equation Under Regularity 13-33

Example 13-5 Rss(τ) = 2αe^{−α|τ|} with 0 < α < 1.

⇒  Sss(ω) = ∫_{−∞}^{∞} 2αe^{−α|τ|} e^{−jωτ} dτ = 4α²/(α² + ω²)

⇒  Sss(s) = 4α²/(α² + ω²)|_{ω=−js} = 4α²/(α² − s²) = [2α/(α + s)] · [2α/(α − s)] = L(s)L(−s)

⇒  L(s) = 2α/(α + s)

⇒  L(ω) = 2α/(α + jω)

⇒  l(τ) = (1/2π) ∫_{−∞}^{∞} [2α/(α + jω)] e^{jωτ} dω = 2αe^{−ατ} 1{τ ≥ 0}

⇒  H(ω) = (1/L(ω)) ∫_0^∞ l(τ + λ) e^{−jωτ} dτ = e^{−αλ}

⇒  h(τ) = e^{−αλ} δ(τ)

⇒  ŝ(t + λ) = ∫_0^∞ h(τ) s(t − τ) dτ = e^{−αλ} s(t).
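This closed form is easy to check by simulation: a process with Rss(τ) = 2αe^{−α|τ|} can be generated exactly on a time grid by a first-order recursion, and the MS error of ŝ(t + λ) = e^{−αλ}s(t) should equal P = ∫_0^λ l²(τ)dτ = 2α(1 − e^{−2αλ}). A sketch (illustration only; α, λ and the step size are arbitrary choices):

# Sketch (illustration only): s[k+1] = e^{−αΔ} s[k] + w[k], Var(w) = 2α(1 − e^{−2αΔ}),
# reproduces Rss(τ) = 2α e^{−α|τ|} on the grid.  Check the causal Wiener predictor.
import numpy as np

alpha, lam, dt, M = 0.5, 0.4, 0.01, 400_000
rng = np.random.default_rng(2)

rho = np.exp(-alpha * dt)
w = rng.normal(scale=np.sqrt(2.0 * alpha * (1.0 - rho ** 2)), size=M)
s = np.zeros(M)
s[0] = rng.normal(scale=np.sqrt(2.0 * alpha))        # start in the stationary distribution
for k in range(1, M):
    s[k] = rho * s[k - 1] + w[k]

L = int(round(lam / dt))                             # λ in grid steps
shat = np.exp(-alpha * lam) * s[:-L]                 # ŝ(t + λ) = e^{−αλ} s(t)
P_emp = np.mean((s[L:] - shat) ** 2)
print("empirical P ≈ %.4f   (theory 2α(1 − e^{−2αλ}) = %.4f)"
      % (P_emp, 2.0 * alpha * (1.0 - np.exp(-2.0 * alpha * lam))))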
Solving Wiener-Hopf Equation Under Regularity 13-34

An alternative way to express the Wiener-Hopf equation
• The Wiener-Hopf equation only depends on Rss(τ); hence, any process with the same autocorrelation function results in the same predictor.
• (Slide 9-100) Define a process z(t) = e^{jωt}, where ω has density A(ω). Then,

    Rzz(τ) = E[e^{jω(t+τ)} e^{−jωt}] = ∫_{−∞}^{∞} A(ω) e^{jωτ} dω.

  So, z(t) is a process with power spectrum 2πA(ω).
• The best-MS linear predictor for z(t + λ) in terms of {z(t − τ)}τ≥0 is

    ẑ(t + λ) = ∫_0^∞ h(α) z(t − α) dα = ∫_0^∞ h(α) e^{jω(t−α)} dα = e^{jωt} H(ω),

  and it should satisfy

    E[(z(t + λ) − ẑ(t + λ)) z*(t − τ)] = 0   for τ ≥ 0
⇔  E[e^{jω(λ+τ)} − e^{jωτ} H(ω)] = 0   for τ ≥ 0
⇔  ∫_{−∞}^{∞} [A(ω) e^{jωλ}] e^{jωτ} dω = ∫_{−∞}^{∞} A(ω) H(ω) e^{jωτ} dω   for τ ≥ 0.
Solving Wiener-Hopf Equation Under Regularity 13-35

Example 13-5 Revisited. Let's confirm the alternative expression in terms of Example 13-5.

    Szz(ω) = 2πA(ω) = 4α²/(α² + ω²)

Then,

    ∫_{−∞}^{∞} [A(ω) e^{jωλ}] e^{jωτ} dω = (1/2π) ∫_{−∞}^{∞} [4α²/(α² + ω²)] e^{jω(λ+τ)} dω = 2αe^{−α|τ+λ|}

and

    ∫_{−∞}^{∞} A(ω) H(ω) e^{jωτ} dω = e^{−αλ} (1/2π) ∫_{−∞}^{∞} [4α²/(α² + ω²)] e^{jωτ} dω = 2αe^{−α(|τ|+λ)}.

⇒  |τ + λ| = |τ| + λ, which is valid only for τ ≥ max{0, −λ} = 0 (since λ > 0).

Note that it is erroneous to claim A(ω)e^{jωλ} = A(ω)H(ω) from

    ∫_{−∞}^{∞} [A(ω) e^{jωλ}] e^{jωτ} dω = ∫_{−∞}^{∞} A(ω) H(ω) e^{jωτ} dω

because the equation holds only for τ ≥ 0.


Predictable Processes 13-36

Definition (Predictable processes) A process s[n] is predictable if it equals its linear predictor, i.e.,

    s[n] = ŝ[n] = Σ_{k=1}^{∞} h[k] s[n − k],

and there is no MS prediction error.

Formula for predictable processes.
• Let E[z] = 1 − H[z] = 1 − Σ_{k=1}^{∞} h[k] z^{−k}. Then, the prediction error equals

    P = E[(s[n] − ŝ[n]) s[n]] = Rss[0] − Σ_{k=1}^{∞} h[k] Rss[k].

  Equivalently,

    P = (1/2π) ∫_{−π}^{π} |E[e^{jω}]|² Sss[ω] dω.

• For predictable processes, P = 0, which indicates from Sss[ω] ≥ 0 that Sss[ω] can be positive only at those ω's with E[e^{jω}] = 0. As E[z] is a polynomial in z⁻¹, its zeros on the unit circle are countable; hence, for countably many ωi,

    Sss[ω] = 2π Σ_i αi δ(ω − ωi),   where E[e^{jωi}] = 0.
Predictable Processes 13-37

• This shows that a process s[n] that is a sum of exponentials,

    s[n] = Σ_i ci e^{jωi n},   where the {ci} are uncorrelated and zero-mean,

is predictable, and its prediction filter equals H[z] = 1 − E[z], where

    E[z] = Π_{i=1}^{m} (1 − e^{jωi} z⁻¹).
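A tiny numerical illustration (sketch, not from the slides): with arbitrary frequencies ωi and random amplitudes ci, the FIR filter E[z] = Π_i (1 − e^{jωi} z⁻¹) annihilates s[n] once it has m samples of history, which is exactly the statement that H[z] = 1 − E[z] predicts s[n] with zero error.

# Sketch (illustration only): a sum of exponentials is predicted exactly by
# H[z] = 1 − E[z] with E[z] = Π_i (1 − e^{jω_i} z⁻¹).
import numpy as np

rng = np.random.default_rng(3)
omegas = np.array([0.4, 1.1, 2.3])                  # assumed line-spectrum frequencies
c = rng.standard_normal(3) + 1j * rng.standard_normal(3)

n = np.arange(200)
s = (c[:, None] * np.exp(1j * np.outer(omegas, n))).sum(axis=0)   # s[n] = Σ c_i e^{jω_i n}

E = np.array([1.0 + 0j])                            # build E[z] as FIR coefficients in z⁻¹
for wi in omegas:
    E = np.convolve(E, np.array([1.0, -np.exp(1j * wi)]))

e = np.convolve(s, E)[: len(s)]                     # error sequence e[n] = E[z]{s[n]}
m = len(omegas)
print("max |e[n]| for n ≥ m:", np.abs(e[m:]).max())  # ≈ 0: s[n] is predictable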
FIR Predictors 13-38

Concern
• To find the best linear estimator of s[n], in the MS sense, in terms of its N most recent past values, i.e., {s[n − k]}1≤k≤N.
• This is also named the forward predictor of order N.

Yule-Walker equations
• By the orthogonality principle,

    E[(s[n] − Σ_{k=1}^{N} ak s[n − k]) s[n − m]] = 0   for 1 ≤ m ≤ N.

  This yields

    Rss[m] − Σ_{k=1}^{N} ak Rss[m − k] = 0   for 1 ≤ m ≤ N,

  or equivalently,

    [ Rss[1] ]   [ Rss[0]       Rss[−1]      · · ·  Rss[1 − N] ] [ a1 ]
    [ Rss[2] ] = [ Rss[1]       Rss[0]       · · ·  Rss[2 − N] ] [ a2 ]
    [   ⋮    ]   [   ⋮            ⋮          ⋱        ⋮        ] [  ⋮ ]
    [ Rss[N] ]   [ Rss[N − 1]   Rss[N − 2]   · · ·  Rss[0]     ] [ aN ]
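The Yule-Walker system is a Toeplitz linear system and can be solved directly. A minimal sketch (the correlation sequence Rss[m] = 0.9^|m| cos(0.3m) is an assumed example, not from the slides):

# Sketch (assumed correlation sequence): solve the order-N Yule-Walker equations.
import numpy as np

N = 5
Rss = lambda m: 0.9 ** np.abs(m) * np.cos(0.3 * m)   # assumed autocorrelation

m = np.arange(1, N + 1)
R = Rss(m[:, None] - m[None, :])        # Toeplitz matrix [Rss[m − k]]
r = Rss(m)                              # right-hand side [Rss[1], …, Rss[N]]
a = np.linalg.solve(R, r)

PN = Rss(0) - a @ Rss(m)                # PN = Rss[0] − Σ a_k Rss[−k]  (Rss is even)
print("a =", np.round(a, 4), "  PN ≈ %.4f" % PN)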
FIR Predictors 13-39

• The MS estimation error is equal to:

    PN = E[(s[n] − Σ_{k=1}^{N} ak s[n − k]) s[n]] = Rss[0] − Σ_{k=1}^{N} ak Rss[−k].

• We can incorporate the above result into the Yule-Walker equations:

    [ PN  0  · · ·  0 ] = [ 1  −a1  · · ·  −aN ] DN+1,

  where

            [ Rss[0]     Rss[1]       Rss[2]       · · ·  Rss[N]     ]
            [ Rss[−1]    Rss[0]       Rss[1]       · · ·  Rss[N − 1] ]
    DN+1 =  [ Rss[−2]    Rss[−1]      Rss[0]       · · ·  Rss[N − 2] ]
            [   ⋮           ⋮            ⋮          ⋱        ⋮       ]
            [ Rss[−N]    Rss[1 − N]   Rss[2 − N]   · · ·  Rss[0]     ]

Recall that for a square matrix D:

    D · Adj(D) = |D| I,

where Di,j is the cofactor of element di,j in D (specifically, Di,j = (−1)^{i+j} Mi,j, and Mi,j is the determinant of the matrix obtained by removing the row and the column containing di,j), and Adj(D) = [Di,j]ᵀ.
FIR Predictors 13-40

Hence,

    [ PN  0  · · ·  0 ] Adj(DN+1) = [ 1  −a1  · · ·  −aN ] |DN+1|,

which implies PN |DN| = |DN+1|.

• As a result,

    PN = 0,                  if for some k ≤ N, |Dk| ≠ 0 and |Dk+1| = 0;
    PN = |DN+1| / |DN|,      if |DN| ≠ 0.

Final note on the optimal {ak}_{k=1}^{N}
• The optimal a1 in a system of order N may be different from that in a system of order N + 1. This may cause a scalability problem in implementation.

  Example. Suppose Rss[m] = ρ^{|m|} for m = 0, ±1, and zero otherwise. Then,

    a1 = ρ and P1 = 1 − ρ² when N = 1;
    a1 = ρ/(1 − ρ²), a2 = −ρ²/(1 − ρ²) and P2 = (1 − 2ρ²)/(1 − ρ²) when N = 2.
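These numbers can be reproduced with a small Yule-Walker solve. A sketch (note that Rss[m] = ρ^|m| for m = 0, ±1 and zero elsewhere is a valid correlation, e.g. of an MA(1) process, only for |ρ| ≤ 1/2):

# Sketch: reproduce the order-1 and order-2 predictors for
# Rss[0] = 1, Rss[±1] = ρ, Rss[m] = 0 otherwise.
import numpy as np

rho = 0.4
R = lambda m: {0: 1.0, 1: rho, -1: rho}.get(int(m), 0.0)

for N in (1, 2):
    A = np.array([[R(m - k) for k in range(1, N + 1)] for m in range(1, N + 1)])
    rhs = np.array([R(m) for m in range(1, N + 1)])
    a = np.linalg.solve(A, rhs)
    PN = R(0) - sum(a[k - 1] * R(k) for k in range(1, N + 1))
    print("N=%d: a =" % N, np.round(a, 4), "  P%d = %.4f" % (N, PN))

# Expected:  N=1: a1 = ρ,            P1 = 1 − ρ²
#            N=2: a1 = ρ/(1 − ρ²),   a2 = −ρ²/(1 − ρ²),   P2 = (1 − 2ρ²)/(1 − ρ²)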
Implementation Structure of FIR Predictor 13-41

Non-scalable straightforward structure
• The prediction error

    e[n] = s[n] − ŝ[n] = s[n] − Σ_{k=1}^{N} ak s[n − k]

  can be obtained by feeding s[n] into the filter H[z] = 1 − a1 z⁻¹ − · · · − aN z^{−N}.
• The filter H[z] can be implemented using a ladder (tapped-delay-line) structure: s[n] passes through a chain of unit delays z⁻¹, the delayed samples are scaled by −a1, …, −aN, and the scaled values are summed with s[n] to produce e[n]. (Figure omitted.)
• This structure is not scalable in the coefficients {ak}_{k=1}^{N}, since the optimal coefficients {ak}_{k=1}^{N} depend on N.
Implementation Structure of FIR Predictor 13-42

Is there a scalable implementation structure?
• Denote the optimal {ak}_{k=1}^{N} in a system of order N as {a_k^{(N)}}_{k=1}^{N}.
• Consider a single lattice stage (figure omitted): the input s[n] at node A and its one-sample delay are cross-coupled through gains −k1 to form the two outputs B1 and C1.

  Denote the input at A as s[n], and the outputs at B1 and C1 respectively by ê1[n] and ě1[n]. Then,

    ê1[n] = s[n] − k1 s[n − 1]
    ě1[n] = −k1 s[n] + s[n − 1].

  So, the filters for output ê1[n] and output ě1[n] are

    Ê1[z] = 1 − k1 z⁻¹
    Ě1[z] = −k1 + z⁻¹ = z⁻¹ Ê1[1/z].
Implementation Structure of FIR Predictor 13-43

• Now cascade a second lattice stage with gain −k2 (figure omitted): the outputs ê1[n] and ě1[n] of the first stage feed the second stage, whose outputs at B2 and C2 are denoted ê2[n] and ě2[n]. Then,

    ê2[n] = ê1[n] − k2 ě1[n − 1]
    ě2[n] = −k2 ê1[n] + ě1[n − 1].

  So, the filters for output ê2[n] and output ě2[n] are

    Ê2[z] = Ê1[z] − k2 z⁻¹ Ě1[z]
    Ě2[z] = −k2 Ê1[z] + z⁻¹ Ě1[z].
Implementation Structure of FIR Predictor 13-44

• Continuing to cascade more "lattices," we obtain

    êN[n] = êN−1[n] − kN ěN−1[n − 1]
    ěN[n] = −kN êN−1[n] + ěN−1[n − 1]

  and

    ÊN[z] = ÊN−1[z] − kN z⁻¹ ĚN−1[z]
    ĚN[z] = −kN ÊN−1[z] + z⁻¹ ĚN−1[z].

  Then, ĚN[z] = z^{−N} ÊN[1/z].

Proof: Suppose ĚN−1[z] = z^{−(N−1)} ÊN−1[1/z]. Then,

    z^{−N} ÊN[1/z] = z^{−N} (ÊN−1[1/z] − kN z ĚN−1[1/z])
                   = z^{−N} (z^{N−1} ĚN−1[z] − kN z · z^{N−1} ÊN−1[z])
                   = −kN ÊN−1[z] + z⁻¹ ĚN−1[z]
                   = ĚN[z].  □
Implementation Structure of FIR Predictor 13-45

• By ĚN[z] = z^{−N} ÊN[1/z], we know that if

    ÊN[z] = 1 − a_1^{(N)} z⁻¹ − · · · − a_N^{(N)} z^{−N},

  then

    ĚN[z] = z^{−N} − a_1^{(N)} z^{−(N−1)} − · · · − a_N^{(N)}.

In summary,
– êN[n] is the forward prediction error for predicting s[n] in terms of its N most recent past values. In other words,

    êN[n] = s[n] − ŝN[n] = s[n] − Σ_{k=1}^{N} a_k^{(N)} s[n − k].

– ěN[n] is the backward prediction error for predicting s[n − N] in terms of its N most recent future values. In other words,

    ěN[n] = s[n − N] − šN[n − N] = s[n − N] − Σ_{k=1}^{N} a_k^{(N)} s[n − N + k].
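The coefficient-domain form of the lattice recursion is easy to code; the sketch below (with hypothetical reflection coefficients k_i chosen only for illustration) builds ÊN[z] and ĚN[z] step by step and verifies that ĚN[z] = z^{−N} ÊN[1/z], i.e. that Ě is Ê with its coefficient vector reversed.

# Sketch (assumed reflection coefficients): run the lattice recursion on filter
# coefficients and check that Ě_N is the reversal of Ê_N.
import numpy as np

k = [0.5, -0.3, 0.2, 0.7]                 # hypothetical k1..k4

E_hat = np.array([1.0])                   # Ê0[z] = 1   (coefficients of z^0, z^-1, ...)
E_chk = np.array([1.0])                   # Ě0[z] = 1
for kN in k:
    new_hat = np.append(E_hat, 0.0) - kN * np.append(0.0, E_chk)   # Ê_N = Ê_{N−1} − kN z⁻¹ Ě_{N−1}
    new_chk = -kN * np.append(E_hat, 0.0) + np.append(0.0, E_chk)  # Ě_N = −kN Ê_{N−1} + z⁻¹ Ě_{N−1}
    E_hat, E_chk = new_hat, new_chk

print("Ê_N coefficients:", np.round(E_hat, 4))
print("Ě_N coefficients:", np.round(E_chk, 4))
print("reversal check:", np.allclose(E_chk, E_hat[::-1]))          # True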
Implementation Structure of FIR Predictor 13-46

Derivation of kN
• From

    ÊN−1[z] = 1 − a_1^{(N−1)} z⁻¹ − · · · − a_{N−1}^{(N−1)} z^{−(N−1)}

  and

    ĚN−1[z] = z^{−(N−1)} ÊN−1[1/z],

  we derive:

    ÊN[z] = ÊN−1[z] − kN z⁻¹ ĚN−1[z]
          = (1 − a_1^{(N−1)} z⁻¹ − · · · − a_{N−1}^{(N−1)} z^{−(N−1)})
            − kN (z^{−N} − a_1^{(N−1)} z^{−(N−1)} − · · · − a_{N−1}^{(N−1)} z⁻¹)
          = 1 − (a_1^{(N−1)} − kN a_{N−1}^{(N−1)}) z⁻¹ − (a_2^{(N−1)} − kN a_{N−2}^{(N−1)}) z⁻² − · · ·
            − (a_{N−1}^{(N−1)} − kN a_1^{(N−1)}) z^{−(N−1)} − kN z^{−N}.

  Comparing term by term with

    ÊN[z] = 1 − a_1^{(N)} z⁻¹ − · · · − a_N^{(N)} z^{−N},

  we get:

    a_k^{(N)} = a_k^{(N−1)} − kN a_{N−k}^{(N−1)}   for 1 ≤ k < N,   and   a_N^{(N)} = kN.
Implementation Structure of FIR Predictor 13-47

• It remains to solve for kN. The relation

    [ PN  0  · · ·  0 ] = [ 1  −a_1^{(N)}  · · ·  −a_N^{(N)} ] DN+1

  (with DN+1 as on Slide 13-39) implies, from its last column,

    0 = Rss[N] − Σ_{k=1}^{N−1} a_k^{(N)} Rss[N − k] − kN Rss[0]

⇒  0 = Rss[N] − Σ_{k=1}^{N−1} (a_k^{(N−1)} − kN a_{N−k}^{(N−1)}) Rss[N − k] − kN Rss[0]

⇒  kN = [Rss[N] − Σ_{k=1}^{N−1} a_k^{(N−1)} Rss[N − k]] / [Rss[0] − Σ_{k=1}^{N−1} a_{N−k}^{(N−1)} Rss[N − k]]
      = (1/PN−1) [Rss[N] − Σ_{k=1}^{N−1} a_k^{(N−1)} Rss[N − k]],

  where the last step follows from the fact that Rss[N − k] = Rss[k − N] (see Slide 13-39).

• The above formula gives kN from the known PN−1 and {a_k^{(N−1)}}_{k=1}^{N−1}.
Implementation Structure of FIR Predictor 13-48

Alternative derivation of kN in terms of êN[n] = êN−1[n] − kN ěN−1[n − 1].

•   êN[n]   = s[n] − ŝN[n]   = s[n] − Σ_{k=1}^{N} a_k^{(N)} s[n − k]
    êN−1[n] = s[n] − ŝN−1[n] = s[n] − Σ_{k=1}^{N−1} a_k^{(N−1)} s[n − k]

  imply PN = E[êN[n] s[n]] and PN−1 = E[êN−1[n] s[n]].

•   ěN−1[n − 1] = s[(n − 1) − (N − 1)] − Σ_{k=1}^{N−1} a_k^{(N−1)} s[(n − 1) − (N − 1) + k]
                = s[n − N] − Σ_{k=1}^{N−1} a_k^{(N−1)} s[n − N + k]

  implies

    E[ěN−1[n − 1] s[n]] = E[s[n − N] s[n]] − Σ_{k=1}^{N−1} a_k^{(N−1)} E[s[n − N + k] s[n]]
                        = Rss[N] − Σ_{k=1}^{N−1} a_k^{(N−1)} Rss[N − k] = kN PN−1.

• Hence, PN = E[êN[n] s[n]] = E[(êN−1[n] − kN ěN−1[n − 1]) s[n]] = PN−1 − kN (kN PN−1) = (1 − kN²) PN−1.  □
Levinson’s Algorithm 13-49

Concern:
• A recursive algorithm to obtain kN and the MS estimation error PN.

Levinson's algorithm
• k1 = a_1^{(1)} = Rss[1]/Rss[0] and P1 = (1 − k1²) Rss[0].
• Assume that {a_k^{(N−1)}}_{k=1}^{N−1}, kN−1 and PN−1 are known. Then, it can be derived that

    kN = (1/PN−1) [Rss[N] − Σ_{k=1}^{N−1} a_k^{(N−1)} Rss[N − k]]

    PN = (1 − kN²) PN−1

    a_k^{(N)} = a_k^{(N−1)} − kN a_{N−k}^{(N−1)}   for 1 ≤ k ≤ N − 1,   and   a_N^{(N)} = kN.
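A direct implementation of the recursion (a sketch, illustration only), checked against a brute-force Yule-Walker solve for an assumed correlation sequence:

# Sketch: Levinson's algorithm as stated above.
import numpy as np

def levinson(Rss, N):
    """Rss = [Rss[0], ..., Rss[N]].  Returns a^{(N)}, P_N and the reflection coefficients."""
    a = np.array([Rss[1] / Rss[0]])                    # a^{(1)} = [k1]
    k = [a[0]]
    P = (1.0 - k[0] ** 2) * Rss[0]
    for n in range(2, N + 1):
        kn = (Rss[n] - np.dot(a, Rss[n - 1:0:-1])) / P # Rss[n−1], ..., Rss[1] pair with a1, ..., a_{n−1}
        a = np.append(a - kn * a[::-1], kn)            # a_k^{(n)} = a_k^{(n−1)} − kn a_{n−k}^{(n−1)};  a_n^{(n)} = kn
        k.append(kn)
        P = (1.0 - kn ** 2) * P
    return a, P, np.array(k)

N = 5
m = np.arange(N + 1)
Rss = 0.9 ** m * np.cos(0.3 * m)                       # assumed correlation sequence

a, P, k = levinson(Rss, N)

# brute-force check against the Yule-Walker equations
R = np.array([[Rss[abs(i - j)] for j in range(N)] for i in range(N)])
a_direct = np.linalg.solve(R, Rss[1:N + 1])
print("max |a_levinson − a_direct| =", np.abs(a - a_direct).max())   # ≈ 0
print("P_N = %.4f,  k =" % P, np.round(k, 4))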
Properties of FIR estimator 13-50

• P1 ≥ P2 ≥ · · · ≥ PN ≥ · · · ≥ 0.
• If PN > 0, then |ki| < 1 for 1 ≤ i ≤ N, and the roots zi of ÊN[z] = 1 − Σ_{k=1}^{N} a_k^{(N)} z^{−k} satisfy |zi| < 1 for 1 ≤ i ≤ N.
• If PN−1 > 0 and PN = 0, then |ki| < 1 for 1 ≤ i < N and |kN| = 1, and |zi| = 1 for 1 ≤ i ≤ N, which indicates that s[n] is predictable and has a line spectrum.
• If P = lim_{N→∞} PN > 0, then

    P = exp( (1/2π) ∫_{−π}^{π} log(Sss[ω]) dω ) = l²[0] = lim_{N→∞} |DN+1|/|DN|.

• If PM−1 > PM but PM (= PM+1 = · · · ) = P, then ki = 0 for i > M, and s[n] is wide-sense Markov of order M.
  s[n] is autoregressive (AR) if, and only if, it is wide-sense Markov of finite order.
Properties of FIR estimator 13-51

Equal predictor of two processes
• Suppose processes s[n] and s̄[n] have the same autocorrelation function up to order M. Then, the order-M predictors of these two processes are identical, because the predictors only depend on the values of Rss[m] for |m| ≤ M.
  Also, from Levinson's algorithm, we learn that PM is the same for both processes, since PM = Π_{i=1}^{M} (1 − ki²) · Rss[0].
Kalman Innovations 13-52

Define the process i[n] as

    i[n] ≜ ên[n]/√Pn
         = (1/√Pn) (s[n] − Σ_{k=1}^{n} a_k^{(n)} s[n − k])
         = Σ_{k=0}^{n} γ_k^{(n)} s[k],   where γ_k^{(n)} = −a_{n−k}^{(n)}/√Pn for 0 ≤ k ≤ n − 1 and γ_n^{(n)} = 1/√Pn.

By the orthogonality principle, i[n] is orthogonal to s[n − m] for 1 ≤ m ≤ n; hence, i[n] is orthogonal to i[n − m] for 1 ≤ m ≤ n, and E[i²[n]] = 1.
Kalman Innovations 13-53

In matrix form,

    [ i[0] i[1] · · · i[n] ] = [ s[0] s[1] · · · s[n] ] Γn+1,

where

            [ γ_0^{(0)}  γ_0^{(1)}  · · ·  γ_0^{(n)} ]
            [    0       γ_1^{(1)}  · · ·  γ_1^{(n)} ]
    Γn+1 =  [    ⋮           ⋮       ⋱        ⋮      ]
            [    0          0       · · ·  γ_n^{(n)} ]

Remarks
• This is essentially the Gram-Schmidt orthonormalization procedure applied to [ s[0] s[1] · · · s[n] ].
• In terminology, i[n] is called the Kalman innovations of s[n], and Γn+1 is called the Kalman whitening filter of s[n].
• It can then be derived that

    [ s[0] s[1] · · · s[n] ] = [ i[0] i[1] · · · i[n] ] Ln+1,

  where Ln+1 is the upper-triangular matrix

            [ l_0^{(0)}  l_0^{(1)}  · · ·  l_0^{(n)} ]
            [    0       l_1^{(1)}  · · ·  l_1^{(n)} ]
    Ln+1 =  [    ⋮           ⋮       ⋱        ⋮      ]
            [    0          0       · · ·  l_n^{(n)} ]
Kalman Innovations 13-54

Then, the covariance matrix of s[n] is given by:

    Rn+1 ≜ E[ [ s[0] s[1] · · · s[n] ]ᵀ [ s[0] s[1] · · · s[n] ] ] = Ln+1ᵀ Ln+1.

Therefore,

    Γn+1ᵀ Rn+1 Γn+1 = In+1,

where In+1 is the identity matrix.
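The relation Γᵀ R Γ = I can be demonstrated with a Cholesky-type factorization: for a positive-definite covariance matrix R, the upper-triangular factor L with R = LᵀL gives Γ = L⁻¹, and the innovations i = s·Γ have identity covariance. A sketch (the correlation sequence is an assumed example, not from the slides):

# Sketch (assumed correlation sequence): build the Kalman whitening filter from a
# Cholesky factorization R = Lᵀ L (L upper triangular) and verify Γᵀ R Γ = I.
import numpy as np

n = 6
m = np.arange(n + 1)
Rss = 0.9 ** m * np.cos(0.3 * m)                     # assumed autocorrelation
R = np.array([[Rss[abs(i - j)] for j in range(n + 1)] for i in range(n + 1)])

L = np.linalg.cholesky(R).T                          # upper triangular, R = Lᵀ L
Gamma = np.linalg.inv(L)                             # whitening filter Γ_{n+1}
print("Γᵀ R Γ ≈ I:", np.allclose(Gamma.T @ R @ Gamma, np.eye(n + 1)))

# Monte-Carlo check: the innovations i = s Γ (s a row vector) have identity covariance
rng = np.random.default_rng(4)
S = rng.multivariate_normal(np.zeros(n + 1), R, size=100_000)   # rows = realizations of [s[0..n]]
I_mc = S @ Gamma
print("sample covariance ≈ I:", np.allclose(I_mc.T @ I_mc / len(I_mc), np.eye(n + 1), atol=0.05))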

The end of Section 13-2 Prediction
