
Linear Stochastic Processes

It therefore follows that the general form for the power spectrum of a WSS process is

$$P_x(e^{j\omega}) = P_{x_r}(e^{j\omega}) + \sum_{k=1}^{N} \alpha_k\, u_0(\omega - \omega_k)$$

We look at processes generated by filtering white noise with a linear shift–invariant filter that has a rational system function. These include the following:

• Autoregressive (AR) → all pole system

• Moving Average (MA) → all zero system

• Autoregressive Moving Average (ARMA) → poles and zeros


Notice the difference between shift–invariance and time–invariance

© Danilo P. Mandic Digital Signal Processing 5


ACF and Spectrum of ARMA models
Of much interest are the autocorrelation function and power spectrum of these processes. (Recall that the ACF and the PSD carry the same information, as they form a Fourier transform pair.)
Suppose that we filter white noise w[n] with a causal linear shift–invariant
filter having a rational system function with p poles and q zeros
$$H(z) = \frac{B_q(z)}{A_p(z)} = \frac{\sum_{k=0}^{q} b_q(k)\, z^{-k}}{1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}}$$

Assuming that the filter is stable, the output process x[n] will be wide–sense stationary and, with P_w = σ_w², the power spectrum of x[n] will be

$$P_x(z) = \sigma_w^2\,\frac{B_q(z)\,B_q(z^{-1})}{A_p(z)\,A_p(z^{-1})}$$

Recall that conjugation “(·)*” in analogue frequency corresponds to “z⁻¹” in “digital freq.”

© Danilo P. Mandic Digital Signal Processing 6


Frequency Domain
In terms of “digital” frequency θ (on the unit circle, z = e^{jθ} = e^{jωT}):

• Bq(z)Bq(z⁻¹) → a “quadratic form”, real valued on the unit circle

• Ap(z)Ap(z⁻¹) → a “quadratic form”, real valued on the unit circle

$$P_x(e^{j\theta}) = \sigma_w^2\,\frac{\big|B_q(e^{j\theta})\big|^2}{\big|A_p(e^{j\theta})\big|^2}$$

We are therefore using H(z) to shape the spectrum of white noise.


A process having a power spectrum of this form is known as an autoregressive moving average process of order (p, q) and is referred to as an ARMA(p,q) process.

© Danilo P. Mandic Digital Signal Processing 7


Example
Plot the power spectrum of an ARMA(2,2) process for which
• the zeros of H(z) are at z = 0.95 e^{±jπ/2}

• the poles are at z = 0.9 e^{±j2π/5}

Solution: The system function is (poles and zeros – resonance & sink)

$$H(z) = \frac{1 + 0.9025\,z^{-2}}{1 - 0.5562\,z^{-1} + 0.81\,z^{-2}}$$
[Figure: power spectrum of the ARMA(2,2) process over 0 ≤ ω ≤ π, with a resonant peak near ω = 2π/5 (the pole angle) and a deep null near ω = π/2 (the zero angle).]
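The plot can be reproduced with a few lines of MATLAB; a minimal sketch (unit noise variance and a 512-point grid assumed):

b = [1 0 0.9025];               % numerator: zeros at 0.95*exp(+/-j*pi/2)
a = [1 -0.5562 0.81];           % denominator: poles at 0.9*exp(+/-j*2*pi/5)
[H, w] = freqz(b, a, 512);      % frequency response on [0, pi)
plot(w, 10*log10(abs(H).^2)), grid on
xlabel('Frequency (rad/sample)'), ylabel('Power (dB)')
title('ARMA(2,2) power spectrum')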

© Danilo P. Mandic Digital Signal Processing 8


Difference Equation Representation
Random processes x[n] and w[n] are related by the linear constant
coefficient equation

$$x[n] - \sum_{l=1}^{p} a_p(l)\, x[n-l] = \sum_{l=0}^{q} b_q(l)\, w[n-l]$$

Notice that the autocorrelation function of x[n] and the cross-correlation between x[n] and w[n] satisfy the same difference equation, i.e. if we multiply both sides of the above equation by x[n − k] and take the expected value, we have

$$r_{xx}(k) - \sum_{l=1}^{p} a_p(l)\, r_{xx}(k-l) = \sum_{l=0}^{q} b_q(l)\, r_{xw}(k-l)$$

Since x is WSS, it follows that x[n] and w[n] are jointly WSS.

© Danilo P. Mandic Digital Signal Processing 9


General Linear Processes: Stationarity and Invertibility
Consider a linear stochastic process → the output of a linear filter driven by WGN w[n]

$$x[n] = w[n] + b_1 w[n-1] + b_2 w[n-2] + \cdots = w[n] + \sum_{j=1}^{\infty} b_j\, w[n-j]$$

that is, a weighted sum of past inputs w[n].


For this process to be a valid stationary process, the coefficients must be absolutely summable, that is, $\sum_{j=0}^{\infty} |b_j| < \infty$ (with $b_0 = 1$).

The model implies that, under suitable conditions, x[n] is also a weighted sum of past values of x, plus an added shock w[n], that is

$$x[n] = a_1 x[n-1] + a_2 x[n-2] + \cdots + w[n]$$


• A linear process is stationary if $\sum_{j=0}^{\infty} |b_j| < \infty$

• A linear process is invertible if $\sum_{j=0}^{\infty} |a_j| < \infty$

© Danilo P. Mandic Digital Signal Processing 10


Are these ARMA(p,q) processes?


• Unit step: u[n] = 0 for n < 0; u[n] = 1 for n ≥ 0
  – If w[n] = δ[n] then u[n] = u[n − 1] + w[n], n ≥ 0

• Ramp function: r[n] = 0 for n < 0; r[n] = n for n ≥ 0
  – If w[n] = u[n] then r[n] = r[n − 1] + w[n], n ≥ 0

© Danilo P. Mandic Digital Signal Processing 11


Autoregressive Processes

A general AR(p) process (autoregressive of order p) is given by


$$x[n] = a_1 x[n-1] + \cdots + a_p x[n-p] + w[n] = \sum_{i=1}^{p} a_i\, x[n-i] + w[n]$$
Observe the auto–regression above

Duality between AR and MA processes:


For instance, the first-order autoregressive process

$$x[n] = a_1 x[n-1] + w[n] \quad\Longleftrightarrow\quad x[n] = \sum_{j=0}^{\infty} b_j\, w[n-j], \qquad b_j = a_1^{\,j}$$

Its “all–pole” nature reflects the duality between IIR and FIR filters.
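The equivalence follows by expanding the AR(1) transfer function as a geometric series (valid for |a₁| < 1):

$$H(z) = \frac{1}{1 - a_1 z^{-1}} = \sum_{j=0}^{\infty} a_1^{\,j}\, z^{-j} \quad\Longrightarrow\quad x[n] = \sum_{j=0}^{\infty} a_1^{\,j}\, w[n-j]$$

that is, the all-pole AR(1) filter acts as an all-zero (FIR) filter of infinite order.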

© Danilo P. Mandic Digital Signal Processing 12


Example:- Yule–Walker modelling in Matlab
In Matlab – Power spectral density using Y–W method pyulear

Pxx = pyulear(x,p)
[Pxx,w] = pyulear(x,p,nfft)
[Pxx,f] = pyulear(x,p,nfft,fs)
[Pxx,f] = pyulear(x,p,nfft,fs,’range’)
[Pxx,w] = pyulear(x,p,nfft,’range’)

Description:-

Pxx = pyulear(x,p)

implements the Yule-Walker algorithm, and returns Pxx, an estimate of the


power spectral density (PSD) of the vector x.

To remember for later → this estimate is also the maximum-entropy spectral estimate.

See also: aryule, lpc, pburg, pcov, peig, periodogram

© Danilo P. Mandic Digital Signal Processing 17


Example:- AR(p) signal generation
• Generate the input signal x by filtering white noise through the AR
filter

• Estimate the PSD of x based on a fourth-order AR model

Solution:-
a = [1 -2.2137 2.9403 -2.1697 0.9606]; % AR(4) coefficients (given on the next slide)
randn('state',1);
x = filter(1,a,randn(256,1)); % AR system output
pyulear(x,4) % Fourth-order estimate

© Danilo P. Mandic Digital Signal Processing 18


Alternatively:- Yule–Walker modelling
AR(4) system given by
y[n] = 2.2137 y[n−1] − 2.9403 y[n−2] + 2.1697 y[n−3] − 0.9606 y[n−4] + w[n]
a = [1 -2.2137 2.9403 -2.1697 0.9606]; % AR filter coefficients
freqz(1,a) % AR filter frequency response
title(’AR System Frequency Response’)

© Danilo P. Mandic Digital Signal Processing 19


Example:- Finding parameters of
x[n] = 1.2x[n − 1] − 0.8x[n − 2] + w[n]

[Figure: the AR(2) signal x = filter([1],[1, −1.2, 0.8],w) (400 samples), its ACF over lags −400 to 400, and the ACF zoomed to lags −20 to 20.]

Apply:- for i=1:6; [a,e]=aryule(x,i); display(a);end

a(1) = [0.6689]
a(2) = [1.2046, −0.8008]
a(3) = [1.1759, −0.7576, −0.0358]
a(4) = [1.1762, −0.7513, −0.0456, 0.0083]
a(5) = [1.1763, −0.7520, −0.0562, 0.0248, −0.0140]
a(6) = [1.1762, −0.7518, −0.0565, 0.0198, −0.0062, −0.0067]
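A minimal end-to-end sketch of this experiment (the seed and the sample length used in the slides are not stated, so both are assumptions):

w = randn(10000,1); % white Gaussian driving noise
x = filter(1,[1 -1.2 0.8],w); % AR(2): x[n] = 1.2x[n-1] - 0.8x[n-2] + w[n]
for i=1:6; a = aryule(x,i); disp(-a(2:end)); end % aryule returns [1,-a1,...,-ap]

The sign flip −a(2:end) converts aryule’s prediction-error convention into the model convention used above.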

© Danilo P. Mandic Digital Signal Processing 21


Special case:- AR(1) Process (Markov)
Given below (recall the Markov property p(x[n] | x[n − 1], . . . , x[0]) = p(x[n] | x[n − 1]))

$$x[n] = a_1 x[n-1] + w[n] = w[n] + a_1 w[n-1] + a_1^2 w[n-2] + \cdots$$

i) for the process to be stationary −1 < a1 < 1.

ii) Autocorrelation Function:- from Yule-Walker equations

rxx(k) = a1rxx(k − 1), k>0

or, for the correlation coefficients, with ρ₀ = 1,

$$\rho_k = a_1^{k}, \qquad k > 0$$

Notice the difference in the behaviour of the ACF for a1 positive and negative

© Danilo P. Mandic Digital Signal Processing 22


Variance and Spectrum of AR(1) process
Can be calculated directly from a general expression of the variance and
spectrum of AR(p) processes.

• Variance:- Also from a general expression for the variance of linear


processes from Lecture 1

$$\sigma_x^2 = \frac{\sigma_w^2}{1 - \rho_1 a_1} = \frac{\sigma_w^2}{1 - a_1^2}$$
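This follows in one line by taking the variance of both sides of the AR(1) recursion, since w[n] is uncorrelated with x[n − 1]:

$$\sigma_x^2 = a_1^2\,\sigma_x^2 + \sigma_w^2 \;\;\Longrightarrow\;\; \sigma_x^2 = \frac{\sigma_w^2}{1 - a_1^2}$$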

• Spectrum:- Notice how the flat PSD of WGN is shaped according to


the position of the pole of AR(1) model (LP or HP)

$$P_{xx}(f) = \frac{2\sigma_w^2}{\big|1 - a_1 e^{-j2\pi f}\big|^2} = \frac{2\sigma_w^2}{1 + a_1^2 - 2a_1\cos(2\pi f)}$$

© Danilo P. Mandic Digital Signal Processing 23


Example: ACF and Spectrum of AR(1) for a = ±0.8
[Figure: 100-sample realizations of x[n] = −0.8x[n−1] + w[n] and x[n] = 0.8x[n−1] + w[n], their ACFs over lags 0–20 (alternating decay for a < 0, monotonic decay for a > 0), and the corresponding Burg power spectral density estimates (high-pass and low-pass shapes, respectively).]

a < 0 → High Pass; a > 0 → Low Pass
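A short MATLAB sketch to regenerate this comparison (plot details assumed):

w = randn(100,1); % one white-noise realization for both cases
for a1 = [-0.8 0.8]
    x = filter(1,[1 -a1],w); % AR(1): x[n] = a1*x[n-1] + w[n]
    figure
    subplot(3,1,1), plot(x), title(sprintf('x[n] = %.1f x[n-1] + w[n]', a1))
    subplot(3,1,2), [r,lags] = xcorr(x,20,'coeff'); stem(lags(21:end),r(21:end)), title('ACF')
    subplot(3,1,3), pburg(x,1) % Burg PSD estimate of an order-1 model
end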

© Danilo P. Mandic Digital Signal Processing 24


Special Case:- Second Order Autoregressive Processes
AR(2)

The input–output functional relationship is given by

x[n] = a1x[n − 1] + a2x[n − 2] + w[n]

For stationarity (to be proven later):

a1 + a2 < 1
a2 − a1 < 1
−1 < a2 < 1

This will be shown within the so–called “stability triangle”
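Equivalently, the three conditions hold exactly when both characteristic roots lie inside the unit circle, which is easy to check numerically (example coefficients assumed):

a1 = 0.75; a2 = -0.5; % a hypothetical AR(2) pair
z = roots([1 -a1 -a2]); % roots of z^2 - a1*z - a2 = 0
isStationary = all(abs(z) < 1) % true iff (a1,a2) lies inside the triangle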

© Danilo P. Mandic Digital Signal Processing 25


Work by Yule – Modelling of sunspot numbers

Recorded for more than 300 years. In 1927, Yule modelled them and invented the AR(2) model.
[Figure: the sunspot series (roughly 300 annual samples, values up to about 150) and its ACF over lags 0–50, which oscillates pseudo-periodically.]

Sunspot numbers and its autocorrelation function

© Danilo P. Mandic Digital Signal Processing 26


Autocorrelation function of AR(2) processes
The ACF satisfies

$$\rho_k = a_1 \rho_{k-1} + a_2 \rho_{k-2}, \qquad k > 0$$

• Real roots (a₁² + 4a₂ > 0): the ACF is a mixture of damped exponentials

• Complex roots (a₁² + 4a₂ < 0): the ACF exhibits a pseudo–periodic behaviour

$$\rho_k = \frac{D^k \sin(2\pi f_0 k + F)}{\sin F}$$

i.e. a damped sine wave with damping factor D, frequency f₀ and phase F, where

$$D = \sqrt{-a_2}, \qquad \cos(2\pi f_0) = \frac{a_1}{2\sqrt{-a_2}}, \qquad \tan F = \frac{1 + D^2}{1 - D^2}\,\tan(2\pi f_0)$$

© Danilo P. Mandic Digital Signal Processing 27


Stability Triangle
[Figure: the stability triangle in the (a₁, a₂) plane, with vertices at (−2, −1), (2, −1) and (0, 1); the parabola a₁² + 4a₂ = 0 separates the real-root regions I and II (above) from the complex-root regions III and IV (below), with a sketch of the ACF shape in each region.]

i) Real roots, Region 1: monotonically decaying ACF
ii) Real roots, Region 2: decaying oscillating ACF
iii) Complex roots, Region 3: oscillating pseudo-periodic ACF
iv) Complex roots, Region 4: pseudo-periodic ACF

© Danilo P. Mandic Digital Signal Processing 28


Yule–Walker Equations
Substituting p = 2 into Y-W equations we have
$$\rho_1 = a_1 + a_2\rho_1, \qquad \rho_2 = a_1\rho_1 + a_2$$

which, when solved for a₁ and a₂, gives

$$a_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}, \qquad a_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$

or, substituting back into the equations for ρ,

$$\rho_1 = \frac{a_1}{1 - a_2}, \qquad \rho_2 = a_2 + \frac{a_1^2}{1 - a_2}$$
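A quick numeric sanity check of these relations (coefficients chosen arbitrarily):

a1 = 0.75; a2 = -0.5; % hypothetical AR(2) coefficients
rho1 = a1/(1 - a2) % 0.5
rho2 = a2 + a1^2/(1 - a2) % -0.125
a1rec = rho1*(1 - rho2)/(1 - rho1^2) % recovers 0.75
a2rec = (rho2 - rho1^2)/(1 - rho1^2) % recovers -0.5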

© Danilo P. Mandic Digital Signal Processing 29


Variance and Spectrum

More specifically, for the AR(2) process, we have:-


Variance
$$\sigma_x^2 = \frac{\sigma_w^2}{1 - \rho_1 a_1 - \rho_2 a_2} = \left(\frac{1 - a_2}{1 + a_2}\right)\frac{\sigma_w^2}{(1 - a_2)^2 - a_1^2}$$

Spectrum

$$P_{xx}(f) = \frac{2\sigma_w^2}{\big|1 - a_1 e^{-j2\pi f} - a_2 e^{-j4\pi f}\big|^2} = \frac{2\sigma_w^2}{1 + a_1^2 + a_2^2 - 2a_1(1 - a_2)\cos(2\pi f) - 2a_2\cos(4\pi f)}, \qquad 0 \le f \le 1/2$$

© Danilo P. Mandic Digital Signal Processing 30


Example AR(2): x[n] = 0.75x[n − 1] − 0.5x[n − 2] + w[n]

[Figure: a 500-sample realization of x[n] = 0.75x[n−1] − 0.5x[n−2] + w[n], its ACF over lags 0–50 (a damped oscillation), and its Burg power spectral density estimate, with a resonant peak near the normalized frequency 0.32×π rad/sample.]

The damping factor is D = √0.5 = 0.71 and the frequency is f₀ = cos⁻¹(0.5303)/(2π) = 1/6.2, so the fundamental period of the autocorrelation function is 6.2 samples.

© Danilo P. Mandic Digital Signal Processing 31


Partial Autocorrelation Function:- Motivation
Let us revisit example from page 21 of Lecture Slides.
[Figure (repeated from page 21): the AR(2) signal x = filter([1],[1, −1.2, 0.8],w), its ACF, and the ACF zoomed to lags −20 to 20.]

We do not know p, so let us re-write the estimated coefficients for each trial order p as [ap1, . . . , app]:

p = 1 → [0.6689] = a11
p = 2 → [1.2046, −0.8008] = [a21, a22]
p = 3 → [1.1759, −0.7576, −0.0358] = [a31, a32, a33]
p = 4 → [1.1762, −0.7513, −0.0456, 0.0083] = [a41, a42, a43, a44]
p = 5 → [1.1763, −0.7520, −0.0562, 0.0248, −0.0140] = [a51, . . . , a55]
p = 6 → [1.1762, −0.7518, −0.0565, 0.0198, −0.0062, −0.0067] = [a61, . . . , a66]

© Danilo P. Mandic Digital Signal Processing 32


Partial Autocorrelation Function

Notice: the ACF of an AR(p) process is infinite in duration, but it can be described in terms of only p nonzero coefficients.

Denote by akj the jth coefficient in an autoregressive representation of order k, so that akk is the last coefficient. Then

$$\rho_j = a_{k1}\rho_{j-1} + \cdots + a_{k(k-1)}\rho_{j-k+1} + a_{kk}\rho_{j-k}, \qquad j = 1, 2, \ldots, k$$

leading to the Yule–Walker equations, which can be written as

$$\begin{bmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1 \end{bmatrix}\begin{bmatrix} a_{k1} \\ a_{k2} \\ \vdots \\ a_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$$

© Danilo P. Mandic Digital Signal Processing 33


Partial ACF Coefficients:

Solving these equations for k = 1, 2, . . . successively, we obtain

$$a_{11} = \rho_1, \qquad a_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}, \qquad a_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}, \qquad \text{etc.}$$

• The quantity akk, regarded as a function of the lag k, is called the partial autocorrelation function (PACF).

• For an AR(p) process, the PACF akk is nonzero for k ≤ p and zero for k > p ⇒ this tells us the order of an AR(p) process.
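A minimal sketch of estimating the PACF from data as the last coefficient of successively higher-order Yule–Walker fits (the AR(2) example signal is assumed):

w = randn(10000,1);
x = filter(1,[1 -1.2 0.8],w); % the AR(2) example signal
K = 10; pacf = zeros(K,1);
for k = 1:K
    a = aryule(x,k); % order-k Yule-Walker fit
    pacf(k) = -a(end); % a_kk, sign-flipped to the model convention
end
stem(1:K,pacf), xlabel('Lag k'), ylabel('a_{kk}') % near zero for k > 2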

© Danilo P. Mandic Digital Signal Processing 34


Importance of Partial ACF
For a zero-mean process x[n], the best linear predictor, in the mean square error sense, of x[n] based on x[n − 1], x[n − 2], . . . , x[n − k + 1] is

x̂[n] = ak−1,1 x[n − 1] + ak−1,2 x[n − 2] + · · · + ak−1,k−1 x[n − k + 1]

(apply the conditional expectation E{· | x[n − 1], . . .} to the general AR(p) model expression and recall that E{w[n]} = 0, so the w[n] term drops out). This form of the predictor holds whether the process is an AR process or not.

In MATLAB, check the function ARYULE, and the functions PYULEAR, ARMCOV, ARBURG, ARCOV, LPC, PRONY.

© Danilo P. Mandic Digital Signal Processing 35


Model order for Sunspot numbers

[Figure: the sunspot series, its ACF, its partial ACF, and a Burg power spectral density estimate.]

Sunspot numbers, their ACF and partial autocorrelation (PAC)


After lag k = 2, the PAC becomes very small

© Danilo P. Mandic Digital Signal Processing 36


Model order for AR(2) generated process

[Figure: a 500-sample AR(2) signal, its ACF, its partial ACF, and a Burg power spectral density estimate.]

AR(2) signal, its ACF and partial autocorrelation (PAC)


After lag k = 2, the PAC becomes very small

© Danilo P. Mandic Digital Signal Processing 37


Model order for AR(3) generated process

[Figure: a 500-sample AR(3) signal, its ACF, its partial ACF, and a Burg power spectral density estimate.]

AR(3) signal, its ACF and partial autocorrelation (PAC)


After lag k = 3, the PAC becomes very small

© Danilo P. Mandic Digital Signal Processing 38


Model order for a financial time series
From:- http://finance.yahoo.com/q/ta?s=%5EIXIC&t=1d&l=on&z=m&q=b&p=v&a=&c=
[Figure: the Nasdaq composite index, June 2003 – February 2007, plotted in ascending and in descending day order (values roughly 1400–2600), together with the corresponding ACFs over lags −2000 to 2000 (values of order 10⁷), which decay very slowly.]

© Danilo P. Mandic Digital Signal Processing 39


Partial ACF for financial time series

a = 1.0000 -0.9994

a = 1.0000 -0.9982 -0.0011

a = 1.0000 -0.9982 0.0086 -0.0097

a = 1.0000 -0.9983 0.0086 -0.0128 0.0030

a = 1.0000 -0.9983 0.0086 -0.0128 0.0026 0.0005

a = 1.0000 -0.9983 0.0086 -0.0127 0.0026 0.0017 -0.0

© Danilo P. Mandic Digital Signal Processing 40


Model Order Selection – Practical issues
In practice, the greater the model order, the higher the accuracy ⇒ when do we stop?

To save on computational complexity, we introduce a “penalty” for a high model order. Criteria for model order selection include the MDL (minimum description length, Rissanen) and the AIC (Akaike information criterion), given by

$$MDL = \log(E) + \frac{p\,\log(N)}{N}, \qquad AIC = \log(E) + \frac{2p}{N}$$

where E is the loss function (typically the cumulative squared error), p is the number of estimated parameters, and N is the number of data samples.
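A minimal sketch of computing both criteria over candidate orders (the AR(2) parameters from the next example are used; the data length is an assumption):

N = 1000; w = randn(N,1);
x = filter(1,[1 -0.5 0.3],w); % AR(2) with a1 = 0.5, a2 = -0.3
MDL = zeros(10,1); AIC = zeros(10,1);
for p = 1:10
    [a,E] = aryule(x,p); % E: prediction-error power at order p
    MDL(p) = log(E) + p*log(N)/N;
    AIC(p) = log(E) + 2*p/N;
end
plot(1:10,MDL,1:10,AIC), legend('MDL','AIC'), xlabel('Model order p')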

© Danilo P. Mandic Digital Signal Processing 41


Example:- Model order selection – MDL vs AIC
Let us have a look at the squared error and the MDL and AIC criteria for
an AR(2) model with
a1 = 0.5 a2 = −0.3
[Figure: MDL (top) and AIC (bottom), each plotted together with the cumulative squared error, against AR model orders 1 to 10.]

(Model error)2 versus the model order p

© Danilo P. Mandic Digital Signal Processing 42


Moving Average Processes
A general MA(q) process is given by
x[n] = w[n] + b1w[n − 1] + · · · + bq w[n − q]

Autocorrelation function: the autocovariance function of the MA(q) process is

$$c_k = E\big[(w[n] + b_1 w[n-1] + \cdots + b_q w[n-q])(w[n-k] + b_1 w[n-k-1] + \cdots + b_q w[n-k-q])\big]$$

Hence the variance of the process is

$$c_0 = (1 + b_1^2 + \cdots + b_q^2)\,\sigma_w^2$$

The ACF of an MA process has a cutoff after lag q.

Spectrum: all zeros ⇒ struggles to model PSDs with sharp peaks

$$P(f) = 2\sigma_w^2\,\big|1 + b_1 e^{-j2\pi f} + b_2 e^{-j4\pi f} + \cdots + b_q e^{-j2\pi q f}\big|^2$$
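A short sketch illustrating the lag-q cutoff of the ACF (the MA(3) coefficients here are chosen arbitrarily):

b = [1 0.8 0.5 0.3]; % hypothetical MA(3) coefficients
x = filter(b,1,randn(10000,1)); % all-zero filtering of white noise
[r,lags] = xcorr(x,10,'coeff');
stem(lags,r) % ACF approximately zero for |lag| > 3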

© Danilo P. Mandic Digital Signal Processing 43


Example:- MA(3) process

[Figure: a 500-sample MA(3) signal, its ACF (cutting off after lag 3), its partial ACF (decaying gradually), and a Burg power spectral density estimate.]

MA(3) model, its ACF and partial autocorrelation (PAC)


After lag k = 3, the ACF becomes very small

© Danilo P. Mandic Digital Signal Processing 44


Analysis of Nonstationary Signals
[Figure: a speech signal (samples 2000–10000) divided into three windows W1, W2 and W3; for each window the partial ACF and the MDL are shown, giving calculated model orders of 13 for W1, greater than 50 for W2, and 24 for W3.]

Different AR models for different segments of speech


To deal with nonstationarity we need short sliding windows
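A sketch of such a windowed analysis (the window length and the maximum candidate order are assumptions):

winLen = 1000; maxOrder = 50; orders = [];
for s = 1:winLen:length(x)-winLen+1 % x: the speech signal
    seg = x(s:s+winLen-1);
    mdl = zeros(maxOrder,1);
    for p = 1:maxOrder
        [a,E] = aryule(seg,p);
        mdl(p) = log(E) + p*log(winLen)/winLen;
    end
    [~,orders(end+1)] = min(mdl); % MDL-optimal AR order for this window
end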

© Danilo P. Mandic Digital Signal Processing 45


Duality Between AR and MA Processes

i) A stationary finite AR(p) process can be represented as an infinite-order MA process. A finite MA process can be represented as an infinite-order AR process.

ii) The finite MA(q) process has an ACF that is zero beyond lag q. For an AR process, the ACF is infinite in extent and consists of a mixture of damped exponentials and/or damped sine waves.

iii) The parameters of a finite MA process are not required to satisfy any condition for stationarity. However, for invertibility, the roots of the characteristic equation must lie inside the unit circle.

ARMA modelling is a classic technique which has found a


tremendous number of applications

© Danilo P. Mandic Digital Signal Processing 46


3 SPECTRAL FACTORIZATION

As we will see, there is an infinite number of time functions with any given spectrum.
Spectral factorization is a method of finding the one time function which is also
minimum phase. The minimum-phase function has many uses. It, and it alone,
may be used for feedback filtering. It will arise frequently in wave propagation
problems of later chapters. It arises in the theory of prediction and regulation for
the given spectrum. We will further see that it has its energy squeezed up as close
as possible to t = 0. It determines the minimum amount of dispersion in viscous
wave propagation which is implied by causality. It finds application in two-dimensional potential theory where a vector field magnitude is observed and the components are to be inferred.
This chapter contains four computationally distinct methods of computing
the minimum-phase wavelet from a given spectrum. Being distinct, they offer
separate insights into the meaning of spectral factorization and minimum phase.

3-1 ROOT METHOD


The time function (2, 1) has the same spectrum as the time function (1, 2). The
autocorrelation is (2, 5, 2). We may utilize this observation to explore the multi-
plicity of all time functions with the same autocorrelation and spectrum. It would

seem that the time reverse of any function would have the same autocorrelation
as the function. Actually, certain applications will involve complex time series;
therefore we should make the more precise statement that any wavelet and its
complex-conjugate time-reverse share the same autocorrelation and spectrum. Let
us verify this for simple two-point time functions. The spectrum of (b₀, b₁) is

$$R(Z) = \bar{B}(1/Z)\,B(Z) = (\bar{b}_0 + \bar{b}_1 Z^{-1})(b_0 + b_1 Z) = b_0\bar{b}_1 Z^{-1} + (b_0\bar{b}_0 + b_1\bar{b}_1) + \bar{b}_0 b_1 Z \tag{3-1-1}$$

The conjugate-reversed time function (b̄₁, b̄₀) with Z transform B_r(Z) = b̄₁ + b̄₀Z has a spectrum

$$\bar{B}_r(1/Z)\,B_r(Z) = (b_1 + b_0 Z^{-1})(\bar{b}_1 + \bar{b}_0 Z) = b_0\bar{b}_1 Z^{-1} + (b_0\bar{b}_0 + b_1\bar{b}_1) + \bar{b}_0 b_1 Z \tag{3-1-2}$$

We see that the spectrum (3-1-1) is indeed identical to (3-1-2). Now we wish to extend the idea to time functions with three and more points. Full generality may be observed for three-point time functions, say B(Z) = b₀ + b₁Z + b₂Z². First, we call upon the fundamental theorem of algebra (which states that a polynomial of degree n has exactly n roots) to write B(Z) in factored form:

$$B(Z) = b_2\,(Z_1 - Z)(Z_2 - Z) \tag{3-1-3}$$

Its spectrum is

$$R(Z) = \bar{B}(1/Z)\,B(Z) = \bar{b}_2 b_2\,(\bar{Z}_1 - Z^{-1})(\bar{Z}_2 - Z^{-1})(Z_1 - Z)(Z_2 - Z) \tag{3-1-4}$$

Now, what can we do to change the wavelet (3-1-3) which will leave its spectrum (3-1-4) unchanged? Clearly, b₂ may be multiplied by any complex number of unit magnitude. What is left of (3-1-4) can be broken up into a product of factors of the form (Z̄ᵢ − 1/Z)(Zᵢ − Z). But such a factor is just like (3-1-1). The time function of (Zᵢ − Z) is (Zᵢ, −1), and its complex-conjugate time-reverse is (−1, Z̄ᵢ). Thus, any factor (Zᵢ − Z) in (3-1-3) may be replaced by a factor (−1 + Z̄ᵢZ). In a generalization of (3-1-3) there could be N factors [(Zᵢ − Z), i = 1, 2, . . . , N]. Any combination of them could be reversed. Hence there are 2^N different wavelets which may be formed by reversals, and all of the wavelets have the same spectrum. Let us look off the unit circle in the complex plane. The factor (Zᵢ − Z) means that Zᵢ is a root of both B(Z) and R(Z). If we replace (Zᵢ − Z) by (−1 + Z̄ᵢZ) in B(Z), we have removed a root at Zᵢ from B(Z) and replaced it by another at Z = 1/Z̄ᵢ. The roots of R(Z) have not changed a bit because there were originally roots at both Zᵢ and 1/Z̄ᵢ and the reversal has merely switched them around. Summarizing the situation in the complex plane, B(Z) has roots Zᵢ which occur anywhere; R(Z) must
FIGURE 3-1
Roots of B̄(1/Z)B(Z).

have all the roots Zᵢ and, in addition, the roots 1/Z̄ᵢ. Replacing some particular root Zᵢ by 1/Z̄ᵢ changes B(Z) but not R(Z). The operation of replacing a root at Zᵢ by one at 1/Z̄ᵢ may be written as

$$B'(Z) = B(Z)\,\frac{-1 + \bar{Z}_i Z}{Z_i - Z} \tag{3-1-5}$$

The multiplying factor is none other than the all-pass filter considered in an earlier chapter. With that in mind, it is obvious that B′(Z) has the same spectrum as B(Z). In fact, there is really no reason for Zᵢ to be a root of B(Z). If Zᵢ is a root of B(Z), then B′(Z) will be a polynomial; otherwise it will be an infinite series.
Now let us discuss the calculation of B(Z) from a given R(Z). First, the roots of R(Z) are by definition the solutions to R(Z) = 0. If we multiply R(Z) by Z^N (where R(Z) has been given up to degree N), then Z^N R(Z) is a polynomial and the solutions Zᵢ to Z^N R(Z) = 0 will be the same as the solutions of R(Z) = 0. Finding all roots of a polynomial is a standard though difficult task. Assuming this to have been done, we may then check to see if the roots come in the pairs Zᵢ and 1/Z̄ᵢ. If they do not, then R(Z) was not really a spectrum. If they do, then for every zero inside the unit circle, we must have one outside. Refer to Fig. 3-1. Thus, if we decide to make B(Z) be a minimum-phase wavelet with the spectrum R(Z), we collect all of the roots outside the unit circle. Then we create B(Z) with

$$B(Z) = b_N\,(Z_1 - Z)(Z_2 - Z)\cdots(Z_N - Z) \tag{3-1-6}$$

where Z₁, . . . , Z_N are the roots outside the unit circle.
This then summarizes the calculation of a minimum-phase wavelet from a


given spectrum. When N is large, it is computationally very awkward compared
to methods yet to be discussed. The value of the root method is that it shows
certain basic principles.
1 Every spectrum has a minimum-phase wavelet which is unique within a complex scale factor of unit magnitude.
2 There are infinitely many time functions with any given spectrum.
3 Not all functions are possible autocorrelation functions.
The root method of spectral factorization was apparently developed by economists in the 1920s and 1930s. A number of early references may be found in Wold's book, Stationary Time Series [Ref. 10].
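As a numerical illustration, here is a small MATLAB sketch of the root method applied to the autocorrelation (2, 5, 2) from the start of this section (in general the scale factor b_N must also be matched, cf. Exercise 1; here it is unity):

rz = [2 5 2]; % Z*R(Z) = 2 + 5Z + 2Z^2, in descending powers
z = roots(rz); % the root pair Zi = -2 and 1/Zi = -0.5
zout = z(abs(z) > 1); % collect the roots outside the unit circle
b = fliplr(poly(zout)); % ascending coefficients: the wavelet (2, 1)
conv(b, fliplr(b)) % check: reproduces the autocorrelation (2, 5, 2)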

EXERCISES
1 How can you find the scale factor b_N in (3-1-6)?
2 Compute the autocorrelation of each of the four wavelets (4, 0, −1), (2, 3, −2), (−2, 3, 2), (1, 0, −4).
3 A power spectrum is observed to fit the form P(ω) = 38 + 10 cos ω − 12 cos 2ω. What are some wavelets with this spectrum? Which is minimum phase? [HINT: cos 2ω = 2 cos² ω − 1; 2 cos ω = Z + 1/Z; use the quadratic formula.]
4 Show that if a wavelet b_t = (b₀, b₁, . . . , b_N) is real, the roots of the spectrum R come in the quadruplets Z₀, 1/Z₀, Z̄₀, and 1/Z̄₀. Look into the case of roots exactly on the unit circle and on the real axis. What is the minimum multiplicity of such roots?

3-2 ROBINSON'S ENERGY DELAY THEOREM [Ref. 11]


We will now show that a minimum-phase wavelet has less energy delay than any other one-sided wavelet with the same spectrum. More precisely, we will show that the energy summed from zero to any time t for the minimum-phase wavelet is greater than or equal to that of any other wavelet with the same spectrum. Refer to Fig. 3-2.
We will compare two wavelets P_in and P_out which are identical except for one zero, which is outside the unit circle for P_out and inside for P_in. We may write this as

P_out(Z) = (b + sZ)P(Z)
P_in(Z) = (s + bZ)P(Z)

where b is bigger than s and P is arbitrary but of degree n. Next we tabulate the terms in question.
in question.

FIGURE 3-2
Percent of total energy in a filter between time 0 and time t.

With p_t denoting the coefficients of P(Z), the tabulation lists, for each time t, the coefficients p_out,t = b p_t + s p_{t−1} and p_in,t = s p_t + b p_{t−1}, their squared difference p²_out,t − p²_in,t = (b² − s²)(p_t² − p_{t−1}²), and the partial sum

$$\sum_{k=0}^{t}\big(p_{\text{out},k}^2 - p_{\text{in},k}^2\big) = (b^2 - s^2)\,p_t^2$$

The difference, which is given in the right-hand column, is clearly always positive.
To prove that the minimum-phase wavelet delays energy the least, the preceding argument is repeated with each of the roots until they are all outside the unit circle.
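A two-line numerical illustration (MATLAB) with the wavelets (2, 1) and (1, 2) of Sec. 3-1:

cumsum([2 1].^2) % minimum phase: partial energies [4 5]
cumsum([1 2].^2) % same spectrum: partial energies [1 5], energy arrives later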

EXERCISE
1 Do the foregoing minimum-energy-delay proof for complex-valued b, s, and P. [CAUTION: Does P_in = (s̄ + b̄Z)P or P_in = (s + bZ)P?]

3-3 THE TOEPLITZ METHOD


The Toeplitz method of spectral factorization is based on special properties of
Toeplitz matrices [Ref. 12]. In this chapter we introduce the Toeplitz matrix to
perform spectral factorization. In later chapters we will refer back several times
to the algebra described here. When one desires to predict a time series, one can
do this with a so-called prediction filter. This filter is found as the solution to
Toeplitz simultaneous equations. Norman Levinson, in his explanatory appendix
of Norbert Wiener's Time Series, first introduced the Toeplitz matrix to engineers;
however, it had been widely known and used previously in the field of econometrics.
It is only natural that it should appear first in economics because there the data
are observed at discrete time points, whereas in engineering the idea of discretized
time was rather artificial until the advent of digital computers. The need for pre-
diction in economics is obvious. In seismology, it is not the prediction itself but
the error in prediction which is of interest. Reflection seismograms are used in
petroleum exploration. Ideally, the situation is like radar where the delay time is
in direct proportion to physical distance. This is the case for the so-called primary
reflections. A serious practical complication arises in shallow seas where large
acoustic waves bounce back and forth between the sea surface and the sea floor.
These are called multiple reflections. A mechanism for separation of the primary
waves from the multiple reflections is provided by prediction. A multiple reflection
is predictable from earlier echoes, but a primary reflection is not predictable from
earlier echoes. Thus, the useful information is carried in the part of the seismo-
gram which is not predictable. An oil company computer devoted to interpreting

seismic exploration data typically solves about 100,000 sets of Toeplitz simultaneous
equations in a day.
Another important application of the algebra associated with Toeplitz
matrices is in high-resolution spectral analysis. This is where a power spectrum is
to be estimated from a sample of data which is short (in time or space). The con-
ventional statistical and engineering knowledge in this subject is based on assump-
tions which are frequently inappropriate in geophysics. The situation was fully
recognized by John P. Burg who utilized some of the special properties of Toeplitz
matrices to develop his maximum-entropy spectral estimation procedure described
in a later chapter.
Another place where Toeplitz matrices play a key role is in the mathematical
physics which describes layered materials. Geophysicists often model the earth by
a stack of plane layers or by concentric spherical shells where each shell or layer
is homogeneous. Surprisingly enough, many mathematical physics books do not
mention Toeplitz matrices. This is because they are preoccupied with forward
problems; that is, they wish to calculate the waves (or potentials) observed in a
known configuration of materials. In geophysics, we are interested in both forward
problems and in inverse problems where we observe waves on the surface of the
earth and we wish to deduce material configurations inside the earth. A later
chapter contains a description of how Toeplitz matrices play a central role in such
inverse problems.
We start with a time function x_t, which may or may not be minimum phase. Its spectrum is computed by R(Z) = X̄(1/Z)X(Z). As we saw in the preceding sections, given R(Z) alone there is no way of knowing whether it was computed from a minimum-phase function or a nonminimum-phase function. We may suppose that there exists a minimum-phase B(Z) with the given spectrum, that is, R(Z) = B̄(1/Z)B(Z). Since B(Z) is by hypothesis minimum phase, it has an inverse A(Z) = 1/B(Z). We can solve for the inverse A(Z) in the following way:

$$R(Z)\,A(Z) = \bar{B}(1/Z)\,B(Z)\,A(Z) = \bar{B}(1/Z) \tag{3-3-2}$$

To solve for A(Z), we identify coefficients of powers of Z. For the case where, for example, A(Z) is the quadratic a₀ + a₁Z + a₂Z², the coefficient of Z⁰ in (3-3-2) is

$$r_0 a_0 + r_{-1} a_1 + r_{-2} a_2 = \bar{b}_0$$

The coefficient of Z¹ is

$$r_1 a_0 + r_0 a_1 + r_{-1} a_2 = 0$$

and the coefficient of Z² is

$$r_2 a_0 + r_1 a_1 + r_0 a_2 = 0$$

Bringing these together we have the simultaneous equations

$$\begin{bmatrix} r_0 & r_{-1} & r_{-2} \\ r_1 & r_0 & r_{-1} \\ r_2 & r_1 & r_0 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \bar{b}_0 \\ 0 \\ 0 \end{bmatrix} \tag{3-3-4}$$

It should be clear how to generalize this to a set of simultaneous equations of arbitrary size. The main diagonal of the matrix contains r₀ in every position. The diagonal just below the main one contains r₁ everywhere. Likewise, the whole matrix is filled. Such a matrix is called a Toeplitz matrix. Let us define a′ᵢ = aᵢ/a₀. Recall from the polynomial division algorithm that b₀ = 1/a₀. Define a positive number v = 1/(a₀ā₀). Now, dividing the vector on each side of (3-3-4) by a₀, we get the most popular form of the equations

$$\begin{bmatrix} r_0 & r_{-1} & r_{-2} \\ r_1 & r_0 & r_{-1} \\ r_2 & r_1 & r_0 \end{bmatrix}\begin{bmatrix} 1 \\ a'_1 \\ a'_2 \end{bmatrix} = \begin{bmatrix} v \\ 0 \\ 0 \end{bmatrix} \tag{3-3-5}$$

This gives three equations for the three unknowns a′₁, a′₂, and v. To put (3-3-5) in a form where standard simultaneous-equations programs could be used, one would divide the vectors on both sides by v. After solving the equations, we get a₀ by noting that it has magnitude 1/√v and that its phase is arbitrary, as with the root method of spectral factorization.
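As a numerical aside, MATLAB's built-in levinson function solves exactly this kind of Toeplitz system; a brief sketch for the wavelet (2, 1) of Sec. 3-1 (the truncation length of the autocorrelation is an assumption):

r = [5 2 zeros(1,18)]; % autocorrelation of (2,1): r0 = 5, r1 = 2, rk = 0 beyond
[a,v] = levinson(r); % solves the Toeplitz system (3-3-5)
a0 = 1/sqrt(v); % |a0| = 1/sqrt(v); the phase is arbitrary
b = impz(1, a0*a, 5) % B(Z) = 1/A(Z): approximately (2, 1, 0, 0, 0)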
At this point, a pessimist might interject that the polynomial A(Z) = a₀ + a₁Z + a₂Z² determined from solving the set of simultaneous equations might not turn out to be minimum phase, so that we could not necessarily compute B(Z) by B(Z) = 1/A(Z). The pessimist might argue that the difficulty would be especially likely to occur if the size of the set (3-3-5) was not taken to be large enough.
Actually experimentalists have known for a long time that the pessimists were
wrong. A proof can now be performed rather easily, along with a description of
a computer algorithm which may be used to solve (3-3-5).
The standard computer algorithms for solving simultaneous equations require time proportional to n³ and computer memory proportional to n². The Levinson computer algorithm [Ref. 13] for Toeplitz matrices requires time proportional to n² and memory proportional to n. First notice that the Toeplitz matrix contains
many identical elements. Levinson utilized this special Toeplitz symmetry to
develop his fast method.
The method proceeds by the approach called recursion. That is, given the
solution to the k x k set of equations, we show how to calculate the solution to the
(k + 1) × (k + 1) set. One must first get the solution for k = 1; then one repeatedly (recursively) applies a set of formulas, increasing k by one at each stage. We will show how the recursion works for real time functions (r_k = r_{−k}), going from the 3 × 3 set of equations to the 4 × 4 set, and leave it to the reader to work out the general case.
Given the 3 × 3 simultaneous equations and their solution aᵢ,

$$\begin{bmatrix} r_0 & r_1 & r_2 \\ r_1 & r_0 & r_1 \\ r_2 & r_1 & r_0 \end{bmatrix}\begin{bmatrix} 1 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} v \\ 0 \\ 0 \end{bmatrix} \tag{3-3-6}$$

the following construction defines a quantity e given r₃ (or r₃ given e):

$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{bmatrix}\begin{bmatrix} 1 \\ a_1 \\ a_2 \\ 0 \end{bmatrix} = \begin{bmatrix} v \\ 0 \\ 0 \\ e \end{bmatrix} \tag{3-3-7}$$

The first three rows in (3-3-7) are the same as (3-3-6); the last row is the new definition of e. The Levinson recursion shows how to calculate the solution a′ to the 4 × 4 simultaneous equations, which is like (3-3-6) but larger in size:

$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{bmatrix}\begin{bmatrix} 1 \\ a'_1 \\ a'_2 \\ a'_3 \end{bmatrix} = \begin{bmatrix} v' \\ 0 \\ 0 \\ 0 \end{bmatrix} \tag{3-3-8}$$

The important trick is that from (3-3-7) one can write a “reversed” system of equations. (If you have trouble with the matrix manipulation, merely write out (3-3-7) as simultaneous equations, then reverse the order of the unknowns, and then reverse the order of the equations.)

$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{bmatrix}\begin{bmatrix} 0 \\ a_2 \\ a_1 \\ 1 \end{bmatrix} = \begin{bmatrix} e \\ 0 \\ 0 \\ v \end{bmatrix} \tag{3-3-9}$$

The Levinson recursion consists of subtracting a yet unknown portion c₃ of (3-3-9) from (3-3-7), so as to get the result (3-3-8). That is,

$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{bmatrix}\left(\begin{bmatrix} 1 \\ a_1 \\ a_2 \\ 0 \end{bmatrix} - c_3\begin{bmatrix} 0 \\ a_2 \\ a_1 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} v - c_3 e \\ 0 \\ 0 \\ e - c_3 v \end{bmatrix} \tag{3-3-10}$$

To make the right-hand side of (3-3-10) look like the right-hand side of (3-3-8), we have to get the bottom element to vanish, so we must choose c₃ = e/v. This implies that v′ = v − c₃e = v − e²/v = v[1 − (e/v)²]. Thus, the solution to the 4 × 4 system is derived from the 3 × 3 one by

$$c_3 = e/v \tag{3-3-11}$$

$$\begin{bmatrix} 1 \\ a'_1 \\ a'_2 \\ a'_3 \end{bmatrix} = \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ 0 \end{bmatrix} - c_3\begin{bmatrix} 0 \\ a_2 \\ a_1 \\ 1 \end{bmatrix} \tag{3-3-12}$$

$$v' = v\big[1 - (e/v)^2\big] \tag{3-3-13}$$

We have shown how to calculate the solution of the 4 × 4 Toeplitz equations from the solution of the 3 × 3 Toeplitz equations. The Levinson recursion consists of doing this type of step, starting from 1 × 1 and working up to n × n.
Let us reexamine the calculation to see why A(Z) turns out to be minimum phase.

FIGURE 3-3
A computer program to do the Levinson recursion. It is assumed that the input r_k have been normalized by division by r₀. The complex arithmetic is optional.

      COMPLEX R,A,C,E,BOT,CONJG
      C(1)=-1. ; R(1)=1. ; A(1)=1. ; V(1)=1.
  200 DO 220 J=2,N
      A(J)=0.
      E=0.
      DO 210 I=2,J
  210 E=E+R(I)*A(J-I+1)
      C(J)=E/V(J-1)
      V(J)=V(J-1)-E*CONJG(C(J))
      JH=(J+1)/2
      DO 220 I=1,JH
      BOT=A(J-I+1)-C(J)*CONJG(A(I))
      A(I)=A(I)-C(J)*CONJG(A(J-I+1))
  220 A(J-I+1)=BOT

First, we notice that v = 1/(ā₀a₀) and v′ = 1/(ā′₀a′₀) are always positive. Then from (3-3-13) we see that −1 < e/v < +1. (The fact that c = e/v is bounded by unity will later be shown to correspond to the fact that reflection coefficients for waves are so bounded.) Next, (3-3-12) may be written in polynomial form as

$$A'(Z) = A(Z) - (e/v)\,Z^3 A(1/Z) \tag{3-3-14}$$

We know that Z³ has unit magnitude on the unit circle. Likewise (for real time series), the spectrum of A(Z) equals that of A(1/Z). Thus (by the theorem of adding garbage to a minimum-phase wavelet) if A(Z) is minimum phase, then A′(Z) will also be minimum phase. In summary, the following three statements are equivalent:

1 R(Z) is a spectrum, i.e., of the form B̄(1/Z)B(Z).
2 |c_k| < 1.
3 A(Z) is minimum phase.
If any one of the above three is false, then they are all false. A program for the calculation of a_k and c_k from r_k is given in Fig. 3-3. In Chap. 8, on wave propagation in layers, programs are given to compute r_k from a_k or c_k.

EXERCISES
1 The top row of a 4 × 4 Toeplitz set of simultaneous equations like (3-3-8) is (1, a, a², a³). What is the solution a_k?
2 How must the Levinson recursion be altered if time functions are complex? Specifically, where do complex conjugates occur in (3-3-11), (3-3-12), and (3-3-13)?
3 Let A_m(Z) denote a polynomial whose coefficients are the solution to an m × m set of Toeplitz equations. Show that if B_k(Z) = Z^k A_k(Z^{-1}), then

$$v_m\,\delta_{nm} = \frac{1}{2\pi}\int_0^{2\pi} R(Z)\,B_m(Z)\,Z^{-n}\,d\omega, \qquad n \le m$$

which means that the polynomial B_m(Z) is orthogonal to the polynomial Z^n over the unit circle under the positive weighting function R. Utilizing this result, state why B_n is orthogonal to B_m, that is,

$$v_n\,\delta_{nm} = \frac{1}{2\pi}\int_0^{2\pi} R(Z)\,B_m(Z)\,\bar{B}_n(Z)\,d\omega \tag{i}$$

(HINT: First consider n ≤ m, then all n.)
Toeplitz matrices are found in the mathematical literature under the topic of polynomials orthogonal on the unit circle. The author especially recommends Atkinson's book [Ref. 14].

3-4 WHITTLE'S EXP-LOG METHOD [Ref. 15]


In this method of spectral factorization we substitute power series into other power series. Thus, like the root method, it is good for learning but not good for computing. We start with some given autocorrelation r_k, where

$$R(Z) = \sum_{k=-N}^{N} r_k Z^k$$

If |R| > 2 on the unit circle, then a scale factor should be divided out. Insert this power series into the power series for logarithms:

$$U(Z) = \ln R(Z) = \sum_{k=-\infty}^{\infty} u_k Z^k$$

Of course, in practice this would be a lot of effort, but it could be done in a systematic fashion with a computer program. Now define U₊(Z) by dropping the negative powers of Z from U(Z) and halving the constant term:

$$U_+(Z) = \frac{u_0}{2} + \sum_{k=1}^{\infty} u_k Z^k$$

Insert this into the power series for the exponential:

$$B(Z) = \exp\big[U_+(Z)\big]$$

The desired minimum-phase wavelet is B(Z); its spectrum is R(Z). To see why this is so, consider the following identities (for a real time series, u₋ₖ = uₖ):

$$R(Z) = \exp\big[U(Z)\big] = \exp\Big(\sum_{k=-\infty}^{-1} u_k Z^k + u_0 + \sum_{k=1}^{\infty} u_k Z^k\Big)$$

$$= \exp\Big(\frac{u_0}{2} + \sum_{k=-\infty}^{-1} u_k Z^k\Big)\,\exp\Big(\frac{u_0}{2} + \sum_{k=1}^{\infty} u_k Z^k\Big)$$

$$= \exp\big[U_+(1/Z)\big]\,\exp\big[U_+(Z)\big] = B(1/Z)\,B(Z)$$
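The method becomes practical when the power series are replaced by FFTs on a dense frequency grid; a small MATLAB sketch for a real wavelet (the grid size and the test wavelet are assumptions):

x = [1 2]; N = 512; % the wavelet (1, 2): not minimum phase
R = abs(fft(x,N)).^2; % its spectrum R(Z) sampled on the unit circle
u = real(ifft(log(R))); % the coefficients u_k of U(Z) = ln R(Z)
fold = [1, 2*ones(1,N/2-1), 1, zeros(1,N/2-1)];
Uplus = fft((u/2).*fold); % U+(Z) on the grid: u0/2 plus positive powers only
b = real(ifft(exp(Uplus))); % B(Z) = exp(U+); b(1:3) is approximately (2, 1, 0)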
