
Advanced Signal Processing

Linear Stochastic Processes


Danilo Mandic
room 813, ext: 46271

Department of Electrical and Electronic Engineering


Imperial College London, UK
[email protected], URL: www.commsp.ee.ic.ac.uk/∼mandic


c D. P. Mandic Advanced Signal Processing 1
Aims of this lecture
◦ To introduce linear stochastic models for real world data

◦ Learn how stochastic models shape up the spectrum of white noise

◦ Understand how stochastic processes are created, and to get familiarised with the autocorrelation, variance and power spectrum of such processes

◦ Learn how to derive the parameters of linear stochastic ARMA models

◦ Introduce special cases: autoregressive (AR), moving average (MA)

◦ Stability conditions and model order selection (partial correlations)

◦ Optimal model order selection criteria (MDL, AIC, ...)

◦ Apply stochastic modelling to real world data (speech, environmental, finance), and address the issues of under– and over–modelling
This material is a first fundamental step for the modelling of real world data


c D. P. Mandic Advanced Signal Processing 2
Example 1: Assessing the nature of a signal from its ACF
Windowed clean signal, signal in WGN, signal with DC offset (see also Lecture 1)

[Figure: top row shows windowed realisations of sin(2x), sin(2x) + 2*randn, and sin(2x) + 4; bottom row shows their ACFs over lags −10 to 10]

Which disturbance is more detrimental: deterministic DC or stochastic noise?


c D. P. Mandic Advanced Signal Processing 3
How can we categorise real–world measurements?
Where would you place a DC level in WGN, x[n] = A + w[n], w ∼ N(0, σw²)?

(a) Noisy oscillations, (b) Nonlinearity and noisy oscillations, (c) Random nonlinear process
(? left) Route to chaos, (? top) stochastic chaos, (? middle) mixture of sources
[Diagram: signal classes on a plane spanned by Determinism–Stochasticity (horizontal) and Linearity–Nonlinearity (vertical); ARMA sits in the linear–stochastic corner, NARMA above it, and Chaos in the nonlinear–deterministic corner, with question marks indicating the regions (a)–(c) above]
Our lecture is about ARMA models (linear stochastic)
How about observing the signal through a nonlinear sensor?


c D. P. Mandic Advanced Signal Processing 4
Justification # Wold decomposition theorem
(Existence theorem, also mentioned in your coursework)
Wold’s decomposition theorem plays a central role in time series analysis,
and explicitly proves that any covariance–stationary time series can be
decomposed into two different parts: deterministic (such as a sinewave)
and stochastic (filtered WGN).
Therefore, a general process can be written as a sum of two processes

x[n] = xp[n] + xr[n] = xp[n] + Σ_{j=1}^{q} bj w[n − j],   w ∼ white process

⇒ xr[n] → regular random process
⇒ xp[n] → predictable process, with xr[n] ⊥ xp[n], i.e. E{xr[m]xp[n]} = 0

We can treat separately the predictable part (e.g. a deterministic sinusoidal signal) and the random signal.
Our focus will be on the modelling of the random component
NB: Recall the difference between shift–invariance and time–invariance


c D. P. Mandic Advanced Signal Processing 5
Towards linear stochastic processes
Wold’s theorem implies that any purely non-deterministic covariance–stationary
process can be arbitrarily well approximated by an ARMA process

Therefore, the general form for the power spectrum of a WSS process is

Px(e^{jω}) = Σ_{k=1}^{N} αk δ(ω − ωk) + Pxr(e^{jω})
We are interested in processes generated by filtering white noise with a
linear shift–invariant filter that has a rational system function.
This class of digital filters includes the following system functions:
• Autoregressive (AR) → all pole system → H(z) = 1/A(z)
• Moving Average (MA) → all zero system → H(z) = B(z)
• Autoregressive Moving Average (ARMA) → poles and zeros
→ H(z) = B(z)/A(z)

Definition: A covariance-stationary process x[n] is called (linearly) deterministic if p(x[n] | x[n − 1], x[n − 2], . . .) = x[n].

A stationary deterministic process, xp[n], can be predicted correctly (with zero error) using its entire past, xp[n − 1], xp[n − 2], xp[n − 3], . . .


c D. P. Mandic Advanced Signal Processing 6
Recap: Second-order all–pole transfer functions
p1 = 0.999exp(jπ/4), p2 = 0.9exp(jπ/4), p3 = 0.9exp(j7π/12)
[Figure: pole–zero plot of p1, p2, p3 and their conjugates, together with the corresponding magnitude responses against frequency]

We have two complex conjugate poles, e.g. p1 and p1*, therefore

H(z) = 1 / [(z − p1)(z − p1*)] = z^{−2} / [(1 − p1 z^{−1})(1 − p1* z^{−1})]

Transfer function for p = ρe^{jθ} (ignoring z^{−2} in the numerator on the RHS):

H(z) = 1 / [(1 − ρe^{jθ}z^{−1})(1 − ρe^{−jθ}z^{−1})] = 1 / [1 − 2ρ cos(θ)z^{−1} + ρ²z^{−2}]

For the sinewave, ρ = 1  ⇒  H(z) = 1 / [1 − 2 cos(θ)z^{−1} + z^{−2}] = 1 / [1 + a1z^{−1} + a2z^{−2}]
⇒ Indeed, a sinewave can be modelled as an autoregressive process


c D. P. Mandic Advanced Signal Processing 7
Example 2: Sinewave revisited, is it det. or stoch.?
Is a sinewave best described as nonlinear deterministic or linear stochastic?
[Figure: pole–zero plot of the all–pole filter, the white noise input s, and the filtered (pseudo-sinusoidal) output s1]

Matlab code:

z1 = 0;
p1 = [0.5+0.866i, 0.5-0.866i];
[num1, den1] = zp2tf(z1, p1, 1);
zplane(num1, den1);
s = randn(1,1000);             % white noise
s1 = filter(num1, den1, s);    % filtered noise
figure;
subplot(311), plot(s)
subplot(312), zplane(num1, den1)
subplot(313), plot(s1)

The AR model of a sinewave:

x(k) = a1*x(k-1) + a2*x(k-2) + w(k),   a1 = -1, a2 = 0.98, w ~ N(0,1)


c D. P. Mandic Advanced Signal Processing 8
Example 3: Spectra of real–world data
Sunspot numbers and their power spectrum

[Figure: the sunspot time series, its ACF, its partial ACF, and the Burg power spectral density estimate]

Recorded from about 1700 onwards.

This signal is random, as sunspots originate from the explosions of helium on the Sun. Still, the number of sunspots obeys a relatively simple model and is predictable, as shown later in the Lecture.
c D. P. Mandic Advanced Signal Processing 9
How do we model a real world signal?
Suppose the measured real–world signal has, e.g., a bandpass (or any other) power spectrum X(ω). We desire to describe the whole long signal with very few parameters.

1. Can we model the first and second order statistics of a real–world signal by shaping the white noise spectrum using some transfer function?
2. Does this produce the same second order properties (mean, variance, ACF, spectrum) for any white noise input?

[Diagram: flat (white) input spectrum W(ω) → filter H(z) = B(z)/A(z) → bandpass output spectrum X(ω), with Px = |H(ω)|² Pw]
Can we use this linear stochastic model for prediction?


c D. P. Mandic Advanced Signal Processing 10
Spectrum of ARMA models (look also at Recap slides)
recall that two conjugate complex poles of A(z) give one peak in the spectrum

ACF ≡ P SD in terms of the information available


In ARMA modelling we filter white noise w[n] (so called driving input)
with a causal linear shift–invariant filter with the transfer function H(z), a
rational system function with p poles and q zeros given by
X(z) = H(z)W(z),   where   H(z) = Bq(z)/Ap(z) = (Σ_{k=0}^{q} bk z^{−k}) / (1 + Σ_{k=1}^{p} ak z^{−k})

For a stable H(z), the ARMA(p,q) stochastic process x[n] will be wide–sense stationary. For the driving noise power Pw = σw², the power spectrum of the stochastic process x[n] is (recall Py = |H(z)|² Px = H(z)H*(z)Px)

Px(z) = σw² Bq(z)Bq(z^{−1}) / [Ap(z)Ap(z^{−1})]   ⇒   Px(e^{jθ}) = σw² |Bq(e^{jθ})|² / |Ap(e^{jθ})|² = σw² |Bq(ω)|² / |Ap(ω)|²

Notice that conjugation, "(·)*", in analogue frequency corresponds to "z^{−1}" in digital frequency.


c D. P. Mandic Advanced Signal Processing 11
Example 4: Can the shape of power spectrum tell us
about the order of the polynomials B(z) and A(z)?
Plot the power spectrum of an ARMA(2,2) process for which
◦ the zeros of H(z) are at z = 0.95e^{±jπ/2}
◦ the poles are at z = 0.9e^{±j2π/5}

Solution: The system function is (poles and zeros → resonance & sink)

H(z) = (1 + 0.9025z^{−2}) / (1 − 0.5562z^{−1} + 0.81z^{−2})

[Figure: the resulting power spectrum in dB against frequency from 0 to π, with a peak near the pole angle 2π/5 and a dip (sink) near the zero angle π/2]
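As a rough numerical check of this result, the spectrum can be reproduced with freqz; the sketch below assumes unit driving-noise variance (which only shifts the dB curve) and uses the coefficient vectors read off H(z) above.

b = [1 0 0.9025];               % numerator: zeros at 0.95*exp(+/-j*pi/2)
a = [1 -0.5562 0.81];           % denominator: poles at 0.9*exp(+/-j*2*pi/5)
[H, w] = freqz(b, a, 512);      % frequency response over [0, pi)
plot(w, 10*log10(abs(H).^2));   % power spectrum in dB
xlabel('Frequency [rad/sample]'), ylabel('Power Spectrum [dB]')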


c D. P. Mandic Advanced Signal Processing 12
Difference equation representation # the ACF follows
the data model!
Random processes x[n] and w[n] are related by a linear difference equation
with constant coefficients, given by
H(z) = B(z)/A(z) = (Σ_{k=0}^{q} bk z^{−k}) / (1 + Σ_{k=1}^{p} ak z^{−k})   ↔   ARMA(p,q)   ↔
x[n] = Σ_{l=1}^{p} al x[n − l]  +  Σ_{l=0}^{q} bl w[n − l]
       (autoregressive part)      (moving average part)

Notice that the autocorrelation function of x[n] and crosscorrelation


between the stochastic process x[n] and the driving input w[n] follow
the same difference equation, i.e. if we multiply both sides of the above
equation by x[n − k] and take the statistical expectation, we have
rxx(k) = Σ_{l=1}^{p} al rxx(k − l)  +  Σ_{l=0}^{q} bl rxw(k − l)        (see Slide 18)
          (easy to calculate)          (can be complicated)

Since x is WSS, it follows that x[n] and w[n] are jointly WSS


c D. P. Mandic Advanced Signal Processing 13
General linear processes: Stationarity and invertibility
Can we tell anything about the process x from the coefficients a, b (cf. h in FIR)

Consider a linear stochastic process # output from a linear filter, driven by


WGN, denoted by w[n]. (NB: Here w is “input” and x “output”)

x[n] = w[n] + b1w[n − 1] + b2w[n − 2] + · · · = w[n] + Σ_{j=1}^{∞} bj w[n − j]
that is, a weighted sum of past samples of driving white noise w[n].
For this process to be a valid stationary process, the coefficients must be absolutely summable, that is Σ_{j=0}^{∞} |bj| < ∞.
This also implies that under stationarity conditions, x[n] is also a weighted
sum of past values of x, plus an added shock w[n], that is
x[n] = a1x[n − 1] + a2x[n − 2] + · · · + w[n]
◦ A linear process is stationary if Σ_{j=0}^{∞} |bj| < ∞
◦ A linear process is invertible if Σ_{j=0}^{∞} |aj| < ∞

Recall that H(ω) = Σ_{n=0}^{∞} h(n)e^{−jωn} → for ω = 0 ⇒ H(0) = Σ_{n=0}^{∞} h(n)


c D. P. Mandic Advanced Signal Processing 14
Autoregressive processes (pole–only)

A general AR(p) process (autoregressive of order p) is given by


x[n] = a1x[n − 1] + · · · + apx[n − p] + w[n] = Σ_{i=1}^{p} ai x[n − i] + w[n] = a^T x[n] + w[n], with a = [a1, . . . , ap]^T and x[n] = [x[n − 1], . . . , x[n − p]]^T
Observe the auto–regression in {x[n]} # the past of x is used to generate the future

Duality between the AR and MA processes:


For example, the first order autoregressive process, AR(1),

x[n] = a1x[n − 1] + w[n]   ⇔   x[n] = Σ_{j=0}^{∞} bj w[n − j],   with bj = a1^j

has an MA representation, too (right hand side above).

This follows from the duality between IIR and FIR filters.


c D. P. Mandic Advanced Signal Processing 15
Example 5: Statistical properties of AR processes
Drive the AR(4) model from Example 6 with two different WGN realisations ∼ N (0, 1)
[Figure: two realisations of the driving WGN, the corresponding AR(4) outputs, their ACFs, and their PSD estimates; the two rows correspond to the two noise realisations]

r = wgn(2048,1,1);
a = [2.2137, -2.9403, 2.1697, -0.9606];
a = [1 -a];
x = filter(1,a,r);
xacf = xcorr(x);
xpsd = abs(fftshift(fft(xacf)));

◦ The time domain random AR(4) processes look different
◦ The ACFs and PSDs are exactly the same (2nd-order stats)!
◦ This signifies the importance of our statistical approach


c D. P. Mandic Advanced Signal Processing 16
ACF and normalised ACF of AR processes
Key: ACF has the same form as the AR process in hand!
To obtain the autocorrelation function of an AR process, multiply the
above equation by x[n − k] to obtain (recall that r(−m) = r(m))

x[n − k]x[n] = a1x[n − k]x[n − 1] + a2x[n − k]x[n − 2] + · · ·


+apx[n − k]x[n − p] + x[n − k]w[n]

Notice that E{x[n − k]w[n]} vanishes when k > 0. Therefore, we have

rxx(0) = a1rxx(1) + a2rxx(2) + · · · + aprxx(p) + σw²,   k = 0
rxx(k) = a1rxx(k − 1) + a2rxx(k − 2) + · · · + aprxx(k − p),   k > 0

On dividing throughout by rxx(0) we obtain

ρ(k) = a1ρ(k − 1) + a2ρ(k − 2) + · · · + apρ(k − p) k > 0

Quantities ρ(k) are called normalised correlation coefficients


c D. P. Mandic Advanced Signal Processing 17
Variance and spectrum of AR processes
Variance:
For k = 0, the contribution from the term E{x[n − k]w[n]} is σw², so that

rxx(0) = a1rxx(−1) + a2rxx(−2) + · · · + aprxx(−p) + σw²

Dividing by rxx(0) = σx², we obtain

σx² = σw² / (1 − ρ1a1 − ρ2a2 − · · · − ρpap)

Power spectrum: (recall that Pxx = |H(z)|² Pww = H(z)H*(z)Pww, the expression for the output power of a linear system → see Appendix)

Pxx(f) = 2σw² / |1 − a1e^{−j2πf} − · · · − ape^{−j2πpf}|²,   0 ≤ f ≤ 1/2

For more detail, see "Spectrum of Linear Systems" from Lecture 1: Background


c D. P. Mandic Advanced Signal Processing 18
Example 6a: AR(p) signal generation
Consider an AR(4) process with coeff. a = [2.2137, −2.9403, 2.1697, −0.9606]

[Figure: Yule–Walker power spectral density estimates of the AR(4) process from 256-point and 1024-point data records]

◦ Generate the input signal x by filtering white noise through the AR filter
◦ Estimate the PSD of x based on a fourth-order AR model
◦ Careful! The Matlab routines require the AR coefficients a in the format a = [1, −a1, . . . , −ap]
Notice the dependence on data length
Solution:
randn(’state’,1);
x = filter(1,a,randn(256,1)); % AR system output
pyulear(x,4) % Fourth-order estimate


c D. P. Mandic Advanced Signal Processing 19
Example 6b: Alternative AR power spectrum calculation
(an alternative function in Matlab)
Consider the AR(4) system given by
x[n] = 2.2137x[n−1]−2.9403x[n−2]+2.1697x[n−3]−0.9606x[n−4]+w[n]
a = [1 -2.2137 2.9403 -2.1697 0.9606]; % AR filter coefficients
freqz(1,a) % AR filter frequency response
title(’AR System Frequency Response’)
[Figure: magnitude (dB) and phase (degrees) of the AR(4) system frequency response against normalised frequency]


c D. P. Mandic Advanced Signal Processing 20
Key: Finding AR coefficients # the Yule–Walker eqns
(there are several similar forms – we follow the most concise one)

For k = 1, 2, . . . , p from the general AR(p) autocorrelation function, we


obtain the set of equations:
rxx(1) = a1rxx(0) + a2rxx(1) + · · · + aprxx(p − 1)
rxx(2) = a1rxx(1) + a2rxx(0) + · · · + aprxx(p − 2)
⋮
rxx(p) = a1rxx(p − 1) + a2rxx(p − 2) + · · · + aprxx(0)

These equations are called the Yule–Walker or normal equations.


Their solution gives us the set of autoregressive parameters, a1, . . . , ap,
T
or a = [a1, . . . , ap] .
The above equations can be expressed in a compact vector–matrix form as

rxx = Rxx a   ⇒   a = Rxx^{−1} rxx

The ACF matrix Rxx is positive definite (Toeplitz) which guarantees matrix inversion
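A minimal numerical sketch of this solution, assuming the AR(2) signal of Example 7 as test data and a biased ACF estimate; toeplitz builds Rxx and the backslash operator solves the normal equations.

N = 2048;
x = filter(1, [1 -1.2 0.8], randn(N,1));  % test AR(2) signal (as in Example 7)
r = xcorr(x, 'biased');                   % ACF estimate, lags -(N-1)...(N-1)
r0 = r(N:end);                            % keep lags 0, 1, 2, ...
p = 2;
Rxx = toeplitz(r0(1:p));                  % p x p autocorrelation matrix
rxx = r0(2:p+1);                          % right-hand side [rxx(1); ...; rxx(p)]
a_yw = Rxx \ rxx                          % Yule-Walker estimate, approx [1.2; -0.8]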


c D. P. Mandic Advanced Signal Processing 21
Example 7: Find the parameters of an AR(2) process,
x(n), generated by x[n] = 1.2x[n − 1] − 0.8x[n − 2] + w[n]
Coursework: comment on the shape of the ACF for large lags
[Figure: a realisation of the AR(2) signal x = filter([1], [1, −1.2, 0.8], w), its full ACF, and the ACF zoomed in to lags −20…20]

Matlab: for i=1:6; [a,e]=aryule(x,i); display(a);end


a(1) = [0.6689] a(2) = [1.2046, −0.8008]
a(3) = [1.1759, −0.7576, −0.0358]
a(4) = [1.1762, −0.7513, −0.0456, 0.0083]
a(5) = [1.1763, −0.7520, −0.0562, 0.0248, −0.0140]
a(6) = [1.1762, −0.7518, −0.0565, 0.0198, −0.0062, −0.0067]


c D. P. Mandic Advanced Signal Processing 22
Example 8: Advantages of model-based analysis
Consider the PSD’s for different realisations of the AR(4) process from Example 5

[Figure: empirical PSDs of two different realisations (thin black) overlaid with the theoretical model-based PSD (thick red), against normalised frequency ω/ωmax]

◦ The different realisations lead to different Empirical PSD’s (in thin black)
◦ The theoretical PSD from the model is consistent regardless of the data (in thick red)

N = 1024;
w = wgn(N,1,1);
a = [2.2137, -2.9403, 2.1697, -0.9606]; % Coefficients of AR(4) process
a = [1 -a];
x = filter(1,a,w);
xacf = xcorr(x); % Autocorrelation of AR(4) process
dft = fft(xacf);
EmpPSD = abs(dft/length(dft)).^ 2; % Empirical PSD obtained from data
ThePSD = abs(freqz(1,a,N,1)).^ 2 ; % Theoretical PSD obtained from model


c D. P. Mandic Advanced Signal Processing 23
Normal equations for the autocorrelation coefficients

For the autocorrelation coefficients

ρk = rxx(k)/rxx(0)

we have

ρ1 = a1 + a2ρ1 + · · · + apρp−1
ρ2 = a1ρ1 + a2 + · · · + apρp−2
⋮

ρp = a1ρp−1 + a2ρp−2 + · · · + ap

When does the sequence {ρ0, ρ1, ρ2, . . .} vanish?

Homework: Try command xcorr in Matlab


c D. P. Mandic Advanced Signal Processing 24
Yule–Walker modelling in Matlab
In Matlab – Power spectral density using Y–W method pyulear

Pxx = pyulear(x,p)
[Pxx,w] = pyulear(x,p,nfft)
[Pxx,f] = pyulear(x,p,nfft,fs)
[Pxx,f] = pyulear(x,p,nfft,fs,’range’)
[Pxx,w] = pyulear(x,p,nfft,’range’)

Description:

Pxx = pyulear(x,p)

implements the Yule-Walker algorithm, and returns Pxx, an estimate of the


power spectral density (PSD) of the vector x.

To remember for later → this estimate is also the maximum entropy spectral estimate.
See also aryule, lpc, pburg, pcov, peig, periodogram


c D. P. Mandic Advanced Signal Processing 25
Stochastic modelling: From data to an ARM A(p, q) model

So far, we have assumed the model (AR, MA, or ARMA) and analysed the
ACF and PSD based on known model coefficients.
In practice: DATA # MODEL

This procedure is as follows:

* record data x(k)


* find the autocorrelation of the data ACF(x)
* divide by r_xx(0) to obtain correlation coefficients \rho(k)
* write down Yule-Walker equations
* solve for the vector of AR parameters

The problem is that we do not know the model order p beforehand.


An illustration of the importance of the correct model order is given in the following example.


c D. P. Mandic Advanced Signal Processing 26
Example 9: Sunspot number estimation
consistent with the properties of a second order AR process

[Figure: the sunspot series from 1700 to 1900, its ACF over ±100 years of delay, and the ACF zoomed in to ±10 years]

a1 = [0.9295] a2 = [1.4740, −0.5857]


a3 = [1.5492, −0.7750, 0.1284]
a4 = [1.5167, −0.5788, −0.2638, 0.2532]
a5 = [1.4773, −0.5377, −0.1739, 0.0174, 0.1555]
a6 = [1.4373, −0.5422, −0.1291, 0.1558, −0.2248, 0.2574]
⇒ The sunspot model is x[n] = 1.474 x[n − 1] − 0.5857 x[n − 2] + w[n]


c D. P. Mandic Advanced Signal Processing 27
Special case #1: AR(1) process (Markov)
For Markov processes, instead of the iid condition, we have first order conditional dependence, that is

p(x[n] | x[n − 1], x[n − 2], . . . , x[0]) = p(x[n] | x[n − 1])

Then x[n] = a1x[n − 1] + w[n] = w[n] + a1w[n − 1] + a1²w[n − 2] + · · ·


Therefore: first order dependence, order-1 memory
i) for the AR(1) process to be stationary −1 < a1 < 1.
ii) Autocorrelation Function of AR(1): From the Yule-Walker equations
rxx(k) = a1rxx(k − 1), k > 0

In terms of the correlation coefficients, ρ(k) = r(k)/r(0), with ρ0 = 1,

ρk = a1^k,   k > 0

Notice the difference in the behaviour of the ACF for a1 positive and negative
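This behaviour is easy to verify numerically; the following sketch (assumed signal length N = 1000 and maximum lag 20) overlays the theoretical ρk = a1^k with an estimated ACF for a1 = ±0.8.

N = 1000; K = 20;
for a1 = [0.8 -0.8]
    x = filter(1, [1 -a1], randn(N,1));   % AR(1): x[n] = a1*x[n-1] + w[n]
    r = xcorr(x, K, 'coeff');             % normalised ACF, lags -K...K
    rho_est = r(K+1:end);                 % keep lags 0...K
    rho_theory = a1.^(0:K);               % theoretical rho_k = a1^k
    figure; stem(0:K, rho_est); hold on; plot(0:K, rho_theory, 'r--');
    title(['AR(1) ACF for a_1 = ' num2str(a1)]), legend('estimated','theoretical')
end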


c D. P. Mandic Advanced Signal Processing 28
Variance and power spectrum of AR(1) process
Both can be calculated directly from the general expression for the
variance and spectrum of AR(p) processes.

◦ Variance: also from the general expression for the variance of linear processes from Lecture 1

σx² = σw² / (1 − ρ1a1) = σw² / (1 − a1²)

◦ Power spectrum: notice how the flat PSD of WGN is shaped according to the position of the pole of the AR(1) model (LP or HP)

Pxx(f) = 2σw² / |1 − a1e^{−j2πf}|² = 2σw² / (1 + a1² − 2a1 cos(2πf))


c D. P. Mandic Advanced Signal Processing 29
Example 10: ACF and spectrum of AR(1) for a = ±0.8
a < 0 → High Pass a > 0 → Low Pass
[Figure: realisations, ACFs, and Burg PSD estimates for x[n] = −0.8*x[n−1] + w[n] (high pass, left) and x[n] = 0.8*x[n−1] + w[n] (low pass, right)]


c D. P. Mandic Advanced Signal Processing 30
Special case #2: Second order autoregressive processes,
p = 2, q = 0, hence the notation AR(2)
The input–output functional relationship is given by (w[n] ∼ any white noise)
x[n] = a1x[n − 1] + a2x[n − 2] + w[n]
X(z) = (a1z^{−1} + a2z^{−2})X(z) + W(z)
⇒ H(z) = X(z)/W(z) = 1 / (1 − a1z^{−1} − a2z^{−2})
H(ω) = H(e^{jω}) = 1 / (1 − a1e^{−jω} − a2e^{−j2ω})

Y–W equations for p = 2:          Connecting the a's and ρ's:
ρ1 = a1 + a2ρ1                    ρ1 = a1 / (1 − a2)
ρ2 = a1ρ1 + a2                    ρ2 = a2 + a1² / (1 − a2)

When solved for a1 and a2, we have

a1 = ρ1(1 − ρ2) / (1 − ρ1²),    a2 = (ρ2 − ρ1²) / (1 − ρ1²)

Since ρ1 < ρ(0) = 1 → a stability condition on a1 and a2


c D. P. Mandic Advanced Signal Processing 31
Variance and power spectrum
Both readily obtained from the general AR(2) process equation!

Variance:

σx² = σw² / (1 − ρ1a1 − ρ2a2) = [(1 − a2)/(1 + a2)] · σw² / [(1 − a2)² − a1²]

Power spectrum:

Pxx(f) = 2σw² / |1 − a1e^{−j2πf} − a2e^{−j4πf}|²
       = 2σw² / [1 + a1² + a2² − 2a1(1 − a2) cos(2πf) − 2a2 cos(4πf)],   0 ≤ f ≤ 1/2

Stability conditions (Condition 1 can be obtained from the denominator of the variance, Condition 2 from the expression for ρ1, etc.)

Condition 1:  a1 + a2 < 1
Condition 2:  a2 − a1 < 1
Condition 3:  −1 < a2 < 1
This can be visualised within the so–called “stability triangle”


c D. P. Mandic Advanced Signal Processing 32
Stability triangle
[Figure: the stability triangle in the (a1, a2) plane, with vertices (−2, −1), (2, −1) and (0, 1); real roots lie above the parabola a1² + 4a2 = 0 (regions I and II), complex roots below it (regions III and IV), and a typical ACF is sketched for each region]

i) Real roots Region 1: Monotonically decaying ACF


ii) Real roots Region 2: Decaying oscillating ACF
iii) Complex roots Region 3: Oscillating pseudo-periodic ACF
iv) Complex roots Region 4: Pseudo-periodic ACF
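The region boundaries can also be checked programmatically; the helper below is a hypothetical sketch (not part of the lecture code) that applies the three stability conditions and the sign of a1² + 4a2 to classify an AR(2) model.

function region = ar2_region(a1, a2)
% Hypothetical helper: classify x[n] = a1*x[n-1] + a2*x[n-2] + w[n]
if ~((a1 + a2 < 1) && (a2 - a1 < 1) && (abs(a2) < 1))
    region = 'unstable (outside the stability triangle)';
elseif a1^2 + 4*a2 >= 0
    region = 'stable, real roots (decaying ACF)';
else
    region = 'stable, complex roots (pseudo-periodic ACF)';
end

% Example: ar2_region(1.474, -0.586) returns 'stable, complex roots ...' (region 4)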


c D. P. Mandic Advanced Signal Processing 33
Example 11: Stability triangle and ACFs of AR(2) signals
Left: a = [−0.7, 0.2] (region 2) Right: a = [1.474, −0.586] (region 4)

x[n] = −0.7*x[n−1] + 0.2*x[n−2] + w[n] Sunspot series


6 200

4
150
Signal Values

Signal Values
2

0 100

−2
50
−4

−6 0
0 100 200 300 0 100 200 300
Sample Number Sample Number

ACF for AR(2) signal ACF for sunspot series


1 1

0.5
0.5
Correlation

Correlation

0
−0.5

−1 −0.5
0 10 20 30 40 50 0 10 20 30 40 50
Correlation Lag Correlation Lag


c D. P. Mandic Advanced Signal Processing 34
Determining regions in the stability triangle
let us examine the autocorrelation function of AR(2) processes

The ACF
ρk = a1ρk−1 + a2ρk−2 k>0

◦ Real roots (a1² + 4a2 > 0): the ACF is a mixture of damped exponentials

◦ Complex roots (a1² + 4a2 < 0): the ACF exhibits a pseudo–periodic behaviour

ρk = D^k sin(2πf0 k + Φ) / sin(Φ)

with D the damping factor of a sinewave with frequency f0 and phase Φ, where

D = √(−a2)
cos(2πf0) = a1 / (2√(−a2))
tan(Φ) = [(1 + D²)/(1 − D²)] tan(2πf0)
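For a quick numerical check of these expressions, the sketch below uses the coefficients of Example 12 on the next slide (a1 = 0.75, a2 = −0.5).

a1 = 0.75; a2 = -0.5;                          % complex-root AR(2) (Example 12)
D   = sqrt(-a2);                               % damping factor, approx 0.71
f0  = acos(a1/(2*sqrt(-a2)))/(2*pi);           % frequency of the pseudo-periodic ACF
Phi = atan((1 + D^2)/(1 - D^2)*tan(2*pi*f0));  % phase
T0  = 1/f0                                     % fundamental period, approx 6.2 samples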


c D. P. Mandic Advanced Signal Processing 35
Example 12: AR(2) where a1 > 0, a2 < 0 # Region 4
Consider: x[n] = 0.75x[n − 1] − 0.5x[n − 2] + w[n]
[Figure: a realisation of the process, its ACF (pseudo–periodic, with a period of about 6 lags), and its Burg PSD estimate]

The damping factor is D = √0.5 = 0.71, and the frequency is

f0 = cos⁻¹(0.5303) / (2π) = 1/6.2

The fundamental period of the autocorrelation function is therefore T0 = 6.2.


c D. P. Mandic Advanced Signal Processing 36
Model order selection: Partial autocorrelation function
Consider an earlier example using a slightly different notation for AR coefficients
[Figure: the AR(2) signal x = filter([1], [1, −1.2, 0.8], w), its full ACF, and the ACF zoomed in to lags −20…20, as in Example 7]

To find p, first re-write AR coeffs. of order p as [a_p1,...,a_pp]


p = 1 # [0.6689] = a11 p = 2 # [1.2046, −0.8008] = [a21, a22]
p = 3 #[1.1759, −0.7576, −0.0358] = [a31, a32, a33]
p = 4 # [1.1762, −0.7513, −0.0456, 0.0083] = [a41, a42, a43, a44]
p = 5 # [1.1763, −0.7520, −0.0562, 0.0248, −0.0140] = [a51, . . . , a55]
p = 6 # [1.1762, −0.7518, −0.0565, 0.0198, −0.0062, −0.0067]


c D. P. Mandic Advanced Signal Processing 37
Partial autocorrelation function: Motivation
Notice: the ACF of an AR(p) process is infinite in duration, but it can be described in terms of p nonzero functions of the autocorrelations.
Denote by akj the jth coefficient in an autoregressive representation of
order k, so that akk is the last coefficient. Then

ρj = ak1ρj−1 + · · · + ak(k−1)ρj−k+1 + akkρj−k,   j = 1, 2, . . . , k

leading to the Yule–Walker equations, which can be written as


    
[ 1      ρ1     ρ2     · · ·  ρk−1 ] [ ak1 ]   [ ρ1 ]
[ ρ1     1      ρ1     · · ·  ρk−2 ] [ ak2 ] = [ ρ2 ]
[ ⋮      ⋮      ⋮      ⋱      ⋮   ] [  ⋮  ]   [ ⋮  ]
[ ρk−1   ρk−2   ρk−3   · · ·  1    ] [ akk ]   [ ρk ]

The only difference from the standard Y-W equations is the use of the
symbols aki to denote the AR coefficient ai # k indicating the model order


c D. P. Mandic Advanced Signal Processing 38
Finding partial ACF coefficients
Solving these equations for k = 1, 2, . . . successively, we obtain

a11 = ρ1,    a22 = (ρ2 − ρ1²) / (1 − ρ1²),

        | 1    ρ1   ρ1 |     | 1    ρ1   ρ2 |
a33 =   | ρ1   1    ρ2 |  /  | ρ1   1    ρ1 | ,    etc.
        | ρ2   ρ1   ρ3 |     | ρ2   ρ1   1  |

◦ The quantity akk , regarded as a function of the model order k, is called


the partial autocorrelation function (PAC).

◦ For an AR(p) process, the PAC akk will be nonzero for k ≤ p and zero
for k > p ⇒ indicates the order of an AR(p) process.

In practice, we introduce a small threshold, as for real world data it is


difficult to guarantee that akk = 0 for k > p. (see your coursework)
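In practice the PAC is obtained by fitting AR(k) models of increasing order and keeping the last coefficient of each fit; a minimal sketch (with an assumed AR(2) test signal) is given below.

x = filter(1, [1 -1.2 0.8], randn(4096,1));   % assumed test signal: AR(2)
maxlag = 10; pac = zeros(maxlag,1);
for k = 1:maxlag
    a = aryule(x, k);                         % a = [1, -a_k1, ..., -a_kk]
    pac(k) = -a(end);                         % partial autocorrelation a_kk
end
stem(1:maxlag, pac), hold on
plot([1 maxlag],  1.96/sqrt(length(x))*[1 1], 'r--')  % 95% confidence bounds
plot([1 maxlag], -1.96/sqrt(length(x))*[1 1], 'r--')
xlabel('Model order k'), ylabel('Partial ACF a_{kk}')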


c D. P. Mandic Advanced Signal Processing 39
Example 13: Work by Yule # model of sunspot numbers
Recorded for > 300 years. To study them in 1927 Yule invented the AR(2) model

[Figure: the centred sunspot series and its ACF]

We first centre the data, as we do not wish to model the DC offset (deterministic component), but the stochastic component (AR model driven by white noise)!

Using the Y–W equations we obtain:

a1 = [0.9295]
a2 = [1.4740, −0.5857]
a3 = [1.5492, −0.7750, 0.1284]
a4 = [1.5167, −0.5788, −0.2638, 0.2532]
a5 = [1.4773, −0.5377, −0.1739, 0.0174, 0.1555]
a6 = [1.4373, −0.5422, −0.1291, 0.1558, −0.2248, 0.2574]


c D. P. Mandic Advanced Signal Processing 40
Example 13 (contd.): Model order for sunspot numbers
After k = 2 the partial correlation function (PAC) is very small, indicating p = 2
[Figure: the sunspot series, its ACF, its partial ACF with 95% confidence bounds, and the Burg PSD estimate]

The broken red lines denote the 95% confidence interval, which has the value ±1.96/√N and within which PAC ≈ 0


c D. P. Mandic Advanced Signal Processing 41
Example 14: Model order for an AR(3) process
An AR(3) process realisation, its ACF, and partial autocorrelation (PAC)

[Figure: a realisation of the AR(3) signal, its ACF, its partial ACF with confidence bounds, and the Burg PSD estimate]

After lag k = 3, the PAC becomes very small (broken line conf. int.)


c D. P. Mandic Advanced Signal Processing 42
Example 14: Model order selection for a financial time
series (the ’correct’ and ’time-reversed’ time series)
[Figure: the £/$ exchange rate 1970–2018 and its time-reversed version, together with their autocorrelation and autoconvolution functions]

Partial correlations:

AR(1): a = [0.9994]
AR(2): a = [.9994, −.0354]
AR(3): a = [.9994, −.0354, −.0024]
AR(4): a = [.9994, −.0354, −.0024, .0129]
AR(5): a = [.9994, −.0354, −.0024, .0129, −.0129]
AR(6): a = [.9994, −.0354, −.0024, .0129, −.0129, −.0172]


c D. P. Mandic Advanced Signal Processing 43
AR model based prediction: Importance of model order
For a zero mean process x[n], the best linear predictor, in the mean
square error sense, of x[n] based on x[n − 1], x[n − 2], . . . is
x̂[n] = ak−1,1 x[n − 1] + ak−1,2 x[n − 2] + · · · + ak−1,k−1 x[n − k + 1]

(apply the E{·} operator to the general AR(p) model expression, and recall that E{w[n]} = 0)
(Hint: x̂[n] = E{ak−1,1 x[n − 1] + · · · + ak−1,k−1 x[n − k + 1] + w[n]} = ak−1,1 x[n − 1] + · · · + ak−1,k−1 x[n − k + 1])

This holds whether the process is an AR process or not.


In MATLAB, check the function:
ARYULE
and functions
PYULEAR, ARMCOV, ARBURG, ARCOV, LPC, PRONY
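A minimal prediction sketch, assuming x holds the data (e.g. the sunspot series) and an assumed order p = 2; aryule estimates the coefficients and filter forms the one-step-ahead prediction.

p = 2;                                    % assumed model order
xc = x - mean(x);                         % centre the data (remove the DC offset)
[a, e] = aryule(xc, p);                   % a = [1, -a1, ..., -ap], e = error power
xhat = filter([0 -a(2:end)], 1, xc);      % xhat[n] = a1*xc[n-1] + ... + ap*xc[n-p]
mse = mean((xc(p+1:end) - xhat(p+1:end)).^2)   % compare with e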


c D. P. Mandic Advanced Signal Processing 44
Example 16: Under– vs Over–fitting a model #
Estimation of the parameters of an AR(2) process
Original AR(2) process x[n] = −0.2x[n − 1] − 0.9x[n − 2] + w[n],
w[n] ∼ N (0, 1), estimated using AR(1), AR(2) and AR(20) models:

[Figure: a segment of the original AR(2) signal overlaid with the AR(1), AR(2) and AR(20) estimates (errors 5.2627, 1.0421 and 1.0621 respectively), and the estimated AR coefficients plotted against coefficient index]

The higher order coefficients of the AR(20) model are close to zero and
therefore do not contribute significantly to the estimate, while the AR(1)
does not have sufficient degrees of freedom. (see also Appendix 3)


c D. P. Mandic Advanced Signal Processing 45
Effects of over-modelling on autoregressive spectra:
Spectral line splitting
Consider an AR(2) signal x(n) = −0.9x(n − 2) + w(n) with w ∼ N (0, 1).
N = 64 data samples, model orders p = 4 (solid blue) and p = 12 (broken green). (AR 2 Highpass Circularity.m)

[Figure: the two AR spectral estimates; the p = 12 estimate splits the single peak into two]
Notice that this is an AR(2) model!

Although the true spectrum has a single spectral peak at ω = π/2 (blue), when over-modelling using p = 12 this peak is split into two peaks (green).


c D. P. Mandic Advanced Signal Processing 46
Model order selection # practical issues
In practice: the greater the model order the greater accuracy & complexity
Q: When do we stop? What is the optimal model order?
Solution: To establish a trade–off between computational complexity and
model accuracy, we introduce “penalty” for a high model order. Such
criteria for model order selection are:
MDL: The minimum description length criterion (MDL) (by Rissanen),
AIC: The Akaike information criterion (AIC)
 
MDL:  popt = min_p [ log(E) + p log(N)/N ]
AIC:  popt = min_p [ log(E) + 2p/N ]

E the loss function (typically cumulative squared error),


p the number of estimated parameters (model order),
N the number of available data points.
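A minimal sketch of these criteria, in which E is taken as the residual (prediction error) power returned by aryule, an assumption in place of the cumulative squared error; the test signal is the AR(2) process of Example 17.

x = filter(1, [1 -0.5 0.3], randn(2048,1));   % AR(2) with a1 = 0.5, a2 = -0.3
N = length(x); pmax = 10;
MDL = zeros(pmax,1); AIC = zeros(pmax,1);
for p = 1:pmax
    [~, E] = aryule(x, p);                    % E: residual error power
    MDL(p) = log(E) + p*log(N)/N;
    AIC(p) = log(E) + 2*p/N;
end
[~, p_mdl] = min(MDL); [~, p_aic] = min(AIC); % both expected to pick p = 2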


c D. P. Mandic Advanced Signal Processing 47
Example 17: Model order selection # MDL vs AIC
MDL and AIC criteria for an AR(2) model with a1 = 0.5 a2 = −0.3

[Figure: MDL (top) and AIC (bottom) curves for the AR(2) model, plotted against candidate model order p = 1, . . . , 10]

The graphs show the cumulative squared (prediction) error plus the penalty (vertical axis) versus the model order p (horizontal axis). Notice that popt = 2.

The curves are convex: a monotonically decreasing squared error is balanced by an increasing penalty term (the MDL or AIC correction). Hence, we have a unique minimum at p = 2, reflecting the correct model order (no over-modelling).

c D. P. Mandic Advanced Signal Processing 48
Moving average processes, MA(q)
A general MA(q) process is given by
x[n] = w[n] + b1w[n − 1] + · · · + bq w[n − q]

Autocorrelation function: the autocovariance function of MA(q) is

ck = E{(w[n] + b1w[n − 1] + · · · + bq w[n − q])(w[n − k] + b1w[n − k − 1] + · · · + bq w[n − k − q])}

Hence the variance of the process is

c0 = (1 + b1² + · · · + bq²) σw²

The ACF of an MA process has a cutoff after lag q.

Spectrum: all–zero transfer function ⇒ struggles to model 'peaky' PSDs

P(f) = 2σw² |1 + b1e^{−j2πf} + b2e^{−j4πf} + · · · + bq e^{−j2πqf}|²
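The lag-q cutoff and the variance formula can be verified with a short sketch; the MA(3) coefficients below are hypothetical, chosen only for illustration.

b = [1 0.8 0.5 0.3];                % hypothetical b0...b3
x = filter(b, 1, randn(10000,1));   % MA(3): x[n] = w[n] + b1*w[n-1] + ...
r = xcorr(x, 10, 'coeff');          % normalised ACF, lags -10...10
rho = r(11:end)                     % approx zero beyond lag q = 3
c0_theory = sum(b.^2)               % variance for sigma_w^2 = 1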


c D. P. Mandic Advanced Signal Processing 49
Example 18: Third order moving average MA(3) process
An MA(3) process and its autocorrel. (ACF) and partial autocorrel. (PAC) fns.
[Figure: a realisation of the MA(3) signal, its ACF (cutting off after lag 3), its partial ACF, and the Burg PSD estimate]

After lag k = 3, the PAC becomes very small (broken line conf. int.)


c D. P. Mandic Advanced Signal Processing 50
Analysis of nonstationary signals
[Figure: a speech signal with three analysis windows W1, W2 and W3, the partial ACF of each window, and the MDL curve for each window (calculated model orders: 13 for W1, greater than 50 for W2, and 24 for W3)]

◦ Consider a real–world speech signal, and three different segments with different statistical properties
◦ Different AR model orders are required for different segments of speech → an opportunity for content analysis!
◦ To deal with nonstationarity we need short sliding data windows

c D. P. Mandic Advanced Signal Processing 51
Summary: AR and MA Processes
i) A stationary finite AR(p) process can be represented as an infinite order
MA process. A finite MA process can be represented as an infinite AR
process.

ii) The finite MA(q) process has an ACF that is zero beyond lag q. For an AR process, the ACF is infinite in length and consists of a mixture of damped exponentials and/or damped sine waves.

iii) Finite MA processes are always stable, and there is no requirement on the coefficients of MA processes for stationarity. However, for invertibility, the roots of the characteristic equation must lie inside the unit circle.

iv) AR processes produce spectra with sharp peaks (two poles of A(z) per
peak), whereas MA processes cannot produce peaky spectra.

ARMA modelling is a classic technique which has found a


tremendous number of applications


c D. P. Mandic Advanced Signal Processing 52
Summary: Wold’s Decomposition Theorem and ARMA
◦ Every stationary time series can be represented as a sum of a perfectly
predictable process and a feasible moving average process

◦ Two time series with the same Wold representations are the same, as
the Wold representation is unique

◦ Since any MA process also has an ARMA representation, working with


ARMA models is not an arbitrary choice but is physically justified

◦ The causality and stationarity on ARMA processes depend entirely on


the AR parameters and not on the MA parameters

◦ An MA process is not uniquely determined by its ACF

◦ An AR(p) process is always invertible, even if it is not stationary

◦ An MA(q) process is always stationary, even if it is non-invertible


c D. P. Mandic Advanced Signal Processing 53
Recap: Linear systems
[Block diagram: known input x(k), X(z) → system h(k) (unknown/known) → output y(k), Y(z), with the transfer function H(z) = Y(z)/X(z)]
Described by their impulse response h(n) or the transfer function H(z)

In the frequency domain (remember that z = e^{jθ}) the transfer function is

H(θ) = Σ_{n=−∞}^{∞} h(n)e^{−jnθ},        {x[n]} → {h(n)}, H(θ) → {y[n]}

that is,   y[n] = Σ_{r=−∞}^{∞} h(r)x[n − r] = h ∗ x
The next two slides show how to calculate the power of the output, y(k).


c D. P. Mandic Advanced Signal Processing 54
Recap: Linear systems – statistical properties # mean
and variance
i) Mean

E{y[n]} = E{ Σ_{r=−∞}^{∞} h(r)x[n − r] } = Σ_{r=−∞}^{∞} h(r)E{x[n − r]}
⇒ μy = μx Σ_{r=−∞}^{∞} h(r) = μx H(0)

[NB: H(θ) = Σ_{r=−∞}^{∞} h(r)e^{−jrθ}. For θ = 0, H(0) = Σ_{r=−∞}^{∞} h(r)]

ii) Cross–correlation

ryx(m) = E{y[n]x[n + m]} = Σ_{r=−∞}^{∞} h(r)E{x[n − r]x[n + m]}
       = Σ_{r=−∞}^{∞} h(r)rxx(m + r)      (a convolution of the input ACF and {h})

⇒ Cross-power spectrum Syx(f) = F(ryx) = Sxx(f)H(f)


c D. P. Mandic Advanced Signal Processing 55
Recap: Lin. systems – statistical properties # output
These are key properties # used in AR spectrum estimation
From rxy(m) = ryx(−m) we have rxy(m) = Σ_{r=−∞}^{∞} h(r)rxx(m − r). Now we write

ryy(m) = E{y[n]y[n + m]} = Σ_{r=−∞}^{∞} h(r)E{x[n − r]y[n + m]}
       = Σ_{r=−∞}^{∞} h(r)rxy(m + r) = Σ_{r=−∞}^{∞} h(−r)rxy(m − r)

By taking Fourier transforms we have

Sxy(f) = Sxx(f)H(f)
Syy(f) = Sxy(f)H(−f)      (a function of rxx)

or

Syy(f) = H(f)H(−f)Sxx(f) = |H(f)|²Sxx(f)

Output power spectrum = input power spectrum × magnitude-squared transfer function


c D. P. Mandic Advanced Signal Processing 56
More on Wold Decomposition (Representation) Theorem
Example: A “paradox”, can we talk about a deterministic random process

Consider a stochastic process given by

x[n] = A cos[n] + B sin[n]

where A, B ∼ N(0, σ²) and A is independent of B (A and B are independent normal random variables).
This process is deterministic because it can be written as

x[n] = (sin(2)/sin(1)) x[n − 1] − x[n − 2]

that is, based on the history of x[n]. Therefore

p(x[n] | x[n − 1], x[n − 2], . . .) = (sin(2)/sin(1)) x[n − 1] − x[n − 2] = x[n]

Remember: Deterministic does not mean that x[n] is non-random
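A quick numerical confirmation of this recursion (one realisation of the random amplitudes, assumed unit variance) is sketched below; the residual is at the level of machine precision.

A = randn; B = randn;                 % one realisation of the random amplitudes
n = 0:100;
x = A*cos(n) + B*sin(n);
err = x(3:end) - ( sin(2)/sin(1)*x(2:end-1) - x(1:end-2) );
max(abs(err))                         % of the order of eps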


c D. P. Mandic Advanced Signal Processing 57
Appendix 1: More on sunspot numbers (recorded since 1874)
Top: original sunspots Middle and Bottom: AR(2) representations

Left: time Middle:spectrum Right: autocorrelation


[Figure: three rows (the original NASA sunspot series, and two different Gaussian processes filtered through the 2nd-order Yule–Walker model); the columns show the time series, the Burg PSD estimate, and the partial ACF]
Top: original Middle: first AR(2) model Bottom: second AR(2) model

c D. P. Mandic Advanced Signal Processing 58
Appendix 2: Model order for an AR(2) process
An AR(2) signal, its ACF, and its partial autocorrelations (PAC)

[Figure: a realisation of the AR(2) signal, its ACF, its partial ACF with confidence bounds, and the Burg PSD estimate]

After lag k = 2, the PAC becomes very small (broken line conf. int.)


c D. P. Mandic Advanced Signal Processing 59
Appendix 3: More on over–parametrisation
Consider the linear stochastic process given by
x[n] = x[n − 1] − 0.16x[n − 2] + w[n] − 0.8w[n − 1]
It clearly has an ARMA(2,1) form. Consider now its coefficient vectors
written as polynomials in the z–domain:
a(z) = 1 − z + 0.16z² = (1 − 0.8z)(1 − 0.2z)
b(z) = 1 − 0.8z
These polynomials clearly have a common factor (1 − 0.8z), and therefore
after cancelling these terms, we have the resulting lower–order polynomials:
a(z) = 1 − 0.2z
b(z) = 1
The above process is therefore an AR(1) process, given by
x[n] = 0.2x[n − 1] + w[n]
and the original ARMA(2,1) version was over–parametrised.
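The cancellation can be exposed numerically with roots (using the usual MATLAB coefficient-vector convention for B(z)/A(z)):

a = [1 -1 0.16];     % denominator 1 - z^-1 + 0.16 z^-2
b = [1 -0.8];        % numerator   1 - 0.8 z^-1
roots(a)             % poles at 0.8 and 0.2
roots(b)             % zero at 0.8 -> the common factor cancels, leaving AR(1)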


c D. P. Mandic Advanced Signal Processing 60
Something to think about ...
◦ What would be the properties of a multivariate (MV) autoregressive,
say MVAR(1), process, where the quantities w, x, a now become
matrices, that is

X(n) = AX(n − 1) + W(n)

◦ Would the inverse of the multichannel correlation matrix depend on


’how similar’ the data channels are? Explain also in terms of eigenvalues
and ’collinearity’.

◦ Threshold autoregressive (TAR) models allow for the mean of a time


series to change along the blocks of data. What would be the
advantages of such a model?

◦ Try to express an AR(p) process as a state-space model. What kind of


the transition matrix between the states do you observe?


c D. P. Mandic Advanced Signal Processing 61
Consider also: Fourier transform as a filtering operation
We can see FT as a convolution of a complex exponential and the data (under a
mild assumption of a one-sided h sequence, ranging from 0 to ∞)
1) Continuous FT. For a continuous FT,   F(ω) = ∫_{−∞}^{∞} x(t)e^{−jωt}dt.

Let us now swap the variable t → τ and multiply by e^{jωt}, to give

e^{jωt} ∫ x(τ)e^{−jωτ}dτ = ∫ x(τ) e^{jω(t−τ)} dτ = x(t) ∗ e^{jωt}   (= x(t) ∗ h(t), with h(t − τ) = e^{jω(t−τ)})

2) Discrete Fourier transform. For the DFT, we have a filtering operation

X(k) = Σ_{n=0}^{N−1} x(n)e^{−j(2π/N)nk} = x(0) + W x(1) + W² x(2) + · · ·   [W = e^{−j(2π/N)k}]
       (a cumulative add and multiply)

with the transfer function (for large N)   H(z) = 1 / (1 − z^{−1}W) = (1 − z^{−1}W*) / (1 − 2 cos(θk) z^{−1} + z^{−2})

[Block diagram: the continuous-time case, x(t) multiplied by e^{jωt}; and the discrete-time case, x[n] fed to an adder whose output is delayed by z^{−1}, multiplied by W and fed back, producing the DFT]


c D. P. Mandic Advanced Signal Processing 62
Notes


c D. P. Mandic Advanced Signal Processing 63
Notes


c D. P. Mandic Advanced Signal Processing 64
Notes


c D. P. Mandic Advanced Signal Processing 65
Notes


c D. P. Mandic Advanced Signal Processing 66
