Advanced Signal Processing Linear Stochastic Processes
Aims of this lecture
◦ To introduce linear stochastic models for real world data
Example 1: Assessing the nature of a signal from its ACF
Windowed clean signal, signal in WGN, signal with DC offset (see also Lecture 1)
[Figure: top row – the windowed clean signal, the signal in WGN, and the signal with a DC offset; bottom row – their ACFs over lags −10 to 10]
How can we categorise real–world measurements?
Where would you place a DC level in WGN, $x[n] = A + w[n]$, $w \sim \mathcal{N}(0, \sigma_w^2)$?
(a) Noisy oscillations, (b) Nonlinearity and noisy oscillations, (c) Random nonlinear process
(? left) Route to chaos, (? top) stochastic chaos, (? middle) mixture of sources
[Diagram: a plane with horizontal axis Determinism → Stochasticity and vertical axis Linearity → Nonlinearity; ARMA sits in the linear–stochastic corner, NARMA above it, Chaos in the nonlinear–deterministic corner, with several "?" regions in between]
Our lecture is about ARMA models (linear stochastic)
How about observing the signal through a nonlinear sensor?
Justification → Wold decomposition theorem
(an existence theorem, also mentioned in your coursework)
Wold’s decomposition theorem plays a central role in time series analysis,
and explicitly proves that any covariance–stationary time series can be
decomposed into two different parts: deterministic (such as a sinewave)
and stochastic (filtered WGN).
Therefore, a general process can be written as a sum of two processes
$$x[n] = x_p[n] + x_r[n] = x_p[n] + \sum_{j=1}^{q} b_j\, w[n-j], \qquad w \;\text{a white process}$$
⇒ $x_r[n]$ → regular random process
⇒ $x_p[n]$ → predictable process, with $x_r[n] \perp x_p[n]$, that is, $E\{x_r[m]\, x_p[n]\} = 0$
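As a rough numerical illustration of the two components (a sketch, not from the slides; the sinewave frequency and the MA coefficients b are arbitrary assumptions):

n  = (0:999)';
xp = sin(2*pi*0.05*n);     % predictable (deterministic) part
w  = randn(1000,1);        % white Gaussian driving noise
b  = [1 0.5 0.25];         % assumed MA coefficients b_j
xr = filter(b, 1, w);      % regular part: filtered WGN
x  = xp + xr;              % the observed process
mean(xr .* xp)             % ~0: the two parts are uncorrelated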
Towards linear stochastic processes
Wold’s theorem implies that any purely non-deterministic covariance–stationary
process can be arbitrarily well approximated by an ARMA process
Therefore, the general form for the power spectrum of a WSS process is
$$P_x(e^{j\omega}) = \sum_{k=1}^{N} \alpha_k\, \delta(\omega - \omega_k) + P_{x_r}(e^{j\omega})$$
We are interested in processes generated by filtering white noise with a
linear shift–invariant filter that has a rational system function.
This class of digital filters includes the following system functions:
• Autoregressive (AR) → all pole system → H(z) = 1/A(z)
• Moving Average (MA) → all zero system → H(z) = B(z)
• Autoregressive Moving Average (ARMA) → poles and zeros
→ H(z) = B(z)/A(z)
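A minimal Matlab sketch of the three classes, each obtained by filtering the same white noise (the coefficient values below are illustrative assumptions, not from the lecture):

w = randn(1000,1);                   % common WGN input
a = [1 -0.5 0.3]; b = [1 0.8 0.4];   % assumed A(z) and B(z) coefficients
xAR   = filter(1, a, w);             % AR:   H(z) = 1/A(z)   (all-pole)
xMA   = filter(b, 1, w);             % MA:   H(z) = B(z)     (all-zero)
xARMA = filter(b, a, w);             % ARMA: H(z) = B(z)/A(z)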
Recap: Second-order all–pole transfer functions
p1 = 0.999exp(jπ/4), p2 = 0.9exp(jπ/4), p3 = 0.9exp(j7π/12)
[Figure: pole–zero plot showing p₁, p₂, p₃ and their conjugates; magnitude responses – the closer a pole is to the unit circle, the sharper its resonant peak]

We have two complex conjugate poles, e.g. $p_1$ and $p_1^*$, therefore
$$H(z) = \frac{1}{(z - p_1)(z - p_1^*)} = \frac{z^{-2}}{(1 - p_1 z^{-1})(1 - p_1^* z^{-1})}$$
For the sinewave, $\rho = 1 \;\Rightarrow\; H(z) = \dfrac{1}{1 - 2\cos(\theta)\, z^{-1} + z^{-2}} = \dfrac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}$
⇒ Indeed, a sinewave can be modelled as an autoregressive process
Example 2: Sinewave revisited, is it det. or stoch.?
Is a sinewave best described as nonlinear deterministic or linear stochastic?
[Figure: pole–zero plot of the AR(2) filter; the white noise input; the filtered (near-sinusoidal) output]

Matlab code:

z1 = 0;                           % zero at the origin
p1 = [0.5+0.866i; 0.5-0.866i];    % conjugate pole pair close to the unit circle
[num1, den1] = zp2tf(z1, p1, 1);  % zero-pole description -> transfer function
zplane(num1, den1);               % pole-zero plot
s  = randn(1, 1000);              % white noise
s1 = filter(num1, den1, s);       % filtered noise
figure;
subplot(311), plot(s)
subplot(312), zplane(num1, den1)
subplot(313), plot(s1)

The AR model of a sinewave:
x(k) = a1*x(k-1) + a2*x(k-2) + w(k), with a1 = -1, a2 = 0.98, w ~ N(0,1)
(here a1, a2 are the denominator coefficients [1, a1, a2], cf. the previous slide)
Example 3: Spectra of real–world data

Sunspot numbers and their power spectrum:

[Figure: the sunspot series (signal values vs sample number); its ACF and partial ACF vs correlation lag; the Burg power spectral density estimate (dB/rad/sample) vs normalized frequency (×π rad/sample)]

$$P_x = |H(\omega)|^2\, P_w$$
Spectrum of ARMA models (look also at Recap slides)
recall that two conjugate complex poles of A(z) give one peak in the spectrum
$$P_x(z) = \sigma_w^2\, \frac{B_q(z)\, B_q(z^{-1})}{A_p(z)\, A_p(z^{-1})} \;\Rightarrow\; P_x(e^{j\theta}) = \sigma_w^2\, \frac{|B_q(e^{j\theta})|^2}{|A_p(e^{j\theta})|^2} = \sigma_w^2\, \frac{|B_q(\omega)|^2}{|A_p(\omega)|^2}$$

Notice that "(·)*" in analogue frequency corresponds to "z⁻¹" in digital frequency.
Example 4: Can the shape of power spectrum tell us
about the order of the polynomials B(z) and A(z)?
Plot the power spectrum of an ARMA(2,2) process for which
◦ the zeros of H(z) are z = 0.95e±π/2
Solution: The system function is (poles and zeros – resonance & sink)
1 + 0.9025z −2
H(z) =
1 − 0.5562z −1 + 0.81z −2
7
Power Spectrum [dB]
−1
0 0.5 1 1.5 2 2.5 3 3.5
Frequency
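This spectrum can be reproduced with freqz (a sketch; the plotting details are an assumption):

b = [1 0 0.9025];               % zeros at 0.95exp(±jπ/2)
a = [1 -0.5562 0.81];           % poles at 0.9exp(±j2π/5)
[H, om] = freqz(b, a, 512);     % frequency response on [0, π]
plot(om, 10*log10(abs(H).^2));  % power spectrum in dB
xlabel('Frequency (rad/sample)'), ylabel('Power Spectrum [dB]')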
Difference equation representation → the ACF follows the data model!
Random processes x[n] and w[n] are related by a linear difference equation
with constant coefficients, given by
$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{k=0}^{q} b_k z^{-k}}{1 - \sum_{k=1}^{p} a_k z^{-k}} \;\leftrightarrow\; \text{ARMA}(p,q) \;\leftrightarrow\; x[n] = \underbrace{\sum_{l=1}^{p} a_l\, x[n-l]}_{\text{autoregressive}} + \underbrace{\sum_{l=0}^{q} b_l\, w[n-l]}_{\text{moving average}}$$
Since x is WSS, it follows that x[n] and w[n] are jointly WSS
General linear processes: Stationarity and invertibility
Can we tell anything about the process x from the coefficients a, b (cf. h in FIR)?
Autoregressive processes (pole–only)
Example 5: Statistical properties of AR processes
Drive the AR(4) model from Example 6 with two different WGN realisations ∼ N (0, 1)
[Figure: two independent WGN realisations (first column) drive the same AR(4) model; shown are the resulting AR signals, their ACFs, and PSD estimates – the noise realisations differ, but the ACF and PSD shapes are consistent]
ACF and normalised ACF of AR processes
Key: ACF has the same form as the AR process in hand!
To obtain the autocorrelation function of an AR process, multiply the AR(p) difference equation by x[n − k] and take expectations, to obtain (recall that r(−m) = r(m))
$$r_{xx}(0) = a_1 r_{xx}(1) + a_2 r_{xx}(2) + \cdots + a_p r_{xx}(p) + \sigma_w^2, \qquad k = 0$$
$$r_{xx}(k) = a_1 r_{xx}(k-1) + a_2 r_{xx}(k-2) + \cdots + a_p r_{xx}(k-p), \qquad k > 0$$
Variance and spectrum of AR processes
Variance: for k = 0, the contribution from the term $E\{x[n-k]\, w[n]\}$ is $\sigma_w^2$, and
$$r_{xx}(0) = a_1 r_{xx}(-1) + a_2 r_{xx}(-2) + \cdots + a_p r_{xx}(-p) + \sigma_w^2$$

Power spectrum:
$$P_{xx}(f) = \frac{2\sigma_w^2}{\left|1 - a_1 e^{-j2\pi f} - \cdots - a_p e^{-j2\pi p f}\right|^2}, \qquad 0 \le f \le 1/2$$
Example 6a: AR(p) signal generation
Consider an AR(4) process with coeff. a = [2.2137, −2.9403, 2.1697, −0.9606]
◦ Generate x by filtering white noise (1024 points) through the AR filter
◦ Estimate the PSD of x based on a fourth-order AR model

[Figure: the resulting PSD estimate, Power/frequency (dB/rad/sample)]
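A sketch of both steps, using pyulear from the Signal Processing Toolbox:

a = [1 -2.2137 2.9403 -2.1697 0.9606];  % AR(4) denominator A(z)
w = randn(1024,1);                      % 1024 points of unit-variance WGN
x = filter(1, a, w);                    % generate the AR(4) signal
pyulear(x, 4)                           % PSD estimate via a 4th-order AR model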
Example 6b: Alternative AR power spectrum calculation
(an alternative function in Matlab)
Consider the AR(4) system given by
$$x[n] = 2.2137\,x[n-1] - 2.9403\,x[n-2] + 2.1697\,x[n-3] - 0.9606\,x[n-4] + w[n]$$
a = [1 -2.2137 2.9403 -2.1697 0.9606]; % AR filter coefficients
freqz(1,a) % AR filter frequency response
title(’AR System Frequency Response’)
[Figure: "AR System Frequency Response" – magnitude (dB) and phase (degrees) vs normalized frequency (×π rad/sample)]
Key: Finding AR coefficients → the Yule–Walker equations
(there are several similar forms – we follow the most concise one)
The ACF matrix Rxx is positive definite and Toeplitz, which guarantees that it can be inverted.
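A sketch of the corresponding computation, assuming a zero-mean signal x in the workspace and the concise form $R_{xx}\,\mathbf{a} = \mathbf{r}$ (the biased ACF estimate keeps $R_{xx}$ well conditioned):

p = 4;                       % assumed model order
r = xcorr(x, p, 'biased');   % ACF estimates, lags -p..p
r = r(p+1:end);              % keep lags 0..p
Rxx  = toeplitz(r(1:p));     % p x p Toeplitz ACF matrix
aHat = Rxx \ r(2:p+1)        % Yule-Walker estimates of a1..ap
aYW  = aryule(x, p)          % toolbox cross-check: returns [1, -a1, ..., -ap]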
Example 7: Find the parameters of an AR(2) process x[n],
generated by x[n] = 1.2x[n − 1] − 0.8x[n − 2] + w[n]
Coursework: comment on the shape of the ACF for large lags
[Figure: the AR(2) signal x = filter([1], [1, −1.2, 0.8], w); its ACF over lags ±400; the same ACF zoomed over lags ±20]
Example 8: Advantages of model-based analysis
Consider the PSD’s for different realisations of the AR(4) process from Example 5
[Figure: empirical PSDs of different realisations (thin black) overlaid with the theoretical model PSD (thick red), in dB]
◦ The different realisations lead to different Empirical PSD’s (in thin black)
◦ The theoretical PSD from the model is consistent regardless of the data (in thick red)
N = 1024;
w = wgn(N,1,1);
a = [2.2137, -2.9403, 2.1697, -0.9606]; % Coefficients of AR(4) process
a = [1 -a];
x = filter(1,a,w);
xacf = xcorr(x); % Autocorrelation of AR(4) process
dft = fft(xacf);
EmpPSD = abs(dft/length(dft)).^ 2; % Empirical PSD obtained from data
ThePSD = abs(freqz(1,a,N,1)).^ 2 ; % Theoretical PSD obtained from model
Normal equations for the autocorrelation coefficients
Dividing the Yule–Walker equations by $r_{xx}(0)$ and using $\rho_k = r_{xx}(k)/r_{xx}(0)$, we have
$$\rho_1 = a_1 + a_2\rho_1 + \cdots + a_p\rho_{p-1}$$
$$\rho_2 = a_1\rho_1 + a_2 + \cdots + a_p\rho_{p-2}$$
$$\vdots$$
$$\rho_p = a_1\rho_{p-1} + a_2\rho_{p-2} + \cdots + a_p$$
Yule–Walker modelling in Matlab
In Matlab – Power spectral density using Y–W method pyulear
Pxx = pyulear(x,p)
[Pxx,w] = pyulear(x,p,nfft)
[Pxx,f] = pyulear(x,p,nfft,fs)
[Pxx,f] = pyulear(x,p,nfft,fs,’range’)
[Pxx,w] = pyulear(x,p,nfft,’range’)
Description: Pxx = pyulear(x,p) returns the power spectral density estimate of the signal x, obtained by fitting a pth-order autoregressive model to x via the Yule–Walker method.
Stochastic modelling: From data to an ARMA(p, q) model
So far, we have assumed the model (AR, MA, or ARMA) and analysed the
ACF and PSD based on known model coefficients.
In practice: DATA → MODEL
Example 9: Sunspot number estimation
(the ACF is consistent with the properties of a second-order AR process)
[Figure: the sunspot time series, 1700–1900s (left); its ACF over delays of ±100 years (middle); the ACF zoomed over delays of ±10 years (right)]
Special case #1: AR(1) process (Markov)
For Markov processes, instead of the iid condition, we have the first-order conditional dependence, that is
$$p(x[n] \mid x[n-1], x[n-2], \ldots, x[0]) = p(x[n] \mid x[n-1])$$
For an AR(1) process, the normalised ACF is
$$\rho_k = a_1^k, \qquad k > 0$$
Notice the difference in the behaviour of the ACF for a1 positive and negative
Variance and power spectrum of AR(1) process
Both can be calculated directly from the general expression for the
variance and spectrum of AR(p) processes.
$$\sigma_x^2 = \frac{\sigma_w^2}{1 - \rho_1 a_1} = \frac{\sigma_w^2}{1 - a_1^2}$$
$$P_{xx}(f) = \frac{2\sigma_w^2}{\left|1 - a_1 e^{-j2\pi f}\right|^2} = \frac{2\sigma_w^2}{1 + a_1^2 - 2a_1\cos(2\pi f)}$$
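A quick numerical check of the variance formula (sketch; a₁ and the sample size are arbitrary choices):

a1 = 0.8; N = 1e5;
w  = randn(N,1);             % unit-variance WGN
x  = filter(1, [1 -a1], w);  % AR(1): x[n] = a1*x[n-1] + w[n]
var(x)                       % empirical variance
1/(1 - a1^2)                 % theoretical sigma_w^2/(1 - a1^2), sigma_w^2 = 1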
Example 10: ACF and spectrum of AR(1) for a = ±0.8
a < 0 → High Pass a > 0 → Low Pass
x[n] = −0.8*x[n−1] + w[n] x[n] = 0.8*x[n−1] + w[n]
[Figure: for a₁ = −0.8 (left column) and a₁ = +0.8 (right column): the signal realisations; the ACFs – alternating in sign for a₁ < 0, smoothly decaying for a₁ > 0; and the Burg PSD estimates – high-pass and low-pass, respectively]
Special case #2: Second order autoregressive processes,
p = 2, q = 0, hence the notation AR(2)
The input–output functional relationship is given by (w[n] ∼ any white noise)
$$x[n] = a_1 x[n-1] + a_2 x[n-2] + w[n]$$
$$X(z) = \left(a_1 z^{-1} + a_2 z^{-2}\right) X(z) + W(z)$$
$$\Rightarrow\; H(z) = \frac{X(z)}{W(z)} = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2}}$$
$$H(\omega) = H(e^{j\omega}) = \frac{1}{1 - a_1 e^{-j\omega} - a_2 e^{-j2\omega}}$$
Yule–Walker equations for p = 2:
$$\rho_1 = a_1 + a_2\rho_1, \qquad \rho_2 = a_1\rho_1 + a_2$$
Connecting the a's and the ρ's:
$$\rho_1 = \frac{a_1}{1 - a_2}, \qquad \rho_2 = a_2 + \frac{a_1^2}{1 - a_2}$$
When solved for a₁ and a₂, we have
$$a_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}, \qquad a_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$
Since |ρ₁| < ρ₀ = 1 → a stability condition on a₁ and a₂.
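A sketch verifying the a ↔ ρ mapping numerically (the coefficient values are those of Example 7):

a1 = 1.2; a2 = -0.8;
rho1 = a1/(1 - a2);                   % from the Yule-Walker equations
rho2 = a2 + a1^2/(1 - a2);
a1rec = rho1*(1 - rho2)/(1 - rho1^2)  % recovers a1 = 1.2
a2rec = (rho2 - rho1^2)/(1 - rho1^2)  % recovers a2 = -0.8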
Variance and power spectrum
Both readily obtained from the general AR(2) process equation!
Variance:
$$\sigma_x^2 = \frac{\sigma_w^2}{1 - \rho_1 a_1 - \rho_2 a_2} = \left(\frac{1 - a_2}{1 + a_2}\right) \frac{\sigma_w^2}{(1 - a_2)^2 - a_1^2}$$

Power spectrum:
$$P_{xx}(f) = \frac{2\sigma_w^2}{\left|1 - a_1 e^{-j2\pi f} - a_2 e^{-j4\pi f}\right|^2} = \frac{2\sigma_w^2}{1 + a_1^2 + a_2^2 - 2a_1(1 - a_2)\cos(2\pi f) - 2a_2\cos(4\pi f)}, \qquad 0 \le f \le 1/2$$
Stability triangle
[Diagram: the stability triangle in the (a₁, a₂) plane, with vertices at (−2, −1), (2, −1) and (0, 1); regions I and II (real characteristic roots) lie above the parabola a₁² + 4a₂ = 0, regions III and IV (complex roots) below it, and a sketch of the typical ACF shape is shown in each region]
Example 11: Stability triangle and ACFs of AR(2) signals
Left: a = [−0.7, 0.2] (region 2) Right: a = [1.474, −0.586] (region 4)
[Figure: the signal realisations (top row) and their ACFs (bottom row) for the two parameter sets – the region-2 ACF alternates in sign, while the region-4 ACF is a damped oscillation]
Determining regions in the stability triangle
let us examine the autocorrelation function of AR(2) processes
The ACF satisfies
$$\rho_k = a_1\rho_{k-1} + a_2\rho_{k-2}, \qquad k > 0$$
Example 12: AR(2) where a₁ > 0, a₂ < 0 → Region 4
Consider: x[n] = 0.75x[n − 1] − 0.5x[n − 2] + w[n]
[Figure: the ACF of x[n] over lags 0–50 (top) – a damped oscillation; the Yule–Walker PSD estimate (bottom), with a peak at the resonant frequency]

The ACF oscillates at the normalised frequency (here $\cos\theta = a_1/(2\sqrt{-a_2}) = 0.5303$)
$$f_0 = \frac{\cos^{-1}(0.5303)}{2\pi} = \frac{1}{6.2}$$
The fundamental period of the autocorrelation function is therefore T₀ = 6.2.
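The pseudo-period can be cross-checked from the pole angle (sketch):

p  = roots([1 -0.75 0.5]);  % poles of 1 - 0.75z^-1 + 0.5z^-2
f0 = angle(p(1))/(2*pi);    % normalised frequency of the ACF oscillation
T0 = 1/abs(f0)              % approx 6.2 samples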
Model order selection: Partial autocorrelation function
Consider an earlier example using a slightly different notation for AR coefficients
[Figure: as in Example 7 – the AR(2) signal x = filter([1], [1, −1.2, 0.8], w), its ACF over lags ±400, and the ACF zoomed over lags ±20]
Partial autocorrelation function: Motivation
Notice: the ACF of an AR(p) process is infinite in duration, but it can be described in terms of p nonzero functions of the autocorrelations.
Denote by $a_{kj}$ the jth coefficient in an autoregressive representation of order k, so that $a_{kk}$ is the last coefficient. Then
$$\rho_j = a_{k1}\rho_{j-1} + \cdots + a_{kk}\rho_{j-k}, \qquad j = 1, 2, \ldots, k$$
The only difference from the standard Y–W equations is the use of the symbol $a_{kj}$ to denote the AR coefficient $a_j$, with k indicating the model order.
Finding partial ACF coefficients
Solving these equations for k = 1, 2, 3, … successively, we obtain
$$a_{11} = \rho_1, \qquad a_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}, \qquad a_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}, \qquad \text{etc.}$$
◦ For an AR(p) process, the partial autocorrelation $a_{kk}$ is nonzero for k ≤ p and zero for k > p ⇒ the cut-off lag of the PAC indicates the order of an AR(p) process.
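A sketch of this computation for a zero-mean signal x: solve the Yule–Walker equations at each successive order k and keep the last coefficient (parcorr in the Econometrics Toolbox provides the same functionality):

maxLag = 20;
r   = xcorr(x, maxLag, 'biased');
rho = r(maxLag+1:end) / r(maxLag+1);  % rho_0 .. rho_maxLag, rho(1) = 1
pacf = zeros(maxLag,1);
for k = 1:maxLag
    R  = toeplitz(rho(1:k));          % k x k matrix of rho_0..rho_{k-1}
    ak = R \ rho(2:k+1);              % Yule-Walker solution at order k
    pacf(k) = ak(end);                % a_kk, the partial autocorrelation
end
stem(pacf)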
Example 13: Work by Yule → a model of sunspot numbers
Sunspot numbers have been recorded for > 300 years. To study them, in 1927 Yule invented the AR(2) model.
[Figure: the raw sunspot series (signal values vs time)]

We first center the data, as we do not wish to model the DC offset. A fifth-order (AR(5)) fit of the centered data gives the coefficients
a5 = [1.4773, −0.5377, −0.1739, 0.0174, 0.1555]
Example 13 (contd.): Model order for sunspot numbers
After k = 2 the partial correlation function (PAC) is very small, indicating p = 2
[Figure: the sunspot series and its ACF (top row); the partial ACF – negligible beyond lag 2 – and the Burg PSD estimate (bottom row)]
Example 14: Model order for an AR(3) process
An AR(3) process realisation, its ACF, and partial autocorrelation (PAC)
[Figure: an AR(3) signal realisation and its ACF (top row); the partial ACF and the Burg PSD estimate (bottom row)]
After lag k = 3, the PAC becomes very small (broken line conf. int.)
Example 15: Model order selection for a financial time
series (the 'correct' and 'time-reversed' time series)
[Figure: the £/$ exchange rate, 1970–2018, and its time-reversed version (top row); their autocorrelation functions and autoconvolution functions (bottom rows)]

Partial correlations:
AR(1): a = [0.9994]
AR(2): a = [.9994, −.0354]
AR(3): a = [.9994, −.0354, −.0024]
AR(4): a = [.9994, −.0354, −.0024, .0129]
AR model based prediction: Importance of model order
For a zero mean process x[n], the best linear predictor, in the mean
square error sense, of x[n] based on x[n − 1], x[n − 2], . . . is
$$\hat{x}[n] = a_{k-1,1}\, x[n-1] + a_{k-1,2}\, x[n-2] + \cdots + a_{k-1,k-1}\, x[n-k+1]$$
(apply the E{·} operator to the general AR(p) model expression, and
recall that E{w[n]} = 0)
(Hint: $E\{x[n]\} = \hat{x}[n] = E\{a_{k-1,1} x[n-1] + \cdots + a_{k-1,k-1} x[n-k+1] + w[n]\} = a_{k-1,1} x[n-1] + \cdots + a_{k-1,k-1} x[n-k+1]$)
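A sketch of the resulting one-step-ahead predictor for a zero-mean signal x, with coefficients estimated by aryule (the order is an assumed value):

ord  = 2;                    % assumed predictor order
aY   = aryule(x, ord);       % returns [1, -a1, ..., -a_ord]
a    = -aY(2:end);           % predictor coefficients a1..a_ord
xhat = filter([0 a], 1, x);  % xhat[n] = a1*x[n-1] + a2*x[n-2] + ...
var(x - xhat)                % prediction error variance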
Example 16: Under– vs Over–fitting a model →
Estimation of the parameters of an AR(2) process
Original AR(2) process x[n] = −0.2x[n − 1] − 0.9x[n − 2] + w[n],
w[n] ∼ N (0, 1), estimated using AR(1), AR(2) and AR(20) models:
[Figure: left – a segment of the original AR(2) signal overlaid with its AR(1), AR(2) and AR(20) model estimates (errors: 5.2627, 1.0421 and 1.0621, respectively); right – the estimated coefficients vs coefficient index]
The higher order coefficients of the AR(20) model are close to zero and
therefore do not contribute significantly to the estimate, while the AR(1)
does not have sufficient degrees of freedom. (see also Appendix 3)
Effects of over-modelling on autoregressive spectra:
Spectral line splitting
Consider an AR(2) signal x[n] = −0.9x[n − 2] + w[n] with w ∼ N(0, 1),
N = 64 data samples, and model orders p = 4 (solid blue) and p = 12
(broken green). (Script: AR 2 Highpass Circularity.m)
[Figure: magnitude spectra of the two AR model fits vs frequency (units of π)]
Notice that this is an AR(2) model!
Although the true spectrum has a single spectral peak at ω = π/2 (blue), when over-modelling with p = 12 this peak is split into two peaks (green).
Model order selection → practical issues
In practice: the greater the model order, the greater the accuracy, but also the complexity
Q: When do we stop? What is the optimal model order?
Solution: To establish a trade–off between computational complexity and
model accuracy, we introduce a "penalty" for a high model order. Such
criteria for model order selection are:
MDL: The minimum description length criterion (by Rissanen)
AIC: The Akaike information criterion
$$\text{MDL:} \quad p_{opt} = \arg\min_p \left[\log(E) + \frac{p \log(N)}{N}\right] \qquad \text{AIC:} \quad p_{opt} = \arg\min_p \left[\log(E) + \frac{2p}{N}\right]$$
where E is the prediction error power of the order-p model and N is the number of data points.
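A sketch of order selection for a signal x with these criteria, using the prediction error variance E returned by aryule (the maximum order tried is an arbitrary choice):

N = length(x);  maxOrder = 10;
mdl = zeros(maxOrder,1);  aic = zeros(maxOrder,1);
for p = 1:maxOrder
    [~, E] = aryule(x, p);         % E: prediction error variance at order p
    mdl(p) = log(E) + p*log(N)/N;  % minimum description length
    aic(p) = log(E) + 2*p/N;       % Akaike information criterion
end
[~, pMDL] = min(mdl);  [~, pAIC] = min(aic);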
Example 17: Model order selection → MDL vs AIC
MDL and AIC criteria for an AR(2) model with a₁ = 0.5, a₂ = −0.3
[Figure: the MDL and AIC curves vs model order 1–10, both minimised at p = 2]

The curves are convex: the cumulative squared error decreases monotonically with the model order, while the penalty term (the MDL or AIC correction) increases.
Example 18: Third order moving average MA(3) process
An MA(3) process and its autocorrel. (ACF) and partial autocorrel. (PAC) fns.
[Figure: an MA(3) signal and its ACF – which cuts off after lag 3 (top row); the partial ACF – which tails off – and the Burg PSD estimate (bottom row)]
After lag k = 3, the PAC becomes very small (broken line conf. int.)
Analysis of nonstationary signals

◦ Consider a real-world speech signal, analysed over three windows W1, W2, W3

[Figure: the speech signal with analysis windows W1–W3; the ACF within each window; the MDL criterion per window, giving model orders 13, > 50 and 24, respectively]

◦ Different model orders are required for different segments of speech → an opportunity for content analysis!
◦ To deal with …

Some properties of AR and MA processes:
ii) The finite MA(q) process has an ACF that is zero beyond q. For an AR
process, the ACF is infinite in length and consists of a mixture of damped
exponentials and/or damped sine waves.
iii) Finite MA processes are always stable, and there is no requirement on the
coefficients of MA processes for stationarity. However, for invertibility,
the roots of the characteristic equation must lie inside the unit circle.
iv) AR processes produce spectra with sharp peaks (two poles of A(z) per
peak), whereas MA processes cannot produce peaky spectra.
Summary: Wold’s Decomposition Theorem and ARMA
◦ Every stationary time series can be represented as a sum of a perfectly
predictable process and a feasible moving average process
◦ Two time series with the same Wold representations are the same, as
the Wold representation is unique
Recap: Linear systems
[Diagram: input x(k) (known or unknown) → h(k), H(z) → output y(k) (unknown or known), with the transfer function H(z) = Y(z)/X(z)]

Linear systems are described by their impulse response h(n) or the transfer function H(z), that is
$$y[n] = \sum_{r=-\infty}^{\infty} h(r)\, x[n-r] = h * x$$
The next two slides show how to calculate the power of the output, y(k).
Recap: Linear systems – statistical properties → mean and variance
i) Mean
$$E\{y[n]\} = E\left\{\sum_{r=-\infty}^{\infty} h(r)\, x[n-r]\right\} = \sum_{r=-\infty}^{\infty} h(r)\, E\{x[n-r]\}$$
$$\Rightarrow\; \mu_y = \mu_x \sum_{r=-\infty}^{\infty} h(r) = \mu_x H(0)$$
[NB: $H(\theta) = \sum_{r=-\infty}^{\infty} h(r)\, e^{-jr\theta}$. For θ = 0, $H(0) = \sum_{r=-\infty}^{\infty} h(r)$]

ii) Cross–correlation
$$r_{yx}(m) = E\{y[n]\, x[n+m]\} = \sum_{r=-\infty}^{\infty} h(r)\, E\{x[n-r]\, x[n+m]\} = \sum_{r=-\infty}^{\infty} h(r)\, r_{xx}(m+r)$$
– a convolution of the input ACF and {h}
Recap: Linear systems – statistical properties → output ACF and spectrum
These are key properties # used in AR spectrum estimation
From $r_{xy}(m) = r_{yx}(-m)$ we have $r_{xy}(m) = \sum_{r=-\infty}^{\infty} h(r)\, r_{xx}(m-r)$. Now we write
$$r_{yy}(m) = E\{y[n]\, y[n+m]\} = \sum_{r=-\infty}^{\infty} h(r)\, E\{x[n-r]\, y[n+m]\} = \sum_{r=-\infty}^{\infty} h(r)\, r_{xy}(m+r) = \sum_{r=-\infty}^{\infty} h(-r)\, r_{xy}(m-r)$$
or
$$S_{yy}(f) = H(f)\, H(-f)\, S_{xx}(f) = |H(f)|^2\, S_{xx}(f)$$
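These relations can be sanity-checked numerically (a sketch; the filter and estimation settings are arbitrary choices):

b = [1 0.5]; a = [1 -0.9];                % an example H(z)
x = randn(1e5,1);                         % white input, variance 1
y = filter(b, a, x);
[Pyy, f] = pwelch(y, 1024, [], 1024, 1);  % estimate of Syy(f)
H = freqz(b, a, 513, 1);                  % H(f) on the same frequency grid
% Pyy should track abs(H).^2 (up to the one-sided PSD scaling factor of 2)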
More on Wold Decomposition (Representation) Theorem
Example: A "paradox" – can we talk about a deterministic random process?
$$x[n] = \frac{\sin(2)}{\sin(1)}\, x[n-1] - x[n-2]$$
$$E\{x[n] \mid x[n-1], x[n-2], \ldots\} = \frac{\sin(2)}{\sin(1)}\, x[n-1] - x[n-2] = x[n]$$
The process is perfectly predictable from its own past (it is a pure sinewave), so its stochastic part is zero.
Appendix 1: More on sunspot numbers (recorded since 1874)
Top: original sunspots Middle and Bottom: AR(2) representations
[Figure: top row – the original sunspot series, its PSD, and its partial ACF; middle and bottom rows – two Gaussian processes filtered by second-order Yule–Walker (AR(2)) fits, with their PSDs and partial ACFs]
Top: original Middle: first AR(2) model Bottom: second AR(2) model
Appendix 2: Model order for an AR(2) process
An AR(2) signal, its ACF, and its partial autocorrelations (PAC)
[Figure: an AR(2) signal realisation and its ACF (top row); the partial ACF and the Burg PSD estimate (bottom row)]
After lag k = 2, the PAC becomes very small (broken line conf. int.)
Appendix 3: More on over–parametrisation
Consider the linear stochastic process given by
x[n] = x[n − 1] − 0.16x[n − 2] + w[n] − 0.8w[n − 1]
It clearly has an ARMA(2,1) form. Consider now its coefficient vectors
written as polynomials in the z–domain:
$$a(z) = 1 - z + 0.16z^2 = (1 - 0.8z)(1 - 0.2z)$$
$$b(z) = 1 - 0.8z$$
These polynomials clearly have a common factor (1 − 0.8z), and therefore
after cancelling these terms, we have the resulting lower–order polynomials:
a(z) = 1 − 0.2z
b(z) = 1
The above process is therefore an AR(1) process, given by
x[n] = 0.2x[n − 1] + w[n]
and the original ARMA(2,1) version was over–parametrised.
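The common factor can be exposed numerically (sketch):

za = roots([0.16 -1 1])  % zeros of a(z) = 1 - z + 0.16z^2  ->  5 and 1.25
zb = roots([-0.8 1])     % zero  of b(z) = 1 - 0.8z         ->  1.25
% the shared root z = 1.25, i.e. the factor (1 - 0.8z), cancels,
% leaving a(z) = 1 - 0.2z and b(z) = 1: an AR(1) process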
Something to think about ...
◦ What would be the properties of a multivariate (MV) autoregressive,
say MVAR(1), process, where the signals w, x become vectors and the
coefficient a becomes a matrix, that is, $\mathbf{x}[n] = \mathbf{A}\,\mathbf{x}[n-1] + \mathbf{w}[n]$?
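A minimal simulation sketch (the 2×2 matrix A below is an arbitrary stable example, with all eigenvalues inside the unit circle):

A = [0.5 0.2; -0.1 0.6];           % assumed coefficient matrix
N = 1000; x = zeros(2,N); w = randn(2,N);
for n = 2:N
    x(:,n) = A*x(:,n-1) + w(:,n);  % vector AR(1) recursion
end
abs(eig(A))                        % stability check: both < 1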
Consider also: Fourier transform as a filtering operation
We can see FT as a convolution of a complex exponential and the data (under a
mild assumption of a one-sided h sequence, ranging from 0 to ∞)
1) Continuous FT. For a continuous FT, $F(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$.

Let us now swap variables t → τ and multiply by $e^{j\omega t}$, to give
$$e^{j\omega t} \int x(\tau)\, e^{-j\omega\tau}\, d\tau = \int x(\tau)\, \underbrace{e^{j\omega(t-\tau)}}_{h(t-\tau)}\, d\tau = x(t) * e^{j\omega t} \;\;(= x(t) * h(t))$$

with the transfer function (for large N)
$$H(z) = \frac{1}{1 - W z^{-1}} = \frac{1 - W^* z^{-1}}{1 - 2\cos(\theta_k)\, z^{-1} + z^{-2}}, \qquad W = e^{j\theta_k}$$
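The filtering view can be verified numerically: running the data through the one-pole filter 1/(1 − W z⁻¹) and sampling the output at n = N yields the k-th DFT coefficient (a sketch of the first-order complex form; Matlab's goertzel function implements the equivalent second-order real recursion above):

N = 64; k = 5; x = randn(N,1);
W  = exp(1j*2*pi*k/N);      % complex exponential at theta_k = 2*pi*k/N
y  = filter(1, [1 -W], x);  % one-pole filter: y[n] = W*y[n-1] + x[n]
Xk = W * y(N);              % output sampled at n = N
X  = fft(x);
abs(Xk - X(k+1))            % ~1e-14: matches the k-th DFT bin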