Matlab Codes and Report
Haobo Zhu
CID: 01493196
March, 2022
Imperial College London
Contents
1 Random Signals and Stochastic Processes
1.1 Stochastic Estimation
1.2 Stochastic Processes
1.3 Estimation of probability distributions
1 Random Signals and Stochastic Processes

1.1 Stochastic Estimation
For a uniformly distributed random variable X ∼ U (0, 1) , the theoretical mean is given by the
following integration:
m = E{X} = ∫_{−∞}^{∞} x p(x) dx = ∫_0^1 x · 1 dx = 0.5   (1)
which gives a theoretical mean of 0.5.
Calculating with the MATLAB command mean on the 1000x1 vector generated by rand gives a sample mean of 0.4995, which is 0.1% below the theoretical value. The theoretical standard deviation is then calculated by

σ = √(E{(X − E{X})²}) = √(E{X²} − E{X}²) = √(∫_0^1 x² dx − 0.5²) = √(1/12) ≈ 0.2887   (2)

and the result from MATLAB using the function std gives a sample standard deviation of 0.2877.

[Figure 1: The estimated standard deviation over 10 realisations (std Value/AU against realisation index)]

Figure 2: The pdf plotted using 5, 10 and 100 bins respectively
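As a minimal sketch (not the exact coursework script), the sample statistics above can be reproduced with the standard MATLAB functions rand, mean and std:

% Sketch: sample mean and standard deviation of a uniform sample,
% compared with the theoretical values derived above.
x = rand(1000, 1);                          % 1000x1 vector, X ~ U(0,1)
m_hat = mean(x);                            % sample mean, expected near 0.5
s_hat = std(x);                             % sample std, expected near sqrt(1/12)
fprintf('mean error: %.2f%%\n', 100*abs(m_hat - 0.5)/0.5);
fprintf('std error: %.2f%%\n', 100*abs(s_hat - sqrt(1/12))/sqrt(1/12));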
The error in the pdf of a certain realisation of the random process is more pronounced as the number of bins increases, while the estimate is smoother with a small number of bins.
Figure 3: Estimated PDF with 100, 10000 and 100000 samples and nbin=10, along with the theoretical PDF
As the sample size grows, it can be observed from the diagrams that the error of the pdf approximation converges to 0. That is, the approximation converges to the theoretical pdf X ∼ U(0, 1).
We then repeat the process above for a normally distributed random process X ∼ N(0, 1). The theoretical mean is 0 by definition, and the calculated sample mean for the 1000x1 vector generated by randn is 0.0376, which is reasonable for a zero-centred normal distribution. The theoretical standard deviation is 1 by definition, and the standard deviation calculated by MATLAB is 0.9927, which is 0.73% below the theoretical value.

[Figure 4: Estimation of the std over 10 realisations (std Value/AU against realisation index)]
The sample mean and standard deviation for the 10 realisations with 1000 data samples are shown above by the two figures. They cluster near the theoretical values 0 and 1, with points lying on both sides of them. The estimator of the mean is constrained within ±0.04, and the estimator of the standard deviation is also constrained within ±0.04, which is reasonable since we have an unbiased estimator for both of them.

The pdf of this realisation of the random process can be approximately plotted by the histogram function in MATLAB. We then plot the pdf of the realisation with 1000 data samples, with the number of bins set to 5, 10 and 100 respectively. The error is again more pronounced in the diagram if the pdf is plotted with a greater number of bins. As the curve of the pdf is not uniform, we lose more information about the variability of the estimation if we plot the diagram with too few bins, especially for regions with low intensity.
Figure 5: PDF of the 1000-sample WGN realisation plotted using 5, 25 and 100 bins respectively
Figure 6: The pdf of the realisation plotted using 100, 10000 and 100000 data samples respectively, all plotted with 25 bins, along with the theoretical pdf
The pdf of this normally distributed random variable also converges to the theoretical one as the
data sample size increases.
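A sketch of the normalised-histogram pdf estimate described above, with bin counts scaled so the estimate integrates to 1 (the coursework's own routine may differ in detail):

% Sketch: histogram-based pdf estimate of a WGN realisation, overlaid
% with the theoretical N(0,1) density.
x = randn(10000, 1);
[counts, edges] = histcounts(x, 25);
width = edges(2) - edges(1);
pdf_hat = counts / (sum(counts) * width);   % normalise so the area is 1
centres = edges(1:end-1) + width/2;
bar(centres, pdf_hat, 1); hold on;
plot(centres, exp(-centres.^2/2)/sqrt(2*pi), 'r', 'LineWidth', 1.5);
xlabel('Intensity'); ylabel('Probability Density');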
The ensemble means and standard deviations of the M=100 members of the ensemble, each with N=100 time steps, are plotted as follows using MATLAB. It may be concluded that random process 1 is not stationary in mean or variance, since its mean grows linearly in time and its variance is low at the beginning and the end of the range. On the other hand, random processes 2 and 3 are both stationary in mean and variance, as they centre about constant values, which are presumed to be the theoretical mean and variance.
Figure 7: The ensemble means and variances of the three random processes (M=100, N=100)
The time averages are constrained for processes 1 (std=0.0205) and 3 (std=0.0371), but not for process 2 (std=0.2939); therefore random process 2 is not ergodic in mean, as the time average changes over realisations and we get a different value for every realisation. However, for process 1, the ensemble average changes with the time step, and thus it is not ergodic in mean either, as it is impossible to retrieve the ensemble average in the time domain. The "std" term in brackets is the standard deviation of the time averages of the 4 realisations discussed in this paragraph.
Similarly, we may assess the time standard deviation to see if the processes are ergodic in variance. The time standard deviations are constrained for processes 1 (std=0.0371) and 3 (std=0.0139), but not for process 2 (std=0.0801); therefore process 2 is not ergodic in variance, as the values derived from each realisation differ widely and do not indicate the ensemble standard deviation. However, process 1 is not ergodic in variance either, as its variance changes with time and so does its standard deviation, so a time standard deviation cannot provide an ensemble standard deviation for it. The "std" term in brackets is the standard deviation of the time standard deviations of the 4 realisations discussed in this paragraph.
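A sketch of the checks above, assuming rp1, rp2 and rp3 are the coursework generators returning an M-by-N ensemble (the exact signature is an assumption):

% Sketch: ensemble statistics against time, and time averages per
% realisation, for one of the three random processes.
M = 100; N = 100;
v = rp3(M, N);                    % hypothetical signature: M realisations of length N
ens_mean = mean(v, 1);            % ensemble mean at each time step (1xN)
ens_std  = std(v, 0, 1);          % ensemble std at each time step
time_mean = mean(v, 2);           % time average of each realisation (Mx1)
% for a process ergodic in mean, the time averages barely spread:
fprintf('std of time averages: %.4f\n', std(time_mean));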
Figure 8: The pdf of a normally distributed random process with 10000 data points plotted by the pdf function.
For the three random processes tested in section 1.2, only process 3 is stationary and ergodic. From
the plot below, as the data sample size increases, the pdf converges to its theoretical value.
Figure 9: The probability density function for the random process rp3 in exercise 2, plotted for realisations with 100 (top left), 1000 (top right), and 10000 (bottom left) time steps, along with its theoretical value.
We cannot plot the pdf of a nonstationary process using the pdf function: because the data distributions at different time steps differ, a single collective method (here, a histogram) cannot describe the overall pdf in one diagram. For the 1000-sample-long signal whose mean changes from 0 to 1 after N = 500, we may treat it as two stationary signals, one with n < 500 and one with n ≥ 500, and calculate their pdfs separately using histograms.
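A sketch of this two-segment treatment, with the step-mean signal constructed as an assumption matching the description above:

% Sketch: estimate the pdf of each stationary half separately.
n = (1:1000)';
x = randn(1000, 1) + (n > 500);   % mean steps from 0 to 1 after N = 500
subplot(1, 2, 1); histogram(x(1:500), 'Normalization', 'pdf'); title('n < 500');
subplot(1, 2, 2); histogram(x(501:end), 'Normalization', 'pdf'); title('n >= 500');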
Figure 1: The ACF function of a WGN plotted for delays τ ∈ [−999 : 999] (left) and |τ | < 50
(right)
By plotting the unbiased estimate of the ACF of a White Gaussian noise generated by randn(1, 1000), we may see that the ACF has a spike at τ = 0, and that the other values are all less than 0.2 except for the extreme values of τ (|τ| > 900). This is expected, as a WGN sample is only correlated with itself (τ = 0), and the other values should converge to 0 if the sample size is large enough.
If we zoom the plot in onto |τ| < 50, we may see that all the values of the ACF other than the spike are consistent and well constrained below 0.2, and do not increase overall when |τ| increases.
It may easily be observed that as |τ| gets sufficiently large, the normalisation factor 1/(N − |τ|) blows up. Moreover, fewer samples enter the sum when |τ| approaches ±1000, so the variance of the estimates also increases: as suggested by the central limit theorem, the variance of a sample mean is larger when fewer samples are used (σ²_mean = σ²/N, where N is the sample size). We would recommend |τ| ≤ 900 as an empirical bound.
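A sketch of the unbiased ACF estimate; the 'unbiased' option of xcorr applies the 1/(N − |τ|) normalisation discussed above:

% Sketch: unbiased ACF of a 1000-sample WGN.
x = randn(1, 1000);
[acf, tau] = xcorr(x, 'unbiased');
plot(tau, acf); xlabel('\tau'); ylabel('R_x(\tau)');
xlim([-999 999]);                 % use xlim([-50 50]) for the zoomed view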
We then generate another 1000-sample WGN and filter it with a moving average filter with 9 unit coefficients, then plot the ACF of the filtered signal. We may see that the spike in the ACF of the WGN is widened: the values for |τ| ∈ [−9 : 9] are significantly greater than 0 (Ry > 1) and decrease as |τ| increases; this bound is consistent with the order of the MA filter.
Figure 2: The ACF of the filtered signal plotted for delays τ ∈ [−999 : 999]

The ACF of the output signal is the convolution of the ACF of the input signal with the ACF of the impulse response of the filter:

Ry(τ) = Rx(τ) ∗ Rh(τ)   (2)
For white noise, the autocorrelation is a delta function (Rx(τ) = δ(τ)); by the sifting property of the delta function,
Ry (τ ) = δ(τ ) ∗ Rh (τ ) (3)
and thus Ry represents the ACF of the impulse response of the filter.
We plot the unbiased estimate of cross correlation function for ergodic signals between the input
and filtered signals given by the equation
Rxy(τ) = (1/(N − |τ|)) Σ_{n=0}^{N−|τ|−1} x[n] y[n + τ],   τ = −N + 1, ..., N − 1.   (4)
The CCF function of the input and output signal can be expressed by a convolution between the
ACF of the input and the impulse response of the filter.
Similarly, if the input Xt is an uncorrelated stochastic process, the resulting Rxy would have the
shape of the impulse response of the filter by the sifting property of the delta function.
The calculated CCF estimate is flipped about τ = 0 compared with the expected output. This is because MATLAB's xcorr computes the CCF with a different convention, in which the CCF between
Figure 3: The CCF function of a WGN and the output of the MA filter plotted for delays τ ∈ [−20 :
20]
the input and output peaks at negative lags if the output is a delayed copy of the input. This result can be used for system identification: if we feed white noise into an unknown system, we can identify its impulse response from the measured output.
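A sketch of this identification idea; the 9-tap MA system stands in for the unknown system, and the lag convention of xcorr should be kept in mind when reading off the impulse response:

% Sketch: feed WGN into an "unknown" system and read off the CCF.
x = randn(1, 1000);
h = ones(1, 9);                           % unknown system: the MA filter above
y = filter(h, 1, x);
[ccf, tau] = xcorr(x, y, 20, 'unbiased'); % CCF estimate for |tau| <= 20
stem(tau, ccf); xlabel('\tau'); ylabel('R_{xy}(\tau)');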
For the 100 samples of AR(2) models with length 1000 and uniformly distributed a1 ∈ [−2.5, 2.5] and a2 ∈ [−1.5, 1.5], the pairs of coefficients that result in a stable output (the output at the final time step being less than 1000 in magnitude) are plotted as red asterisks in the figure below.
Figure 4: The pairs of a1 and a2 that result in stable(red) and unstable(black) outputs, along with
the stability bounds
a1 + a2 < 1   (10)

1 − a2 + a1 > 0,   i.e.   a2 − a1 < 1   (14)

Similarly,

(1 − a2)/(1 + a2) > 0   (15)

which leads to the final stability criterion: a1 + a2 < 1, a2 − a1 < 1 and |a2| < 1, the triangular region shown in the figure.
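A sketch of the stability experiment, using the empirical boundedness test from the text (output magnitude at the final step below 1000):

% Sketch: sample (a1, a2) uniformly, drive the AR(2) recursion with WGN,
% and colour stable pairs red and unstable pairs black.
for k = 1:100
    a1 = 5*rand - 2.5; a2 = 3*rand - 1.5;
    x = filter(1, [1 -a1 -a2], randn(1000, 1)); % x[n] = a1 x[n-1] + a2 x[n-2] + w[n]
    stable = abs(x(end)) < 1000;
    plot(a1, a2, '*', 'Color', stable*[1 0 0]); hold on;
end
xlabel('a_1'); ylabel('a_2');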
We then investigate the sunspot data by plotting the ACF estimates for N = {5, 20, 250}. The ACF shapes for the different data lengths are quite different. For N = 5, we may see a decaying ACF estimate in the non-zero-mean plot, while for the zero-mean version the absolute value of the ACF at |τ| = 4 is larger than that at τ = 0, probably due to the statistical inconsistency at the edges of the plot. The shapes for N = 20 and N = 250 are more similar, demonstrating a pseudo-periodic behaviour, with the N = 250 shape having a shorter period. For the non-zero-mean version, the ACF for N = 20 at |τ| = 13 exceeds that at τ = 0. The ACFs of the non-zero-mean version are greater than those of the zero-mean version at all lags, the non-zero mean acting as a non-constant offset in the ACF estimates.
The partial correlation coefficients are calculated using the Yule-Walker equations for the original data and for the standardised data with zero mean and unit variance; we obtain

              a1,1    a2,2     a3,3     a4,4    a5,5     a6,6    a7,7    a8,8    a9,9    a10,10
Original      0.9295  -0.5857  0.1284   0.2532  0.1555   0.2574  0.2736  0.2384  0.1680  0.0252
Standardised  0.8212  -0.6783  -0.1223  0.0473  -0.0156  0.1623  0.1751  0.2276  0.1766  0.0038

For both the original and standardised data, we may conclude that the data are best modelled by an AR(2) process, since all partial correlations above order 2 are less than 0.3. However, as the statistical bound for 95% confidence is ±1.96/√N = ±0.1155, we may say that the partial correlations up to order 9 still have some significance. The values above order 2 in the standardised data are all smaller than those for the original data, except for a9,9.
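A sketch of the partial-correlation calculation via the Yule-Walker recursion (aryule, Signal Processing Toolbox), whose reflection coefficients are the negated partial correlations:

% Sketch: partial correlations a_{k,k} of the standardised sunspot data.
load sunspot.dat                           % built-in MATLAB dataset (year, number)
x = sunspot(:, 2);
x = (x - mean(x)) / std(x);                % standardise: zero mean, unit variance
[~, ~, k] = aryule(x, 10);
pacf = -k;                                 % a_{1,1} ... a_{10,10}
stem(1:10, pacf); hold on;
plot([1 10],  1.96/sqrt(length(x))*[1 1], 'k--');  % 95% confidence bound
plot([1 10], -1.96/sqrt(length(x))*[1 1], 'k--');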
The appropriate model order can be further assessed using the MDL, AIC, and AICc criteria for
the standardised dataset.
MDL and AIC show a minimum at p = 9, while AICc clearly shows a minimum at p = 2. As the differences between the values at p = 2 and p = 9 for MDL (< 0.1) and AIC (< 0.2) are not significant, and considering the computational complexity of the model and that AIC and MDL may be unreliable given the short segment of data from the AR(2) process, we may conclude that a model order of 2 is appropriate for this dataset.
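A sketch of the three criteria, using the standard forms MDL = ln E + p ln N/N, AIC = ln E + 2p/N and AICc = AIC + 2p(p + 1)/(N − p − 1), with E the residual variance of the AR(p) fit (an assumption consistent with the usual coursework definitions):

% Sketch: order-selection criteria for the standardised sunspot data.
load sunspot.dat; x = sunspot(:, 2); x = (x - mean(x)) / std(x);
N = length(x); maxp = 10;
crit = zeros(3, maxp);
for p = 1:maxp
    [~, E] = aryule(x, p);                 % residual (driving noise) variance
    crit(1, p) = log(E) + p*log(N)/N;      % MDL
    crit(2, p) = log(E) + 2*p/N;           % AIC
    crit(3, p) = crit(2, p) + 2*p*(p+1)/(N - p - 1);  % AICc
end
plot(1:maxp, crit); legend('MDL', 'AIC', 'AICc'); xlabel('Model Order');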
The predictions made by model orders p = {1, 2, 10} with prediction horizons m = {1, 2, 5, 10} are
plotted as follows.
Table 2: Cumulative square error of different model orders over different prediction horizons
The plot and the table suggest that all predictions are reasonably close to the original data for m = 1. For a typical AR(2) process, the overmodelled prediction should have a larger MSE in interpolation, but as the only criterion that suggests p = 2 is AICc, this process is not typically AR(2) but closer to AR(9). The variance of the 10th-order model is smaller than those of the 1st- and 2nd-order models. As the prediction horizon increases, the AR prediction results of the different model orders start to diverge. For m = 10, the variance between the prediction results and the original data is minimised when p = 10, while the lower-order models show large differences in the amplitude of oscillations compared with the original data. This also suggests that overmodelled predictions are more robust than undermodelled ones when the prediction horizon is long.
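A sketch of the m-step-ahead AR prediction used above: fit AR(p) by Yule-Walker, then iterate the one-step predictor m times, feeding intermediate predictions back in:

% Sketch: horizon-m prediction of the standardised sunspot series.
load sunspot.dat; x = sunspot(:, 2); x = (x - mean(x)) / std(x);
p = 2; m = 10; N = length(x);
a = aryule(x, p);                          % a = [1, a1 ... ap]; predictor uses -a(2:end)
pred = nan(N, 1);
for n = p:N-m
    buf = x(n:-1:n-p+1);                   % last p true samples, newest first
    for step = 1:m                         % roll the predictor m steps forward
        xnext = -a(2:end) * buf;
        buf = [xnext; buf(1:end-1)];
    end
    pred(n+m) = xnext;
end
plot(1:N, x, 1:N, pred); legend('data', 'AR(2), m = 10');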
The NASDAQ closing prices can be well modelled by an AR(1) process, as both MDL and AIC suggest an optimal model order of p = 1 when modelling the NASDAQ data standardised to X ∼ N(0, 1), since AR modelling requires zero-mean datasets. The corresponding partial correlation is a1,1 = 0.9977, and all higher-order partial correlations fall below the 95% confidence interval. The daily return can then be calculated by subtracting adjacent closing prices obtained from the AR model.
Figure 8: Different criteria assessing the optimal model order for NASDAQ data
and

ln P̂X(f; θ) = ln[σ̂²] − ln[1 − Σ_{m=1}^{p} âm e^{−j2πfm}] − ln[1 − Σ_{m=1}^{p} âm e^{j2πfm}]   (19)
Given [I(θ)]11 = N rxx[0]/σ² and [I(θ)]12 = [I(θ)]21 = 0, the full Fisher information matrix is then

I(θ) = [ N rxx[0]/σ²    0
         0              N/(2σ⁴) ]   (21)
[I(θ)]11 can be further simplified, as rxx[0] = σx² = σ²/(1 − ρ1a1) = σ²/(1 − a1²) for zero-mean data:

[I(θ)]11 = N σ²/(σ²(1 − a1²)) = N/(1 − a1²)   (22)
We then invert the Fisher information matrix to get the CRLB for â1 and σ̂²:

I⁻¹(θ) = [ (1 − a1²)/N    0
           0              2σ⁴/N ]   (23)
Finally, we obtain the CRLB for the parameters:

var(σ̂²) ≥ 2σ⁴/N,    var(â1) ≥ (1 − a1²)/N   (24)
The logarithms of the CRLBs for var(σ̂²) and var(â1) are plotted as heatmaps below, with deeper blue representing larger variance. The colour pattern clearly follows the CRLB derived in Eqn (24).
[Heatmaps: (a) log10(CRLB) for â1; (b) log10(CRLB) for σ̂², both plotted against data length N]
Figure 9: The heatmaps of the CRLB for (a) var(â1) and (b) var(σ̂²)
The lower bound for the variance of the estimate of the power spectrum is

var(P̂X(f; θ)) ≥ (∂P̂X(f; θ)/∂θ)ᵀ I⁻¹(θ) (∂P̂X(f; θ)/∂θ)   (26)
where ∂P̂X(f; θ)/∂θ = [∂P̂X(f; θ)/∂a1, ∂P̂X(f; θ)/∂σ²]ᵀ. Substituting A(f) = 1 − a1e^{−j2πf} and A*(f) = 1 − a1e^{j2πf},

∂P̂X(f; θ)/∂a1 = σ² ∂/∂a1 [1/(A(f)A*(f))]
               = −σ² (1/|A(f)|⁴) [(∂A(f)/∂a1) A*(f) + A(f) (∂A*(f)/∂a1)]
               = (σ²/|A(f)|⁴) [e^{j2πf}A(f) + e^{−j2πf}A*(f)]
               = (2σ²/|A(f)|⁴) Re{A(f)e^{j2πf}}.   (27)
Similarly,

∂P̂X(f; θ)/∂σ² = 1/|A(f)|².   (28)
Finally, the lower bound for the variance of the power spectrum is

var(P̂x(f; θ)) ≥ (∂P̂X(f; θ)/∂θ)ᵀ I⁻¹(θ) (∂P̂X(f; θ)/∂θ)
             = ((1 − a1²)/N) [(2σ²/|A(f)|⁴) Re{A(f)e^{j2πf}}]² + (2σ⁴/N)(1/|A(f)|⁴)
             = (2σ⁴/(N|A(f)|⁴)) [2(1 − a1²) Re{A(f)e^{j2πf}}²/|A(f)|⁴ + 1]   (29)
We may plot the probability density functions of the averaged and original RR-intervals. Fig. 10 shows the PDF estimates obtained with the histogram function and 20 bins.
Figure 10: PDF estimate using histogram function for original RRI signals
With the RRI signals averaged over windows of 10 time points, the distributions and the corresponding values generally do not change, with minor differences arising from the decrease in the total number of data points.
Figure 11: PDF estimate using histogram function for averaged RRI signals with α = 1
When α is decreased to 0.6, we may see that the general shapes of the PDFs are relatively consistent, with the values on the x-axis shifted to the left (they become smaller). This is expected because the averaged values are scaled by α.
Figure 12: PDF estimate using histogram function for averaged RRI signals with α = 0.6
[Figure 13: ACF estimates of the three RRI signals for lags τ ∈ [−1000, 1000]]
The ACF estimates of the three RRIs can then be analysed. As the ACF sequences do not cut off to 0 after a certain delay, as seen in Fig. 13, the RRI sequences behave like AR processes.
The optimal order of the AR models for modelling the three RRI sequences can then be found
using criteria such as MDL, AIC and AICc.
Figure 14: Different criteria assessing the optimal model order for the three RRI sequences
For RRI1 and RRI3, the optimal model order suggested by MDL and AIC is p = 5, while AICc suggests p = 4. For RRI2, MDL and AIC show major decreases after p = 8, but AICc suggests p = 3 is optimal; since MDL and AIC are also flat between orders 3 and 8, there is no need to increase the model order to such high values, as the computational complexity would increase significantly.
P̂x(f) = (1/N) |Σ_{n=0}^{N−1} x[n] e^{−j2πfn/N}|².   (1)
The MATLAB function pgm we wrote produces the following plots for WGN realisations with lengths N = {128, 256, 512}.
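A sketch of pgm written directly from Eqn. (1); the coursework version may differ in detail:

function P = pgm(x)
% PGM  Periodogram estimate of Eqn. (1), evaluated at the DFT
% frequencies f = 0, 1/N, ..., (N-1)/N.
    N = length(x);
    P = abs(fft(x)).^2 / N;
end
% usage: P = pgm(randn(1, 256)); plot((0:255)/256, P);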
[Figure: periodograms of WGN realisations with N = 128, 256 and 512, against normalised frequency]
The majority of the PSD values in all three estimates sit at about 1, which is expected: for a WGN, all frequencies carry the same power, equal to the variance, by definition. The plots are also symmetrical about 0.5 (π in terms of angular frequency), which is a property of PSDs of real signals. The PSD estimates are then smoothed by an order-5 MA filter, producing the following plots.
The envelopes of the three PSD estimates are more pronounced and the spikes are lower compared with the original copies, since such moving average filters are low-pass filters. The quality of the estimates is therefore improved.
We then generate a WGN of length 1024 and divide it into 8 segments. The MSEs of the PSD estimates are
Segment# 1 2 3 4 5 6 7 8 Average
MSE 0.7248 1.1582 0.7996 0.7000 0.7936 1.2452 0.6300 1.1331 0.8985
By averaging the PSD estimates, the MSE of the new PSD estimate becomes 0.1193; compared with the average MSE of the 8 segments it is ∼7.5 times smaller, whereas it should drop by a factor of 8 in theory.
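A sketch of the segment-averaging just described, reusing the pgm sketch above:

% Sketch: average the periodograms of 8 non-overlapping 128-sample segments.
x = randn(1024, 1);
segs = reshape(x, 128, 8);                 % one segment per column
P = zeros(128, 8);
for k = 1:8
    P(:, k) = pgm(segs(:, k));             % per-segment PSD estimate
end
Pavg = mean(P, 2);                         % averaging cuts the variance ~8x
plot((0:127)/128, Pavg); xlabel('Frequency'); ylabel('Averaged PSD estimate');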
Figure 3: A PSD estimate of one segment of WGN (left), and the averaged PSD estimate (right)
[Figure: the original WGN and the filtered signal plotted against time step]
To assess the spectral properties of AR models, a 1064-sample WGN is filtered by an AR(1) filter with a = [1 0.9], namely H(z) = 1/(1 + 0.9z⁻¹). The plot above compares the original WGN and the filtered signal. It is clear that the filtered signal oscillates with larger amplitudes and that its high-frequency components are more pronounced.
We may then compare the PSD of the filter with that of the filtered WGN. The following plot shows that the filter is clearly a high-pass filter with a maximum PSD of 100. The empirically obtained cut-off frequency, at which the PSD falls to 50.4, is 0.487. The PSD of the filtered WGN is centred on that of the filter, oscillates about it, and shoots up as the frequency approaches 0.5.
[Figure: PSD estimate of the filter (left), and of the filter overlaid with a filtered WGN (right), against normalised frequency]
Zooming into f ∈ [0.4, 0.5], violent oscillations are prevalent in this region, and thus the variability of the PSD estimate obtained by feeding a WGN into the system is high. As the 1064 samples of a WGN realisation can be interpreted as an infinitely long realisation multiplied by a rectangular window in the time domain, the resulting PSD estimate is convolved with a sinc function in the frequency domain, which causes the oscillations.
(a) Filter and filtered WGN    (b) Filter and model-based estimation

Figure 6: The ideal PSD of the filter overlaid with different estimates
The PSD may also be estimated by using the PSD expression for an AR(1) process,

P̂y(f) = σ̂²X / |1 + â1 e^{−j2πf}|²   (2)
where â1 = −R̂Y(1)/R̂Y(0) and σ̂² = R̂Y(0) + â1R̂Y(1). From the calculations by xcorr, R̂Y(0) = 5.57985 and R̂Y(1) = −5.04679. Therefore, â1 ≈ 0.9045 and σ̂² ≈ 1.015.
From Fig. 6(b), it is clear that the model-based PSD estimate is close to the ideal PSD, with the curve sitting slightly above the ideal one because of the larger numerator and because |â1| is greater than |a1|.
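A sketch of this model-based estimate, computing R̂Y(0) and R̂Y(1) with xcorr and substituting into Eqn. (2):

% Sketch: AR(1) model-based PSD of the filtered WGN.
y = filter(1, [1 0.9], randn(1, 1064));
r = xcorr(y, 1, 'unbiased');               % [R(-1) R(0) R(1)]
Ry0 = r(2); Ry1 = r(3);
a1_hat = -Ry1 / Ry0;
s2_hat = Ry0 + a1_hat * Ry1;
f = 0:0.001:0.5;
Py = s2_hat ./ abs(1 + a1_hat * exp(-1j*2*pi*f)).^2;
plot(f, Py); xlabel('Normalised Frequency'); ylabel('Model-based PSD');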
We can then apply a similar analysis to the sunspot data by computing model-based PSD estimates with AR models of order p = {1, 2, 10}.
[Figure: PSD of the sunspot data (zero mean, left; original, right) overlaid with AR(1), AR(2) and AR(10) model-based estimates]
As the sunspot data is an AR(2) process, one may conclude that the AR(1) model underfits the data and AR(10) potentially overfits it. However, according to the plots, AR(10) has the best overall performance, as AR(2) loses the first peak while AR(1) loses the second; but as AR(2) is able to capture the most important frequencies in the zero-centred dataset, increasing the model order up to 10 is not necessary, since AR(10) is much more computationally complex. The effects of overfitting are not significant, probably due to the small size of the dataset (N = 288).
For an AR(p) process, the cumulative square error loss function for finding optimal partial corre-
lation functions is given by
J = Σ_{k=1}^{M} ( r̂xx[k] − Σ_{i=1}^{p} ai r̂xx[k − i] )²,   for M ≥ p   (4)
as the loss function can also be written as J = (x − s)ᵀ(x − s). By taking the gradient of J w.r.t. a and setting the result to 0, the optimal LSE estimator of a is â = (HᵀH)⁻¹Hᵀx, with minimum cost Jmin = xᵀ(x − Hâ); it is thus possible to determine the optimal model order by LSE.
We may also compare the estimate with the Yule-Walker results, where â = R⁻¹r̂, with r̂ = [r̂xx[1], ..., r̂xx[p]]ᵀ and the matrix R ∈ R^{p×p} given by

R = [ r̂xx[0]       r̂xx[1]       ...   r̂xx[p − 1]
      r̂xx[1]       r̂xx[0]       ...   r̂xx[p − 2]
      ...           ...                 ...
      r̂xx[p − 1]   r̂xx[p − 2]   ...   r̂xx[0]     ]   (6)
We may conclude that the LSE method is more computationally complex, as it requires more matrix multiplications and the dimension of the transformation matrix is larger (H ∈ R^{M×p} compared with R ∈ R^{p×p}); both methods require the calculation of a matrix whose entries are obtained from the ACF. Given that the random process x[n] is stochastic with a noise term w ∼ N(0, σ²) and rxx[m] = E(x[n]x[n + m]), the biased estimator of the ACF becomes

r̂xx[k] = Σ_{i=1}^{p} ai r̂xx[k − i] + ϵ[k]   (7)
It is clear that within the estimator, the stochastic error is still present and therefore the matrix
H is stochastic.
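A sketch of the LSE fit implied by Eqn. (7): stack the ACF regression into x = Ha + ϵ and solve the normal equations:

% Sketch: LSE estimate of AR(p) coefficients from the biased ACF.
load sunspot.dat; s = sunspot(:, 2); s = (s - mean(s)) / std(s);
r = xcorr(s, 'biased');
r = r(length(s):end);                      % r(1) = rxx[0], r(2) = rxx[1], ...
p = 2; M = 10;                             % model order, number of equations
xv = r(2:M+1);                             % left-hand side: rxx[1..M]
H = zeros(M, p);
for k = 1:M
    for i = 1:p
        H(k, i) = r(abs(k - i) + 1);       % rxx[k-i], using ACF symmetry
    end
end
a_hat = (H' * H) \ (H' * xv)               % optimal LSE coefficients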
We then test the LSE algorithm on estimating the best AR model for the (standardised) sunspot
data. The coefficients up to order 10 are
a1 0.7905
a2 1.5205 -0.8621
a3 1.4961 -0.8212 -0.0253
a4 1.4967 -0.8435 0.0115 -0.0227
a5 1.4966 -0.8473 0.0923 -0.1533 0.0797
a6 1.4836 -0.8287 0.0766 -0.0311 -0.1181 0.1205
a7 1.4620 -0.8120 0.0807 -0.0485 0.0311 -0.1247 0.1504
a8 1.4412 -0.7972 0.0757 -0.0435 0.0171 -0.0122 -0.0358 0.1159
a9 1.4304 -0.7950 0.0766 -0.0459 0.0199 -0.0212 0.0379 -0.0069 0.0774
a10 1.4222 -0.7954 0.0723 -0.0441 0.0168 -0.0178 0.0274 0.0784 -0.0641 0.0898
[Figure 8: MSE of the LSE fit against model order for (a) the standardised and (b) the original sunspot data]
Fig. 8(a) shows that, beyond order 2, the reduction in MSE is small for the standardised data (0.2403 at p = 10 versus 0.2631 at p = 2), which is consistent with the Yule-Walker results in Section 2.3, while Fig. 8(b) suggests that the optimal order for the original data is 3. We may suggest that standardising the data is a good way to reduce model complexity.
We then plot the model-based estimate of the PSD for the standardised data. According to the formula for the PSD of an AR model, the numerator is the variance of the driving noise, which is equivalent to the variance of the model residual, i.e. the interpolation MSE of the AR model on the original dataset. Normalising the PSD by this variance, we plot the model-based PSD for orders p = {1, 2, 10}.
[Figure: model-based PSD estimates of the standardised sunspot data for p = {1, 2, 10}]
The plot suggests that the AR(1) model captures the first peak well but gives no information about the second peak. AR(2) is a suitable model order, as it indicates the second peak well, although its fit to the first peak is not as good. The AR(10) model shows more information on the first peak than AR(2) but less than AR(1), and shoots up high at the second peak. The driving noise variance is estimated by σ² = 1/var(x̂[n]), where x̂[n] is a WGN ∼ N(0, 1) filtered by the corresponding AR model, considering that the dataset is standardised. The following plot suggests that
the MSE of the ACF estimation also varies with the data length. With the given AR(2) model, the MSE hits a minimum at N = 25, where MSE = 0.000621, then rises and oscillates over N ∈ [30, 150]. A second minimum occurs at N = 245, where MSE = 0.000607. We would recommend a data length of 245 as optimal; data lengths below 20 or between 30 and 100 should be avoided.
Figure 10: The change of MSE in ACF estimation with the data length
[Figure: segments of the dialled signal around t = 0.25 s and t = 0.75 s]
The dial tones of the digits 0 to 9 in the UK consist of 10 pairs of superimposed sinusoids, with frequencies ranging from 697 Hz to 1477 Hz. Therefore, a sampling frequency of 32768 Hz is appropriate, as it is more than 10 times the Nyquist-Shannon rate (2954 Hz, twice the highest frequency), so there is no aliasing in the sampled signal. The spectrogram can then be obtained by calculating the FFT of the 21 segments with a Hanning window. It is clear that within a dialling interval there are two peaks in the spectrum, corresponding to the two frequencies of the superimposed sine waves. For an idle interval, the FFT is constantly 0.
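A sketch of the dial-tone synthesis for two of the digits, assuming the standard DTMF frequency pairs quoted below:

% Sketch: synthesise '0' then '9' with idle gaps, and view the spectrogram.
fs = 32768;
lowf  = [941 852];                         % row frequencies for '0' and '9' (Hz)
highf = [1336 1477];                       % column frequencies for '0' and '9' (Hz)
t = 0:1/fs:0.25 - 1/fs;                    % 0.25 s tone
y = [];
for k = 1:2
    tone = sin(2*pi*lowf(k)*t) + sin(2*pi*highf(k)*t);
    y = [y, tone, zeros(1, length(t))];    % tone followed by an idle interval
end
spectrogram(y, hann(2048), 0, 2048, fs, 'yaxis');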
Given the generated sequence 02063853985, we may clearly identify the ten pairs of peaks in the frequency domain and hence the digits. For example, it is clear that for digit 0 the spectrogram peaks at 941 Hz and 1336 Hz, while for digit 9 the peaks are at 852 Hz and 1477 Hz. For real life
Figure 12: Spectrogram of the dial sequence with segment length 0.25s
signals, the true dial tone is always superimposed with noise, and thus the Fourier transform of the
resulting signal becomes ideally
F{ŷ[n]} = F{y[n] + w[n]} = SY(f) + σ²w.   (8)
We then corrupt the signal with WGN of standard deviation 0.5, 1 and 5. The corrupted copies are shown below.
[Figure: the corrupted signals with noise std 0.5, 1 and 5, plotted against time]
Fig. 14 shows the spectrogram of the corrupted signal with σw = 0.5. The ten pairs of peaks can be well detected. The background should ideally be constant at the noise variance, but it is not constant here due to the windowing effect.
Fig. 15 shows the spectrogram of the corrupted signal with σw = 1. The peaks are still identifiable at this noise variance, though the background offset is greater.
However, as Fig.16 suggests, the peaks in the frequency domain of the corrupted signal with σw = 5
Figure 14: Spectrogram of the corrupted signals with noise std 0.5
[Figure 15: Spectrogram of the corrupted signal with noise std 1]
[Figure 16: Spectrogram of the corrupted signal with noise std 5]
are hardly identifiable. This also implies the importance of filtering the signal as a preprocessing step, rather than merely subtracting the noise variance in the frequency domain.
It was shown in the previous sections that the variance of the periodogram estimate can be reduced by averaging windows of the same length across the dataset. For the three RRI trials, the periodogram obtained directly from the original data and the averaged periodogram with window size 200 make it difficult to determine the correct respiration frequency, with only RRI3 showing a sensible peak.
[Figure 17: periodograms of the original data for RRI1, RRI2 and RRI3, against normalised frequency]
Figure 18: Averaged periodograms with window size 200 time points
Figure 19: Averaged periodograms with window size 100 time points
By setting the window length to 100 time points and averaging accordingly, it is possible to determine clean peaks in the periodograms for all three trials, as shown in Fig. 19. The PSD estimates of the three trials show different respiratory frequencies, peaking at 0.01, 0.02 and 0.03 respectively, which suggests their dominant frequencies (i.e. the respiration frequencies) are 0.02, 0.04 and 0.06 of the sampling frequency.
wopt = Rxx⁻¹ pzx   (1)

where

Rxx = [ rxx(0)     rxx(−1)       ...   rxx(−Nw)
        rxx(1)     rxx(0)        ...   rxx(−Nw + 1)
        ...         ...                 ...
        rxx(Nw)    rxx(Nw − 1)   ...   rxx(0)       ],
pzx = [ rzx(0), rzx(−1), ..., rzx(−Nw) ]ᵀ   (2)
where Nw is the order of the MA filter minus 1. When fed with a WGN, the MA filter with b = [1 2 3 2 1] and a = [1] scales the variance of the input noise by 1² + 2² + 3² + 2² + 1² = 19 times, and thus its standard deviation by √19 ≈ 4.359 times. As the different time points of a white noise are uncorrelated and a white noise is stationary, ideally

var(y[n]) = E{y[n]²} = E{x[n]² + (2x[n − 1])² + (3x[n − 2])² + (2x[n − 3])² + x[n − 4]²}
          = var(x[n]) + 4 var(x[n]) + 9 var(x[n]) + 4 var(x[n]) + var(x[n]) = 19 var(x[n])   (3)

However, the scaling is not exactly √19 for the filtered signals, and thus the coefficients are de-normalised by multiplying by the empirical standard deviation of the filter output before normalisation.
The standardised signal is then corrupted by a WGN with σw = 0.1. Given that the variance of the standardised signal is 1, the SNR is then 10 log10(σ²y/σ²w) = 20 dB. The typical resulting filter coefficients are b = [1.0048 1.9860 3.0049 1.9881 1.0048]. The average cumulative square error (CSE) over 1000 different trials is 0.0029, which suggests a good estimation of the coefficients.
Then we conduct five different experiments with σ²w = {0.1, 1, 2, 5, 10}.

σ²w   SNR(dB)   b1   b2   b3   b4   b5   CSE(avg)
0.1 10 0.9825 2.0279 2.9505 2.0173 0.9377 0.0113
1 0 1.0896 2.1033 2.9492 2.2277 1.0353 0.0982
2 -3 1.4715 1.7197 3.1560 1.7209 1.0232 0.1898
5 -7 0.7177 1.9841 2.8272 0.3609 1.4751 0.4725
10 -10 0.9241 1.7518 2.8195 1.6740 1.6867 0.9325
In the table above, the coefficients are taken from a single estimation and the CSE is averaged over 1000 different realisations. As the variance of the noise increases, it is clear that the cumulative square error increases proportionally with σ²w and the predictions get worse. The calculated wopt
is ideally invariant if we use Nw greater than 4: w5 is effectively zero, since the final term of pzx is effectively zero because its index exceeds the order of the filter. The results for the first five terms of wopt are effectively invariant because the ACF of a white noise is effectively zero at lags other than 0; the results may become more erroneous as more terms enter the calculation of Rxx⁻¹. The following are the average wopt over 100 experiments for Nw = {4, 5} and σ²w = 0.01.
b1 b2 b3 b4 b5 b6 CSE
Nw = 4 1.0162 1.9954 2.9479 1.9747 0.9882 — 0.0027
Nw = 5 1.0151 2.0198 2.9850 1.9722 0.9797 -0.0139 0.0030
The results in Table 2 agree with our assumption, the 6th term of wopt for Nw = 5 being effectively 0. With Nw = 5 the error is slightly higher than with Nw = 4, which also agrees with the assumption.
The number of flops needed to calculate wopt can then be estimated. Calculating the unbiased estimate of the ACF needs (N + 1)² operations, the matrix multiplication requires Nw(Nw − 1) operations, and we need a further O(N³w) operations to calculate the inverse of Rxx. Therefore, we may conclude that the Wiener filter is computationally complex and may be slow for certain practical usages; adaptive filters are thus important, as they increase the calculation speed drastically at the cost of accuracy.
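A sketch of the Wiener solution for the setup above; note that MATLAB's xcorr lag convention means the positive lags of the cross-correlation play the role of pzx here:

% Sketch: estimate the MA coefficients with w_opt = Rxx^{-1} p_zx.
Nw = 4; N = 1000;
x = randn(N, 1);                           % WGN input
z = filter([1 2 3 2 1], 1, x) + 0.1*randn(N, 1);   % noisy reference output
rxx = xcorr(x, Nw, 'unbiased');
Rxx = toeplitz(rxx(Nw+1:end));             % symmetric Toeplitz from rxx(0..Nw)
rzx = xcorr(z, x, Nw, 'unbiased');
pzx = rzx(Nw+1:end);                       % rzx at lags 0..Nw
w_opt = Rxx \ pzx(:)                       % should approach [1 2 3 2 1]'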
By calculating the coefficients adaptively using the formulas suggested in Eqns (39-41) of the coursework manual for the SNR = 20 dB case with µ = 0.01, a typical final estimate of wopt is [1.0098 2.0018 2.9928 1.9880 0.9920], which is close to the ideal value with a CSE of 0.0015; but as the weights are constantly changing, the results are not as good before the weights converge to their ideal values. A good adaptation gain is µ = 0.01, with which the coefficients converge after step 200 with minor fluctuations caused by the additive noise (Fig. 1). If the adaptation gain is too low, the weights do not have time to converge to their ideal values, and with too high an adaptation gain the coefficients oscillate constantly and have no chance to converge either. For µ = 0.02, the coefficients begin to converge to the Wiener filter results after approximately time step 100, as shown by Fig. 2(a), but the oscillations are clearly greater than for µ = 0.01, which suggests this adaptation gain is slightly above optimum.
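A sketch of the LMS loop in its standard form (the exact Eqns (39-41) of the coursework manual are not reproduced here):

% Sketch: LMS identification of the MA coefficients.
Nw = 4; N = 1000; mu = 0.01;
x = randn(N, 1);
z = filter([1 2 3 2 1], 1, x) + 0.1*randn(N, 1);
w = zeros(Nw+1, 1); e = zeros(N, 1);
for n = Nw+1:N
    xvec = x(n:-1:n-Nw);                   % current and past Nw inputs
    e(n) = z(n) - w' * xvec;               % instantaneous error
    w = w + mu * e(n) * xvec;              % LMS weight update
end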
[Figure 1: Coefficient evolution (left) and squared error (right) with LMS, µ = 0.01]
[Figure 2: Coefficient evolution and squared error with LMS, µ = 0.02]
The calculation of ŷ[n] requires Nw(Nw + 1) operations, the calculation of the error requires 1 operation, and the update of the parameters requires 2Nw + 3 operations (Nw + 1 for the vector addition, 1 for the calculation of µe[n], and Nw + 1 for the multiplication of a scalar with a vector). Therefore, for the whole process with N time steps, we need N(Nw(Nw + 1) + 2Nw + 3) operations, which is much less computationally complex than the Wiener filter.
Although with µ = 0.01 the coefficients are able to converge to their true values, there are some minor fluctuations and the rise time is not optimal. We then added an extra subroutine to schedule the adaptation rate as follows:

µ[n] = µ0 ((e[n] − e[n − 1]) / (e[n − 1] − e[n − 2]))^α ((N − n)/N)^β,   ∀n ≥ 2   (4)
After several trials, we empirically selected µ0 = 0.05, α = 0.6 and β = 3. As shown by Fig. 3, the overall squared error is reduced compared with Fig. 1, the maximum error not exceeding 25. The rise time is much shorter, as the coefficients converge to their true values after step 100 with little overshoot. The overall learning rate decays as the time step increases, and little fluctuation is evident for n > 400. Although there are still minor spikes after convergence, these are due to the additive noise and can cause oscillations if the second decaying factor is not set. By setting the second decaying factor, overfitting is much mitigated.
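A sketch of Eqn. (4) wired into the LMS loop above; the absolute value guarding the fractional power and the check on the denominator are added assumptions:

% Sketch: gear-shifted LMS with the scheduled adaptation gain of Eqn. (4).
mu0 = 0.05; alpha = 0.6; beta = 3;
Nw = 4; N = 1000;
x = randn(N, 1);
z = filter([1 2 3 2 1], 1, x) + 0.1*randn(N, 1);
w = zeros(Nw+1, 1); e = zeros(N, 1); mu = mu0;
for n = Nw+1:N
    xvec = x(n:-1:n-Nw);
    e(n) = z(n) - w' * xvec;
    if n >= Nw+3 && e(n-1) ~= e(n-2)       % schedule needs two past errors
        mu = mu0 * abs((e(n)-e(n-1))/(e(n-1)-e(n-2)))^alpha * ((N-n)/N)^beta;
    end
    w = w + mu * e(n) * xvec;
end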
Figure 3: Evolution of the five coefficients with µ0 = 0.05 with gear shifting
H(z) = X(z)/W(z) = 1/(1 − a1z⁻¹ − a2z⁻²)   (7)
the errors and coefficients are therefore determined by the LMS algorithm.
The LMS algorithm is then tested with 4 different adaptation gains (0.01, 0.002, 0.0001, 0.1). The input signal is a WGN with 10000 time points.
[Figure: evolution of the estimated AR coefficients for the four adaptation gains, against time step]
With µ = {0.01, 0.002}, the performances of the algorithm are similar, with MSEs w.r.t. the synthesised signal of 1.0283 and 1.0271; the predicted AR coefficients oscillate more with µ = 0.01. With µ = 0.0001, the adaptation gain is too small for the predictions to converge to their true values, and the MSE is thus much worse (1.3092). With µ = 0.1, the adaptation gain is too large and the predictions diverge to arbitrarily large values, with MSE = 25.2297.
The LMS algorithm can also be applied to determine the AR model corresponding to the pronunciation of a letter. We investigate AR models corresponding to the pronunciations of 'e', 'a', 's', 't' and 'x' in this section. The following table suggests sensible model orders and adaptation gains with and without gear shifting.
We may then explain two examples from the table in detail, 'e' and 'a'. For letter 'e', a sensible setting is AR(1) with µ = 10 without gear shifting and µ = 50 with gear shifting. The MSE is 5.89e-05 without gear shifting and 5.33e-05 with gear shifting.
[Figures: x and x̂ against time, and the coefficient evolution with and without gear shifting, for letters 'e' and 'a']
A typical well-converging model for letter 'a' is AR(10) with µ = 5. The MSE of the prediction is 6.95e-05. The coefficients converge to constants after time step 300, with minor changes observable due to the non-stationary nature of the input signal. With gear shifting, µ is set to 10. The fluctuation in the predicted coefficients is much reduced, but the MSE increases to 1.62e-04. The extrapolation error may nevertheless be reduced, due to less over-fitting.
The correct filter length, i.e. model order, may be determined heuristically by assessing the vari-
ability of the coefficients in a given model order; if the coefficients vary too much, we may conclude
that the given model order is not robust and the coefficients must change with time to fit the
data. MDL, AIC and AICc criteria may also be applied to determine the optimal model order
analytically, where the MSE may be used as the error term.
We may also assess the performance of AR models of different orders with the prediction gain Rp = 10 log10(σ²x/σ²e). Here we investigate the models without gear shifting, with the adaptation gains suggested in Table 3.
[Figure: prediction gain against model order for letters 'e', 'a', 's', 't' and 'x' at fs = 44100 Hz]
The plots above suggest that the sensible orders are well chosen. The prediction gains of 'e', 'a' and 't' drop quickly after certain model orders, and before that the Rp values remain constant, suggesting no big change from altering the model order. Such an effect is not observed in the Rp values of 's' and 'x', but the plots show that it is not worthwhile increasing the model order for the marginal increase in Rp.
For fs = 16000 Hz, the selection of model order remains unchanged according to the plots of Rp, but the Rp values at these orders are generally lower than those at fs = 44100 Hz. For example, the prediction gain for 'e' is 23 dB at order 1 when fs = 44100 Hz, but only 16.5 dB when fs = 16000 Hz. As we are sampling 1000 time points, a lower sampling frequency covers a longer time frame. The prediction results are therefore more exposed to the non-stationarity caused by changes within the pronunciation of a letter, for example the change from [k] to [s] in letter 'x', and are thus more likely not to converge.
[Figure: prediction gain against model order for the five letters at fs = 16000 Hz]
In order to reduce computational complexity, the LMS algorithm may be further simplified into the following sign variants, in which the exact contribution of the error term to the weight update is not explicitly calculated. We assess the following algorithms using the data generated in Section 4.4 and letter 'e' from Section 4.5.
The vanilla LMS algorithm, sign-error, signed-regressor and sign-sign variants are plotted in red, blue, yellow and magenta respectively. For the synthesised signal in Section 4.4, all algorithms converge to the theoretical AR coefficients with comparable performance, and the vanilla LMS algorithm slightly outperforms the others (MSE = {1.05, 1.06, 1.07, 1.09}). For letter 'e', the adaptation gains of the four algorithms must be set to distinct values to produce decent results, and are set to 50, 0.5, 2 and 0.02 respectively. The MSEs, listed in the same order, are 1.64e-05, 2.73e-05, 1.33e-05 and 1.89e-05. Although the result from the vanilla LMS is not the best by MSE, the evolution of its AR coefficient involves less fluctuation.
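For reference, a sketch of the four weight updates compared above; only the update line differs between the variants:

% Sketch: the four LMS update rules as function handles.
update = { ...
    @(w, mu, e, x) w + mu * e * x, ...              % vanilla LMS
    @(w, mu, e, x) w + mu * sign(e) * x, ...        % sign-error
    @(w, mu, e, x) w + mu * e * sign(x), ...        % signed-regressor
    @(w, mu, e, x) w + mu * sign(e) * sign(x)};     % sign-sign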
[Figure: coefficient evolution of the four LMS variants for the synthesised signal (left) and letter 'e' (right)]
The cost function that should be minimised for the maximum-likelihood estimate (MLE) is therefore

J(θ) = Σ_{n=0}^{N−1} (x[n] − A cos(2πf0n + ϕ))²   (2)
In general, a sinusoid can be split into a sine and a cosine component of the same frequency. Writing c = [1, cos(2πf0), ..., cos(2πf0(N − 1))]ᵀ and s = [0, sin(2πf0), ..., sin(2πf0(N − 1))]ᵀ, the squared part in Eqn. (2) becomes

x − α1c − α2s,   where x = [x[0], x[1], ..., x[N − 1]]ᵀ   (4)
which therefore can be expressed in a matrix form as x − Hα where H = [c, s] and α = [α1 , α2 ]T .
As the squared sum can be expressed by a dot product w.r.t. itself of a vector, it is proven that
J(θ) may be mapped to
J ′ (α1 , α2 , f0 ) = (x − α1 c − α2 s)T (x − α1 c − α2 s) = (x − Hα)T (x − Hα) = J ′ (α, f0 ) (5)
The optimum estimator of the parameters, denoted by α̂, is given by α̂ = (HT H)−1 HT x. By
plugging α̂ into Eqn.(5), the minimum loss given by α is therefore
J′min,α(f0) = (x − Hα̂)ᵀ(x − Hα̂)
            = (x − H(HᵀH)⁻¹Hᵀx)ᵀ(x − H(HᵀH)⁻¹Hᵀx)
            = xᵀx − 2xᵀH(HᵀH)⁻¹Hᵀx + (H(HᵀH)⁻¹Hᵀx)ᵀ(H(HᵀH)⁻¹Hᵀx)
            = xᵀx − 2xᵀH(HᵀH)⁻¹Hᵀx + xᵀH(HᵀH)⁻¹HᵀH(HᵀH)⁻¹Hᵀx
            = const. − xᵀH(HᵀH)⁻¹Hᵀx   (6)
Therefore, to minimise J′min,α(f0) by altering the frequency, we may maximise the non-constant term in Eqn. (6), namely xᵀH(HᵀH)⁻¹Hᵀx, as it is subtracted from the constant. ■
The term xᵀH(HᵀH)⁻¹Hᵀx is quadratic in x and is weighted by H(HᵀH)⁻¹Hᵀ. As H = [c, s],

xᵀH(HᵀH)⁻¹Hᵀx = xᵀ [c s] [ cᵀc  cᵀs ; sᵀc  sᵀs ]⁻¹ [ cᵀ ; sᵀ ] x   (7)

which can be rearranged into a quadratic form in the projections cᵀx and sᵀx:

xᵀH(HᵀH)⁻¹Hᵀx = [ cᵀx  sᵀx ] [ cᵀc  cᵀs ; sᵀc  sᵀs ]⁻¹ [ cᵀx ; sᵀx ]   (8)
By the orthogonality properties of sinusoids, HᵀH ≈ [ N/2  0 ; 0  N/2 ]: cosine and sine are orthogonal to each other, so the cross sums are 0, while the in-phase terms are summed over N samples with an average value of 1/2. Thus,

xᵀH(HᵀH)⁻¹Hᵀx = [ cᵀx  sᵀx ] [ N/2  0 ; 0  N/2 ]⁻¹ [ cᵀx ; sᵀx ] = (2/N)((cᵀx)² + (sᵀx)²)   (9)
The optimum estimate of f0 can then be found as the f0 that maximises

(2/N)(xᵀccᵀx + xᵀssᵀx) = (2/N)[(Σ_{n=0}^{N−1} x[n] cos(2πf0n))² + (Σ_{n=0}^{N−1} x[n] sin(2πf0n))²] = (2/N)|Σ_{n=0}^{N−1} x[n] e^{−j2πf0n}|²   (10)
as the cosine and sine terms can be treated as the real and imaginary parts of e^{−j2πf0n}. Comparing Eqn. (10) with the PSD estimate P̂X(f) = (1/N)|Σ_{n=0}^{N−1} x[n] e^{−j2πfn/N}|², the f0 in Eqn. (10) is a normalised frequency, f0 ∈ [0, 1], while the f term in the PSD formula indexes data points, the corresponding normalised frequency being f/N. Eqn. (10) is therefore nothing other than the PSD formula in normalised frequency, scaled by a factor of 2, provided the dataset is sufficiently large and the frequency is not close to 0 or 0.5. Therefore, the optimum frequency estimate f̂0 can be found by maximising the periodogram. ■
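A sketch of the resulting estimator: locate the periodogram peak over f ∈ [0, 0.5):

% Sketch: MLE of the frequency of a noisy sinusoid via the periodogram.
f0 = 0.25; N = 256; n = 0:N-1;
x = cos(2*pi*f0*n + pi/3) + 0.5*randn(1, N);
P = abs(fft(x)).^2 / N;                    % periodogram
[~, idx] = max(P(1:N/2));                  % restrict the search to [0, 0.5)
f0_hat = (idx - 1) / N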
[Figure: MLE estimates and periodograms for f0 = 0.25, 0.4 and 0.495, against normalised frequency]
From the plots of the periodograms and MLE estimates, the two are consistent with each other for f0 = 0.25 and f0 = 0.4, while for f0 = 0.495 the MLE estimate loses all information, becoming flat after f = 0.48. A theoretical justification is that as f0 approaches 0 or 0.5, the sine component s tends to the zero vector. The matrix HᵀH then becomes singular and thus not invertible, and the MLE estimate becomes meaningless as the result does not converge.