
Lecture 4 - Spectral Estimation

The Discrete Fourier Transform

The Discrete Fourier Transform (DFT) is the equivalent of the continuous Fourier
Transform for signals known only at N instants separated by sample times T (i.e.
a finite sequence of data).
Let f(t) be the continuous signal which is the source of the data. Let N samples
be denoted f[0], f[1], f[2], . . . , f[k], . . . , f[N − 1].
The Fourier Transform of the original signal, f(t), would be

$$F(j\omega) = \int_{-\infty}^{\infty} f(t)e^{-j\omega t}\,dt$$

We could regard each sample f[k] as an impulse having area f[k]. Then, since
the integrand exists only at the sample points:

$$F(j\omega) = \int_{0}^{(N-1)T} f(t)e^{-j\omega t}\,dt
= f[0]e^{-j0} + f[1]e^{-j\omega T} + \ldots + f[k]e^{-j\omega kT} + \ldots + f[N-1]e^{-j\omega(N-1)T}$$

$$\text{i.e.}\quad F(j\omega) = \sum_{k=0}^{N-1} f[k]e^{-j\omega kT}$$
We could in principle evaluate this for any ω, but with only N data points to start
with, only N final outputs will be significant.
You may remember that the continuous Fourier transform could be evaluated
over a finite interval (usually the fundamental period $T_o$) rather than from −∞ to
+∞ if the waveform was periodic. Similarly, since there are only a finite number
of input data points, the DFT treats the data as if it were periodic (i.e. f[N] to
f[2N − 1] is the same as f[0] to f[N − 1]).
Hence the sequence shown below in Fig. 4.1(a) is considered to be one period
of the periodic sequence in plot (b).

Figure 4.1: (a) Sequence of N = 10 samples. (b) Implicit periodicity in DFT.

Since the operation treats the data as if it were periodic, we evaluate the
DFT equation for the fundamental frequency (one cycle per sequence, $\frac{1}{NT}$ Hz,
$\frac{2\pi}{NT}$ rad/sec) and its harmonics (not forgetting the d.c. component (or average)
at ω = 0);

$$\text{i.e. set}\quad \omega = 0,\ \frac{2\pi}{NT},\ \frac{2\pi}{NT}\times 2,\ \ldots,\ \frac{2\pi}{NT}\times n,\ \ldots,\ \frac{2\pi}{NT}\times(N-1)$$
or, in general

$$F[n] = \sum_{k=0}^{N-1} f[k]e^{-j\frac{2\pi}{N}nk} \qquad (n = 0 : N-1)$$
F [n] is the Discrete Fourier Transform of the sequence f [k]. We may write this
equation in matrix form as:
$$\begin{pmatrix} F[0] \\ F[1] \\ F[2] \\ \vdots \\ F[N-1] \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & W & W^2 & W^3 & \cdots & W^{N-1} \\
1 & W^2 & W^4 & W^6 & \cdots & W^{N-2} \\
1 & W^3 & W^6 & W^9 & \cdots & W^{N-3} \\
\vdots & & & & & \vdots \\
1 & W^{N-1} & W^{N-2} & W^{N-3} & \cdots & W
\end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ \vdots \\ f[N-1] \end{pmatrix}$$

where $W = \exp(-j2\pi/N)$ and $W^N = W^{2N} = \cdots = 1$.
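
The matrix form above maps directly onto a few lines of numpy; a minimal O(N²) sketch (the function names here are illustrative; np.fft.fft is the fast equivalent):

```python
import numpy as np

def dft_matrix(N):
    # Element (n, k) is W^(n*k) with W = exp(-j 2 pi / N)
    n = np.arange(N)
    W = np.exp(-2j * np.pi / N)
    return W ** np.outer(n, n)

def dft(f):
    # Direct O(N^2) evaluation of F[n] = sum_k f[k] W^(nk)
    return dft_matrix(len(f)) @ np.asarray(f)

# Sanity check against the fast implementation:
# np.allclose(dft(x), np.fft.fft(x)) should hold for any real or complex x.
```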
77
DFT – example

Let the continuous signal be

$$f(t) = \underbrace{5}_{\text{dc}} + \underbrace{2\cos(2\pi t - 90^\circ)}_{\text{1 Hz}} + \underbrace{3\cos(4\pi t)}_{\text{2 Hz}}$$

Figure 4.2: Example signal for DFT.

Let us sample f(t) at 4 times per second (i.e. $f_s = 4$ Hz) from t = 0 to t = 3/4.

The values of the discrete samples are given by:

$$f[k] = 5 + 2\cos\left(\tfrac{\pi}{2}k - 90^\circ\right) + 3\cos(\pi k) \quad \text{by putting } t = kT_s = \tfrac{k}{4}$$

i.e. f[0] = 8, f[1] = 4, f[2] = 8, f[3] = 0, (N = 4)

Therefore

$$F[n] = \sum_{k=0}^{3} f[k]e^{-j\frac{\pi}{2}nk} = \sum_{k=0}^{3} f[k](-j)^{nk}$$

$$\begin{pmatrix} F[0] \\ F[1] \\ F[2] \\ F[3] \end{pmatrix} =
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ f[3] \end{pmatrix} =
\begin{pmatrix} 20 \\ -j4 \\ 12 \\ j4 \end{pmatrix}$$

The magnitude of the DFT coefficients is shown below in Fig. 4.3.

Figure 4.3: DFT of four point sequence.
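
The example is easy to reproduce numerically (np.fft.fft computes the same DFT, up to floating-point rounding):

```python
import numpy as np

k = np.arange(4)                       # N = 4 samples at fs = 4 Hz, t = k/4
f = 5 + 2*np.cos(np.pi/2*k - np.pi/2) + 3*np.cos(np.pi*k)
print(f)                               # [8. 4. 8. 0.]
print(np.round(np.fft.fft(f)))         # [20.+0.j  0.-4.j  12.+0.j  0.+4.j]
```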

Inverse Discrete Fourier Transform

The inverse transform of

$$F[n] = \sum_{k=0}^{N-1} f[k]e^{-j\frac{2\pi}{N}nk}$$

is

$$f[k] = \frac{1}{N}\sum_{n=0}^{N-1} F[n]e^{+j\frac{2\pi}{N}nk}$$

i.e. the inverse matrix is $\frac{1}{N}$ times the complex conjugate of the original (symmetric) matrix.
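
This conjugate-matrix relationship can be checked numerically; a short sketch using the four point example (variable names illustrative):

```python
import numpy as np

N = 4
n = np.arange(N)
W = np.exp(-2j * np.pi / N) ** np.outer(n, n)   # forward DFT matrix
f = np.array([8.0, 4.0, 8.0, 0.0])
F = W @ f
f_back = (np.conj(W) @ F) / N                   # (1/N) x conjugate matrix
print(np.allclose(f_back.real, f))              # True
```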

Note that the F[n] coefficients are complex. We can assume that the f[k] values
are real (this is the simplest case; there are situations (e.g. radar) in which two
inputs, at each k, are treated as a complex pair, since they are the outputs from
0° and 90° demodulators).

In the process of taking the inverse transform, the terms F[n] and F[N − n]
(remember that the spectrum is symmetrical about N/2) combine to produce two
frequency components, only one of which is considered to be valid: the one at
the lower of the two frequencies, $\frac{n}{NT}$ Hz where n ≤ N/2; the higher frequency
component is at an "aliasing frequency" (n > N/2).

From the inverse transform formula, the contribution to f[k] of F[n] and F[N − n]
is:

$$f_n[k] = \frac{1}{N}\left\{F[n]e^{j\frac{2\pi}{N}nk} + F[N-n]e^{j\frac{2\pi}{N}(N-n)k}\right\} \qquad (4.2)$$
N
For all f[k] real,

$$F[N-n] = \sum_{k=0}^{N-1} f[k]e^{-j\frac{2\pi}{N}(N-n)k}$$

But

$$e^{-j\frac{2\pi}{N}(N-n)k} = \underbrace{e^{-j2\pi k}}_{1 \text{ for all } k}\, e^{+j\frac{2\pi}{N}nk} = e^{+j\frac{2\pi}{N}nk}$$

i.e. $F[N-n] = F^*[n]$ (i.e. the complex conjugate)

Substituting into the equation for $f_n[k]$ above gives

$$f_n[k] = \frac{1}{N}\left\{F[n]e^{j\frac{2\pi}{N}nk} + F^*[n]e^{-j\frac{2\pi}{N}nk}\right\} \quad \text{since } e^{j2\pi k} = 1$$

$$\text{i.e.}\quad f_n[k] = \frac{2}{N}\left\{\operatorname{Re}\{F[n]\}\cos\frac{2\pi}{N}nk - \operatorname{Im}\{F[n]\}\sin\frac{2\pi}{N}nk\right\}$$

$$\text{or}\quad f_n[k] = \frac{2}{N}|F[n]|\cos\left\{\frac{2\pi n}{NT}\,kT + \arg(F[n])\right\}$$

i.e. a sampled sine wave at $\frac{n}{NT}$ Hz ($\frac{2\pi n}{NT}$ rad/s), of magnitude $\frac{2}{N}|F[n]|$.

For the special case of n = 0, $F[0] = \sum_k f[k]$ (i.e. the sum of all samples), and the
contribution of F[0] to f[k] is $f_0[k] = \frac{1}{N}F[0]$ = average of f[k] = d.c. component.

Interpretation of example

1. F[0] = 20 implies a d.c. value of $\frac{1}{N}F[0] = \frac{20}{4} = 5$ (as expected)

2. F[1] = −j4 = F*[3] implies a fundamental component of peak amplitude
   $\frac{2}{N}|F[1]| = \frac{2}{4}\times 4 = 2$ with phase given by $\arg F[1] = -90^\circ$;
   i.e. $2\cos\left(\frac{2\pi}{NT}kT - 90^\circ\right) = 2\cos\left(\frac{\pi}{2}k - 90^\circ\right)$ (as expected)
NT 2

3. F[2] = 12 ($n = \frac{N}{2}$; there is no other N − n component here) and this implies a
   component

   $$f_2[k] = \frac{1}{N}F[2]e^{j\frac{2\pi}{N}\cdot 2k} = \frac{1}{4}F[2]e^{j\pi k} = 3\cos \pi k \quad \text{(as expected)}$$

   since $\sin \pi k = 0$ for all k.

Thus, the conventional way of displaying a spectrum is not as shown in Fig. 4.3
but as shown in Fig. 4.4 (obviously, the information content is the same):

Figure 4.4: DFT of four point signal.

In typical applications, N is much greater than 4; for example, for N = 1024, F[n]
has 1024 components, but components 513 to 1023 are the complex conjugates of
components 511 to 1, leaving $\frac{F[0]}{1024}$ as the d.c. component,
$\frac{2}{1024}\frac{|F[1]|}{\sqrt{2}}$ to $\frac{2}{1024}\frac{|F[511]|}{\sqrt{2}}$ as complete a.c.
components, and $\frac{1}{1024}\frac{F[512]}{\sqrt{2}}$ as the cosine-only component at the highest distin-
guishable frequency (n = N/2).

Most computer programmes evaluate $\frac{2|F[n]|}{N}$ (or $\frac{2|F[n]|^2}{N}$ for the power spectral den-
sity), which gives the correct “shape” for the spectrum, except for the values at
n = 0 and N/2.
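
A minimal numpy sketch of this display convention, assuming a real signal and even N (the function name and interface are illustrative):

```python
import numpy as np

def display_spectrum(f, fs):
    # One-sided amplitude spectrum, scaled as 2|F[n]|/N,
    # except at n = 0 and n = N/2, which each appear only once.
    N = len(f)
    F = np.fft.fft(f)
    amp = 2.0 * np.abs(F[:N // 2 + 1]) / N
    amp[0] /= 2.0                 # d.c. component is F[0]/N
    amp[-1] /= 2.0                # Nyquist (n = N/2) component is F[N/2]/N
    freqs = np.arange(N // 2 + 1) * fs / N     # n/(N T) in Hz
    return freqs, amp

# For the four point example, display_spectrum([8, 4, 8, 0], 4) returns
# amplitudes [5, 2, 3] at 0, 1 and 2 Hz: the d.c. value and the peak
# a.c. amplitudes of the example signal.
```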

4.1 Discrete Fourier Transform Errors

To what degree does the DFT approximate the Fourier transform of the function
underlying the data? Clearly the DFT is only an approximation since it provides
only for a finite set of frequencies. But how correct are these discrete values
themselves? There are two main types of DFT errors: aliasing and “leakage”.

4.1.1 Aliasing

This is another manifestation of the phenomenon which we have now encountered
several times. If the initial samples are not sufficiently closely spaced to
represent high-frequency components present in the underlying function, then
the DFT values will be corrupted by aliasing. As before, the solution is either
to increase the sampling rate (if possible) or to pre-filter the signal in order to
minimise its high-frequency spectral content.
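
A short numerical illustration (the sampling rate and signal frequency are arbitrary choices):

```python
import numpy as np

fs, N = 10.0, 64
t = np.arange(N) / fs
# A 7 Hz sine sampled at 10 Hz exceeds the 5 Hz Nyquist limit,
# so its energy appears near the alias frequency 10 - 7 = 3 Hz.
x = np.sin(2 * np.pi * 7.0 * t)
freqs = np.fft.rfftfreq(N, d=1/fs)
print(freqs[np.argmax(np.abs(np.fft.rfft(x)))])   # approx. 3 Hz, not 7 Hz
```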

4.1.2 Leakage

Recall that the continuous Fourier transform of a periodic waveform requires the
integration to be performed over the interval −∞ to +∞ or over an integer
number of cycles of the waveform. If we attempt to complete the DFT over
a non-integer number of cycles of the input signal, then we might expect the
transform to be corrupted in some way. This is indeed the case, as will now be
shown.
Consider the case of an input signal which is a sinusoid with a fractional number
of cycles in the N data samples. The DFT for this case (for n = 0 to n = N/2) is
shown below in Fig. 4.5.
Figure 4.5: Leakage.

We might have expected the DFT to give an output at just the quantised fre-
quencies either side of the true frequency. This certainly does happen but we
also find non-zero outputs at all other frequencies. This smearing effect, which
is known as leakage, arises because we are effectively calculating the Fourier se-
ries for the waveform in Fig. 4.6, which has major discontinuities, hence other
frequency components.
Figure 4.6: Leakage. The repeating waveform has discontinuities.

Most sequences of real data are much more complicated than the sinusoidal
sequences that we have so far considered and so it will not be possible to avoid
introducing discontinuities when using a finite number of points from the sequence
in order to calculate the DFT.
The solution is to use one of the window functions which we encountered in the
design of FIR filters (e.g. the Hamming or Hanning windows). These window
functions taper the samples towards zero values at both endpoints, and so there
is no discontinuity (or very little, in the case of the Hanning window) with a
hypothetical next period. Hence the leakage of spectral content away from its
correct location is much reduced, as in Fig. 4.7.
Figure 4.7: Leakage is reduced using a Hanning window.
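
A short numerical demonstration, using an arbitrarily chosen non-integer number of cycles:

```python
import numpy as np

N = 64
k = np.arange(N)
x = np.sin(2 * np.pi * 4.3 * k / N)   # 4.3 cycles: non-integer, so leakage

raw = np.abs(np.fft.rfft(x))
win = np.abs(np.fft.rfft(x * np.hanning(N)))
# Away from the true frequency (near bin 4), the Hanning-windowed
# spectrum falls off far faster than the unwindowed (rectangular) one.
print(raw[10], win[10])
```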

Stochastic Models
4.2 Introduction

We now discuss autocorrelation and autoregressive processes; that is, the corre-
lation between successive values of a time series and the linear relations between
them. We also show how these models can be used for spectral estimation.

4.3 Autocorrelation

Given a time series $x_t$ we can produce a lagged version of the time series $x_{t-T}$
which lags the original by T samples. We can then calculate the covariance
between the two signals

$$\sigma_{xx}(T) = \frac{1}{N-1}\sum_{t=1}^{N}(x_{t-T} - \mu_x)(x_t - \mu_x) \qquad (4.3)$$

where $\mu_x$ is the signal mean and there are N samples. We can then plot $\sigma_{xx}(T)$
as a function of T. This is known as the autocovariance function. The autocor-
relation function is a normalised version of the autocovariance

$$r_{xx}(T) = \frac{\sigma_{xx}(T)}{\sigma_{xx}(0)} \qquad (4.4)$$
Note that $\sigma_{xx}(0) = \sigma_x^2$, and hence $r_{xx}(0) = 1$. Also, because $\sigma_{xy} = \sigma_{yx}$, we
have $r_{xx}(T) = r_{xx}(-T)$; the autocorrelation (and autocovariance) are symmetric,
or even, functions. Figure 4.8 shows a signal and a lagged version of it
and Figure 4.9 shows the autocorrelation function.

Figure 4.8: Signal $x_t$ (top) and $x_{t+5}$ (bottom). The bottom trace leads the top trace by 5 samples. Or we
may say it lags the top by −5 samples.

Figure 4.9: Autocorrelation function for $x_t$. Notice the negative correlation at lag 20 and positive correlation
at lag 40. Can you see from Figure 4.8 why these should occur?
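
A direct transcription of equations 4.3 and 4.4 into numpy (the function name is illustrative; for long series an FFT-based implementation would be faster):

```python
import numpy as np

def autocorrelation(x, max_lag):
    # r_xx(T) for T = 0..max_lag (max_lag < N); negative lags follow by symmetry.
    x = np.asarray(x, dtype=float)
    N = len(x)
    mu = x.mean()
    cov = np.array([np.sum((x[T:] - mu) * (x[:N - T] - mu)) / (N - 1)
                    for T in range(max_lag + 1)])
    return cov / cov[0]            # normalise by the variance, sigma_xx(0)
```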

4.4 Autoregressive models

An autoregressive (AR) model predicts the value of a time series from previous
values. A pth order AR model is defined as

$$x_t = \sum_{i=1}^{p} x_{t-i}a_i + e_t \qquad (4.5)$$

where $a_i$ are the AR coefficients and $e_t$ is the prediction error. These errors
are assumed to be Gaussian with zero mean and variance $\sigma_e^2$. It is also possible
to include an extra parameter $a_0$ to soak up the mean value of the time series.
Alternatively, we can first subtract the mean from the data and then apply the
zero-mean AR model described above. We would also subtract any trend from
the data (such as a linear or exponential increase) as the AR model assumes
stationarity.
The above expression shows the relation for a single time step. To show the
relation for all time steps we can use matrix notation.
We can write the AR model in matrix form by making use of the embedding
matrix, M, and by writing the signal and AR coefficients as vectors. We now
illustrate this for p = 4. This gives

$$M = \begin{bmatrix}
x_4 & x_3 & x_2 & x_1 \\
x_5 & x_4 & x_3 & x_2 \\
\vdots & \vdots & \vdots & \vdots \\
x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4}
\end{bmatrix} \qquad (4.6)$$
We can also write the AR coefficients as a vector $a = [a_1, a_2, a_3, a_4]^T$, the
errors as a vector $e = [e_5, e_6, \ldots, e_N]^T$ and the signal itself as a vector $X =
[x_5, x_6, \ldots, x_N]^T$. This gives
⎡ ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤
x5 x4 x3 x2 x1 a1 e5
⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ x6 ⎥ ⎢ x5 x4 x3 x 2 ⎥ ⎢ a 2 ⎥ ⎢ e6 ⎥
⎢ ⎥=⎢ ⎥⎢ ⎥ + ⎢ ⎥ (4.7)
⎣ .. ⎦ ⎣ .. .. .. .. ⎦ ⎣ a3 ⎦ ⎣ .. ⎦
xN xN−1 xN−2 xN−3 xN−4 a4 eN
which can be compactly written as

$$X = Ma + e \qquad (4.8)$$

The AR model is therefore a special case of the multivariate regression model.
The AR coefficients can therefore be computed from the equation

$$\hat{a} = (M^T M)^{-1}M^T X \qquad (4.9)$$
The AR predictions can then be computed as the vector

$$\hat{X} = M\hat{a} \qquad (4.10)$$

and the error vector is then $e = X - \hat{X}$. The variance of the noise is then
calculated as the variance of the error vector.
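
Equations 4.6 to 4.10 translate into a few lines of numpy; a sketch, with np.linalg.lstsq used as a numerically stable way of solving equation 4.9 (the function name is illustrative):

```python
import numpy as np

def fit_ar(x, p):
    # Build the embedding matrix M: the row for time t holds [x_{t-1}, ..., x_{t-p}].
    x = np.asarray(x, dtype=float)
    N = len(x)
    M = np.column_stack([x[p - i:N - i] for i in range(1, p + 1)])
    X = x[p:]
    a_hat, *_ = np.linalg.lstsq(M, X, rcond=None)   # equation 4.9
    e = X - M @ a_hat                               # prediction errors
    return a_hat, np.var(e)                         # coefficients, noise variance
```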
To illustrate this process we analyse our data set using an AR(4) model. The
AR coefficients were estimated to be

$$\hat{a} = [1.46, -1.08, 0.60, -0.186]^T \qquad (4.11)$$

and the AR predictions are shown in Figure 4.10. The noise variance was esti-
mated to be $\sigma_e^2 = 0.079$, which corresponds to a standard deviation of 0.28. The
variance of the original time series was 0.3882, giving a signal to noise ratio of
(0.3882 − 0.079)/0.079 = 3.91.

Figure 4.10: (a) Original signal (solid line), X, and predictions (dotted line), $\hat{X}$, from an AR(4) model and (b)
the prediction errors, e. Notice that the variance of the errors is much less than that of the original signal.
4.4.1 Relation to autocorrelation

The autoregressive model can be written as

$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \ldots + a_p x_{t-p} + e_t \qquad (4.12)$$

If we multiply both sides by $x_{t-k}$ we get

$$x_t x_{t-k} = a_1 x_{t-1}x_{t-k} + a_2 x_{t-2}x_{t-k} + \ldots + a_p x_{t-p}x_{t-k} + e_t x_{t-k} \qquad (4.13)$$
If we now sum over t, divide by N − 1, and assume that the signal is zero
mean (if it isn't we can easily make it so, just by subtracting the mean value from
every sample), the above equation can be rewritten in terms of covariances at
different lags

$$\sigma_{xx}(k) = a_1\sigma_{xx}(k-1) + a_2\sigma_{xx}(k-2) + \ldots + a_p\sigma_{xx}(k-p) + \sigma_{e,x} \qquad (4.14)$$

where the last term $\sigma_{e,x}$ is the covariance between the noise and the signal. But
as the noise is assumed to be independent of the signal, $\sigma_{e,x} = 0$. If we now
divide every term by the signal variance we get a relation between the correlations
at different lags

$$r_{xx}(k) = a_1 r_{xx}(k-1) + a_2 r_{xx}(k-2) + \ldots + a_p r_{xx}(k-p) \qquad (4.15)$$

This holds for all lags. For an AR(p) model we can write this relation out for
the first p lags. For p = 4
$$\begin{bmatrix} r_{xx}(1) \\ r_{xx}(2) \\ r_{xx}(3) \\ r_{xx}(4) \end{bmatrix} =
\begin{bmatrix}
r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) & r_{xx}(-3) \\
r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) \\
r_{xx}(2) & r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) \\
r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & r_{xx}(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} \qquad (4.16)$$
which can be compactly written as

$$r = Ra \qquad (4.17)$$

where r is the autocorrelation vector and R is the autocorrelation matrix. The
above equations are known, after their discoverers, as the Yule-Walker relations.
They provide another way to estimate AR coefficients

$$a = R^{-1}r \qquad (4.18)$$

This leads to a more efficient algorithm than the general method for multivari-
ate linear regression (equation 4.9) because we can exploit the structure in the
autocorrelation matrix. By noting that $r_{xx}(k) = r_{xx}(-k)$ we can rewrite the

correlation matrix as

$$R = \begin{bmatrix}
1 & r_{xx}(1) & r_{xx}(2) & r_{xx}(3) \\
r_{xx}(1) & 1 & r_{xx}(1) & r_{xx}(2) \\
r_{xx}(2) & r_{xx}(1) & 1 & r_{xx}(1) \\
r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & 1
\end{bmatrix} \qquad (4.19)$$

Because this matrix is both symmetric and a Toeplitz matrix (the terms along
any diagonal are the same) we can use a recursive estimation technique known
as the Levinson-Durbin algorithm.
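
A sketch of Yule-Walker estimation in Python; scipy's solve_toeplitz exploits the Toeplitz structure via the Levinson-Durbin recursion (the function name and the 1/N covariance normalisation are illustrative choices):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, p):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # the relations assume zero mean
    N = len(x)
    acov = np.array([x[k:] @ x[:N - k] for k in range(p + 1)]) / N
    r = acov / acov[0]                    # r_xx(0), ..., r_xx(p)
    # R is symmetric Toeplitz with first column r[0:p]; solve R a = r[1:p+1].
    return solve_toeplitz(r[:p], r[1:])
```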

4.5 Moving Average Models

A Moving Average (MA) model of order q is defined as

$$x_t = \sum_{i=0}^{q} b_i e_{t-i} \qquad (4.20)$$

where $e_t$ is Gaussian random noise with zero mean and variance $\sigma_e^2$. They are a
type of FIR filter. These can be combined with AR models to get Autoregressive
Moving Average (ARMA) models

$$x_t = \sum_{i=1}^{p} a_i x_{t-i} + \sum_{i=0}^{q} b_i e_{t-i} \qquad (4.21)$$

which can be described as an ARMA(p,q) model. They are a type of IIR filter.
Usually, however, FIR and IIR filters have a set of fixed coefficients which
have been chosen to give the filter particular frequency characteristics. In MA or
ARMA modelling the coefficients are tuned to a particular time series so as to
capture the spectral characteristics of the underlying process.
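
Since an ARMA(p,q) process is exactly an IIR filter driven by white noise, scipy's lfilter can generate realisations; a sketch, with illustrative MA coefficients and the AR coefficients of equation 4.11 (assumed stable here):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
e = rng.standard_normal(500)            # zero-mean Gaussian noise e_t

b = [1.0, 0.5, 0.2]                     # MA coefficients b_0..b_q (illustrative)
a = [1.46, -1.08, 0.60, -0.186]         # AR coefficients (from equation 4.11)

# x_t = sum_i a_i x_{t-i} + sum_i b_i e_{t-i}  <=>  IIR filter with
# denominator [1, -a_1, ..., -a_p] in scipy's sign convention.
x = lfilter(b, np.r_[1.0, -np.asarray(a)], e)
```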

4.6 Spectral Estimation using AR models

Autoregressive models can also be used for spectral estimation. An AR(p) model
predicts the next value in a time series as a linear combination of the p previous
values

$$x_t = -\sum_{k=1}^{p} a_k x_{t-k} + e_t \qquad (4.22)$$

where $a_k$ are the AR coefficients and $e_t$ is IID Gaussian noise with zero mean and
variance $\sigma_e^2$. Note the sudden minus sign: the sign convention is arbitrary, and is
adopted here only so that the terms in the spectral equations below appear with
positive signs.
The above equation can be solved by using the z-transform. This allows the
equation to be written as

$$X(z)\left(1 + \sum_{k=1}^{p} a_k z^{-k}\right) = E(z) \qquad (4.23)$$

It can then be rewritten for X(z) as

$$X(z) = \frac{E(z)}{1 + \sum_{k=1}^{p} a_k z^{-k}} \qquad (4.24)$$
Taking $z = \exp(i\omega T_s)$, where ω is frequency and $T_s$ is the sampling period,
and noting that the power in the noise process is $\sigma_e^2 T_s$, we can see that the
frequency domain characteristics of an AR model are given by

$$P(\omega) = \frac{\sigma_e^2 T_s}{\left|1 + \sum_{k=1}^{p} a_k \exp(-ik\omega T_s)\right|^2} \qquad (4.25)$$
An AR(p) model can provide spectral estimates with up to p/2 peaks; therefore if
you know how many peaks you're looking for in the spectrum, you can choose the
AR model order accordingly. Alternatively, AR model order estimation methods
should automatically provide the appropriate level of smoothing of the estimated
spectrum.
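
Equation 4.25 can be evaluated over any grid of frequencies, however fine; a minimal numpy sketch (note that the $a_k$ follow the negated sign convention of equation 4.22):

```python
import numpy as np

def ar_psd(a, sigma2_e, Ts, freqs_hz):
    # P(w) = sigma_e^2 Ts / |1 + sum_k a_k exp(-j k w Ts)|^2  (equation 4.25)
    w = 2 * np.pi * np.asarray(freqs_hz)
    k = np.arange(1, len(a) + 1)
    denom = 1 + np.exp(-1j * np.outer(w * Ts, k)) @ np.asarray(a)
    return sigma2_e * Ts / np.abs(denom) ** 2
```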
AR spectral estimation has two distinct advantages over methods based on
the Fourier transform: (i) power can be estimated over a continuous range of
frequencies (not just at fixed intervals) and (ii) the power estimates have less
variance.

Figure 4.11: Power spectral estimates of two sine waves in additive noise using (a) the Discrete Fourier transform
method and (b) autoregressive spectral estimation.
