THE FAST FOURIER TRANSFORM FOR EXPERIMENTALISTS, PART I: CONCEPTS

By Denis Donnelly and Bert Rust

Computing in Science & Engineering, March/April 2005. Copublished by the IEEE CS and the AIP, 1521-9615/05/$20.00 © 2005 IEEE. Education department; editor: Denis Donnelly, [email protected].

The discrete Fourier transform (DFT) provides a means for transforming data sampled in the time domain to an expression of this data in the frequency domain. The inverse transform reverses the process, converting frequency data into time-domain data. Such transformations can be applied in a wide variety of fields, from geophysics to astronomy, from the analysis of sound signals to CO2 concentrations in the atmosphere. Over the course of three articles, our goal is to provide a convenient summary that the experimental practitioner will find useful. In the first two parts of this article, we'll discuss concepts associated with the fast Fourier transform (FFT), an implementation of the DFT. In the third part, we'll analyze two applications: a bat chirp and atmospheric sea-level pressure differences in the Pacific Ocean.
The FFT provides an efficient algorithm for implementing the DFT and, as such, we'll focus on it. This transform is easily executed; indeed, almost every available mathematical software package includes it as a built-in function. Some books are devoted solely to the FFT [1–3], while others on signal processing [4–6], time series [7, 8], or numerical methods [9, 10] include major sections on Fourier analysis and the FFT. We draw together here some of the basic elements that users need to apply and interpret the FFT and its inverse (IFFT). We will avoid descriptions of the Fourier matrix, which lies at the heart of the DFT process [11], and the parsing of the Cooley-Tukey algorithm [12] (or any of several other comparable algorithms), which provides a means for transforming the discrete into the fast Fourier transform.
The Cooley-Tukey algorithm makes the FFT extremely useful by reducing the number of computations from something on the order of n^2 to n log(n), which obviously provides an enormous reduction in computation time. It's so useful, in fact, that the FFT made Computing in Science & Engineering's list of the top 10 algorithms, in an article that noted the algorithm is "perhaps, the most ubiquitous algorithm in use today" [13]. The interlaced decomposition method used in the Cooley-Tukey algorithm can be applied to other orthogonal transformations such as the Hadamard, Hartley, and Haar. However, in this article, we concentrate on the FFT's application and interpretation.
Fundamental Elements
As a rule, data to be transformed consists of N uniformly spaced points x_j = x(t_j), where N = 2^n with n an integer, and t_j = j Δt, where j ranges from 0 to N − 1. (Some FFT implementations don't require that N be a power of 2. This number of points is, however, optimal for the algorithm's execution speed.) Even though any given data set is unlikely to have the number of its data points precisely equal to 2^n, zero padding (which we describe in more detail in the next section) provides a means to achieve this number of samples without losing information. As an additional restriction, we limit our discussions to real-valued time series, as most data streams are real. When the time-domain data are real, the values of the amplitude or power spectra at any negative frequency are the same as those at the corresponding positive frequency. Thus, if the time series is real, one half of the 2^n frequencies contain all the frequency information. In typical representations, the frequency domain contains N/2 + 1 samples.
The FFT's kernel is a sum of complex exponentials. Associated with this process are conventions for normalization, sign, and range. Here, we present what we consider to be good practice, but our choices are not universal. Users should always check the conventions of their particular software choice so they can properly interpret the computed transforms and related spectra.
Equation 1 shows some simple relationships between parameters such as Δt, the sampling time interval; Δf, the spacing in the frequency domain; N, the number of samples in the time domain; and f_j, the Fourier frequencies. The number of samples per cycle (spc) for a particular frequency component with period T in the time domain and (in some cases) the total number of cycles (nc) in the data record for a particular frequency component are two other pieces of information that are useful because they remind us of the adequacy of the sampling rate or the data sample. Some relations between these parameters are

Δf = 1/(N Δt) and f_j = j Δf, where j = 0, ..., N/2,

spc = T/Δt,  nc = N/spc = N Δt/T = 1/(T Δf). (1)
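As a quick numerical check of the relations in Equation 1, here is a short sketch in Python; the sampling values are our own illustrative choices, not from the article:

```python
import math

# Hypothetical sampling setup: N samples spaced dt apart.
N = 32     # number of samples (a power of 2)
dt = 0.25  # sampling interval, in seconds
T = 2.0    # period of one frequency component of interest, in seconds

df = 1 / (N * dt)         # frequency-domain spacing, Eq. 1
f_nyquist = 1 / (2 * dt)  # Nyquist frequency, half the sampling frequency
spc = T / dt              # samples per cycle for the component with period T
nc = N / spc              # total cycles of that component in the record

print(df, f_nyquist, spc, nc)  # 0.125 2.0 8.0 4.0
```

Note that nc = N/spc, N Δt/T, and 1/(T Δf) all give the same 4.0 here, as Equation 1 requires.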
The period T represents only one frequency, but, as we discuss later, there must be more than 2 spc for the highest frequency component of the sampled signal. This bandwidth-limiting frequency is called the Nyquist frequency and is equal to half the sampling frequency. The spacing in the frequency domain Δf is the inverse of the total time sampled, so time and frequency resolution can't both be simultaneously improved. Thus, the maximum frequency represented is f_{N/2} = 1/(2 Δt), or the Nyquist frequency.

We can express the transform in several ways. A commonly used form is the following (with i = √−1):

X_k = Σ_{j=0}^{N−1} x_j exp(−2πi j k/N),  k = −N/2, ..., −1, 0, 1, ..., N/2 − 1, (2)

where x_j represents the time-domain data and X_k their representation in the frequency domain. We express the IFFT as

x_j = (1/N) Σ_{k=−N/2}^{N/2−1} X_k exp(2πi j k/N),  j = 0, 1, ..., N − 1. (3)
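The transform pair in Equations 2 and 3 can be verified directly with a short sketch. This is our own illustrative pure-Python code (O(N^2), so it is a DFT, not a fast FFT; a library routine should be used for real work):

```python
import cmath

def fft_direct(x):
    """Direct evaluation of Eq. 2: X_k = sum_j x_j exp(-2*pi*i*j*k/N),
    for k = -N/2, ..., N/2 - 1."""
    N = len(x)
    return {k: sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / N)
                   for j in range(N))
            for k in range(-N // 2, N // 2)}

def ifft_direct(X):
    """Eq. 3: x_j = (1/N) sum_k X_k exp(+2*pi*i*j*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / N) for k in X) / N
            for j in range(N)]

# The round trip recovers the original samples to rounding error.
x = [0.0, 1.0, 0.0, -1.0, 0.5, 0.0, -0.5, 0.0]
X = fft_direct(x)
x_back = ifft_direct(X)
assert all(abs(a - b) < 1e-12 for a, b in zip(x, x_back))
```

The 1/N factor sits with the inverse, matching the convention of Equations 2 and 3; moving it to the forward sum (or splitting it as 1/√N on each side) changes only the scaling, not the round trip.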
The FFT replicates periodically on the frequency axis with a period of 1/Δt; consequently, X(f_{−N/2}) = X(f_{N/2}), so the transform is defined at both ends of the closed interval from −1/(2Δt) to +1/(2Δt). This interval is sometimes called the Nyquist band.

Some FFT and IFFT implementations use different normalizations or sign conventions. For example, some implementations place the factor 1/N in the FFT conversion rather than with the IFFT. Some place 1/√N in both conversion processes, and some reverse the signs in the exponentials of the transforms; this sign change reverses the sign of the phase component. Moreover, some implementations take the range for k from 0, ..., N/2.
Because Equations 2 and 3 represent the frequency and time domains of the same signal, the energy in the two cases must be the same. Parseval's relation expresses this equality. For real data, we can express the relation as

Σ_{j=0}^{N−1} x_j^2 = (1/N) [ X_0^2 + 2 Σ_{k=1}^{N/2−1} |X_k|^2 + X_{N/2}^2 ], (4)

where X = fft(x). The last term on the right-hand side is not usually separated from the sum as it is here; we do this because there should be only N terms to consider in both summations, not N in one and N + 1 in the other. Recall that because we're dealing with real-valued data, we can exploit a symmetry and present the frequency data only from 0 to N/2; this symmetry is the source of the factor of two associated with the summation. Unlike the other terms, the +N/2 frequency value isn't independent and was assigned, as noted earlier, to the value at −N/2. Should the +N/2 term be included in the sum, we would, in effect, double-count the term, so we pull the N/2 term from the sum to avoid this. Of course, if N is large, this difference is likely to be minimal.
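Equation 4 is easy to confirm numerically. The sketch below (our own code) evaluates the forward sum of Equation 2 directly for a random real series and checks that both sides of Parseval's relation agree:

```python
import cmath
import random

random.seed(1)
N = 16
x = [random.uniform(-1, 1) for _ in range(N)]

# X = fft(x) with the convention of Eq. 2 (no 1/N factor in the forward sum).
# Computing k = 0..N-1 is equivalent; |X_k| = |X_{N-k}| for real x.
X = [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
     for k in range(N)]

lhs = sum(v * v for v in x)                      # time-domain energy
rhs = (abs(X[0]) ** 2
       + 2 * sum(abs(X[k]) ** 2 for k in range(1, N // 2))
       + abs(X[N // 2]) ** 2) / N               # Eq. 4: N terms in all
assert abs(lhs - rhs) < 1e-9
```

Counting terms on the right: one X_0 term, 2(N/2 − 1) from the doubled sum, and one X_{N/2} term, for N in total, as the text requires.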
There are two common ways to display an FFT. One is the amplitude spectrum, which presents the magnitudes of the FFT's complex values as a function of frequency:

A_k = (2/N) |X_k|,  k = −N/2, ..., −1, 0, 1, ..., N/2. (5)

Given the symmetry of real time series, the standard presentation restricts the range of k to positive values: k = 0, 1, ..., N/2. An equally common way to represent the transform is with a power spectrum (or periodogram), which is defined as

P_k = (1/N) |X_k|^2,  k = 0, 1, ..., N/2. (6)

However, neither of these spectral representations is universal. For example, some conventions place a 1 in the numerator instead of a 2 for the amplitude spectrum. The periodogram is sometimes represented with a factor of 2 in the numerator instead of 1, or as the individual terms expressed in Parseval's relation (Equation 4).
In Figure 1, as an example of the FFT process, we show the amplitude spectrum of a single-frequency sine wave with two different sampling intervals. In one case, the interval Δt is chosen to make nc integral, and in the other, nonintegral. If nc is integral, f is necessarily a multiple of Δf, and one point of the transform is associated with the true frequency (see the circles in Figure 1a). However, in any FFT application, we're dealing with a finite-length time series. The process of restricting the data in the time domain (multiplying the data by one over the range where we wish to keep the data and multiplying by zero elsewhere, an example of windowing, discussed later) introduces sidelobes in the frequency domain. These sidelobes are called leakage.
Even though there's leakage, because there's only one frequency associated with the transformed sine wave, we might expect to be able to estimate that frequency with a weighted average of all the points in the frequency domain. Such an average, however, wouldn't yield the correct frequency.

In general, the FFT process generates complex values in the frequency domain from the real values in the time domain. If we transform sine or cosine waves where we consider an integral number of cycles, the transform magnitudes are identical. However, in the frequency domain, a sine curve is represented only with imaginary values and a cosine curve only with real values. When the number of cycles is nonintegral or if there is a phase shift, then both real and imaginary parts appear in the transform of both the sine and cosine.
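These symmetry claims are easy to check numerically. A small illustrative sketch (our own code, using a direct DFT sum in place of a library FFT):

```python
import cmath
import math

N = 32
nc = 4  # integral number of cycles in the record
sine = [math.sin(2 * math.pi * nc * j / N) for j in range(N)]
cosine = [math.cos(2 * math.pi * nc * j / N) for j in range(N)]

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

S, C = dft(sine), dft(cosine)
# With an integral number of cycles, the sine transform is purely imaginary
# and the cosine transform purely real (to rounding error) ...
assert all(abs(v.real) < 1e-9 for v in S)
assert all(abs(v.imag) < 1e-9 for v in C)
# ... while their magnitudes are identical.
assert all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(S, C))
```

Repeating the experiment with nc = 3.52 (or with a phase-shifted sinusoid) makes both real and imaginary parts nonzero, as the text describes.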
Zero Padding
Zero padding is a commonly used technique associated with FFTs. Two frequent uses are to make the number of data points in the time-domain sample a power of two and to improve interpolation in the transformed domain (for example, zero pad in the time domain, improve interpolation in the frequency domain).

Zero padding, as the name implies, means appending a string of zeros to the data. It doesn't make any difference if the zeros are appended at the end (the typical procedure), at the beginning, or split between the beginning and end of the data set's time domain. One very common use of this process is to extend time-series data so that the number of samples becomes a power of two, making the conversion process more efficient or, with some software, simply possible. Because the spacing of data in the frequency domain is inversely proportional to the number of samples in the time domain, by increasing the number of samples (even if their values are zero), the resulting frequency spectrum will contain more data points for the same frequency range. Consequently, the zero-padded transform contains more data points than the unpadded; as a result, the overall process acts as a frequency-interpolating function. The resulting, more detailed picture in the frequency space might indicate unexpected detail (see, for example, Figure 2). As the number of zeros increases, the FFT better represents the time series' continuous Fourier transform (CFT).

As we noted earlier, zero padding introduces more points into the same frequency range and provides interpolation between points associated with the unpadded case. When data points are more closely spaced, clearly, there's a possibility that unnoticed detail could be revealed (such as Figure 1a shows). In Figure 2, we see the effect of quadrupling the number of points for two different cases. The transforms of the zero-padded data contain the same information as the unpadded data, and every fourth point of the padded data matches the corresponding unpadded data point. The intermediate points provide interpolation.
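The every-fourth-point property follows directly from the transform definition, and a short sketch (our own illustrative code) confirms it:

```python
import cmath
import math

def dft(x):
    """Direct DFT: X_k = sum_j x_j exp(-2*pi*i*j*k/n), k = 0..n-1."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

N = 32
x = [math.sin(2 * math.pi * 3 * j / N) for j in range(N)]

X = dft(x)                     # unpadded transform
Xp = dft(x + [0.0] * (3 * N))  # zero padded to 4N samples

# Every fourth point of the padded transform equals an unpadded point;
# the intermediate points interpolate between them.
assert all(abs(Xp[4 * k] - X[k]) < 1e-9 for k in range(N))
```

The identity holds because the appended zeros contribute nothing to the sum, so Xp[4k] reduces term by term to X[k]; only the frequency grid has become four times finer.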
Figure 1. Amplitude spectra of a single-frequency sine wave. Two representations of a sine wave of frequency 0.5 are shown in each part of the figure. In each case, the circles are based on a time series where the number of sample points N = 32 but the time step is slightly different: (a) N Δt = 8, so nc = 4; (b) N Δt = 7.04, so nc = 3.52, where nc is the total number of cycles. The solid lines provide a view of these same spectra with zero padding. This form is closer to what would be expected from a continuous rather than a discrete Fourier transform. The zero-padded examples reveal detail that might not have been expected, given the appearance of the unpadded case.
In Figure 2, we see an application of that interpolating ability when we consider a signal consisting of two closely lying frequencies. In Figure 2a, although the envelope is more clearly drawn, zero padding does not have the power to resolve the two frequencies associated with this case. In Figure 2b, the peaks are sufficiently separated so that the interpolation reveals the two peaks, whereas the unpadded data seemingly did not. This example reminds us that a graphical representation connecting adjacent data points with straight lines can be misleading.

Zero padding can also be performed in the frequency domain. The inverse transform results in an increase in the number of data points in the time domain, which could be useful in interpolating between samples (see Figure 3). Zero padding is also used in association with convolution or correlation and with filter kernels, which we discuss later in this article.
Aliasing
When performing an FFT, it's necessary to be aware of the frequency range composing the signal so that we sample the signal more than twice per cycle of the highest frequency associated with the signal. In practice, this might mean filtering the signals to block any signal components with a frequency above the Nyquist frequency (2 Δt_sample)^−1 before performing a transform. If we don't restrict the signal in this way, higher frequencies will not be adequately sampled and will masquerade as lower-frequency signals. This effect is similar to what moviegoers experience when the onscreen wheels of a moving vehicle seemingly freeze or rotate in the wrong direction. The camera, which operates at the sampling rate of 24 frames per second, only has a Nyquist limit of 12 Hz; any higher frequencies present will appear as lower frequencies.

Let's assume that we can readily observe a point on a wheel (not at the center) that's rotating but not translating. At a slow rotation rate, each successive frame of our film shows the observable point advancing from the previous frame. (The fraction of a complete rotation and the sampling rate are related; the number of samples per rotation is the inverse of the fraction of a rotation per sample.) As the rotation rate increases,
the angle between our observed point in successive frames increases. When the angle reaches 180 degrees, or two samples per rotation, the perceived rotation rate is at its maximum: the wheel is rotating at the Nyquist frequency.

When passing through the Nyquist limit, as the frequency goes from f_Ny − ε to f_Ny + ε (where ε << f_Ny), the rotation direction appears to change from forward to reverse while the rotation rate remains the same. Further increases in the rotation rate make the wheel appear to continue rotating in a reversed direction but at a decreasing rate. When the actual rotation rate is twice the Nyquist frequency, the apparent rotation rate is zero and the sampling rate is just once per rotation. (Another example of one sample per rotation and an apparent zero rotation rate is to use a stroboscope to determine an object's rotation rate. With one flash per rotation, the rotating object appears at rest and the flash rate and rotation rate are equal.) If the frequency of rotation continues to increase, the wheel will again appear to rotate in the original rotation direction.

Figure 2. The effect of zero padding on the transform of a signal containing two different frequencies. We look at two cases: one in which the two frequencies are too close to be clearly resolved, and one in which resolution is possible. (a) Fast Fourier transforms (FFTs) of the sum of two sine waves of amplitude 1 and frequencies of 1 and 1.3 Hz; the frequencies aren't resolved. (b) FFTs of the sum of two sine waves of amplitude 1 and frequencies of 1 and 1.35 Hz; the frequencies are resolved. The solid curves are transforms of zero-padded data and include four times as many samples as the transforms of the unpadded data (dotted curves). Because the zero-padded curve has four times as many data points as the unpadded case (N = 32), every fourth point of the zero-padded data is the same as the unpadded data. Zero-padded results provide better interpolation and more detail.

Figure 3. The effect of zero padding in the frequency domain on the time-domain data. The frequency data (the unpadded case in Figure 2a) was zero-padded to four times its original length. We show the original unpadded time-domain data (boxes) and the inverse fast Fourier transform of the zero-padded frequency data (dots). The padding process again acts as an interpolation function.
To make this more concrete, consider two constant rotation rates, one of 170 degrees between successive frames/samples and one of 190 degrees. We observe only the current position in each frame, so as we compute a value sequence, we take them mod(360). If we compute values for the 170-degree case, we obtain 0, 170, 340, 150, 320, 130, and so on. If we compute values for the 190-degree case, we get 0, 190, 20, 210, 40, 230, and so on, but we wouldn't see the 190-degree rotation. We don't observe an increase greater than 180 degrees (for angles greater than that, the data is undersampled). For the 190-degree case, we would see a 170-degree step, but with the rotation in the opposite direction.

To consider a reverse rotation, we subtract the forward rotation angle from 360. The result is the magnitude of the angle of rotation in the reverse direction. For example, a forward rotation angle of 350 degrees is equivalent to a 10-degree step in the reverse direction. So for our 190-degree case, the numbers become 0, 360 − 190 = 170, 360 − 20 = 340, 360 − 210 = 150, and so on. Table 1 provides a summary. The magnitudes of these rotation angles are identical to the 170-degree data. Thus, we would see the 190-degree case as equivalent to the 170-degree case in terms of rotation rate, but with the rotation direction reversed. The graph in Figure 4 helps demonstrate this kind of behavior.
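The arithmetic behind Table 1 can be reproduced in a few lines (our own sketch):

```python
# Observed wheel angles, mod 360, for a fixed step per frame.
def observed(step, n=6):
    return [(step * i) % 360 for i in range(n)]

seq170 = observed(170)
seq190 = observed(190)
print(seq170)  # [0, 170, 340, 150, 320, 130]
print(seq190)  # [0, 190, 20, 210, 40, 230]

# Steps larger than 180 degrees are perceived as reverse rotation; the
# reverse-step magnitudes are 360 minus the forward angle (Table 1, column 3).
reversed190 = [(360 - a) % 360 for a in seq190]
print(reversed190)  # [0, 170, 340, 150, 320, 130], same as the 170-degree case
```

The 190-degree rotation is thus indistinguishable in rate from the 170-degree one; only the apparent direction differs.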
In the example shown in Figure 4, the Nyquist frequency is 8 Hz. Frequencies associated with the first leg of the sawtooth curve have more than two samples per cycle, and the apparent and actual frequencies are equal. Once the actual frequency exceeds the Nyquist frequency, the apparent frequency begins to decrease, with the negative slope corresponding to a reversed rotation direction. At 16 Hz, with one sample per rotation, the apparent frequency is zero. With further increases in the true frequency, the apparent frequency once again increases.

If we take the FFT of three amplitude-1 cosine waves having frequencies of 3.5, 12.5, and 19.5 Hz and where we set N = 16 and Δt = 1/N (so the Nyquist frequency is 8 Hz), we get identical FFTs, one of which is shown in Figure 5. The number of samples per cycle for these frequencies is 4.57, 1.28, and 0.82. Only the lowest frequency is adequately represented; the two higher-frequency cases have fewer than
Table 1. Actual and apparent angles for 170° and 190° rotations.

Angle sequence   Angle sequence   Apparent angle sequence for 190° steps
for 170° step    for 190° step    with rotation direction reversed*
0                0                0
170              190              170
340              20               340
150              210              150
320              40               320
130              230              130

*Magnitudes of reverse angles are given by 360° − column 2.
Figure 4. Apparent frequency as a function of the true frequency. Frequencies greater than the Nyquist frequency fold back into the allowed frequency range and appear as lower frequencies. In this example, where the Nyquist frequency is 8 Hz, an actual frequency of 9 Hz would appear as 7 Hz.
two samples per cycle and consequently masquerade as lower frequencies, appearing in the allowed range between 0 Hz and the Nyquist frequency. For the example with the three different frequencies, we purposely selected the higher frequencies so that their FFTs would be identical to that of the lowest frequency. Referring to Figure 4, we note that the frequencies 12.5 and 19.5 Hz would appear on the second and third legs of the sawtooth curve. The apparent frequency of the 12.5-Hz line is 8 − (12.5 − 8); the apparent frequency of the 19.5-Hz line is 19.5 − 2 × 8. In general, the out-of-range frequency f_true would appear as f_apparent, given by

f_apparent = |f_true − 2k f_Nyquist|, (7)

where k = 1, 2, ..., and k is selected to bring f_apparent within the range 0 to f_Ny.
In Figure 6, we see the actual curves that correspond to the three frequencies and the points where sampling occurs. If we performed an FFT followed by an IFFT for any one of the three curves (given the sampling specified), the algorithm would return the same result in each case, which, without other information, would be interpreted as the lowest-frequency case.

If the magnitudes of the Fourier coefficients approach zero (roughly as 1/f) as the frequency approaches the Nyquist frequency (a zero between lobes would not qualify), then there is a good likelihood that aliasing has not occurred. If the magnitude at the Nyquist frequency isn't zero, we can consider the possibility that it has occurred. However, a nonzero value doesn't imply that aliasing has necessarily happened. The Fourier coefficients in Figure 5 don't go to zero even in the adequately sampled case. Zero padding of this example will show a great deal more detail, but the transform is still nonzero at the Nyquist frequency.
Relation to Fourier Series
There is a direct connection between the real and imaginary
parts of the frequency information from an FFT and the co-
efcients in a Fourier series that would represent the corre-
sponding time-domain signal. As we noted earlier, for the
conditions stated, the transform of a single-frequency sine
wave is imaginary, whereas the transform of a single-
frequency cosine wave is real. So, in a Fourier series of the
time-domain signal, we would expect the real parts of the fre-
quency information to be associated with cosine series and
the imaginary parts with sine series. This is, in fact, the case.
An equation for recreating the original signal as a Fourier
series from the frequency information is
. (8)
For the case N= 2
n
, a
k
represents the real part of the trans-
formed signal, b
k
the imaginary part, nt the number of terms
to be included in the series (where nt < N/2), and f the spac-
ing in the frequency domain.
An alternate form, in terms of magnitude and phase, is also possible. Given that

φ_j = tan^{−1}( Im(h_j) / Re(h_j) ), (9)

where h_k = a_k + i b_k and the H_j are the magnitudes of h_j, the series is given by

S(t) = (2/N) [ a_0/2 + Σ_{k=1}^{nt} H_k cos(2π k Δf t + φ_k) ]. (10)

Figure 5. The FFT of a 3.5-Hz, amplitude-one cosine wave where N = 16 and Δt = 1/N (represented by circles). The FFTs of the frequencies 3.5 Hz, 12.5 Hz, and 19.5 Hz are identical for the case when the Nyquist frequency is 8 Hz. The solid curve shows the transform with zero padding.

Figure 6. A view of the sampling of three cosine curves. Cosine curves with frequencies 3.5 Hz, 12.5 Hz, and 19.5 Hz are shown, with the marked points representing those at which sampling occurs (Δt = 1/N and N = 16). Only the lowest-frequency curve is adequately sampled, with more than two samples per cycle. In this case, the FFT for each curve would indicate a signal with a frequency of 3.5 Hz. For clarity, we show only the first five samples.
In Figure 7, we see the square-wave signal (one cycle of a square wave that ranges between 0 and 1 with equal times high and low) to be transformed, as well as the signal constructed from the first 10 terms of a Fourier series using the coefficients from the FFT as per Equation 8. We would obtain an identical waveform if we took the IFFT of a truncation of the original FFT, where all the FFT's coefficients with an index greater than the number of desired terms (here, nt = 10) are set to zero.
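The Figure 7 construction can be reproduced in a few lines. This sketch (our own code, with a direct DFT in place of a library FFT) builds the partial series of Equation 8 for the square wave and checks that 10 terms already track both plateaus:

```python
import cmath
import math

N = 64
# One cycle of a 0/1 square wave: high for the first half, low for the second.
x = [1.0 if j < N // 2 else 0.0 for j in range(N)]

X = [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
     for k in range(N)]
a = [v.real for v in X]  # cosine coefficients in Eq. 8
b = [v.imag for v in X]  # sine coefficients in Eq. 8

def S(t, nt=10, df=1.0):
    """Partial Fourier series of Eq. 8 (record length N*dt = 1, so df = 1)."""
    return (2.0 / N) * (a[0] / 2 + sum(
        a[k] * math.cos(2 * math.pi * k * df * t)
        - b[k] * math.sin(2 * math.pi * k * df * t)
        for k in range(1, nt + 1)))

# With 10 terms, the series sits within a few percent of both plateaus.
assert abs(S(0.25) - 1.0) < 0.05  # middle of the high half
assert abs(S(0.75) - 0.0) < 0.05  # middle of the low half
```

Near the jumps at t = 0, 0.5, and 1 the truncated series overshoots (the familiar Gibbs ringing visible in Figure 7), which is why we test the plateau centers rather than the edges.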
Windows
Windows are useful for extracting and/or smoothing data. A window is typically a positive, smooth, symmetric function that has a value of one at its maximum and approaches zero at the extremes. (A window might have a discontinuity in its first derivative, giving it an inverted-V shape; such a window is sometimes referred to as a tent. It might instead have two discontinuities, for a rectangular or trapezoidal shape.) We apply windows by multiplying time-domain data by the window function. Of course, whenever a window is applied, it alters at least some of the data.

Smoothing windows, for example, reduce the amplitude of the time-domain data at both the beginning and the end of the windowed data set. One effect of this smoothing is to reduce leakage in the frequency domain. In Figure 8, we show comparative plots of four frequently used windows. We show the effect of applying three of those windows to a sine wave sequence in Figure 9.
Let's look at the expressions for four common windows:

Rectangular: recw_i = 1 (inside the window), 0 (outside)
Hamming: hamw_i = 0.54 − 0.46 cos(2π i/N)
Hann: hanw_i = 0.5 − 0.5 cos(2π i/N)
Blackman: blkw_i = 0.42 − 0.5 cos(2π i/N) + 0.08 cos(4π i/N). (11)
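A small sketch of these definitions (our own code) makes the endpoint behavior discussed below concrete:

```python
import math

def hamming(i, N):
    return 0.54 - 0.46 * math.cos(2 * math.pi * i / N)

def hann(i, N):
    return 0.5 - 0.5 * math.cos(2 * math.pi * i / N)

def blackman(i, N):
    return (0.42 - 0.5 * math.cos(2 * math.pi * i / N)
            + 0.08 * math.cos(4 * math.pi * i / N))

N = 8
# Hann and Blackman reach zero at the endpoints i = 0 and i = N;
# Hamming stops at 0.54 - 0.46 = 0.08 and never quite gets there.
assert abs(hann(0, N)) < 1e-12 and abs(hann(N, N)) < 1e-12
assert abs(blackman(0, N)) < 1e-12
assert abs(hamming(0, N) - 0.08) < 1e-12
# All three peak at 1 at the window center i = N/2.
for w in (hamming, hann, blackman):
    assert abs(w(N // 2, N) - 1.0) < 1e-12
```

Setting the Hamming coefficient pair (0.54, 0.46) to (0.5, 0.5) turns it into the Hann window, which is exactly the one-parameter family discussed next.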
The Hamming and Hann windows differ in only one parameter: if the corresponding coefficients are written (α, 1 − α), then α is 0.54 for the Hamming window and 0.5 for the Hann. The fact that a slight change in the parameter value gives rise to two different windows hints at the sensitivity of the windowing process to the value of α. If α decreases from 0.5, the side lobes increase significantly in amplitude. As α increases from 0.5 to 0.54, the relative sizes of the side lobes change. The first set of the Hann side lobes tend to be significantly larger than those of the Hamming case, but subsequent Hann side lobes decrease rapidly in magnitude and become significantly smaller than the Hamming side lobes. As to general appearance, the Hamming window doesn't quite go to zero at the window's endpoints, whereas the rectangular, Hann, and Blackman windows do. Several other windows also exist, including Bartlett (tent function), Welch (parabolic), Parzen (piece-wise cubic), Lanczos (central lobe of a sine function), Gaussian, and Kaiser (which uses a modified Bessel function).

Figure 7. A comparison of the original time-domain signal and its partial reconstruction as a Fourier series. The original signal (dotted curve) and the first 10 terms of a Fourier series (solid curve) computed using coefficients from the original signal's FFT.

Figure 8. The shapes of four different windows. From the side, we see a rectangular (red), Hamming (blue), Hann (green), and Blackman (magenta), respectively. We'll apply three of these windows to a sine wave sequence in Figure 9.
Each of these windows has particular characteristics. Two particularly useful points of comparison in the frequency space are the full width at half maximum of the central peak and the relative magnitude of the central peak to that of the side lobes. An unwindowed signal's FFT has the narrowest central peak, but it also has considerable leakage that decays slowly. The curves for the Hamming and Blackman cases show wider central peaks but significantly smaller side lobes. The Blackman window has the largest ratio of peak height to first-sidelobe height.

There is no final summary statement that says you should use window x in all cases; circumstances decide that. In the bat-chirp analysis we'll examine in part two of this series, we'll use an isosceles trapezoidal window. Such a window isn't generally recommended, but for the bat-chirp case, it's the best choice. (A split cosine bell curve, a Hann window shape for the beginning and end of the curve with a magnitude of one in the interior, would give essentially the same results.)

As an example of windowing's effect on the transform, we apply a Blackman window to the time-domain data associated with Figure 1b. Two effects of applying this window, as Figure 10 shows, are that the leakage is greatly reduced and that the central peak is broadened. Obtaining the needed detail to observe these features requires zero padding.
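The leakage-versus-peak-width trade-off can be demonstrated with a sketch (our own code) that windows the nonintegral-cycle sine of Figure 1b with a Blackman window:

```python
import cmath
import math

N = 32
nc = 3.52  # nonintegral number of cycles, as in Figure 1b
x = [math.sin(2 * math.pi * nc * j / N) for j in range(N)]
blackman = [0.42 - 0.5 * math.cos(2 * math.pi * j / N)
            + 0.08 * math.cos(4 * math.pi * j / N) for j in range(N)]

def amps(data):
    """One-sided magnitude spectrum via a direct DFT (illustrative only)."""
    n = len(data)
    return [abs(sum(data[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for j in range(n)))
            for k in range(n // 2 + 1)]

raw = amps(x)
windowed = amps([xi * wi for xi, wi in zip(x, blackman)])
peak = max(range(len(raw)), key=lambda k: raw[k])

# Relative to its own peak, the windowed spectrum's leakage well away from
# the central lobe is much smaller than the unwindowed spectrum's.
far = range(peak + 4, N // 2 + 1)
worst_raw = max(raw[k] / max(raw) for k in far)
worst_win = max(windowed[k] / max(windowed) for k in far)
assert worst_win < worst_raw
```

Comparing the bins adjacent to the peak instead shows the cost: the windowed central lobe is several bins wide, which is the broadening visible in Figure 10.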
In part two of this series, we'll discuss auto-regression spectral analysis and the maximum entropy method, convolution, and filtering. In the third and final installment, we'll present some applications, including the analysis of a bat chirp and atmospheric sea-level pressure variations in the Pacific Ocean.

Whether there is an interest in CO2 concentrations in the atmosphere, ozone levels, sunspot numbers, variable star magnitudes, the price of pork, or financial markets, or if the interest is in filtering, correlations, or convolutions, Fourier transforms provide a very powerful and, for many, an essential algorithmic tool.
References
1. R.N. Bracewell, The Fourier Transform and Its Applications, McGraw-Hill, 1965.
2. E.O. Brigham, The Fast Fourier Transform and Its Applications, Prentice-Hall, 1988.
3. J.F. James, A Student's Guide to Fourier Transforms, Cambridge Univ. Press, 1995.
4. C.T. Chen, Digital Signal Processing, Oxford Univ. Press, 2001.
5. S.L. Marple Jr., Digital Spectral Analysis with Applications, Prentice Hall, 1987.
6. S. Smith, Digital Signal Processing, Newnes, 2003.
7. P. Bloomfield, Fourier Analysis of Time Series, John Wiley & Sons, 2000.
8. P. Hertz and E.D. Feigelson, "A Sample of Astronomical Time Series," Applications of Time Series Analysis in Astronomy and Meteorology, T. Subba Rao, M.B. Priestley, and O. Lessi, eds., Chapman & Hall, 1979, pp. 340–356.
9. W.H. Press et al., Numerical Recipes in Fortran, Cambridge Univ. Press, 1992.
10. L.N. Trefethen, Spectral Methods in Matlab, SIAM Press, 2000.
Figure 9. A comparison of the effects (from left to right) of a
rectangular, a Hamming, and a Blackman window on a sine
wave sequence. For convenience of display, we compute the
three examples separately, shift the second and third in time,
and sum the set, with the effect that the three examples
appear sequentially in time; because each example is zero
outside its window zone, the results do not interfere. The
three windows have the same width, but as Figure 8 shows,
the Blackman window increases in magnitude more slowly
than the others, and we can observe the effect on the sine
wave signal. The difference between Hamming and Blackman
windowing is also evident.
Figure 10. The effects of windowing as seen in the transform
space. The FFT of the 3.52-cycle example in Figure 1 and the
result of multiplying time-domain data and a Blackman
window before taking the FFT are shown without zero padding
(circles) and with zero padding (solid curves). The windowed
form reduces leakage but has a broader central lobe.
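The window-then-zero-pad procedure that Figure 10 illustrates is easy to try numerically. The following is a sketch assuming NumPy, not the authors' code; the 3.52-cycle, 32-point record mirrors the Figure 1 example, and the bin used to probe the leakage is an arbitrary choice:

```python
import numpy as np

# A 32-point record containing 3.52 cycles, as in the Figure 1 example.
N = 32
n = np.arange(N)
x = np.sin(2 * np.pi * 3.52 * n / N)

# Zero pad to 1,024 points to resolve detail between the Fourier bins,
# with and without a Blackman window applied first.
pad = 1024
plain = np.abs(np.fft.rfft(x, pad))
windowed = np.abs(np.fft.rfft(x * np.blackman(N), pad))

# Probe the leakage well away from the central peak (the peak sits near
# bin 3.52 * pad / N, roughly 113); the windowed spectrum is far cleaner there.
far_bin = 400
leak_plain = plain[far_bin] / plain.max()
leak_win = windowed[far_bin] / windowed.max()
print(leak_plain > leak_win)   # True: windowing suppresses the leakage
```

The broadened central lobe the caption mentions shows up here too: the windowed peak spreads over more bins than the rectangular one.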
11. C.D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM Press, 2000.
12. J.W. Cooley and J.W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, vol. 19, no. 90, 1965, pp. 297–301.
13. D.N. Rockmore, "The FFT: An Algorithm the Whole Family Can Use," Computing in Science & Eng., vol. 2, no. 1, 2000, pp. 60–64.
Denis Donnelly is a professor of physics at Siena College. His research interests include computer modeling and electronics. Donnelly received a PhD in physics from the University of Michigan. He is a member of the American Physical Society, the American Association of Physics Teachers, and the American Association for the Advancement of Science. Contact him at [email protected].

Bert Rust is a mathematician at the National Institute for Standards and Technology. His research interests include ill-posed problems, time-series modeling, nonlinear regression, and observational cosmology. Rust received a PhD in astronomy from the University of Illinois. He is a member of SIAM and the American Astronomical Society. Contact him at [email protected].
COMPUTING IN SCIENCE & ENGINEERING
EDUCATION
Submissions: Send one PDF copy of articles and/or proposals to Norman Chonacky, Editor in Chief, [email protected]. Submissions should not exceed 6,000 words and 15 references. All submissions are subject to editing for clarity, style, and space.
Editorial: Unless otherwise stated, bylined articles and departments, as well as product and service descriptions, reflect the author's or firm's opinion. Inclusion in CiSE does not necessarily constitute endorsement by the IEEE, the AIP, or the IEEE Computer Society.
Circulation: Computing in Science & Engineering (ISSN 1521-9615) is published bimonthly by the AIP and the IEEE Computer Society. IEEE Headquarters, Three Park Ave., 17th Floor, New York, NY 10016-5997; IEEE Computer Society Publications Office, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314, phone +1 714 821 8380; IEEE Computer Society Headquarters, 1730 Massachusetts Ave. NW, Washington, DC 20036-1903; AIP Circulation and Fulfillment Department, 1NO1, 2 Huntington Quadrangle, Melville, NY 11747-4502. Annual subscription rates for 2005: $42 for Computer Society members (print only) and $42 for AIP society members (print plus online). For more information on other subscription prices, see www.computer.org/subscribe/ or www.aip.org/forms/journal_catalog/order_form_fs.html. Computer Society back issues cost $20 for members, $96 for nonmembers; AIP back issues cost $22 for members.
Postmaster: Send undelivered copies and address changes to Computing in Science & Engineering, 445 Hoes Ln., Piscataway, NJ 08855. Periodicals postage paid at New York, NY, and at additional mailing offices. Canadian GST #125634188. Canada Post Corporation (Canadian distribution) publications mail agreement number 40013885. Return undeliverable Canadian addresses to PO Box 122, Niagara Falls, ON L2E 6S8, Canada. Printed in the USA.
Copyright & reprint permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of US copyright law for private use of patrons those articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Dr., Danvers, MA 01923. For other copying, reprint, or republication permission, write to Copyright and Permissions Dept., IEEE Publications Administration, 445 Hoes Ln., PO Box 1331, Piscataway, NJ 08855-1331. Copyright 2005 by the Institute of Electrical and Electronics Engineers Inc. All rights reserved.
THE FAST FOURIER TRANSFORM FOR EXPERIMENTALISTS, PART II: CONVOLUTIONS
By Denis Donnelly and Bert Rust
Computing in Science & Engineering, July/August 2005. Copublished by the IEEE CS and the AIP, 1521-9615/05/$20.00 © 2005 IEEE.
Editor: Denis Donnelly, [email protected]

When undergraduate students first compute a fast Fourier transform (FFT), their initial impression is often a bit misleading. The process all seems so simple and transparent: the software takes care of the computations, and it's easy to create the plots. But once they start probing, students quickly learn that like any rich scientific expression, the implications, the range of applicability, and the associated multilevel understandings needed to fully appreciate the subtleties involved take them far beyond the basics. Even professionals find surprises when performing such computations, becoming aware of details that they might not have fully appreciated until they asked more sophisticated questions.
In the first of this five-part series,¹ we discussed several basic properties of the FFT. In addition to some fundamental elements, we treated zero-padding, aliasing, and the relationship to a Fourier series, and ended with an introduction to windowing. In this article, we'll briefly look at the convolution process.

Convolution
Convolution, a process some would say lies at the heart of digital signal processing, involves two functions, which we'll call x(t) and h(t), where x(t), for example, could be an input signal and h(t) some linear system's impulse response. When convolved, x(t) ⊗ h(t), they yield an output function y(t). The process expresses the amount of one function's overlap as it is shifted over the other, providing a kind of blending of the two functions:

y(t) = x(t) ⊗ h(t). (1)

This process has many applications. Filtering is one example: given the appropriate impulse response, we can create any one of a number of filters. We'll give some examples in the next section, but we'll postpone further information about filtering and detrending until the next installment. Correlation is another closely related process and can help determine if a particular signal occurs in another datastream.
Deconvolution is the reverse: in effect, it uses the process itself to remove the effects of an undesired convolution or data distortion. When taking data, a convolution can obscure the desired information, perhaps due to interfering physical interactions or by the detection system itself (which has its own response). A gamma ray arriving at a detector, for example, has a well-defined energy, yet the detector output shows several associated effects related to the interaction of the gamma ray with a crystal. If a nuclear physicist is interested in the gamma ray's energy or intensity instead of the detector's response, then he or she needs to know how to extract the appropriate information from this much larger signal set. Deconvolving can remove the detector response, restoring the data to a form closer to the original.
When noise accompanies a signal, as it always does to some extent, a direct deconvolution can generate unstable results, which renders the process unusable. One way to reduce the noise's influence is to assume that analytic functions can represent either (or both) the original signal and the convoluted signal. When such a representation is possible, the chances of success with the deconvolution process greatly improve. Still, deconvolution is beyond the scope of this series, so we won't discuss it here.
The continuous convolution is defined as

y(t) = x(t) ⊗ h(t) = ∫_{−∞}^{∞} x(τ)h(t − τ)dτ = ∫_{−∞}^{∞} x(t − τ)h(τ)dτ. (2)

In his book on the FFT, E. Oran Brigham states that "Possibly the most important and powerful tool in modern scientific analysis is the relationship between [Equation 2] and its Fourier transform."² The relationship referred to is the time-convolution theorem:

F{x(t) ⊗ h(t)} = F{x(t)} · F{h(t)} = X(f)H(f), (3)
where · denotes ordinary multiplication, and X(f) and H(f) are the continuous Fourier transforms of x(t) and h(t).
In real life, we seldom have access to the functions x(t) and h(t); instead, we have only finite time-series representations, such as

x_k = x(kΔt) and h_k = h(kΔt),  k = 0, 1, 2, …, N − 1. (4)

Given this discrete representation, we can't compute y(t) exactly, but we can compute a time-series approximation to it. Specifically, we can write an expression for the discrete convolution as

y_n = Δt Σ_{k=0}^{N−1} x_k h_{n−k} = Δt Σ_{k=0}^{N−1} x_{n−k} h_k,  n = 0, 1, 2, …, N − 1. (5)
If the response function were the trivial example in which h_0 has the value 1 and all other h values are 0, then the convolution process would just reproduce the input signal (if h_0 differed from 1, it would scale the input signal proportionally to h_0). If all h's were 0 except for h_m, then we would scale the input signal by the magnitude of h_m and delay it by m sample intervals. The convolution process is the summation of such elements.
It's important to keep two details in mind when performing a convolution process: one, the two signals must have the same number of elements (zero-padding easily solves this problem), and two, the discrete convolution theorem treats the data as if it were periodic. We can express the summation associated with this circular convolution as

y_n = Σ_{k=0}^{N−1} x[(n − k) mod N] h_k. (6)

This cyclic effect causes a wraparound problem that we'll explain in more detail later.
The FFT form of the convolution of two time series is given by

x ⊗ h = ifft(fft(x) · fft(h)), (7)

where the product of the two transforms is element by element and ifft stands for inverse FFT. (While we're discussing convolution in the time domain and multiplication in the frequency domain, we should mention that an interchange of roles is also possible. Multiplication in the time domain corresponds to convolution in the frequency domain.)
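Equation 7 can be checked numerically against the circular sum of Equation 6. The following sketch, assuming NumPy and hypothetical random data, does exactly that:

```python
import numpy as np

# Circular convolution two ways: the direct sum of Equation 6 and the
# FFT route of Equation 7 (hypothetical data).
rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
h = rng.standard_normal(N)

# Direct circular sum: y_n = sum_k x[(n - k) mod N] h_k
y_direct = np.array([sum(x[(n - k) % N] * h[k] for k in range(N))
                     for n in range(N)])

# FFT route: elementwise product in the frequency domain, then inverse FFT
y_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

print(np.allclose(y_direct, y_fft))   # True
```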
We can readily program the summation required to compute a convolution: as the number of data points increases, the computational advantage goes to the convolution's implementation with FFTs, even though it requires several steps. The reason is that a convolution in the time domain requires N² multiplications, whereas the computational cost of taking the FFT route is on the order of 3N log₂(N) multiplications. Despite the fact that three steps are involved, for large N, the advantages of the FFT approach are unmistakable. Even for the very modest case of N = 250, using FFTs to compute a convolution is already more than 10 times faster than the time-domain computation.
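The operation counts just quoted are easy to evaluate; this small sketch simply plugs N = 250 into the two cost formulas:

```python
import math

# Multiplication counts quoted above: roughly N**2 for the direct
# time-domain convolution versus about 3*N*log2(N) for the FFT route.
N = 250
direct = N ** 2                       # 62,500 multiplications
fft_route = 3 * N * math.log2(N)      # about 5,974

print(direct / fft_route)             # a bit above 10, matching the text
```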
One way to implement the summation shown in Equation 6 is by expressing the equation itself in matrix form. Create an N × N matrix in which the first column takes on the x-values from x_0 to x_{N−1}. Let the next column take on the same x-values but shifted down one row, with the last value becoming the first, and repeat this rolling procedure for each successive column. Multiplying this x-matrix by the h-vector yields a circular convolution. We get a linear convolution from this same multiplication if we set all the terms in the x-matrix above the diagonal to zero.
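The rolled-column construction just described can be sketched in a few lines of NumPy (hypothetical data; not the article's code):

```python
import numpy as np

# Build the rolled-column x-matrix: column j is x shifted down j rows
# with wraparound (hypothetical data).
rng = np.random.default_rng(1)
N = 8
x = rng.standard_normal(N)
h = rng.standard_normal(N)

C = np.column_stack([np.roll(x, j) for j in range(N)])

y_circ = C @ h                        # circular convolution
y_lin = np.tril(C) @ h                # linear convolution (first N points)

ok_circ = np.allclose(y_circ, np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real)
ok_lin = np.allclose(y_lin, np.convolve(x, h)[:N])
print(ok_circ, ok_lin)                # True True
```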
To avoid the wraparound pitfall, we could do one of two things: compute the linear convolution (setting all elements above the x-matrix's diagonal to zero) or zero-pad the functions so that the total number of data points is at least N_0 + K_0 − 1, where N_0 and K_0 are the original numbers of data points in the functions x and h. With this number of elements, we avoid any distortion due to wraparound:

[ y_0     ]   [ x_0      x_{N−1}  x_{N−2}  …  x_1 ]   [ h_0     ]
[ y_1     ]   [ x_1      x_0      x_{N−1}  …  x_2 ]   [ h_1     ]
[ y_2     ] = [ x_2      x_1      x_0      …  x_3 ]   [ h_2     ]   (8)
[  ⋮      ]   [  ⋮        ⋮        ⋮           ⋮  ]   [  ⋮      ]
[ y_{N−1} ]   [ x_{N−1}  x_{N−2}  x_{N−3}  …  x_0 ]   [ h_{N−1} ]

Examples
As an example of a linear convolution calculation, consider the signals
x(t) = sin(2πt) + sin(4πt) for 0 ≤ t ≤ 1, and 0 otherwise, (9)

and

h(t) = 1 for 0 ≤ t ≤ 0.3125, and 0 otherwise, (10)

which we discretize to have 32 equally spaced points on the interval [0,1].
Figure 1 shows the signal, the impulse response, and the associated continuous and discrete convolutions. The discrete convolution as computed by taking the IFFT of the product of the FFTs of x and h is identical to that obtained via matrix multiplication.
Figure 2 shows the wraparound associated with the circular convolution example. The convolution is altered for the number of nonzero data points in h.
In Figure 3, we show the FFTs of the linear and circular convolutions. The FFT of the convolution resulting from the matrix multiplication is the same as the product of the FFTs of x and h. In the figure, we can see some frequency dependence associated with the convolution process. Figure 4 gives an overall summary of the operations and their interrelation.
For a more realistic example of convolution, let's look at the propagation of an acoustic pressure wave through a rectangular waveguide. The waveguide's resonant conditions restrict the wave numbers of the transverse wave components to discrete values, and the wave propagates only in certain modes. If we treat the waveguide as a linear device with an impulse response h, then we can predict the form of the transmitted signal by taking the convolution of our input signal x and the impulse response of the waveguide. Kristien Meykens and colleagues³ show that for modes other than (0, 0), the impulse response departs from a δ-function in which the lower frequencies resemble a reversed chirp.
Figure 5 shows the convolution of an input signal consisting of a brief acoustic burst with the impulse response of a rectangular waveguide (which we represent as a chirp function). We form this input signal by multiplying an 8-kHz sine wave by a Bartlett (tent-shaped) window. The chirp function represents the impulse response for the waveguide's (1, 0) mode, and f(t) = 10³ + 2 × 10⁶ t represents the chirp function's frequency dependence. The chirp expression is simply sin(φ(t)), where

φ(t) = 2π ∫_0^t f(t′) dt′. (11)

In general, a convolution shows the two functions' entanglement. The examples we've discussed here provide a clear instance in which we can see where the similarity between the input signal and the impulse response is the greatest. Such computations are in reasonable agreement with experimental results.³
In the next installment of this series, we'll continue to examine the problem of spectrum estimation with a discussion of the autocorrelation function and the correlogram estimates, which are based upon it.
Figure 1. Comparison of continuous and discrete convolution
calculations. We calculated the convolution of x(t) and h(t) in
three ways: continuous and discrete in the time and frequency
domains. The discrete convolution calculations approach the
continuous form.
Figure 2. Convolution with and without wraparound
distortions. The blue curve shows the circular form of the
convolution without zero-padding. The red curve is based on a
zero-padded calculation that avoids the distortion associated
with circularity. The diamonds show the h response curve
(scaled at 10 percent of true height); the width of the
response function is associated with the region in which the
circular convolution is spoiled.
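Figure 2's comparison is straightforward to reproduce. The sketch below assumes NumPy and uses a hypothetical boxcar response (not the article's h); it applies the N_0 + K_0 − 1 padding rule and confirms that only the first K_0 − 1 points of the unpadded circular result are spoiled:

```python
import numpy as np

# Wraparound versus zero padding: pad both sequences to at least
# N0 + K0 - 1 points before the FFT convolution (hypothetical data).
N0, K0 = 32, 10
x = np.sin(2 * np.pi * np.arange(N0) / N0)
h = np.ones(K0) / K0                     # simple boxcar response (hypothetical)

M = N0 + K0 - 1                          # enough room: no wraparound
circ = np.fft.ifft(np.fft.fft(x, N0) * np.fft.fft(h, N0)).real   # circular
padded = np.fft.ifft(np.fft.fft(x, M) * np.fft.fft(h, M)).real   # linear
linear = np.convolve(x, h)               # reference, length N0 + K0 - 1

ok_pad = np.allclose(padded, linear)                      # padding fixes it
ok_wrap = np.allclose(circ, linear[:N0])                  # False at the start
ok_tail = np.allclose(circ[K0 - 1:], linear[K0 - 1:N0])   # rest is untouched
print(ok_pad, ok_wrap, ok_tail)
```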
References
1. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part I: Concepts," Computing in Science & Eng., vol. 7, no. 2, 2005, pp. 80–88.
2. E. Oran Brigham, The Fast Fourier Transform and Its Applications, Prentice-Hall, 1988.
3. K. Meykens, B. Van Rompaey, and J. Janssen, "Dispersion in Acoustic Waveguide: A Teaching Laboratory Experiment," Am. J. Physics, vol. 67, no. 5, 1999, pp. 400–406.
Denis Donnelly is a professor of physics at Siena College. His research interests include computer modeling and electronics. Donnelly received a PhD in physics from the University of Michigan. He is a member of the American Physical Society, the American Association of Physics Teachers, and the American Association for the Advancement of Science. Contact him at [email protected].

Bert Rust is a mathematician at the US National Institute for Standards and Technology. His research interests include ill-posed problems, time-series modeling, nonlinear regression, and observational cosmology. Rust received a PhD in astronomy from the University of Illinois. He is a member of SIAM and the American Astronomical Society. Contact him at [email protected].
Figure 5. The convolution of a windowed sine wave burst and a chirp function. The top curve shows the input signal, and the middle curves show the impulse response of the waveguide (a chirp function). The chirp frequency increases linearly with time, ranging from roughly 1 kHz at t = 0 to roughly 17 kHz at t = 8 ms; the frequency increases at a rate of approximately 2 kHz/ms. The bottom curves show the convolution and the approximate frequencies associated with the most significant section of the convolution over time. The one marked point represents the frequency of the windowed sine curve, which is 8 kHz. The slope of the line representing frequency is about 1.9 kHz/ms.
[Figure 3 legend: zero-padded linear, unpadded linear, zero-padded circular, and unpadded circular convolutions, plotted versus frequency.]
Figure 3. The FFTs of the linear and circular convolutions. The two curves are shown with (solid curves) and without (circles and diamonds) zero padding. We computed these FFTs from the convolution data for Figure 1's discrete transform. The results are the same as those obtained by taking the product of the FFTs of x and h.
[Figure 4 diagram: x(k) data → FFT → X = FFT(x); h(k) data → FFT → H = FFT(h); multiplication H · X in the frequency domain → IFFT → convolution in the time domain; alternatively, computation in the time domain as per the definition.]
Figure 4. The interrelation between time and frequency domain operations that lead to convolution. Multiplying the FFTs of x and h followed by an IFFT also leads to the convolution. An FFT of the convolution would yield the same result as the product of the FFTs.
THE FAST FOURIER TRANSFORM FOR EXPERIMENTALISTS, PART III: CLASSICAL SPECTRAL ANALYSIS
By Bert Rust and Denis Donnelly
Computing in Science & Engineering, September/October 2005.

Each article in this continuing series on the fast Fourier transform (FFT) is designed to illuminate new features of the wide-ranging applicability of this transform. This segment deals with some aspects of the spectrum estimation problem. Before we begin, here's a short refresher about two elements we introduced previously, windowing¹ and convolution.² As we noted in those installments, a convolution is an integral that expresses the amount of overlap of one function as it is shifted over another. The result is a blending of the two functions. Closely related to the convolution process are the processes of cross-correlation and autocorrelation. Computing the cross-correlation differs only slightly from the convolution; it's useful for finding the degree of similarity in signal patterns from two different data streams and in determining the lead or lag between such similar signals. Autocorrelation is also related to the convolution; it's described later. Windowing, used in extracting or smoothing data, is typically executed by multiplying time-domain data or its autocorrelation function by the window function. A disadvantage of windowing is that it alters or restricts the data, which, of course, has consequences for the spectral estimate. In this installment, we continue our discussion, building on these concepts with a more general approach to computing spectrum estimates via the FFT.

Spectrum Estimation's Central Problem
The periodogram, invented by Arthur Schuster in 1898,³ was the first formal estimator for a time series's frequency spectrum, but many others have emerged in the ensuing century. Almost all use the FFT in their calculations, but they differ in their assumptions about the missing data; that is, the data outside the observation window. These assumptions have profound effects on the spectral estimates. Let t be time, f be frequency, and x(t) a real function on the interval −∞ < t < ∞. The continuous Fourier transform (CFT) of x(t) is defined by

X(f) = ∫_{−∞}^{∞} x(t) exp(−2πi f t) dt,  −∞ < f < ∞, (1)

where i = √−1. If we knew x(t) perfectly and could compute Equation 1, then we could compute an energy spectral density function

E(f) = |X(f)|²,  −∞ < f < ∞, (2)

and a power spectral density function (PSD) by

P(f) = lim_{T→∞} (1/2T) |∫_{−T}^{T} x(t) exp(−2πi f t) dt|²,  −∞ < f < ∞. (3)

But we have only a discrete, real time series

x_j = x(t_j), with t_j = jΔt,  j = 0, 1, …, N − 1, (4)

defined on a finite time interval of length NΔt. We saw in Part I¹ that sampling x(t) with sample spacing Δt confined our spectral estimates to the Nyquist band 0 ≤ f ≤ 1/(2Δt). We used the FFT algorithm to compute the discrete Fourier transform (DFT)

X_k = Σ_{j=0}^{N−1} x_j exp(−2πi jk/N),  k = 0, 1, …, N/2, (5)

which approximates the CFT X(f) at the Fourier frequencies

f_k = k/(NΔt),  k = 0, 1, …, N/2. (6)

We then computed periodogram estimates of both the PSD and the amplitude spectrum by

P(f_k) = (1/N)|X_k|²,  k = 0, 1, …, N/2,
A(f_k) = (2/N)|X_k|,  k = 0, 1, …, N/2. (7)

We also saw that we could approximate
the CFT and the frequency spectrum on a denser frequency mesh simply by appending zeroes to the time series. This practice, called zero padding, is just an explicit assertion of an implicit assumption of the periodogram method, namely, that the time series is zero outside the observation window.
Frequency spectrum estimation is a classic underdetermined problem because we need to estimate the spectrum at an infinite number of frequencies using only a finite amount of data. This problem has many solutions, differing mainly in what they assume about the missing data.
Before considering other solutions to this problem, let's reconsider one of the examples from Part I¹ (specifically, Figure 1b), but make it more realistic by simulating some random measurement errors. More precisely, we take N = 32, Δt = 0.22, and consider the time series

t_j = jΔt,  j = 0, 1, 2, …, N − 1,
x_j = x(t_j) = sin[2πf_0(t_j + 0.25)] + ε_j, (8)

with f_0 = 0.5, and each ε_j a random number drawn independently from a normal distribution with mean zero and standard deviation σ = 0.25. This new time series is plotted together with the original uncorrupted series in Figure 1a. Both series were zero padded to length 1,024 (992 zeroes appended) to obtain the periodogram estimates given in Figure 1b. It's remarkable how well the two spectra agree, even though the noise's standard deviation was 25 percent of the signal's amplitude.
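A sketch of the Equation 8 experiment, assuming NumPy; the random generator and seed are arbitrary choices, so the peak location will differ slightly in detail from the article's f̂_0 = 0.493:

```python
import numpy as np

# Equation 8: a noisy 32-point sine record, zero padded to 1,024 points,
# with its periodogram peak located (hypothetical seed).
rng = np.random.default_rng(8)
N, dt, f0 = 32, 0.22, 0.5
t = np.arange(N) * dt
x = np.sin(2 * np.pi * f0 * (t + 0.25)) + rng.normal(0.0, 0.25, N)

M = 1024                                  # zero pad: 992 zeroes appended
psd = np.abs(np.fft.rfft(x, M)) ** 2 / N  # periodogram, Equation 7
freqs = np.fft.rfftfreq(M, dt)

f_peak = freqs[psd.argmax()]
print(f_peak)                             # lands near f0 = 0.5
```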
The Autocorrelation Function
After the periodogram, the next frequency spectrum estimators to emerge were Richard Blackman and John Tukey's correlogram estimators.⁴ They're based on the autocorrelation theorem (sometimes called Wiener's theorem), which states that if X(f) is the CFT of x(t), then |X(f)|² is the CFT of the autocorrelation function (ACF) of x(t). Norbert Wiener defined the latter function as⁵

φ(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x*(t) x(t + τ) dt,  −∞ < τ < ∞, (9)

in which the variable τ is called the lag (the time interval for the correlation of x(t) with itself), and x*(t) is the complex conjugate of x(t). Thus, if we could access x(t), we could compute the PSD in two ways: either by Equation 3 or by

P(f) = ∫_{−∞}^{∞} φ(τ) exp(−2πi f τ) dτ. (10)

But again, we have access to only a noisy time series x_0, x_1, …, x_{N−1}, so to use the second method, we need estimates for φ(τ) evaluated at the discrete lag values

τ_m = mΔt,  m = 0, 1, …, N − 1. (11)

Because we're working with a real time series, and φ(−τ_m) = φ(τ_m), we don't need to worry about evaluating φ(τ) at negative lags.
Because φ(τ) is a limit of the average value of x*(t)x(t + τ) on the interval [−T, T], the obvious estimator is the sequence of average values

φ̂(mΔt) = φ̂_m = [1/(N − m)] Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, N − 1. (12)

This sequence is sometimes called the unbiased estimator of φ(τ) because its expected value is the true value; that is, E{φ̂(mΔt)} = φ(mΔt). But the data are noisy, and for successively larger values of m, the average φ̂_m is based on fewer and fewer terms, so the variance grows and, for large m, the estimator becomes unstable. Therefore, it's common practice to use the biased estimator
[Figure 1 graphics: (a) the time series x(t) = sin[2π(0.50)(t + 0.25)] + noise, plotted as x_i = x(t_i) versus t (time units), with and without noise; (b) the periodogram, power spectral density (PSD) versus frequency, for both series.]
Figure 1. Original and new time series as defined by Equation 8. (a) The noise-corrupted time series and the uncorrupted series originally used in Part I's Figure 1b. The noise is independently, identically distributed n(0, 0.25). (b) Periodograms of the two time series plotted in (a). For the noise-corrupted series, the peak is centered on frequency f̂_0 = 0.493.
φ̃(mΔt) = φ̃_m = (1/N) Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, N − 1, (13)

which damps those instabilities and has a smaller total error (bias + variance) than does the unbiased estimator. (Bias is the difference between the estimator's expected value and the true value of the quantity being estimated.) Figure 2a gives plots of both estimates for the time series that Equation 8 defines.
The ACF we have just described is sometimes called the engineering autocorrelation to distinguish it from the statistical autocorrelation, which is defined by

r̂_m = [Σ_{n=0}^{N−m−1} (x_n − x̄)(x_{n+m} − x̄)] / [Σ_{n=0}^{N−1} (x_n − x̄)²],  where x̄ = (1/N) Σ_{n=0}^{N−1} x_n. (14)

The individual r̂_m are true correlation coefficients because they satisfy

−1 ≤ r̂_m ≤ 1,  m = 0, 1, …, N − 1. (15)
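The unbiased and biased ACF estimates are direct to implement. The sketch below, assuming NumPy and hypothetical data, also checks that the biased estimate equals the unbiased one multiplied by the triangular taper (N − m)/N, which is what damps the unstable large-lag values:

```python
import numpy as np

# Unbiased (Eq. 12) and biased (Eq. 13) ACF estimates (hypothetical data).
rng = np.random.default_rng(3)
N = 64
x = np.sin(2 * np.pi * 0.1 * np.arange(N)) + rng.normal(0.0, 0.25, N)

m = np.arange(N)
unbiased = np.array([np.sum(x[:N - k] * x[k:]) / (N - k) for k in m])  # Eq. 12
biased = np.array([np.sum(x[:N - k] * x[k:]) / N for k in m])          # Eq. 13

# The biased estimate is the unbiased one multiplied by the
# triangular taper (N - m)/N.
taper = (N - m) / N
print(np.allclose(biased, unbiased * taper))   # True
```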
Correlogram PSD Estimators
Once we've established the ACF estimate, we can use the FFT to calculate the discrete estimate of the PSD. More precisely, the ACF estimate is zero padded to have M lags, which gives M/2 + 1 frequencies in the PSD estimate, which we can then compute by approximating Equation 10 with

P̂_k = P̂(f_k) = Δt Σ_{j=0}^{M−1} φ̂_j exp(−2πijk/M),  k = 0, 1, …, M/2.  (16)

Zero padding in this case is an explicit expression of the implicit assumption that the ACF is zero for all lag values > (N − 1)Δt. We must assume that because we don't know the data outside the observation window. Assuming some nonzero extension for the ACF would amount to an implicit assumption about the missing, unobserved data.
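These estimators and the correlogram recipe are easy to try numerically. The sketch below is our own illustration, not code from the article: with assumed parameters (N = 32, Δt = 0.22, a fixed random seed), it generates a noisy sinusoid like the Equation 8 series, forms the biased ACF, and FFTs the zero-padded, symmetrized ACF.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt, M = 32, 0.22, 1024
t = dt * np.arange(N)
x = np.sin(2 * np.pi * 0.5 * (t + 0.25)) + rng.normal(0, 0.25, N)

# Biased ACF estimate (Equation 13): phi_m = (1/N) * sum_n x_n x_{n+m}
phi = np.array([x[:N - m] @ x[m:] for m in range(N)]) / N

# Correlogram PSD (Equation 16): zero pad the ACF to M lags, store the
# negative lags circularly (phi_{-m} = phi_m), and take the real FFT.
c = np.zeros(M)
c[:N] = phi
c[-(N - 1):] = phi[1:][::-1]
P_corr = dt * np.fft.fft(c).real      # imaginary parts vanish by symmetry

# Periodogram of the zero-padded data, for comparison.
P_per = dt * np.abs(np.fft.fft(x, M)) ** 2 / N
```

On this example `np.allclose(P_corr, P_per)` holds exactly (to rounding), which is the finite-dimensional analogue of Wiener's theorem that the article notes when comparing Figures 1b and 2b.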
Figure 2b plots the correlograms corresponding to the biased and unbiased ACF estimates shown in Figure 2a. The negative sidelobes for the unbiased correlogram show dramatically why most analysts choose the biased estimate even though its central peak is broader. The reason for this broadening, and for the damped sidelobes, is that the biased ACF, Equation 13, can also be computed by multiplying the unbiased ACF, Equation 12, by the triangular (Bartlett) tapering window

w_k = 1 − k/N,  k = 0, 1, 2, …, N − 1.  (17)
Recall that we observed the same sort of peak broadening and sidelobe suppression in Part I's Figure 10 when we multiplied the observed data by a Blackman window before computing the periodogram.

Notice that the biased correlogram estimate plotted in Figure 2b is identical to the periodogram estimate plotted in Figure 1b. The equality of these two estimates, computed in very different ways, constitutes a finite-dimensional analogue of Wiener's theorem for the continuous PSD.

Figure 2b's two PSD correlograms aren't the only members of the class of correlogram estimates. We can obtain other variations by truncating the ACF estimate at lags < (N − 1)Δt and by smoothing the truncated (or untruncated) estimate with one of the tapering windows defined in Part I's Equation 11. Most of those windows were originally developed for the correlogram method; they were then retroactively applied to the periodogram method when the latter was resurrected in the mid 1960s. In those days, people often used very severe truncations, with the estimates being
E D U C A T I O N
Figure 2. Autocorrelation and correlogram estimates for the noisy time series defined by Equation 8. (a) Biased and unbiased estimates of the autocorrelation function (ACF), φ̂(mΔt) plotted against lag mΔt; (b) correlogram estimates of power spectral density (PSD), P̂(f) plotted against f, obtained from the ACF estimates in (a).
SEPTEMBER/OCTOBER 2005 77
set to zero at 90 percent or more of the lags. Not only did this alleviate the variance instability problem, but it also reduced the computing time, an important consideration before the invention of the FFT algorithm, and when computers were much slower than today.

The effect of truncating the biased ACF estimate is shown in Figure 3, where m_max is the largest index for which the nonzero ACF estimate is retained. More precisely,

φ̂_m = (1/N) Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, m_max,
φ̂_m = 0,  m = m_max + 1, …, N − 1.  (18)

It's clear that smaller values of m_max produce more pronounced sidelobes and broader central peaks than larger values. The peak broadening is accompanied by a compensating decrease in height to keep the area under the curve invariant. PSD is measured in units of power per unit frequency interval, so the peak's area indicates its associated power.
Figure 4 shows the effect of tapering the truncated ACF estimates used in Figure 3 with a Hamming window

w_m = 0.538 + 0.462 cos(πm/m_max),  m = 0, 1, 2, …, m_max.  (19)

The sidelobes are suppressed by the tapering, but the central peaks are further broadened. This loss in resolution is the price we must pay to smooth the sidelobes and eliminate their negative excursions.

Tapering the biased ACF estimates with the Hamming window amounts to twice tapering the unbiased estimates; we can obtain the former from the latter by tapering them with the Bartlett window, Equation 17. Figure 5 shows the effect of a single tapering of the unbiased estimates with the Hamming window, Equation 19. Note that the sidelobes are not completely suppressed, but they're not as
Figure 3. Three untapered correlogram estimates of the power spectral density (PSD) for Equation 8, with m_max = 10, 20, and 31, computed from the biased autocorrelation function (ACF) estimator in Equation 13. The periodogram, although plotted, doesn't show up as a separate curve because it's identical to the m_max = 31 correlogram.
Figure 4. Three tapered correlogram estimates for the time series generated by Equation 8, with m_max = 10, 20, and 31. We computed the estimates by tapering three truncations of the biased estimator in Equation 13 with a Hamming window. The periodogram was also plotted for comparison. Although it has sidelobes, its central peak is sharper than those of the correlograms.
pronounced as in Figure 3, in which the tapering used the Bartlett window. However, the central peaks are also slightly broader here. This is yet another example of the trade-off between resolution and sidelobe suppression.

This particular example contains only a single sinusoid, so it doesn't suggest any advantage for the tapering and truncation procedures, but they weren't developed to analyze a time series with such a simple structure. Their advantages are said to be best realized when the signal being analyzed contains two or more sinusoids with frequencies so closely spaced that sidelobes from two adjacent peaks might combine and reinforce one another to give a spurious peak in the spectrum. But of course, if two adjacent frequencies are close enough, then the broadening of both peaks might cause them to merge into an unresolved lump.
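Truncation (Equation 18) and tapering (Equation 19) are simple to experiment with. The sketch below is our own illustration with assumed parameters (N = 32, Δt = 0.22, m_max = 10, a fixed seed), mirroring the three variants plotted in Figures 3 through 5:

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt, M = 32, 0.22, 1024
t = dt * np.arange(N)
x = np.sin(2 * np.pi * 0.5 * (t + 0.25)) + rng.normal(0, 0.25, N)

def acf(x, biased=True):
    """Biased (Equation 13) or unbiased (Equation 12) ACF estimate."""
    N = len(x)
    s = np.array([x[:N - m] @ x[m:] for m in range(N)])
    return s / N if biased else s / (N - np.arange(N))

def correlogram(phi, m_max, hamming=False):
    """Truncate the ACF at lag m_max (Equation 18), optionally taper it with
    the Hamming window of Equation 19, then FFT the symmetrized result."""
    g = phi[:m_max + 1].copy()
    if hamming:
        g *= 0.538 + 0.462 * np.cos(np.pi * np.arange(m_max + 1) / m_max)
    c = np.zeros(M)
    c[:m_max + 1] = g
    c[-m_max:] = g[1:][::-1]          # phi_{-m} = phi_m, stored circularly
    return dt * np.fft.fft(c).real

P_untapered = correlogram(acf(x), 10)                      # cf. Figure 3
P_biased_ham = correlogram(acf(x), 10, hamming=True)       # cf. Figure 4
P_unbiased_ham = correlogram(acf(x, False), 10, hamming=True)  # cf. Figure 5
```

Inspecting `P_untapered.min()` shows the negative sidelobe excursions that the Hamming taper is meant to suppress.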
Much ink has been used in debating the relative merits of the various truncation and windowing strategies, but none of them have proven to be advantageous, so correlogram estimates are beginning to fall out of favor. For the past 30 years or so, most researchers have concentrated on autoregressive spectral estimates, which, as we shall see in Part 4, give better resolution because they make better assumptions about the data outside the window of observation.
References
1. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part I: Concepts," Computing in Science & Eng., vol. 7, no. 2, 2005, pp. 80–88.
2. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part II: Convolutions," Computing in Science & Eng., vol. 7, no. 3, 2005, pp. 92–95.
3. A. Schuster, "On the Investigation of Hidden Periodicities with Application to a Supposed Twenty-Six-Day Period of Meteorological Phenomena," Terrestrial Magnetism, vol. 3, no. 1, 1898, pp. 13–41.
4. R.B. Blackman and J.W. Tukey, The Measurement of Power Spectra, Dover Publications, 1959.
5. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, MIT Press, 1949.
Denis Donnelly is a professor of physics at
Siena College. His research interests include
computer modeling and electronics. Donnelly
received a PhD in physics from the University of
Michigan. He is a member of the American
Physical Society, the American Association of
Physics Teachers, and the American Association
for the Advancement of Science. Contact him
at [email protected].
Bert Rust is a mathematician at the US National
Institute for Standards and Technology. His re-
search interests include ill-posed problems,
time-series modeling, nonlinear regression, and
observational cosmology. Rust received a PhD
in astronomy from the University of Illinois. He
is a member of SIAM and the American Astro-
nomical Society. Contact him at [email protected].
Figure 5. Three correlogram estimates for the time series generated by Equation 8, with m_max = 10, 20, and 31. We computed the estimates by tapering three truncations of the unbiased estimator in Equation 12 with a Hamming window. We also plotted the periodogram for comparison; again, it has a sharper peak but larger sidelobes.
NOVEMBER/DECEMBER 2005 Copublished by the IEEE CS and the AIP 1521-9615/05/$20.00 2005 IEEE 85
Editors: David Winch, [email protected]
Denis Donnelly, [email protected]
EDUCATION
In the most recent article of this series [1], we considered the periodogram and correlogram estimators for the power spectral density (PSD) function. However, they are only two of several possibilities.
In this installment, we consider two additional kinds of spectrum estimates: autoregressive (AR) estimates and the maximum entropy (ME) method. In the first approach, we assume that an AR process generates the time series, which means we can compute the PSD of the time series from estimates of the AR parameters. The second approach is a special case of the first, but it uses a different method for estimating the AR parameters. Specifically, it chooses them to make the PSD's inverse transform compatible with the measured time series, while remaining maximally noncommittal about the data outside the observational window.
Autoregressive Time-Series Models
Both the periodogram and correlogram estimates make rather unrealistic assumptions about the data outside the observational window. Moreover, when they use tapering windows or truncation of the autocorrelation function (ACF), they change the observed data. The years since the early 1970s have seen the development of a new class of PSD estimators that are based on the idea of fitting a parametric time-series model to the observed data. This enables us to use estimates of the parameters in the theoretical expression of the model's PSD to get an estimate of the observed series' PSD. If the model is a good representation of the process that generated the data, it should hopefully give a more realistic extrapolation for the missing data.
The class of models used most often assumes that the data are generated by an AR process in which each new data point is formed from a linear combination of the preceding data plus a random shock. The basic idea is that a system's future states depend in a deterministic way on previous states, but at each time step, a random perturbation drives the system forward. We can write the AR models of orders 1, 2, and 3 as

AR(1): x_n = −a_1 x_{n−1} + u_n,  n = 1, 2, …, N − 1,
AR(2): x_n = −a_1 x_{n−1} − a_2 x_{n−2} + u_n,  n = 2, 3, …, N − 1,
AR(3): x_n = −a_1 x_{n−1} − a_2 x_{n−2} − a_3 x_{n−3} + u_n,  n = 3, 4, …, N − 1,  (1)
where a_1, a_2, and a_3 are the AR parameters (whose values must be determined to make the model fit the data), and u_n is the random shock at time step n. We assume the random shocks to be samples from a zero-mean distribution whose variance remains constant in time. The choice of negative signs for the parameters is a universal convention adopted for notational convenience in derivations that we won't give here.
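A minimal simulation makes the recursion concrete. The parameter values below are our own illustrative choices (picked so the characteristic roots lie inside the unit circle, keeping the process stationary), not values from the article:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 512
a1, a2 = -1.5, 0.9       # illustrative AR(2) parameters; roots have modulus ~0.95
x = np.zeros(N)
for n in range(2, N):
    # AR(2) of Equation 1: x_n = -a_1 x_{n-1} - a_2 x_{n-2} + u_n
    x[n] = -a1 * x[n - 1] - a2 * x[n - 2] + rng.normal(0.0, 1.0)
```

This particular pair produces a strongly resonant, narrowband series, exactly the kind of structure AR spectral estimates are good at capturing.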
Autoregressive Spectral Estimates
In general, for any integer p < N − 1, the AR(p) model is
THE FAST FOURIER TRANSFORM FOR EXPERIMENTALISTS, PART IV: AUTOREGRESSIVE SPECTRAL ANALYSIS
By Bert Rust and Denis Donnelly

IT'S RARE THAT WE HAVE ONLY ONE WAY IN WHICH TO APPROACH A PARTICULAR TOPIC. FORTUNATELY, SPECTRUM ESTIMATION ISN'T ONE OF THOSE RARE CASES.
New Editorial Board Member

David Winch is an emeritus professor of physics at Kalamazoo College, Michigan. His research interests are focused on educational technologies (his most recent work is a DVD/CD called Physics: Cinema Classics). Winch has a PhD in physics from Clarkson University. He is a member of the American Physical Society, the American Association of Physics Teachers, and the National Science Teachers Association. He'll be joining our board as a coeditor of the Education department. Contact him at [email protected] or lead editor Jenny Ferrero at [email protected] if you are interested in writing.
x_n = −Σ_{k=1}^{p} a_k x_{n−k} + u_n,  n = p, p + 1, …, N − 1.  (2)
We can show that the PSD function for this model is

P_AR(f) = σ_w² Δt / |1 + Σ_{j=1}^{p} a_j exp(−2πifjΔt)|²,  −1/(2Δt) ≤ f ≤ 1/(2Δt),  (3)
where σ_w² is another adjustable parameter that we can estimate along with a_1, a_2, …, a_p by solving the (p + 1) × (p + 1) linear system of equations

| φ_0  φ_1      φ_2      …  φ_p     | | 1   |   | σ_w² |
| φ_1  φ_0      φ_1      …  φ_{p−1} | | a_1 |   | 0    |
| φ_2  φ_1      φ_0      …  φ_{p−2} | | a_2 | = | 0    |
| …    …        …        …  …       | | …   |   | …    |
| φ_p  φ_{p−1}  φ_{p−2}  …  φ_0     | | a_p |   | 0    |,  (4)

which are sometimes called the Yule-Walker equations. The φ-values in the matrix are just the autocorrelations φ_k = φ(kΔt) that we defined in the last issue [1] with

φ(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x*(t) x(t + τ) dt,  −∞ < τ < ∞,  (5)

where x*(t) is the complex conjugate of x(t). We're working with real data, so φ_k = φ_{−k}, which means that the matrix is symmetric and positive definite. Note that the element in row i and column j is just φ_{(i−j)}, which makes it a Toeplitz matrix. Norman Levinson [2] exploited this special structure to devise a recursive algorithm that solves the system in times proportional to (p + 1)² rather than the (p + 1)³ required by a general linear equations solver.
We can summarize the steps required to compute an autoregressive spectral estimate as follows:

1. Choose an autoregressive order p ≤ N − 1.
2. Compute ACF estimates φ̂_0, φ̂_1, …, φ̂_p using the biased estimator

   φ̂(mΔt) = φ̂_m = (1/N) Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, N − 1.  (6)

3. Substitute φ̂_0, φ̂_1, …, φ̂_p into the matrix in Equation 4 and use the Levinson algorithm to compute estimates â_1, â_2, …, â_p and σ̂_w².
4. Substitute â_1, â_2, …, â_p and σ̂_w² into Equation 3 to compute the PSD estimate P̂_AR(f) on any desired frequency mesh.

It's absolutely necessary to use the biased ACF estimator in step 2. Using the unbiased estimator produces an unstable linear system (see Equation 4) with a matrix that numerically isn't positive definite.
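The four steps can be sketched in a few lines. This is our own illustration, not the authors' code; for brevity it solves Equation 4 with a general linear solver instead of the Levinson recursion, and the test series and its parameters are assumptions:

```python
import numpy as np

def ar_psd(x, p, dt, M=1024):
    """AR(p) PSD estimate following steps 1-4 (Equations 3, 4, and 6).
    A general solver stands in for the Levinson recursion here."""
    N = len(x)
    # Step 2: biased ACF estimates phi_0 .. phi_p (Equation 6).
    phi = np.array([x[:N - m] @ x[m:] for m in range(p + 1)]) / N
    # Step 3: rows 1..p of Equation 4 read phi_m + sum_k a_k phi_{|m-k|} = 0.
    T = phi[np.abs(np.arange(p)[:, None] - np.arange(p)[None, :])]  # Toeplitz
    a = np.linalg.solve(T, -phi[1:p + 1])
    var_w = phi[0] + a @ phi[1:p + 1]        # row 0 of Equation 4 gives sigma_w^2
    # Step 4: evaluate the denominator of Equation 3 by zero-padded FFT.
    A = np.fft.fft(np.concatenate(([1.0], a)), M)
    f = np.arange(M // 2 + 1) / (M * dt)     # equally spaced frequency mesh
    return f, var_w * dt / np.abs(A[:M // 2 + 1]) ** 2

# Assumed test series: a noisy 0.5-frequency sinusoid, N = 32, dt = 0.22.
rng = np.random.default_rng(3)
t = 0.22 * np.arange(32)
x = np.sin(2 * np.pi * 0.5 * (t + 0.25)) + rng.normal(0, 0.25, 32)
f, P = ar_psd(x, p=16, dt=0.22)
```

The spectrum's peak, `f[np.argmax(P)]`, should land near the sinusoid frequency of 0.5.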
It's easy to do the calculations in the final step by using the fast Fourier transform (FFT) algorithm to compute the denominator in Equation 3. If we define â_0 ≡ 1, then

1 + Σ_{j=1}^{p} â_j exp(−2πifjΔt) = Σ_{j=0}^{p} â_j exp(−2πifjΔt).  (7)

Suppose we want to evaluate P̂_AR(f) at (M/2 + 1) equally spaced frequencies

f_k = k/(MΔt),  k = 0, 1, …, M/2,  (8)

where M > p. Then,

Σ_{j=0}^{p} â_j exp(−2πif_k jΔt) = Σ_{j=0}^{p} â_j exp(−2πijk/M),  (9)

and we can compute these values quite quickly by zero padding the sequence â_0, â_1, â_2, …, â_p to have M terms and applying the FFT algorithm.
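That substitution is worth a quick numerical check. The coefficient values below are arbitrary placeholders; the point is only that the direct sum over the mesh of Equation 8 matches the zero-padded FFT:

```python
import numpy as np

a = np.array([1.0, -0.9, 0.4])   # hat{a}_0 = 1 plus two placeholder coefficients
M, dt = 16, 0.5
f = np.arange(M) / (M * dt)      # the mesh of Equation 8 (full circle of k values)

# Direct evaluation of sum_j a_j exp(-2*pi*i*f_k*j*dt), the left side of Equation 9 ...
direct = np.array([np.sum(a * np.exp(-2j * np.pi * fk * np.arange(len(a)) * dt))
                   for fk in f])

# ... equals the FFT of the coefficient sequence zero padded to M terms.
assert np.allclose(direct, np.fft.fft(a, M))
```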
Figure 1. The time series generated by Equation 10 and its periodogram. The discrete points in the upper plot (x_i plotted against t) are joined by straight-line segments to emphasize the time-series nature of the data. The time series was zero padded to length M = 1,024 to compute the periodogram in the lower plot.
Two Examples
If we choose the AR order p properly, the peaks in the AR(p) spectrum will be sharper than those in the periodogram or correlogram estimates. There is no clear-cut prescription for choosing p, but a fairly wide range of values will usually give acceptable results. To illustrate the effect of the choice of p, let's revisit an example time series used in the last issue [1]. Again, we'll take N = 32, Δt = 0.22, and consider the time series generated by

t_j = jΔt,  j = 0, 1, 2, …, N − 1,
x_j = x(t_j) = sin[2πf_0(t_j + 0.25)] + ε_j,  (10)

with f_0 = 0.5, and each ε_j a random number drawn independently from a normal distribution with mean zero and standard deviation σ = 0.25. Figure 1 plots the time series and its periodogram, and Figure 2 gives three different AR(p) spectra for the time series, together with the periodogram for comparison. Table 1 gives the locations of the peak centers. Both the AR(16) and AR(24) estimates give better results than the periodogram, but for real-world problems, it's best to try several orders in the range N/2 ≤ p ≤ 3N/4 and compare them to make the final choice. Our own experience has indicated that the best choice usually has p ≈ 2N/3.
To better illustrate the AR method's power, let's reconsider another time series originally introduced in Part I of our series (specifically, Figure 2a) [3]. We generated it by summing two sine waves, with amplitudes A_1 = A_2 = 1.0, frequencies f_1 = 1.0 and f_2 = 1.3, and phases φ_1 = φ_2 = 0, at N = 16 equally spaced time points with Δt = 0.125. Again, we add random noise to make the problem more realistic, and write

t_j = jΔt,  j = 0, 1, …, N − 1,
x_j = sin[2πf_1 t_j] + sin[2πf_2 t_j] + ε_j,  (11)

with the ε_j chosen independently from a normal distribution; the mean is 0 and standard deviation σ = 0.25. This is the same error distribution as in the preceding example, but the samples used here differ from any used there. The top graph of Figure 3 gives plots of the noisy and noise-free time series, and the bottom graph gives their periodograms. Figure 4 gives plots of the PSD's periodogram and AR(12) estimates. The latter clearly indicates the presence of two peaks, although it doesn't completely resolve them. The two maxima occur at frequencies very near the true values used to generate the time series. It's remarkable that the AR(12) estimate could obtain such good agreement with the true values using only 16 noise-corrupted data points.
Figure 2. AR(p) power spectral density (PSD) estimates for p = 8, 16, and 24, plotted with the periodogram for the time series generated by Equation 10. The plot doesn't cover the whole Nyquist band 0 ≤ f ≤ 2.273, but rather only the frequency range spanned by the central peak in the periodogram. Using the whole Nyquist range renders the AR(p) peaks so narrow that it's difficult to distinguish between them.
Table 1. Peak centers.

Estimate   Periodogram   AR(8)   AR(16)   AR(24)
f̂_0        0.493         0.491   0.495    0.504
Figure 3. Time series. (a) The noise-corrupted time series generated by Equation 11 and the noise-free signal; the noise is independently and identically distributed n(0, 0.25). (b) Periodograms of the two time series plotted in (a). In neither case was the periodogram method able to resolve two separate peaks. For the noisy spectrum, the unresolved lump peaks at frequency f̂ = 1.136.
The Maximum Entropy Approach
John Parker Burg invented the ME method in the late 1960s; he exhibited its strengths and advantages in oral presentations at geophysics conferences, but he didn't publish the mathematical derivations that defined and justified it until his PhD thesis appeared in 1975 [4]. This lack of published documentation produced a great deal of independent work by other researchers who were trying to understand and extend the method. In fact, the ME method was one of the chief motivators for the development of the AR methods and can be classified as an AR method itself, although Burg didn't use AR models in its development.
didnt use AR models in its development.
Rather, Burg started with the denition for PSD, that is,
, (12)
but sought a function P
e
( f ), dened on the Nyquist band
1/(2t) f 1/(2t), which satised three guiding principles:
1. The inverse Fourier transform of P
e
( f ) should return
the autocorrelation function unchanged by any lter-
ing or tapering operations:
m= 0, 1, , N 1. (13)
2. P
e
( f ) should correspond to the most random or unpre-
dictable time series whose autocorrelation function
agrees with the known values.
3. P
e
( f ) > 0 on the interval 1/(2t) f 1/(2t).
The first condition merely states that the measured data
shouldnt be changed in any way in computing P
e
( f ). The
second is a statement about what is to be assumed about the
data outside the observational window. Essentially, it says
that those assumptions should be minimized.
To measure a time series' randomness or unpredictability, Burg used the information-theoretic concept of entropy. A random process

…, x(−2Δt), x(−Δt), x(0), x(Δt), x(2Δt), …  (14)

is said to be band limited if its PSD function is zero everywhere outside its Nyquist band. If P(f) is such a PSD function, then the time series' entropy rate (entropy per sample) is given by

h{P(f)} = ∫_{−1/(2Δt)}^{1/(2Δt)} ln[P(f)] df.  (15)

Burg's idea was to maximize this quantity, subject to the constraints imposed by Equation 13. More precisely, he sought to impose the constraint at lags 0, Δt, 2Δt, …, pΔt, with p < N, and then choose, from the set of all nonnegative functions P(f) that satisfy those p + 1 constraints, the particular one that maximizes the entropy rate (Equation 15). We can write the problem formally as

maximize  h{P(f)} = ∫_{−1/(2Δt)}^{1/(2Δt)} ln[P(f)] df  over P(f) > 0,
subject to  ∫_{−1/(2Δt)}^{1/(2Δt)} P(f) exp(2πifmΔt) df = φ(mΔt) = φ_m,  m = 0, 1, …, p.  (16)
We need techniques from the calculus of variations to solve it; we can show that

P_e(f) = σ_e² Δt / |1 + Σ_{j=1}^{p} a_j exp(−2πifjΔt)|²,  −1/(2Δt) ≤ f ≤ 1/(2Δt),  (17)

where a_1, a_2, …, a_p and σ_e² are parameters satisfying

| φ_0  φ_1      φ_2      …  φ_p     | | 1   |   | σ_e² |
| φ_1  φ_0      φ_1      …  φ_{p−1} | | a_1 |   | 0    |
| φ_2  φ_1      φ_0      …  φ_{p−2} | | a_2 | = | 0    |
| …    …        …        …  …       | | …   |   | …    |
| φ_p  φ_{p−1}  φ_{p−2}  …  φ_0     | | a_p |   | 0    |.  (18)
Figure 4. Power spectral density (PSD). The AR(12) and the untapered periodogram estimates of the PSD for the time series generated by Equation 11. The two maxima in the AR(12) spectrum occur at frequencies f̂_1 = 1.027 and f̂_2 = 1.321, which are very near the true values f_1 = 1.00 and f_2 = 1.30.
Equation 17 is the same as Equation 3, and, because we're working with real data for which φ_k = φ_{−k}, Equation 18 is the same as Equation 4. Thus, the maximum entropy method is correctly classified as an AR method, even though Burg used different methods to estimate the autocorrelations and parameters in Equation 18.
Forward and Backward Prediction Filters
Burg regarded the vector (1 a_1 a_2 … a_p)^T as a prediction filter, which he applied to the data x_0, x_1, …, x_{N−1} in both the forward and reverse directions to get forward and backward predictions x̂^f_n, x̂^b_n and their corresponding prediction errors e^f_n, e^b_n:

x̂^f_n = −Σ_{k=1}^{p} a_k x_{n−k},  e^f_n = x_n − x̂^f_n,  n = p, p + 1, …, N − 1,
x̂^b_n = −Σ_{k=1}^{p} a_k x_{n+k},  e^b_n = x_n − x̂^b_n,  n = 0, 1, …, N − p − 1.  (19)

He reasoned that he could get the best estimates for a_1, a_2, …, a_p by minimizing the sum of squares of the prediction errors, for example,

Σ_{n=p}^{N−1} (e^f_n)² + Σ_{n=0}^{N−p−1} (e^b_n)².  (20)

He was able to devise a recursive algorithm that gave estimates not only for a_1, a_2, …, a_p, but also, at the same time, for σ_e² and for the autocorrelations φ_0, φ_1, …, φ_p. The details are complicated, so we won't give them here [4]. It's remarkable that the recursion generates a new estimator for the elements of the matrix in Equation 18 at the same time it's solving the system of equations!
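Although the article omits the details, the core of Burg's recursion is short. The sketch below is a standard textbook formulation reconstructed by us (not Burg's original code); it returns estimates in the article's sign convention and updates the forward and backward errors of Equation 19 at each order:

```python
import numpy as np

def arburg(x, p):
    """Burg recursion sketch: estimate a_1..a_p in the convention
    x_n = -sum_k a_k x_{n-k} + u_n by minimizing the summed forward and
    backward prediction-error power (Equation 20)."""
    x = np.asarray(x, float)
    a = np.array([1.0])                 # prediction-error filter [1, a_1, ..., a_m]
    E = x @ x / len(x)                  # zero-lag error power
    f, b = x[1:].copy(), x[:-1].copy()  # forward and backward errors
    for _ in range(p):
        k = -2.0 * (f @ b) / (f @ f + b @ b)    # reflection coefficient, |k| <= 1
        a = np.concatenate((a, [0.0])) + k * np.concatenate(([0.0], a[::-1]))
        f, b = (f + k * b)[1:], (b + k * f)[:-1]  # order-update of the errors
        E *= 1.0 - k * k
    return a[1:], E                     # a_1..a_p and the residual error power

# Sanity check on a synthetic AR(2) series with known parameters (our choice).
rng = np.random.default_rng(7)
x = np.zeros(2000)
for n in range(2, 2000):
    x[n] = 1.5 * x[n - 1] - 0.9 * x[n - 2] + rng.normal()
a_hat, var_e = arburg(x, 2)             # expect a_hat near (-1.5, 0.9)
```

Because |k| ≤ 1 at every order, the residual power E can never go negative, one of the numerical attractions of Burg's scheme.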
Choosing the Order p
Like the other AR methods, the ME method requires the choice of an order p < N. Figure 5 exhibits the results of choosing a low, intermediate, and high order for the time series generated by Equation 10. The same plots are repeated using a logarithmic scaling in Figure 6. Table 2 gives the peak locations. The ME(3) spectrum gave the best estimate f̂_0, but its peak is almost as broad as the periodogram's. Increasing p produces sharper peaks, but the locations display a noticeable downward bias. The ME(14) estimate is fairly representative of the orders in the range 4 ≤ p ≤ 25. At p = 26, the peak splits into two, with the dominant one giving a better f̂_0 than any of the sharp single peaks for p = 4, 5, …, 25. The same splitting occurs for orders p = 27, 28, 29, and 30, with the dominant peak becoming sharper and sharper but remaining at f̂_0 = 0.492. These spurious splittings aren't caused by errors in the data. In fact, they occur much more readily for artificially
Figure 5. Maximum entropy power spectral density (PSD) estimates for orders p = 3, 14, and 26, plotted with the periodogram for the time series generated by Equation 10, along the same frequency range used for the AR(p) spectra in Figure 2. The ME peaks are even sharper than the AR(p) peaks, so they must be taller to preserve the area subtended.
Table 2. Peak locations.

Estimate   Periodogram   ME(3)   ME(14)   ME(26)
f̂_0        0.493         0.498   0.479    0.492
Figure 6. Another view of the plots given in Figure 5, with P_ME(f) on a logarithmic scale. Using the logarithmic scale makes it easier to compare the ME(3) estimate with the periodogram.
generated time series without added noise, but the ME(26) spectrum clearly demonstrates that they also occur in noisy data, so great care must be exercised in interpreting high-order ME spectra. One of the ME method's strengths is its ability to resolve closely spaced peaks, but in using it for that purpose, always remember the possibility of a spurious splitting of a single peak.

Researchers have proposed several criteria for choosing the optimal order for the ME method (and for the other AR methods), but none of them work all of the time. In fact, it's easier to find a time series that confounds a given criterion than it is to develop it. Many authors [5,6] recommend p ≤ N/2, but higher-order methods often give better results. Figure 7 shows the result of using a relatively high p for the time series generated by Equation 11. The very narrow spurious peak at f̂ = 2.901 is a typical occurrence when we use high values for p. Such peaks can usually be easily identified because they're so much sharper than the peaks corresponding to real power. The one in Figure 7 is a small price to pay for the excellent resolution of the two real peaks. It's amazing that the ME method can achieve such good results using just 16 noisy data points spanning only about 2.5 cycles of the higher frequency sine wave.
We've now looked at four different methods of spectrum estimation, and although we haven't exhausted the subject, we must proceed. (More details about this topic appear elsewhere [5,6].) In the next installment, we'll take a brief look at filters and detrending before we present an analysis of a bat chirp. In the final installment, we'll discuss some statistical tests and use them to analyze atmospheric pressure differences in the Pacific Ocean that have significant environmental implications.
References
1. B. Rust and D. Donnelly, "The Fast Fourier Transform for Experimentalists, Part III: Classical Spectral Analysis," Computing in Science & Eng., vol. 7, no. 5, 2005, pp. 74–78.
2. N. Levinson, "The Wiener (Root Mean Square) Error Criterion in Filter Design and Prediction," J. Mathematical Physics, vol. 25, 1947, pp. 261–278.
3. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part I: Concepts," Computing in Science & Eng., vol. 7, no. 2, 2005, pp. 80–88.
4. J.P. Burg, Maximum Entropy Spectral Analysis, PhD dissertation, Dept. of Geophysics, Stanford Univ., May 1975; http://sepwww.stanford.edu/theses/sep06/.
5. S.L. Marple Jr., Digital Spectral Analysis with Applications, Prentice Hall, 1987.
6. S.M. Kay, Modern Spectral Estimation: Theory and Application, Prentice Hall, 1988.
Bert Rust is a mathematician at the US National Institute for Standards
and Technology. His research interests include ill-posed problems, time-
series modeling, nonlinear regression, and observational cosmology. Rust
has a PhD in astronomy from the University of Illinois. He is a member of
SIAM and the American Astronomical Society. Contact him at
[email protected].
Denis Donnelly is a professor of physics at Siena College. His research
interests include computer modeling and electronics. Donnelly has a PhD
in physics from the University of Michigan. He is a member of the Amer-
ican Physical Society, the American Association of Physics Teachers, and
the American Association for the Advancement of Science. Contact him
at [email protected].
Figure 7. Maximum entropy (ME) method. In the ME(14) power spectral density (PSD) estimate for the time series generated by Equation 11, the two peaks are centered at f̂_1 = 1.023 and f̂_2 = 1.302. These are somewhat better than the estimates from the AR(12) spectrum in Figure 4. The very narrow peak at f̂ = 2.901 is an artifact caused by using the very high order p = 14 (high relative to N = 16), but because it's so narrow, it doesn't indicate much power and thus can be safely ignored.