
THEME ARTICLE

THE FFT: AN ALGORITHM THE WHOLE FAMILY CAN USE

The fast Fourier transform is one of the fundamental algorithm families in digital
information processing. The author discusses its past, present, and future, along with its
important role in our current digital revolution.

DANIEL N. ROCKMORE
Dartmouth College

1521-9615/00/$10.00 © 2000 IEEE

A paper by Cooley and Tukey described a recipe for computing Fourier coefficients of a time series that used many fewer machine operations than did the straightforward procedure ... What lies over the horizon in digital signal processing is anyone's guess, but I think it will surprise us all.
– Bruce P. Bogert, IEEE Trans. Audio Electronics, AU-15, No. 2, 1967, p. 43.

These days, it is almost beyond belief that there was a time before digital technology. It seems almost everyone realizes that the data whizzing over the Internet, bustling through our modems, or crashing into our cell phones is ultimately just a sequence of 0's and 1's—a digital sequence—that magically makes the world the convenient, high-speed place it is today. Much of this magic is due to a family of algorithms that collectively go by the name the fast Fourier transform. Indeed, the FFT is perhaps the most ubiquitous algorithm used today to analyze and manipulate digital or discrete data.

My own research experience with various flavors of the FFT is evidence of its wide range of applicability: electroacoustic music and audio-signal processing, medical imaging, image processing, pattern recognition, computational chemistry, error-correcting codes, spectral methods for partial differential equations, and last but not least, mathematics. Of course, I could list many more applications, notably in radar and communications, but space and time restrict. E. Oran Brigham's book is an excellent place to start, especially pages two and three, which contain a (nonexhaustive) list of 77 applications!1

History

We can trace the FFT's first appearance, like so much of mathematics, back to Gauss.2 His interests were in certain astronomical calculations (a recurrent area of FFT application) that dealt with the interpolation of asteroidal orbits from a finite set of equally spaced observations. Undoubtedly, the prospect of a huge, laborious hand calculation provided good motivation to develop

60 COMPUTING IN SCIENCE & ENGINEERING


Discrete Fourier transforms

The fast Fourier transform efficiently computes the discrete Fourier transform. Recall that the DFT of a complex input vector of length N, X = (X(0), ..., X(N − 1)), denoted X̂, is another vector of length N given by the collection of sums

\hat{X}(k) = \sum_{j=0}^{N-1} X(j)\, W_N^{jk}   (1)

where W_N = \exp(2\pi\sqrt{-1}/N). Equivalently, we can view this as the matrix-vector product F_N · X, where F_N = (W_N^{jk}) is the so-called Fourier matrix. The DFT is an invertible transform with inverse given by

X(j) = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}(k)\, W_N^{-jk}.   (2)

Thus, if computed directly, the DFT would require N^2 operations. Instead, the FFT is an algorithm for computing the DFT in O(N log N) operations. Note that we can view the inverse as the DFT of the function (1/N) X̂(−k), so that we can also use the FFT to invert the DFT.

One of the DFT's most useful properties is that it converts circular or cyclic convolution into pointwise multiplication, for example,

\widehat{X * Y}(k) = \hat{X}(k)\,\hat{Y}(k)   (3)

where

X * Y(j) = \sum_{l=0}^{N-1} X(l)\, Y(j - l).   (4)

Consequently, the FFT gives an O(N log N) (instead of an N^2) algorithm for computing convolutions: First compute the DFTs of both X and Y, then compute the inverse DFT of the sequence obtained by multiplying X̂ and Ŷ pointwise.

In retrospect, the idea underlying the Cooley-Tukey FFT is quite simple. If N = N1N2, then we can turn the 1D equation (Equation 1) into a 2D equation with the change of variables

j = j(a, b) = aN1 + b,  0 ≤ a < N2, 0 ≤ b < N1
k = k(c, d) = cN2 + d,  0 ≤ c < N1, 0 ≤ d < N2.   (5)

Using the fact W_N^{m+n} = W_N^m W_N^n, it follows quickly from Equation 5 that we can rewrite Equation 1 as

\hat{X}(c, d) = \sum_{b=0}^{N_1-1} W_N^{b(cN_2+d)} \sum_{a=0}^{N_2-1} X(a, b)\, W_{N_2}^{ad}.   (6)

The computation is now performed in two steps. First, compute for each b the inner sums (for all d)

\tilde{X}(b, d) = \sum_{a=0}^{N_2-1} X(a, b)\, W_{N_2}^{ad},   (7)

which is now interpreted as a subsampled DFT of length N2. Even if computed directly, at most N1N2^2 arithmetic operations are required to compute all of the X̃(b, d). Finally, we compute N2 transforms of length N1:

\hat{X}(c, d) = \sum_{b=0}^{N_1-1} W_N^{b(cN_2+d)}\, \tilde{X}(b, d),   (8)

which requires at most an additional N2N1^2 operations. Thus, instead of (N1N2)^2 operations, this two-step approach uses at most (N1N2)(N1 + N2) operations. If we had more factors in Equation 6, then this approach would work even better, giving Cooley and Tukey's result. The main idea is that we have converted a 1D algorithm, in terms of indexing, into a 2D algorithm. Furthermore, this algorithm has the advantage of an in-place implementation, and when accomplished this way, concludes with data reorganized according to the well-known bit-reversal shuffle.

This "decimation in time" approach is one of a variety of FFT techniques. Also notable is the dual approach of "decimation in frequency" developed simultaneously by Gordon Sande, whose paper with W. Morven Gentleman also contains an interesting discussion of memory considerations as they relate to implementation issues.1 Charles Van Loan's book discusses some of the other variations and contains an extensive bibliography.2 Many of these algorithms rely on the ability to factor N. When N is prime, we can use a different idea in which the DFT is effectively reduced to a cyclic convolution instead.3

References
1. W.M. Gentleman and G. Sande, "Fast Fourier Transforms—For Fun and Profit," Proc. Fall Joint Computer Conf. AFIPS, Vol. 29, Spartan, Washington, D.C., 1966, pp. 563–578.
2. C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, Philadelphia, 1992.
3. C.M. Rader, "Discrete Fourier Transforms When the Number of Data Points Is Prime," Proc. IEEE, Vol. 56, IEEE Press, Piscataway, N.J., 1968, pp. 1107–1108.

a fast algorithm. Fewer calculations also imply less opportunity for error and therefore lead to numerical stability. Gauss observed that he could break a Fourier series of bandwidth N = N1N2 into a computation of N2 subsampled discrete Fourier transforms of length N1, which are combined as N1 DFTs of length N2. (See the "Discrete Fourier transforms" sidebar for detailed information.) Gauss's algorithm was never published outside of his collected works.
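Gauss's reduction is the same two-step computation described in the sidebar, and it can be sketched in a few lines. This is an illustrative sketch only (the function name and the NumPy sign convention exp(−2πi/N) are our choices, made so the result can be checked against np.fft.fft), not the in-place implementation used in practice.

```python
import numpy as np

def two_step_dft(x, n1, n2):
    """DFT of length N = n1*n2 via the index splitting
    j = a*n1 + b (0 <= a < n2, 0 <= b < n1) and
    k = c*n2 + d (0 <= c < n1, 0 <= d < n2)."""
    n = n1 * n2
    w = np.exp(-2j * np.pi / n)    # NumPy's sign convention for W_N
    w2 = np.exp(-2j * np.pi / n2)
    # Step 1: for each b, a subsampled DFT of length n2 over a (inner sums).
    xt = np.array([[sum(x[a * n1 + b] * w2 ** (a * d) for a in range(n2))
                    for d in range(n2)] for b in range(n1)])
    # Step 2: for each d, a length-n1 transform over b with twiddle factors.
    xhat = np.empty(n, dtype=complex)
    for c in range(n1):
        for d in range(n2):
            xhat[c * n2 + d] = sum(w ** (b * (c * n2 + d)) * xt[b, d]
                                   for b in range(n1))
    return xhat

x = np.arange(12, dtype=float)
assert np.allclose(two_step_dft(x, 3, 4), np.fft.fft(x))
```

The direct DFT of length 12 costs 144 multiply-adds; the split form costs at most 12·(3 + 4) = 84, and recursing on further factors of N yields the full O(N log N) Cooley-Tukey count.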

JANUARY/FEBRUARY 2000 61
The statistician Frank Yates published a less general but still important version of the FFT in 1932, which we can use to efficiently compute the Hadamard and Walsh transforms.3 Yates's "interaction algorithm" is a fast technique designed to compute the analysis of variance for a 2^n-factorial design and is described in almost any text on statistical design and analysis of experiments.

Another important predecessor is the work of G.C. Danielson and Cornelius Lanczos, performed in the service of x-ray crystallography, another area for applying FFT technology.4 Their "doubling trick" showed how to reduce a DFT on 2N points to two DFTs on N points using only N extra operations. Today, it's amusing to note their problem sizes and timings: "Adopting these improvements, the approximate times for Fourier analysis are 10 minutes for 8 coefficients, 25 minutes for 16 coefficients, 60 minutes for 32 coefficients, and 140 minutes for 64 coefficients."4 This indicates a running time of about 0.37 N log N minutes for an N-point DFT!

Despite these early discoveries of an FFT, it wasn't until James W. Cooley and John W. Tukey's article that the algorithm gained any notice. The story of their collaboration is an interesting one. Tukey arrived at the basic reduction while in a meeting of President Kennedy's Science Advisory Committee. Among the topics discussed were techniques for offshore detection of nuclear tests in the Soviet Union. Ratification of a proposed United States–Soviet Union nuclear test ban depended on the development of a method to detect the tests without actually visiting Soviet nuclear facilities. One idea was to analyze seismological time-series data obtained from offshore seismometers, the length and number of which would require fast algorithms to compute the DFT. Other possible applications to national security included the long-range acoustic detection of nuclear submarines.

Richard Garwin of IBM was another participant at this meeting, and when Tukey showed him the idea, he immediately saw a wide range of potential applicability and quickly set to getting the algorithm implemented. He was directed to Cooley, and, needing to hide the national security issues, told Cooley that he wanted the code for another problem of interest: the determination of the spin-orientation periodicities in a 3D crystal of He3. Cooley was involved with other projects, and sat down to program the Cooley-Tukey FFT only after much prodding. In short order, he and Tukey prepared a paper which, for a mathematics or computer science paper, was published almost instantaneously (in six months).5 This publication, as well as Garwin's fervent proselytizing, did a lot to publicize the existence of this (apparently) new fast algorithm.6

The timing of the announcement was such that usage spread quickly. The roughly simultaneous development of analog-to-digital converters capable of producing digitized samples of a time-varying voltage at rates of 300,000 samples per second had already initiated something of a digital revolution. This development also provided scientists with heretofore unimagined quantities of digital data to analyze and manipulate (just as is the case today). The "standard" applications of the FFT as an analysis tool for waveforms or for solving PDEs generated a tremendous interest in the algorithm a priori. But moreover, the ability to do this analysis quickly let scientists from new areas try the algorithm without having to invest too much time and energy.

Its effect

It's difficult for me to overstate the FFT's importance. Much of its central place in digital signal and image processing is due to the fact that it made working in the frequency domain as computationally feasible as working in the temporal or spatial domain. By providing a fast algorithm for convolution, the FFT enabled fast, large-integer and polynomial multiplication, as well as efficient matrix-vector multiplication for Toeplitz, circulant, and other kinds of structured matrices. More generally, it plays a key role in most efficient sorts of filtering algorithms. Modifications of the FFT are one approach to fast algorithms for discrete cosine or sine transforms, as well as Chebyshev transforms. In particular, the discrete cosine transform is at the heart of MP3 encoding, which gives life to real-time audio streaming. Last but not least, it's also one of the few algorithms to make it into the movies—I can still recall the scene in No Way Out where the image-processing guru declares that he will need to "Fourier transform the image" to help Kevin Costner see the detail in a photograph!

Even beyond these direct technological applications, the FFT influenced the direction of academic research, too. The FFT was one of the first instances of a less-than-straightforward algorithm with a high payoff in efficiency used to compute something important. Furthermore, it raised the natural question, "Could an even faster algorithm be found for the DFT?" (the answer is no7), thereby raising awareness of and heightening interest in the subject of lower bounds and the analysis and development of efficient algorithms in general. With respect to Shmuel Winograd's



lower-bound analysis, Cooley writes in the discussion of the 1968 Arden House Workshop on FFT, "These are the beginnings, I believe, of a branch of computer science which will probably uncover and evaluate other algorithms for high speed computers."8

Ironically, the FFT's prominence might have slowed progress in other research areas. It provided scientists with a big analytic hammer, and, for many, the world suddenly looked as though it were full of nails—even if this wasn't always so. Researchers sometimes massaged problems that might have benefited from other, more appropriate techniques into a DFT framework, simply because the FFT was so efficient. One example that comes to mind is some of the early spectral-methods work to solve PDEs in spherical geometry. In this case, the spherical harmonics are a natural set of basis functions. Discretization for numerical solutions implies the computation of discrete Legendre transforms (as well as FFTs). Many of the early computational approaches tried instead to approximate these expansions completely in terms of Fourier series, rather than address the development of an efficient Legendre transform.

Even now there are still lessons to learn from the FFT's development. In this day and age, where any new technological idea seems fodder for Internet venture capitalists and patent lawyers, it is natural to ask, "Why didn't IBM patent the FFT?" Cooley explained that because Tukey wasn't an IBM employee, IBM worried that it might not be able to gain a patent. Consequently, IBM had a great interest in putting the algorithm in the public domain. The effect was that then nobody else could patent it either. This did not seem like such a great loss because at the time, the prevailing attitude was that a company made money in hardware, not software. In fact, the FFT was designed as a tool to analyze huge time series, in theory something only supercomputers tackled. So, by placing in the public domain an algorithm that would make time-series analysis feasible, more big companies might have an interest in buying supercomputers (like IBM mainframes) to do their work.

Whether having the FFT in the public domain had the effect IBM hoped for is moot, but it certainly provided many scientists with applications on which to apply the algorithm. The breadth of scientific interests at the Arden workshop (held only two years after the paper's publication) is truly impressive. In fact, the rapid pace of today's technological developments is in many ways a testament to this open development's advantage. This is a cautionary tale in today's arena of proprietary research, and we can only wonder which of the many recent private technological discoveries might have prospered from a similar announcement.

The future FFT

As torrents of digital data continue to stream into our computers, it seems that the FFT will continue to play a prominent role in our analysis and understanding of this river of data. What follows is a brief discussion of future FFT challenges, as well as a few new directions of related research.

Even bigger FFTs

Astronomy continues to be a chief consumer of large FFT technology. The needs of projects like MAP (Microwave Anisotropy Probe) or LIGO (Laser Interferometer Gravitational-Wave Observatory) require FFTs of several (even tens of) gigapoints. FFTs of this size do not fit in the main memory of most machines, and these so-called out-of-core FFTs are an active area of research.9

As computing technology evolves, undoubtedly, versions of the FFT will evolve to keep pace and take advantage of it. Different kinds of memory hierarchies and architectures present new challenges and opportunities.

Approximate and nonuniform FFTs

For a variety of applications (such as fast MRI), we need to compute DFTs for nonuniformly spaced grid points and frequencies. Multipole-based approaches efficiently compute these quantities in such a way that the running time increases by a factor of log(1/ε), where ε denotes the approximation's precision.10 Algebraic approaches based on efficient polynomial evaluation are also possible.11

Group FFTs

The FFT might also be explained and interpreted using the language of group representation theory—working along these lines raises some interesting avenues for generalization. One approach is to view a 1D DFT of length N as computing the expansion of a function defined on C_N, the cyclic group of order N (the group of integers mod N), in terms of the basis of irreducible matrix elements of C_N, which are precisely the familiar sampled exponentials: e_k(m) = exp(2π√−1 km/N). The FFT is a highly efficient algorithm for computing the expansion in this basis. More generally, a function on

any compact group (cyclic or not) has an expansion in terms of a basis of irreducible matrix elements (which generalize the exponentials from the point of view of group invariance). It's natural to wonder if efficient algorithms for performing this change of basis exist. For example, the problem of efficiently computing spherical harmonic expansions falls into this framework.

The first FFT for a noncommutative finite group seems to have been developed by Alan Willsky in the context of analyzing certain Markov processes.12 To date, fast algorithms exist for many classes of compact groups.11 Areas of application of this work include signal processing, data analysis, and robotics.13

Quantum FFTs

One of the first great triumphs of the quantum-computing model is Peter Shor's fast algorithm for integer factorization on a quantum computer.14 At the heart of Shor's algorithm is a subroutine that computes (on a quantum computer) the DFT of a binary vector representing an integer. The implementation of this transform as a sequence of one- and two-bit quantum gates, now called the quantum FFT, is effectively the Cooley-Tukey FFT realized as a particular factorization of the Fourier matrix into a product of matrices composed as certain tensor products of two-by-two unitary matrices, each of which is a so-called local unitary transform. Similarly, the quantum solution to the Modified Deutsch-Jozsa problem uses the matrix factorization arising from Yates's algorithm.15 Extensions of these ideas to the more general group transforms mentioned earlier are currently being explored.

That's the FFT—both parent and child of the digital revolution, a computational technique at the nexus of the worlds of business and entertainment, national security and public communication. Although it's anyone's guess as to what lies over the next horizon in digital signal processing, the FFT will most likely be in the thick of it.

Acknowledgment
Special thanks to Jim Cooley, Shmuel Winograd, and Mark Taylor for helpful conversations. The Santa Fe Institute provided partial support and a very friendly and stimulating environment in which to write this paper. NSF Presidential Faculty Fellowship DMS-9553134 supported part of this work.

References
1. E.O. Brigham, The Fast Fourier Transform and Its Applications, Prentice Hall Signal Processing Series, Englewood Cliffs, N.J., 1988.
2. M.T. Heideman, D.H. Johnson, and C.S. Burrus, "Gauss and the History of the Fast Fourier Transform," Archive for History of Exact Sciences, Vol. 34, No. 3, 1985, pp. 265–277.
3. F. Yates, "The Design and Analysis of Factorial Experiments," Imperial Bureau of Soil Sciences Tech. Comm., Vol. 35, 1937.
4. G.C. Danielson and C. Lanczos, "Some Improvements in Practical Fourier Analysis and Their Application to X-Ray Scattering from Liquids," J. Franklin Inst., Vol. 233, Nos. 4 and 5, 1942, pp. 365–380 and 432–452.
5. J.W. Cooley and J.W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, Vol. 19, Apr. 1965, pp. 297–301.
6. J.W. Cooley, "The Re-Discovery of the Fast Fourier Transform Algorithm," Mikrochimica Acta, Vol. 3, 1987, pp. 33–45.
7. S. Winograd, Arithmetic Complexity of Computations, CBMS-NSF Regional Conf. Series in Applied Mathematics, Vol. 33, SIAM, Philadelphia, 1980.
8. "Special Issue on Fast Fourier Transform and Its Application to Digital Filtering and Spectral Analysis," IEEE Trans. Audio Electronics, AU-15, No. 2, 1969.
9. T.H. Cormen and D.M. Nicol, "Performing Out-of-Core FFTs on Parallel Disk Systems," Parallel Computing, Vol. 24, No. 1, 1998, pp. 5–20.
10. A. Dutt and V. Rokhlin, "Fast Fourier Transforms for Nonequispaced Data," SIAM J. Scientific Computing, Vol. 14, No. 6, 1993, pp. 1368–1393; continued in Applied and Computational Harmonic Analysis, Vol. 2, No. 1, 1995, pp. 85–100.
11. D.K. Maslen and D.N. Rockmore, "Generalized FFTs—A Survey of Some Recent Results," Groups and Computation II, DIMACS Ser. Discrete Math. Theoret. Comput. Sci., Vol. 28, Amer. Math. Soc., Providence, R.I., 1997, pp. 183–237.
12. A.S. Willsky, "On the Algebraic Structure of Certain Partially Observable Finite-State Markov Processes," Information and Control, Vol. 38, 1978, pp. 179–212.
13. D.N. Rockmore, "Some Applications of Generalized FFTs (An Appendix with D. Healy)," Groups and Computation II, DIMACS Ser. Discrete Math. Theoret. Comput. Sci., Vol. 28, Amer. Math. Soc., Providence, R.I., 1997, pp. 329–369.
14. P.W. Shor, "Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer," SIAM J. Computing, Vol. 26, No. 5, 1997, pp. 1484–1509.
15. D. Simon, "On the Power of Quantum Computation," Proc. 35th Ann. Symp. Foundations of Computer Science, IEEE CS Press, 1994, pp. 116–123.

Daniel N. Rockmore is an associate professor of mathematics and computer science at Dartmouth College, where he also serves as vice chair of the Department of Mathematics. His general research interests are in the theory and application of computational aspects of group representations, particularly to FFT generalizations. He received his BA and PhD in mathematics from Princeton University and Harvard University, respectively. In 1995, he was one of 15 scientists to receive a five-year NSF Presidential Faculty Fellowship from the White House. He is a member of the American Mathematical Society, the IEEE, and SIAM. Contact him at the Dept. of Mathematics, Bradley Hall, Dartmouth College, Hanover, NH 03755; [email protected]; www.cs.dartmouth.edu/~rockmore.



EDUCATION

Editor: Denis Donnelly, [email protected]

THE FAST FOURIER TRANSFORM FOR EXPERIMENTALISTS, PART I: CONCEPTS

By Denis Donnelly and Bert Rust

THE DISCRETE FOURIER TRANSFORM (DFT) provides a means for transforming data sampled in the time domain to an expression of this data in the frequency domain. The inverse transform reverses the process, converting frequency data into time-domain data. Such transformations can be applied in a wide variety of fields, from geophysics to astronomy, from the analysis of sound signals to CO2 concentrations in the atmosphere. Over the course of three articles, our goal is to provide a convenient summary that the experimental practitioner will find useful. In the first two parts of this article, we'll discuss concepts associated with the fast Fourier transform (FFT), an implementation of the DFT. In the third part, we'll analyze two applications: a bat chirp and atmospheric sea-level pressure differences in the Pacific Ocean.

The FFT provides an efficient algorithm for implementing the DFT and, as such, we'll focus on it. This transform is easily executed; indeed, almost every available mathematical software package includes it as a built-in function. Some books are devoted solely to the FFT,1–3 while others on signal processing,4–6 time series,7,8 or numerical methods9,10 include major sections on Fourier analysis and the FFT. We draw together here some of the basic elements that users need to apply and interpret the FFT and its inverse (IFFT). We will avoid descriptions of the Fourier matrix, which lies at the heart of the DFT process,11 and the parsing of the Cooley-Tukey algorithm12 (or any of several other comparable algorithms), which provides a means for transforming the discrete into the fast Fourier transform. The Cooley-Tukey algorithm makes the FFT extremely useful by reducing the number of computations from something on the order of n^2 to n log(n), which obviously provides an enormous reduction in computation time. It's so useful, in fact, that the FFT made Computing in Science & Engineering's list of the top 10 algorithms in an article that noted the algorithm is, "perhaps, the most ubiquitous algorithm in use today."13 The interlaced decomposition method used in the Cooley-Tukey algorithm can be applied to other orthogonal transformations such as the Hadamard, Hartley, and Haar. However, in this article, we concentrate on the FFT's application and interpretation.

Fundamental Elements

As a rule, data to be transformed consists of N uniformly spaced points xj = x(tj), where N = 2^n with n an integer, and tj = j·Δt, where j ranges from 0 to N − 1. (Some FFT implementations don't require that N be a power of 2. This number of points is, however, optimal for the algorithm's execution speed.) Even though any given data set is unlikely to have the number of its data points precisely equal to 2^n, zero padding (which we describe in more detail in the next section) provides a means to achieve this number of samples without losing information. As an additional restriction, we limit our discussions to real-valued time series, as most data streams are real. When the time-domain data are real, the values of the amplitude or power spectra at any negative frequency are the same as those at the corresponding positive frequency. Thus, if the time series is real, one half of the 2^n frequencies contain all the frequency information. In typical representations, the frequency domain contains N/2 + 1 samples.

The FFT's kernel is a sum of complex exponentials. Associated with this process are conventions for normalization, sign, and range. Here, we present what we consider to be good practice, but our choices are not universal. Users should always check the conventions of their particular software choice so they can properly interpret the computed transforms and related spectra.

Equation 1 shows some simple relationships between parameters such as Δt, the sampling time interval; Δf, the spacing in the frequency domain; N, the number of samples in the time domain; and fj, the Fourier frequencies. The number of samples per cycle (spc) for a particular frequency component with period T in the time domain and (in some cases) the total number of cycles (nc) in the data record for a particular frequency component are two other pieces of information that are useful because they remind us of the adequacy of the sampling rate or the data sample. Some

80 Copublished by the IEEE CS and the AIP 1521-9615/05/$20.00 © 2005 IEEE COMPUTING IN SCIENCE & ENGINEERING
relations between these parameters are

Δf = 1/(N·Δt) and fj = j·Δf, where j = 0, ..., N/2,

spc = T/Δt, nc = N/spc = 1/(T·Δf) = f/Δf.   (1)

The period T represents only one frequency, but, as we discuss later, there must be more than 2 spc for the highest frequency component of the sampled signal. This bandwidth-limiting frequency is called the Nyquist frequency and is equal to half the sampling frequency. The spacing in the frequency domain Δf is the inverse of the total time sampled, so time and frequency resolution can't both be simultaneously improved. Thus, the maximum frequency represented is Δf·N/2 = 1/(2·Δt), or the Nyquist frequency.

We can express the transform in several ways. A commonly used form is the following (with i = √−1):

X_k = \sum_{j=0}^{N-1} x_j \exp\left(-2\pi i \frac{j}{N} k\right),  k = −N/2, …, −1, 0, 1, …, N/2 − 1,   (2)

where xj represents the time-domain data and Xk their representation in the frequency domain.

We express the IFFT as

x_j = \frac{1}{N} \sum_{k=-N/2}^{N/2-1} X_k \exp\left(2\pi i \frac{k}{N} j\right),  j = 0, 1, …, N − 1.   (3)

The FFT replicates periodically on the frequency axis with a period of 1/Δt; consequently, X(f_{N/2}) = X(f_{−N/2}), so the transform is defined at both ends of the closed interval from −1/(2Δt) to +1/(2Δt). This interval is sometimes called the Nyquist band.

Some FFT and IFFT implementations use different normalizations or sign conventions. For example, some implementations place the factor 1/N in the FFT conversion rather than with the IFFT. Some place 1/√N in both conversion processes, and some reverse the signs in the exponentials of the transforms; this sign change reverses the sign of the phase component. Moreover, some implementations take the range for k from 0, …, N/2.

Because Equations 2 and 3 represent the frequency and time domains of the same signal, the energy in the two cases must be the same. Parseval's relation expresses this equality. For real data, we can express the relation as

\sum_{i=0}^{N-1} x_i^2 = \frac{1}{N}\left( X_0^2 + 2 \sum_{j=1}^{N/2-1} |X_j|^2 + X_{N/2}^2 \right),   (4)

where X = fft(x). The last term on the right-hand side is not usually separated from the sum as it is here; we do this because there should be only N terms to consider in both summations, not N in one and N + 1 in the other. Recall that because we're dealing with real-valued data, we can exploit a symmetry and present the frequency data only from 0 to N/2; this symmetry is the source of the factor of two associated with the summation. Unlike the other terms, the +N/2 frequency value isn't independent and was assigned, as noted earlier, to the value at −N/2. Should the +N/2 term be included in the sum, we would, in effect, double count the term, so we pull the N/2 term from the sum to avoid this. Of course, if N is large, this difference is likely to be minimal.

There are two common ways to display an FFT. One is the amplitude spectrum, which presents the magnitudes of the FFT's complex values as a function of frequency:

A_k = \frac{2}{N} |X_k|,  k = −N/2, …, −1, 0, 1, …, N/2.   (5)

Given the symmetry of real time series, the standard presentation restricts the range of k to positive values: k = 0, 1, …, N/2. An equally common way to represent the transform is with a power spectrum (or periodogram), which is defined as

P_k = \frac{1}{N} |X_k|^2,  k = 0, 1, …, N/2.   (6)

However, neither of these spectral representations is universal. For example, some conventions place a 1 in the numerator instead of a 2 for the amplitude spectrum. The periodogram is sometimes represented with a factor of 2 in the numerator instead of 1, or as the individual terms expressed in Parseval's relation (Equation 4).

In Figure 1, as an example of the FFT process, we show the amplitude spectrum of a single-frequency sine wave with two different sampling intervals. In one case, the interval Δt is chosen to make nc integral, and in the other, nonintegral. If nc is integral, f is necessarily a multiple of Δf, and one point of the transform is associated with the true frequency (see the circles in Figure 1a). However, in any FFT application, we're dealing with a finite-length time series. The process of restricting the data in the time domain (multiplying the data by one over the range where we wish to keep

MARCH/APRIL 2005 81
EDUCATION

[Figure 1 plot: two panels, amplitude (0–1) versus frequency (0–2).]

Figure 1. Amplitude spectra of a single-frequency sine wave. Two representations of a sine wave of frequency 0.5 are shown in each part of the figure. In each case, the circles are based on a time series where the number of sample points N = 32 but the time step is slightly different: (a) NΔt = 8, so n_c = 4; (b) NΔt = 7.04, so n_c = 3.52, where n_c is the total number of cycles. The solid lines provide a view of these same spectra with zero padding. This form is closer to what would be expected from a continuous rather than a discrete Fourier transform. The zero-padded examples reveal detail that might not have been expected, given the appearance of the unpadded case.

the data and multiplying by zero elsewhere—an example of windowing, discussed later) introduces sidelobes in the frequency domain. These sidelobes are called leakage.

Even though there's leakage, because there's only one frequency associated with the transformed sine wave, we might expect to be able to estimate that frequency with a weighted average of all the points in the frequency domain. Such an average, however, wouldn't yield the correct frequency.

In general, the FFT process generates complex values in the frequency domain from the real values in the time domain. If we transform sine or cosine waves where we consider an integral number of cycles, the transform magnitudes are identical. However, in the frequency domain, a sine curve is represented only with imaginary values and a cosine curve only with real values. When the number of cycles is nonintegral or if there is a phase shift, then both real and imaginary parts appear in the transform of both the sine and cosine.

Zero Padding
Zero padding is a commonly used technique associated with FFTs. Two frequent uses are to make the number of data points in the time-domain sample a power of two and to improve interpolation in the transformed domain (for example, zero pad in the time domain, improve interpolation in the frequency domain).

Zero padding, as the name implies, means appending a string of zeros to the data. It doesn't make any difference if the zeros are appended at the end (the typical procedure), at the beginning, or split between the beginning and end of the data set's time domain. One very common use of this process is to extend time-series data so that the number of samples becomes a power of two, making the conversion process more efficient or, with some software, simply possible. Because the spacing of data in the frequency domain is inversely proportional to the number of samples in the time domain, by increasing the number of samples—even if their values are zero—the resulting frequency spectrum will contain more data points for the same frequency range. Consequently, the zero-padded transform contains more data points than the unpadded; as a result, the overall process acts as a frequency interpolating function. The resulting, more detailed picture in the frequency space might indicate unexpected detail (see, for example, Figure 2). As the number of zeros increases, the FFT better represents the time series' continuous Fourier transform (CFT).

As we noted earlier, zero padding introduces more points into the same frequency range and provides interpolation between points associated with the unpadded case. When data points are more closely spaced, clearly, there's a possibility that unnoticed detail could be revealed (such as Figure 1a shows). In Figure 2, we see the effect of quadrupling the number of points for two different cases. The transforms of the zero-padded data contain the same information as the unpadded data, and every fourth point of the padded data matches the corresponding unpadded data point. The intermediate points provide interpolation.
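A small pure-Python experiment makes that interpolation property concrete. Assuming the usual unnormalized forward transform, zero padding a 32-sample record to four times its length yields a transform whose every fourth point matches the unpadded transform; the O(N²) `dft` helper below is our stand-in for a library FFT routine.

```python
import cmath
import math

def dft(x):
    # Unnormalized forward DFT; O(N^2), but fine at this size.
    N = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / N) for j in range(N))
            for k in range(N)]

# 32 samples of a sine wave with a nonintegral number of cycles (as in Figure 1b).
x = [math.sin(2 * math.pi * 3.52 * j / 32) for j in range(32)]
padded = x + [0.0] * (3 * len(x))        # zero pad to four times the length

X, Xp = dft(x), dft(padded)

# The padded transform has four times as many points over the same frequency
# range, and every fourth point reproduces the unpadded transform exactly.
assert len(Xp) == 4 * len(X)
assert all(abs(Xp[4 * k] - X[k]) < 1e-6 for k in range(len(X)))
```

The intermediate points of `Xp` are the interpolated values that draw the solid curves in Figures 1 and 2.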

82 COMPUTING IN SCIENCE & ENGINEERING


[Figure 2 plot: two panels, amplitude (0–1) versus frequency (0–4).]

Figure 2. The effect of zero padding on the transform of a signal containing two different frequencies. We look at two cases: one in which the two frequencies are too close to be clearly resolved, and one in which resolution is possible. (a) Fast Fourier transforms (FFTs) of the sum of two sine waves of amplitude 1 and frequencies of 1 and 1.3 Hz; the frequencies aren't resolved. (b) FFTs of the sum of two sine waves of amplitude 1 and frequencies of 1 and 1.35 Hz; the frequencies are resolved. The solid curves are transforms of zero-padded data and include four times as many samples as the transforms of the unpadded data (dotted curves). Because the zero-padded curve has four times as many data points as the unpadded case (N = 32), every fourth point of the zero-padded data is the same as the unpadded data. Zero-padded results provide better interpolation and more detail.

In Figure 2, we see an application of that interpolating ability when we consider a signal consisting of two closely lying frequencies. In Figure 2a, although the envelope is more clearly drawn, zero padding does not have the power to resolve the two frequencies associated with this case. In Figure 2b, the peaks are sufficiently separated so that the interpolation reveals the two peaks, whereas the unpadded data seemingly did not. This example reminds us that a graphical representation connecting adjacent data points with straight lines can be misleading.

Zero padding can also be performed in the frequency domain. The inverse transform results in an increase in the number of data points in the time domain, which could be useful in interpolating between samples (see Figure 3). Zero padding is also used in association with convolution or correlation and with filter kernels, which we discuss later in this article.

[Figure 3 plot: signal (−0.6 to 0.6) versus sample index (0–140).]

Figure 3. The effect of zero padding in the frequency domain on the time-domain data. The frequency data (the unpadded case in Figure 2a) was zero-padded to four times its original length. We show the original unpadded time-domain data (boxes) and the inverse fast Fourier transform of the zero-padded frequency data (dots). The padding process again acts as an interpolation function.

Aliasing
When performing an FFT, it's necessary to be aware of the frequency range composing the signal so that we sample the signal more than twice per cycle of the highest frequency associated with the signal. In practice, this might mean filtering the signals to block any signal components with a frequency above the Nyquist frequency (2 · Δt_sample)⁻¹ before performing a transform. If we don't restrict the signal in this way, higher frequencies will not be adequately sampled and will masquerade as lower-frequency signals. This effect is similar to what moviegoers experience when the onscreen wheels of a moving vehicle seemingly freeze or rotate in the wrong direction. The camera, which operates at the sampling rate of 24 frames per second, only has a Nyquist limit of 12 Hz; any higher frequencies present will appear as lower frequencies.

Let's assume that we can readily observe a point on a wheel (not at the center) that's rotating but not translating. At a slow rotation rate, each successive frame of our film shows the observable point advancing from the previous frame. (The fraction of a complete rotation and the sampling rate are related; the number of samples per rotation is the inverse of the fraction of a rotation per sample.) As the rotation rate increases,


Table 1. Actual and apparent angles for 170° and 190° rotations.

Angle sequence   Angle sequence   Apparent angle sequence for 190° steps
for 170° step    for 190° step    with rotation direction reversed*
      0                0                0
    170              190              170
    340               20              340
    150              210              150
    320               40              320
    130              230              130

*Magnitudes of reverse angles are given by 360 − column 2.
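The sequences in Table 1 take only a few lines to generate; this quick Python check (the variable names are ours) confirms that reversing the 190° sequence reproduces the 170° one.

```python
# Observed angle (mod 360) after n frames, for each constant step size in Table 1.
steps_170 = [(170 * n) % 360 for n in range(6)]
steps_190 = [(190 * n) % 360 for n in range(6)]

assert steps_170 == [0, 170, 340, 150, 320, 130]
assert steps_190 == [0, 190, 20, 210, 40, 230]

# Reversing the 190-degree rotation (360 minus each angle) recovers the
# 170-degree sequence: the two rates are indistinguishable except for direction.
assert [(360 - a) % 360 for a in steps_190] == steps_170
```

The equivalence is no accident: 190 ≡ −170 (mod 360), so the two step sizes differ only in sign.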

the angle between our observed point in successive frames increases. When the angle reaches 180 degrees, or two samples per rotation, the perceived rotation rate is at its maximum—the wheel is rotating at the Nyquist frequency.

When passing through the Nyquist limit, as the frequency goes from f_Ny − ε to f_Ny + ε (where ε << f_Ny), the rotation direction appears to change from forward to reverse while the rotation rate remains the same. Further increases in the rotation rate make the wheel appear to continue rotating in a reversed direction but at a decreasing rate. When the actual rotation rate is twice the Nyquist frequency, the apparent rotation rate is zero and the sampling rate is just once per rotation. (Another example of one sample per rotation and an apparent zero rotation rate is to use a stroboscope to determine an object's rotation rate. With one flash per rotation, the rotating object appears at rest and the flash rate and rotation rate are equal.) If the frequency of rotation continues to increase, the wheel will again appear to rotate in the original rotation direction.

To make this more concrete, consider two constant rotation rates, one of 170 degrees between successive frames/samples and one of 190 degrees. We observe only the current position in each frame, so as we compute a value sequence, we take them mod(360). If we compute values for the 170-degree case, we obtain 0, 170, 340, 150, 320, 130, and so on. If we compute values for the 190-degree case, we get 0, 190, 20, 210, 40, 230, and so on, but we wouldn't see the 190-degree rotation. We don't observe an increase greater than 180 degrees (for angles greater than that, the data is undersampled). For the 190-degree case, we would see a 170-degree step, but with the rotation in the opposite direction.

To consider a reverse rotation, we subtract the forward rotation angle from 360. The result is the magnitude of the angle of rotation in the reverse direction. For example, a forward rotation angle of 350 degrees is equivalent to a 10-degree step in the reverse direction. So for our 190-degree case, the numbers become 0, 360 − 190 = 170, 360 − 20 = 340, 360 − 210 = 150, and so on. Table 1 provides a summary. The magnitudes of these rotation angles are identical to the 170-degree data. Thus, we would see the 190-degree case as equivalent to the 170-degree case in terms of rotation rate, but with the rotation direction reversed. The graph in Figure 4 helps demonstrate this kind of behavior.

[Figure 4 plot: apparent frequency (0–10 Hz) versus true frequency (0–25 Hz), a sawtooth curve.]

Figure 4. Apparent frequency as a function of the true frequency. Frequencies greater than the Nyquist frequency fold back into the allowed frequency range and appear as lower frequencies. In this example, where the Nyquist frequency is 8 Hz, an actual frequency of 9 Hz would appear as 7 Hz.

In the example shown in Figure 4, the Nyquist frequency is 8 Hz. Frequencies associated with the first leg of the sawtooth curve have more than two samples per cycle, and the apparent and actual frequencies are equal. Once the actual frequency exceeds the Nyquist frequency, the apparent frequency begins to decrease, with the negative slope corresponding to a reversed rotation direction. At 16 Hz, with one sample per rotation, the apparent frequency is zero. With further increases in the true frequency, the apparent frequency once again increases.

If we take the FFT of three amplitude-1 cosine waves having frequencies of 3.5, 12.5, and 19.5 Hz and where we set N = 16 and Δt = 1/N (so the Nyquist frequency is 8 Hz), we get identical FFTs, one of which is shown in Figure 5. The number of samples per cycle for these frequencies is 4.57, 1.28, and 0.82. Only the lowest frequency is adequately represented; the two higher-frequency cases have fewer than



two samples per cycle and consequently masquerade as lower frequencies, appearing in the allowed range between 0 Hz and the Nyquist frequency.

[Figure 5 plot: amplitude (0–1) versus frequency (0–8 Hz).]

Figure 5. The FFT of a 3.5-Hz, amplitude-one cosine wave where N = 16 and Δt = 1/N (represented by circles). The FFTs of the frequencies 3.5 Hz, 12.5 Hz, and 19.5 Hz are identical for the case when the Nyquist frequency is 8 Hz. The solid curve shows the transform with zero padding.

For the example with the three different frequencies, we purposely selected the higher frequencies so that their FFTs would be identical to that of the lowest frequency. Referring to Figure 4, we note that the frequencies 12.5 and 19.5 Hz would appear on the second and third legs of the sawtooth curve. The apparent frequency of the 12.5-Hz line is 8 − (12.5 − 8); the apparent frequency of the 19.5-Hz line is 19.5 − 2 · 8. In general, the out-of-range frequency f_true would appear as f_apparent as given by

f_apparent = |f_true − k · 2 · f_Nyquist| = |f_true − k/Δt|,  (7)

where k = 1, 2, …, and k is selected to bring f_apparent within the range 0 … f_Ny.
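Equation 7's folding rule, and the claim that the three cosines share one set of samples, can be checked in a few lines of Python. The `apparent_frequency` helper is our own shorthand for Equation 7 (we also allow k = 0 for frequencies already in range).

```python
import math

def apparent_frequency(f_true, f_nyquist):
    # Equation 7: fold f_true back into 0 ... f_Ny (k = 0 means already in range).
    period = 2 * f_nyquist              # the spectrum repeats every 1/dt
    k = round(f_true / period)
    return abs(f_true - k * period)

assert apparent_frequency(3.5, 8) == 3.5
assert apparent_frequency(12.5, 8) == 3.5     # 8 - (12.5 - 8)
assert apparent_frequency(19.5, 8) == 3.5     # 19.5 - 2*8
assert apparent_frequency(9.0, 8) == 7.0      # the Figure 4 example

# The three cosines really do produce identical samples at dt = 1/16.
def samples(f, N=16):
    return [math.cos(2 * math.pi * f * j / N) for j in range(N)]

for f in (12.5, 19.5):
    assert all(abs(a - b) < 1e-9 for a, b in zip(samples(3.5), samples(f)))
```

The sampled values coincide because 12.5 = 16 − 3.5 and 19.5 = 16 + 3.5, and the cosine's periodicity erases whole multiples of the sampling rate.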
In Figure 6, we see the actual curves that correspond to the three frequencies and the points where sampling occurs. If we performed an FFT followed by an IFFT for any one of the three curves (given the sampling specified), the algorithm would return the same result in each case, which, without other information, would be interpreted as the lowest-frequency case.

[Figure 6 plot: signal (−1 to 1) versus time (0–0.25).]

Figure 6. A view of the sampling of three cosine curves. Cosine curves with frequencies 3.5 Hz, 12.5 Hz, and 19.5 Hz are shown, with the marked points representing those at which sampling occurs (Δt = 1/N and N = 16). Only the lowest-frequency curve is adequately sampled, with more than two samples per cycle. In this case, the FFT for each curve would indicate a signal with a frequency of 3.5 Hz. For clarity, we show only the first five samples.

If the magnitudes of the Fourier coefficients approach zero (roughly as 1/f) as the frequency approaches the Nyquist frequency (a zero between lobes would not qualify), then there is a good likelihood that aliasing has not occurred. If it isn't zero, we can consider the possibility that it has occurred. However, a nonzero value doesn't imply that aliasing has necessarily happened. The Fourier coefficients in Figure 5 don't go to zero even in the adequately sampled case. Zero padding of this example will show a great deal more detail, but the transform is still nonzero at the Nyquist frequency.

Relation to Fourier Series
There is a direct connection between the real and imaginary parts of the frequency information from an FFT and the coefficients in a Fourier series that would represent the corresponding time-domain signal. As we noted earlier, for the conditions stated, the transform of a single-frequency sine wave is imaginary, whereas the transform of a single-frequency cosine wave is real. So, in a Fourier series of the time-domain signal, we would expect the real parts of the frequency information to be associated with cosine series and the imaginary parts with sine series. This is, in fact, the case.

An equation for recreating the original signal as a Fourier series from the frequency information is

S(t) = [a_0/2 + ∑_{k=1}^{n_t} (a_k · cos(2πkΔft) + b_k · sin(2πkΔft))] · (2/N).  (8)

For the case N = 2^n, a_k represents the real part of the transformed signal, b_k the imaginary part, n_t the number of terms to be included in the series (where n_t < N/2), and Δf the spacing in the frequency domain.

An alternate form in terms of magnitude and phase is also possible. Given that

φ_j = tan⁻¹[Im(h_j)/Re(h_j)],  (9)

where h_j = a_j + i·b_j and the H_j are the magnitudes of h_j, the series is given by


S(t) = [a_0/2 + ∑_{k=1}^{n_t} H_k · cos(2πkΔft − φ_k)] · (2/N).  (10)

In Figure 7, we see the square wave signal (one cycle of a square wave that ranges between 0 and 1 with equal times high and low) to be transformed as well as the signal constructed from the first 10 terms of a Fourier series using the coefficients from the FFT as per Equation 8. We would obtain an identical waveform if we took the IFFT of a truncation of the original FFT, where all the FFT's coefficients with an index greater than the number of desired terms (here, n_t = 10) are set to zero.

[Figure 7 plot: signal (−0.5 to 1.5) versus time (0–1).]

Figure 7. A comparison of the original time-domain signal and its partial reconstruction as a Fourier series. The original signal (dotted curve) and the first 10 terms of a Fourier series (solid curve) computed using coefficients from the original signal's FFT.

Windows
Windows are useful for extracting and/or smoothing data. A window is typically a positive, smooth symmetric function that has a value of one at its maximum and approaches zero at the extremes. (A window might have a discontinuity in its first derivative, giving it an inverted V shape—such a window is sometimes referred to as a "tent"—or two discontinuities for a rectangular or trapezoidal shape.) We apply windows by multiplying time-domain data by the window function. Of course, whenever a window is applied, it alters at least some of the data.

Smoothing windows, for example, reduce the amplitude of the time-domain data at both the beginning and the end of the windowed data set. One effect of this smoothing is to reduce leakage in the frequency domain. In Figure 8, we show comparative plots of four frequently used windows. We show the effect of applying three of those windows to a sine wave sequence in Figure 9.

[Figure 8 plot: window amplitude (0–1) versus time.]

Figure 8. The shapes of four different windows. From the side, we see a rectangular (red), Hamming (blue), Hann (green), and Blackman (magenta), respectively. We'll apply three of these windows to a sine wave sequence in Figure 9.

Let's look at the expressions for four common windows:

Rectangular: recw_i = 1 (inside), 0 (outside)
Hamming: hamw_i = 0.54 − 0.46 · cos(2π · i/N)
Hann: hanw_i = 0.5 − 0.5 · cos(2π · i/N)
Blackman: blkw_i = 0.42 − 0.5 · cos(2π · i/N) + 0.08 · cos(4π · i/N).  (11)

The Hamming and Hann windows differ in only one parameter: if the corresponding coefficients are written α − (1 − α), then α is 0.54 for the Hamming window and 0.5 for the Hann. The fact that a slight change in the parameter value gives rise to two different windows hints at the sensitivity of the windowing process to the value of α. If α decreases from 0.5, the side lobes increase significantly in amplitude. As α increases from 0.5 to 0.54, the relative sizes of the side lobes change. The first set of the Hann side lobes tend to be significantly larger than those of the Hamming case, but subsequent Hann side lobes decrease rapidly in magnitude and become significantly smaller than the Hamming side lobes. As to general appearance, the Hamming window doesn't quite go to zero at the window's endpoints whereas the Rectangular, Hann, and Blackman windows do. Several other windows also exist, including Bartlett (tent function), Welch (parabolic), Parzen (piece-wise cubic), Lanczos (central lobe



of a sine function), Gaussian, and Kaiser (which uses a modified Bessel function).

Each of these windows has particular characteristics. Two particularly useful points of comparison in the frequency space are the full width at half maximum of the central peak and the relative magnitude of central peak to that of the side lobes. An unwindowed signal's FFT has the narrowest central peak, but it also has considerable leakage that decays slowly. The curves for the Hamming and Blackman cases show wider central peaks but significantly smaller side lobes. The Blackman window has the largest peak height to first sidelobe height ratio.

[Figure 9 plot: windowed sine wave amplitude (−1 to 1) versus time.]

Figure 9. A comparison of the effects (from left to right) of a rectangular, a Hamming, and a Blackman window on a sine wave sequence. For convenience of display, we compute the three examples separately, shift the second and third in time, and sum the set, with the effect that the three examples appear sequentially in time; because each example is zero outside its window zone, the results do not interfere. The three windows have the same width, but as Figure 8 shows, the Blackman window increases in magnitude more slowly than the others, and we can observe the effect on the sine wave signal. The difference between Hamming and Blackman windowing is also evident.

There is no final summary statement that says you should use window x in all cases—circumstances decide that. In the bat-chirp analysis we'll examine in part two of this series, we'll use an isosceles trapezoidal window. Such a window isn't generally recommended, but for the bat-chirp case, it's the best choice. (A split cosine bell curve, a Hann window shape for the beginning and end of the curve with a magnitude of one in the interior, would give essentially the same results.)

As an example of windowing's effect on the transform, we apply a Blackman window to the time-domain data associated with Figure 1b. Two effects of applying this window, as Figure 10 shows, are that the leakage is greatly reduced and that the central peak is broadened. Obtaining the needed detail to observe these features requires zero padding.

[Figure 10 plot: amplitude (0–1) versus frequency (0–1.4).]

Figure 10. The effects of windowing as seen in the transform space. The FFT of the 3.52-cycle example in Figure 1 and the result of multiplying time-domain data and a Blackman window before taking the FFT are shown without zero padding (circles) and with zero padding (solid curves). The windowed form reduces leakage but has a broader central lobe.

In part two of this series, we'll discuss auto-regression spectral analysis and the maximum entropy method, convolution, and filtering. In the third and final installment, we'll present some applications, including the analysis of a bat chirp and atmospheric sea-level pressure variations in the Pacific Ocean.

Whether there is an interest in CO2 concentrations in the atmosphere, ozone levels, sunspot numbers, variable star magnitudes, the price of pork, or financial markets, or if the interest is in filtering, correlations, or convolutions, Fourier transforms provide a very powerful and, for many, an essential algorithmic tool.
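For readers who want to experiment, the four window expressions of Equation 11 translate directly into code. This sketch (the function names are our shorthand) also checks the endpoint behavior described in the Windows section: Hann and Blackman reach zero at the ends, while Hamming stops at 0.08.

```python
import math

# The four windows of Equation 11; i runs 0 ... N across the record.
def rec_w(i, N):
    return 1.0

def ham_w(i, N):
    return 0.54 - 0.46 * math.cos(2 * math.pi * i / N)

def han_w(i, N):
    return 0.5 - 0.5 * math.cos(2 * math.pi * i / N)

def blk_w(i, N):
    return (0.42 - 0.5 * math.cos(2 * math.pi * i / N)
            + 0.08 * math.cos(4 * math.pi * i / N))

N = 64
# Hann and Blackman go to zero at the endpoints; Hamming does not quite.
assert abs(han_w(0, N)) < 1e-9 and abs(han_w(N, N)) < 1e-9
assert abs(blk_w(0, N)) < 1e-9
assert abs(ham_w(0, N) - 0.08) < 1e-9        # 0.54 - 0.46 = 0.08 at the ends
# All peak at one at mid-record.
assert abs(han_w(N // 2, N) - 1.0) < 1e-9
assert abs(blk_w(N // 2, N) - 1.0) < 1e-9    # 0.42 + 0.5 + 0.08 = 1
```

Multiplying a data record sample by sample with any of these functions applies the corresponding window.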

References
1. R.N. Bracewell, The Fourier Transform and Its Applications, McGraw-Hill, 1965.
2. E.O. Brigham, The Fast Fourier Transform and Its Applications, Prentice-Hall, 1988.
3. J.F. James, A Student's Guide to Fourier Transforms, Cambridge Univ. Press, 1995.
4. C.T. Chen, Digital Signal Processing, Oxford Univ. Press, 2001.
5. S.L. Marple Jr., Digital Spectral Analysis with Applications, Prentice Hall, 1987.
6. S. Smith, Digital Signal Processing, Newnes, 2003.
7. P. Bloomfield, Fourier Analysis of Time Series, John Wiley & Sons, 2000.
8. P. Hertz and E.D. Feigelson, "A Sample of Astronomical Time Series," Applications of Time Series Analysis in Astronomy and Meteorology, T. Subba Rao, M.B. Priestley, and O. Lessi, eds., Chapman & Hall, 1979, pp. 340–356.
9. W.H. Press et al., Numerical Recipes in Fortran, Cambridge Univ. Press, 1992.
10. L.N. Trefethen, Spectral Methods in Matlab, SIAM Press, 2000.


11. C.D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM Press, 2000.
12. J.W. Cooley and J.W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, vol. 19, no. 90, 1965, pp. 297–301.
13. D.N. Rockmore, "The FFT: An Algorithm the Whole Family Can Use," Computing in Science & Eng., vol. 2, no. 1, 2000, pp. 60–64.

Denis Donnelly is a professor of physics at Siena College. His research interests include computer modeling and electronics. Donnelly received a PhD in physics from the University of Michigan. He is a member of the American Physical Society, the American Association of Physics Teachers, and the American Association for the Advancement of Science. Contact him at [email protected].

Bert Rust is a mathematician at the National Institute for Standards and Technology. His research interests include ill-posed problems, time-series modeling, nonlinear regression, and observational cosmology. Rust received a PhD in astronomy from the University of Illinois. He is a member of SIAM and the American Astronomical Society. Contact him at [email protected].



EDUCATION
Editor: Denis Donnelly, [email protected]

THE FAST FOURIER TRANSFORM FOR EXPERIMENTALISTS, PART II: CONVOLUTIONS

By Denis Donnelly and Bert Rust

When undergraduate students first compute a fast Fourier transform (FFT), their initial impression is often a bit misleading. The process all seems so simple and transparent: the software takes care of the computations, and it's easy to create the plots. But once they start probing, students quickly learn that like any rich scientific expression, the implications, the range of applicability, and the associated multilevel understandings needed to fully appreciate the subtleties involved take them far beyond the basics. Even professionals find surprises when performing such computations, becoming aware of details that they might not have fully appreciated until they asked more sophisticated questions.

In the first of this five-part series,1 we discussed several basic properties of the FFT. In addition to some fundamental elements, we treated zero-padding, aliasing, and the relationship to a Fourier series, and ended with an introduction to windowing. In this article, we'll briefly look at the convolution process.

Convolution
Convolution, a process some would say lies at the heart of digital signal processing, involves two functions, which we'll call x(t) and h(t), where x(t), for example, could be an input signal and h(t) some linear system's impulse response. When convolved, they yield an output function y(t). The process expresses the amount of one function's overlap as it is shifted over the other, providing a kind of blending of the two functions:

y(t) = x(t) ⊗ h(t).  (1)

This process has many applications. Filtering is one example: given the appropriate impulse response, we can create any one of a number of filters. We'll give some examples in the next section, but we'll postpone further information about filtering and detrending until the next installment. Correlation is another closely related process and can help determine if a particular signal occurs in another datastream.

Deconvolution is the reverse: in effect, it uses the process itself to remove the effects of an undesired convolution or data distortion. When taking data, a convolution can obscure the desired information, perhaps due to interfering physical interactions or by the detection system itself (which has its own response). A gamma ray arriving at a detector, for example, has a well-defined energy, yet the detector output shows several associated effects related to the interaction of the gamma ray with a crystal. If a nuclear physicist is interested in the gamma ray's energy or intensity instead of the detector's response, then he or she needs to know how to extract the appropriate information from this much larger signal set. Deconvolving can remove the detector response, restoring the data to a form closer to the original.

When noise accompanies a signal, as it always does to some extent, a direct deconvolution can generate unstable results, which renders the process unusable. One way to reduce the noise's influence is to assume that analytic functions can represent either (or both) the original signal and the convoluted signal. When such a representation is possible, the chances of success with the deconvolution process greatly improve. Still, deconvolution is beyond the scope of this series, so we won't discuss it here.

The continuous convolution is defined as

y(t) = x(t) ⊗ h(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ = ∫_{−∞}^{∞} x(t − τ) h(τ) dτ.  (2)

In his book on the FFT, E. Oran Brigham states that "Possibly the most important and powerful tool in modern scientific analysis is the relationship between [Equation 2] and its Fourier transform."2 The relationship referred to is the time–convolution theorem:

F{x(t) ⊗ h(t)} = F{x(t)} · F{h(t)} = X(f)H(f),  (3)

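The discrete analogue of Equation 3 is easy to verify numerically. The following sketch (ours, not the article's; it assumes NumPy) computes a circular convolution by direct summation and checks that its DFT equals the element-by-element product of the DFTs:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
h = rng.standard_normal(N)

# Circular convolution by direct summation: y_n = sum_k x_k * h_{(n-k) mod N}
y = np.array([sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)])

# Discrete analogue of Equation 3: the DFT of x * h equals DFT(x) times DFT(h)
assert np.allclose(np.fft.fft(y), np.fft.fft(x) * np.fft.fft(h))
```

The identity holds to floating-point precision for any pair of equal-length series.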
92 Copublished by the IEEE CS and the AIP 1521-9615/05/$20.00 © 2005 IEEE COMPUTING IN SCIENCE & ENGINEERING
where · denotes ordinary multiplication, and X(f) and H(f) are the continuous Fourier transforms of x(t) and h(t).

In real life, we seldom have access to the functions x(t) and h(t); instead, we have only finite time-series representations, such as

x_k = x(kΔt)

and

h_k = h(kΔt),  k = 0, 1, 2, …, N − 1.   (4)

Given this discrete representation, we can't compute y(t) exactly, but we can compute a time-series approximation to it. Specifically, we can write an expression for the discrete convolution as

y_n = Δt Σ_{k=0}^{N−1} x_k · h_{n−k} = Δt Σ_{k=0}^{N−1} x_{n−k} · h_k,  n = 0, 1, 2, …, N − 1.   (5)

If the response function were the trivial example in which h_0 has the value 1 and all other h values are 0, then the convolution process would just reproduce the input signal (if h_0 differed from 1, it would scale the input signal proportionally to h_0). If all h's were 0 except for h_m, then we would scale the input signal by the magnitude of h_m and delay it by m sample intervals. The convolution process is the summation of such elements.

It's important to keep two details in mind when performing a convolution process: one, the two signals must have the same number of elements (zero-padding easily solves this problem), and two, the discrete convolution theorem treats the data as if they were periodic. We can express the summation associated with this circular convolution as

Σ_{k=0}^{N−1} x[(n − k) mod N] h(k).   (6)

This cyclic effect causes a wraparound problem that we'll explain in more detail later.

The FFT form of the convolution of two time series is given by

x ∗ h = ifft(fft(x) · fft(h)),   (7)

where the product of the two transforms is element by element and ifft stands for inverse FFT. (While we're discussing convolution in the time domain and multiplication in the frequency domain, we should mention that an interchange of roles is also possible: multiplication in the time domain corresponds to convolution in the frequency domain.)

We can readily program the summation required to compute a convolution, but as the number of data points increases, the computational advantage goes to the convolution's implementation with the FFT, even though it requires several steps. The reason is that a convolution in the time domain requires N² multiplications, whereas the computational cost of taking the FFT route is on the order of 3N log₂(N) multiplications. Despite the fact that three steps are involved, for large N, the advantages of the FFT approach are unmistakable. Even for the very modest case of N = 250, using FFTs to compute a convolution is already more than 10 times faster than the time-domain computation.

One way to implement the summation shown in Equation 6 is by expressing the equation itself in matrix form. Create an N × N matrix in which the first column takes on the x-values from x_0 to x_{N−1}. Let the next column take on the same x-values but shifted down one row, with the last value becoming the first, and repeat this rolling procedure for each successive column. Multiplying this x-matrix by the h-vector yields a circular convolution. We get a linear convolution from this same multiplication if we set all the terms in the x-matrix above the diagonal to zero.

To avoid the wraparound pitfall, we could do one of two things: compute the linear convolution (setting all elements above the x-matrix's diagonal to zero) or zero-pad the functions so that the total number of data points is at least N₀ + K₀ − 1, where N₀ and K₀ are the original numbers of data points in the functions x and h. With this number of elements, we avoid any distortion due to wraparound:

⎡ y_0     ⎤   ⎡ x_0      x_{N−1}  x_{N−2}  …  x_1 ⎤   ⎡ h_0     ⎤
⎢ y_1     ⎥   ⎢ x_1      x_0      x_{N−1}  …  x_2 ⎥   ⎢ h_1     ⎥
⎢ y_2     ⎥ = ⎢ x_2      x_1      x_0      …  x_3 ⎥ · ⎢ h_2     ⎥
⎢ ⋮       ⎥   ⎢ ⋮        ⋮        ⋮        ⋱  ⋮   ⎥   ⎢ ⋮       ⎥
⎣ y_{N−1} ⎦   ⎣ x_{N−1}  x_{N−2}  x_{N−3}  …  x_0 ⎦   ⎣ h_{N−1} ⎦.   (8)

Examples

As an example of a linear convolution calculation, consider the signals

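Both the zero-padding recipe and the rolled x-matrix of Equation 8 can be sketched in a few lines of NumPy (our illustration; the article itself gives no code):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # N0 = 4 points
h = np.array([1.0, -1.0, 0.5])        # K0 = 3 points
n = len(x) + len(h) - 1               # N0 + K0 - 1 points avoid wraparound

# Equation 7 with zero padding: fft(x, n) pads x with zeroes to length n
y_fft = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real
assert np.allclose(y_fft, np.convolve(x, h))   # matches the direct linear convolution

# The rolled x-matrix of Equation 8: column k holds x shifted down k rows,
# so C[n, k] = x[(n - k) mod N] and C @ h is the circular convolution
N = 4
h_pad = np.append(h, 0.0)             # both signals need N elements
C = np.column_stack([np.roll(x, k) for k in range(N)])
y_circ = C @ h_pad
assert np.allclose(y_circ, np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_pad)).real)
```

With padding, the FFT route reproduces the linear convolution exactly; without it, the matrix product reproduces the circular form of Equation 6.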
JULY/AUGUST 2005 93
x(t) = sin(2πt) + sin(4πt) for 0 ≤ t ≤ 1, and 0 otherwise,   (9)

and

h(t) = 1 for 0 ≤ t ≤ 0.3125, and 0 otherwise,   (10)

which we discretize to have 32 equally spaced points on the interval [0, 1].

Figure 1 shows the signal, the impulse response, and the associated continuous and discrete convolutions. The discrete convolution as computed by taking the IFFT of the product of the FFTs of x and h is identical to that obtained via matrix multiplication.

Figure 1. Comparison of continuous and discrete convolution calculations. We calculated the convolution of x(t) and h(t) in three ways: continuous and discrete in the time and frequency domains. The discrete convolution calculations approach the continuous form.

Figure 2 shows the wraparound associated with the circular convolution example. The convolution is altered for the number of nonzero data points in h.

Figure 2. Convolution with and without wraparound distortions. The blue curve shows the circular form of the convolution without zero-padding. The red curve is based on a zero-padded calculation that avoids the distortion associated with circularity. The diamonds show the h response curve (scaled at 10 percent of true height); the width of the response function is associated with the region in which the circular convolution is spoiled.

In Figure 3, we show the FFTs of the linear and circular convolutions. The FFT of the convolution resulting from the matrix multiplication is the same as the product of x and h's FFTs. In the figure, we can see some frequency dependence associated with the convolution process. Figure 4 gives an overall summary of the operations and their interrelation.

For a more realistic example of convolution, let's look at the propagation of an acoustic pressure wave through a rectangular waveguide. The waveguide's resonant conditions restrict the wave numbers of the transverse wave components to discrete values, and the wave propagates only in certain modes. If we treat the waveguide as a linear device with an impulse response h, then we can predict the form of the transmitted signal by taking the convolution of our input signal x and the impulse response of the waveguide. Kristien Meykens and colleagues3 show that for modes other than (0, 0), the impulse response departs from a δ-function in which the lower frequencies resemble a reversed chirp.

Figure 5 shows the convolution of an input signal consisting of a brief acoustic burst with the impulse response of a rectangular waveguide (which we represent as a chirp function). We form this input signal by multiplying an 8-kHz sine wave by a Bartlett (tent-shaped) window. The chirp function represents the impulse response for the waveguide's (1, 0) mode, and f(t) = 10³ + 2 × 10⁶ t represents the chirp function's frequency dependence. The chirp expression is simply sin(θ(t)), where

θ(t) = ∫₀ᵗ 2π f(t′) dt′.   (11)

In general, a convolution shows the two functions' entanglement. The examples we've discussed here provide a clear instance in which we can see where the similarity between the input signal and the impulse response is the greatest. Such computations are in reasonable agreement with experimental results.3

In the next installment of this series, we'll continue to examine the problem of spectrum estimation with a discussion of the autocorrelation function and the correlogram estimates, which are based upon it.

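A rough reconstruction of Figure 5's computation in NumPy follows. The sample rate and the burst length are our assumptions, since the article doesn't state them; only the chirp law f(t) = 10³ + 2 × 10⁶ t, the 8-kHz Bartlett-windowed burst, and the 8-ms record come from the text:

```python
import numpy as np

fs = 200_000                                 # assumed sample rate, Hz
t = np.arange(0, 8e-3, 1 / fs)               # 8-ms record, as in Figure 5
f_inst = 1e3 + 2e6 * t                       # f(t) = 10^3 + 2e6 * t (chirp sweep)
theta = 2 * np.pi * np.cumsum(f_inst) / fs   # discrete version of Equation 11
chirp = np.sin(theta)                        # stands in for the (1, 0)-mode response

tb = np.arange(0, 1e-3, 1 / fs)              # assumed 1-ms burst
burst = np.sin(2 * np.pi * 8e3 * tb) * np.bartlett(tb.size)  # 8-kHz windowed sine

y = np.convolve(burst, chirp)                # predicted transmitted signal
```

The burst then appears in y where the chirp's instantaneous frequency passes through 8 kHz, which is the behavior Figure 5 illustrates.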


Figure 3. The FFTs of the linear and circular convolutions. The two curves are shown with (solid curves) and without (circles and diamonds) zero padding. We computed these FFTs from the convolution data for Figure 1's discrete transform. The results are the same as those obtained by taking the product of x and h's FFTs.

Figure 4. The interrelation between time- and frequency-domain operations that lead to convolution. Multiplying the FFTs of x and h followed by an IFFT also leads to the convolution. An FFT of the convolution would yield the same result as the product of the FFTs.

Figure 5. The convolution of a windowed sine wave burst and a chirp function. The top curve shows the input signal, and the middle curves show the impulse response of the waveguide (a chirp function). The chirp frequency increases linearly with time, ranging from roughly 1 kHz at t = 0 to roughly 17 kHz at t = 8 ms; the frequency increases at a rate of approximately 2 kHz/ms. The bottom curves show the convolution and the approximate frequencies associated with the most significant section of the convolution over time. The one marked point represents the frequency of the windowed sine curve, which is 8 kHz. The slope of the line representing frequency is about 1.9 kHz/ms.

References
1. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part I: Concepts," Computing in Science & Eng., vol. 7, no. 2, 2005, pp. 80–88.
2. E.O. Brigham, The Fast Fourier Transform and Its Applications, Prentice-Hall, 1988.
3. K. Meykens, B. Van Rompaey, and J. Janssen, "Dispersion in Acoustic Waveguide: A Teaching Laboratory Experiment," Am. J. Physics, vol. 67, no. 5, 1999, pp. 400–406.

Denis Donnelly is a professor of physics at Siena College. His research interests include computer modeling and electronics. Donnelly received a PhD in physics from the University of Michigan. He is a member of the American Physical Society, the American Association of Physics Teachers, and the American Association for the Advancement of Science. Contact him at [email protected].

Bert Rust is a mathematician at the US National Institute for Standards and Technology. His research interests include ill-posed problems, time-series modeling, nonlinear regression, and observational cosmology. Rust received a PhD in astronomy from the University of Illinois. He is a member of SIAM and the American Astronomical Society. Contact him at [email protected].
EDUCATION
Editor: Denis Donnelly, [email protected]

THE FAST FOURIER TRANSFORM
FOR EXPERIMENTALISTS, PART III:
CLASSICAL SPECTRAL ANALYSIS
By Bert Rust and Denis Donnelly

EACH ARTICLE IN THIS CONTINUING SERIES ON THE FAST FOURIER TRANSFORM (FFT) IS DESIGNED TO ILLUMINATE NEW FEATURES OF THE WIDE-RANGING APPLICABILITY OF THIS TRANSFORM. THIS SEGMENT DEALS WITH SOME ASPECTS OF THE spectrum estimation problem.

Before we begin, here's a short refresher about two elements we introduced previously, windowing1 and convolution.2 As we noted in those installments, a convolution is an integral that expresses the amount of overlap of one function as it is shifted over another. The result is a blending of the two functions. Closely related to the convolution process are the processes of cross-correlation and autocorrelation. Computing the cross-correlation differs only slightly from the convolution; it's useful for finding the degree of similarity in signal patterns from two different data streams and in determining the lead or lag between such similar signals. Autocorrelation is also related to the convolution; it's described later. Windowing, used in extracting or smoothing data, is typically executed by multiplying time-domain data or its autocorrelation function by the window function. A disadvantage of windowing is that it alters or restricts the data, which, of course, has consequences for the spectral estimate. In this installment, we continue our discussion, building on these concepts with a more general approach to computing spectrum estimates via the FFT.

Spectrum Estimation's Central Problem

The periodogram, invented by Arthur Schuster in 1898,3 was the first formal estimator for a time series's frequency spectrum, but many others have emerged in the ensuing century. Almost all use the FFT in their calculations, but they differ in their assumptions about the missing data; that is, the data outside the observation window. These assumptions have profound effects on the spectral estimates. Let t be time, f be frequency, and x(t) a real function on the interval −∞ < t < ∞. The continuous Fourier transform (CFT) of x(t) is defined by

X(f) = ∫_{−∞}^{∞} x(t) exp(−2πift) dt,  −∞ < f < ∞,   (1)

where i ≡ √(−1). If we knew x(t) perfectly and could compute Equation 1, then we could compute an energy spectral density function

E(f) = |X(f)|²,  −∞ < f < ∞,   (2)

and a power spectral density function (PSD) by

P(f) = lim_{T→∞} (1/2T) |∫_{−T}^{T} x(t) exp(−2πift) dt|²,  −∞ < f < ∞.   (3)

But we have only a discrete, real time series

x_j = x(t_j), with t_j = jΔt,  j = 0, 1, …, N − 1,   (4)

defined on a finite time interval of length NΔt. We saw in Part I1 that sampling x(t) with sample spacing Δt confined our spectral estimates to the Nyquist band 0 ≤ f ≤ 1/(2Δt). We used the FFT algorithm to compute the discrete Fourier transform (DFT)

X_k = Σ_{j=0}^{N−1} x_j exp(−2πi jk/N),  k = 0, 1, …, N/2,   (5)

which approximates the CFT X(f) at the Fourier frequencies

f_k = k/(NΔt),  k = 0, 1, …, N/2.   (6)

We then computed periodogram estimates of both the PSD and the amplitude spectrum by

P(f_k) = (1/N)|X_k|²,  A(f_k) = (2/N)|X_k|,  k = 0, 1, …, N/2.   (7)

We also saw that we could approximate

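As a sketch (ours, not the authors' code), the periodogram estimates of Equation 7 take only a few lines of NumPy: np.fft.rfft returns the one-sided DFT of Equation 5, and its length argument performs the zero padding that refines the frequency mesh.

```python
import numpy as np

def periodogram(x, n_pad=None):
    """PSD and amplitude estimates of Equation 7 on the Nyquist band."""
    N = len(x)
    X = np.fft.rfft(x, n_pad)                # one-sided DFT (Equation 5), zero padded
    return np.abs(X) ** 2 / N, 2 * np.abs(X) / N

# The noiseless series behind Equation 8: N = 32, dt = 0.22, f0 = 0.5
N, dt, f0 = 32, 0.22, 0.5
t = dt * np.arange(N)
x = np.sin(2 * np.pi * f0 * (t + 0.25))

psd, amp = periodogram(x, 1024)              # padded to 1,024 points
f = np.fft.rfftfreq(1024, d=dt)              # Fourier frequencies (Equation 6)
print(f[np.argmax(psd)])                     # peak lies near f0 = 0.5
```

The padded mesh has spacing 1/(1024 Δt) ≈ 0.0044, so the peak location can be read off far more finely than the raw 32-point transform would allow.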
74 Copublished by the IEEE CS and the AIP 1521-9615/05/$20.00 © 2005 IEEE COMPUTING IN SCIENCE & ENGINEERING
the CFT and the frequency spectrum on a denser frequency mesh simply by appending zeroes to the time series. This practice, called zero padding, is just an explicit assertion of an implicit assumption of the periodogram method, namely, that the time series is zero outside the observation window.

Frequency spectrum estimation is a classic underdetermined problem because we need to estimate the spectrum at an infinite number of frequencies using only a finite amount of data. This problem has many solutions, differing mainly in what they assume about the missing data.

Before considering other solutions to this problem, let's reconsider one of the examples from Part I1 (specifically, Figure 1b), but make it more realistic by simulating some random measurement errors. More precisely, we take N = 32, Δt = 0.22, and consider the time series

t_j = jΔt,  j = 0, 1, 2, …, N − 1,
x_j = x(t_j) = sin[2πf₀(t_j + 0.25)] + ε_j,   (8)

with f₀ = 0.5, and each ε_j a random number drawn independently from a normal distribution with mean zero and standard deviation σ = 0.25. This new time series is plotted together with the original uncorrupted series in Figure 1a. Both series were zero padded to length 1,024 (992 zeroes appended) to obtain the periodogram estimates given in Figure 1b. It's remarkable how well the two spectra agree, even though the noise's standard deviation was 25 percent of the signal's amplitude.

Figure 1. Original and new time series as defined by Equation 8. (a) The noise-corrupted time series and the uncorrupted series originally used in Part I's Figure 1b. The noise is independently, identically distributed n(0, 0.25). (b) Periodograms of the two time series plotted in (a). For the noise-corrupted series, the peak is centered on frequency f̂₀ = 0.493.

The Autocorrelation Function

After the periodogram, the next frequency spectrum estimators to emerge were Richard Blackman and John Tukey's correlogram estimators.4 They're based on the autocorrelation theorem (sometimes called Wiener's theorem), which states that if X(f) is the CFT of x(t), then |X(f)|² is the CFT of the autocorrelation function (ACF) of x(t). Norbert Wiener defined the latter function as5

ρ(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x*(t) x(t + τ) dt,  −∞ < τ < ∞,   (9)

in which the variable τ is called the lag (the time interval for the correlation of x(t) with itself), and x*(t) is the complex conjugate of x(t). Thus, if we could access x(t), we could compute the PSD in two ways: either by Equation 3 or by

P(f) = ∫_{−∞}^{∞} ρ(τ) exp(−2πifτ) dτ.   (10)

But again, we have access to only a noisy time series x₀, x₁, …, x_{N−1}, so to use the second method, we need estimates for ρ(τ) evaluated at the discrete lag values

τ_m = mΔt,  m = 0, 1, …, N − 1.   (11)

Because we're working with a real time series, and ρ(τ_{−m}) = ρ(τ_m), we don't need to worry about evaluating ρ(τ) at negative lags.

Because ρ(τ) is a limit of the average value of x*(t)x(t + τ) on the interval [−T, T], the obvious estimator is the sequence of average values

ρ̂_m = ρ̂(mΔt) = (1/(N − m)) Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, N − 1.   (12)

This sequence is sometimes called the unbiased estimator of ρ(τ) because its expected value is the true value; that is, E{ρ̂(mΔt)} = ρ(mΔt). But the data are noisy, and for successively larger values of m, the average ρ̂_m is based on fewer and fewer terms, so the variance grows and, for large m, the estimator becomes unstable. Therefore, it's common practice to use the biased estimator

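Equation 12 translates almost directly into NumPy; this sketch of ours averages the N − m available products at each lag m:

```python
import numpy as np

def acf_unbiased(x):
    """Equation 12: sample ACF with the 1/(N - m) normalization."""
    N = len(x)
    return np.array([x[:N - m] @ x[m:] / (N - m) for m in range(N)])

r = acf_unbiased(np.array([1.0, 2.0, 3.0]))
print(r)   # lag 0: 14/3, lag 1: (2 + 6)/2 = 4, lag 2: 3/1 = 3
```

The dot product x[:N − m] @ x[m:] is exactly the sum Σ x_n x_{n+m} over the lags that overlap, which is why the divisor shrinks with m.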
SEPTEMBER/OCTOBER 2005 75
ρ̂_m = ρ̂(mΔt) = (1/N) Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, N − 1,   (13)

which damps those instabilities and has a smaller total error (bias + variance) than does the unbiased estimator. (Bias is the difference between the estimator's expected value and the true value of the quantity being estimated.) Figure 2a gives plots of both estimates for the time series that Equation 8 defines.

The ACF we have just described is sometimes called the engineering autocorrelation to distinguish it from the statistical autocorrelation, which is defined by

r̂_m = [(1/N) Σ_{n=0}^{N−m−1} (x_n − x̄)(x_{n+m} − x̄)] / [(1/N) Σ_{n=0}^{N−1} (x_n − x̄)²],
where x̄ = (1/N) Σ_{n=0}^{N−1} x_n.   (14)

The individual r̂_m are true correlation coefficients because they satisfy

−1 ≤ r̂_m ≤ 1,  m = 0, 1, …, N − 1.   (15)

Figure 2. Autocorrelation and correlogram estimates for the noisy time series defined by Equation 8. (a) Biased and unbiased estimates of the autocorrelation function (ACF); (b) correlogram estimates obtained from the ACF estimates in (a).

Correlogram PSD Estimators

Once we've established the ACF estimate, we can use the FFT to calculate the discrete estimate to the PSD. More precisely, the ACF estimate is zero padded to have M lags, which gives M/2 + 1 frequencies in the PSD estimate, which we can then compute by approximating Equation 10 with

P̂_k = P̂(f_k) = Σ_{j=0}^{M−1} ρ̂_j exp(−2πi jk/M),  k = 0, 1, …, M/2.   (16)

Zero padding in this case is an explicit expression of the implicit assumption that the ACF is zero for all lag values τ > (N − 1)Δt. We must assume that because we don't know the data outside the observation window. Assuming some nonzero extension for the ACF would amount to an implicit assumption about the missing observed data.

Figure 2b plots the correlograms corresponding to the biased and unbiased ACF estimates, shown in Figure 2a. The negative sidelobes for the unbiased correlogram show dramatically why most analysts choose the biased estimate even though its central peak is broader. The reason for this broadening, and for the damped sidelobes, is that the biased ACF, Equation 13, can also be computed by multiplying the unbiased ACF, Equation 12, by the triangular (Bartlett) tapering window

w_k = 1 − k/N,  k = 0, 1, 2, …, N − 1.   (17)

Recall that we observed the same sort of peak broadening and sidelobe suppression in Part I's Figure 10 when we multiplied the observed data by a Blackman window before computing the periodogram.

Notice that the biased correlogram estimate plotted in Figure 2b is identical to the periodogram estimate plotted in Figure 1b. The equality of these two estimates, computed in very different ways, constitutes a finite-dimensional analogue of Wiener's theorem for the continuous PSD.

Figure 2b's two PSD correlograms aren't the only members of the class of correlogram estimates. We can obtain other variations by truncating the ACF estimate at lags τ < (N − 1)Δt and by smoothing the truncated (or untruncated) estimate with one of the tapering windows defined in Part I's Equation 11. Most of those windows were originally developed for the correlogram method; they were then retroactively applied to the periodogram method when the latter was resurrected in the mid 1960s. In those days, people often used very severe truncations, with the estimates being

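The relation among Equations 12, 13, and 17, namely that the biased ACF equals the unbiased ACF tapered by the Bartlett window, is easy to confirm numerically (our sketch, not the article's code):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.standard_normal(32)
N = len(x)

unbiased = np.array([x[:N - m] @ x[m:] / (N - m) for m in range(N)])  # Equation 12
biased   = np.array([x[:N - m] @ x[m:] / N       for m in range(N)])  # Equation 13
bartlett = 1 - np.arange(N) / N                                       # Equation 17

# (1 - m/N) * (1/(N - m)) = 1/N, so the taper converts one estimator into the other
assert np.allclose(biased, bartlett * unbiased)
```

At lag 0 the window equals 1 and the two estimators coincide; at the largest lags the taper strongly damps the unstable unbiased values, which is exactly the smoothing the text describes.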


set to zero at 90 percent or more of the lags. Not only did this alleviate the variance instability problem, but it also reduced the computing time, an important consideration before the invention of the FFT algorithm, when computers were much slower than today.

The effect of truncating the biased ACF estimate is shown in Figure 3, where m_max is the largest index for which the nonzero ACF estimate is retained. More precisely,

ρ̂_m = (1/N) Σ_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, m_max,
ρ̂_m = 0,  m = m_max + 1, …, N − 1.   (18)

Figure 3. Three correlogram estimates for Equation 8 computed from the biased autocorrelation function (ACF) estimator in Equation 13. The periodogram, although plotted, doesn't show up as a separate curve because it's identical to the m_max = 31 correlogram.

It's clear that smaller values of m_max produce more pronounced sidelobes and broader central peaks than larger values. The peak broadening is accompanied by a compensating decrease in height to keep the area under the curve invariant. PSD is measured in units of power per unit frequency interval, so the peak's area indicates its associated power.

Figure 4 shows the effect of tapering the truncated ACF estimates used in Figure 3 with a Hamming window

w_m = 0.538 + 0.462 cos(mπ/m_max),  m = 0, 1, 2, …, m_max.   (19)

The sidelobes are suppressed by the tapering, but the central peaks are further broadened. This loss in resolution is the price we must pay to smooth the sidelobes and eliminate their negative excursions.

Figure 4. Three correlogram estimates for the time series generated by Equation 8. We computed the estimates by tapering three truncations of the biased estimator in Equation 13 with a Hamming window. The periodogram was also plotted for comparison. Although it has sidelobes, its central peak is sharper than those of the correlograms.

Tapering the biased ACF estimates with the Hamming window amounts to twice tapering the unbiased estimates; we can obtain the former from the latter by tapering them with the Bartlett window, Equation 17. Figure 5 shows the effect of a single tapering of the unbiased estimates with the Hamming window, Equation 19. Note that the sidelobes are not completely suppressed, but they're not as

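The truncate-and-taper recipe of Equations 18 and 19 can be sketched as follows (our code, not the article's; we keep only the real part of the one-sided transform of Equation 16 as a simple stand-in for the plotted PSD):

```python
import numpy as np

def tapered_correlogram(rho, m_max, n_freq=1024):
    """Truncate the ACF at m_max (Equation 18), taper it with the Hamming
    window of Equation 19, and transform it as in Equation 16."""
    m = np.arange(m_max + 1)
    w = 0.538 + 0.462 * np.cos(np.pi * m / m_max)   # Equation 19
    kept = rho[:m_max + 1] * w                      # retained, tapered lags
    # zero padding to n_freq points gives the denser frequency mesh;
    # the real part stands in for the plotted PSD estimate
    return np.fft.rfft(kept, n_freq).real
```

Feeding the biased ACF of a test series through this function with m_max = 10, 20, and 31 reproduces the qualitative trade-off in Figures 3 and 4: smaller m_max means broader peaks, while the taper smooths the sidelobes.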
pronounced as in Figure 3, in which the tapering used the Bartlett window. However, the central peaks are also slightly broader here. This is yet another example of the trade-off between resolution and sidelobe suppression.

Figure 5. Three correlogram estimates for the time series generated by Equation 8. We computed the estimates by tapering three truncations of the unbiased estimator in Equation 12. We also plotted the periodogram for comparison; again, it has a sharper peak but larger sidelobes.

This particular example contains only a single sinusoid, so it doesn't suggest any advantage for the tapering and truncation procedures, but they weren't developed to analyze a time series with such a simple structure. Their advantages are said to be best realized when the signal being analyzed contains two or more sinusoids with frequencies so closely spaced that sidelobes from two adjacent peaks might combine and reinforce one another to give a spurious peak in the spectrum. But of course, if two adjacent frequencies are close enough, then the broadening of both peaks might cause them to merge into an unresolved lump.

Much ink has been used in debating the relative merits of the various truncation and windowing strategies, but none of them have proven to be advantageous, so correlogram estimates are beginning to fall out of favor. For the past 30 years or so, most researchers have concentrated on autoregressive spectral estimates, which, as we shall see in Part 4, give better resolution because they make better assumptions about the data outside the window of observation.

References
1. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part I: Concepts," Computing in Science & Eng., vol. 7, no. 2, 2005, pp. 80–88.
2. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part II: Convolutions," Computing in Science & Eng., vol. 7, no. 3, 2005, pp. 92–95.
3. A. Schuster, "On the Investigation of Hidden Periodicities with Application to a Supposed Twenty-Six-Day Period of Meteorological Phenomena," Terrestrial Magnetism, vol. 3, no. 1, 1898, pp. 13–41.
4. R.B. Blackman and J.W. Tukey, The Measurement of Power Spectra, Dover Publications, 1959.
5. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, MIT Press, 1949.

Denis Donnelly is a professor of physics at Siena College. His research interests include computer modeling and electronics. Donnelly received a PhD in physics from the University of Michigan. He is a member of the American Physical Society, the American Association of Physics Teachers, and the American Association for the Advancement of Science. Contact him at [email protected].

Bert Rust is a mathematician at the US National Institute for Standards and Technology. His research interests include ill-posed problems, time-series modeling, nonlinear regression, and observational cosmology. Rust received a PhD in astronomy from the University of Illinois. He is a member of SIAM and the American Astronomical Society. Contact him at [email protected].



EDUCATION EDUCATION

Editors: David Winch, [email protected]


Denis Donnelly, [email protected]

THE FAST FOURIER TRANSFORM


FOR EXPERIMENTALISTS, PART IV:
AUTOREGRESSIVE SPECTRAL ANALYSIS
By Bert Rust and Denis Donnelly

IT'S RARE THAT WE HAVE ONLY ONE WAY IN WHICH TO APPROACH A PARTICULAR TOPIC—FORTUNATELY, SPECTRUM ESTIMATION ISN'T ONE OF THOSE RARE CASES. In the most recent article of this series,1 we considered the periodogram and correlogram estimators for the power spectral density (PSD) function. However, they are only two of several possibilities.

In this installment, we consider two additional kinds of spectrum estimates: autoregressive (AR) estimates and the maximum entropy (ME) method. In the first approach, we assume that an AR process generates the time series, which means we can compute the PSD of the time series from estimates of the AR parameters. The second approach is a special case of the first, but it uses a different method for estimating the AR parameters. Specifically, it chooses them to make the PSD's inverse transform compatible with the measured time series, while remaining maximally noncommittal about the data outside the observational window.

New Editorial Board Member

David Winch is an emeritus professor of physics at Kalamazoo College, Michigan. His research interests are focused on educational technologies (his most recent work is a DVD/CD called "Physics: Cinema Classics"). Winch has a PhD in physics from Clarkson University. He is a member of the American Physical Society, the American Association of Physics Teachers, and the National Science Teachers Association. He'll be joining our board as a coeditor of the Education department. Contact him at [email protected] or lead editor Jenny Ferrero at [email protected] if you are interested in writing.

Autoregressive Time-Series Models
Both the periodogram and correlogram estimates make rather unrealistic assumptions about the data outside the observational window. Moreover, when they use tapering windows or truncation of the autocorrelation function (ACF), they change the observed data. The years since the early 1970s have seen the development of a new class of PSD estimators that are based on the idea of fitting a parametric time-series model to the observed data. This enables us to use estimates of the parameters in the theoretical expression of the model's PSD to get an estimate of the observed series' PSD. If the model is a good representation of the process that generated the data, it should give a more realistic extrapolation for the missing data.

The class of models used most often assumes that the data are generated by an AR process in which each new data point is formed from a linear combination of the preceding data plus a random shock. The basic idea is that a system's future states depend in a deterministic way on previous states, but at each time step, a random perturbation drives the system forward. We can write the AR models of orders 1, 2, and 3 as

AR(1): x_n = −a_1 x_{n−1} + u_n,  n = 1, 2, …, N − 1,
AR(2): x_n = −a_1 x_{n−1} − a_2 x_{n−2} + u_n,  n = 2, 3, …, N − 1,
AR(3): x_n = −a_1 x_{n−1} − a_2 x_{n−2} − a_3 x_{n−3} + u_n,  n = 3, 4, …, N − 1,  (1)

where a_1, a_2, and a_3 are the AR parameters (whose values must be determined to make the model fit the data), and u_n is the random shock at time step n. We assume the random shocks to be samples from a zero-mean distribution whose variance remains constant in time. The choice of negative signs for the parameters is a universal convention adopted for notational convenience in derivations that we won't give here.

Autoregressive Spectral Estimates
In general, for any integer p < N − 1, the AR(p) model is

NOVEMBER/DECEMBER 2005 Copublished by the IEEE CS and the AIP 1521-9615/05/$20.00 © 2005 IEEE 85

x_n = −∑_{k=1}^{p} a_k x_{n−k} + u_n,  n = p, p + 1, …, N − 1.  (2)

We can show that the PSD function for this model is

P_AR(f) = ρ_w / |1 + ∑_{j=1}^{p} a_j exp(−2πi f j Δt)|²,  −1/(2Δt) ≤ f ≤ 1/(2Δt),  (3)

where ρ_w is another adjustable parameter that we can estimate along with a_1, a_2, …, a_p by solving the (p + 1) × (p + 1) linear system of equations,

⎡ ρ_0   ρ_−1   ρ_−2   …   ρ_−p     ⎤ ⎡ 1   ⎤   ⎡ ρ_w ⎤
⎢ ρ_1   ρ_0    ρ_−1   …   ρ_−p+1   ⎥ ⎢ a_1 ⎥   ⎢ 0   ⎥
⎢ ρ_2   ρ_1    ρ_0    …   ρ_−p+2   ⎥ ⎢ a_2 ⎥ = ⎢ 0   ⎥ ,  (4)
⎢ …     …      …      …   …        ⎥ ⎢ …   ⎥   ⎢ …   ⎥
⎣ ρ_p   ρ_p−1  ρ_p−2  …   ρ_0      ⎦ ⎣ a_p ⎦   ⎣ 0   ⎦

which are sometimes called the Yule-Walker equations. The ρ-values in the matrix are just the autocorrelations ρ_k = ρ(τ_k) = ρ(kΔt) that we defined in the last issue1 with

ρ(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x*(t) x(t + τ) dt,  −∞ < τ < ∞,  (5)

where x* is the complex conjugate of x(t). We're working with real data, so ρ_−k = ρ_k, which means that the matrix is symmetric and positive definite. Note that the element in row i and column j is just ρ(i−j), which makes it a Toeplitz matrix. Norman Levinson2 exploited this special structure to devise a recursive algorithm that solves the system in times proportional to (p + 1)² rather than the (p + 1)³ required by a general linear equations solver.

We can summarize the steps required to compute an autoregressive spectral estimate as follows:

1. Choose an autoregressive order p < N − 1.
2. Compute ACF estimates ρ̂_0, ρ̂_1, …, ρ̂_p using the biased estimator

   ρ̂_m = ρ̂(mΔt) = (1/N) ∑_{n=0}^{N−m−1} x_n x_{n+m},  m = 0, 1, …, N − 1.  (6)

3. Substitute ρ̂_0, ρ̂_1, …, ρ̂_p into the matrix in Equation 4 and use the Levinson algorithm to compute estimates â_1, â_2, …, â_p and ρ̂_w.
4. Substitute â_1, â_2, …, â_p and ρ̂_w into Equation 3 to compute the PSD estimate P̂_AR(f) on any desired frequency mesh.

It's absolutely necessary to use the biased ACF estimator in step 2. Using the unbiased estimator produces an unstable linear system (see Equation 4) with a matrix that numerically isn't positive definite.

It's easy to do the calculations in the final step by using the fast Fourier transform (FFT) algorithm to compute the denominator in Equation 3. If we define â_0 ≡ 1, then

1 + ∑_{j=1}^{p} â_j exp(−2πi f j Δt) = ∑_{j=0}^{p} â_j exp(−2πi f j Δt).  (7)

Suppose we want to evaluate P_AR(f) at (M/2 + 1) equally spaced frequencies

f_k = k/(MΔt),  k = 0, 1, …, M/2,  (8)

where M > p. Then

∑_{j=0}^{p} â_j exp(−2πi f_k j Δt) = ∑_{j=0}^{p} â_j exp(−2πi jk/M),  (9)

and we can compute these values quite quickly by zero padding the sequence â_0, â_1, â_2, …, â_p to have M terms and applying the FFT algorithm.

Figure 1. The time series generated by Equation 10 and its periodogram. The discrete points in the upper plot are joined by straight-line segments to emphasize the time series nature of the data. The time series was zero padded to length M = 1,024 to compute the periodogram in the lower plot.
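The four steps can be condensed into a short program. The sketch below is our own illustration, not the authors' code: it implements the biased ACF of Equation 6, a textbook Levinson-Durbin recursion for the Yule-Walker system of Equation 4 (with the article's sign convention, a_j = −φ_j), and a direct evaluation of Equation 3 on a frequency mesh rather than the zero-padded FFT shortcut.

```python
import cmath

def biased_acf(x, p):
    """Biased ACF estimates rho_0, ..., rho_p of Equation 6."""
    n = len(x)
    return [sum(x[i] * x[i + m] for i in range(n - m)) / n for m in range(p + 1)]

def levinson(r):
    """Solve the Yule-Walker system (Equation 4) built from r = [rho_0, ..., rho_p]
    in O(p^2) time.  Returns (a, rho_w) so that rho_k + sum_j a_j rho_{k-j} = 0
    for k = 1..p, matching the article's sign convention."""
    p = len(r) - 1
    phi, err = [], r[0]          # phi: coefficients of x_n = sum_j phi_j x_{n-j} + u_n
    for m in range(1, p + 1):
        k = (r[m] - sum(phi[j] * r[m - 1 - j] for j in range(m - 1))) / err
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k       # prediction-error power shrinks at each order
    return [-c for c in phi], err

def ar_psd(a, rho_w, dt, freqs):
    """Equation 3: rho_w / |1 + sum_j a_j exp(-2 pi i f j dt)|^2."""
    out = []
    for f in freqs:
        den = 1.0 + sum(aj * cmath.exp(-2j * cmath.pi * f * (j + 1) * dt)
                        for j, aj in enumerate(a))
        out.append(rho_w / abs(den) ** 2)
    return out
```

For an order-1 model whose true autocorrelations are known exactly, the recursion recovers the generating parameters; on measured data, steps 1 through 4 are simply `biased_acf` followed by `levinson` and `ar_psd`.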



Two Examples
If we choose the AR order p properly, the peaks in the AR(p) spectrum will be sharper than those in the periodogram or correlogram estimates. There is no clear-cut prescription for choosing p, but a fairly wide range of values will usually give acceptable results. To illustrate the effect of the choice of p, let's revisit an example time series used in the last issue.1 Again, we'll take N = 32, Δt = 0.22, and consider the time series generated by

t_j = jΔt,  j = 0, 1, 2, …, N − 1,
x_j = x(t_j) = sin[2πf_0(t_j + 0.25)] + ε_j,  (10)

with f_0 = 0.5, and each ε_j a random number drawn independently from a normal distribution with mean zero and standard deviation σ = 0.25. Figure 1 plots the time series and its periodogram, and Figure 2 gives three different AR(p) spectra for the time series, together with the periodogram for comparison. Table 1 gives the locations of the peak centers. Both the AR(16) and AR(24) estimates give better results than the periodogram, but for real-world problems, it's best to try several orders in the range N/2 ≤ p ≤ 3N/4 and compare them to make the final choice. Our own experience has indicated that the best choice usually has p ≈ 2N/3.

Figure 2. AR(p) power spectral density (PSD) estimates. For p = 8, 16, and 24, and the periodogram for the time series generated by Equation 10, the plot doesn't cover the whole Nyquist band 0 ≤ f ≤ 2.273, but rather only the frequency range spanned by the central peak in the periodogram. Using the whole Nyquist range renders the AR(p) peaks so narrow that it's difficult to distinguish between them.

Table 1. Peak centers.

Estimate | Periodogram | AR(8) | AR(16) | AR(24)
f̂_0      | 0.493       | 0.491 | 0.495  | 0.504

To better illustrate the AR methods' power, let's reconsider another time series originally introduced in Part I of our series (specifically, Figure 2a).3 We generated it by summing two sine waves, with amplitudes A_1 = A_2 = 1.0, frequencies f_1 = 1.0 and f_2 = 1.3, and phases φ_1 = φ_2 = 0, at N = 16 equally spaced time points with Δt = 0.125. Again, we add random noise to make the problem more realistic, and write

t_j = jΔt,  j = 0, 1, …, N − 1,
x_j = sin[2πf_1 t_j] + sin[2πf_2 t_j] + ε_j,  (11)

with the ε_j chosen independently from a normal distribution; the mean is 0 and standard deviation σ = 0.25. This is the same error distribution as in the preceding example, but the samples used here differ from any used there. The top graph of Figure 3 gives plots of the noisy and noise-free time series, and the bottom graph gives their periodograms. Figure 4 gives plots of the PSD's periodogram and AR(12) estimates. The latter clearly indicates the presence of two peaks, although it doesn't completely resolve them. The two maxima occur at frequencies very near the true values used to generate the time series. It's remarkable that the AR(12) estimate could obtain such good agreement with the true values using only 16 noise-corrupted data points.

Figure 3. Time series. In (a) the noise-corrupted time series generated by Equation 11, the noise is independently and identically distributed n(0, 0.25). (b) Periodograms of the two time series plotted in (a). In neither case was the periodogram method able to resolve two separate peaks. For the noisy spectrum, the unresolved lump peaks at frequency f = 1.136.
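For readers who want to reproduce the experiment, Equation 11's setup translates directly into code. This is our own illustrative sketch; the seed is arbitrary, so the noise samples will differ from the ones the authors drew.

```python
import math
import random

N, dt = 16, 0.125
f1, f2, sigma = 1.0, 1.3, 0.25
rng = random.Random(42)                      # arbitrary seed, for reproducibility only

t = [j * dt for j in range(N)]               # t_j = j * dt, j = 0, 1, ..., N - 1
clean = [math.sin(2 * math.pi * f1 * tj) + math.sin(2 * math.pi * f2 * tj)
         for tj in t]                        # two unit-amplitude sines, zero phase
noisy = [c + rng.gauss(0.0, sigma) for c in clean]   # add n(0, 0.25) noise
```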


Figure 4. Power spectral density (PSD). The AR(12) and the untapered periodogram estimates of the PSD for the time series generated by Equation 11. The two maxima in the AR(12) spectrum occur at frequencies f̂_1 = 1.027 and f̂_2 = 1.321, which are very near the true values f_1 = 1.00 and f_2 = 1.30.

The Maximum Entropy Approach
John Parker Burg invented the ME method in the late 1960s; he exhibited its strengths and advantages in oral presentations at geophysics conferences, but he didn't publish the mathematical derivations that defined and justified it until his PhD thesis4 appeared in 1975. This lack of published documentation produced a great deal of independent work by other researchers who were trying to understand and extend the method. In fact, the ME method was one of the chief motivators for the development of the AR methods and can be classified as an AR method itself, although Burg didn't use AR models in its development.

Rather, Burg started with the definition for the PSD, that is,

P(f) = ∫_{−∞}^{∞} ρ(τ) exp(−2πifτ) dτ,  (12)

but sought a function P_e(f), defined on the Nyquist band −1/(2Δt) ≤ f ≤ 1/(2Δt), which satisfied three guiding principles:

1. The inverse Fourier transform of P_e(f) should return the autocorrelation function unchanged by any filtering or tapering operations:

   ρ_m = ρ(mΔt) = ∫_{−1/(2Δt)}^{1/(2Δt)} P_e(f) exp(2πifmΔt) df,  m = 0, 1, …, N − 1.  (13)

2. P_e(f) should correspond to the most random or unpredictable time series whose autocorrelation function agrees with the known values.
3. P_e(f) > 0 on the interval −1/(2Δt) ≤ f ≤ 1/(2Δt).

The first condition merely states that the measured data shouldn't be changed in any way in computing P_e(f). The second is a statement about what is to be assumed about the data outside the observational window. Essentially, it says that those assumptions should be minimized.

To measure a time series' randomness or unpredictability, Burg used the information theoretic concept of entropy. A random process

…, x(−2Δt), x(−Δt), x(0), x(Δt), x(2Δt), …  (14)

is said to be band limited if its PSD function is zero everywhere outside its Nyquist band. If P(f) is such a PSD function, then the time series' entropy rate (entropy per sample) is given by

h{P(f)} = ∫_{−1/(2Δt)}^{1/(2Δt)} ln[P(f)] df.  (15)

Burg's idea was to maximize this quantity, subject to the constraints imposed by Equation 13. More precisely, he sought to impose the constraint at lags 0, Δt, 2Δt, …, pΔt, with p < N, and then choose, from the set of all nonnegative functions P(f) that satisfy those p + 1 constraints, the particular one that maximizes the entropy rate (Equation 15). We can write the problem formally as

h{P_e(f)} = max { ∫_{−1/(2Δt)}^{1/(2Δt)} ln[P(f)] df :  P(f) > 0,  ∫_{−1/(2Δt)}^{1/(2Δt)} P(f) exp(2πifmΔt) df = ρ_m,  m = 0, 1, …, p }.  (16)

We need techniques from the calculus of variations to solve it; we can show that

P_e(f) = ρ_e / |1 + ∑_{j=1}^{p} a_j exp(−2πi f j Δt)|²,  −1/(2Δt) ≤ f ≤ 1/(2Δt),  (17)

where a_1, a_2, …, a_p and ρ_e are parameters satisfying

⎡ ρ_0   ρ_1    ρ_2    …   ρ_p   ⎤ ⎡ 1   ⎤   ⎡ ρ_e ⎤
⎢ ρ_1   ρ_0    ρ_1    …   ρ_p−1 ⎥ ⎢ a_1 ⎥   ⎢ 0   ⎥
⎢ ρ_2   ρ_1    ρ_0    …   ρ_p−2 ⎥ ⎢ a_2 ⎥ = ⎢ 0   ⎥ .  (18)
⎢ …     …      …      …   …     ⎥ ⎢ …   ⎥   ⎢ …   ⎥
⎣ ρ_p   ρ_p−1  ρ_p−2  …   ρ_0   ⎦ ⎣ a_p ⎦   ⎣ 0   ⎦



Equation 17 is the same as Equation 3, and, because we're working with real data for which ρ_−k = ρ_k, Equation 18 is the same as Equation 4. Thus, the maximum entropy method is correctly classified as an AR method, even though Burg used different methods to estimate the autocorrelations and parameters in Equation 18.

Figure 5. Maximum entropy power spectral density (PSD) estimates. For orders p = 3, 14, and 26, and the periodogram for the time series generated by Equation 10, we see plots along the same frequency range used for the AR(p) spectra in Figure 2. The ME peaks are even sharper than the AR(p) peaks, so they must be taller to preserve the area subtended.

Table 2. Peak locations.

Estimate | Periodogram | ME(3) | ME(14) | ME(26)
f̂_0      | 0.493       | 0.498 | 0.479  | 0.492

Forward and Backward Prediction Filters
Burg regarded the vector (1, a_1, a_2, …, a_p)^T as a prediction filter, which he applied to the data x_0, x_1, …, x_{N−1} in both the forward and reverse directions to get forward and backward predictions x̂_n^f, x̂_n^b and their corresponding prediction errors e_n^f, e_n^b:

x̂_n^f = −∑_{k=1}^{p} a_k x_{n−k},  e_n^f = x_n − x̂_n^f,  n = p, p + 1, …, N − 1,
x̂_n^b = −∑_{k=1}^{p} a_k x_{n+k},  e_n^b = x_n − x̂_n^b,  n = 0, 1, …, N − p − 1.  (19)

He reasoned that he could get the best estimates for a_1, a_2, …, a_p by minimizing the sum of squares of the predictions' errors, for example,

∑_{n=p}^{N−1} (e_n^f)² + ∑_{n=0}^{N−p−1} (e_n^b)².  (20)

He was able to devise a recursive algorithm that gave estimates not only for a_1, a_2, …, a_p, but also, at the same time, for ρ_e and for the autocorrelations ρ_0, ρ_1, …, ρ_p. The details are complicated, so we won't give them here.4 It's remarkable that the recursion generates a new estimator for the elements of the matrix in Equation 18 at the same time it's solving the system of equations!

Figure 6. Another view of the plots given in Figure 5. Using the logarithmic scale makes it easier to compare the ME(3) estimate with the periodogram.

Choosing the Order p
Like the other AR methods, the ME method requires the choice of an order p < N. Figure 5 exhibits the results of choosing a low, intermediate, and high order for the time series generated by Equation 10. The same plots are repeated using a logarithmic scaling in Figure 6. Table 2 gives the peak locations. The ME(3) spectrum gave the best estimate f̂_0, but its peak is almost as broad as the periodogram's. Increasing p produces sharper peaks, but the locations display a noticeable downward bias. The ME(14) estimate is fairly representative of the orders in the range 4 ≤ p ≤ 25. At p = 26, the peak splits into two, with the dominant one giving a better f̂_0 than any of the sharp single peaks for p = 4, 5, …, 25. The same splitting occurs for orders p = 27, 28, 29, and 30, with the dominant peak becoming sharper and sharper but remaining at f̂_0 = 0.492. These spurious splittings aren't caused by errors in the data. In fact, they occur much more readily for artificially


generated time series without added noise, but the ME(26) spectrum clearly demonstrates that they also occur in noisy data, so great care must be exercised in interpreting high-order ME spectra. One of the ME method's strengths is its ability to resolve closely spaced peaks, but in using it for that purpose, always remember the possibility of a spurious splitting of a single peak.

Researchers have proposed several criteria for choosing the optimal order for the ME method (and for the other AR methods), but none of them work all of the time. In fact, it's easier to find a time series that confounds a given criterion than it is to develop it. Many authors5,6 recommend p ≤ N/2, but higher order methods often give better results. Figure 7 shows the result of using a relatively high p for the time series generated by Equation 11. The very narrow spurious peak at f̂ = 2.901 is a typical occurrence when we use high values for p. Such peaks can usually be easily identified because they're so much sharper than the peaks corresponding to real power. The one in Figure 7 is a small price to pay for the excellent resolution of the two real peaks. It's amazing that the ME method can achieve such good results using just 16 noisy data points spanning only ≈ 2.5 cycles of the higher frequency sine wave.

Figure 7. Maximum entropy (ME) method. In the ME(14) power spectral density (PSD) estimate for the time series generated by Equation 11, the two peaks are centered at f̂_1 = 1.023 and f̂_2 = 1.302. These are somewhat better than the estimates from the AR(12) spectrum in Figure 4. The very narrow peak at f̂ = 2.901 is an artifact caused by using the very high order p = 14 (high relative to N = 16), but because it's so narrow, it doesn't indicate much power and thus can be safely ignored.

We've now looked at four different methods of spectrum estimation, and although we haven't exhausted the subject, we must proceed. (More details about this topic appear elsewhere.5,6) In the next installment, we'll take a brief look at filters and detrending before we present an analysis of a bat chirp. In the final installment, we'll discuss some statistical tests and use them to analyze atmospheric pressure differences in the Pacific Ocean that have significant environmental implications.

References
1. B. Rust and D. Donnelly, "The Fast Fourier Transform for Experimentalists, Part III: Classical Spectral Analysis," Computing in Science & Eng., vol. 7, no. 5, 2005, pp. 74–78.
2. N. Levinson, "The Wiener (Root Mean Square) Error Criterion in Filter Design and Prediction," J. Mathematical Physics, vol. 25, 1947, pp. 261–278.
3. D. Donnelly and B. Rust, "The Fast Fourier Transform for Experimentalists, Part I: Concepts," Computing in Science & Eng., vol. 7, no. 2, 2005, pp. 80–88.
4. J.P. Burg, Maximum Entropy Spectral Analysis, PhD dissertation, Dept. of Geophysics, Stanford Univ., May 1975; http://sepwww.stanford.edu/theses/sep06/.
5. S.L. Marple Jr., Digital Spectral Analysis with Applications, Prentice Hall, 1987.
6. S.M. Kay, Modern Spectral Estimation: Theory and Application, Prentice Hall, 1988.

Bert Rust is a mathematician at the US National Institute for Standards and Technology. His research interests include ill-posed problems, time-series modeling, nonlinear regression, and observational cosmology. Rust has a PhD in astronomy from the University of Illinois. He is a member of SIAM and the American Astronomical Society. Contact him at [email protected].

Denis Donnelly is a professor of physics at Siena College. His research interests include computer modeling and electronics. Donnelly has a PhD in physics from the University of Michigan. He is a member of the American Physical Society, the American Association of Physics Teachers, and the American Association for the Advancement of Science. Contact him at [email protected].



Discrete Fourier transforms

The fast Fourier transform efficiently computes the discrete Fourier transform. Recall that the DFT of a complex input vector of length N, X = (X(0), …, X(N − 1)), denoted X̂, is another vector of length N given by the collection of sums

X̂(k) = ∑_{j=0}^{N−1} X(j) W_N^{jk},  (1)

where W_N = exp(2π√−1/N). Equivalently, we can view this as the matrix-vector product F_N · X, where F_N = ((W_N^{jk})) is the so-called Fourier matrix. The DFT is an invertible transform with inverse given by

X(j) = (1/N) ∑_{k=0}^{N−1} X̂(k) W_N^{−jk}.  (2)

Thus, if computed directly, the DFT would require N² operations. Instead, the FFT is an algorithm for computing the DFT in O(N log N) operations. Note that we can view the inverse as the DFT of the function (1/N) X̂(−k), so that we can also use the FFT to invert the DFT.

One of the DFT's most useful properties is that it converts circular or cyclic convolution into pointwise multiplication, for example,

(X ∗ Y)ˆ(k) = X̂(k) Ŷ(k),  (3)

where

X ∗ Y(j) = ∑_{l=0}^{N−1} X(l) Y(j − l),  (4)

with the index j − l taken modulo N. Consequently, the FFT gives an O(N log N) (instead of an N²) algorithm for computing convolutions: First compute the DFTs of both X and Y, then compute the inverse DFT of the sequence obtained by multiplying pointwise X̂ and Ŷ.

In retrospect, the idea underlying the Cooley-Tukey FFT is quite simple. If N = N1N2, then we can turn the 1D equation (Equation 1) into a 2D equation with the change of variables

j = j(a, b) = aN1 + b,  0 ≤ a < N2,  0 ≤ b < N1,
k = k(c, d) = cN2 + d,  0 ≤ c < N1,  0 ≤ d < N2.  (5)

Using the fact W_N^{m+n} = W_N^m W_N^n, it follows quickly from Equation 5 that we can rewrite Equation 1 as

X̂(c, d) = ∑_{b=0}^{N1−1} W_N^{b(cN2+d)} ∑_{a=0}^{N2−1} X(a, b) W_{N2}^{ad}.  (6)

The computation is now performed in two steps. First, compute for each b the inner sums (for all d)

X̃(b, d) = ∑_{a=0}^{N2−1} X(a, b) W_{N2}^{ad},  (7)

which is now interpreted as a subsampled DFT of length N2. Even if computed directly, at most N1N2² arithmetic operations are required to compute all of the X̃(b, d). Finally, we compute N2 transforms of length N1:

∑_{b=0}^{N1−1} W_N^{b(cN2+d)} X̃(b, d),  (8)

which requires at most an additional N1²N2 operations. Thus, instead of (N1N2)² operations, this two-step approach uses at most (N1N2)(N1 + N2) operations. If we had more factors in Equation 6, then this approach would work even better, giving Cooley and Tukey's result. The main idea is that we have converted a 1D algorithm, in terms of indexing, into a 2D algorithm. Furthermore, this algorithm has the advantage of an in-place implementation, and when accomplished this way, concludes with data reorganized according to the well-known bit-reversal shuffle.

This "decimation in time" approach is one of a variety of FFT techniques. Also notable is the dual approach of "decimation in frequency" developed simultaneously by Gordon Sande, whose paper with W. Morven Gentleman also contains an interesting discussion on memory considerations as they relate to implementational issues.1 Charles Van Loan's book discusses some of the other variations and contains an extensive bibliography.2 Many of these algorithms rely on the ability to factor N. When N is prime, we can use a different idea in which the DFT is effectively reduced to a cyclic convolution instead.3

References
1. W.M. Gentleman and G. Sande, "Fast Fourier Transforms—For Fun and Profit," Proc. Fall Joint Computer Conf. AFIPS, Vol. 29, Spartan, Washington, D.C., 1966, pp. 563–578.
2. C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, Philadelphia, 1992.
3. C.M. Rader, "Discrete Fourier Transforms When the Number of Data Points is Prime," Proc. IEEE, IEEE Press, Piscataway, N.J., Vol. 56, 1968, pp. 1107–1108.

a fast algorithm. Fewer calculations also imply less opportunity for error and therefore lead to numerical stability. Gauss observed that he could break a Fourier series of bandwidth N = N1N2 into a computation of N2 subsampled discrete Fourier transforms of length N1, which are combined as N1 DFTs of length N2. (See the "Discrete Fourier transforms" sidebar for detailed information.) Gauss's algorithm was never published outside of his collected works.

JANUARY/FEBRUARY 2000 61
The statistician Frank Yates published a less general but still important version of the FFT in 1932, which we can use to efficiently compute the Hadamard and Walsh transforms.3 Yates's "interaction algorithm" is a fast technique designed to compute the analysis of variance for a 2n-factorial design and is described in almost any text on statistical design and analysis of experiments.

Another important predecessor is the work of G.C. Danielson and Cornelius Lanczos, performed in the service of x-ray crystallography, another area for applying FFT technology.4 Their "doubling trick" showed how to reduce a DFT on 2N points to two DFTs on N points using only N extra operations. Today, it's amusing to note their problem sizes and timings: "Adopting these improvements, the approximate times for Fourier analysis are 10 minutes for 8 coefficients, 25 minutes for 16 coefficients, 60 minutes for 32 coefficients, and 140 minutes for 64 coefficients."4 This indicates a running time of about .37 N log N minutes for an N-point DFT!

Despite these early discoveries of an FFT, it wasn't until James W. Cooley and John W. Tukey's article that the algorithm gained any notice. The story of their collaboration is an interesting one. Tukey arrived at the basic reduction while in a meeting of President Kennedy's Science Advisory Committee. Among the topics discussed were techniques for offshore detection of nuclear tests in the Soviet Union. Ratification of a proposed United States–Soviet Union nuclear test ban depended on the development of a method to detect the tests without actually visiting Soviet nuclear facilities. One idea was to analyze seismological time-series data obtained from offshore seismometers, the length and number of which would require fast algorithms to compute the DFT. Other possible applications to national security included the long-range acoustic detection of nuclear submarines.

Richard Garwin of IBM was another participant at this meeting, and when Tukey showed him the idea, he immediately saw a wide range of potential applicability and quickly set to getting the algorithm implemented. He was directed to Cooley, and, needing to hide the national security issues, told Cooley that he wanted the code for another problem of interest: the determination of the spin-orientation periodicities in a 3D crystal of He3. Cooley was involved with other projects, and sat down to program the Cooley-Tukey FFT only after much prodding. In short order, he and Tukey prepared a paper which, for a mathematics or computer science paper, was published almost instantaneously (in six months).5 This publication, as well as Garwin's fervent proselytizing, did a lot to publicize the existence of this (apparently) new fast algorithm.6

The timing of the announcement was such that usage spread quickly. The roughly simultaneous development of analog-to-digital converters capable of producing digitized samples of a time-varying voltage at rates of 300,000 samples per second had already initiated something of a digital revolution. This development also provided scientists with heretofore unimagined quantities of digital data to analyze and manipulate (just as is the case today). The "standard" applications of FFT as an analysis tool for waveforms or for solving PDEs generated a tremendous interest in the algorithm a priori. But moreover, the ability to do this analysis quickly let scientists from new areas try the algorithm without having to invest too much time and energy.

Its effect
It's difficult for me to overstate FFT's importance. Much of its central place in digital signal and image processing is due to the fact that it made working in the frequency domain equally computationally feasible as working in the temporal or spatial domain. By providing a fast algorithm for convolution, the FFT enabled fast, large-integer and polynomial multiplication, as well as efficient matrix-vector multiplication for Toeplitz, circulant, and other kinds of structured matrices. More generally, it plays a key role in most efficient sorts of filtering algorithms. Modifications of the FFT are one approach to fast algorithms for discrete cosine or sine transforms, as well as Chebyshev transforms. In particular, the discrete cosine transform is at the heart of MP3 encoding, which gives life to real-time audio streaming. Last but not least, it's also one of the few algorithms to make it into the movies—I can still recall the scene in No Way Out where the image-processing guru declares that he will need to "Fourier transform the image" to help Kevin Costner see the detail in a photograph!

Even beyond these direct technological applications, the FFT influenced the direction of academic research, too. The FFT was one of the first instances of a less-than-straightforward algorithm with a high payoff in efficiency used to compute something important. Furthermore, it raised the natural question, "Could an even faster algorithm be found for the DFT?" (the answer is no7), thereby raising awareness of and heightening interest in the subject of lower bounds and the analysis and development of efficient algorithms in general. With respect to Shmuel Winograd's


lower-bound analysis, Cooley writes in the discussion of the 1968 Arden House Workshop on FFT, "These are the beginnings, I believe, of a branch of computer science which will probably uncover and evaluate other algorithms for high speed computers."8

Ironically, the FFT's prominence might have slowed progress in other research areas. It provided scientists with a big analytic hammer, and, for many, the world suddenly looked as though it were full of nails—even if this wasn't always so. Researchers sometimes massaged problems that might have benefited from other, more appropriate techniques into a DFT framework, simply because the FFT was so efficient. One example that comes to mind is some of the early spectral-methods work to solve PDEs in spherical geometry. In this case, the spherical harmonics are a natural set of basis functions. Discretization for numerical solutions implies the computation of discrete Legendre transforms (as well as FFTs). Many of the early computational approaches tried instead to approximate these expansions completely in terms of Fourier series, rather than address the development of an efficient Legendre transform.

Even now there are still lessons to learn from the FFT's development. In this day and age, where any new technological idea seems fodder for Internet venture capitalists and patent lawyers, it is natural to ask, "Why didn't IBM patent the FFT?" Cooley explained that because Tukey wasn't an IBM employee, IBM worried that it might not be able to gain a patent. Consequently, IBM had a great interest in putting the algorithm in the public domain. The effect was that then nobody else could patent it either. This did not seem like such a great loss because at the time, the prevailing attitude was that a company made money in hardware, not software. In fact, the FFT was designed as a tool to analyze huge time series, in theory something only supercomputers tackled. So, by placing in the public domain an algorithm that would make time-series analysis feasible, more big companies might have an interest in buying supercomputers (like IBM mainframes) to do their work.

Whether having the FFT in the public domain had the effect IBM hoped for is moot, but it certainly provided many scientists with applications on which to apply the algorithm. The breadth of scientific interests at the Arden workshop (held only two years after the paper's publication) is truly impressive. In fact, the rapid pace of today's technological developments is in many ways a testament to this open development's advantage. This is a cautionary tale in today's arena of proprietary research, and we can only wonder which of the many recent private technological discoveries might have prospered from a similar announcement.

The future FFT
As torrents of digital data continue to stream into our computers, it seems that the FFT will continue to play a prominent role in our analysis and understanding of this river of data. What follows is a brief discussion of future FFT challenges, as well as a few new directions of related research.

Even bigger FFTs
Astronomy continues to be a chief consumer of large FFT technology. The needs of projects like MAP (the Microwave Anisotropy Probe) or LIGO (the Laser Interferometer Gravitational-Wave Observatory) require FFTs of several (even tens of) gigapoints. FFTs of this size do not fit in the main memory of most machines, and these so-called out-of-core FFTs are an active area of research.9

As computing technology evolves, undoubtedly, versions of the FFT will evolve to keep pace and take advantage of it. Different kinds of memory hierarchies and architectures present new challenges and opportunities.

Approximate and nonuniform FFTs
For a variety of applications (such as fast MRI), we need to compute DFTs for nonuniformly spaced grid points and frequencies. Multipole-based approaches efficiently compute these quantities in such a way that the running time increases by a factor of log(1/ε), where ε denotes the approximation's precision.10 Algebraic approaches based on efficient polynomial evaluation are also possible.11

Group FFTs
The FFT might also be explained and interpreted using the language of group representation theory—working along these lines raises some interesting avenues for generalization. One approach is to view a 1D DFT of length N as computing the expansion of a function defined on C_N, the cyclic group of length N (the group of integers mod N), in terms of the basis of irreducible matrix elements of C_N, which are precisely the familiar sampled exponentials: e_k(m) = exp(2π√−1 km/N). The FFT is a highly efficient algorithm for computing the expansion in this basis.
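To make the cyclic case concrete, the claim can be checked numerically. The following NumPy sketch (an illustration added here, not code from the cited references) builds the sampled exponentials e_k(m) = exp(2π√−1 km/N) explicitly and verifies that the FFT returns exactly the inner products of a function on C_N with this basis:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
f = rng.standard_normal(N)  # a "function" on the cyclic group C_N

# Irreducible matrix elements of C_N: the sampled exponentials e_k(m)
k = np.arange(N).reshape(-1, 1)   # frequency index
m = np.arange(N).reshape(1, -1)   # group element (integer mod N)
E = np.exp(2j * np.pi * k * m / N)

# Expansion coefficients = inner products of f with each basis function
coeffs = E.conj() @ f

# The FFT computes exactly these coefficients, but in O(N log N) time
assert np.allclose(coeffs, np.fft.fft(f))
```

The dense matrix-vector product costs O(N^2) operations; np.fft.fft returns the same N coefficients in O(N log N).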

JANUARY/FEBRUARY 2000 63
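The source of that efficiency is a factorization of the Fourier matrix into sparse factors, the same factorization that the quantum FFT discussion interprets gate by gate. Here is a minimal radix-2 Cooley-Tukey sketch (illustrative only; it assumes the length is a power of two), checked against the O(N^2) matrix definition of the DFT:

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Butterfly step: combine the two half-length DFTs
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

# Check against the dense Fourier matrix (the O(N^2) definition)
N = 16
x = np.random.default_rng(1).standard_normal(N)
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
assert np.allclose(fft_radix2(x), F @ x)
```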
More generally, a function on any compact group (cyclic or not) has an expansion in terms of a basis of irreducible matrix elements (which generalize the exponentials from the point of view of group invariance). It's natural to wonder if efficient algorithms for performing this change of basis exist. For example, the problem of efficiently computing spherical harmonic expansions falls into this framework.

The first FFT for a noncommutative finite group seems to have been developed by Alan Willsky in the context of analyzing certain Markov processes.12 To date, fast algorithms exist for many classes of compact groups.11 Areas of application of this work include signal processing, data analysis, and robotics.13

Quantum FFTs
One of the first great triumphs of the quantum-computing model is Peter Shor's fast algorithm for integer factorization on a quantum computer.14 At the heart of Shor's algorithm is a subroutine that computes (on a quantum computer) the DFT of a binary vector representing an integer. The implementation of this transform as a sequence of one- and two-bit quantum gates, now called the quantum FFT, is effectively the Cooley-Tukey FFT realized as a particular factorization of the Fourier matrix into a product of matrices composed as certain tensor products of two-by-two unitary matrices, each of which is a so-called local unitary transform. Similarly, the quantum solution to the modified Deutsch-Jozsa problem uses the matrix factorization arising from Yates's algorithm.15 Extensions of these ideas to the more general group transforms mentioned earlier are currently being explored.

That's the FFT—both parent and child of the digital revolution, a computational technique at the nexus of the worlds of business and entertainment, national security and public communication. Although it's anyone's guess as to what lies over the next horizon in digital signal processing, the FFT will most likely be in the thick of it.

Acknowledgment
Special thanks to Jim Cooley, Shmuel Winograd, and Mark Taylor for helpful conversations. The Santa Fe Institute provided partial support and a very friendly and stimulating environment in which to write this paper. NSF Presidential Faculty Fellowship DMS-9553134 supported part of this work.

References
1. E.O. Brigham, The Fast Fourier Transform and Its Applications, Prentice Hall Signal Processing Series, Englewood Cliffs, N.J., 1988.
2. M.T. Heideman, D.H. Johnson, and C.S. Burrus, "Gauss and the History of the Fast Fourier Transform," Archive for History of Exact Sciences, Vol. 34, No. 3, 1985, pp. 265–277.
3. F. Yates, "The Design and Analysis of Factorial Experiments," Imperial Bureau of Soil Sciences Tech. Comm., Vol. 35, 1937.
4. G.C. Danielson and C. Lanczos, "Some Improvements in Practical Fourier Analysis and Their Application to X-Ray Scattering from Liquids," J. Franklin Inst., Vol. 233, Nos. 4 and 5, 1942, pp. 365–380 and 432–452.
5. J.W. Cooley and J.W. Tukey, "An Algorithm for Machine Calculation of Complex Fourier Series," Mathematics of Computation, Vol. 19, Apr. 1965, pp. 297–301.
6. J.W. Cooley, "The Re-Discovery of the Fast Fourier Transform Algorithm," Mikrochimica Acta, Vol. 3, 1987, pp. 33–45.
7. S. Winograd, "Arithmetic Complexity of Computations," CBMS-NSF Regional Conf. Series in Applied Mathematics, Vol. 33, SIAM, Philadelphia, 1980.
8. "Special Issue on Fast Fourier Transform and Its Application to Digital Filtering and Spectral Analysis," IEEE Trans. Audio Electronics, AU-15, No. 2, 1969.
9. T.H. Cormen and D.M. Nicol, "Performing Out-of-Core FFTs on Parallel Disk Systems," Parallel Computing, Vol. 24, No. 1, 1998, pp. 5–20.
10. A. Dutt and V. Rokhlin, "Fast Fourier Transforms for Nonequispaced Data," SIAM J. Scientific Computing, Vol. 14, No. 6, 1993, pp. 1368–1393; continued in Applied and Computational Harmonic Analysis, Vol. 2, No. 1, 1995, pp. 85–100.
11. D.K. Maslen and D.N. Rockmore, "Generalized FFTs—A Survey of Some Recent Results," Groups and Computation II, DIMACS Series in Discrete Math. and Theoret. Comput. Sci., Vol. 28, American Mathematical Society, Providence, R.I., 1997, pp. 183–237.
12. A.S. Willsky, "On the Algebraic Structure of Certain Partially Observable Finite-State Markov Processes," Information and Control, Vol. 38, 1978, pp. 179–212.
13. D.N. Rockmore, "Some Applications of Generalized FFTs (An Appendix with D. Healy)," Groups and Computation II, DIMACS Series in Discrete Math. and Theoret. Comput. Sci., Vol. 28, American Mathematical Society, Providence, R.I., 1997, pp. 329–369.
14. P.W. Shor, "Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer," SIAM J. Computing, Vol. 26, No. 5, 1997, pp. 1484–1509.
15. D. Simon, "On the Power of Quantum Computation," Proc. 35th Annual Symp. on Foundations of Computer Science, IEEE, 1994, pp. 116–123.

Daniel N. Rockmore is an associate professor of mathematics and computer science at Dartmouth College, where he also serves as vice chair of the Department of Mathematics. His general research interests are in the theory and application of computational aspects of group representations, particularly to FFT generalizations. He received his BA and PhD in mathematics from Princeton University and Harvard University, respectively. In 1995, he was one of 15 scientists to receive a five-year NSF Presidential Faculty Fellowship from the White House. He is a member of the American Mathematical Society, the IEEE, and SIAM. Contact him at the Dept. of Mathematics, Bradley Hall, Dartmouth College, Hanover, NH 03755; [email protected]; www.cs.dartmouth.edu/~rockmore.

