
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 6, DECEMBER 1979 615

A Simple Fixed-Point Error Bound for the Fast Fourier Transform

WILLIAM R. KNIGHT AND R. KAISER

Abstract: Error bounds for the computation of the fast Fourier transform in fixed-point arithmetic are derived for any arithmetic number base and for any prime factorization of the data array length. The intended application is signal processing with minicomputers. Errors arising from inaccurate sine coefficients and from limited arithmetic precision are considered. The arithmetic error depends essentially on shifts of the data array that may be required to avoid overflow of the computer word. Our closest bound requires knowledge of where shifts occur and is best computed in parallel with the Fourier transform. For the case that such program modification is not feasible, we derive an error bound for a posteriori calculation and an a priori error estimate. Our bounds are for the maximum error, because little is gained from probabilistic error bounds at the expense of considerably greater complexity.

I. INTRODUCTION

An increased awareness of the limitations of fixed-point arithmetic has come with the growing use of micro- and minicomputers for data acquisition and processing in physical and chemical instrumentation. The work reported here arose from an analysis of observations [1] in Fourier transform nuclear magnetic resonance spectroscopy, where the fast Fourier transform algorithm is commonly implemented with fixed-point arithmetic. The limitation imposed by the finite computer word length becomes evident in the form of a restricted "dynamic range" and as computational "noise." These effects are caused by the need to truncate the product, or to shift (divide by 2) the sum, of two fixed-point numbers in order to fit the result into the computer word.

The error so arising from computing the fast Fourier transform (FFT) in fixed-point arithmetic has been analyzed previously. Welch [2] has analyzed the case characterized by rounded sign-magnitude binary arithmetic using a floating-block decimation-in-time radix-2 FFT algorithm and assuming data such that successive stages of computation are statistically independent. Oppenheim and Weinstein [3] analyze the case characterized by rounded binary arithmetic (sign-magnitude format seems to be assumed in some places) using a decimation-in-time radix-2 FFT algorithm with white noise as data, causing either no shift or a shift at each stage. Tran-Thong and Liu [4] have treated all cases generated by rounded or truncated two's-complement binary arithmetic using a floating-block decimation-in-time or decimation-in-frequency radix-2 FFT algorithm, with data making successive stages statistically independent and requiring either no shift in any stage or one shift in each stage.

Our analysis follows Welch [2] in spirit but covers truncated or rounded, complement or sign-magnitude arithmetic to any number base, using floating-block decimation-in-time or decimation-in-frequency mixed-radix FFT algorithms with data requiring any possible number of scaling shifts. Following the tradition of Wilkinson [5], our bounds are worst case rather than probabilistic. This avoids the assumption that errors are independent, as well as the complications and uncertainties arising from correlation of error and signal. Surprisingly, the worst case differs from the probable error only by a multiplicative constant rather than by a factor proportional to the square root

Manuscript received August 23, 1978; revised July 1, 1979.
W. R. Knight is with the Departments of Computer Science and Mathematics, University of New Brunswick, Fredericton, N.B., Canada.
R. Kaiser is with the Department of Physics, University of New Brunswick, Fredericton, N.B., Canada.

0096-3518/79/1200-0615$00.75 © 1979 IEEE

Authorized licensed use limited to: Tsinghua University. Downloaded on March 30,2023 at 09:04:36 UTC from IEEE Xplore. Restrictions apply.

of the number of stages in the computation. On the other hand, statistical assumptions can assure us that the error is distributed uniformly over the elements of the output data vector, while our worst case analysis does not prevent it from being concentrated in a single element.

In addition to the arithmetic error, we consider a "trigonometric error" that arises from lack of precision in the sine and cosine coefficients needed for the Fourier transform. These values can either be calculated from a series approximation to the sine function [6], or they can be stored in a table. In the first case, the error arises both from the computer arithmetic and from truncation of the series. In the second case, interpolation between tabulated values becomes necessary for the sizeable data arrays occurring in practice. Interpolation can be linear [7], in which case most of the trigonometric error arises from the chord approximation to the sine curve. Alternatively, it can use the nonlinear $\sin(\alpha + \beta)$ addition theorem [8], in which case the error arises from the interpolation arithmetic.

For binary arithmetic and for a complex data vector of length $N = 2^n$, our results can be summarized as follows. We let $x_0$ be the vector carrying the input data whose Fourier transform is to be computed. The fast Fourier transform algorithm proceeds through $n$ stages, successively replacing the input vector $x_0$ by $x_1, x_2, \ldots, x_n$, the last vector being the desired transform

$$x_n(f) = \sum_{t=0}^{N-1} x_0(t)\exp(-2\pi i f t/N),$$

except for scaling as needed to avoid overflow of the computer word and except for computation noise. The total noise is bounded by the sum of the trigonometric noise and the arithmetic noise.

The trigonometric noise/signal ratio is bounded by $2\sqrt{2}\,n\epsilon$, where $\epsilon$ bounds the absolute error in the sine and cosine values.

The arithmetic noise/signal ratio is bounded by $r_n u/\|x_0\|$, where $u$ is the value of the least positive representable number in units of the input signal $x_0$. The double bars represent the norm, which we take as the rms value $\|x\| = \left(\sum_{j=0}^{N-1}|x(j)|^2/N\right)^{1/2}$. The normalized arithmetic error $r_n$ is bounded by the sum $r_n < 3.81\sum_{k=1}^{n} 2^{s_k-(k/2)}$. The constant 3.81 is derived for truncating arithmetic; a smaller value holds for rounding arithmetic. The quantity $2^{s_k}$ is the scale factor at the end of the $k$th stage, i.e., $s_k$ is the number of scaling right shifts accumulated to the end of the $k$th stage as required to avoid overflow of the computer word.

The computation of $r_n$ can easily be included in the FFT program by accumulating the $k$th summand at the $k$th stage of the computation. Failing this, we derive the upper bound $r_n < 26.4 \times 2^{(s_n+1)/2}$.

Our bound on the trigonometric noise/signal ratio is a priori; it can be evaluated before computation starts. However, our bound on the arithmetic noise/signal ratio is a posteriori, because its evaluation requires knowledge of the scaling shifts, which becomes available only as computation proceeds. Since an a priori bound for the noise/signal ratio is highly desirable,¹ we can combine the last expression above with an estimate of the total number of required shifts, $s_n = (n/2) + \log_2(\rho_n/\rho_0) < n + 1$. In this expression, $\rho_0$ and $\rho_n$ are the peak/rms ratios of the input and output signals $x_0$ and $x_n$, respectively. Strictly, the spectrum ratio is known a posteriori only, but sufficient information about the signal spectrum will often be available to permit a reasonable a priori estimate.

Sections II and III derive the bounds on the arithmetic noise/signal ratio for any base $b$ of computer arithmetic and for mixed-radix algorithms. Derivation of the error bound for individual arithmetic operations is delegated to Appendix I. The trigonometric error is dealt with in Section IV, and Section V gives a brief outline of difficulties with statistical error analysis.

When the input data are real rather than complex, the FFT algorithm may be adapted to take advantage of the redundancy that arises from the vanishing imaginary part of the input vector. Appendix II shows that our error analysis adapts easily to this case.

II. ARITHMETIC ERROR AND GENERAL CONSIDERATIONS

The recursion leading from one stage to the next in the FFT algorithm is

$$x_k = F_k x_{k-1} \tag{1}$$

where $F_k$ is an $N \times N$ matrix of simple structure [9], [10]. Although the details differ for different versions of the algorithm, the $F_k$ are unitary matrices for theoretical discussions. For practical computations, complex numbers are represented by real and imaginary parts. Complex vectors of length $N$ then become real vectors of length $2N$, and the $F_k$ become $2N \times 2N$ real orthogonal matrices. For fixed-point computations, considerations of scaling and the desire for simple multipliers intervene, and each $F_k$ is some multiple of an orthogonal $2N \times 2N$ matrix. Instead of Parseval's theorem, we then have, at any stage $k$,

$$\|x_k\| = \|F_k\|\,\|F_{k-1}\|\cdots\|F_1\|\,\|x_0\| \tag{2}$$

where $\|F\|$ indicates the spectral norm [11] of the matrix $F$, $\|F\| = \max_{\|x\|=1}\|Fx\|$. In finite-precision arithmetic, the matrix-vector product will generate an error vector, say $d_k$, and the computed recursion really is

$$\tilde{x}_k = F_k\tilde{x}_{k-1} + d_k. \tag{3}$$

Beginning with $\tilde{x}_0 = x_0$, the computed data vectors $\tilde{x}_k$ will differ from (1) by an error vector $e_k$, which builds up according to the recursion

$$e_k = F_k e_{k-1} + d_k;\quad e_0 = 0. \tag{4}$$

By taking norms on both sides, we get by the triangle inequality

$$\|e_k\| \le \|F_k\|\,\|e_{k-1}\| + \|d_k\|;\quad \|e_0\| = 0 \tag{5}$$

¹We gratefully acknowledge the comments of a referee who impressed upon us the desirability of an a priori error estimate.

KNIGHT AND KAISER: FIXED-POINT ERROR BOUND FOR FAST FOURIER TRANSFORM 617

which, after division by the norm of (1), gives a bound for the propagation of the noise/signal ratio $R_k = \|e_k\|/\|x_k\|$:

$$R_k \le R_{k-1} + \|d_k\|/\|x_k\|;\quad R_0 = 0. \tag{6}$$

The recursion (6) establishes our basic bound on the arithmetic noise/signal ratio. Evaluation of this recursion depends on particular features of the computation and is the subject of the next section.

III. ARITHMETIC ERROR AND PARTICULAR CONSIDERATIONS

The $k$th stage deals with the prime factor $p_k$ in the factorization

$$N = p_1 p_2 \cdots p_n. \tag{7}$$

In the complex domain, each element of the new vector $x_k$ is the weighted sum of $p_k$ elements of the old vector $x_{k-1}$, the weights being the "twiddle factors" $\exp(i\theta)$. The algorithm represents complex numbers by their real and imaginary parts, and the $2N \times 2N$ matrix $F_k$ therefore has $p_k$ sines and $p_k$ cosines of $p_k$ angles $\theta$ in each row, all other elements being zero. The spectral norm is $\|F_k\| = \sqrt{p_k}$, i.e., the rms value $\|x_k\|$ of the data vector increases by a factor $\sqrt{p_k}$ in the $k$th stage.

This increase may make it necessary to shift the data array by one (or more) computer digits, i.e., to divide by the number base $b$ of the computer arithmetic, in order to avoid overflow of the computer word. We let $s_k$ be the number of shifts accumulated up to the end of the $k$th stage. Incorporating the shift into the matrix $F_k$ then gives $\|F_k\| = b^{-(s_k - s_{k-1})}\sqrt{p_k}$. Accumulating these scale factors in (2) gives, for the rms value of the data array at the end of the $k$th stage,

$$\|x_k\| = b^{-s_k}\sqrt{N_k}\,\|x_0\| \tag{8}$$

with the abbreviation $N_k = p_1 p_2 \cdots p_k$.

The elements of the error vector $d_k$ will be small multiples of the least positive number representable in the computer word format. We derive in Appendix I an upper bound $c_k$ for this multiple at stage $k$, and we let $u$ be the value of the least positive computer number in units of the input data $x_0$, so that $\|d_k\| \le c_k u$. The recursion (6) thereby becomes

$$R_k \le R_{k-1} + c_k b^{s_k} u/\left(\|x_0\|\sqrt{N_k}\right);\quad R_0 = 0. \tag{9}$$

In terms of the normalized noise/signal ratio $r_k = R_k\|x_0\|/u$, it evaluates to

$$r_n \le \sum_{k=1}^{n} c_k b^{s_k}/\sqrt{N_k}. \tag{10}$$

This is our basic bound on the arithmetic noise/signal ratio, which was summarized in Section I for binary arithmetic and $N = 2^n$.

In order to evaluate (10) other than in parallel with the computation of the Fourier transform, we need to bound $b^{s_k}/\sqrt{N_k}$. There are, in fact, two bounds available. The first follows simply from the fact that $s_k$, the number of shifts at the end of the $k$th stage, cannot exceed $s_n$, the total number of shifts at the end of the computation; thus

$$b^{s_k}/\sqrt{N_k} \le b^{s_n}/\sqrt{N_k}. \tag{11}$$

To obtain the second bound, we separate the real and imaginary parts of the vectors $x_k$ and treat each $x_k$ as a real $2N$ vector. Each element of $x_k$ is a weighted and scaled sum of $2N_k$ elements of $x_0$, and the sum of the absolute values of the weights is at most $N_k\sqrt{2}$. We consider $\xi_k$, the largest absolute value of any element of $x_k$, and obtain

$$\xi_k \le \xi_0 N_k\sqrt{2}\,b^{-s_k}. \tag{12}$$

Moreover, to make full use of the computer word, $\xi_k/\xi_0$ must be held within the limits

$$b^{-1} < \xi_k/\xi_0 < b. \tag{13}$$

Substitution from (12) gives the second bound

$$b^{s_k}/\sqrt{N_k} < b\sqrt{2N_k}. \tag{14}$$

Bound (14) increases with increasing $N_k$, while bound (11) decreases. The two bounds are equal for

$$N^* = b^{s_n-1}/\sqrt{2}, \tag{15}$$

and we let $m$ be the least positive integer for which $N_m > N^*$. For $k < m$, we use bound (14) and get

$$b^{s_k}/\sqrt{N_k} < b\sqrt{2N_k} = b^{(s_n+1)/2}\,2^{1/4}\sqrt{N_k/N^*};\quad k < m. \tag{16a}$$

For $k \ge m$, we use bound (11) and get

$$b^{s_k}/\sqrt{N_k} = b^{s_k-(s_n-1)/2}\,2^{1/4}\sqrt{N^*/N_k} \le b^{(s_n+1)/2}\,2^{1/4}\sqrt{N^*/N_k};\quad k \ge m. \tag{16b}$$

We are now in a position to bound the sum (10). With $c$ being the largest of the $c_k$, application of (16a) and (16b) gives

$$r_n < c\,b^{(s_n+1)/2}\,2^{1/4}\left(\sum_{k=1}^{m-1}\sqrt{N_k/N^*} + \sum_{k=m}^{n}\sqrt{N^*/N_k}\right). \tag{17}$$

Formula (17) is convex in $\sqrt{N^*}$ and takes its maximum for either $N^* = N_{m-1}$ or $N^* = N_m$, at the extremes of the range of $N^*$. In the first case, the parenthesized expression in (17) has the maximum value

$$\sqrt{N_1/N_{m-1}} + \cdots + \sqrt{N_{m-2}/N_{m-1}} + 1 + \sqrt{N_{m-1}/N_m} + \cdots + \sqrt{N_{m-1}/N_n}, \tag{18a}$$

and in the second case its maximum is

$$\sqrt{N_1/N_m} + \cdots + \sqrt{N_{m-1}/N_m} + 1 + \sqrt{N_m/N_{m+1}} + \cdots + \sqrt{N_m/N_n}. \tag{18b}$$
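The basic bound (10) and the binary-case figures quoted in Section I are easy to evaluate numerically. Below is a minimal Python sketch. It assumes the cumulative shift counts $s_k$ are known for every stage, for instance because the FFT program logs them as it runs. The function names are hypothetical illustrations, not routines from the paper. The sketch evaluates (10) for arbitrary radices $p_k$ and base $b$, the a posteriori bound $r_n < 26.4 \times 2^{(s_n+1)/2}$, the a priori shift estimate $s_n \approx n/2 + \log_2(\rho_n/\rho_0)$, and the trigonometric bound $2\sqrt{2}\,n\epsilon$ from the Section I summary for $b = 2$, $N = 2^n$.

```python
import math
from typing import Sequence, Union


def basic_arithmetic_bound(shifts: Sequence[int], radices: Sequence[int],
                           c: Union[float, Sequence[float]], b: int = 2) -> float:
    """Evaluate sum (10): r_n <= sum_{k=1}^{n} c_k * b**s_k / sqrt(N_k).

    shifts[k-1]  -- s_k, cumulative scaling shifts at the end of stage k
    radices[k-1] -- p_k, the prime factor handled by stage k
    c            -- one constant for every stage, or a sequence of per-stage c_k
    b            -- number base of the fixed-point arithmetic
    """
    if len(shifts) != len(radices):
        raise ValueError("need one cumulative shift count per stage")
    cs = [float(c)] * len(radices) if isinstance(c, (int, float)) else list(c)
    if len(cs) != len(radices):
        raise ValueError("need one c_k per stage")
    total, n_k = 0.0, 1
    for s_k, p_k, c_k in zip(shifts, radices, cs):
        n_k *= p_k  # N_k = p_1 * p_2 * ... * p_k
        total += c_k * b ** s_k / math.sqrt(n_k)
    return total


def a_posteriori_bound_binary(s_n: float) -> float:
    """Section I bound for b = 2, N = 2**n: r_n < 26.4 * 2**((s_n + 1) / 2)."""
    return 26.4 * 2 ** ((s_n + 1) / 2)


def a_priori_shift_estimate(n: int, rho_in: float, rho_out: float) -> float:
    """Section I estimate s_n ~ n/2 + log2(rho_n / rho_0) from peak/rms ratios."""
    return n / 2 + math.log2(rho_out / rho_in)


def trig_bound_binary(n: int, eps: float) -> float:
    """Section I trigonometric noise/signal bound 2*sqrt(2)*n*eps for N = 2**n."""
    return 2 * math.sqrt(2) * n * eps


if __name__ == "__main__":
    n = 10
    shifts = list(range(1, n + 1))  # hypothetical pattern: one shift in every stage
    r_sum = basic_arithmetic_bound(shifts, [2] * n, 3.81)
    r_post = a_posteriori_bound_binary(shifts[-1])
    print(f"sum (10): {r_sum:.1f}, a posteriori: {r_post:.1f}")
    # prints "sum (10): 403.3, a posteriori: 1194.7"
```

The coarser bound only needs $s_n$, whereas the sum (10) needs every $s_k$. This is why the text recommends accumulating (10) during the transform whenever the program can be modified. The a posteriori bound presumes shifts are made only to prevent overflow, so that (13) holds; an arbitrary shift pattern can violate it.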

Since 2 is the smallest prime, the square roots are maximized as $\sqrt{N_j/N_k} \le 2^{-(k-j)/2}$ for $j < k$, so that both (18a) and (18b) are bounded by

$$\cdots + 2^{-3/2} + 2^{-2/2} + 2^{-1/2} + 1 + 2^{-1/2} + 2^{-2/2} + 2^{-3/2} + \cdots. \tag{18c}$$

The series with quotient $2^{-1/2}$ can be summed to infinity in both directions, giving $1 + 2\cdot 2^{-1/2}/(1 - 2^{-1/2}) = 3 + 2\sqrt{2}$. The result is

$$r_n < (3 + 2\sqrt{2})\,2^{1/4}\,c\,b^{(s_n+1)/2}. \tag{19}$$

This is the a posteriori bound which was summarized in Section I for binary arithmetic: with $b = 2$ and $c = 3.81$ (Appendix I), the constant $(3 + 2\sqrt{2})\,2^{1/4}c$ evaluates to 26.4.

It remains to obtain an a priori estimate of the total number of scaling shifts $s_n$. If, to compare the peak/rms ratios $\rho$ of the input and output data vectors, we divide (13) by (8) with $k = n$, the result is

$$b^{s_n-1}/\sqrt{N} < \rho_n/\rho_0 < b^{s_n+1}/\sqrt{N} \tag{20}$$

and hence

$$\log_b\!\left(\sqrt{N}\,\rho_n/\rho_0\right) - 1 < s_n < \log_b\!\left(\sqrt{N}\,\rho_n/\rho_0\right) + 1. \tag{21}$$

In Section I we have specialized the mean value between these two bounds to $b = 2$ and $N = 2^n$ as an a priori estimate for the number of shifts.

IV. TRIGONOMETRIC ERROR

We have explained in Section I why the trigonometric elements of the matrices $F_k$ will not always be exact. Instead of the correct matrices $F_k$, slightly different matrices $F_k + \Delta F_k$ will be used, and instead of the desired transform $x_n = F_n F_{n-1}\cdots F_1 x_0$ there results

$$x_n + \Delta x_n = (F_n + \Delta F_n)(F_{n-1} + \Delta F_{n-1})\cdots(F_1 + \Delta F_1)\,x_0. \tag{22}$$

To first order, the trigonometric error vector is

$$\Delta x_n = \sum_{k=1}^{n} F_n\cdots F_{k+1}\,\Delta F_k\,F_{k-1}\cdots F_1\,x_0, \tag{23}$$

and, with (2), the trigonometric noise/signal ratio is bounded by

$$\|\Delta x_n\|/\|x_n\| \le \sum_{k=1}^{n}\|\Delta F_k\|/\|F_k\|. \tag{24}$$

To evaluate $\|\Delta F_k\|$, we note that each row of $F_k$ holds $p_k$ sine-cosine pairs. Each row of $\Delta F_k$ therefore holds at most $2p_k$ nonzero elements. If there is no scaling shift, each of these is bounded by $\epsilon$, the maximum absolute error in the sine and cosine values. By permutation of rows and columns, $\Delta F_k$ can be turned into block diagonal form holding $N/p_k$ blocks of dimension $2p_k \times 2p_k$ each. It is easy to see [5, ch. III] that for such a matrix of blocks filled with $\epsilon$'s, the spectral norm is $\|\Delta F_k\| \le 2p_k\epsilon$. We had seen earlier that, in the absence of scaling shifts, $\|F_k\| = \sqrt{p_k}$. Hence each term of (24) is at most $2\sqrt{p_k}\,\epsilon$, and (24) evaluates to

$$\|\Delta x_n\|/\|x_n\| \le 2\epsilon\sum_{k=1}^{n}\sqrt{p_k}. \tag{25}$$

Scaling shifts multiply both $\|\Delta F_k\|$ and $\|F_k\|$ by the same factor, which cancels in the ratio (24). Thus (25) becomes our final bound on the trigonometric noise/signal ratio, as summarized in Section I for $N = 2^n$.

V. STOCHASTIC ANALYSIS

We begin by observing that the errors arising from inexact trigonometric function values do not allow stochastic analysis. The same inexact values are used repeatedly in the computation; the errors are not statistically independent, and a stochastic analysis is not appropriate.

For the arithmetic errors, the assumption of statistical independence seems more reasonable, and both the mean error and the variance are needed to arrive at the rms error. One would hope that the mean error, i.e., the bias in the Fourier transformed data, is zero, so that only the variance would be needed, but this is not the case. The next two paragraphs explain our arguments.

The vector of mean errors $E(e_k)$ is obtained by taking statistical expectation values of the elements of the error vector $e_k$ in (4). For its norm, (5) translates into

$$\|E(e_k)\| \le \|F_k\|\,\|E(e_{k-1})\| + \|E(d_k)\|;\quad \|E(e_0)\| = 0. \tag{26}$$

The mean error propagates through the computation in the same way as the maximum error, and our results remain valid for the mean error after replacement of the arithmetic error vector $d_k$ by its expectation $E(d_k)$. At the end of the computation, bound (26) can vanish only if $E(d_k)$ is a null vector for all $k$.

The errors generated in multiplication and shift operations depend on whether truncation or rounding is used, and on whether negative numbers are represented in complement or sign-magnitude format, but in no case can the mean error be relied upon to vanish. Consider first the case of truncation. In complement format, truncation reduces the value of positive as well as negative numbers; the error distribution is uniform from $-1$ to $0$ with expectation $-1/2$. In sign-magnitude format, truncation reduces the absolute value of a number; the error distribution is uniform from $-1$ to $0$ for positive numbers and from $0$ to $+1$ for negative numbers. The expected error is $\pm 1/2$, with the sign of the error correlated with the sign of the data. It does not appear safe to neglect this correlation in order to get zero mean error. For rounding, the expected error is generally smaller than for truncation, yet it still does not necessarily vanish. For illustration, consider the error arising from rounding after a one-bit right shift in binary arithmetic. Its distribution concentrates at three values. If the bit shifted out was a zero, there is no error. If it was a 1, the error is $\pm 1/2$, with the sign depending on whether the number is rounded up or down. The usual rules for deciding between rounding up or down are based on the sign of the number or on whether it is even or odd, and thus again introduce a correlation between errors and data. This problem arises for any even-$b$ arithmetic.

For any arithmetic we have considered, the mean error does not exceed half the maximum error. In fact, bounding the mean error by half the maximum error is not at all pessimistic for truncating two's-complement arithmetic, which appears to be the most common at present for fixed-point FFT computations. Since the mean error can generally amount to a sizeable fraction of the maximum error, there is little merit in calculating the variance, although our analysis method can easily be applied to such a calculation. Quite generally, we feel that little is gained, at the expense of greater complexity and uncertainty, by calculating probabilistic error bounds for the fixed-point fast Fourier transform.

APPENDIX I
MACHINE ERROR

We derive a bound on the elements of the error vector $d_k$ that arises in the computations (3) of the $k$th stage. The units of $d_k$ are those of the least significant computer digit in the elements of $x_k$. There are some differences between algorithms, and also for the first or last stage in a given algorithm, but we will neglect these special cases. The error in these cases is smaller than the worst case bound derived in the following, but, except for very short data arrays, the overall reduction amounts to less than a factor 3/4. Also, since truncation is more simply programmed and produces greater error than rounding, truncation will be assumed, even though it generates a bias in complement arithmetic.

In a typical stage, there are at most $2p_k$ multiplications to compute one element of the new vector $x_k$ from $2p_k$ elements of the old $x_{k-1}$. Each multiplication introduces a truncation error of at most one unit. If no shifting takes place, the error in the new element is bounded by $2p_k$ units.

If shifting is required, a natural program arrangement is the following. Upon detection of an overflow, the current "butterfly" is abandoned and the entire data array is shifted one digit, corresponding to a truncated division by $b$. As a result, the elements of the array that had already been computed before the overflow was detected (and belong to $x_k$) are shifted after multiplication. The elements still to be operated upon (and belonging to $x_{k-1}$), including those of the current butterfly, are shifted before multiplication. The effect of the shift is different for these two sets of elements. Since there may be several shifts per stage, we look at the general case of an element which is shifted, possibly several times, before and after entering the butterfly.

The shifts before butterfly computation introduce an error bounded by one unit into each element remaining in $x_{k-1}$. The weighted sum of $p_k$ pairs of these gives a new element of $x_k$. The weighting is by a sine and a cosine multiplier for each pair, resulting in an error bounded by $|\sin\theta| + |\cos\theta| \le \sqrt{2}$ from each pair. The $p_k$ pairs thus introduce an error of at most $p_k\sqrt{2}$ into each new element of $x_k$. In addition, there is the multiplication error bounded by $2p_k$ units, giving a total error bounded by $(2 + \sqrt{2})p_k$ units.

Any number of shifts after computation introduces an error of at most one unit, and each such shift divides the previous error by $b$. For $a$ shifts after the butterfly computation, the error is thus bounded by $(2 + \sqrt{2})p_k b^{-a} + 1$. This is dominated by the case of no shifts after butterfly computation, and we take $c_k = (2 + \sqrt{2})p_k$.

For the case $b = 2$ and $N = 2^n$, the bound can be sharpened a bit by attention to detail. At most two shifts can occur in any one stage in this case. At worst, both shifts occur before the butterfly computation and generate an error bounded by 3/4 in the shifted elements of $x_{k-1}$. In the butterfly, a new element of $x_k$ is computed as the sum of a sine-cosine-weighted pair of elements of $x_{k-1}$ plus one unmodified element of $x_{k-1}$. (This applies directly in some algorithms. In others, it applies on average, in the sense that half the elements of $x_k$ each arise from two weighted pairs and the remaining half each arise from two unmodified elements of $x_{k-1}$.) The shift error is thus magnified to $\tfrac{3}{4}(1 + |\sin\theta| + |\cos\theta|) \le \tfrac{3}{4}(1 + \sqrt{2})$. In addition, there is the error from truncating the two products, bounded by 1 for each product. The resulting total is $c_k = \tfrac{3}{4}(1 + \sqrt{2}) + 2 = 3.81$. This value has been used in the summary of our results in Section I.

APPENDIX II
REAL INPUT DATA

It is, of course, straightforward to treat real input data by simply setting the imaginary part of $x_0$ to zero and applying the FFT algorithm unchanged. Our error analysis does not require modification for this case. Although this method is straightforward, it suffers from redundancy. In the input array, the imaginary part is filled with zeros, and in the output array every element is duplicated in conjugate complex form. Two modifications of the FFT algorithm have been developed for real input data that permit the length $N$ of all complex vectors in the transform to be halved when $N$ is even. We do not consider the complications that arise with these methods when $N$ is odd.

In the first of these modifications [12], a complex input vector $x_0$ of length $N/2$ is formed by placing the $N$ real input elements alternately into the real and imaginary parts. The real part of $x_0$ thus holds the even-subscripted input samples and the imaginary part holds the odd-subscripted ones. The complex FFT algorithm is then applied to $x_0$ and produces its result in $n - 1$ stages corresponding to the prime factorization of $N/2$. One additional stage is then required to allow for the shifting of the odd-subscripted input data. This last stage consists of a sum and difference butterfly followed by a regular radix-2 butterfly. Our error analysis is unchanged for the $n - 1$ stages. For the last stage, we obtain $\|F_n\| = 2$ and $c_n < 5.41$ (for binary arithmetic, $c_n < 4.56$). Equation (6) shows that the contribution of the last stage to the arithmetic error is determined by $c_n/\|F_n\|$, and this ratio is less than that for a regular $p = 2$ stage. It is thus conservative to treat this last stage as a regular $p = 2$ stage. Our error bound, with $N$ designating the length of the real input vector, then applies unchanged to the modified algorithm. Our trigonometric error bound is also unchanged.

The second modification [13] involves an adaptation of the algorithm such that the transform is obtained in $n - 1$


stages corresponding to the prime factorization of $N/2$. Our error analysis applies unchanged to these stages. Strictly, there is a remnant of the $n$th stage in the form of a single sum and difference butterfly to compute the two real spectral points at zero and at maximum frequency. If the error generated in this extra butterfly is neglected, our results apply with $N$ being only half the length of the real input data array.

REFERENCES

[1] J. W. Cooper, J. Magnetic Resonance, vol. 22, p. 345, 1976.
[2] P. D. Welch, IEEE Trans. Audio Electroacoust., vol. AU-17, p. 151, 1969; also in [9].
[3] A. V. Oppenheim and C. J. Weinstein, Proc. IEEE, vol. 60, p. 957, 1972.
[4] T. Thong and B. Liu, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, p. 563, 1976.
[5] J. H. Wilkinson, Rounding Errors in Algebraic Processes. Englewood Cliffs, NJ: Prentice-Hall, 1963.
[6] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, U.S. Government Rep. NBS-AMS55, 1964.
[7] J. W. Cooper, I. S. Mackay, and G. B. Powle, J. Magnetic Resonance, vol. 28, p. 405, 1977.
[8] J. A. den Hollander, Rijksuniversiteit Leiden, The Netherlands, personal communication.
[9] B. Liu, Ed., "Digital filters and the fast Fourier transform," Benchmark Papers in Electrical Engineering and Computer Science, vol. 12. Stroudsburg, PA: Dowden, Hutchinson & Ross, 1975.
[10] F. Theilheimer, IEEE Trans. Audio Electroacoust., vol. AU-17, p. 158, 1969; E. O. Brigham, The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall, 1974.
[11] B. Noble, Applied Linear Algebra. Englewood Cliffs, NJ: Prentice-Hall, 1969.
[12] C. Bingham, M. D. Godfrey, and J. W. Tukey, IEEE Trans. Audio Electroacoust., vol. AU-15, p. 56, 1967; J. W. Cooley, P. A. W. Lewis, and P. D. Welch, IBM Res. Paper RC-1743, 1967; J. Sound Vib., vol. 12, p. 315, 1970.
[13] G. D. Bergland, Commun. Ass. Comput. Mach., vol. 11, p. 703, 1968; IEEE Trans. Audio Electroacoust., vol. AU-17, p. 138, 1969.

On the Problem of
with Short Co
HON-KEUNG KWAN

Abstract: The paper presents an algorithm for designing IIR digital filters with short coefficient word lengths. The algorithm has the flexibility of obtaining a better design at the expense of investing more computational time. It was found that, by incorporating Crochiere's idea of equalizing passband and stopband statistical word lengths before applying the algorithm, we are capable of obtaining a further reduction in the overall word length compared with that obtained by applying the algorithm alone. The principle of the algorithm and the concept of statistical word length equalization for further word length reduction apply to all IIR digital filters of different structures and passbands, with and without their transfer functions expressed in closed form.

I. INTRODUCTION

Despite the many advantages offered by digital filters, there are some practical limitations associated with their actual implementation. The most important limitation is caused by quantization. The three major sources of quantization errors are: 1) input quantization errors, 2) arithmetic quantization errors, and 3) coefficient quantization errors. The first two types have been investigated [1]. Because of the limited word length available in minicomputers and the desired small word length in special-purpose hardware systems, quantization errors associated with finite coefficient word length become critical and may lead to a filter that does not satisfy its original specifications. Since the cost and complexity of implementation depend on the coefficient word length, the word length chosen should be minimal but still sufficient to fulfill the desired requirements.

It is this last limitation that we address in this paper. As a result, an algorithm was formulated. It has the flexibility of yielding a better reduction in coefficient word length at the expense of investing more computational time. It was found that, by incorporating Crochiere's idea [2], [3] of equalizing passband and stopband statistical word lengths before applying the algorithm, we are capable of obtaining a better design compared with that obtained by employing the algorithm alone. The efficiency of the algorithm was demonstrated by employing Crochiere's three elliptic low-pass cascaded digital filters [2] as examples.

II. PROBLEM FORMULATION

Consider the cascade form elliptic digital transfer function as

where

Manuscript received December 29, 1977; revised October 30, 1978 and May 8, 1979.
The author is with the Department of Electrical Engineering, Imperial College of Science and Technology, London, England.

0096-3518/79/1200-0620$00.75 © 1979 IEEE


No ratings yet
(CBMS-NSF Regional Conference Series in Applied Mathematics 33) Shmuel Winograd - Arithmetic Complexity of Computations-Society For Industrial and Applied Mathematics (1987)
100 pages
1981 Book FastFourierTransformAndConvolu
No ratings yet
1981 Book FastFourierTransformAndConvolu
260 pages
5 FFT NP Randomized
No ratings yet
5 FFT NP Randomized
72 pages
AM341
No ratings yet
AM341
118 pages
T FFT: A A W F C U: HE N Lgorithm THE Hole Amily AN SE
No ratings yet
T FFT: A A W F C U: HE N Lgorithm THE Hole Amily AN SE
33 pages
Philips Home Theatre System
No ratings yet
Philips Home Theatre System
2 pages
Slides06 FFT
No ratings yet
Slides06 FFT
60 pages
Enee 241 Text 0708
No ratings yet
Enee 241 Text 0708
298 pages
Curseng
No ratings yet
Curseng
230 pages
Pure FT
No ratings yet
Pure FT
52 pages
ADC SNR Performance by Kolsrud PDF
No ratings yet
ADC SNR Performance by Kolsrud PDF
37 pages
09 Andrew
No ratings yet
09 Andrew
60 pages
Current Amplifier
No ratings yet
Current Amplifier
4 pages
Fixed-Point Arithmetic: An Introduction
No ratings yet
Fixed-Point Arithmetic: An Introduction
13 pages
FFT PDF
No ratings yet
FFT PDF
17 pages
!VFFT
No ratings yet
!VFFT
5 pages
Fast Inverse Square Root
No ratings yet
Fast Inverse Square Root
8 pages
Signal and Image Processing For Electromagnetic Testing: Hapter
No ratings yet
Signal and Image Processing For Electromagnetic Testing: Hapter
17 pages
Theorist's Toolkit Lecture 10: Discrete Fourier Transform and Its Uses
No ratings yet
Theorist's Toolkit Lecture 10: Discrete Fourier Transform and Its Uses
9 pages
SLBS - SI3000 VoiceBand Codec With Microphone-Speaker Drive
No ratings yet
SLBS - SI3000 VoiceBand Codec With Microphone-Speaker Drive
35 pages
Fixed-Point Design: SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications
No ratings yet
Fixed-Point Design: SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications
57 pages
Signals, Spectra and Signal Processing
100% (1)
Signals, Spectra and Signal Processing
16 pages
FFT and It's App
No ratings yet
FFT and It's App
8 pages
Svyatoslav Covanov Rapport de Stage Recherche 2014
No ratings yet
Svyatoslav Covanov Rapport de Stage Recherche 2014
25 pages
McClellan and Parks - 1973 - A Unified A
No ratings yet
McClellan and Parks - 1973 - A Unified A
5 pages
Practical Considerations in Fixed-Point FIR Filter Implem
No ratings yet
Practical Considerations in Fixed-Point FIR Filter Implem
15 pages
Final Version
No ratings yet
Final Version
14 pages
6.976 High Speed Communication Circuits and Systems Advanced Frequency Synthesizers
No ratings yet
6.976 High Speed Communication Circuits and Systems Advanced Frequency Synthesizers
51 pages
Handout - 8 Digital Filter Design
No ratings yet
Handout - 8 Digital Filter Design
45 pages
Course Code: CS-71 Course Title: Computer Oriented Numerical Techniques
No ratings yet
Course Code: CS-71 Course Title: Computer Oriented Numerical Techniques
5 pages
BW Subwoofer ASW610XP
No ratings yet
BW Subwoofer ASW610XP
78 pages
DSP Mod2@AzDOCUMENTS - in
No ratings yet
DSP Mod2@AzDOCUMENTS - in
24 pages
M FFTAlgorithm
No ratings yet
M FFTAlgorithm
11 pages
Lab Notes 6
No ratings yet
Lab Notes 6
21 pages
Int Mult 08
No ratings yet
Int Mult 08
13 pages
FF Opt MPCrev 4
No ratings yet
FF Opt MPCrev 4
17 pages
Unit 4 - 2
No ratings yet
Unit 4 - 2
21 pages
Structures For Discrete-Time Systems: Block Diagram and Signal Flow Graph
No ratings yet
Structures For Discrete-Time Systems: Block Diagram and Signal Flow Graph
16 pages
155.FFT Ropec
No ratings yet
155.FFT Ropec
7 pages
Computation: A Modification of The Fast Inverse Square Root Algorithm
No ratings yet
Computation: A Modification of The Fast Inverse Square Root Algorithm
14 pages
Limits On Bandlimited Signals
No ratings yet
Limits On Bandlimited Signals
10 pages
Brent Elementary
No ratings yet
Brent Elementary
10 pages
Simple Computation of DIT FFT: International Journal of Advanced Research in Computer Science and Software Engineering
No ratings yet
Simple Computation of DIT FFT: International Journal of Advanced Research in Computer Science and Software Engineering
4 pages
Fixed Points An Introduction
No ratings yet
Fixed Points An Introduction
15 pages
Lab # 06 PDF
No ratings yet
Lab # 06 PDF
12 pages
Buch Gander Kwok
No ratings yet
Buch Gander Kwok
10 pages
Automatic Generation of Floating-Point Test Data
No ratings yet
Automatic Generation of Floating-Point Test Data
4 pages
Bstract: K Ikx
No ratings yet
Bstract: K Ikx
7 pages
Filter Design 2
No ratings yet
Filter Design 2
0 pages
The FFT Artigo 2000
No ratings yet
The FFT Artigo 2000
5 pages
Chapter 2 Supplymentary Notes
No ratings yet
Chapter 2 Supplymentary Notes
8 pages
Digital Watermarking: First A. Author, Second B. Author, JR., and Third C. Author, Member, IEEE
No ratings yet
Digital Watermarking: First A. Author, Second B. Author, JR., and Third C. Author, Member, IEEE
2 pages
Differential Protection of Power Transfo
No ratings yet
Differential Protection of Power Transfo
4 pages
Differential Pulse Code Modulation
No ratings yet
Differential Pulse Code Modulation
15 pages
Problem Set2 Itc
No ratings yet
Problem Set2 Itc
2 pages
Midterm 2016 Corr
No ratings yet
Midterm 2016 Corr
4 pages
Dead Band Effects in Digital Signal Processing: An: by G. Sai Teja
No ratings yet
Dead Band Effects in Digital Signal Processing: An: by G. Sai Teja
8 pages
FIJI Instructions
No ratings yet
FIJI Instructions
3 pages
Averaging Low-Pass Filter and Median Filter - Jupyter Notebook
No ratings yet
Averaging Low-Pass Filter and Median Filter - Jupyter Notebook
3 pages
15" / 1.4" 2-Way Professional Speaker System: Technische Informationen Architects and Engineers Specifications
No ratings yet
15" / 1.4" 2-Way Professional Speaker System: Technische Informationen Architects and Engineers Specifications
6 pages
Digital Protection of Power System PDF
No ratings yet
Digital Protection of Power System PDF
4 pages
Qu 16 Block Diagram - V1.8 - 1
No ratings yet
Qu 16 Block Diagram - V1.8 - 1
1 page
COMMUNICATION SYSTEMS
From Everand
COMMUNICATION SYSTEMS
B.P. Lathi
No ratings yet
Advanced Signal Integrity for High-Speed Digital Designs
From Everand
Advanced Signal Integrity for High-Speed Digital Designs
Stephen H. Hall
No ratings yet
Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements
From Everand
Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements
S. Twomey
No ratings yet
Error-Correction on Non-Standard Communication Channels
From Everand
Error-Correction on Non-Standard Communication Channels
Edward A. Ratzer
No ratings yet
Digital Signal Processing (DSP) with Python Programming
From Everand
Digital Signal Processing (DSP) with Python Programming
Maurice Charbit
No ratings yet
Nonlinear Transformations of Random Processes
From Everand
Nonlinear Transformations of Random Processes
Ralph Deutsch
No ratings yet
Digital Signal and Image Processing using MATLAB, Volume 3: Advances and Applications, The Stochastic Case
From Everand
Digital Signal and Image Processing using MATLAB, Volume 3: Advances and Applications, The Stochastic Case
Gérard Blanchet
3/5 (1)

A Simple Fixed-Point Error Bound For The Fast Fourier Transform

Uploaded by

A Simple Fixed-Point Error Bound For The Fast Fourier Transform

Uploaded by

IEEE TRANSACTIONS ON ACOUSTICS,

SPEECH, AND SIGNAL

A Simple Fixed-point Error Bound for the

WILLIAM R. KNIGHT AND R. KAISER

0096-3518/79/1200-0615$00.75 0 1979 IEEE

Thisis our basic bound on the arithmetic noise/signal ratio

so that both (18a) and (1 8b) are bounded by

Manuscriut received December 29, 1977; revised October 30, 1978

0096-3518/79/1200-0620$00.75 01979 IEEE

You might also like