The main application of the DFT and the DCT is as tools to compute frequency
information in large datasets. It is therefore important that these operations
can be performed by efficient algorithms. Straightforward implementation from
the definition is not efficient if the data sets are large. However, it turns out
that the underlying matrices may be factored in a way that leads to much more
efficient algorithms, and this is the topic of the present chapter.
where x^{(e)}, x^{(o)} are the sequences of length N/2 consisting of the even and
odd samples of x, respectively. In other words,
where we have substituted x(e) and x(o) as in the text of the theorem, and
recognized the N/2-point DFT in two places. For the second half of the DFT
coefficients, i.e. {yN/2+n }0≤n≤N/2−1 , we similarly have
\[
\begin{aligned}
y_{N/2+n} &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-2\pi i(N/2+n)k/N}
= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-\pi ik} e^{-2\pi ink/N} \\
&= \frac{1}{\sqrt{N}}\sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi in2k/N}
- \frac{1}{\sqrt{N}}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi in(2k+1)/N} \\
&= \frac{1}{\sqrt{2}}\frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi ink/(N/2)}
- \frac{1}{\sqrt{2}} e^{-2\pi in/N}\frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi ink/(N/2)} \\
&= \frac{1}{\sqrt{2}}\left(F_{N/2}x^{(e)}\right)_n - \frac{1}{\sqrt{2}} e^{-2\pi in/N}\left(F_{N/2}x^{(o)}\right)_n.
\end{aligned}
\]
This concludes the proof.
It turns out that Theorem 4.1 can be interpreted as a matrix factorization.
For this we need to define the concept of a block matrix.
We will express the Fourier matrix in factored form involving block matrices.
The following observation is just a formal way to split a vector into its even and
odd components.
Let D_{N/2} be the (N/2) × (N/2) diagonal matrix with entries (D_{N/2})_{n,n} =
e^{-2πin/N} for n = 0, 1, ..., N/2 − 1. It is clear from Equation (4.1) that the
first half of y can be obtained as
\[
\frac{1}{\sqrt{2}}\begin{pmatrix} F_{N/2} & D_{N/2}F_{N/2} \end{pmatrix} P_N x,
\]
and from Equation (4.2) that the second half of y can be obtained as
\[
\frac{1}{\sqrt{2}}\begin{pmatrix} F_{N/2} & -D_{N/2}F_{N/2} \end{pmatrix} P_N x.
\]
From these two formulas we can derive the promised factorization of the Fourier
matrix.
Theorem 4.4 (DFT matrix factorization). The Fourier matrix may be factored as
\[
F_N = \frac{1}{\sqrt{2}}\begin{pmatrix} F_{N/2} & D_{N/2}F_{N/2} \\ F_{N/2} & -D_{N/2}F_{N/2} \end{pmatrix} P_N. \tag{4.3}
\]
In addition, we use the formula
\[
y_{N/2} = \frac{1}{\sqrt{N}}\sum_{n=0}^{N/2-1}\left((x^{(e)})_n - (x^{(o)})_n\right)
\]
to obtain coefficient N/2, since this is the only coefficient which can't be obtained
from y_0, y_1, ..., y_{N/2−1} by symmetry.
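This symmetry for real input is easy to verify numerically. The following pure-Python sketch (the function name `dft_direct` and the test vector are ours, chosen for illustration) computes the DFT straight from the definition, then checks both the conjugate symmetry and the formula above for coefficient N/2:

```python
import cmath
import math

def dft_direct(x):
    """Direct N-point DFT with the 1/sqrt(N) normalization used in the text."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
            / math.sqrt(N) for n in range(N)]

x = [1.0, 4.0, 2.0, 3.0, 0.0, 5.0, 7.0, 6.0]   # arbitrary real test vector
N = len(x)
y = dft_direct(x)

# Conjugate symmetry for real input: y_{N-n} = conj(y_n)
assert all(abs(y[N - n] - y[n].conjugate()) < 1e-9 for n in range(1, N))

# Coefficient N/2 from the even/odd samples: (1/sqrt(N)) (sum evens - sum odds)
y_half = (sum(x[0::2]) - sum(x[1::2])) / math.sqrt(N)
assert abs(y[N // 2] - y_half) < 1e-9
```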
In an implementation based on formula (4.3), we would first compute PN x,
which corresponds to splitting x into the even-indexed and odd-indexed samples.
The two leftmost blocks in the block matrix in (4.3) correspond to applying the
N/2-point DFT to the even samples. The two rightmost blocks correspond to
applying the N/2-point DFT to the odd samples, and multiplying the result
with DN/2 . The results from these transforms are finally added together. By
repeating the splitting we will eventually come to the case where N = 1. Then
F1 is just the scalar 1, so the DFT is the trivial assignment y0 = x0 . The FFT
can therefore be implemented by the following MATLAB code:
function y = FFTImpl(x)
    N = length(x);
    if N == 1
        y = x(1);
    else
        xe = x(1:2:(N-1));                  % even-indexed samples
        xo = x(2:2:N);                      % odd-indexed samples
        ye = FFTImpl(xe);                   % N/2-point FFT of the even samples
        yo = FFTImpl(xo);                   % N/2-point FFT of the odd samples
        D = exp(-2*pi*1j*(0:(N/2-1))'/N);   % diagonal entries of D_{N/2}
        y = [ye + yo.*D; ye - yo.*D]/sqrt(2);
    end
Note that this function is recursive; it calls itself. If this is your first encounter
with a recursive program, it is worth running through the code for N = 4, say.
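For readers who want to experiment outside MATLAB, here is a pure-Python transcription of the same recursion (the names `fft_impl` and `dft_direct` are ours), checked against the DFT computed directly from the definition:

```python
import cmath
import math

def fft_impl(x):
    """Recursive FFT mirroring FFTImpl: split into even- and odd-indexed
    samples, transform each half, and recombine using the entries of D_{N/2}."""
    N = len(x)
    if N == 1:
        return list(x)
    ye = fft_impl(x[0::2])                     # N/2-point FFT of even samples
    yo = fft_impl(x[1::2])                     # N/2-point FFT of odd samples
    d = [cmath.exp(-2j * math.pi * n / N) for n in range(N // 2)]
    return ([(ye[n] + d[n] * yo[n]) / math.sqrt(2) for n in range(N // 2)]
            + [(ye[n] - d[n] * yo[n]) / math.sqrt(2) for n in range(N // 2)])

def dft_direct(x):
    """Direct DFT with the 1/sqrt(N) normalization used in the text."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
            / math.sqrt(N) for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_impl(x), dft_direct(x)))
```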
4.1.1 The Inverse Fast Fourier Transform (IFFT)
The IDFT is very similar to the DFT, and it is straightforward to prove the
following analog to Theorem 4.1 and (4.3).
Theorem 4.5 (IDFT matrix factorization). The inverse of the Fourier matrix
can be factored as
\[
(F_N)^H = \frac{1}{\sqrt{2}}\begin{pmatrix} (F_{N/2})^H & E_{N/2}(F_{N/2})^H \\ (F_{N/2})^H & -E_{N/2}(F_{N/2})^H \end{pmatrix} P_N, \tag{4.4}
\]
where E_{N/2} is the (N/2) × (N/2) diagonal matrix with entries (E_{N/2})_{n,n} = e^{2πin/N}.
We note that the only difference between the factored forms of F_N and (F_N)^H
is the positive exponent in e2πin/N . With this in mind it is straightforward to
modify FFTImpl.m so that it performs the inverse DFT.
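As a sketch of this modification (in Python, with our own names; a direct DFT stands in for the forward transform), only the sign of the exponent changes, and the round trip recovers the input:

```python
import cmath
import math

def ifft_impl(y):
    """Inverse transform sketch: identical to the FFT recursion except that the
    diagonal entries use the positive exponent e^{2 pi i n / N} (E_{N/2})."""
    N = len(y)
    if N == 1:
        return list(y)
    xe = ifft_impl(y[0::2])
    xo = ifft_impl(y[1::2])
    e = [cmath.exp(2j * math.pi * n / N) for n in range(N // 2)]
    return ([(xe[n] + e[n] * xo[n]) / math.sqrt(2) for n in range(N // 2)]
            + [(xe[n] - e[n] * xo[n]) / math.sqrt(2) for n in range(N // 2)])

def dft_direct(x):
    """Direct DFT with the 1/sqrt(N) normalization used in the text."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
            / math.sqrt(N) for n in range(N)]

# Round trip: applying the inverse transform to the DFT recovers the signal.
x = [1.0, 5.0, 3.0, 2.0, 4.0, 8.0, 6.0, 7.0]
xr = ifft_impl(dft_direct(x))
assert all(abs(a - b) < 1e-9 for a, b in zip(xr, x))
```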
MATLAB has built-in functions for computing the DFT and the IDFT,
called fft and ifft. Note, however, that these functions do not use the
normalization 1/√N that we have adopted here. The MATLAB help pages give
a short description of these algorithms. Note in particular that MATLAB makes
no assumption about the length of the vector. MATLAB may, however, check
if the length of the vector is 2^r, and in those cases a variant of the algorithm
discussed here is used. In general, fast algorithms exist when the vector length
N can be factored as a product of small integers.
Many audio and image formats make use of the FFT. To get optimal speed
these algorithms typically split the signals into blocks of length 2^r with r some
integer in the range 5–10 and utilise a suitable variant of the algorithms discussed
above.
grows as f(N) for large N, or more precisely, if
\[
\lim_{N\to\infty} \frac{R_N}{f(N)} = c > 0.
\]
We will also use this notation for functions, and say that a real function g
is O(f(x)) if lim g(x)/f(x) = 0, where the limit will mostly be taken as x → 0
(this means that g(x) is much smaller than f(x) when x approaches the limit).
Let us see how we can use this terminology to describe the complexity of
the FFT algorithm. Let MN be the number of multiplications needed by the
N-point FFT as defined by Theorem 4.1. It is clear from the algorithm that
\[
M_N = 2M_{N/2} + N/2. \tag{4.5}
\]
The factor 2 corresponds to the two matrix multiplications, while the term N/2
denotes the multiplications in the exponent of the exponentials that make up
the matrix DN/2 (or EN/2 ) — the factor 2πi/N may be computed once and for
all outside the loops. We have not counted the multiplications with 1/√2.
The reason is that, in most implementations, this factor is absorbed in the
definition of the DFT itself.
Note that all multiplications performed by the FFT are complex. It is normal
to count the number of real multiplications instead, since any multiplication of
two complex numbers can be performed as four multiplications of real numbers
(and two additions), by writing the number in terms of its real and imaginary
part, and multiplying them together. Therefore, if we instead define M_N to be
the number of real multiplications required by the FFT, we obtain the alternative
recurrence relation
\[
M_N = 2M_{N/2} + 2N. \tag{4.6}
\]
In Exercise 1 you will be asked to derive the solution of this equation and
show that the number of real multiplications required by this algorithm is
O(2N log2 N). In contrast, the direct implementation of the DFT requires N^2
complex multiplications, and thus 4N^2 real multiplications. The exact same
numbers are found for the IFFT.
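The recurrence (4.6) is easy to check numerically. The sketch below (our own Python illustration, taking M_1 = 0 as the initial condition, since a 1-point DFT needs no multiplications) unrolls the recurrence and compares it with the closed-form solution 2N log2 N:

```python
def real_mults(N):
    """Unroll the recurrence M_N = 2 M_{N/2} + 2N with M_1 = 0."""
    if N == 1:
        return 0
    return 2 * real_mults(N // 2) + 2 * N

# For N = 2^r the closed-form solution of the recurrence is 2 N log2(N) = 2 N r.
for r in range(1, 11):
    N = 2 ** r
    assert real_mults(N) == 2 * N * r
```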
Theorem 4.7 (Number of operations in the FFT and IFFT algorithms). The
N-point FFT and IFFT algorithms both require O(2N log2 N) real multiplications.
In comparison, the number of real multiplications required by direct
implementations of the N-point DFT and IDFT is 4N^2.
In other words, the FFT and IFFT significantly reduce the number of
multiplications, and one can show in a similar way that the number of additions
required by the algorithm is also roughly O(N log2 N). This partially explains
the efficiency of the FFT algorithm. Another reason is that since the FFT splits
the calculation of the DFT into computing two DFT’s of half the size, the FFT
is well suited for parallel computing: the two smaller FFT’s can be performed
independently of one another, for instance in two different computing cores on
the same computer.
Since filters are diagonalized by the DFT, it may be tempting to implement
a filter by applying an FFT, multiplying with the frequency response, and then
applying the IFFT. This is not usually done, however. The reason is that most
filters have too few nonzero coefficients for this approach to be efficient — it
is then better to use the direct algorithm for the DFT, since this may lead to
fewer multiplications than the O(N log2 N ) required by the FFT.
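A rough back-of-the-envelope comparison illustrates the trade-off. This is a sketch under assumed cost models (our own: one real multiplication per nonzero filter coefficient per output sample for the direct method, and the counts from Theorem 4.7 plus N complex multiplications by the frequency response for the FFT route):

```python
import math

def direct_filter_mults(N, k):
    """Time-domain filtering with k nonzero coefficients: k mults per output."""
    return k * N

def fft_filter_mults(N):
    """FFT-based filtering: FFT + pointwise multiplication by the frequency
    response (N complex mults = 4N real mults) + IFFT."""
    r = int(math.log2(N))
    return 2 * (2 * N * r) + 4 * N

N = 2 ** 16
# A short filter is cheaper to apply directly ...
assert direct_filter_mults(N, 10) < fft_filter_mults(N)
# ... while a long filter favours the FFT-based approach.
assert direct_filter_mults(N, 200) > fft_filter_mults(N)
```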
c. Explain why MN = O(2N log2 N ) (you do not need to write down the
initial conditions for the difference equation in order to find the particular
solution).
Ex. 3 — Write down a difference equation for computing the number of real
additions required by the FFT algorithm.
Ex. 4 — It is of course not always the case that the number of points in a
DFT is N = 2^n. In this exercise we will see how we can attack the more general
case.
a. Assume that N can be divided by 3, and consider the following splitting,
which follows in the same way as the splitting used in the deduction of
the FFT-algorithm:
\[
\begin{aligned}
y_n &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-2\pi ink/N} \\
&= \frac{1}{\sqrt{N}}\sum_{k=0}^{N/3-1} x_{3k} e^{-2\pi in3k/N}
+ \frac{1}{\sqrt{N}}\sum_{k=0}^{N/3-1} x_{3k+1} e^{-2\pi in(3k+1)/N} \\
&\quad+ \frac{1}{\sqrt{N}}\sum_{k=0}^{N/3-1} x_{3k+2} e^{-2\pi in(3k+2)/N}
\end{aligned}
\]
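Since the splitting is just a regrouping of the terms in the sum, it is straightforward to check numerically. A pure-Python sketch (function names are ours; N is assumed divisible by 3):

```python
import cmath
import math

def dft_direct(x):
    """Direct DFT with the 1/sqrt(N) normalization used in the text."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
            / math.sqrt(N) for n in range(N)]

def dft_radix3(x):
    """The same DFT computed from the three subsequences x_{3k}, x_{3k+1},
    x_{3k+2}, as in the splitting above (assumes 3 divides N)."""
    N = len(x)
    return [sum(x[3 * k + r] * cmath.exp(-2j * math.pi * n * (3 * k + r) / N)
                for r in range(3) for k in range(N // 3)) / math.sqrt(N)
            for n in range(N)]

x = [float(v) for v in range(1, 10)]           # N = 9, divisible by 3
assert all(abs(a - b) < 1e-9 for a, b in zip(dft_radix3(x), dft_direct(x)))
```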
where c_{0,N} = 1 and c_{n,N} = √2 for n ≥ 1, and where x^{(1)} ∈ R^N is defined by
Splitting this sum into two sums, where the indices are even and odd, we get
\[
\begin{aligned}
y_n &= d_{n,N}\sum_{k=0}^{N/2-1} x_{2k}\cos\left(2\pi\frac{n}{2N}\left(2k+\frac{1}{2}\right)\right) \\
&\quad+ d_{n,N}\sum_{k=0}^{N/2-1} x_{2k+1}\cos\left(2\pi\frac{n}{2N}\left(2k+1+\frac{1}{2}\right)\right).
\end{aligned}
\]
If we then also shift the indices by N/2 in this sum, we get
\[
\begin{aligned}
&d_{n,N}\sum_{k=N/2}^{N-1} x_{2N-2k-1}\cos\left(2\pi\frac{n}{2N}\left(2N-2k-1+\frac{1}{2}\right)\right) \\
&\quad= d_{n,N}\sum_{k=N/2}^{N-1} x_{2N-2k-1}\cos\left(2\pi\frac{n}{2N}\left(2k+\frac{1}{2}\right)\right),
\end{aligned}
\]
where we used that cos is symmetric and periodic with period 2π. We see that
we now have the same cos-terms in the two sums. If we thus define the vector
x(1) as in the text of the theorem, we see that we can write
\[
\begin{aligned}
y_n &= d_{n,N}\sum_{k=0}^{N-1} (x^{(1)})_k \cos\left(2\pi\frac{n}{2N}\left(2k+\frac{1}{2}\right)\right) \\
&= d_{n,N}\,\Re\left(\sum_{k=0}^{N-1} (x^{(1)})_k e^{-2\pi in(2k+\frac{1}{2})/(2N)}\right) \\
&= \sqrt{N}\,d_{n,N}\,\Re\left(e^{-\pi in/(2N)}\,\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} (x^{(1)})_k e^{-2\pi ink/N}\right) \\
&= c_{n,N}\,\Re\left(e^{-\pi in/(2N)}(F_N x^{(1)})_n\right) \\
&= c_{n,N}\left(\cos\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) + \sin\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right),
\end{aligned}
\]
where we have recognized the N-point DFT, and where c_{n,N} = √N d_{n,N}. Inserting
the values for d_{n,N}, we see that c_{0,N} = 1 and c_{n,N} = √2 for n ≥ 1, which
agrees with the definition of c_{n,N} in the theorem. This completes the proof.
With the result above we have avoided computing a DFT of double size.
If we in the proof above define the N × N diagonal matrix Q_N by (Q_N)_{n,n} =
c_{n,N} e^{−πin/(2N)}, the result can also be written in the more compact form
\[
y = D_N x = \Re\left(Q_N F_N x^{(1)}\right).
\]
We will, however, not use this form, since there is complex arithmetic involved,
contrary to (4.7). Let us see how we can use (4.7) to implement the DCT, once
we already have implemented the DFT in terms of the function FFTImpl as in
Section 4.1:
function y = DCTImpl(x)
    N = length(x);
    if N == 1
        y = x;
    else
        x1 = [x(1:2:(N-1)); x(N:(-2):2)];   % x^(1): even samples, then odd samples reversed
        y = FFTImpl(x1);
        rp = real(y);
        ip = imag(y);
        y = cos(pi*((0:(N-1))')/(2*N)).*rp + sin(pi*((0:(N-1))')/(2*N)).*ip;
        y(2:N) = sqrt(2)*y(2:N);            % the factor c_{n,N} for n >= 1
    end
In the code, the vector x(1) is created first by rearranging the components, and
it is sent as input to FFTImpl. After this we take real parts and imaginary parts,
and multiply with the cos- and sin-terms in (4.7).
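To see the mapping in action outside MATLAB, the sketch below (Python, with our own names; a direct DFT stands in for FFTImpl, and N is assumed even) follows the same steps and checks the result against the DCT computed straight from the definition:

```python
import cmath
import math

def dct_direct(x):
    """DCT from the definition: y_n = d_{n,N} sum_k x_k cos(2 pi n (k+1/2)/(2N)),
    with d_{0,N} = sqrt(1/N) and d_{n,N} = sqrt(2/N) for n >= 1."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if n == 0 else math.sqrt(2.0 / N))
            * sum(x[k] * math.cos(2 * math.pi * n * (k + 0.5) / (2 * N))
                  for k in range(N))
            for n in range(N)]

def dct_via_dft(x):
    """The steps of DCTImpl: rearrange into x^(1), take a (direct) DFT,
    combine real and imaginary parts, scale components 1, ..., N-1 by sqrt(2)."""
    N = len(x)
    x1 = x[0::2] + x[N - 1::-2]                # even samples, odd samples reversed
    y = [sum(x1[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
         / math.sqrt(N) for n in range(N)]
    out = [math.cos(math.pi * n / (2 * N)) * y[n].real
           + math.sin(math.pi * n / (2 * N)) * y[n].imag for n in range(N)]
    return [out[0]] + [math.sqrt(2) * v for v in out[1:]]

x = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 8.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(dct_via_dft(x), dct_direct(x)))
```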
4.2.1 Efficient implementations of the IDCT
As with the FFT, it is straightforward to modify the DCT implementation so
that it returns the IDCT. To see how we can do this, write from Theorem 4.8,
for n ≥ 1
\[
\begin{aligned}
y_n &= c_{n,N}\left(\cos\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) + \sin\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right) \\
y_{N-n} &= c_{N-n,N}\left(\cos\left(\pi\frac{N-n}{2N}\right)\Re((F_N x^{(1)})_{N-n}) + \sin\left(\pi\frac{N-n}{2N}\right)\Im((F_N x^{(1)})_{N-n})\right) \\
&= c_{n,N}\left(\sin\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) - \cos\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right),
\end{aligned}
\]
where we have used the symmetry of F_N for real signals. These two equations
enable us to determine ℜ((F_N x^{(1)})_n) and ℑ((F_N x^{(1)})_n) from y_n and y_{N−n}. We
get
\[
\begin{aligned}
\cos\left(\pi\frac{n}{2N}\right)y_n + \sin\left(\pi\frac{n}{2N}\right)y_{N-n} &= c_{n,N}\,\Re((F_N x^{(1)})_n) \\
\sin\left(\pi\frac{n}{2N}\right)y_n - \cos\left(\pi\frac{n}{2N}\right)y_{N-n} &= c_{n,N}\,\Im((F_N x^{(1)})_n).
\end{aligned}
\]
Adding the first equation to i times the second, we get
\[
\begin{aligned}
c_{n,N}(F_N x^{(1)})_n &= \cos\left(\pi\frac{n}{2N}\right)y_n + \sin\left(\pi\frac{n}{2N}\right)y_{N-n} + i\left(\sin\left(\pi\frac{n}{2N}\right)y_n - \cos\left(\pi\frac{n}{2N}\right)y_{N-n}\right) \\
&= \left(\cos\left(\pi\frac{n}{2N}\right) + i\sin\left(\pi\frac{n}{2N}\right)\right)(y_n - iy_{N-n}) = e^{\pi in/(2N)}(y_n - iy_{N-n}).
\end{aligned}
\]
This means that (F_N x^{(1)})_n = (1/c_{n,N}) e^{πin/(2N)}(y_n − iy_{N−n}) for n ≥ 1. For n = 0,
since ℑ((F_N x^{(1)})_0) = 0, we have that (F_N x^{(1)})_0 = (1/c_{0,N}) y_0. This means that
x^{(1)} can be recovered by taking the IDFT of the vector with component 0 being
(1/c_{0,N}) y_0 = y_0, and the remaining components being (1/c_{n,N}) e^{πin/(2N)}(y_n − iy_{N−n}):
Theorem 4.9 (IDCT algorithm). Let x = (D_N)^T y be the IDCT of y, and let
z be the vector with component 0 being (1/c_{0,N}) y_0, and the remaining components
being (1/c_{n,N}) e^{πin/(2N)}(y_n − iy_{N−n}). Then we have that
\[
x^{(1)} = (F_N)^H z,
\]
function x = IDCTImpl(y)
    N = length(y);
    if N == 1
        x = y;
    else
        Q = exp(pi*1i*((0:(N-1))')/(2*N));
        Q(2:N) = Q(2:N)/sqrt(2);            % 1/c_{n,N} = 1/sqrt(2) for n >= 1
        yrev = y(N:(-1):2);                 % y_{N-n} for n = 1, ..., N-1
        toapply = [y(1); Q(2:N).*(y(2:N) - 1i*yrev)];
        x1 = IFFTImpl(toapply);
        x = zeros(N,1);
        x(1:2:(N-1)) = x1(1:(N/2));         % undo the reordering behind x^(1)
        x(2:2:N) = x1(N:(-1):(N/2+1));
    end
MATLAB also has functions for computing the DCT and the IDCT, called dct
and idct. These functions are defined in MATLAB exactly as they are here,
contrary to the case for the FFT.
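As a round-trip check of the recovery formula from Theorem 4.9 (a Python sketch with our own names; direct transforms stand in for FFTImpl/IFFTImpl, and N is assumed even), applying the IDCT to a DCT brings back the original vector:

```python
import cmath
import math

def dct_direct(x):
    """DCT from the definition, with d_{0,N} = sqrt(1/N), d_{n,N} = sqrt(2/N)."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if n == 0 else math.sqrt(2.0 / N))
            * sum(x[k] * math.cos(2 * math.pi * n * (k + 0.5) / (2 * N))
                  for k in range(N))
            for n in range(N)]

def idct_impl(y):
    """IDCT following the steps of the code above: build z, take a (direct)
    inverse DFT, and undo the reordering that produced x^(1)."""
    N = len(y)
    q = [cmath.exp(1j * math.pi * n / (2 * N)) / math.sqrt(2) for n in range(N)]
    z = [complex(y[0])] + [q[n] * (y[n] - 1j * y[N - n]) for n in range(1, N)]
    # direct inverse DFT: positive exponent, 1/sqrt(N) normalization
    x1 = [sum(z[n] * cmath.exp(2j * math.pi * n * k / N) for n in range(N))
          / math.sqrt(N) for k in range(N)]
    x = [0.0] * N
    x[0::2] = [v.real for v in x1[:N // 2]]
    x[1::2] = [v.real for v in x1[N - 1:N // 2 - 1:-1]]
    return x

x = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 8.0]
xr = idct_impl(dct_direct(x))
assert all(abs(a - b) < 1e-9 for a, b in zip(xr, x))
```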
Since the DCT and IDCT can be implemented using the FFT and IFFT,
they have the same advantages as the FFT when it comes to parallel computing.
Much literature is devoted to reducing the number of multiplications in the
DFT and the DCT even further than what we have done. In the next section
we will show an example of how this can be achieved, with the help of extra
work and some tedious math. Some more notes on computational complexity
are in order. For instance, we have not counted the operations sin and cos in
the DCT. The reason is that these values can be precomputed, since we take the
sine and cosine of a specific set of values for each DCT or DFT of a given size.
This is contrary to multiplication and addition, since these include the input
values, which are only known at runtime. We have, however, not written down
that we use precomputed arrays for sine and cosine in our algorithms: This is
an issue to include in more optimized algorithms. Another point has to do with
multiplication by 1/√N. As long as N = 2^{2r}, multiplication with 1/√N need not
be considered as a multiplication, since it can be implemented using a bitshift.
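For example, when N = 2^{2r}, 1/√N = 2^{−r} is an exact power of two, so on integer data the scaling amounts to a right shift. A small Python illustration:

```python
import math

for r in range(1, 8):
    N = 2 ** (2 * r)                     # N = 4, 16, 64, ...
    # 1/sqrt(N) is the exact power of two 2^(-r) ...
    assert 1.0 / math.sqrt(N) == 2.0 ** (-r)
    # ... so on integer data, dividing by sqrt(N) is a right shift by r bits.
    assert 1024 >> r == 1024 // (2 ** r)
```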
Proof. Taking real and imaginary parts in (4.1) we obtain
\[
\begin{aligned}
\Re(y_n) &= \frac{1}{\sqrt{2}}\Re((F_{N/2}x^{(e)})_n) + \frac{1}{\sqrt{2}}\Re((D_{N/2}F_{N/2}x^{(o)})_n) \\
\Im(y_n) &= \frac{1}{\sqrt{2}}\Im((F_{N/2}x^{(e)})_n) + \frac{1}{\sqrt{2}}\Im((D_{N/2}F_{N/2}x^{(o)})_n).
\end{aligned}
\]
These equations explain the first parts on the right hand side in (4.8) and (4.9).
Furthermore, for 0 ≤ n ≤ N/4 − 1 we can write
where we have used that cos is periodic with period 2π and symmetric, where z is
the vector defined in the text of the theorem, where we have recognized the DCT
matrix, and where E_0 is a diagonal matrix with diagonal entries (E_0)_{0,0} = 1/√2
and (E_0)_{n,n} = 1/2 for n ≥ 1 (E_0 absorbs the factor 1/√(N/2), and the factor d_{n,N}
from the DCT). By absorbing the additional factor 1/√2, we get a matrix E as
stated in the theorem. For N/4 + 1 ≤ n ≤ N/2 − 1, everything above but the
last statement is valid. We can now use that
\[
\cos\left(2\pi\frac{n\left(k+\frac{1}{2}\right)}{N/2}\right) = -\cos\left(2\pi\frac{\left(\frac{N}{2}-n\right)\left(k+\frac{1}{2}\right)}{N/2}\right)
\]
to arrive at −(E_0 D_{N/4} z)_{N/2−n} instead. For the case n = N/4 all the cosine
entries are zero, and this completes (4.8). For the imaginary part, we obtain as
above
where we have used that sin is periodic with period 2π and anti-symmetric, and
that
\[
\begin{aligned}
\sin\left(2\pi\frac{n\left(k+\frac{1}{2}\right)}{N/2}\right) &= \cos\left(\frac{\pi}{2} - 2\pi\frac{n\left(k+\frac{1}{2}\right)}{N/2}\right) \\
&= \cos\left(2\pi\frac{\left(\frac{N}{4}-n\right)\left(k+\frac{1}{2}\right)}{N/2} - k\pi\right) \\
&= (-1)^k\cos\left(2\pi\frac{\left(\frac{N}{4}-n\right)\left(k+\frac{1}{2}\right)}{N/2}\right).
\end{aligned}
\]
When n = 0 this is 0, since all the cosine entries are zero. When 1 ≤ n ≤ N/4
this is (E_0 D_{N/4} w)_{N/4−n}, where w is the vector defined as in the text of the
theorem. For N/4 ≤ n ≤ N/2 − 1 we arrive instead at (E_0 D_{N/4} z)_{n−N/4},
similarly to the above. This also proves (4.9), and the proof is done.
As for Theorem 4.1, this theorem says nothing about the coefficients y_n for
n > N/2. These are obtained in the same way as before through symmetry. The
theorem also says nothing about y_{N/2}. This can be obtained with the same
formula as in Theorem 4.1.
It is more difficult to obtain a matrix interpretation for Theorem 4.11, so
we will only sketch an algorithm which implements it. The following code
implements the recursive formulas for ℜ(F_N) and ℑ(F_N) in the theorem:
function y = FFTImpl2(x)
    N = length(x);
    if N == 1
        y = x;
    elseif N == 2
        y = 1/sqrt(2)*[x(1) + x(2); x(1) - x(2)];
    else
        xe = x(1:2:(N-1));
        xo = x(2:2:N);
        yx = FFTImpl2(xe);
        z = x(N:(-2):(N/2+2)) + x(2:2:(N/2));
        dctz = DCTImpl(z);
        dctz(1) = dctz(1)/2;
        dctz(2:length(dctz)) = dctz(2:length(dctz))/(2*sqrt(2));
        w = (-1).^((0:(N/4-1))').*(x(N:(-2):(N/2+2)) - x(2:2:(N/2)));
        dctw = DCTImpl(w);
        dctw(1) = dctw(1)/2;
        dctw(2:length(dctw)) = dctw(2:length(dctw))/(2*sqrt(2));
        y = yx/sqrt(2);
        y(1:(N/4)) = y(1:(N/4)) + dctz;
        if (N > 4)
            y((N/4+2):(N/2)) = y((N/4+2):(N/2)) - dctz((N/4):(-1):2);
            y(2:(N/4)) = y(2:(N/4)) + 1j*dctw((N/4):(-1):2);
        end
        y((N/4+1):(N/2)) = y((N/4+1):(N/2)) + 1j*dctw;
        y = [y; ...
             sum(xe - xo)/sqrt(N); ...
             conj(y((N/2):(-1):2))];
    end
In addition, we need to change the code for DCTImpl so that it calls FFTImpl2
instead of FFTImpl. The following can now be shown:
This is a big reduction from the O(2N log2 N ) required by the FFT algorithm
from Theorem 4.1. We will not prove Theorem 4.12. Instead we will go through
the steps in a proof in Exercise 3. The revised FFT has yet a bigger advantage
than the FFT when it comes to parallel computing: It splits the computation
into, not two FFT computations, but three computations (one of which is an
FFT, the other two DCT’s). This makes it even easier to make use of many
cores on computers which have support for this.
Ex. 2 — Explain why, if FFTImpl needs M_N multiplications and A_N additions,
then the number of multiplications and additions required by DCTImpl are M_N +
2N and A_N + N, respectively.
4.3 Summary
We obtained an implementation of the DFT which is more efficient in terms of
the number of arithmetic operations than a direct implementation of the DFT.
We also showed that this could be used for obtaining an efficient implementation
of the DCT.