
FFT PDF

This section discusses efficient algorithms for computing the discrete Fourier transform (DFT) and discrete cosine transform (DCT), which are important for analyzing frequency information in large datasets. It introduces the fast Fourier transform (FFT) algorithm, which speeds up computation of the DFT by factorizing the Fourier matrix. The FFT reduces an N-point DFT to two N/2-point DFTs, recursively applying this factorization. Pseudocode for an FFT implementation is provided. The inverse FFT is also discussed.

Chapter 4

Implementation of the DFT and the DCT

The main application of the DFT and the DCT is as tools to compute frequency
information in large datasets. It is therefore important that these operations
can be performed by efficient algorithms. Straightforward implementation from
the definition is not efficient if the data sets are large. However, it turns out
that the underlying matrices may be factored in a way that leads to much more
efficient algorithms, and this is the topic of the present chapter.

4.1 The Fast Fourier Transform (FFT)


In this section we will discuss the most widely used implementation of the DFT,
which is usually referred to as the Fast Fourier Transform (FFT). For simplicity,
we will assume that N , the length of the vector that is to be transformed by
the DFT, is a power of 2. In this case it is relatively easy to simplify the DFT
algorithm via a factorisation of the Fourier matrix. The foundation is provided
by a simple reordering of the DFT.

Theorem 4.1 (FFT algorithm). Let $y = F_N x$ be the $N$-point DFT of $x$ with $N$ an even number. For any integer $n$ in the interval $[0, N/2 - 1]$ the DFT $y$ of $x$ is then given by

$$y_n = \frac{1}{\sqrt{2}}\left((F_{N/2}x^{(e)})_n + e^{-2\pi i n/N}(F_{N/2}x^{(o)})_n\right), \tag{4.1}$$

$$y_{N/2+n} = \frac{1}{\sqrt{2}}\left((F_{N/2}x^{(e)})_n - e^{-2\pi i n/N}(F_{N/2}x^{(o)})_n\right), \tag{4.2}$$

where $x^{(e)}$, $x^{(o)}$ are the sequences of length $N/2$ consisting of the even and odd samples of $x$, respectively. In other words,

$$(x^{(e)})_k = x_{2k} \quad \text{for } 0 \le k \le N/2 - 1,$$

$$(x^{(o)})_k = x_{2k+1} \quad \text{for } 0 \le k \le N/2 - 1.$$

Put differently, the formulas (4.1)–(4.2) reduce the computation of an $N$-point
DFT to two $N/2$-point DFTs. It turns out that this can speed up computations
considerably, but let us first check that these formulas are correct.
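Proof. Suppose first that $0 \le n \le N/2 - 1$. We start by splitting the sum in
the expression for the DFT into even and odd indices,

$$\begin{aligned}
y_n &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-2\pi i nk/N} \\
&= \frac{1}{\sqrt{N}}\sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i n 2k/N} + \frac{1}{\sqrt{N}}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i n(2k+1)/N} \\
&= \frac{1}{\sqrt{2}}\frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i nk/(N/2)} + e^{-2\pi i n/N}\frac{1}{\sqrt{2}}\frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i nk/(N/2)} \\
&= \frac{1}{\sqrt{2}}\left(F_{N/2}x^{(e)}\right)_n + \frac{1}{\sqrt{2}} e^{-2\pi i n/N}\left(F_{N/2}x^{(o)}\right)_n,
\end{aligned}$$

where we have substituted $x^{(e)}$ and $x^{(o)}$ as in the text of the theorem, and
recognized the $N/2$-point DFT in two places. For the second half of the DFT
coefficients, i.e. $\{y_{N/2+n}\}_{0 \le n \le N/2-1}$, we similarly have

$$\begin{aligned}
y_{N/2+n} &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-2\pi i (N/2+n)k/N} = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-\pi i k} e^{-2\pi i nk/N} \\
&= \frac{1}{\sqrt{N}}\sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i n 2k/N} - \frac{1}{\sqrt{N}}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i n(2k+1)/N} \\
&= \frac{1}{\sqrt{2}}\frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1} x_{2k} e^{-2\pi i nk/(N/2)} - e^{-2\pi i n/N}\frac{1}{\sqrt{2}}\frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1} x_{2k+1} e^{-2\pi i nk/(N/2)} \\
&= \frac{1}{\sqrt{2}}\left(F_{N/2}x^{(e)}\right)_n - \frac{1}{\sqrt{2}} e^{-2\pi i n/N}\left(F_{N/2}x^{(o)}\right)_n,
\end{aligned}$$

where the sign change in the odd-indexed terms comes from $e^{-\pi i k} = (-1)^k$.
This concludes the proof.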
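The formulas (4.1)–(4.2) can also be checked numerically. The following is a minimal sketch, assuming that the unitary DFT used in this chapter is obtained from MATLAB's built-in fft by dividing by $\sqrt{N}$; the helper name dft and the test length N = 8 are illustrative choices, not part of the text.

```matlab
% Numerical check of (4.1)-(4.2) for a random real vector of length N = 8.
N = 8;
x = randn(N, 1);
dft = @(v) fft(v) / sqrt(length(v));   % hypothetical helper: unitary DFT via built-in fft
xe = x(1:2:N-1);                       % even-indexed samples x_0, x_2, ...
xo = x(2:2:N);                         % odd-indexed samples x_1, x_3, ...
d  = exp(-2*pi*1i*(0:N/2-1).'/N);      % factors e^{-2 pi i n/N}, n = 0..N/2-1
yFirst  = (dft(xe) + d .* dft(xo)) / sqrt(2);   % right-hand side of (4.1)
ySecond = (dft(xe) - d .* dft(xo)) / sqrt(2);   % right-hand side of (4.2)
err = norm([yFirst; ySecond] - dft(x));         % should be at rounding level
disp(err)
```

The printed error should be of the order of machine precision, which is what the theorem predicts.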
It turns out that Theorem 4.1 can be interpreted as a matrix factorization.
For this we need to define the concept of a block matrix.

Definition 4.2. Let $m_0, \ldots, m_{r-1}$ and $n_0, \ldots, n_{s-1}$ be integers, and let $A^{(i,j)}$ be an $m_i \times n_j$-matrix for $i = 0, \ldots, r-1$ and $j = 0, \ldots, s-1$. The notation

$$A = \left(\begin{array}{c|c|c|c}
A^{(0,0)} & A^{(0,1)} & \cdots & A^{(0,s-1)} \\ \hline
A^{(1,0)} & A^{(1,1)} & \cdots & A^{(1,s-1)} \\ \hline
\vdots & \vdots & \ddots & \vdots \\ \hline
A^{(r-1,0)} & A^{(r-1,1)} & \cdots & A^{(r-1,s-1)}
\end{array}\right)$$

denotes the $(m_0 + m_1 + \ldots + m_{r-1}) \times (n_0 + n_1 + \ldots + n_{s-1})$-matrix where
the matrix entries occur as in the $A^{(i,j)}$ matrices, in the way they are ordered,
and with solid lines indicating borders between the blocks. When $A$ is written
in this way it is referred to as a block matrix.

We will express the Fourier matrix in factored form involving block matrices.
The following observation is just a formal way to split a vector into its even and
odd components.

Observation 4.3. Define the permutation matrix $P_N$ by

$$(P_N)_{i,2i} = 1, \quad \text{for } 0 \le i \le N/2 - 1;$$

$$(P_N)_{i,2i-N+1} = 1, \quad \text{for } N/2 \le i < N;$$

$$(P_N)_{i,j} = 0, \quad \text{for all other } i \text{ and } j;$$

and let $x$ be a column vector. The mapping $x \to P_N x$ permutes the components of $x$ so that the even components are placed first and the odd components last,

$$P_N x = \begin{pmatrix} x^{(e)} \\ x^{(o)} \end{pmatrix},$$

with $x^{(e)}$, $x^{(o)}$ defined as in Theorem 4.1.
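As a small illustration of the observation, the sketch below builds $P_N$ entry by entry for N = 8 and applies it to the vector $(0, 1, \ldots, 7)$. The only assumption is the shift from the zero-based indices of the text to MATLAB's one-based indices.

```matlab
% Build P_N from its definition (zero-based (i,j) becomes (i+1,j+1) in MATLAB).
N = 8;
P = zeros(N);
for i = 0:N/2-1
    P(i+1, 2*i+1) = 1;        % (P_N)_{i,2i} = 1
end
for i = N/2:N-1
    P(i+1, 2*i-N+2) = 1;      % (P_N)_{i,2i-N+1} = 1
end
x = (0:N-1).';
px = P*x;
disp(px.')                    % prints 0 2 4 6 1 3 5 7
```

The even-indexed components come first, followed by the odd-indexed ones, as stated.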

Let $D_{N/2}$ be the $(N/2) \times (N/2)$-diagonal matrix with entries $(D_{N/2})_{n,n} = e^{-2\pi i n/N}$ for $n = 0, 1, \ldots, N/2 - 1$. It is clear from Equation (4.1) that the
first half of $y$ can be obtained as

$$\frac{1}{\sqrt{2}}\begin{pmatrix} F_{N/2} & D_{N/2}F_{N/2} \end{pmatrix} P_N x,$$

and from Equation (4.2) that the second half of $y$ can be obtained as

$$\frac{1}{\sqrt{2}}\begin{pmatrix} F_{N/2} & -D_{N/2}F_{N/2} \end{pmatrix} P_N x.$$

From these two formulas we can derive the promised factorisation of the Fourier
matrix.

Theorem 4.4 (DFT matrix factorization). The Fourier matrix may be factored as

$$F_N = \frac{1}{\sqrt{2}}\begin{pmatrix} F_{N/2} & D_{N/2}F_{N/2} \\ F_{N/2} & -D_{N/2}F_{N/2} \end{pmatrix} P_N. \tag{4.3}$$

This factorization in terms of block matrices is commonly referred to as the
FFT factorization of the Fourier matrix. In implementations, this factorization
is typically repeated, so that $F_{N/2}$ is replaced with a factorization in terms of
$F_{N/4}$, this again with a factorization in terms of $F_{N/8}$, and so on.
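The factorization (4.3) can be verified numerically for a small $N$. This is a sketch only: the anonymous function F, which builds the unitary Fourier matrix from its definition, and the choice N = 8 are illustrative assumptions.

```matlab
% Check F_N = (1/sqrt(2)) * [F, D*F; F, -D*F] * P_N for N = 8.
N  = 8;
F  = @(M) exp(-2*pi*1i*(0:M-1).'*(0:M-1)/M) / sqrt(M);  % unitary M-point Fourier matrix
Fh = F(N/2);
D  = diag(exp(-2*pi*1i*(0:N/2-1)/N));                   % D_{N/2}
P  = eye(N);
P  = P([1:2:N, 2:2:N], :);                              % P_N: even samples first, then odd
rhs = [Fh, D*Fh; Fh, -D*Fh] * P / sqrt(2);
err = norm(F(N) - rhs);
disp(err)
```

Forming the matrices explicitly costs far more than the FFT itself, so this is only useful as a check of the algebra.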
The input vector $x$ to the FFT algorithm is mostly assumed to be real. In
this case, the second half of the FFT factorization can be simplified, since we
have shown that the second half of the Fourier coefficients can be obtained by
symmetry from the first half. In addition we need the formula

$$y_{N/2} = \frac{1}{\sqrt{N}}\sum_{n=0}^{N/2-1}\left((x^{(e)})_n - (x^{(o)})_n\right)$$

to obtain coefficient $N/2$, since this is the only coefficient which can't be obtained
from $y_0, y_1, \ldots, y_{N/2-1}$ by symmetry.
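Both facts are easy to observe on a random real vector. The sketch below assumes, as elsewhere in these examples, that the unitary DFT is MATLAB's fft divided by $\sqrt{N}$; the variable names are illustrative.

```matlab
% Symmetry of the DFT of a real vector, and the separate formula for y_{N/2}.
N = 8;
x = randn(N, 1);
y = fft(x) / sqrt(N);                                  % unitary DFT (assumed normalization)
% y_{N-n} = conj(y_n) for n = 1..N/2-1 (indices shifted by one in MATLAB)
symErr = norm(y(N:-1:N/2+2) - conj(y(2:N/2)));
% y_{N/2} = (1/sqrt(N)) * sum over n of ((x^(e))_n - (x^(o))_n)
midErr = abs(y(N/2+1) - sum(x(1:2:N-1) - x(2:2:N)) / sqrt(N));
disp([symErr, midErr])
```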
In an implementation based on formula (4.3), we would first compute $P_N x$,
which corresponds to splitting $x$ into the even-indexed and odd-indexed samples.
The two leftmost blocks in the block matrix in (4.3) correspond to applying the
$N/2$-point DFT to the even samples. The two rightmost blocks correspond to
applying the $N/2$-point DFT to the odd samples, and multiplying the result
with $D_{N/2}$. The results from these transforms are finally added together. By
repeating the splitting we will eventually come to the case where $N = 1$. Then
$F_1$ is just the scalar 1, so the DFT is the trivial assignment $y_0 = x_0$. The FFT
can therefore be implemented by the following MATLAB code:
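function y = FFTImpl(x)
% Unitary FFT of a column vector x whose length is a power of 2.
N = length(x);
if N == 1
    y = x(1);                          % F_1 is the scalar 1
else
    xe = x(1:2:(N-1));                 % even-indexed samples x_0, x_2, ...
    xo = x(2:2:N);                     % odd-indexed samples x_1, x_3, ...
    ye = FFTImpl(xe);
    yo = FFTImpl(xo);
    D = exp(-2*pi*1j*(0:N/2-1)'/N);    % diagonal entries of D_{N/2}
    y = [ye + yo.*D; ye - yo.*D]/sqrt(2);
end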
Note that this function is recursive; it calls itself. If this is your first encounter
with a recursive program, it is worth running through the code for N = 4, say.
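To make the recursion concrete, the following sketch unrolls the calls by hand for the sample input $x = (1, 2, 3, 4)$, which is an arbitrary choice. Each N = 1 call simply returns its input, so the N = 2 level reduces to a sum and a difference.

```matlab
% Hand-unrolled recursion of the FFT for N = 4.
x = [1; 2; 3; 4];
% Level N = 2: transform of the even samples [x0; x2] and of the odd samples [x1; x3].
% At this level D = exp(0) = 1, so only a sum and a difference remain.
ye = [x(1) + x(3); x(1) - x(3)] / sqrt(2);
yo = [x(2) + x(4); x(2) - x(4)] / sqrt(2);
% Level N = 4: combine the two half-size transforms.
D = exp(-2*pi*1j*(0:1)'/4);            % equals [1; -1i]
y = [ye + yo.*D; ye - yo.*D] / sqrt(2);
disp(y.')                              % 5, -1+1i, -1, -1-1i
```

The result agrees with fft(x)/sqrt(4), since fft(x) returns $(10, -2+2i, -2, -2-2i)$ for this input.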

4.1.1 The Inverse Fast Fourier Transform (IFFT)
The IDFT is very similar to the DFT, and it is straightforward to prove the
following analog to Theorem 4.1 and (4.3).

Theorem 4.5 (IDFT matrix factorization). The inverse of the Fourier matrix
can be factored as

$$(F_N)^H = \frac{1}{\sqrt{2}}\begin{pmatrix} (F_{N/2})^H & E_{N/2}(F_{N/2})^H \\ (F_{N/2})^H & -E_{N/2}(F_{N/2})^H \end{pmatrix} P_N, \tag{4.4}$$

where $E_{N/2}$ is the $(N/2) \times (N/2)$-diagonal matrix with entries given by
$(E_{N/2})_{n,n} = e^{2\pi i n/N}$, for $n = 0, 1, \ldots, N/2 - 1$.

We note that the only difference between the factored forms of $F_N$ and $(F_N)^H$
is the positive exponent in $e^{2\pi i n/N}$. With this in mind it is straightforward to
modify FFTImpl.m so that it performs the inverse DFT.
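One possible version of such a modification is sketched below. It follows (4.4) directly; the function name IFFTImpl matches the name called later from IDCTImpl, but the body is an illustration under the same assumptions as FFTImpl (column vector input, length a power of 2), not code taken from the text.

```matlab
function x = IFFTImpl(y)
% Sketch of a unitary inverse FFT: FFTImpl with the sign of the exponent flipped.
N = length(y);
if N == 1
    x = y(1);
else
    ye = y(1:2:(N-1));                 % even-indexed coefficients
    yo = y(2:2:N);                     % odd-indexed coefficients
    xe = IFFTImpl(ye);
    xo = IFFTImpl(yo);
    E = exp(2*pi*1j*(0:N/2-1)'/N);     % diagonal entries of E_{N/2}
    x = [xe + xo.*E; xe - xo.*E]/sqrt(2);
end
end
```

With this normalization, IFFTImpl(y) should agree with ifft(y)*sqrt(N) up to rounding errors.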
MATLAB has built-in functions for computing the DFT and the IDFT,
called fft and ifft. Note, however, that these functions do not use the
normalization $1/\sqrt{N}$ that we have adopted here. The MATLAB help pages give
a short description of these algorithms. Note in particular that MATLAB makes
no assumption about the length of the vector. MATLAB may however check
if the length of the vector is $2^r$, and in those cases a variant of the algorithm
discussed here is used. In general, fast algorithms exist when the vector length
$N$ can be factored as a product of small integers.
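The difference in normalization is only a scaling: MATLAB's fft applies no factor in the forward direction and ifft applies $1/N$. A short sketch of how to convert between the two conventions, with illustrative variable names:

```matlab
% Converting between MATLAB's fft/ifft and the unitary DFT/IDFT of this chapter.
N  = 16;
x  = randn(N, 1);
y  = fft(x) / sqrt(N);                 % unitary DFT, y = F_N x
xr = ifft(y) * sqrt(N);                % unitary IDFT, x = (F_N)^H y
normErr = abs(norm(y) - norm(x));      % a unitary transform preserves the 2-norm
recErr  = norm(xr - x);                % the round trip recovers x
disp([normErr, recErr])
```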
Many audio and image formats make use of the FFT. To get optimal speed
these algorithms typically split the signals into blocks of length $2^r$ with $r$ some
integer in the range 5–10 and utilise a suitable variant of the algorithms discussed
above.

4.1.2 Reduction in the number of multiplications with the FFT
Before we continue we also need to explain why the FFT and IFFT factorizations
lead to more efficient implementations than the direct DFT and IDFT
implementations. We first need some terminology for how we count the number
of operations of a given type in an algorithm. In particular we are interested
in the limiting behaviour when $N$ becomes large, which is the motivation for
the following definition.

Definition 4.6 (Order of an algorithm). Let $R_N$ be the number of operations
of a given type (such as multiplication, addition) in an algorithm, where $N$
describes the dimension of the data in the algorithm (such as the size of the
matrix or length of the vector), and let $f$ be a positive function. The algorithm
is said to be of order $f(N)$, which is written $O(f(N))$, if the number of operations
grows as $f(N)$ for large $N$, or more precisely, if

$$\lim_{N \to \infty} \frac{R_N}{f(N)} = c > 0.$$

We will also use this notation for functions, and say that a real function $g$
is $O(f(x))$ if $\lim g(x)/f(x) = 0$ where the limit mostly will be taken as $x \to 0$
(this means that $g(x)$ is much smaller than $f(x)$ when $x$ approaches the limit).
Let us see how we can use this terminology to describe the complexity of
the FFT algorithm. Let $M_N$ be the number of multiplications needed by the
$N$-point FFT as defined by Theorem 4.1. It is clear from the algorithm that

$$M_N = 2M_{N/2} + N/2. \tag{4.5}$$

The factor 2 corresponds to the two matrix multiplications, while the term $N/2$
denotes the multiplications in the exponent of the exponentials that make up
the matrix $D_{N/2}$ (or $E_{N/2}$); the factor $2\pi i/N$ may be computed once and for
all outside the loops. We have not counted the multiplications with $1/\sqrt{2}$.
The reason is that, in most implementations, this factor is absorbed in the
definition of the DFT itself.
Note that all multiplications performed by the FFT are complex. It is normal
to count the number of real multiplications instead, since any multiplication of
two complex numbers can be performed as four multiplications of real numbers
(and two additions), by writing the numbers in terms of their real and imaginary
parts, and multiplying them together. Therefore, if we instead define $M_N$ to be
the number of real multiplications required by the FFT, we obtain the alternative
recurrence relation

$$M_N = 2M_{N/2} + 2N. \tag{4.6}$$

In Exercise 1 you will be asked to derive the solution of this equation and
show that the number of real multiplications required by this algorithm is
$O(2N \log_2 N)$. In contrast, the direct implementation of the DFT requires $N^2$
complex multiplications, and thus $4N^2$ real multiplications. The exact same
numbers are found for the IFFT.

Theorem 4.7 (Number of operations in the FFT and IFFT algorithms). The
$N$-point FFT and IFFT algorithms both require $O(2N \log_2 N)$ real multiplications.
In comparison, the number of real multiplications required by direct
implementations of the $N$-point DFT and IDFT is $4N^2$.
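To get a feeling for the sizes involved, the recurrence (4.6) can simply be iterated. The sketch below assumes the initial condition $M_1 = 0$ (a one-point transform needs no multiplications), which is an assumption made for the illustration and is not fixed by the text.

```matlab
% Iterate M_N = 2*M_{N/2} + 2*N for N = 2, 4, ..., 1024 and compare with 2*N*log2(N) and 4*N^2.
r = (1:10).';
N = 2.^r;
M = zeros(size(r));
prev = 0;                          % assumed initial condition M_1 = 0
for j = 1:numel(r)
    M(j) = 2*prev + 2*N(j);        % recurrence (4.6)
    prev = M(j);
end
disp([N, M, 2*N.*log2(N), 4*N.^2]) % columns: N, M_N, 2N log2 N, direct count 4N^2
```

With this initial condition the second and third columns coincide, while the last column grows much faster; for $N = 1024$ the counts are 20480 and 4194304.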

In other words, the FFT and IFFT significantly reduce the number of multiplications,
and one can show in a similar way that the number of additions
required by the algorithm is also roughly $O(N \log_2 N)$. This partially explains
the efficiency of the FFT algorithm. Another reason is that since the FFT splits
the calculation of the DFT into computing two DFTs of half the size, the FFT
is well suited for parallel computing: the two smaller FFTs can be performed
independently of one another, for instance in two different computing cores on
the same computer.
Since filters are diagonalized by the DFT, it may be tempting to implement
a filter by applying an FFT, multiplying with the frequency response, and then
applying the IFFT. This is not usually done, however. The reason is that most
filters have too few nonzero coefficients for this approach to be efficient; it
is then better to use the direct algorithm for the filter, since this may lead to
fewer multiplications than the $O(N \log_2 N)$ required by the FFT.
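The two ways of applying a filter can be compared on a small example. The sketch below uses a hypothetical circular filter with three nonzero coefficients; the coefficient values and the vector length are arbitrary choices for illustration.

```matlab
% Applying a filter with 3 nonzero coefficients: FFT route versus direct route.
N = 16;
x = randn(N, 1);
t = zeros(N, 1);
t([1 2 N]) = [0.5 0.25 0.25];          % hypothetical filter coefficients t_0, t_1, t_{-1}
% FFT route: transform, multiply with the frequency response, transform back.
zFFT = real(ifft(fft(t) .* fft(x)));
% Direct route: three real multiplications per output sample.
zDir = 0.5*x + 0.25*circshift(x, 1) + 0.25*circshift(x, -1);
err = norm(zFFT - zDir);
disp(err)
```

Both routes give the same output, but the direct route needs only $3N$ real multiplications here.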

Exercises for Section 4.1

Ex. 1. In this exercise we will compute the number of real multiplications
needed by the FFT algorithm given in the text. The starting point will be the
difference equation (4.6) for the number of real multiplications for an $N$-point
FFT.
a. Explain why $x_r = M_{2^r}$ is the solution to the difference equation $x_{r+1} - 2x_r = 4 \cdot 2^r$.

b. Show that the general solution to the difference equation is $x_r = 2r2^r + C2^r$.

c. Explain why $M_N = O(2N \log_2 N)$ (you do not need to write down the
initial conditions for the difference equation in order to find the particular
solution).

Ex. 2. When we wrote down the difference equation $M_N = 2M_{N/2} + 2N$ for
the number of multiplications in the FFT algorithm, you could argue that some
multiplications were not counted. Which multiplications in the FFT algorithm
were not counted when writing down this difference equation? Do you have a
suggestion as to why these multiplications were not counted?

Ex. 3 — Write down a difference equation for computing the number of real
additions required by the FFT algorithm.

Ex. 4. It is of course not always the case that the number of points in a
DFT is $N = 2^n$. In this exercise we will see how we can attack the more general
case.
a. Assume that $N$ can be divided by 3, and consider the following splitting,
which follows in the same way as the splitting used in the deduction of
the FFT-algorithm:

$$\begin{aligned}
y_n &= \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} x_k e^{-2\pi i nk/N} \\
&= \frac{1}{\sqrt{N}}\sum_{k=0}^{N/3-1} x_{3k} e^{-2\pi i n 3k/N} + \frac{1}{\sqrt{N}}\sum_{k=0}^{N/3-1} x_{3k+1} e^{-2\pi i n(3k+1)/N} \\
&\quad + \frac{1}{\sqrt{N}}\sum_{k=0}^{N/3-1} x_{3k+2} e^{-2\pi i n(3k+2)/N}.
\end{aligned}$$

Find a formula which computes $y_0, y_1, \ldots, y_{N/3-1}$ by performing 3 DFTs
of size $N/3$.
b. Find similar formulas for computing $y_{N/3}, y_{N/3+1}, \ldots, y_{2N/3-1}$, and $y_{2N/3}, y_{2N/3+1}, \ldots, y_{N-1}$. State a similar factorization of the DFT matrix as in
Theorem 4.4, but this time where the matrix has $3 \times 3$ blocks.
c. Assume that $N = 3^n$, and that you implement the FFT using the formulas
you have deduced in a. and b. How many multiplications does this
algorithm require?
d. Sketch a general procedure for speeding up the computation of the DFT,
which uses the factorization of $N$ into a product of prime numbers.

4.2 Efficient implementations of the DCT


In the preceding section we defined the DCT by expressing it in terms of the
DFT. In particular, we can apply efficient implementations of the DFT, which
we will shortly look at. However, the way we have defined the DCT, there
is a penalty in that we need to compute a DFT of twice the length. We are
also forced to use complex arithmetic (note that any complex multiplication
corresponds to 4 real multiplications, and that any complex addition corresponds
to 2 real additions). Is there a way to get around these penalties, so that we can
get an implementation of the DCT which is more efficient, and uses fewer additions
and multiplications than the one you made in Exercise 1? The following theorem
states an expression of the DCT which achieves this. This expression is, together
with a similar result for the DFT in the next section, much used in practical
implementations:

Theorem 4.8 (DCT algorithm). Let $y = D_N x$ be the $N$-point DCT of the
vector $x$. Then we have that

$$y_n = c_{n,N}\left(\cos\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) + \sin\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right), \tag{4.7}$$

where $c_{0,N} = 1$ and $c_{n,N} = \sqrt{2}$ for $n \ge 1$, and where $x^{(1)} \in \mathbb{R}^N$ is defined by

$$(x^{(1)})_k = x_{2k} \quad \text{for } 0 \le k \le N/2 - 1,$$

$$(x^{(1)})_{N-k-1} = x_{2k+1} \quad \text{for } 0 \le k \le N/2 - 1.$$

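Proof. The $N$-point DCT of $x$ is

$$y_n = d_{n,N}\sum_{k=0}^{N-1} x_k \cos\left(2\pi\frac{n}{2N}\left(k + \frac{1}{2}\right)\right).$$

Splitting this sum into two sums, where the indices are even and odd, we get

$$\begin{aligned}
y_n &= d_{n,N}\sum_{k=0}^{N/2-1} x_{2k} \cos\left(2\pi\frac{n}{2N}\left(2k + \frac{1}{2}\right)\right) \\
&\quad + d_{n,N}\sum_{k=0}^{N/2-1} x_{2k+1} \cos\left(2\pi\frac{n}{2N}\left(2k + 1 + \frac{1}{2}\right)\right).
\end{aligned}$$

If we reverse the indices in the second sum, this sum becomes

$$d_{n,N}\sum_{k=0}^{N/2-1} x_{N-2k-1} \cos\left(2\pi\frac{n}{2N}\left(N - 2k - 1 + \frac{1}{2}\right)\right).$$

If we then also shift the indices with $N/2$ in this sum, we get

$$\begin{aligned}
& d_{n,N}\sum_{k=N/2}^{N-1} x_{2N-2k-1} \cos\left(2\pi\frac{n}{2N}\left(2N - 2k - 1 + \frac{1}{2}\right)\right) \\
&= d_{n,N}\sum_{k=N/2}^{N-1} x_{2N-2k-1} \cos\left(2\pi\frac{n}{2N}\left(2k + \frac{1}{2}\right)\right),
\end{aligned}$$

where we used that cos is symmetric and periodic with period $2\pi$. We see that
we now have the same cos-terms in the two sums. If we thus define the vector
$x^{(1)}$ as in the text of the theorem, we see that we can write

$$\begin{aligned}
y_n &= d_{n,N}\sum_{k=0}^{N-1} (x^{(1)})_k \cos\left(2\pi\frac{n}{2N}\left(2k + \frac{1}{2}\right)\right) \\
&= d_{n,N}\,\Re\left(\sum_{k=0}^{N-1} (x^{(1)})_k e^{-2\pi i n(2k+\frac{1}{2})/(2N)}\right) \\
&= \sqrt{N} d_{n,N}\,\Re\left(e^{-\pi i n/(2N)}\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} (x^{(1)})_k e^{-2\pi i nk/N}\right) \\
&= c_{n,N}\,\Re\left(e^{-\pi i n/(2N)}(F_N x^{(1)})_n\right) \\
&= c_{n,N}\left(\cos\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) + \sin\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right),
\end{aligned}$$

where we have recognized the $N$-point DFT, and where $c_{n,N} = \sqrt{N} d_{n,N}$. Inserting
the values for $d_{n,N}$, we see that $c_{0,N} = 1$ and $c_{n,N} = \sqrt{2}$ for $n \ge 1$, which
agrees with the definition of $c_{n,N}$ in the theorem. This completes the proof.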

With the result above we have avoided computing a DFT of double size.
If we in the proof above define the $N \times N$-diagonal matrix $Q_N$ by $Q_{n,n} = c_{n,N} e^{-\pi i n/(2N)}$, the result can also be written on the more compact form

$$y = D_N x = \Re\left(Q_N F_N x^{(1)}\right).$$

We will, however, not use this form, since there is complex arithmetic involved,
contrary to (4.7). Let us see how we can use (4.7) to implement the DCT, once
we already have implemented the DFT in terms of the function FFTImpl as in
Section 4.1:
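function y = DCTImpl(x)
% DCT of a real column vector x (length a power of 2), computed via (4.7).
N = length(x);
if N == 1
    y = x;
else
    x1 = [x(1:2:(N-1)); x(N:(-2):2)];   % the rearranged vector x^(1)
    y = FFTImpl(x1);
    rp = real(y);
    ip = imag(y);
    y = cos(pi*((0:(N-1))')/(2*N)).*rp + sin(pi*((0:(N-1))')/(2*N)).*ip;
    y(2:N) = sqrt(2)*y(2:N);            % c_{n,N} = sqrt(2) for n >= 1
end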

In the code, the vector x(1) is created first by rearranging the components, and
it is sent as input to FFTImpl. After this we take real parts and imaginary parts,
and multiply with the cos- and sin-terms in (4.7).
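Formula (4.7) can be checked against the definition of the DCT without relying on FFTImpl. The sketch below makes two assumptions: the unitary DFT is taken as MATLAB's fft divided by $\sqrt{N}$, and the DCT normalization is $d_{0,N} = \sqrt{1/N}$, $d_{n,N} = \sqrt{2/N}$ for $n \ge 1$, which is what the relation $c_{n,N} = \sqrt{N} d_{n,N}$ implies.

```matlab
% Compare (4.7) with a direct evaluation of the DCT sum for N = 8.
N  = 8;
x  = randn(N, 1);
n  = (0:N-1).';
x1 = [x(1:2:N-1); x(N:-2:2)];                 % the rearranged vector x^(1)
f  = fft(x1) / sqrt(N);                       % F_N x^(1) (assumed normalization)
c  = [1; sqrt(2)*ones(N-1, 1)];               % c_{n,N}
y  = c .* (cos(pi*n/(2*N)).*real(f) + sin(pi*n/(2*N)).*imag(f));
% Direct evaluation: y_n = d_{n,N} * sum_k x_k cos(2 pi n (k + 1/2) / (2N))
d  = [sqrt(1/N); sqrt(2/N)*ones(N-1, 1)];     % assumed values of d_{n,N}
C  = cos(pi * n * ((0:N-1) + 0.5) / N);       % C(n+1,k+1) = cos(pi n (k + 1/2) / N)
yDirect = d .* (C*x);
err = norm(y - yDirect);
disp(err)
```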

4.2.1 Efficient implementations of the IDCT
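As with the FFT, it is straightforward to modify the DCT implementation so
that it returns the IDCT. To see how we can do this, write from Theorem 4.8,
for $n \ge 1$

$$\begin{aligned}
y_n &= c_{n,N}\left(\cos\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) + \sin\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right) \\
y_{N-n} &= c_{N-n,N}\left(\cos\left(\pi\frac{N-n}{2N}\right)\Re((F_N x^{(1)})_{N-n}) + \sin\left(\pi\frac{N-n}{2N}\right)\Im((F_N x^{(1)})_{N-n})\right) \\
&= c_{n,N}\left(\sin\left(\pi\frac{n}{2N}\right)\Re((F_N x^{(1)})_n) - \cos\left(\pi\frac{n}{2N}\right)\Im((F_N x^{(1)})_n)\right),
\end{aligned}$$

where we have used the symmetry of $F_N$ for real signals. These two equations
enable us to determine $\Re((F_N x^{(1)})_n)$ and $\Im((F_N x^{(1)})_n)$ from $y_n$ and $y_{N-n}$. We
get

$$\begin{aligned}
\cos\left(\pi\frac{n}{2N}\right) y_n + \sin\left(\pi\frac{n}{2N}\right) y_{N-n} &= c_{n,N}\,\Re((F_N x^{(1)})_n) \\
\sin\left(\pi\frac{n}{2N}\right) y_n - \cos\left(\pi\frac{n}{2N}\right) y_{N-n} &= c_{n,N}\,\Im((F_N x^{(1)})_n).
\end{aligned}$$

Adding we get

$$\begin{aligned}
c_{n,N}(F_N x^{(1)})_n &= \cos\left(\pi\frac{n}{2N}\right) y_n + \sin\left(\pi\frac{n}{2N}\right) y_{N-n} + i\left(\sin\left(\pi\frac{n}{2N}\right) y_n - \cos\left(\pi\frac{n}{2N}\right) y_{N-n}\right) \\
&= \left(\cos\left(\pi\frac{n}{2N}\right) + i\sin\left(\pi\frac{n}{2N}\right)\right)(y_n - iy_{N-n}) = e^{\pi i n/(2N)}(y_n - iy_{N-n}).
\end{aligned}$$

This means that $(F_N x^{(1)})_n = \frac{1}{c_{n,N}} e^{\pi i n/(2N)}(y_n - iy_{N-n})$ for $n \ge 1$. For $n = 0$,
since $\Im((F_N x^{(1)})_0) = 0$ we have that $(F_N x^{(1)})_0 = \frac{1}{c_{0,N}} y_0$. This means that
$x^{(1)}$ can be recovered by taking the IDFT of the vector with component 0 being
$\frac{1}{c_{0,N}} y_0 = y_0$, and the remaining components being $\frac{1}{c_{n,N}} e^{\pi i n/(2N)}(y_n - iy_{N-n})$: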

Theorem 4.9 (IDCT algorithm). Let $x = (D_N)^T y$ be the IDCT of $y$, and let
$z$ be the vector with component 0 being $\frac{1}{c_{0,N}} y_0$, and the remaining components
being $\frac{1}{c_{n,N}} e^{\pi i n/(2N)}(y_n - iy_{N-n})$. Then we have that

$$x^{(1)} = (F_N)^H z,$$

where $x^{(1)}$ is defined as in Theorem 4.8.

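The implementation of IDCT can thus go as follows:

function x = IDCTImpl(y)
% IDCT of a real column vector y (length a power of 2), following Theorem 4.9.
N = length(y);
if N == 1
    x = y(1);
else
    Q = exp(pi*1i*((0:(N-1))')/(2*N));
    Q(2:N) = Q(2:N)/sqrt(2);                       % divide by c_{n,N} for n >= 1
    yrev = y(N:(-1):2);                            % y_{N-n} for n = 1, ..., N-1
    toapply = [ y(1); Q(2:N).*(y(2:N)-1i*yrev) ];  % the vector z
    x1 = real(IFFTImpl(toapply));                  % x^(1) is real; drop rounding-level imaginary parts
    x = zeros(N,1);
    x(1:2:(N-1)) = x1(1:(N/2));
    x(2:2:N) = x1(N:(-1):(N/2+1));
end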

MATLAB also has functions for computing the DCT and IDCT, called dct
and idct. These functions are defined in MATLAB exactly as they are here,
contrary to the case for the FFT.
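Theorem 4.9 can be tried out as a round trip that does not depend on FFTImpl or IFFTImpl. The sketch below assumes the normalization $d_{0,N} = \sqrt{1/N}$, $d_{n,N} = \sqrt{2/N}$ for the forward DCT, and uses MATLAB's ifft multiplied by $\sqrt{N}$ as the unitary IDFT; all variable names are illustrative.

```matlab
% Round trip: direct DCT, then the IDCT via the vector z of Theorem 4.9.
N = 8;
x = randn(N, 1);
n = (0:N-1).';
d = [sqrt(1/N); sqrt(2/N)*ones(N-1, 1)];          % assumed values of d_{n,N}
y = d .* (cos(pi * n * ((0:N-1) + 0.5) / N) * x); % forward DCT from the definition
z = zeros(N, 1);
z(1)   = y(1);                                    % component 0: y_0 / c_{0,N} = y_0
z(2:N) = exp(pi*1i*n(2:N)/(2*N)) .* (y(2:N) - 1i*y(N:-1:2)) / sqrt(2);
x1 = real(ifft(z) * sqrt(N));                     % x^(1) = (F_N)^H z
xr = zeros(N, 1);
xr(1:2:N-1) = x1(1:N/2);                          % even-indexed samples
xr(2:2:N)   = x1(N:-1:N/2+1);                     % odd-indexed samples, in reversed order
err = norm(xr - x);
disp(err)
```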

4.2.2 Reduction in the number of multiplications with the DCT
Let us also state a result which confirms that the DCT and IDCT implementations
we have described give the same type of reductions in the number of
multiplications as the FFT and IFFT:

Theorem 4.10 (Number of multiplications required by the DCT and IDCT
algorithms). Both the $N$-point DCT and IDCT factorizations given by Theorem 4.8
and Theorem 4.9 require $O(2(N + 1) \log_2 N)$ real multiplications. In
comparison, the number of real multiplications required by a direct implementation
of the $N$-point DCT and IDCT is $N^2$.

Proof. By Theorem 4.7, the number of multiplications required by the FFT is
$O(2N \log_2 N)$. By Theorem 4.8, two additional multiplications are needed for
each index, giving additionally $2N$ multiplications in total, so that we end up
with $O(2(N + 1) \log_2 N)$ real multiplications. For the IDCT, note first that
the vector $z$ with components $\frac{1}{c_{n,N}} e^{\pi i n/(2N)}(y_n - iy_{N-n})$ seen in Theorem 4.9 should require
$4N$ real multiplications to compute. But since the IDFT of $z$ is real, $z$ must
have conjugate symmetry between the first half and the second half of the
coefficients, so that we only need to perform $2N$ multiplications. Since the
IFFT takes an additional $O(2N \log_2 N)$ real multiplications, we end up with
a total of $O(2N + 2N \log_2 N) = O(2(N + 1) \log_2 N)$ real multiplications also
here. It is clear that the direct implementation of the DCT and IDCT needs
$N^2$ multiplications, since only real arithmetic is involved.

Since the DCT and IDCT can be implemented using the FFT and IFFT,
they have the same advantages as the FFT when it comes to parallel computing.
Much literature is devoted to reducing the number of multiplications in the
DFT and the DCT even further than what we have done. In the next section
we will show an example of how this can be achieved, with the help of extra
work and some tedious math. Some more notes on computational complexity
are in order. For instance, we have not counted the operations sin and cos in
the DCT. The reason is that these values can be precomputed, since we take the
sine and cosine of a specific set of values for each DCT or DFT of a given size.
This is contrary to multiplication and addition, since these include the input
values, which are only known at runtime. We have, however, not written down
that we use precomputed arrays for sine and cosine in our algorithms: This is
an issue to include in more optimized algorithms. Another point has to do with
multiplication with $\frac{1}{\sqrt{N}}$. As long as $N = 2^{2r}$, this multiplication need not be
considered as a multiplication, since it can be implemented using a bitshift.

4.2.3 *An efficient joint implementation of the DCT and the FFT
We will now present a more advanced FFT algorithm, which will turn out to
decrease the number of multiplications and additions even further. It also has
the advantage that it avoids complex number arithmetic altogether (contrary to
Theorem 4.1), and that it factors the computation into smaller FFTs and DCTs
so that we can also use our previous DCT implementation. This implementation
of the DCT and the DFT is what is mostly used in practice. For simplicity we
will drop this presentation for the inverse transforms, and concentrate only on
the DFT and the DCT.

Theorem 4.11 (Revised FFT algorithm). Let $y = F_N x$ be the $N$-point DFT
of the real vector $x$. Then we have that

$$\Re(y_n) = \begin{cases}
\frac{1}{\sqrt{2}}\Re((F_{N/2}x^{(e)})_n) + (ED_{N/4}z)_n & 0 \le n \le N/4 - 1 \\
\frac{1}{\sqrt{2}}\Re((F_{N/2}x^{(e)})_n) & n = N/4 \\
\frac{1}{\sqrt{2}}\Re((F_{N/2}x^{(e)})_n) - (ED_{N/4}z)_{N/2-n} & N/4 + 1 \le n \le N/2 - 1
\end{cases} \tag{4.8}$$

$$\Im(y_n) = \begin{cases}
\frac{1}{\sqrt{2}}\Im((F_{N/2}x^{(e)})_n) & n = 0 \\
\frac{1}{\sqrt{2}}\Im((F_{N/2}x^{(e)})_n) + (ED_{N/4}w)_{N/4-n} & 1 \le n \le N/4 - 1 \\
\frac{1}{\sqrt{2}}\Im((F_{N/2}x^{(e)})_n) + (ED_{N/4}w)_{n-N/4} & N/4 \le n \le N/2 - 1
\end{cases} \tag{4.9}$$

where $x^{(e)}$ is as defined in Theorem 4.1, where $z, w \in \mathbb{R}^{N/4}$ are defined by

$$z_k = x_{2k+1} + x_{N-2k-1} \quad 0 \le k \le N/4 - 1,$$

$$w_k = (-1)^k (x_{N-2k-1} - x_{2k+1}) \quad 0 \le k \le N/4 - 1,$$

and where $E$ is a diagonal matrix with diagonal entries $E_{0,0} = \frac{1}{2}$ and $E_{n,n} = \frac{1}{2\sqrt{2}}$ for $n \ge 1$.

Proof. Taking real and imaginary parts in (4.1) we obtain

$$\begin{aligned}
\Re(y_n) &= \frac{1}{\sqrt{2}}\Re((F_{N/2}x^{(e)})_n) + \frac{1}{\sqrt{2}}\Re((D_{N/2}F_{N/2}x^{(o)})_n) \\
\Im(y_n) &= \frac{1}{\sqrt{2}}\Im((F_{N/2}x^{(e)})_n) + \frac{1}{\sqrt{2}}\Im((D_{N/2}F_{N/2}x^{(o)})_n).
\end{aligned}$$

These equations explain the first parts on the right hand side in (4.8) and (4.9).
Furthermore, for $0 \le n \le N/4 - 1$ we can write

$$\begin{aligned}
\Re((D_{N/2}F_{N/2}x^{(o)})_n)
&= \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1}\Re\left(e^{-2\pi i n/N}(x^{(o)})_k e^{-2\pi i nk/(N/2)}\right) \\
&= \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1}\Re\left((x^{(o)})_k e^{-2\pi i n(k+\frac{1}{2})/(N/2)}\right) \\
&= \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/2-1}(x^{(o)})_k \cos\left(2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) \\
&= \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/4-1}(x^{(o)})_k \cos\left(2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) \\
&\quad + \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/4-1}(x^{(o)})_{N/2-1-k} \cos\left(2\pi\frac{n(N/2-1-k+\frac{1}{2})}{N/2}\right) \\
&= \frac{1}{\sqrt{2}}\frac{1}{\sqrt{N/4}}\sum_{k=0}^{N/4-1}\left((x^{(o)})_k + (x^{(o)})_{N/2-1-k}\right)\cos\left(2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) \\
&= (E_0 D_{N/4} z)_n,
\end{aligned}$$

where we have used that cos is periodic with period $2\pi$ and symmetric, where $z$ is
the vector defined in the text of the theorem, where we have recognized the DCT
matrix, and where $E_0$ is a diagonal matrix with diagonal entries $(E_0)_{0,0} = \frac{1}{\sqrt{2}}$
and $(E_0)_{n,n} = \frac{1}{2}$ for $n \ge 1$ ($E_0$ absorbs the factor $\frac{1}{\sqrt{N/2}}$, and the factor $d_{n,N}$
from the DCT). By absorbing the additional factor $\frac{1}{\sqrt{2}}$, we get a matrix $E$ as
stated in the theorem. For $N/4 + 1 \le n \le N/2 - 1$, everything above but the
last statement is valid. We can now use that

$$\cos\left(2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) = -\cos\left(2\pi\frac{\left(\frac{N}{2}-n\right)\left(k+\frac{1}{2}\right)}{N/2}\right)$$

to arrive at $-(E_0 D_{N/4} z)_{N/2-n}$ instead. For the case $n = \frac{N}{4}$ all the cosine
entries are zero, and this completes (4.8). For the imaginary part, we obtain as
above

$$\begin{aligned}
\Im((D_{N/2}F_{N/2}x^{(o)})_n)
&= \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/4-1}\left((x^{(o)})_{N/2-1-k} - (x^{(o)})_k\right)\sin\left(2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) \\
&= \frac{1}{\sqrt{N/2}}\sum_{k=0}^{N/4-1}\left((x^{(o)})_{N/2-1-k} - (x^{(o)})_k\right)(-1)^k \cos\left(2\pi\frac{(N/4-n)(k+\frac{1}{2})}{N/2}\right),
\end{aligned}$$

where we have used that sin is periodic with period $2\pi$ and anti-symmetric, and that

$$\begin{aligned}
\sin\left(2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) &= \cos\left(\frac{\pi}{2} - 2\pi\frac{n(k+\frac{1}{2})}{N/2}\right) \\
&= \cos\left(2\pi\frac{(N/4-n)(k+\frac{1}{2})}{N/2} - k\pi\right) \\
&= (-1)^k \cos\left(2\pi\frac{(N/4-n)(k+\frac{1}{2})}{N/2}\right).
\end{aligned}$$

When $n = 0$ this is 0 since all the cosine entries are zero. When $1 \le n \le N/4$
this is $(E_0 D_{N/4} w)_{N/4-n}$, where $w$ is the vector defined as in the text of the
theorem. For $N/4 \le n \le N/2 - 1$ we arrive instead at $(E_0 D_{N/4} w)_{n-N/4}$,
similarly to as above. This also proves (4.9), and the proof is done.
As for Theorem 4.1, this theorem says nothing about the coefficients $y_n$ for
$n > \frac{N}{2}$. These are obtained in the same way as before through symmetry. The
theorem also says nothing about $y_{N/2}$. This can be obtained with the same
formula as in Theorem 4.1.
It is more difficult to obtain a matrix interpretation for Theorem 4.11, so
we will only sketch an algorithm which implements it. The following code
implements the recursive formulas for $\Re F_N$ and $\Im F_N$ in the theorem:
function y = FFTImpl2(x)
% Revised FFT of a real column vector x (length a power of 2), see Theorem 4.11.
N = length(x);
if N == 1
    y = x;
elseif N==2
    y = 1/sqrt(2)*[x(1) + x(2); x(1) - x(2)];
else
    xe = x(1:2:(N-1));
    xo = x(2:2:N);
    yx = FFTImpl2(xe);

    % z_k = x_{2k+1} + x_{N-2k-1}, followed by multiplication with E
    z = x(N:(-2):(N/2+2))+x(2:2:(N/2));
    dctz = DCTImpl(z);
    dctz(1)=dctz(1)/2;
    dctz(2:length(dctz)) = dctz(2:length(dctz))/(2*sqrt(2));

    % w_k = (-1)^k (x_{N-2k-1} - x_{2k+1}), followed by multiplication with E
    w = (-1).^((0:(N/4-1))').*(x(N:-2:(N/2+2))-x(2:2:(N/2)));
    dctw = DCTImpl(w);
    dctw(1)=dctw(1)/2;
    dctw(2:length(dctw)) = dctw(2:length(dctw))/(2*sqrt(2));

    y = yx/sqrt(2);
    y(1:(N/4))=y(1:(N/4))+dctz;                  % real part, 0 <= n <= N/4-1
    if (N>4)
        y((N/4+2):(N/2))=y((N/4+2):(N/2))-dctz((N/4):(-1):2);
        y(2:(N/4))=y(2:(N/4))+1j*dctw((N/4):(-1):2);
    end
    y((N/4+1):(N/2))=y((N/4+1):(N/2))+1j*dctw;   % imaginary part, N/4 <= n <= N/2-1
    % append y_{N/2} and the second half, obtained by conjugate symmetry
    y = [y; ...
         sum(xe-xo)/sqrt(N); ...
         conj(y((N/2):(-1):2))];
end
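The formulas (4.8) and (4.9) can also be checked independently of the recursive code. The sketch below is an illustration under stated assumptions: the unitary DFT is taken as MATLAB's fft with a $1/\sqrt{N}$ scaling, the DCT matrix is built explicitly by the hypothetical helper dctmat with the normalization $d_{0,M} = \sqrt{1/M}$, $d_{n,M} = \sqrt{2/M}$, and N = 16 is an arbitrary test length.

```matlab
% Check (4.8)-(4.9) for the first N/2 coefficients of a random real vector.
N = 16;
x = randn(N, 1);
dctmat = @(M) diag([sqrt(1/M); sqrt(2/M)*ones(M-1, 1)]) * ...
              cos(pi * (0:M-1).' * ((0:M-1) + 0.5) / M);   % hypothetical helper: M-point DCT matrix
E = diag([1/2; ones(N/4-1, 1)/(2*sqrt(2))]);
k = (0:N/4-1).';
z = x(2*k+2) + x(N-2*k);                   % z_k = x_{2k+1} + x_{N-2k-1} (one-based indexing)
w = (-1).^k .* (x(N-2*k) - x(2*k+2));      % w_k = (-1)^k (x_{N-2k-1} - x_{2k+1})
a = E * dctmat(N/4) * z;
b = E * dctmat(N/4) * w;
fe = fft(x(1:2:N-1)) / sqrt(N/2);          % F_{N/2} x^(e)
re = real(fe) / sqrt(2);
re(1:N/4)     = re(1:N/4) + a;             % (4.8), 0 <= n <= N/4-1
re(N/4+2:N/2) = re(N/4+2:N/2) - a(N/4:-1:2);   % (4.8), N/4+1 <= n <= N/2-1
im = imag(fe) / sqrt(2);
im(2:N/4)     = im(2:N/4) + b(N/4:-1:2);   % (4.9), 1 <= n <= N/4-1
im(N/4+1:N/2) = im(N/4+1:N/2) + b;         % (4.9), N/4 <= n <= N/2-1
y = fft(x) / sqrt(N);
err = norm(re + 1i*im - y(1:N/2));
disp(err)
```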

In addition, we need to change the code for DCTImpl so that it calls FFTImpl2
instead of FFTImpl. The following can now be shown:

Theorem 4.12 (Number of multiplications required by the revised FFT algorithm).
Let $M_N$ be the number of real multiplications required by the revised
algorithm of Theorem 4.11. Then we have that $M_N = O(\frac{2}{3} N \log_2 N)$.

This is a big reduction from the $O(2N \log_2 N)$ required by the FFT algorithm
from Theorem 4.1. We will not prove Theorem 4.12. Instead we will go through
the steps in a proof in Exercise 3. The revised FFT has an even bigger advantage
than the FFT when it comes to parallel computing: It splits the computation
into, not two FFT computations, but three computations (one of which is an
FFT, the other two DCTs). This makes it even easier to make use of many
cores on computers which have support for this.
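The constant $\frac{2}{3}$ can be observed numerically by iterating the recurrence $M_N = N + M_{N/2} + 2M_{N/4}$ from Exercise 3. The sketch below assumes the initial values $M_1 = M_2 = 0$, chosen only for illustration; the limit of the ratio does not depend on them, but the convergence is slow.

```matlab
% Iterate x_{r+2} = x_{r+1} + 2*x_r + 4*2^r, where x_r = M_{2^r}.
R = 40;
x = zeros(R+1, 1);                 % x(r+1) stores x_r; x_0 = x_1 = 0 are assumed initial values
for r = 0:R-2
    x(r+3) = x(r+2) + 2*x(r+1) + 4*2^r;
end
r = (1:R).';
ratio = x(2:end) ./ (r .* 2.^r);   % M_N / (N log2 N) for N = 2^r
disp(ratio([10 20 40]).')
```

The printed ratios increase towards $2/3$, with the difference shrinking roughly like $1/r$.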

Exercises for Section 4.2

Ex. 1. Write a function

function samples=DCTImpl(x)

which returns the DCT of the column vector $x \in \mathbb{R}^{2N}$ as a column vector. The
function should use the FFT-implementation from the previous section, and the
factorization $C = E^{-1}AFB$ from above. The function should not construct the
matrices $A$, $B$, $E$ explicitly.

Ex. 2. Explain why, if FFTImpl needs $M_N$ multiplications and $A_N$ additions,
then the number of multiplications and additions required by DCTImpl are $M_N + 2N$
and $A_N + N$, respectively.

Ex. 3. In this exercise we will compute the number of real multiplications
needed by the revised $N$-point FFT algorithm of Theorem 4.11, denoted $M_N$.
a. Explain from the algorithm of Theorem 4.11 that

$$M_N = 2(M_{N/4} + N/2) + M_{N/2} = N + M_{N/2} + 2M_{N/4}. \tag{4.10}$$

b. Explain why $x_r = M_{2^r}$ is the solution to the difference equation

$$x_{r+2} - x_{r+1} - 2x_r = 4 \times 2^r.$$

c. Show that the general solution to the difference equation is

$$x_r = \frac{2}{3} r 2^r + C2^r + D(-1)^r.$$

d. Explain why $M_N = O(\frac{2}{3} N \log_2 N)$ (you do not need to write down the
initial conditions for the difference equation in order to find the particular
solution to it).

4.3 Summary
We obtained an implementation of the DFT which is more efficient in terms of
the number of arithmetic operations than a direct implementation of the DFT.
We also showed that this could be used for obtaining an efficient implementation
of the DCT.
