FPGA Design and Implementation of Radix-2 Fast Fourier Transform Algorithm with 16 and 32 Points

Josue Saenz S., Juan J. Raygoza P., Edwin C. Becerra A.
Electronic and Computer Division, Universidad de Guadalajara, CUCEI, Guadalajara, México.

Susana Ortega Cisneros, Jorge Rivera Dominguez
Electronic Design Department, CINVESTAV, Unidad Guadalajara, Zapopan, México.

Conference Paper · November 2015
DOI: 10.1109/ROPEC.2015.7395113

978-1-4673-7121-6/15/$31.00 ©2015 IEEE

Abstract— The Fast Fourier Transform (FFT) is an important algorithm in the fields of Digital Signal Processing and Communication Systems. The FFT has applications in a wide variety of areas, such as linear filtering, correlation, and spectrum analysis, among many others. This paper describes the development of a decimation-in-time (DIT) radix-2 FFT with 16 and 32 points. VHDL was used as the hardware description language, and ISE Design Suite as the Integrated Development Environment (IDE).

Keywords—DIT, FFT, FPGA, Butterfly, VHDL.

I. INTRODUCTION

The Discrete Fourier Transform (DFT) is one of the most common operations in the field of signal processing. The algorithm that computes the DFT efficiently is known as the Fast Fourier Transform [1]. The FFT is widely used in many fields, such as communications, image processing, radar signal processing, sonar, biomedicine, physics, and mathematics [2]. Direct implementation of the DFT has $O(N^2)$ complexity, which the FFT reduces to $O(N \log_2 N)$. The Cooley-Tukey algorithm, usually called the Fast Fourier Transform, comprises a collection of algorithms for faster calculation of the DFT. The FFT decomposes a waveform into the sines and cosines at each frequency present in the original signal [3].

FPGA co-processors have become an extremely cost-effective means of off-loading computationally intensive algorithms in order to improve overall system performance. A major advantage of FPGAs built on advanced technology nodes is that they can achieve higher performance or throughput while offering more flexibility, shorter design time, and lower cost [3]. For this reason, FPGAs are becoming increasingly attractive for computationally intensive FFT-based complex processing applications, and an FPGA is used as the development platform in the present work.

II. ALGORITHM THEORY

A. Discrete Fourier Transform.

The Discrete Fourier Transform can be defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1 \qquad (1)$$

where $W_N = e^{-j2\pi/N}$.

The DFT is identical to samples of the Fourier Transform at equally spaced frequencies. Consequently, computing an N-point DFT corresponds to computing N samples of the Fourier Transform at the N equally spaced frequencies $\omega_k = 2\pi k/N$ [4, 5]. For a complex-valued sequence x(n) of N points, the DFT may be expressed as:

$$X_R(k) = \sum_{n=0}^{N-1} \left[ x_R(n)\cos\frac{2\pi kn}{N} + x_I(n)\sin\frac{2\pi kn}{N} \right] \qquad (2)$$

$$X_I(k) = -\sum_{n=0}^{N-1} \left[ x_R(n)\sin\frac{2\pi kn}{N} - x_I(n)\cos\frac{2\pi kn}{N} \right] \qquad (3)$$

where the subscripts R and I denote the real and imaginary parts of x(n) and X(k). The direct computation of (2) and (3) requires $2N^2$ evaluations of trigonometric functions, $4N^2$ real multiplications, $4N(N-1)$ real additions, and a number of indexing and addressing operations. These operations are typical of DFT computational algorithms [6].

B. Radix-2 Fast Fourier Transform.

Radix-2 algorithms are the simplest FFT algorithms. The decimation-in-time (DIT) radix-2 FFT splits the DFT computation into even- and odd-indexed input samples, each of which is processed by a shorter DFT [7]. This is shown in equation (4), where the first sum covers the even samples and the second sum the odd samples:

$$X(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_N^{2mk} + \sum_{m=0}^{N/2-1} x(2m+1)\, W_N^{(2m+1)k} \qquad (4)$$

Let us consider the computation of the $N = 2^v$ point DFT by means of the divide-and-conquer approach. We split the N-point data sequence into two N/2-point sequences, $f_1(n) = x(2n)$ and $f_2(n) = x(2n+1)$, corresponding to the even-numbered and odd-numbered samples of x(n), and denote their N/2-point DFTs by $F_1(k)$ and $F_2(k)$. Because $W_N^2 = W_{N/2}$, the first sum in (4) equals $F_1(k)$ and the second equals $W_N^k F_2(k)$. Since $F_1(k)$ and $F_2(k)$ are periodic with period N/2, we have $F_1(k+N/2) = F_1(k)$ and $F_2(k+N/2) = F_2(k)$. In addition, $W_N^{k+N/2} = -W_N^k$. Hence, (4) can be expressed as:
$$X(k) = F_1(k) + W_N^k F_2(k), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (5)$$

$$X(k + N/2) = F_1(k) - W_N^k F_2(k), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (6)$$

Applying the same decimation recursively to $F_1(k)$ and $F_2(k)$ eventually reduces the computation to one-point sequences [6, 8].

In order to be consistent with the common notation, we define:

$$G_1(k) = F_1(k), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (7)$$

$$G_2(k) = W_N^k F_2(k), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (8)$$

Then the DFT X(k) may be expressed as:

$$X(k) = G_1(k) + G_2(k), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (9)$$

$$X(k + N/2) = G_1(k) - G_2(k), \quad k = 0, 1, \ldots, N/2 - 1 \qquad (10)$$

Observe that the basic computation takes two complex numbers, the pair (a, b), multiplies b by the twiddle factor $W_N^k$, and then adds the product to a and subtracts it from a to form two new complex numbers (A, B). This basic computation, shown in Fig. 1, is called a butterfly.

Fig. 1. Basic butterfly in the DIT FFT algorithm.

The N-point DFT computation through the DIT FFT algorithm requires $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ complex additions.

III. DESIGN

For the Fast Fourier Transform implementation, the first challenge was that VHDL does not work “natively” with complex numbers. Therefore, a package was first created in which the type “complex” was defined, together with addition, subtraction, and multiplication operations for complex numbers.

A. Complex numbers and data types.

For the representation of complex numbers, a data type called “complex” was created, consisting of two registers: one for the real part and one for the imaginary part. Each register is declared as a 16-bit signed number. Similarly, two array types were defined: “signed_vector”, a vector of 16 components of 8-bit signed numbers, and “complex_array”, an array of 16 components of 32-bit complex numbers (16 bits each for the real and imaginary parts). In the 32-point FFT, the arrays were extended to 32 components.

B. Addition, Subtraction and Multiplication Functions.

In the same package, three functions were defined to perform the required FFT operations. The “add” function adds two complex numbers by adding the real and imaginary parts separately, taking advantage of the “complex” data type, which stores the two components in different registers. The “sub” function subtracts two complex numbers and operates in the same way as “add”.

The last function defined was “mult”, which multiplies two complex numbers. For $a = a_r + j a_i$ and $b = b_r + j b_i$, it computes:

$$\mathrm{real} = a_r b_r - a_i b_i \qquad (11)$$

$$\mathrm{imaginary} = a_r b_i + a_i b_r \qquad (12)$$

$$\mathrm{complex} = (\mathrm{real}, \mathrm{imaginary}) \qquad (13)$$

For the “mult” function, two facts were taken into account: the product of two 16-bit binary numbers is a 32-bit binary number, and the twiddle factor ($W_N$) is a fractional number. Therefore, only bits 29 to 14 of each multiplication result were kept, which maintains the 16-bit representation and allows the design to continue working with integer numbers. Because the twiddle factor carries 14 fractional bits (its format is given in Section III-C), discarding bits 13 to 0 removes the fractional part of the product.

C. FFT implementation.

The 16-point FFT takes “signed_vector” as its input data type, but the FFT butterfly blocks work with the “complex_array” data type; therefore, a component was created to convert the “signed_vector” input into a “complex_array”. Each 8-bit input is sign-extended to 16 bits: if the MSB is “1”, the 8-bit extension is filled with “11111111”, and if the MSB is “0”, it is filled with “00000000”. Because the input consists of real numbers, there is no imaginary part, so it is set to zero.

A bit-reversal permutation is applied to the complex output of this block, so that the converted inputs appear in the following order: s(0), s(8), s(4), s(12), s(2), s(10), s(6), s(14), s(1), s(9), s(5), s(13), s(3), s(11), s(7), s(15). To process the input, a component called “mariposa” (Spanish for “butterfly”) was created, based on the butterfly operation shown in Fig. 1. Each butterfly block receives two inputs; for example, the first block gets s(0) and s(8), the next block s(4) and s(12), and so on. For the 16-point FFT, four stages of butterflies are needed, as shown in Fig. 2.

(Figure: Real to Complex Converter; eight 2-point DFTs on the bit-reversed inputs S(0), S(8), ..., S(7), S(15); “Combine” blocks forming 4-point, 8-point, and 16-point DFTs; outputs Y(0) to Y(15).)
Fig. 2. 16-point DIT FFT algorithm block diagram.

Each butterfly stage uses a different combination of twiddle factors. Since the twiddle factors are constants, they were precalculated separately and stored in a constant of type “complex_array”. Each twiddle factor is a fractional number computed from the identity $W_N^k = e^{-j2\pi k/N} = \cos(2\pi k/N) - j\sin(2\pi k/N)$ and represented with 16-bit signed numbers: bit 15 is the sign, bit 14 the integer part, and bits 13 to 0 the fractional part.
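The datapath described in this section can be prototyped in software before it is written in VHDL. The Python model below is a minimal sketch under stated assumptions, not the authors' VHDL: twiddle factors use the 16-bit format above (14 fractional bits, called Q1.14 here), “mult” keeps bits 29 to 14 of each 32-bit result, and “add”/“sub” are assumed to wrap around on 16-bit overflow (the paper does not say how overflow is handled). The function names (to_q14, mult, butterfly, bit_reverse_order, fft_fixed, dft) are hypothetical; dft evaluates (1) directly in floating point as a reference.

```python
import cmath
import math

FRAC_BITS = 14  # assumed twiddle format: bit 15 sign, bit 14 integer, bits 13..0 fraction


def wrap16(v: int) -> int:
    """Keep the low 16 bits of v, reinterpreted as a two's-complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def to_q14(x: float) -> int:
    """Quantize a value in [-1, 1] to a 16-bit signed word with 14 fractional bits."""
    return wrap16(round(x * (1 << FRAC_BITS)))


def add(a, b):
    return wrap16(a[0] + b[0]), wrap16(a[1] + b[1])


def sub(a, b):
    return wrap16(a[0] - b[0]), wrap16(a[1] - b[1])


def mult(a, w):
    """Complex product, eqs. (11)-(12); keeps bits 29..14 of each 32-bit result."""
    real = a[0] * w[0] - a[1] * w[1]
    imag = a[0] * w[1] + a[1] * w[0]
    # Arithmetic shift right by 14 drops the fractional bits; wrap16 keeps 16 bits.
    return wrap16(real >> FRAC_BITS), wrap16(imag >> FRAC_BITS)


def butterfly(a, b, w):
    """DIT butterfly of Fig. 1: A = a + w*b, B = a - w*b (eqs. 9 and 10)."""
    t = mult(b, w)
    return add(a, t), sub(a, t)


def bit_reverse_order(n: int) -> list:
    """Input order after bit reversal, e.g. 0, 8, 4, 12, ... for n = 16."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def fft_fixed(samples):
    """Radix-2 DIT FFT of real 8-bit samples using 16-bit integer arithmetic."""
    n = len(samples)
    assert n > 1 and n & (n - 1) == 0, "N must be a power of two"
    # Real-to-complex converter: the signed value is kept, imaginary part is 0.
    data = [(wrap16(s), 0) for s in samples]
    data = [data[i] for i in bit_reverse_order(n)]
    # Precomputed twiddles W_N^k = cos(2*pi*k/N) - j*sin(2*pi*k/N).
    tw = [(to_q14(math.cos(2 * math.pi * k / n)), to_q14(-math.sin(2 * math.pi * k / n)))
          for k in range(n // 2)]
    size = 2
    while size <= n:  # log2(N) butterfly stages
        half, step = size // 2, n // size  # W_size^k equals W_N^(k*step)
        for start in range(0, n, size):
            for k in range(half):
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], tw[k * step])
        size *= 2
    return data


def dft(x):
    """Direct O(N^2) evaluation of eq. (1) in floating point, for reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

As an illustration, x(n) = 5(n + 1) for n = 0, ..., 15 is an assumed input: the paper does not list its test sequence, but this ramp reproduces the MATLAB values of Table I (dft gives 680 for X(0) and about -40 + j201.09 for X(1)). With it, fft_fixed returns exactly (680, 0) for X(0) and integer approximations of the other bins that stay within a few units of the floating-point result, the error coming from truncating the products.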
The twiddle factor is multiplied by the butterfly inputs, and because of the way the “mult” function was constructed, the result of the multiplication keeps only the integer part. In this way, the design continues to work with integers, while the fractional precision of the twiddle factor still improves the accuracy of the result.

In Fig. 3, the 16-point radix-2 FFT algorithm diagram is shown, indicating the $W_N$ used in each butterfly stage.

Fig. 3. 16-point DIT FFT algorithm diagram.

In Fig. 4, the RTL diagram generated by ISE Design Suite is shown: the first component (from left to right) is the real-to-complex converter, and the following components are the butterfly stages. Eight butterfly blocks per stage were used, corresponding to the FFT diagram in Fig. 2.

(Figure blocks: Real to Complex Converter, 1st Stage, 2nd Stage, 3rd Stage, 4th Stage.)
Fig. 4. 16-point FFT algorithm implemented RTL diagram.

For brevity, only the 16-point FFT implementation is explained in detail; the procedure for the 32-point FFT is similar. The RTL (Register-Transfer Level) diagram for the 32-point FFT is shown in Fig. 5; this algorithm needs five butterfly stages.

(Figure blocks: Real to Complex Converter, 1st Stage, 2nd Stage, 3rd Stage, 4th Stage, 5th Stage.)
Fig. 5. 32-point FFT algorithm implemented RTL diagram.

A problem that had to be solved in the FFT implementation was that the usage of bonded IOBs (Input/Output Blocks) was too high. To decrease it, shift registers were added to the inputs and outputs: SIPO (Serial In, Parallel Out) shift registers for the serial input and PISO (Parallel In, Serial Out) shift registers for the serial output. Fig. 6 shows the RTL diagram of the FFT component with its input and output registers. This approach was applied in both the 16-point and the 32-point implementations.

Fig. 6. FFT RTL diagram with input and output registers.
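The trade-off between pins and clock cycles can be illustrated with a behavioural model of the two register types. The Python sketch below assumes one bit per clock edge and most-significant-bit-first shifting; the paper does not specify the bit order or how many serial lines were used, and the class and helper names are hypothetical. It packs words into a PISO register, shifts them out one bit per clock, and collects them in a SIPO register.

```python
def to_word(value: int, bits: int) -> int:
    """Two's-complement encoding of a signed value in `bits` bits."""
    return value & ((1 << bits) - 1)


def from_word(word: int, bits: int) -> int:
    """Decode a `bits`-wide two's-complement word back to a signed value."""
    return word - (1 << bits) if word >> (bits - 1) else word


class PISO:
    """Parallel-in, serial-out shift register that emits its MSB first."""

    def __init__(self, width: int):
        self.width, self.reg = width, 0

    def load(self, words, bits: int) -> None:
        # Concatenate the words, first word in the most significant position.
        self.reg = 0
        for w in words:
            self.reg = (self.reg << bits) | to_word(w, bits)

    def clock(self) -> int:
        bit = (self.reg >> (self.width - 1)) & 1
        self.reg = (self.reg << 1) & ((1 << self.width) - 1)
        return bit


class SIPO:
    """Serial-in, parallel-out shift register; new bits enter at the LSB."""

    def __init__(self, width: int):
        self.width, self.reg = width, 0

    def clock(self, bit: int) -> None:
        self.reg = ((self.reg << 1) | (bit & 1)) & ((1 << self.width) - 1)

    def words(self, bits: int) -> list:
        count = self.width // bits
        mask = (1 << bits) - 1
        return [from_word((self.reg >> ((count - 1 - i) * bits)) & mask, bits)
                for i in range(count)]


def serial_transfer(words, bits: int) -> tuple:
    """Shift `words` through PISO -> SIPO; return received words and clock count."""
    width = len(words) * bits
    piso, sipo = PISO(width), SIPO(width)
    piso.load(words, bits)
    for _ in range(width):
        sipo.clock(piso.clock())
    return sipo.words(bits), width
```

Under the single-line assumption, sixteen 8-bit inputs need 128 clock cycles instead of 128 parallel pins; at the 50 MHz clock reported for Fig. 11 (20 ns per bit) that would take 2.56 µs, which is the price paid for the lower IOB count.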
IV. SIMULATION AND RESULTS

The implemented design was tested with a staircase (“ladder form”) signal of signed numbers. The processing latency is 39 ns for the 16-point FFT implementation and 52.3 ns for the 32-point FFT implementation.

Fig. 7 shows the result of the 16-point FFT simulation, displayed in binary radix. In Fig. 7, mark number 1 is the binary representation of the real input and mark number 2 is the binary complex output. Each sample has two binary numbers separated by a comma: the left number is the real part and the right number the imaginary part of the complex output. Fig. 8 shows the same simulation displayed in signed-number radix, where mark number 1 is the signed input and mark number 2 is the complex output. These results were compared with those of a 16-point FFT computed in MATLAB with the same input sequence; the MATLAB result is shown in Table I. Fig. 9 and Fig. 10 are the counterparts of Fig. 7 and Fig. 8, respectively, for the 32-point simulation, and Table III shows the 32-point FFT results obtained in MATLAB.

Fig. 11 and Fig. 12 show the serial output obtained by adding shift registers to the 16-point FFT design in order to reduce IOB utilization. In Fig. 11, the real part of the output is marked by number 5 in serial form, and a bit can be sampled every 20 ns. Mark number 1 is the “ctrl” signal used to load data into the SIPO shift registers, mark number 2 is the input data entering as a serial signal, mark number 3 is the clock signal used to sample the output at 50 MHz, and mark number 4 is the “load” signal used to release the output. In Fig. 12, the imaginary part of the output is marked by number 1; it has the same serial representation as its real counterpart. In the 32-point FFT, twice as many shift registers were implemented, because the number of samples, and therefore the number of values to serialize, doubles.

Table II shows the utilization of the 16-point FFT with parallel inputs and outputs, and Table IV the utilization with serial inputs and outputs. Compared with Table II, Table IV shows that the number of bonded IOBs decreases while the number of LUTs increases because of the added registers. As a result, the FFT algorithm can be implemented with a higher number of samples without exceeding the Virtex 6 resources. Tables II and IV also show the 32-point FFT utilization with parallel and with serial inputs and outputs, respectively, and the same behavior as in the 16-point FFT can be seen. Table II also shows that utilization at least doubles when the number of samples is doubled: bonded IOBs double exactly, while slices and DSP48E1 blocks grow by a factor of 2.5.
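The IOB and DSP48E1 figures in Table II and the latency values can be cross-checked with simple counting. The Python sketch below rests on assumptions the paper does not state explicitly: every data bit of the parallel interface occupies one bonded IOB (clock and control pins are ignored), each complex multiplication of (11) and (12) uses four DSP48E1 blocks, one per real product, with all $(N/2)\log_2 N$ multiplications instantiated, and the butterfly network is combinational, so the highest operating frequency is the reciprocal of the latency. The function names are hypothetical.

```python
def parallel_iobs(n: int, in_bits: int = 8, out_bits: int = 32) -> int:
    """One pin per bit: n 8-bit real inputs plus n 32-bit complex outputs."""
    return n * in_bits + n * out_bits


def complex_mults(n: int) -> int:
    """(N/2) * log2(N) butterflies, one complex multiplication each."""
    return (n // 2) * (n.bit_length() - 1)


def dsp48e1_blocks(n: int) -> int:
    """Four real multipliers per complex multiplication, eqs. (11)-(12)."""
    return 4 * complex_mults(n)


def fmax_mhz(latency_ns: float) -> float:
    """Highest clock frequency of a purely combinational path, in MHz."""
    return 1000.0 / latency_ns


for n, latency in ((16, 39.0), (32, 52.3)):
    print(f"N={n}: {parallel_iobs(n)} IOBs, {dsp48e1_blocks(n)} DSP48E1s, "
          f"{fmax_mhz(latency):.1f} MHz")
```

Under these assumptions the script prints 640 and 1280 IOBs, 128 and 320 DSP48E1s, and about 25.6 and 19.1 MHz, consistent with Table II and with the 25 MHz and 19 MHz operating frequencies reported in the conclusions. The ratio (32·5)/(16·4) = 2.5 also explains why slice and DSP48E1 usage grow by more than a factor of two when N doubles, whereas the IOB count, which is linear in N, grows by exactly two.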

Fig. 7. 16-point FFT simulation binary results.

Fig. 8. 16-point FFT simulation signed numbers results.

Fig. 9. 32-point FFT simulation binary results.

Fig. 10. 32-point FFT simulation signed numbers results.

Fig. 11. Serial real output of 16-point FFT algorithm.


TABLE II. 16 AND 32-POINT FFT ALGORITHM VIRTEX 6 UTILIZATION.
Device Utilization Summary
Logic Utilization                    Used (16 Pt. / 32 Pt.)   Utilization (16 Pt. / 32 Pt.)
Number of Slice                      2048 / 5120              0% / 1%
Number of fully used LUT-FF pairs    0 / 0                    0% / 0%
Number of bonded IOBs                640 / 1280               53% / 106%
Number of DSP48E1s                   128 / 320                14% / 37%

Fig. 12. Serial imaginary output of 16-point FFT algorithm.

TABLE I. 16-POINT FFT ALGORITHM RESULT IN MATLAB.
Output   Value
y(0)     680.000000000000 + 0.00000000000000i
y(1)     -40.0000000000000 + 201.093579685034i
y(2)     -40.0000000000000 + 96.5685424949238i
y(3)     -40.0000000000000 + 59.8642305066196i
y(4)     -40.0000000000000 + 40.0000000000000i
y(5)     -40.0000000000000 + 26.7271455167720i
y(6)     -40.0000000000000 + 16.5685424949238i
y(7)     -40.0000000000000 + 7.95649469518632i
y(8)     -40.0000000000000 + 0.00000000000000i
y(9)     -40.0000000000000 - 7.95649469518631i
y(10)    -40.0000000000000 - 16.5685424949238i
y(11)    -40.0000000000000 - 26.7271455167720i
y(12)    -40.0000000000000 - 40.0000000000000i
y(13)    -40.0000000000000 - 59.8642305066196i
y(14)    -40.0000000000000 - 96.5685424949238i
y(15)    -40.0000000000000 - 201.093579685034i

TABLE III. 32-POINT FFT ALGORITHM RESULT IN MATLAB.
Output   Value
y(0)     80.0000000000000 + 0.00000000000000i
y(4)     -80.0000000000000 + 193.137084989848i
y(6)     -80.0000000000000 + 119.728461013239i
y(8)     -80.0000000000000 + 80.0000000000000i
y(11)    -80.0000000000000 + 42.7608908760633i
y(13)    -80.0000000000000 + 24.2677346885874i
y(14)    -80.0000000000000 + 15.9129893903726i
y(16)    -80.0000000000000 + 0.00000000000000i
y(18)    -80.0000000000000 - 15.9129893903726i
y(19)    -80.0000000000000 - 24.2677346885874i
y(20)    -80.0000000000000 - 33.1370849898476i
y(23)    -80.0000000000000 - 65.6543032662929i
y(24)    -80.0000000000000 - 80.0000000000000i
y(26)    -80.0000000000000 - 119.728461013239i
y(29)    -80.0000000000000 - 263.724656715066i
y(30)    -80.0000000000000 - 402.187159370068i

TABLE IV. 16 AND 32-POINT FFT ALGORITHM VIRTEX 6 UTILIZATION WITH REGISTERS.
Device Utilization Summary
Logic Utilization                    Used (16 Pt. / 32 Pt.)   Utilization (16 Pt. / 32 Pt.)
Number of Slice                      2402 / 5620              0% / 1%
Number of fully used LUT-FF pairs    295 / 545                12% / 9%
Number of bonded IOBs                51 / 99                  4% / 8%
Number of DSP48E1s                   128 / 320                14% / 37%

V. CONCLUSIONS

This paper explores the design of the Fast Fourier Transform in VHDL for implementation on an FPGA. The FFT reduces computation time compared with a directly applied DFT, which is why it is widely used in many fields today. This paper shows a technique for implementing the complex-number arithmetic needed to perform FFT operations in VHDL. Compared with other proposed circuits, such as the one in [5], the circuit presented in this paper uses less area. Parallel data input and output, however, increase the number of I/O blocks used, and the 32-point algorithm exceeds the I/O blocks available in a Virtex 6; for this reason, PISO and SIPO shift registers are used to serialize the data input and output, decreasing I/O block usage. In addition, the operating frequency is 25 MHz for the 16-point algorithm and 19 MHz for the 32-point algorithm.

Future work may include increased accuracy through a floating-point unit that performs addition, subtraction, and multiplication; replacement of the shift registers with a RAM; an increase in the number of samples; and the exploration of algorithms with other radices.

REFERENCES

[1] S. Qadeer and M. Z. A. Khan, “A Radix-2 DIT FFT with reduced arithmetic complexity,” International Conference on Advances in Computing, Communications and Informatics (ICACCI), 24-27 Sept. 2014, pp. 1892-1896.
[2] X. Sun and D. Qiu, “An Implementation of FFT Processor,” Radar Conference, 14-16 April 2013, pp. 1-4.
[3] S. K. Shome and A. Ahesh, “Architectural design of a highly programmable Radix-2 FFT processor with efficient addressing logic,” International Conference on Devices, Circuits and Systems (ICDCS), 15-16 March 2012, pp. 516-521.
[4] M. Aravind and K. Manjunatha, “Complex-multiplier implementation for pipelined FFTs in FPGAs,” International Conference on Signal Processing and Communication Engineering Systems (SPACES), 2-3 Jan. 2015, pp. 137-141.
[5] A. Haveliya, “Design and Simulation of 32-Point FFT Using Radix-2 Algorithm for FPGA Implementation,” Second International Conference on Advanced Computing & Communication Technologies (ACCT), 7-8 Jan. 2012, pp. 167-171.
[6] J. G. Proakis and D. G. Manolakis, Tratamiento Digital de Señales, 4th ed. Madrid: Pearson Educación S.A., 2007, pp. 458-473.
[7] R. Bhakthavatchalu and A. Kripalal, “Modified FPGA based design and implementation of reconfigurable FFT architecture,” International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), 22-23 March 2013, pp. 818-822.
[8] E. Joseph and A. Rajagopal, “FPGA implementation of Radix-2 FFT processor based on Radix-4 CORDIC,” Nirma University International Conference on Engineering (NUiCONE), 6-8 Dec. 2012, pp. 1-6.
