Design and Implementation of A 1024-Point
Abstract: To meet the need for high-speed, real-time signal processing, a Fast Fourier Transform (FFT) processor is designed. A 1024-point, 32-bit, fixed-point, complex FFT processor is implemented on a field programmable gate array (FPGA) using the radix-2 decimation-in-frequency (DIF) algorithm, a pipeline structure in the butterfly module and ping-pong operation in the data storage unit. With a 100 MHz primary clock, a 1024-point FFT takes about 62.95 µs, which is fast enough for processing high-speed, real-time signals. The result provides a reference for applying the FFT algorithm to the adaptive dynamic filter of an ultrasonic diagnostic system and to an ultrasonic Doppler flow measurement system.

Key words: field programmable gate array; 1024-point FFT; butterfly; ping-pong operation; Verilog HDL

I. INTRODUCTION

FFT (Fast Fourier Transform) is a fast algorithm for computing the DFT (Discrete Fourier Transform). The appearance of the FFT made real-time digital signal processing practical, and it is now used in many fields such as ultrasound, radar and communications. In ultrasonic echo signal processing the FFT is one of the most important parts: its computing speed and accuracy directly influence the performance of the system. The FFT is the core of Doppler blood flow spectrum analysis and the most commonly used signal processing method in Doppler imaging systems.

Although Altera and Xilinx have developed corresponding FFT IP (Intellectual Property) cores, their price is so high that they are not widely used. As a result, many researchers design their own FFT processors tailored to their applications. In engineering practice, the FFT is most commonly implemented in hardware on a DSP (Digital Signal Processor), a dedicated FFT chip or an FPGA [1]. Running the FFT on a DSP not only takes much more time but also reduces the throughput of the whole system, and the flexibility of DSP software implementations is limited. A dedicated FFT chip can meet the speed requirement but has poor expandability. With the rapid development of FPGAs in recent years, the FPGA, compared with DSP techniques, has become well suited to high-speed signal processing systems owing to its parallel processing architecture.

Combining the flexible programming of software with the high speed of dedicated integrated circuits, the FPGA is well suited to the FFT algorithm and has advantages in performance, cost and power consumption. FPGA designs are realized with EDA (Electronic Design Automation) software and a hardware description language. In modern signal processing, high speed and high reliability have become a research hot spot. Furthermore, many researchers are combining the real-time requirements of the FFT with the design flexibility of the FPGA, seeking an optimal configuration of parallel algorithm and hardware structure and improving FFT processing speed.

According to the actual requirements, this paper presents a method that implements the FFT on an FPGA. Synthesis and simulation are carried out in Quartus II 6.0 and Modelsim 6.0. Although only 1024-point fixed-point arithmetic is implemented here, the method can also be applied to $2^M$-point (M = 2, 3, …) transforms.

II. THE BASIC PRINCIPLE OF THE RADIX-2 DIF FFT ALGORITHM

FFT algorithms fall into two basic types: decimation-in-time (DIT) and decimation-in-frequency (DIF). The radix-2 DIF algorithm is adopted in this paper. The discrete Fourier transform of a sequence x(n) is

$$X(k)=\sum_{n=0}^{N-1}x(n)\,W_N^{kn},\qquad k,n=0,1,\dots,N-1,\qquad W_N=e^{-j2\pi/N}$$

where N is the sequence length. In the DIF algorithm the input x(n) is split into its first and second halves, which divides the output X(k) into its even-indexed and odd-indexed parts, each of which is an N/2-point DFT. In the same way, each N/2-point DFT is further divided into N/4-point DFTs, then N/8-point DFTs, and finally into 2-point DFTs [1]. Figure 1 shows the decimation-in-frequency operation diagram for N = 8.
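The radix-2 DIF decomposition can be checked numerically. The sketch below is a floating-point Python model, not the fixed-point Verilog design: at each stage it forms the sums $x(n)+x(n+N/2)$ that feed $X(2r)=\sum_{n=0}^{N/2-1}[x(n)+x(n+N/2)]W_{N/2}^{rn}$ and the twiddled differences that feed $X(2r+1)=\sum_{n=0}^{N/2-1}[x(n)-x(n+N/2)]W_N^{n}W_{N/2}^{rn}$, then compares the result with the DFT definition above. The function names `dft`, `bit_reverse` and `dif_fft` are illustrative and not taken from the paper.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) W_N^(kn), used as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft(x):
    """Iterative radix-2 decimation-in-frequency FFT for N = 2^M points."""
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = n // 2                       # distance between the two butterfly inputs
    while span >= 1:
        stride = n // (2 * span)        # twiddle exponent step at this stage
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j * stride / n)   # W_N^(j*stride)
                top, bot = a[start + j], a[start + j + span]
                a[start + j] = top + bot                # feeds the even-indexed X(k)
                a[start + j + span] = (top - bot) * w   # feeds the odd-indexed X(k)
        span //= 2
    # The DIF flow graph leaves X(k) at position bit_reverse(k); reorder it.
    bits = n.bit_length() - 1
    return [a[bit_reverse(k, bits)] for k in range(n)]
```

For N = 8, `dif_fft` runs the three stages of four butterflies each shown in Figure 1; for N = 1024 it runs ten stages of 512 butterflies.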
Supported by the National Science and Technology Pillar Program (2012BAI13B02), the Fundamental Research Funds for the Central Universities and the PUMC Youth Fund
typical Ping-pong operation module [7-8]. RAM 1 and RAM 2 store the intermediate data: one RAM is used as the read block while the other is used as the write block. In the first FFT stage, data read from RAM 1 pass through the butterfly unit and are saved into RAM 2. The second stage does the opposite, and the two RAMs keep alternating in this way until the tenth stage has been completed. Figure 5 is the Ping-pong operation diagram.

The RAM adopted is the True Dual-Port Memory module. Read and write operations at arbitrary addresses can be performed on each port under an independent clock. This means that data can be read out of the RAMs for the butterfly operation while the results are written into the RAMs at the same time. Compared with a single-port RAM, this greatly reduces the read and write cycles.

With dual-port RAM, while one storage element is being read out, it can also save data from the previous stage. The pipeline structure suits this FFT architecture and allows signal samples to be processed continuously. The intermediate data are saved in a dual-port RAM. Two complex data from the RAM are then split into four real data and fed into the butterfly unit. After the butterfly operation, the data enter the selecting buffer, which recombines them into two complex data for output. In this way the serial-to-parallel and parallel-to-serial conversions are realized [4].

operation address generation procedure for the FFT is completed.

The logic of the address generator consists of the upper-node and lower-node address logic and the twiddle factor address logic, which produce A_address, B_address and rom_address respectively.

E. The sequential control unit

The sequential control unit generates the control signals for each module. Its main task is to coordinate the work of the modules and thereby complete the whole FFT procedure. Because many processing elements are involved in the FFT and the control is complex, this unit is modeled as a Mealy finite state machine.

The function of the sequential control unit is as follows. First, the machine enters the initial state, and when the external reset signal is valid, all functional modules are reset. When the enable signal is valid, the input control unit enters the input state and generates the enable signal to receive the data to be transformed. Next comes the butterfly operation state. When the fixed-point computation is completed, the machine enters the output control state and sends out the output enable signal. While the intermediate results are read from the RAM, the completed signal is generated once all the data have been output, and the machine returns to the initial state. This is followed by a delay operation in preparation for the next transform.
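The ping-pong storage, the address generator and the four-real butterfly described above can be modelled together in a few lines. This is a floating-point behavioural sketch under stated assumptions, not the paper's RTL: `stage_addresses` is a hypothetical stand-in for the generator of A_address, B_address and rom_address; each stage reads from one list (RAM 1 or RAM 2) and writes the butterfly results to the same addresses in the other; and the final bit-reversed read-out is an assumption, since the paper does not describe how results are reordered. Fixed-point scaling and rounding are omitted.

```python
import cmath
import math


def twiddle_rom(n):
    """Twiddle table: entry k holds W_N^k = cos(2*pi*k/N) - j*sin(2*pi*k/N) as (re, im)."""
    return [(math.cos(2 * math.pi * k / n), -math.sin(2 * math.pi * k / n))
            for k in range(n // 2)]


def stage_addresses(n, stage):
    """Yield (a_address, b_address, rom_address) for each butterfly of one DIF stage.

    Stage 0 pairs points N/2 apart; the distance halves at every later stage.
    """
    span = n >> (stage + 1)
    for start in range(0, n, 2 * span):
        for j in range(span):
            yield start + j, start + j + span, j << stage


def butterfly(ar, ai, br, bi, wr, wi):
    """DIF butterfly on the four real inputs Re/Im(A), Re/Im(B); returns four reals."""
    ur, ui = ar + br, ai + bi            # upper output: A + B
    dr, di = ar - br, ai - bi            # A - B
    lr = dr * wr - di * wi               # lower output: (A - B) * W, real part
    li = dr * wi + di * wr               # lower output, imaginary part
    return ur, ui, lr, li


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def pingpong_fft(x):
    """Run every DIF stage between two buffers that swap read/write roles."""
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("length must be a power of two >= 2")
    rom = twiddle_rom(n)
    ram = [[complex(v) for v in x], [0j] * n]   # ram[0] = RAM 1 (input), ram[1] = RAM 2
    src = 0
    for stage in range(stages):
        dst = 1 - src
        for a_addr, b_addr, rom_addr in stage_addresses(n, stage):
            a, b = ram[src][a_addr], ram[src][b_addr]
            ur, ui, lr, li = butterfly(a.real, a.imag, b.real, b.imag, *rom[rom_addr])
            ram[dst][a_addr] = complex(ur, ui)   # results go to the same addresses
            ram[dst][b_addr] = complex(lr, li)   # in the other RAM
        src = dst                                # ping-pong: swap read/write RAMs
    # Results sit in ram[src] in bit-reversed order; read them out in natural order.
    return [ram[src][bit_reverse(k, stages)] for k in range(n)]
```

For N = 1024 the loop runs ten stages, so in this model the final results land back in RAM 1 after the tenth swap. Changing N only changes the RAM depth, the ROM size and the number of stages, which is the kind of extension mentioned in the conclusions.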
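The control sequence of the sequential control unit can be sketched as a Mealy machine in Python. The state names, the signal names (`reset_all`, `in_en`, `out_en`, `done`) and the input flags are hypothetical; in particular, placing the delay state between the output state and the return to the initial state is an assumption, because the description above leaves its exact position open.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # initial state
    INPUT = auto()       # receiving the samples to be transformed
    BUTTERFLY = auto()   # butterfly operation stages
    OUTPUT = auto()      # output control: results are streamed out
    DELAY = auto()       # wait before the next transform (placement assumed)


def step(state, *, reset=False, enable=False, input_done=False,
         calc_done=False, output_done=False):
    """Advance the controller by one clock; return (next_state, outputs).

    As in a Mealy machine, the outputs are a function of the current state
    and the current inputs, so e.g. in_en rises in the same cycle as enable.
    """
    out = {"reset_all": False, "in_en": False, "out_en": False, "done": False}
    if reset:                                   # external reset: reset every module
        out["reset_all"] = True
        return State.IDLE, out
    if state is State.IDLE:
        if enable:
            out["in_en"] = True
            return State.INPUT, out
        return State.IDLE, out
    if state is State.INPUT:
        if input_done:                          # all N samples received
            return State.BUTTERFLY, out
        out["in_en"] = True
        return State.INPUT, out
    if state is State.BUTTERFLY:
        if calc_done:                           # last butterfly stage finished
            out["out_en"] = True
            return State.OUTPUT, out
        return State.BUTTERFLY, out
    if state is State.OUTPUT:
        if output_done:                         # every result has been read out
            out["done"] = True
            return State.DELAY, out
        out["out_en"] = True
        return State.OUTPUT, out
    return State.IDLE, out                      # DELAY: back to the initial state
```

A testbench would call `step` once per clock, feeding it the counters' completion flags; a Moore variant would instead derive the outputs from the state alone.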
IV. THE WAVEFORM SIMULATION VERIFICATION AND THE SEQUENTIAL ANALYSIS

Each functional module has been analyzed and implemented as described above. The FFT processor is synthesized with the top-level module shown in Figure 6.
Fig.7 The compilation result in Quartus II 6.0

From Figure 7, we can see that only 2% of the total logic elements and 15% of the total memory bits are used. This economical use of logic elements meets the system requirement of rational utilization of FPGA resources.

To verify the accuracy of the design, the system is simulated digitally in Modelsim 6.0 [11], as shown in Figure 8. The output is valid when the signal rdv is high. With a 100 MHz clock, a 1024-point, 32-bit, fixed-point complex FFT takes 62.95 µs.

The algorithm is verified with MATLAB 7.0. The input signal is $x(t)=20000\sin(2\pi\cdot 10t)$. The output data from Modelsim 6.0 are saved in a .txt file and imported into MATLAB, where they are converted from binary two's complement code to signed decimal. The comparison between the Modelsim simulation data and the FFT computed by MATLAB is shown in Figure 9.
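The post-processing described above (decoding the Modelsim dump and comparing it with MATLAB) might look as follows in Python. The file layout is an assumption: one 32-bit word per line, written as a binary string with the 16-bit real part in the upper half and the 16-bit imaginary part in the lower half. The names `twos_complement_to_int`, `parse_word`, `read_outputs` and `average_relative_error` are hypothetical; the last one computes the mean of |A(i) - B(i)| / |B(i)| with B as the reference, the form of metric used in the statistical error analysis.

```python
def twos_complement_to_int(bits):
    """Interpret a binary string (MSB first) as a signed two's-complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":                  # sign bit set: subtract 2^width
        value -= 1 << len(bits)
    return value


def parse_word(word, width=16):
    """Decode one output word: real part in the upper half, imaginary in the lower (assumed)."""
    word = word.strip()
    if len(word) != 2 * width:
        raise ValueError(f"expected {2 * width} bits, got {len(word)}")
    return complex(twos_complement_to_int(word[:width]),
                   twos_complement_to_int(word[width:]))


def read_outputs(lines, width=16):
    """Decode every non-empty line of the text dump."""
    return [parse_word(line, width) for line in lines if line.strip()]


def average_relative_error(a, b):
    """Mean of |A(i) - B(i)| / |B(i)|; B is the reference and must be non-zero."""
    if len(a) != len(b):
        raise ValueError("sequences differ in length")
    return sum(abs(x - y) / abs(y) for x, y in zip(a, b)) / len(b)
```

Usage would be along the lines of `with open("fft_out.txt") as f: hw = read_outputs(f)` (the file name is illustrative), followed by `average_relative_error(hw, reference)`. Points where the reference value is exactly zero make the relative error undefined and would need separate handling.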
[Figure 9 plot: real and imag parts of the MATLAB FFT (top panels) and of the Modelsim FFT processor output, "modelsim real" and "modelsim imag" (bottom panels), each plotted against f (Hz) from 0 to 500 Hz]
Fig.9 Comparison between the FFT processor result and the FFT functional computational result
From Figure 9, we can see that the results of the FFT processor and the results computed by MATLAB are basically the same, which verifies the validity of the algorithm. A statistical error analysis between the 1024-point Modelsim simulation results and the test data is performed on the MATLAB platform [12-13]. The total average relative error is

$$\frac{1}{1024}\sum_{i=1}^{1024}\left|\frac{A(i)-B(i)}{B(i)}\right|\approx 0.0062=0.62\%$$

where A(i) and B(i) are the i-th Modelsim and MATLAB values respectively. The deviation between the two platforms is therefore very small, and the fixed-point FPGA-based FFT processor operates within the allowable error range.

V. CONCLUSIONS

Compared with the theoretical values, the errors of the simulation results are quite small. Making use of the abundant logic resources of the FPGA is the main advantage of this system. The pipeline architecture and the Ping-pong operation are built on the embedded M4K memory blocks and together complete the high-speed fixed-point FFT operation. With a 100 MHz clock, a 1024-point FFT takes only 62.95 µs, which meets the high processing-speed requirement. As FPGA resources continue to grow, the advantage of implementing the FFT with this structure becomes increasingly significant. Furthermore, this structure makes it easy to extend the FFT length by increasing the depth of the RAM and ROM, changing the twiddle factor table and adding more ping-pong stages.
In the design of the dynamic filter in traditional ultrasonic imaging, the center frequency is determined by considering only that the frequency attenuates as depth increases, while the nonlinear relationship between depth and frequency is neglected. Assuming a fixed attenuation coefficient and a frequency that changes linearly from the near field to the far field leads to differences between the frequency characteristics of real echo signals and the theoretical model, which weakens the matching effect of the dynamic filter. The FFT processor designed in this paper can analyze the frequency characteristics of each short segment of the echoes in real time and determine the dynamic filter coefficients according to the measured center frequency. In this way the matched filter achieves its optimal effect and the signal-to-noise ratio is raised. Such a self-adaptive dynamic filter is very useful for digital ultrasonography and ultrasonic Doppler flow measurement systems.
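The segment-wise analysis described above can be sketched as: split the echo into short segments, transform each one, and take a center frequency per segment, from which the dynamic filter coefficients would then be chosen. The power-weighted mean frequency used here is only one possible definition of center frequency, an assumption since the paper does not specify one; the segment length, the sampling rate and the names `fft` and `center_frequencies` are illustrative, and a compact recursive radix-2 FFT stands in for the hardware processor.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT (length must be a power of two); stand-in for the processor."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def center_frequencies(echo, fs, seg_len=64):
    """Power-weighted mean frequency (assumed definition) of each seg_len-sample segment."""
    centers = []
    half = seg_len // 2
    freqs = [k * fs / seg_len for k in range(1, half)]      # skip DC and Nyquist bins
    for start in range(0, len(echo) - seg_len + 1, seg_len):
        spec = fft(echo[start:start + seg_len])
        power = [abs(spec[k]) ** 2 for k in range(1, half)]
        total = sum(power)
        centers.append(sum(f * p for f, p in zip(freqs, power)) / total if total else 0.0)
    return centers
```

Each returned value would index into a set of precomputed filter coefficients (not shown), so that deeper segments with a lower center frequency are matched by a lower-frequency filter.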
REFERENCES
[1] Hu Guangshu. Digital signal processing: theoretical algorithm and realization [M]. Peking: Tsinghua University Press, 2005.
[2] Chu Chao, Zhang Qin, Xie Ying-ke, et al. Design of a high performance FFT processor based on FPGA [C]. Proceedings of the 2005 Asia and South Pacific Design Automation Conference, Shanghai, January 18-21, 2005.
[3] Uwe Meyer-Baese. Digital signal processing based on FPGA (second edition) [M]. Peking: Tsinghua University Press, 2006.
[4] Zhu Binglian, Liu Xuegang. FPGA Implementation of Pipelined FFT [J]. Journal of Chongqing University, 2004, 27(9): 33-36.
[5] Sun Fei, Zhou Ning, Sun Yannan, et al. Hardwire Logic Implementation of an 8-point FFT Algorithm [J]. Microelectronics and Computer, 2002, 19(11): 5-17.
[6] Ren Bingyu, Zhan Yinwei. Design of 64-point FFT Processor Based on FPGA [J]. Modern Electronics Technique, 2009, 32(14): 1-3.
[7] Liu Guodong, Chen Boxiao, Chen Duofang. Design of FFT Processor based on FPGA [J]. Aeronautical Computer Technique, 2004, 34(3): 101-104.
[8] Xia Yuwen. Digital System Design Tutorial [M]. Peking: Beijing University of Aeronautics and Astronautics Press, 2004.1.
[9] Ma Qiang. Implementation of fast Fourier transform on FPGA [D]. Nanjing University of Science and Technology, 2005.
[10] Zhang Yu, Fang Kangling. Design of General FFT Processor Based on FPGA [J]. Computer Technology and Development, 2010, 20(8): 87-90.
[11] Qiao Lufeng. Verilog HDL digital system design and verification [M]. Peking: Publishing House of Electronics Industry, 2009.4.
[12] Shousheng He, Mats Torkelson. Design and Implementation of a 1024-point Pipeline FFT Processor [C]. Custom Integrated Circuits Conference, 1998.
[13] Levent Aksoy, Ece Olcay Gunes. Area optimization algorithms in high-speed digital FIR filter synthesis [C] // SBCCI 2008. New York: ACM, 2008: 64-69.