Li 2009
Fig. 1. Radix-$2^2$ based parallel FFT algorithm data flow
Fig. 3. The parallel Radix-$2^2$ based pipeline architecture
1 and 2, stages 3 and 4, and stages 5 and 6 to three common controller blocks. These common controller blocks all have the structure shown in Figure 4. Therefore, the whole parallel architecture can also be divided into the first three common controller blocks, the last block, and the arithmetic blocks. The arithmetic blocks are composed of five ROMs and complex multipliers.
Fig. 4. The common controller block
The basic idea of the data flow in these common controller blocks is that stage 1 repeats after calculating $N/2^{2r}$ data, and stage 2 repeats after calculating $N/2^{2r+1}$ data, where $r$ ($r = 1, 2, 3$) is the index of the common controller block and $N$ is the FFT size. Only one counter is used to produce control signals I and II for both stage 1 and stage 2. For the first common controller block, control signal I is first set to zero to let $N/4$ data be read into stage 1, and in the following $N/4$ cycles, control signal I is set to one to enable the butterfly function in stage 1. At the same time, stage 2 reads in the $N/8$ data outputs of stage 1, which is controlled by control signal II. In the next $N/8$ cycles, butterfly II of stage 2 works and control signal II equals one. The data flow analysis is shown in Figure 5.
The last block only includes the seventh stage. Because the odd and even data need to be commutated, two demultiplexers seem to be required to switch the data, as shown in Figure 3. However, this can be improved by analyzing the scheduling of the last stage. It can be found that only one butterfly is working per clock cycle, and the first output of the even path will be processed with the first output of the odd path of the 6th stage. As long as the timing is matched, the even path outputs will be processed with the corresponding odd path outputs. Therefore, the two demultiplexers are not necessary, and only one butterfly in the last stage is required to process the data. The modified structure of the last stage and its interface with the previous stage is shown in Figure 6.
IV. IMPLEMENTATION AND RESULT ANALYSIS
A. FPGA Implementation
The proposed design is synthesized and implemented with Xilinx ISE, targeting a Xilinx Virtex4 FPGA implementation. The arithmetic blocks are directly mapped to
TABLE II
THE ASIC IMPLEMENTATION COMPARISON