FFT128 Project
I)Overview of our project
1)Introduction
In the electronics and telecommunications industry, spectrum analysis in the
frequency domain (the energy spectrum, amplitude spectrum, and phase spectrum of
signals in general, and of digital signals in particular) plays a very important
role. It tells us how each frequency component contributes to the signal, how the
signal energy is distributed, and how that energy can be used effectively.
From that we can decide how to handle the signal appropriately. The problem is how
to transform the digital signal from the time domain to the frequency domain to
observe its spectrum. The simplest answer is to use the Discrete Fourier Transform
(DFT).
The DFT is used in many fields, such as speech processing and image processing.
It would not be an exaggeration to say that almost anything related to digital
signal processing relies on Fourier transforms.
However, the direct DFT has a problem: the number of operations grows with the
square of the data length. An image file, or any practical signal, is usually
quite long, so a DFT calculated directly takes too long to execute and does not
satisfy timing requirements. A DFT engine that produces correct results but runs
too slowly will not satisfy a manufacturer. That is why the fast Fourier
transform (FFT) algorithm was created.
2)Overview of FFT
The FFT algorithm is based on the divide-and-conquer technique. Instead of
calculating the DFT of an entire long signal directly, we calculate the DFTs of
smaller segments of that signal and then combine the results to obtain the DFT of
the original signal.
The FFT plays a very important role:
- The FFT has improved the speed and accuracy of digital signal processing.
- The FFT opens up a very wide field of spectrum analysis: telecommunications,
astronomy, geophysics management, medical diagnosis, ….
- The FFT has rekindled interest in many branches of mathematics that were
previously thought to be fully explored.
- The FFT has laid the foundation for computing other transforms such as the Walsh
transform, Hadamard transform, Haar transform, ….
=> Idea: The Center's goal is an FFT algorithm/architecture with the programmability
necessary to meet the variety of functional FFT demands of future wireless and
other signal processing applications.
Our project therefore describes the FFT128 core architecture and explains its
proper use. The FFT128 soft core is a unit that performs the Fast Fourier
Transform (FFT). It performs a one-dimensional 128-point complex FFT. The data and
coefficient widths are adjustable in the range of 8 to 16 bits.
II)Specification
1)Interface:
The FFT128 processor uses the minimum number of multipliers, which is 4.
This makes the core attractive for ASIC implementation. When the core is configured
for a Xilinx FPGA, these multipliers are implemented in 4 DSP48 units.
The customer can select the input data, output data, and coefficient widths to
match the dynamic range needs of the application. This can minimize both the logic
hardware and the memory volume.
*Signal Description:
Signal        Type    Description
CLK           Input   Global clock
RST           Input   Global reset
start         Input   FFT start
n[6:0]        Input   Address of input data
DR[15:0]      Input   Input data, real sample
DI[15:0]      Input   Input data, imaginary sample
fft_ready     Input   Input data accepting, FFT ready
shift[3:0]    Input   Shift left code
DOR[19:0]     Output  Output data, real sample
DIR[19:0]     Output  Output data, imaginary sample
k[6:0]        Output  Result number or address
Output_ready  Output  Output data of FFT ready
OVF1          Output  Overflow flag of output data, real part
OVF2          Output  Overflow flag of output data, imaginary part
*Note: input and output data are represented by 16- and 20-bit two's complement
complex integers, respectively. The twiddle coefficients are 16-bit wide numbers.
2)Typical core interconnection
The core interconnection depends on the nature of the application in which it is
used. The simplest core interconnection handles an unlimited data stream in which
one sample is input in each clock cycle.
The data source is, for example, an analog-to-digital converter, and FFT128 is the
core, customized as one with 3 inner data buffers.
The FFT algorithm starts with the START impulse.
The respective results are output after the READY impulse and are accompanied by
the address code ADDR.
The START signal is needed for global synchronization and can be generated once
before system operation. The input data are in natural order and are numbered
from 0 to 127. When 3 inner data buffers are configured, the output data are in
natural order. When 2 inner data buffers are configured, the output data are in
the 8-th inverse order, that is, 0,8,16,...,120,1,9,17,...
III)FFT-128 Algorithm
1)Basic of FFT algorithm
*Starting from the radix-2 FFT, other bases such as radix 4 and radix 8 have been
developed, along with various FFT architectures such as parallel, SDF (single-path
delay feedback), MDC (multipath delay commutator), in-place, and others, which
increasingly affirm the important role of the FFT.
Here we only study the decimation-in-frequency (DIF) FFT.
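Let x[n] be a sequence of length N. The discrete Fourier transform (DFT) of x[n] is
calculated according to the following formula:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \dots, N-1 \tag{1}$$
If we set $W_N^{kn} = e^{-j\frac{2\pi}{N}kn}$, where this coefficient is called the
twiddle factor, the formula can be written as:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad k = 0, 1, \dots, N-1 \tag{2}$$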
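To make the definition concrete, here is a minimal Python sketch that evaluates the DFT directly from its definition. The function names `twiddle` and `dft` are illustrative only and are not part of the FFT128 core.

```python
import cmath

def twiddle(N, e):
    """Return W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * W_N^(k*n)."""
    N = len(x)
    return [sum(x[n] * twiddle(N, k * n) for n in range(N)) for k in range(N)]

x = [1, 2, 3, 4]
X = dft(x)  # approximately [10, -2+2j, -2, -2-2j]
```

The two nested loops over k and n are the reason the direct calculation becomes slow for long signals.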
a)FFT radix-2
Detail of FFT radix-2:
From formula (2), the sum is divided into two parts, X1 and X2, each of which is an
N/2-point DFT.
[Figure: FFT radix-2 butterfly model]
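As an illustration of the radix-2 split, the following Python sketch implements a recursive decimation-in-frequency FFT and compares it with the direct DFT. It assumes the standard DIF split (sums of the two halves feed the even outputs, twiddled differences feed the odd outputs); the names `fft_dif` and `dft` are ours and do not come from the core.

```python
import cmath

def dft(x):
    """Direct DFT used as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Butterfly: sums go to the even outputs, twiddled differences to the odd ones.
    a = [x[n] + x[n + half] for n in range(half)]
    b = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(half)]
    out = [0j] * N
    out[0::2] = fft_dif(a)  # X[2r]   = N/2-point DFT of a
    out[1::2] = fft_dif(b)  # X[2r+1] = N/2-point DFT of b
    return out

x = [complex(n % 5, -(n % 3)) for n in range(128)]
max_err = max(abs(p - q) for p, q in zip(fft_dif(x), dft(x)))
```

Here `max_err` only measures floating-point rounding, since both functions compute the same transform.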
To simplify the notation, the complex-valued phase factor e^(-j2πnk/128) is
written using:
W128 = cos(2π/128) - j sin(2π/128)
=> The FFT algorithms take advantage of the symmetry and periodicity properties
of W128^n to greatly reduce the number of calculations that the DFT requires. In an
FFT implementation, the real and imaginary components of W128^n are called twiddle
factors.
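The two properties can be written out explicitly. This is the standard statement for N = 128, given here as a short worked form rather than anything specific to this core:

```latex
W_{128}^{\,n+128} = W_{128}^{\,n} \quad \text{(periodicity)}, \qquad
W_{128}^{\,n+64} = -W_{128}^{\,n} \quad \text{(symmetry)}
```

For example, $W_{128}^{64} = e^{-j\pi} = -1$, so only half of the factors need to be stored, and the other half follow by negation.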
The basis of the FFT is that a DFT can be divided into smaller DFTs. The FFT128
processor uses a mixed radix-8 and radix-16 FFT algorithm. It divides the DFT
into two smaller DFTs of length 8 and 16, as shown in the formula:
$$X(k) = X(8r+s) = \sum_{m=0}^{15} W_{16}^{mr} \left[ W_{128}^{ms} \sum_{l=0}^{7} x(16l+m)\, W_{8}^{sl} \right], \quad r = 0,\dots,15,\ s = 0,\dots,7$$
which shows that the 128-point DFT is divided into smaller 8- and 16-point DFTs.
This algorithm is illustrated by the graph shown in Fig. 1. The input
complex data x(n) are represented by the 2-dimensional array of data x(16l+m).
The columns of this array are computed by 8-point DFTs. Their results are
multiplied by the twiddle factors W128^(ms). The resulting array of data
X(8r+s) is derived by 16-point DFTs of the rows of the intermediate result array.
The 8- and 16-point DFTs are implemented by the Winograd small-point FFT
algorithms, which provide the minimum number of additions and multiplications. As
a result, this FFT algorithm needs only 128 complex multiplications by the twiddle
factors W128^(ms) and a set of multiplications by the twiddle factors W8^(sl) and
W16^(mr) inside the small-point DFTs, instead of the 32768 complex multiplications
in the original DFT. Note that the well-known radix-2 128-point FFT algorithm
needs 896 complex multiplications.
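The three steps (8-point column DFTs, twiddle multiplication, 16-point row DFTs) can be checked numerically. The sketch below is a plain Python model, not the core's implementation: it uses a direct DFT in place of the Winograd small-point algorithms, and it assumes the input index n = 16l + m and the output index k = 8r + s, which is the mapping that makes the row transforms true 16-point DFTs.

```python
import cmath

def w(N, e):
    """Twiddle factor W_N^e."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft(x):
    """Direct DFT, standing in for the small-point Winograd algorithms."""
    N = len(x)
    return [sum(x[n] * w(N, k * n) for n in range(N)) for k in range(N)]

def fft128_8x16(x):
    """128-point DFT computed as 8-point DFTs, twiddles, then 16-point DFTs."""
    assert len(x) == 128
    X = [0j] * 128
    # Step 1: 8-point DFT of each column m (samples x[16*l + m], l = 0..7).
    cols = [dft([x[16 * l + m] for l in range(8)]) for m in range(16)]
    for s in range(8):
        # Step 2: multiply by the twiddle factors W128^(m*s).
        row = [cols[m][s] * w(128, m * s) for m in range(16)]
        # Step 3: 16-point DFT of the row gives X(8r + s) for r = 0..15.
        R = dft(row)
        for r in range(16):
            X[8 * r + s] = R[r]
    return X

x = [complex((3 * n) % 7 - 3, (5 * n) % 11 - 5) for n in range(128)]
max_err = max(abs(p - q) for p, q in zip(fft128_8x16(x), dft(x)))
twiddle_mults = 16 * 8  # one W128^(m*s) multiplication per intermediate sample
```

The count `twiddle_mults` equals 128, matching the number of W128 multiplications quoted in the text.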
*Highly pipelined calculation:
Each base FFT operation is computed by the datapaths called FFT8 and FFT16.
FFT8 and FFT16 calculate the 8- and 16-point DFTs in a highly pipelined mode.
Therefore, in each clock cycle one complex number is read from the input data
buffer RAM and one complex result is written to the output buffer RAM. The 8-
and 16-point DFT algorithms are divided into several stages, which are implemented
in the stages of the FFT8 and FFT16 pipelines. This supports increasing the
clock frequency up to 200 MHz and higher. The latent delay of the FFT8 unit from
the input of the first data to the output of the first result is 30 clock cycles.
The latent delay of the FFT16 unit from the input of the first data to the output
of the first result is 30 clock cycles.
*High precision computations:
In the core, the inner data bit width is 4 bits wider than the input data bit
width. The main error source is the truncation of results after multiplication by
the factors W128^(ms). Because most of the base FFT operation calculations are
additions, they are calculated without errors. The FFT results have a data bit
width that is 3 bits wider than the input data bit width, which provides a high
dynamic range for the results when the input data is a sinusoidal signal. The
maximum result error is less than 1 least significant bit of the input data. In
addition, normalizing shifters are attached to the outputs of the FFT8 pipelines
to provide the proper dynamic range of the resulting data. The overflow detector
outputs make it possible to choose the proper shift-left bit number for these
shifters.
2)Block Diagram:
*BUFRAM128:
BUFRAM128 is the data buffer, which consists of a two-port synchronous RAM with a
capacity of 256 complex data and a write-read address counter. The real and
imaginary parts of the data are stored in natural ascending order, as in the
diagram in Fig. 9. On the START impulse the address counter is reset and then
starts to count (signal address). The input data DR and DI are stored at the
respective address on the rising edge of the clock signal.
After writing 128 data beginning at the START signal, the unit outputs the ready
signal RDY and starts to write the next 128 data to the second half of the memory.
During this period it outputs the data stored in the first half of the memory.
When this data reading is finished, the reading of the next array starts.
This process continues until the next START or RST signal is received.
The reading address sequence is the 16-th inverse order, that is,
0,16,32,...,112,1,17,33,... . The reading address is derived from the writing
address by exchanging the 3 LSB and the 4 MSB address bits of the 7-bit address.
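The read-address permutation can be modelled in a few lines. This Python sketch assumes a 7-bit address within one 128-sample block, so the counter's 3 LSBs become the address MSBs and its 4 MSBs become the address LSBs. The function name `read_address` is ours, and the bit split is an assumption derived from the 128-point block size, not taken from the core's source.

```python
def read_address(count):
    """Map a 7-bit read counter to a buffer address by exchanging bit fields."""
    low3 = count & 0x7           # 3 LSBs of the counter
    high4 = (count >> 3) & 0xF   # 4 MSBs of the counter
    return (low3 << 4) | high4

order = [read_address(c) for c in range(128)]
first = order[:10]  # [0, 16, 32, 48, 64, 80, 96, 112, 1, 17]
```

Every address from 0 to 127 appears exactly once in `order`, so the mapping is a permutation of one block.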
The BUFRAM128 unit can be implemented in 2 ways. The first way uses ordinary
one-port synchronous RAMs. In this case BUFRAM128 consists of 2 parts: one data
array is written into one part of the buffer while another data array is read from
the second part, and then the two parts exchange roles. Such a BUFRAM128 is
implemented using the files BUFRAM128C.v (root model of the buffer), RAM2x128C.v
(dual-ported synchronous RAM), and RAM128.v (single-ported synchronous RAM model).
This kind of buffer is implemented when the FFT128bufferports1 parameter is
uncommented in the FFT128_config.inc file.
The second way uses an ordinary 2-port synchronous RAM with a single clock input.
Such a RAM is usually instantiated as a BlockRAM or a dual-ported Distributed RAM
in Xilinx FPGAs. In this case the FFT128bufferports1 parameter is commented out or
excluded in the FFT128_config.inc file, and the file RAM128.v, which describes a
simple model of the registered synchronous RAM, is not used.
*FFT16:
The datapath FFT16 implements the 16-point FFT algorithm in the pipelined
mode. 16 input complex data are processed in 46 clock cycles, but a new set of 16
complex results is output every 16 clock cycles.
Here x and y are the input and output arrays of complex data, t1,…,t26, m1,…,
m17, s1,…,s20 are the intermediate complex results, and j = √(-1). The algorithm
contains only 20 real multiplications by the nontrivial coefficients sin(π/4)
= 0.7071; sin(3π/8) = 0.9239; cos(3π/8) = 0.3827; (cos(π/8) + cos(3π/8)) = 1.3066;
(sin(3π/8) - sin(π/8)) = 0.5412; and 156 real additions and subtractions.
The datapath is described in the files FFT16.v, MPUC707.v, MPUC924_383.v,
MPUC1307.v, and MPUC541.v, which make wide use of resource sharing and pipelining
techniques. The counter ct counts the working clock cycles from 0 to 15. So a
single inferred adder adds x(0) + x(8) in one cycle, x(1) + x(9) in the next cycle,
and so on, up to x(7) + x(15) in the final cycle of the sequence, deriving the
results t1,t7,t9,…,t13 respectively.
Four constant multipliers are used to perform the multiplication by 5 different
coefficients. The unit in MPUC707.v implements the multiplication by the
coefficient 0.7071 in a pipelined manner. Note that the unit MPUC924_383.v
implements the multiplication by both 0.9239 and 0.3827. The multipliers use an
adder tree, which adds copies of the multiplicand shifted by different numbers of
bits. For example, for a short input bit width the coefficient 0.7071 is
approximated as 0.10110101 (binary), and for a long input bit width it is
approximated as 0.10110101000000101 (binary). The long coefficient bit width is set
by the parameter FFT128bitwidth_coef_high. The first kind of constant multiplier
occupies 3 adders, and the second one occupies 4 adders. The importance of
selecting the long coefficient is seen from the following fact: when the input bit
width is 16 or higher, selecting the long coefficient bit width halves the FFT128
result error.
The FFT16 unit implements both the FFT and the inverse FFT, depending on the
parameter FFT128paramifft. In practice the inverse FFT is implemented on the basis
of the direct FFT by inverting the operations in the final stage of computations
for all the results except y(0) and y(8). For example, y(1):=s9 + s17; is replaced
by y(1):=s9 - s17;
The FFT16 unit starts its operation on the START impulse. The first result
is preceded by the RDY impulse, which is delayed from the START impulse by 30
clock cycles. The output results have a bit width that is 4 bits wider than the
input data bit width. This means that all the calculations except the
multiplications by coefficients like 0.7071 are implemented without truncation,
and therefore the FFT128 results have minimized errors compared with other FFT
processors.
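The idea of a constant multiplier built from shifted copies of the multiplicand can be sketched in Python for the short coefficient 0.10110101 (binary), which equals 181/256. This is a plain sum of five shifted terms for illustration only. It does not reproduce the adder sharing that lets the core use 3 adders, and the function name `mul_0707` is ours.

```python
def mul_0707(x):
    """Multiply integer x by 0.10110101b (181/256) using only shifts and adds."""
    # The set bits of 0.10110101b have weights 2^-1, 2^-3, 2^-4, 2^-6, 2^-8.
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x  # equals x * 181
    return acc >> 8  # drop the 8 fractional bits (truncation)

y = mul_0707(1000)  # 707, close to 1000 * 0.7071
```

The final right shift is the truncation step, which is the error source mentioned for the coefficient multiplications.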
*FFT8:
The datapath FFT8 implements the 8-point FFT algorithm in the pipelined mode. 8
input complex data are processed in 22 clock cycles, but a new set of 8 complex
results is output every 8 clock cycles.
Here D and DO are the input and output arrays of complex data, j = √(-1), and t1,
…,t8, m1,…, m7, s1,…,s4 are the intermediate complex results. The algorithm
contains only 4 multiplications by the nontrivial coefficient sin(π/4) =
0.7071, and 26*2 real additions and subtractions. Multiplication by the
coefficient j means negating the imaginary part and swapping the real and
imaginary parts. The datapath is described in the files FFT8.v and MPU707.v, which
make wide use of the resource sharing technique. The FFT8 unit starts its
operation on the START impulse. The first result is preceded by the RDY impulse,
which is delayed from the START impulse by 17 clock cycles.
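The multiplication by j needs no multiplier at all, as this small Python sketch shows. The helper names are hypothetical and operate on separate real and imaginary integers, as a hardware datapath would.

```python
def mul_by_j(re, im):
    """(re + j*im) * j = -im + j*re: negate the imaginary part, then swap."""
    return -im, re

def mul_by_minus_j(re, im):
    """(re + j*im) * (-j) = im - j*re: negate the real part, then swap."""
    return im, -re

p = mul_by_j(3, 4)        # (-4, 3)
q = mul_by_minus_j(3, 4)  # (4, -3)
```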
*CNORM(For Normalize Output):
During computations in FFT8 and FFT16 the data magnitude increases by up to 8 and
16 times, respectively, and the FFT128 result can increase by up to 128 times,
depending on the spectral properties of the input signal. Therefore, to prevent
the loss of signal dynamic range, the output signal bit width must be at least 8
bits wider than the input signal bit width. To prevent this bit width increase, to
provide the proper signal dynamic range, and to ease the subsequent computation of
the derived spectrum, the CNORM units are attached to the outputs of the FFT16
units. The CNORM unit shifts the data left by 0, 1, 2, or 3 bits, depending on the
code SHIFT. The input data width is nb+3 and the output data width is nb+2,
where nb is the given processor input bit width. Overflow occurs in the CNORM
unit when the SHIFT code is set too high. The SHIFT code must be set by the
customer to prevent data overflow and to provide the proper dynamic range.
The CNORM unit contains an overflow detector with the output OVF.
When the FFT128 core is in operation, a 1 at the output OVF signals that an
overflow has occurred for some input data. The OVF flag is reset by the RST or
START signal.
The SHIFT inputs of the two CNORM stages are concatenated into the 4-bit input
SHIFT of the FFT128 core: the 2 LSBs control the first stage, and the 2 MSBs
control the second stage. The selection of the proper SHIFT code depends on the
spectral properties of the input signal. When the input signal is sinusoidal or
contains a few sinusoids, and the noise level is small, then SHIFT = 0000, 0001,
or 0010. When the input signal is noisy, SHIFT can be 1100 and higher.
When the input signal has stable statistical properties, the code SHIFT can
be set as a constant. Then the OVF outputs may be left unused, and the CNORM
units will be removed from the project by hardware optimization when the core
is synthesized.
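A behavioural sketch of one normalizing stage is given below in Python. It is built on an assumption that is not taken from the core's source: the (nb+3)-bit input is shifted left by the SHIFT code and one LSB is dropped to reach the (nb+2)-bit output, with OVF raised when the result does not fit. The names `cnorm` and `split_shift` are ours.

```python
def cnorm(value, shift, nb):
    """One normalizing stage: (nb+3)-bit signed in, (nb+2)-bit signed out.

    Assumed behaviour: shift left by `shift`, then drop one LSB.
    Returns (result, overflow_flag).
    """
    assert 0 <= shift <= 3
    in_bits, out_bits = nb + 3, nb + 2
    assert -(1 << (in_bits - 1)) <= value < (1 << (in_bits - 1))
    shifted = (value << shift) >> 1
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    overflow = not (lo <= shifted <= hi)
    # Keep only the low out_bits bits, as a register of that width would.
    wrapped = (shifted - lo) % (1 << out_bits) + lo
    return wrapped, overflow

def split_shift(shift4):
    """Split the 4-bit SHIFT input: 2 LSBs for stage 1, 2 MSBs for stage 2."""
    return shift4 & 0b11, (shift4 >> 2) & 0b11

small = cnorm(1000, 1, 16)     # fits: (1000, False)
large = cnorm(100000, 3, 16)   # shift code too high: overflow flag is True
stages = split_shift(0b1100)   # (0, 3)
```

With a small input the shifted value fits and OVF stays low; with a large input and SHIFT = 3 the flag is raised, which is the condition the customer is expected to avoid by choosing a lower code.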
*ROTATOR128:
6)Algorithm of DFT-8
7)Algorithm of FFT-8
2D FFT-128 result(16-bit):
b[0][0] = 2154610.00+2336445.00i
0 11111 1111111111
0 11111 1111111111
b[0][1] = -1802.62+-385318.62i
1 11001 1100001011
1 11111 1111111111
b[0][2] = 162387.89+-578252.38i
0 11111 1111111111
1 11111 1111111111
b[0][3] = 55678.63+179581.86i
0 11110 1011001100
0 11111 1111111111
b[0][4] = 8236.13+-361030.91i
0 11100 0000000110
1 11111 1111111111
b[0][5] = -67043.34+51568.53i
1 11111 0000011000
0 11110 1001001100
b[0][6] = -228137.89+-438177.62i
1 11111 1111111111
1 11111 1111111111
b[0][7] = 13175.39+141488.14i
0 11100 1001101111
0 11111 1111111111
b[1][0] = -1802.62+-385318.62i
1 11001 1100001011
1 11111 1111111111
b[1][1] = 189610.00+-35357.94i
0 11111 1111111111
1 11110 0001010001
b[1][2] = -58216.67+-189503.42i
1 11110 1100011011
1 11111 1111111111
b[1][3] = 86979.44+12923.62i
0 11111 0101001111
0 11100 1001001111
b[1][4] = 95847.44+-145785.47i
0 11111 0111011010
1 11111 1111111111
b[1][5] = -349903.88+349044.12i
1 11111 1111111111
0 11111 1111111111
b[1][6] = -260158.02+-511264.12i
1 11111 1111111111
1 11111 1111111111
b[1][7] = -1516922.62+1313029.12i
1 11111 1111111111
0 11111 1111111111
b[2][0] = 1743923.75+-642582.94i
0 11111 1111111111
1 11111 1111111111
b[2][1] = 189610.00+-35357.94i
0 11111 1111111111
1 11110 0001010001
b[2][2] = 11734.89+-100078.91i
0 11100 0110111011
1 11111 1000011100
b[2][3] = -48241.72+39591.53i
1 11110 0111100100
0 11110 0011010101
b[2][4] = -8006.00+41592.99i
1 11011 1111010010
0 11110 0100010100
b[2][5] = 51698.62+160748.80i
0 11110 1001010000
0 11111 1111111111
b[2][6] = -85290.67+-39539.64i
1 11111 0100110101
1 11110 0011010100
b[2][7] = 67965.00+29705.72i
0 11111 0000100110
0 11101 1101000001
b[3][0] = 55678.63+179581.86i
0 11110 1011001100
0 11111 1111111111
b[3][1] = -48241.72+39591.53i
1 11110 0111100100
0 11110 0011010101
b[3][2] = -283865.59+79409.92i
1 11111 1111111111
0 11111 0011011001
b[3][3] = -29110.34+73386.04i
1 11101 1100011011
0 11111 0001111011
b[3][4] = 26420.79+232521.31i
0 11101 1001110011
0 11111 1111111111
b[3][5] = -39048.58+-26262.25i
1 11110 0011000100
1 11101 1001101001
b[3][6] = -30952.76+-241507.16i
1 11101 1110001111
1 11111 1111111111
b[3][7] = -72998.95+-57996.77i
1 11111 0001110101
1 11110 1100010100
b[4][0] = -11168.95+-476399.12i
1 11100 0101110100
1 11111 1111111111
b[4][1] = -58216.67+-189503.42i
1 11110 1100011011
1 11111 1111111111
b[4][2] = 140016.34+-90418.03i
0 11111 1111111111
1 11111 0110000101
b[4][3] = -283865.59+79409.92i
1 11111 1111111111
0 11111 0011011001
b[4][4] = 86103.10+-142582.53i
0 11111 0101000001
1 11111 1111111111
b[4][5] = -131301.47+-35930.91i
1 11111 1111111111
1 11110 0001100011
b[4][6] = 128592.62+-107287.40i
0 11111 1111011001
1 11111 1010001100
b[4][7] = 129856.65+16407.49i
0 11111 1111101101
0 11101 0000000001
b[5][0] = -67043.34+51568.53i
1 11111 0000011000
0 11110 1001001100
b[5][1] = 51698.62+160748.80i
0 11110 1001010000
0 11111 1111111111
b[5][2] = -131301.47+-35930.91i
1 11111 1111111111
1 11110 0001100011
b[5][3] = -41624.41+-180660.19i
1 11110 0100010101
1 11111 1111111111
b[5][4] = 265438.56+140123.47i
0 11111 1111111111
0 11111 1111111111
b[5][5] = 87888.97+13034.93i
0 11111 0101011101
0 11100 1001011101
b[5][6] = 9795.65+-37228.34i
0 11100 0011001000
1 11110 0010001011
b[5][7] = 112692.78+79197.13i
0 11111 1011100001
0 11111 0011010101
b[6][0] = 1156257.25+-552071.25i
0 11111 1111111111
1 11111 1111111111
b[6][1] = 86979.44+12923.62i
0 11111 0101001111
0 11100 1001001111
b[6][2] = 253050.09+-121457.27i
0 11111 1111111111
1 11111 1101101010
b[6][3] = -29110.34+73386.04i
1 11101 1100011011
0 11111 0001111011
b[6][4] = -100971.84+81906.02i
1 11111 1000101010
0 11111 0100000000
b[6][5] = -41624.41+-180660.19i
1 11110 0100010101
1 11111 1111111111
b[6][6] = -18927.83+-152277.38i
1 11101 0010011111
1 11111 1111111111
b[6][7] = -183383.89+-108921.97i
1 11111 1111111111
1 11111 1010100110
b[7][0] = 13175.39+141488.14i
0 11100 1001101111
0 11111 1111111111
b[7][1] = 67965.00+29705.72i
0 11111 0000100110
0 11101 1101000001
b[7][2] = 129856.65+16407.49i
0 11111 1111101101
0 11101 0000000001
b[7][3] = -183383.89+-108921.97i
1 11111 1111111111
1 11111 1010100110
b[7][4] = 98133.22+246292.69i
0 11111 0111111101
0 11111 1111111111
b[7][5] = 315524.47+234315.06i
0 11111 1111111111
0 11111 1111111111
b[7][6] = -62212.00+-26687.47i
1 11110 1110011000
1 11101 1010000100
b[7][7] = -194484.47+41739.19i
1 11111 1111111111
0 11110 0100011000
b[8][0] = -14545.19+-128961.22i
1 11100 1100011010
1 11111 1111011111
b[8][1] = 95847.44+-145785.47i
0 11111 0111011010
1 11111 1111111111
b[8][2] = -172053.22+199310.34i
1 11111 1111111111
0 11111 1111111111
b[8][3] = 26420.79+232521.31i
0 11101 1001110011
0 11111 1111111111
b[8][4] = -232156.75+238695.28i
1 11111 1111111111
0 11111 1111111111
b[8][5] = 265438.56+140123.47i
0 11111 1111111111
0 11111 1111111111
b[8][6] = -67068.81+164107.66i
1 11111 0000011000
0 11111 1111111111
b[8][7] = 98133.22+246292.69i
0 11111 0111111101
0 11111 1111111111
b[9][0] = 473152.00+378656.00i
0 11111 1111111111
0 11111 1111111111
b[9][1] = -119255760.00+-79477920.00i
1 11111 1111111111
1 11111 1111111111
b[9][2] = -44412144.00+107713128.00i
1 11111 1111111111
0 11111 1111111111
b[9][3] = -68875792.00+-13955128.00i
1 11111 1111111111
1 11111 1111111111
b[9][4] = 166352.00+215054688.00i
0 11111 1111111111
0 11111 1111111111
b[9][5] = 102260928.00+-20779862.00i
0 11111 1111111111
1 11111 1111111111
b[9][6] = 107616392.00+259506272.00i
0 11111 1111111111
0 11111 1111111111
b[9][7] = 597596288.00+-399871232.00i
0 11111 1111111111
1 11111 1111111111
b[10][0] = 521046.62+-325177.88i
0 11111 1111111111
1 11111 1111111111
b[10][1] = -349903.88+349044.12i
1 11111 1111111111
0 11111 1111111111
b[10][2] = -112325.67+316327.75i
1 11111 1011011011
0 11111 1111111111
b[10][3] = -39048.58+-26262.25i
1 11110 0011000100
1 11101 1001101001
b[10][4] = -82777.97+25873.89i
1 11111 0100001101
0 11101 1001010001
b[10][5] = 87888.97+13034.93i
0 11111 0101011101
0 11100 1001011101
b[10][6] = 10090.28+-41235.25i
0 11100 0011101101
1 11110 0100001001
b[10][7] = 315524.47+234315.06i
0 11111 1111111111
0 11111 1111111111
b[11][0] = 802654.00+290248.00i
0 11111 1111111111
0 11111 1111111111
b[11][1] = 2973330944.00+-591657984.00i
0 11111 1111111111
1 11111 1111111111
b[11][2] = 537231.38+4731880960.00i
0 11111 1111111111
0 11111 1111111111
b[11][3] = -2973127424.00+-591014784.00i
1 11111 1111111111
1 11111 1111111111
b[11][4] = 269118.00+812888.00i
0 11111 1111111111
0 11111 1111111111
b[11][5] = -885129472.00+-591250048.00i
1 11111 1111111111
1 11111 1111111111
b[11][6] = 467928.62+858560.00i
0 11111 1111111111
0 11111 1111111111
b[11][7] = -395038912.00+-591297536.00i
1 11111 1111111111
1 11111 1111111111
b[12][0] = 192538.27+-499212.94i
0 11111 1111111111
1 11111 1111111111
b[12][1] = -260158.02+-511264.12i
1 11111 1111111111
1 11111 1111111111
b[12][2] = 102383.93+219354.16i
0 11111 1001000000
0 11111 1111111111
b[12][3] = -30952.76+-241507.16i
1 11101 1110001111
1 11111 1111111111
b[12][4] = 60944.11+146385.39i
0 11110 1101110001
0 11111 1111111111
b[12][5] = 9795.65+-37228.34i
0 11100 0011001000
1 11110 0010001011
b[12][6] = -12323.17+3856.47i
1 11100 1000000100
0 11010 1110001000
b[12][7] = -62212.00+-26687.47i
1 11110 1110011000
1 11101 1010000100
b[13][0] = 645272.00+380984.00i
0 11111 1111111111
0 11111 1111111111
b[13][1] = -27354.38+47669.89i
1 11101 1010101110
0 11110 0111010010
b[13][2] = 909136.00+347344.00i
0 11111 1111111111
0 11111 1111111111
b[13][3] = 305379.62+20842.20i
0 11111 1111111111
0 11101 0100010111
b[13][4] = 176400.02+851808.00i
0 11111 1111111111
0 11111 1111111111
b[13][5] = 40534.06+6719.95i
0 11110 0011110011
0 11011 1010010000
b[13][6] = 442792.00+848920.00i
0 11111 1111111111
0 11111 1111111111
b[13][7] = 238008.66+3944.60i
0 11111 1111111111
0 11010 1110110100
b[14][0] = 266097.38+-119806.88i
0 11111 1111111111
1 11111 1101010000
b[14][1] = -1516922.62+1313029.12i
1 11111 1111111111
0 11111 1111111111
b[14][2] = 36940.27+-123510.10i
0 11110 0010000010
1 11111 1110001010
b[14][3] = -72998.95+-57996.77i
1 11111 0001110101
1 11110 1100010100
b[14][4] = -18900.14+-221693.56i
1 11101 0010011101
1 11111 1111111111
b[14][5] = 112692.78+79197.13i
0 11111 1011100001
0 11111 0011010101
b[14][6] = 88507.09+36214.27i
0 11111 0101100111
0 11110 0001101100
b[14][7] = -194484.47+41739.19i
1 11111 1111111111
0 11110 0100011000
b[15][0] = 549304.00+485144.00i
0 11111 1111111111
0 11111 1111111111
b[15][1] = -99656.21+45452.12i
1 11111 1000010101
0 11110 0110001100
b[15][2] = 768856.00+454880.00i
0 11111 1111111111
0 11111 1111111111
b[15][3] = -42461.36+12117.91i
1 11110 0100101111
0 11100 0111101011
b[15][4] = 76824.00+434080.03i
0 11111 0010110000
0 11111 1111111111
b[15][5] = 312264.66+-10280.56i
0 11111 1111111111
1 11100 0100000101
b[15][6] = 396856.00+947416.00i
0 11111 1111111111
0 11111 1111111111
b[15][7] = 248540.66+-19685.41i
0 11111 1111111111
1 11101 0011001110
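The two bit patterns listed under each value appear to be 16-bit floating-point encodings of the real and imaginary parts (1 sign bit, 5 exponent bits, 10 mantissa bits). The Python sketch below decodes one such line under assumptions inferred from the listing rather than stated by the source: an exponent bias of 15, and exponent 11111 treated as an ordinary exponent so that large values saturate at the all-ones pattern. The function name `decode16` is ours.

```python
def decode16(fields):
    """Decode a 'sign exponent mantissa' line such as '1 11001 1100001011'."""
    sign_s, exp_s, man_s = fields.split()
    sign = -1.0 if sign_s == "1" else 1.0
    exponent = int(exp_s, 2)
    mantissa = int(man_s, 2)
    if exponent == 0:
        # Subnormal range (assumed to follow the usual convention).
        return sign * (mantissa / 1024.0) * 2.0 ** (-14)
    # Assumption: exponent 31 is a normal exponent here, not Inf/NaN.
    return sign * (1.0 + mantissa / 1024.0) * 2.0 ** (exponent - 15)

a = decode16("1 11001 1100001011")  # listed next to -1802.62
b = decode16("0 11111 1111111111")  # pattern shared by all the large values
```

Under these assumptions `a` is -1803.0 and `b` is 131008.0, which would explain why every listed value with a magnitude above roughly 131000 shows the same saturated pattern.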
VI)Implementation And Result
1)Main Verilog code for pipelined FFT-128 (the code of each module in the main
module of FFT-128 is attached to the report)
2)Create testbench
*Synthesis On Vivado:
The following table illustrates the performance of the FFT128 core with two data
buffers based on BlockRAMs in a Xilinx Virtex device when implementing the
128-point FFT for 10- and 16-bit data and coefficients. Note that 4 DSP48 units are
used in all projects. The results were derived using the Xilinx ISE 9.1 tool.
VII)Acknowledgement
- Find out about the company structure and the positions that graduates can later
apply for.
- Improve soft skills such as teamwork, planning, time management, and
communication.
- Get orientation toward jobs after graduation, and improve basic and specialized
knowledge in electronics.
- See the shortcomings and limitations of one's own knowledge and skills so that
they can be addressed in a timely manner.
- Apply knowledge learned in class and self-taught knowledge to solve the problem.
- Learn more useful skills such as programming in MATLAB, C++, and the Verilog
language.
- Gain a preliminary understanding of the hardware verification process in IC design.