Fast Fourier Transform
ABSTRACT
The Fast Fourier Transform (FFT) is an efficient means for computing the Discrete Fourier Transform
(DFT). It is one of the most widely used computational elements in Digital Signal Processing (DSP)
applications. The TMS320VC5505, TMS320C5505, and TMS320C5515 DSPs are ideally suited for such applications. They include an FFT hardware accelerator
(HWAFFT) that is tightly coupled with the CPU, allowing high FFT processing performance at very low
power. This application report describes FFT computation on the DSP and covers the following topics:
• Basics of DFT and FFT
• DSP Overview Including the FFT Accelerator
• HWAFFT Description
• HWAFFT Software Interface
• Simple Example to Illustrate the Use of the FFT Accelerator
• FFT Benchmarks
• Description of Open Source FFT Example Software
• Computation of Large (Greater Than 1024-point) FFTs
Project collateral and source code discussed in this application report can be downloaded from the
following URL: http://www.ti.com/lit/zip/SPRABB6.
Contents
1 Basics of DFT and FFT
2 DSP Overview Including the FFT Accelerator
3 FFT Hardware Accelerator Description
4 HWAFFT Software Interface
5 Simple Example to Illustrate the Use of the FFT Accelerator
6 FFT Benchmarks
7 Description of Open Source FFT Example Software
8 Computation of Large (Greater Than 1024-Point) FFTs
9 Appendix A Methods for Aligning the Bit-Reverse Destination Vector
Appendix A Revision History
List of Figures
1 DIT Radix 2 Butterfly
2 DIT Radix 2 8-point FFT
3 Graphical FFT Computation
4 Block Diagram
5 Bit Reversed Input Buffer
6 Graphing the Real Part of the FFT Result in CCS4
7 Graphing the Imaginary Part of the FFT Result in CCS4
8 FFT Filter Demo Block Diagram
List of Tables
SPRABB6B– June 2010 – Revised January 2013 FFT Implementation on the TMS320VC5505, TMS320C5505, and 1
Submit Documentation Feedback TMS320C5515 DSPs
Copyright © 2010–2013, Texas Instruments Incorporated
1 Basics of DFT and FFT
Figure 1. DIT Radix 2 Butterfly
$$P' = P + Q \cdot W_N^k$$
$$Q' = P - Q \cdot W_N^k$$
The mathematical meaning of this butterfly is shown below with separate equations for real and imaginary
parts:
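As a sketch, assuming the common twiddle-factor convention $W_N^k = e^{-j 2\pi k / N} = \cos(2\pi k/N) - j\sin(2\pi k/N)$ (an assumption; the convention is not stated in the surrounding text), the butterfly $P' = P + Q \cdot W_N^k$, $Q' = P - Q \cdot W_N^k$ expands as follows, where the subscripts $r$ and $i$ denote real and imaginary parts:

```latex
\begin{aligned}
\theta &= 2\pi k / N \\
P'_r &= P_r + (Q_r \cos\theta + Q_i \sin\theta) \\
P'_i &= P_i + (Q_i \cos\theta - Q_r \sin\theta) \\
Q'_r &= P_r - (Q_r \cos\theta + Q_i \sin\theta) \\
Q'_i &= P_i - (Q_i \cos\theta - Q_r \sin\theta)
\end{aligned}
```

Each butterfly therefore costs one complex multiplication (four real multiplications) and two complex additions.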
The flow graph in Figure 2 shows the interconnected butterflies of an 8-point Radix-2 DIT FFT. Notice that
the inputs to the FFT are indexed in bit-reversed order (0, 4, 2, 6, 1, 5, 3, 7) and the outputs are indexed
in sequential order (0, 1, 2, 3, 4, 5, 6, 7). Computation of a Radix-2 DIT FFT requires the input vector to be
in bit-reversed order, and generates an output vector in sequential order. This bit-reversal is further
explained in Section 4.3, Bit-Reverse Function.
Table 1 shows a significant reduction in computational complexity with the Radix-2 FFT, especially
for large N. This reduction allows DSPs to compute the DFT efficiently and in reasonable time. Because of
this efficiency improvement over direct computation, the HWAFFT coprocessor in the DSP implements the
Radix-2 FFT algorithm.
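The size of this reduction can be illustrated with the standard textbook operation counts: N * N complex multiplications for direct DFT evaluation versus (N / 2) * log2(N) for a Radix-2 FFT. The following is a minimal sketch in portable C; the function names are hypothetical, and the way Table 1 tabulates its counts may differ.

```c
#include <stdio.h>

/* Integer log2 for a power-of-two n (n >= 1). */
unsigned ilog2(unsigned long n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Complex multiplications for direct DFT evaluation: N * N. */
unsigned long dft_complex_mults(unsigned long n)
{
    return n * n;
}

/* Complex multiplications for a Radix-2 FFT: (N / 2) * log2(N). */
unsigned long fft_complex_mults(unsigned long n)
{
    return (n / 2) * ilog2(n);
}

/* Print both counts for every FFT length the HWAFFT supports. */
void print_complexity_table(void)
{
    unsigned long n;
    for (n = 8; n <= 1024; n *= 2) {
        printf("N = %4lu: DFT %8lu, FFT %5lu\n",
               n, dft_complex_mults(n), fft_complex_mults(n));
    }
}
```

For N = 1024 the direct DFT needs 1,048,576 complex multiplications, while the Radix-2 FFT needs 5,120.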
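Figure 3. Graphical FFT Computation (the FFT maps the time domain signal to the frequency domain signal, and the IFFT maps it back; panels: Imaginary (Time Domain Signal), Imaginary (Frequency Domain Signal))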
2 DSP Overview Including the FFT Accelerator
Figure 4. Block Diagram (C55x DSP core with FFT accelerator; memory: 64 KB DARAM, 256 KB SARAM, 128 KB ROM; peripheral bus with 3 timers, MMC/SD, USB 2.0 slave, GPIO, RTC, I2C, I2S, 4-channel DMA, EMIF/NAND and SDRAM/mSDRAM, UART, 10-bit SAR, LCD, SPI, PLL, 3 LDOs, JTAG, INT)
3 FFT Hardware Accelerator Description
The C55x CPU includes a tightly coupled FFT accelerator that communicates with the C55x CPU through
the use of the coprocessor instructions. The main features of this hardware accelerator are:
• Supports 8- to 1024-point (powers of 2) complex-valued FFTs.
• Internal twiddle factor generation for optimal use of memory bandwidth and more efficient
programming.
• Basic and software-driven auto-scaling feature provides good precision versus cycle count trade-off.
• Single-stage and double-stage modes enable computation of one or two stages in one pass, and thus
better handle the odd power of two FFT widths.
• Is 4 to 6 times more energy efficient and 2.2 to 3.8 times faster than the FFT computations on the
CPU.
Consecutive stages can be overlapped such that the first data points for the next pass are read while the
final output values of the current pass are returned. For odd-power-of-two FFT lengths, the last double-
stage pass needs to be completed before starting a final single-stage pass. Thus, the double-stage
latency is only experienced once for even-powers-of-2 FFT computations and twice for odd-powers-of-2
FFT computations. Latency has little impact on the total computation performance, and less and less so
as the FFT size increases.
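The pass structure described above can be sketched in C. This assumes the accelerator uses as many double-stage passes as possible and adds one single-stage pass only when log2(N) is odd, which is what the text suggests; the function names are hypothetical and are not part of the HWAFFT interface.

```c
/* Number of Radix-2 stages in an N-point FFT, i.e. log2(N) for a power-of-two N. */
unsigned fft_stages(unsigned n)
{
    unsigned stages = 0;
    while (n > 1) {
        n >>= 1;
        stages++;
    }
    return stages;
}

/* Passes that compute two stages at once. */
unsigned double_stage_passes(unsigned n)
{
    return fft_stages(n) / 2;
}

/* A final single-stage pass is needed only for odd powers of two. */
unsigned single_stage_passes(unsigned n)
{
    return fft_stages(n) % 2;
}
```

For example, a 1024-point FFT (10 stages) takes five double-stage passes, while a 512-point FFT (9 stages) takes four double-stage passes followed by one single-stage pass.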
NOTE: To execute the HWAFFT routines from the ROM of the DSP, the programmer must satisfy
memory allocation restrictions for the data and scratch buffers. See the device-specific errata
for an explanation of the restrictions and workarounds:
• TMS320VC5505/VC5504 Fixed-Point DSP Silicon Errata (Silicon Revision 1.4)
[literature number SPRZ281]
• TMS320C5505/C5504 Fixed-Point DSP Silicon Errata (Silicon Revision 2.0)
[literature number SPRZ310]
• TMS320C5515/C5514 Fixed-Point DSP Silicon Errata (Silicon Revision 2.0)
[literature number SPRZ308]
3.6 Scaling
FFT computation with fixed-point numbers is vulnerable to overflow, underflow, and saturation effects.
Depending on the dynamic range of the input data, some scaling may be required to avoid these effects
during the FFT computation. This scaling can be done before the FFT computation, by computing the
dynamic range of the input points and scaling them down accordingly. If the magnitude of each complex
input element is less than 1/N, where N = FFT Length, then the N-point FFT computation will not overflow.
Uniformly dividing the input vector elements by N (Pre-scaling) is equivalent to shifting each binary
number right by log2(N) bits, which introduces significant error (especially for large FFT lengths). When
this error propagates through the FFT flow graph, the output noise-to-signal ratio increases by 1 bit per
stage or log2(N) bits in total. Overflow will not occur if each input’s magnitude is less than 1/N.
Alternatively, a simple divide-by-2 and round scaling after each butterfly offers a good trade-off between
precision and overflow protection, while minimizing computation cycles. Because the error introduced by
early FFT stages is also scaled after each butterfly, the output noise-to-signal ratio increases by just ½ bit
per stage or ½ * log2(N) bits in total. Overflow is avoided if each input’s magnitude is less than 1.
The HWAFFT supports two scale modes:
• NOSCALE
– Scaling logic disabled
– Vulnerable to overflow
– Output dynamic range grows with each stage
– No overflow if input magnitudes < 1/N
• SCALE
– Scales each butterfly output by 1/2
– No overflow if input magnitudes < 1
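One way to apply these bounds in software is to inspect the input vector before choosing a scale mode. The sketch below is an illustration only: it assumes Q15 data (so 1.0 corresponds to 32768) packed as real part in the upper 16 bits and imaginary part in the lower 16 bits, uses `<stdint.h>` types in place of the TI `Int32`/`Uint16` types so it can run on a host, and `choose_scale_flag` is a hypothetical helper, not an HWAFFT routine.

```c
#include <stdint.h>

#define SCALE_FLAG   0  /* divide each butterfly output by 2 */
#define NOSCALE_FLAG 1  /* scaling logic disabled */

/*
 * Returns NOSCALE_FLAG only if every complex element has magnitude < 1/N,
 * otherwise SCALE_FLAG. n must be a supported FFT length (8 to 1024).
 */
uint16_t choose_scale_flag(const int32_t *data, uint16_t n)
{
    int64_t limit = 32768 / n;          /* 1/N expressed in Q15 */
    int64_t limit_sq = limit * limit;   /* compare squares to avoid sqrt */
    uint16_t i;

    for (i = 0; i < n; i++) {
        uint32_t bits = (uint32_t)data[i];
        int64_t re = (int16_t)(uint16_t)(bits >> 16);
        int64_t im = (int16_t)(uint16_t)(bits & 0xFFFFu);
        if (re * re + im * im >= limit_sq) {
            return SCALE_FLAG;          /* too large for NOSCALE mode */
        }
    }
    return NOSCALE_FLAG;
}
```

Scanning the input costs N iterations on the CPU, so this check only pays off when the extra precision of NOSCALE mode matters for the application.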
4 HWAFFT Software Interface
The HWAFFT functions use an Int32 pointer to reference these complex vectors. Therefore, each 32-bit
element contains the 16-bit real part in the most significant 16 bits, and the 16-bit imaginary part in the
least significant 16 bits.
Int32 CMPLX_Vec32[N] = … (N = FFT Length)
Element:       CMPLX_Vec32[0]    CMPLX_Vec32[1]    CMPLX_Vec32[2]
Bits 31..16:   Real[0]           Real[1]           Real[2]
Bits 15..0:    Imag[0]           Imag[1]           Imag[2]
To extract the real and imaginary parts from the complex vector, it is necessary to mask and shift each 32-
bit element into its 16-bit real and imaginary parts:
Uint16 Real_Part = CMPLX_Vec32[i] >> 16;
Uint16 Imaginary_Part = CMPLX_Vec32[i] & 0x0000FFFF;
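Because the real and imaginary parts of a fixed-point FFT are signed 16-bit values, it can be convenient to unpack them into signed types so that negative values are not read as large positive ones. The following is a host-side sketch using `<stdint.h>` types instead of the TI `Int32`/`Uint16` types; the helper names `cplx_pack`, `cplx_real`, and `cplx_imag` are hypothetical.

```c
#include <stdint.h>

/* Pack a signed real/imaginary pair: real in bits 31..16, imaginary in bits 15..0. */
int32_t cplx_pack(int16_t re, int16_t im)
{
    uint32_t bits = ((uint32_t)(uint16_t)re << 16) | (uint16_t)im;
    return (int32_t)bits;
}

/* Extract the signed 16-bit real part from the upper half. */
int16_t cplx_real(int32_t x)
{
    return (int16_t)(uint16_t)((uint32_t)x >> 16);
}

/* Extract the signed 16-bit imaginary part from the lower half. */
int16_t cplx_imag(int32_t x)
{
    return (int16_t)(uint16_t)((uint32_t)x & 0xFFFFu);
}
```

The shifts and masks are done on unsigned values so the result does not depend on how the compiler shifts negative numbers.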
The HWAFFT functions are named hwafft_Npts, where N is the FFT length. For example, hwafft_32pts is
the name of the function for performing 32-point FFT and IFFT operations. The structure of the HWAFFT
functions is:
Uint16 hwafft_Npts(      /* Performs N-point complex FFT/IFFT, where N = {8, 16, 32, 64, 128, 256, 512, 1024} */
    Int32 *data,         /* Input/output – complex vector */
    Int32 *scratch,      /* Intermediate/output – complex vector */
    Uint16 fft_flag,     /* Flag determines whether FFT or IFFT performed (0 = FFT, 1 = IFFT) */
    Uint16 scale_flag    /* Flag determines whether butterfly output divided by 2 (0 = Scale, 1 = No Scale) */
);                       /* Return value: flag determines whether output in data or scratch vector (0 = data, 1 = scratch) */
The *data parameter is a complex input vector to HWAFFT. It contains the output vector if Return Value =
0 = OUT_SEL_DATA. There is a strict address alignment requirement if *data is shared with a bit-reverse
destination vector (recommended). See Section 4.3.1, Bit Reverse Destination Vector Alignment
Requirement.
Int32 *scratch
This is the scratch vector used by the HWAFFT to store intermediate results between FFT stages. It
contains complex data elements (real part in most significant 16 bits, imaginary part in least significant 16
bits). After the HWAFFT function completes the result will either be stored in the data vector or in this
scratch vector, depending on the status of the return value. The return value is Boolean, where 0 indicates
that the result is stored in the data vector, and 1 indicates this scratch vector. The data and scratch
vectors must reside in separate blocks of RAM (DARAM or SARAM) to maximize memory bandwidth.
#pragma DATA_SECTION(scratch_buf, "scratch_buf");
// Static allocation to section "scratch_buf : > DARAM" in the linker CMD file
Int32 scratch_buf[N]; // N = FFT length
Int32 *scratch = scratch_buf;
Int32 *scratch:
The *scratch parameter is a complex scratchpad vector to HWAFFT. It contains the output vector if Return
Value = 1 = OUT_SEL_SCRATCH.
Uint16 fft_flag
The FFT/IFFT selection is controlled by setting the fft_flag to 0 for FFT and 1 for Inverse FFT.
#define FFT_FLAG ( 0 ) /* HWAFFT to perform FFT */
#define IFFT_FLAG ( 1 ) /* HWAFFT to perform IFFT */
Uint16 fft_flag:
Uint16 scale_flag
The automatic scaling (divide each butterfly output by 2) feature is controlled by setting the scale_flag to 0
to enable scaling and 1 to disable scaling.
#define SCALE_FLAG ( 0 ) /* HWAFFT to scale butterfly output */
#define NOSCALE_FLAG ( 1 ) /* HWAFFT not to scale butterfly output */
Uint16 scale_flag:
scale_flag = SCALE_FLAG: Divide-by-2 scaling is performed at the output of each FFT butterfly.
scale_flag = NOSCALE_FLAG: No scaling is performed; overflow may occur if the input dynamic range is
too high.
Figure 5. Bit Reversed Input Buffer
Int32 data_br[8]
          Data[0]  Data[1]  Data[2]  Data[3]  Data[4]  Data[5]  Data[6]  Data[7]
Index =   000      100      010      110      001      101      011      111
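The index pattern above (0, 4, 2, 6, 1, 5, 3, 7 for eight points) can be reproduced with a plain C routine. This is a portable, out-of-place sketch for illustration only; it is not the bit-reverse routine supplied with the HWAFFT software, it has no alignment requirement, and the names `bit_reverse_index` and `bit_reverse_copy` are hypothetical.

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of index i. */
uint16_t bit_reverse_index(uint16_t i, unsigned bits)
{
    uint16_t r = 0;
    unsigned b;
    for (b = 0; b < bits; b++) {
        r = (uint16_t)((r << 1) | (i & 1u));
        i >>= 1;
    }
    return r;
}

/* Copy data[] into data_br[] in bit-reversed order; n must be a power of two. */
void bit_reverse_copy(const int32_t *data, int32_t *data_br, uint16_t n)
{
    unsigned bits = 0;
    uint16_t i;

    while ((1u << bits) < n) {
        bits++;                         /* bits = log2(n) */
    }
    for (i = 0; i < n; i++) {
        data_br[i] = data[bit_reverse_index(i, bits)];
    }
}
```

Because bit reversal is its own inverse, applying the copy twice restores the original order.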
Int32 *data_br
This is the destination vector of the bit-reverse function. It contains complex data elements (real part in
most significant 16 bits, imaginary part in least significant 16 bits). A strict alignment requirement is placed
on this destination vector of the bit-reverse function: This buffer must be aligned in RAM such that log2(4 *
N) zeros appear in the least significant bits of the byte address (8 bits), where N is the FFT Length. See
Section 9, Appendix A Methods for Aligning the Bit-Reverse Destination Vector, for ways to force the
linker to enforce this alignment requirement.
#define ALIGNMENT (2*N) // Aligns data_br_buf to an address with log2(4*N) zeros in the
// least significant bits of the byte address
#pragma DATA_SECTION(data_br_buf, "data_br_buf");
// Allocation to section "data_br_buf : > DARAM" in the linker CMD file
#pragma DATA_ALIGN (data_br_buf, ALIGNMENT);
Int32 data_br_buf[N]; // N = FFT length
Int32 *data_br = data_br_buf;
Int32 *data_br:
Strict address alignment requirement: This buffer must be aligned in RAM such that log2(4 * N) zeros
appear in the least significant bits of the byte address (8 bits), where N is the FFT Length. See Section 9
for ways to force the linker to enforce this alignment requirement.
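The alignment rule can be checked numerically. The sketch below treats the address as a plain byte-address integer and uses hypothetical helper names; it illustrates the rule only and does not replace the linker-based methods in Section 9.

```c
#include <stdint.h>

/* Number of zero least-significant bits required in the byte address: log2(4 * N). */
unsigned br_required_zero_bits(uint32_t n)
{
    unsigned bits = 0;
    uint32_t span = 4u * n;             /* 4 bytes per complex element */
    while (span > 1u) {
        span >>= 1;
        bits++;
    }
    return bits;
}

/* Nonzero if byte_addr is a valid bit-reverse destination for an N-point FFT. */
int br_is_aligned(uint32_t byte_addr, uint32_t n)
{
    uint32_t mask = 4u * n - 1u;        /* 4 * N is a power of two */
    return (byte_addr & mask) == 0u;
}
```

For example, a 1024-point FFT needs 12 zero bits, so byte address 0x4000 (14 zero bits) qualifies for lengths up to 4096 points but not for 8192 points.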
Uint16 data_len
This Uint16 parameter indicates the length of the data and data_br vectors.
Uint16 data_len:
The data_len parameter indicates the length of the Int32 vector (FFT Length). Valid lengths include
powers of two: {8, 16, 32, 64, 128, 256, 512, 1024}.
The HWAFFT functions occupy approximately 4 KBytes of memory, so to conserve RAM they have been
placed in the DSP’s 128 KBytes of on-chip ROM. These functions are identical to and have the same
names as the functions stored in hwafft.asm, but they do not consume any RAM. In order to utilize these
HWAFFT routines in ROM, add the following lines to the bottom of the project’s linker CMD file and
remove the hwafft.asm file from the project (or exclude it from the build). When the project is rebuilt, the
HWAFFT functions will reference the ROM locations. The HWAFFT ROM locations are different between
VC5505 (PG1.4) and C5505/C5515 (PG2.0). ROM locations for both device families are shown in Table 2.
/*** Add the following code to the linker command file to call HWAFFT Routines from ROM ***/
5 Simple Example to Illustrate the Use of the FFT Accelerator
The source code below demonstrates typical use of the HWAFFT for the 1024-point FFT and IFFT cases.
The HWAFFT Functions make use of Boolean flag variables to select between FFT and IFFT, Scale and
No Scale mode, and Data and Scratch output locations.
#define FFT_FLAG ( 0 ) /* HWAFFT to perform FFT */
#define IFFT_FLAG ( 1 ) /* HWAFFT to perform IFFT */
#define SCALE_FLAG ( 0 ) /* HWAFFT to scale butterfly output */
#define NOSCALE_FLAG ( 1 ) /* HWAFFT not to scale butterfly output */
#define OUT_SEL_DATA ( 0 ) /* Indicates HWAFFT output located in input data vector */
#define OUT_SEL_SCRATCH ( 1 ) /* Indicates HWAFFT output located in scratch vector */
Int32 *data;
Int32 *data_br;
Uint16 fft_flag;
Uint16 scale_flag;
Int32 *scratch;
Uint16 out_sel;
Int32 *result;
if (out_sel == OUT_SEL_DATA) {
result = data;
} else {
result = scratch;
}
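The output-selection pattern above can be wrapped in a small helper so that callers always receive a pointer to the buffer that holds the result. This is a host-side sketch under stated assumptions: `hwafft_fn` is a hypothetical function-pointer type that mirrors the hwafft_Npts signature described in Section 4, `run_hwafft` is a hypothetical wrapper, and `stub_hwafft` is a stand-in that performs no FFT and only exists so the sketch can run without the DSP.

```c
#include <stdint.h>

#define OUT_SEL_DATA    0  /* output located in input data vector */
#define OUT_SEL_SCRATCH 1  /* output located in scratch vector */

/* Hypothetical pointer type mirroring the hwafft_Npts signature. */
typedef uint16_t (*hwafft_fn)(int32_t *data, int32_t *scratch,
                              uint16_t fft_flag, uint16_t scale_flag);

/* Run one transform and return whichever buffer holds the output. */
int32_t *run_hwafft(hwafft_fn fn, int32_t *data, int32_t *scratch,
                    uint16_t fft_flag, uint16_t scale_flag)
{
    uint16_t out_sel = fn(data, scratch, fft_flag, scale_flag);
    return (out_sel == OUT_SEL_DATA) ? data : scratch;
}

/* Host-side stand-in: computes nothing, reports the result in scratch. */
uint16_t stub_hwafft(int32_t *data, int32_t *scratch,
                     uint16_t fft_flag, uint16_t scale_flag)
{
    (void)data;
    (void)scratch;
    (void)fft_flag;
    (void)scale_flag;
    return OUT_SEL_SCRATCH;
}
```

On the DSP, the function passed in would be one of the hwafft_Npts routines, with the data argument pointing at the bit-reversed input vector.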
if (out_sel == OUT_SEL_DATA) {
result = data;
} else {
result = scratch;
}
6 FFT Benchmarks
Table 3 compares the FFT performance of the HWAFFT versus FFT computation using the CPU under
the following conditions:
• Core voltage = 1.05 V
• PLL = 60 MHz
• Power measurement condition:
– At room temperature only
– All peripherals are clock gated
– Measured at VDDC
In summary, Table 3 shows that for the test conditions used, HWAFFT is 4 to 6 times more energy
efficient and 2.2 to 3.8 times faster than the CPU.
Table 4 compares FFT performance of the accelerator versus FFT computation using the CPU under the
following conditions:
• Core voltage = 1.3 V
• PLL = 100 MHz
• Power measurement condition:
– At room temperature only
– All peripherals are clock gated
– Measured at VDDC
Table 4. FFT Performance on HWAFFT vs CPU (Vcore = 1.3 V, PLL = 100 MHz)
              FFT with HWA                              CPU (Scale)                                   HWA versus CPU
Complex FFT   FFT + BR (1) Cycles    Energy/FFT (nJ)    FFT + BR (1) Cycles       Energy/FFT (nJ)     x Times Faster (Scale)   x Times Energy Efficient (Scale)
8 pt          92 + 38 = 130          36.3               196 + 95 = 291            145.9               2.2                      4
16 pt         115 + 55 = 170         49.3               344 + 117 = 461           241                 2.7                      4.9
32 pt         234 + 87 = 321         106.9              609 + 139 = 748           414                 2.3                      3.9
64 pt         285 + 151 = 436        151.3              1194 + 211 = 1405         815.7               3.2                      5.4
128 pt        633 + 279 = 912        336.8              2499 + 299 = 2798         1672.9              3.1                      5
256 pt        1133 + 535 = 1668      625.6              5404 + 543 = 5947         3612.9              3.6                      5.8
512 pt        2693 + 1047 = 3740     1442.8             11829 + 907 = 12736       7823.8              3.4                      5.4
1024 pt       5244 + 2071 = 7315     2820.6             25934 + 1783 = 27717      17032.4             3.8                      6
(1) BR = Bit Reverse
In summary, Table 4 shows that for the test conditions used, HWAFFT is 4 to 6 times more energy
efficient and 2.2 to 3.8 times faster than the CPU.
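The ratio columns of Table 4 follow directly from the cycle and energy columns. The sketch below shows the arithmetic with hypothetical helper names, assuming that "x Times Faster" is the ratio of total cycles (FFT plus bit reverse) and that execution time is cycles divided by the clock rate.

```c
/* Speed-up: CPU cycles divided by HWAFFT cycles (both including bit reverse). */
double speedup(unsigned long cpu_cycles, unsigned long hwa_cycles)
{
    return (double)cpu_cycles / (double)hwa_cycles;
}

/* Energy-efficiency ratio: CPU energy per FFT divided by HWAFFT energy per FFT. */
double energy_ratio(double cpu_nj, double hwa_nj)
{
    return cpu_nj / hwa_nj;
}

/* Execution time in microseconds for a cycle count at a clock given in MHz. */
double cycles_to_us(unsigned long cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}
```

For the 1024-point row, 27717 / 7315 is about 3.8 and 17032.4 / 2820.6 is about 6, matching the table; at 100 MHz the 7315 HWAFFT cycles correspond to roughly 73 microseconds.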
7 Description of Open Source FFT Example Software
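Figure 8. FFT Filter Demo Block Diagram (signal path: input from codec or waveforms in memory, InBuf 1 / InBuf 2, FFT, CPLX MUL with the FFT of the filter coefficients, IFFT and overlap add, OutBuf 1 / OutBuf 2, output to codec)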
8 Computation of Large (Greater Than 1024-Point) FFTs
#pragma DATA_ALIGN (data_br_buf, 4096);
// Align 2048-pt bit-reverse dest vector to byte addr w/ 13 least sig zeros
Int32 data_br_buf[TEST_DATA_LEN];
// Function Prototypes:
Int32 CPLX_Mul(Int32 op1, Int32 op2);
// Yr = op1_r*op2_r - op1_i*op2_i, Yi = op1_r*op2_i + op1_i*op2_r
Int32 CPLX_Add(Int32 op1, Int32 op2, Uint16 scale_flag);
// Yr = 1/2 * (op1_r + op2_r), Yi = 1/2 *(op1_i + op2_i)
Int32 CPLX_Subtract(Int32 op1, Int32 op2, Uint16 scale_flag);
// Yr = 1/2 * (op1_r - op2_r), Yi = 1/2 *(op1_i - op2_i)
// Declare Variables
Int32 *data_br;
Int32 *data;
Int32 *data_even, *data_odd;
Int32 *scratch_even, *scratch_odd;
Int32 *twiddle;
Int32 twiddle_times_data_odd;
Uint16 fft_flag;
Uint16 scale_flag;
Uint16 out_sel;
Uint16 k;
// HWAFFT flags:
fft_flag = FFT_FLAG; // HWAFFT to perform FFT (not IFFT)
scale_flag = SCALE_FLAG; // HWAFFT to scale by 2 after each butterfly stage
// Combine the even and odd FFT results with a final Radix-2 Butterfly stage on the CPU
for(k=0; k<DATA_LEN_2048/2; k++) // Computes 2048-point FFT
{
    // X[k] = 1/2*(X_even[k] + Twiddle[k]*X_odd[k])
    // X[k+N/2] = 1/2*(X_even[k] - Twiddle[k]*X_odd[k])
    // Twiddle[k]*X_odd[k]:
    twiddle_times_data_odd = CPLX_Mul(twiddle[k], data_odd[k]);
    // X[k]:
    data[k] = CPLX_Add(data_even[k], twiddle_times_data_odd, SCALE_FLAG); // Add then scale by 2
    // X[k+N/2]:
    data[k+DATA_LEN_2048/2] = CPLX_Subtract(data_even[k], twiddle_times_data_odd, SCALE_FLAG); // Subtract then scale by 2
}
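The helper functions used in this final butterfly stage are only declared above. The following is one possible host-side sketch of them, not the implementation shipped with the example software. It assumes Q15 data packed as real part in the upper 16 bits and imaginary part in the lower 16 bits, saturates results to 16 bits, and uses `<stdint.h>` types in place of `Int32`/`Uint16`; the helpers `re16`, `im16`, `sat16`, and `pack` are hypothetical.

```c
#include <stdint.h>

#define SCALE_FLAG   0  /* halve the result, as in the HWAFFT scale mode */
#define NOSCALE_FLAG 1

/* Signed 16-bit real part (upper half) and imaginary part (lower half). */
static int16_t re16(int32_t x) { return (int16_t)(uint16_t)((uint32_t)x >> 16); }
static int16_t im16(int32_t x) { return (int16_t)(uint16_t)((uint32_t)x & 0xFFFFu); }

/* Saturate a wide intermediate value to the int16_t range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Pack saturated real and imaginary parts into one 32-bit word. */
static int32_t pack(int64_t re, int64_t im)
{
    uint32_t hi = (uint16_t)sat16(re);
    uint32_t lo = (uint16_t)sat16(im);
    return (int32_t)((hi << 16) | lo);
}

/* Yr = op1_r*op2_r - op1_i*op2_i, Yi = op1_r*op2_i + op1_i*op2_r (Q15 products). */
int32_t CPLX_Mul(int32_t op1, int32_t op2)
{
    int64_t ar = re16(op1), ai = im16(op1), br = re16(op2), bi = im16(op2);
    return pack((ar * br - ai * bi) / 32768, (ar * bi + ai * br) / 32768);
}

/* Complex add; the sum is halved when scale_flag == SCALE_FLAG. */
int32_t CPLX_Add(int32_t op1, int32_t op2, uint16_t scale_flag)
{
    int64_t div = (scale_flag == SCALE_FLAG) ? 2 : 1;
    return pack((re16(op1) + re16(op2)) / div, (im16(op1) + im16(op2)) / div);
}

/* Complex subtract; the difference is halved when scale_flag == SCALE_FLAG. */
int32_t CPLX_Subtract(int32_t op1, int32_t op2, uint16_t scale_flag)
{
    int64_t div = (scale_flag == SCALE_FLAG) ? 2 : 1;
    return pack((re16(op1) - re16(op2)) / div, (im16(op1) - im16(op2)) / div);
}
```

Division is used instead of a right shift so that the behavior for negative values is well defined in portable C; it truncates toward zero rather than rounding, which a production routine on the DSP would likely handle differently.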
9 Appendix A Methods for Aligning the Bit-Reverse Destination Vector
Place the buffer at the beginning of a DARAM or SARAM block with log2(4 * N) zeros in the least
significant bits of its byte address. For example, memory section DARAM2_3 below starts at address
0x0004000, which contains 14 zeros in the least significant bits of its binary address (0x0004000 =
0b0100 0000 0000 0000). Therefore, this address is a suitable location for a bit-reverse destination
vector for FFT lengths up to 4096 points because log2(4 * 4096) = 14.
In the Linker CMD File...
MEMORY
{
MMR (RWIX): origin = 0000000h, length = 0000c0h /* MMRs */
DARAM0 (RWIX): origin = 00000c0h, length = 001f40h /* on-chip DARAM 0, 4000 words */
DARAM1 (RWIX): origin = 0002000h, length = 002000h /* on-chip DARAM 1, 4096 words */
DARAM2_3 (RWIX): origin = 0004000h, length = 004000h /* on-chip DARAM 2_3, 8192 words */
DARAM4 (RWIX): origin = 0008000h, length = 002000h /* on-chip DARAM 4, 4096 words */
... (leaving out rest of memory sections)
}
SECTIONS
{
data_br_buf : > DARAM2_3 /* ADDR = 0x004000, Aligned to addr with 14 least-sig zeros */
}
9.2 Use the ALIGN Descriptor to Force log2(4 * N) Zeros in the Least Significant Bits
The ALIGN descriptor forces the alignment of a specific memory section, while providing the linker with
added flexibility to allocate sections across the entire DARAM or SARAM because no blocks are statically
allocated. It aligns the memory section to an address with log2(ALIGN Value) zeros in the least significant
bits of the binary address.
For example, the following code aligns data_br_buf to an address with 12 zeros in the least significant
bits, suitable for a 1024-point bit-reverse destination vector.
In the Linker CMD File...
MEMORY
{
MMR (RWIX): origin = 0000000h, length = 0000c0h /* MMRs */
DARAM (RWIX): origin = 00000c0h, length = 00ff40h /* on-chip DARAM 32 Kwords */
SARAM (RWIX): origin = 0010000h, length = 040000h /* on-chip SARAM 128 Kwords */
}
SECTIONS
{
data_br_buf : > DARAM ALIGN = 4096
/* 2^12 = 4096 , Aligned to addr with 12 least-sig zeros */
}
This revision history highlights the changes made to this document to make it a SPRABB6B revision.