Circular Buffering On TMS320C6000: Dipa Rao DSP West Applications
ABSTRACT
This application report explains how circular buffering is implemented on the TMS320C6000
devices. Circular buffering helps to implement finite impulse response (FIR) filters efficiently.
Filters require delay lines or buffers of past (and current) samples. Circular addressing
simplifies the manipulation of pointers in accessing the data samples.
Contents
1 Circular Buffer Concept
2 Circular Buffer Implementation
3 Block FIR Circular Buffer Example
4 Conclusion
Appendix A Block FIR With Circular Buffering
A.1 C Code: Main.c
A.2 Hand Coded Assembly File: FIRCIRC.ASM
List of Figures
Figure 1. Delay Line Implemented With Shifting of Samples
Figure 2. Delay Line With Pointer Manipulation Using Circular Addressing
Figure 3. Address Mode Register (AMR)
Figure 4. Circular Buffer Pointer Modification
Figure 5. Delay Line State
List of Tables
Table 1. AMR Mode Field Encoding
TMS320C6000 is a trademark of Texas Instruments.
SPRA645A
$$y_n = \sum_{i=0}^{N-1} x_{n-i} \, a_i \qquad (1)$$
where a_i is a filter coefficient, x_k is a data sample, k is the time index, and N is the number of
taps. As equation (1) shows, calculating an output y_n requires a buffer of previous values
(also called a delay line) along with the current sample. Typically, a pointer is
set up at the beginning of the sample array (the oldest sample) and then manipulated to access
consecutive values. Whenever a new sample is added to the delay line, either all the
values must be shifted down (Figure 1) or the oldest value must be overwritten (Figure 2).
The second technique can be implemented by using circular mode for pointer access.
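The two update strategies can be sketched in C as follows. This is a minimal illustration rather than code from this report: the names TAPS, delay_shift_insert, delay_circ_insert, and fir_circ are hypothetical, and the wrap is done in software with a modulo where the C6000 performs it in the addressing hardware.

```c
#include <stddef.h>

#define TAPS 4  /* hypothetical filter length N for this sketch */

/* Strategy 1 (Figure 1): shift every sample down one slot, then store
 * the new sample at position 0. Costs TAPS-1 moves per input sample. */
void delay_shift_insert(short line[TAPS], short x_new)
{
    for (size_t i = TAPS - 1; i > 0; i--)
        line[i] = line[i - 1];
    line[0] = x_new;
}

/* Strategy 2 (Figure 2): overwrite the oldest sample in place and advance
 * an index that wraps at the end of the buffer. Returns the new index,
 * which again points at the oldest sample. */
size_t delay_circ_insert(short line[TAPS], size_t oldest, short x_new)
{
    line[oldest] = x_new;
    return (oldest + 1) % TAPS;
}

/* Equation (1) evaluated on the circular line: a[0] pairs with the
 * newest sample, which sits one slot before `oldest` (modulo TAPS). */
long fir_circ(const short line[TAPS], size_t oldest, const short a[TAPS])
{
    long y = 0;
    for (size_t i = 0; i < TAPS; i++) {
        size_t pos = (oldest + TAPS - 1 - i) % TAPS;
        y += (long)line[pos] * a[i];
    }
    return y;
}
```

Both strategies produce the same filter output; the circular version replaces the per-sample data moves with a single index update.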
Figure 1. Delay Line Implemented With Shifting of Samples
Figure 2. Delay Line With Pointer Manipulation Using Circular Addressing
Figure 3. Address Mode Register (AMR): bits 7 through 0 hold the A7, A6, A5, and A4 mode fields (two bits each)
The block size fields (BK0 and BK1) in the AMR can be used in conjunction with any of the registers
to specify the length of the buffer, as shown in Table 1. A value N in the BK0 or BK1 field
corresponds to a block size of 2^(N+1) bytes. With the 5-bit BK0 and BK1 fields, it is possible to
specify 32 different block lengths for circular addressing, from 2 bytes to 4G bytes (which
is also the maximum address reach of a 32-bit pointer register). This makes it
convenient to implement filters with a very large number of taps. The circular buffer is always
aligned on a byte boundary equal to the block size. This alignment is necessary for the hardware
to locate the start and end of the circular buffer correctly. For instance, if register A4 contains
the address 0x80000005 and the block size is 64 bytes, then the start address of the circular
buffer is 0x80000000 (the address rounded down to a 64-byte boundary) and the end address is
0x8000003F.
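The block size and alignment rules can be checked with a few lines of C. This is a sketch under stated assumptions: the helper names bk_block_size, circ_base, and circ_last are hypothetical and only model the arithmetic described above; they are not part of any TI library.

```c
#include <stdint.h>

/* Block size in bytes selected by a 5-bit BK0/BK1 field value n: 2^(n+1).
 * A 64-bit result is used because n = 31 gives 4G bytes. */
uint64_t bk_block_size(unsigned n)
{
    return (uint64_t)1 << ((n & 0x1Fu) + 1);
}

/* Start address of the circular buffer that contains addr.
 * block_size must be a power of two no larger than 2^32. */
uint32_t circ_base(uint32_t addr, uint64_t block_size)
{
    return (uint32_t)(addr & ~(block_size - 1));
}

/* Last byte address of the circular buffer that contains addr. */
uint32_t circ_last(uint32_t addr, uint64_t block_size)
{
    return (uint32_t)(circ_base(addr, block_size) + (block_size - 1));
}
```

For the 64-byte example, the field value is 5 because 2^(5+1) = 64, and circ_base(0x80000005, 64) and circ_last(0x80000005, 64) give 0x80000000 and 0x8000003F.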
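Table 1. AMR Mode Field Encoding
Mode  Description
00    Linear mode
01    Circular mode using the BK0 field
10    Circular mode using the BK1 field
11    Reserved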
To understand circular addressing, consider a case where pointer A4 must be set up in circular
mode to point to a buffer of length 16 bytes. First, A4 is placed in circular mode by
initializing bit field [1:0] of the AMR with '01' (if using BK0) or '10' (if using BK1). The
BK0 field is set to 00011, resulting in a circular buffer size of 2^(3+1) or 16 bytes. The AMR
therefore has bit field [1:0] set to 01 and the BK0 field, bits [20:16], set to 00011. Assume
that pointer A4 contains the address 0x8000000E. Now suppose the pointer A4 is post-incremented
by an instruction such as
LDH *A4++, A8
This instruction loads the contents of 0x8000000E into A8 and then increments the pointer A4 by
2 bytes. Because of circular addressing, the pointer ends up at location 0x80000000. The circular
addressing hardware automatically defines address 0x80000000 as the top of the buffer and
0x8000000F as the end of the 16-byte buffer, as shown in Figure 4. In fact, if the pointer A4
were to point to any shaded location in the buffer shown in Figure 4, the start and end addresses
of the circular buffer would be the same for the given definition of the circular buffer.
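The wrap-around of the post-increment can be modeled in software. The function below is a sketch, not a description of the hardware implementation: circ_post_inc is a hypothetical name, and the model simply keeps the upper address bits fixed while the low log2(block size) bits wrap.

```c
#include <stdint.h>

/* Software model of a post-increment on a pointer in circular mode.
 * Only the low log2(block_size) address bits change; the upper bits,
 * which identify the buffer, are preserved.
 * block_size must be a power of two. */
uint32_t circ_post_inc(uint32_t addr, uint32_t step, uint32_t block_size)
{
    uint32_t mask = block_size - 1;
    return (addr & ~mask) | ((addr + step) & mask);
}
```

With a 16-byte block, circ_post_inc(0x8000000E, 2, 16) returns 0x80000000, matching the LDH example, while a pointer in the middle of the buffer advances normally.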
0x80000000   <- start of buffer
0x80000001
0x80000002
....
0x8000000E   <- A4
0x8000000F   <- end of buffer
Figure 4. Circular Buffer Pointer Modification
$$y_5 = a_0 x_5 + a_1 x_4 + a_2 x_3 + a_3 x_2$$
We require 9 data samples, x[-3] through x[5], from the buffer, as seen in equation (2). Thus, to
calculate each output we require the current sample as well as the (N-1) previous samples, where
N is the number of taps in the filter. For the first output y0, the current sample is x[0] while
the old samples are x[-3], x[-2], and x[-1]. The calling routine uses an index to indicate the
first location that needs to be accessed for the current pass through the filtering loop. Every
time the function is called, the calling algorithm passes not only the pointers to the data
samples but also the index into the buffer. Hence, the first time the algorithm is called, the
index into the data sample buffer is 0, which is the location of x[-3]. After the outputs are
computed, the routine returns to the main calling program. Before the filtering routine is
invoked again, a new set of samples (block size = 6) is added to the buffer. Hence, the second
block of data, x[6] to x[11], is added to the delay line and then the function is called a second
time. To calculate the outputs y[6] to y[11] we require the data samples x[3] to x[11], which
implies that the index into the delay line is 6, as illustrated in Figure 5. The index can be
easily calculated as
index = (previous index + block size) % buffer size
In our example, during the second call to the algorithm, the previous index was 0, the block size
is 6, and the buffer size is 16; hence the index is 6 (pointing to the location of x[3]).
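The index update between calls can be written as a one-line helper. This is a sketch: next_index is a hypothetical name, and all quantities are assumed to be in samples (half words), not bytes.

```c
/* Index of the first sample needed by the next call to the filter.
 * prev_index, block_size, and buf_len are all counted in samples. */
unsigned next_index(unsigned prev_index, unsigned block_size, unsigned buf_len)
{
    return (prev_index + block_size) % buf_len;
}
```

For the 16-sample buffer and block size of 6 used in this example, successive calls start at index 0, then 6, then 12, and the fourth call wraps to index 2.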
Figure 5. Delay Line State
Now let us consider how the pointer needs to be manipulated within the filtering algorithm. The
third block of 6 samples, x[12] to x[17], needs to go into the delay line right after sample x[11].
However, since the buffer is only 16 half words long, some of the new samples are added at
the top of the buffer, overwriting old values such as x[-3] and x[-2]. Within the filtering
algorithm, the pointer used to access the data must be manipulated so that the new samples are
accessed correctly. In our example, during the third call to the filtering algorithm, as shown in
Figure 5, the pointer must wrap around to the top of the buffer to access value x[13]. Setting up
the pointer in circular mode ensures that the pointer folds back to the beginning of the buffer
automatically. By using the circular buffering technique, the filtering algorithm can be called
multiple times without consuming additional data memory.
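The third call can be reproduced with a plain C model of the block filter. This sketch follows the indexing of the C reference given in the header of the assembly routine in Appendix A, but it omits rounding and scaling, and the name fir_circ_model is hypothetical. It also assumes that h[i] multiplies the i-th oldest sample of each window, so the coefficients are stored in reverse order relative to equation (1).

```c
#include <stddef.h>

/* Software model of a block FIR over a circular delay line.
 * x       : delay line holding buf_len samples
 * index   : position in x of the oldest sample needed for the first output
 * h       : nh coefficients; h[i] multiplies x[(index + j + i) % buf_len]
 * r       : receives nr outputs (no rounding or scaling applied)
 */
void fir_circ_model(const short *x, size_t buf_len, size_t index,
                    const short *h, size_t nh, long *r, size_t nr)
{
    for (size_t j = 0; j < nr; j++) {
        long acc = 0;
        for (size_t i = 0; i < nh; i++)
            acc += (long)x[(index + j + i) % buf_len] * h[i];  /* software wrap */
        r[j] = acc;
    }
}
```

Called with buf_len = 16, index = 12, nh = 4, and nr = 6, the second output already reads location 0 of the buffer, which is where x[13] was written. On the C6000 the modulo disappears because the pointer is in circular mode.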
The hand-coded assembly that implements this technique is presented in Appendix A. In the code,
the number of taps is 16, the block size for filtering is 20, and the buffer size is
128 bytes or 64 half words (16 bits each). The instructions that set up the circular pointer mode
are highlighted. The code uses software pipelining to achieve maximum
efficiency and performance. As a consequence, there are certain restrictions on the block size
as well as the number of taps. To make the algorithm efficient, the FIR filter loop
has been unrolled so that four multiply-accumulates (as shown in equation (2)) happen in the
same iteration; hence the number of taps must be a multiple of 4. In addition, two output values
are computed in the same loop kernel; as a result, the block size must be a multiple of 2. This is
explained in the header of the function. Two pointers, A7 and B4, access the data samples in the
FIR routine, so both pointers are set up in circular mode with the appropriate block size
indicated in the BK0 bit field. The sample code also includes a main function that calls the
filtering algorithm multiple times, each time with a newly calculated index value as well as
newly loaded data samples.
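A caller may want to check these restrictions before invoking the routine. The function below is a sketch that combines the restrictions described above with the assumptions listed in the header of the assembly routine in Appendix A; the name fir_circ_args_ok and its parameter names are hypothetical, and the buffer alignment requirement is not checked here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the arguments satisfy the stated restrictions:
 *   nh          number of taps, a multiple of 4 and at least 4
 *   nr          number of outputs per call, even and at least 2
 *   size_bytes  circular buffer size in bytes, a power of 2
 *   index       starting index in half words, at most (size_bytes/2) - 2
 */
bool fir_circ_args_ok(size_t nh, size_t nr, size_t size_bytes, size_t index)
{
    bool taps_ok  = (nh >= 4) && (nh % 4 == 0);
    bool block_ok = (nr >= 2) && (nr % 2 == 0);
    bool size_ok  = (size_bytes >= 2) &&
                    ((size_bytes & (size_bytes - 1)) == 0);
    bool index_ok = size_ok && (index + 2 <= size_bytes / 2);
    return taps_ok && block_ok && size_ok && index_ok;
}
```

The configuration used in Appendix A (16 taps, block size 20, 128-byte buffer) passes this check for any even or odd index up to 62.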
4 Conclusion
This application note discussed the basic benefits of using circular addressing and illustrated it
with an example of block FIR filtering. The C6000 code generation tools will offer support for
circular buffering from the C environment in a future release.
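    /* Load input samples 35..54, filter the block, then advance the index. */
    readdata(inp_samp, in_array, 35, 55);
    fir_circ_asm(&out_array[NUM_SAMP], in_array, NUM_TAPS, coeff_array,
                 scale_factor, NUM_SAMP, BUF_LEN, index);
    index = (index + NUM_SAMP) % (BUF_LEN / 2);

    /* Load input samples 55..74 and filter the next block. */
    readdata(inp_samp, in_array, 55, 75);
    fir_circ_asm(&out_array[2 * NUM_SAMP], in_array, NUM_TAPS, coeff_array,
                 scale_factor, NUM_SAMP, BUF_LEN, index);
    index = (index + NUM_SAMP) % (BUF_LEN / 2);

    /* Load input samples 75..94 and filter the next block. */
    readdata(inp_samp, in_array, 75, 95);
    fir_circ_asm(&out_array[3 * NUM_SAMP], in_array, NUM_TAPS, coeff_array,
                 scale_factor, NUM_SAMP, BUF_LEN, index);
}

/* Copy init_values[n..m-1] into the circular buffer. BUF_LEN is in bytes,
 * so the destination index wraps at BUF_LEN/2 half words. */
void readdata(short init_values[], short array[], int n, int m)
{
    int i, temp;
    for (i = n; i < m; i++)
    {
        temp = i % (BUF_LEN / 2);
        array[temp] = init_values[i];
    }
}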
* {
* int i, j;
* Long40 y0;
* Long40 round = (Long40) 1 << (s - 1);
* for (j = 0; j < nr; j++) {
* y0 = round;
* for (i = 0; i < nh; i++)
* y0 += x[(i + j + index) % (size/sizeof(short))]
* * h[i];
* r[j] = y0 >> s;
* }
* }
*
*
* ASSUMPTIONS:
* x fills Block Size, aligned on a Block Size boundary
* e.g. size=128, x[64] aligned on 128 byte boundary
* nh MULTIPLE of 4 >= 4
* nx EVEN >= 2
* index <= (size/2) - 2
* size is a power of 2
*
* MEMORY NOTE:
* This code has no memory hits regardless of where x and h
* are located in memory.
*
*
* TECHNIQUES
* The inner loop is unrolled four times thus the number of
* filter coefficients must be a multiple of four. The outer
* loop is unrolled twice so the number of output samples must
* be a multiple of 2.
*
* If an odd number of output samples is needed or possible,
* the final store can either be removed or conditionally
* executed depending on whether nx is even or odd. This code
* would have to be added to the existing code.
*
* The outer loop, like the inner loop, is software pipelined
* as well. e, o, and p in the comments of the individual
B_END:
*** END Benchmark Timing ***
END: LDW .D2 *++B15, A15 ; pop A15 off the stack
|| MV .L1X B15, A1 ; copy stack pointer
*======================================================================
* end of fir_circ assembly code
*======================================================================
* Copyright (C) 1997–1999 Texas Instruments Incorporated. *
* All Rights Reserved *
*======================================================================