ES Assignment 3
Assignment no.3
TEJASV GUPTA
2K17/EC/175
Answer 1
1. Circular addressing is used to create a circular buffer.
2. The buffer is implemented in hardware and is very useful for applications such as digital filtering.
3. Used with a circular buffer, this addressing mode updates the stored samples by moving a
pointer rather than shifting the data, which avoids the overhead of direct shifting.
4. When a pointer reaches the bottom location and is incremented, it automatically wraps
around to the top location.
5. Two independent circular buffers are available through the block-size fields BK0 and BK1 in
the address mode register (AMR).
6. Registers A4-A7 and B4-B7, in conjunction with the .D units, can be used as pointers.
7. MVC (move between the control file and the register file) is the only instruction that can
access the AMR and the other control registers.
8. At the beginning of each sample period, a new sample is read into the circular buffer,
overwriting the oldest sample. The newest sample x(n) is stored at the memory location
pointed to by auxiliary register AR(i).
9. The need to process digital signals in real time motivates circular buffering. Circular
buffers store the most recent values of a continually updated signal. Circular buffering
lets a processor access a block of data sequentially and then automatically wrap around
to the beginning address, which is exactly the pattern used to access the coefficients of
an FIR filter.
10. Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly
used for I/O and for FIR delay lines.
11. Most DSPs implement circular addressing in hardware in order to conserve memory and
minimize software overhead.
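The delay-line behaviour described in points 4, 8 and 9 can be sketched in software. The following is only an illustrative model, not the hardware mechanism itself: the class name CircularDelayLine, the buffer length of 4 and the 2-tap moving-average coefficients are assumptions chosen for the example.

```python
class CircularDelayLine:
    """Software model of a fixed-size circular buffer used as an FIR delay line."""

    def __init__(self, length):
        self.samples = [0.0] * length   # fixed storage; samples are never shifted
        self.newest = length - 1        # index of the most recent sample

    def push(self, x):
        # Advance the pointer with wrap-around, then overwrite the oldest sample.
        self.newest = (self.newest + 1) % len(self.samples)
        self.samples[self.newest] = x

    def fir(self, coeffs):
        # y(n) = sum over k of h[k] * x(n - k), walking backwards with wrap-around.
        n = len(self.samples)
        return sum(h * self.samples[(self.newest - k) % n]
                   for k, h in enumerate(coeffs))


# Hypothetical input stream and coefficients (2-tap moving average).
line = CircularDelayLine(4)
outputs = []
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    line.push(x)
    outputs.append(line.fir([0.5, 0.5, 0.0, 0.0]))
```

After the fifth sample the pointer has wrapped to the top of the buffer and the value 5.0 has replaced the oldest sample 1.0, while no other sample has moved.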
Answer 2
The 40-bit arithmetic logic unit (ALU) implements a wide range of arithmetic and logical
functions, most of which execute in a single clock cycle.
After an operation is performed in the ALU, the result is usually transferred to a destination
accumulator (accumulator A or B). The ALU can also function as two separate 16-bit ALUs and
perform two 16-bit operations simultaneously.
The ALU takes its inputs from several sources. The X input to the ALU is either of two
values: the shifter output (a 32-bit or 16-bit data-memory operand or a shifted accumulator
value) or a data-memory operand from data bus DB. The Y input to the ALU is any of three
values: the value in one of the accumulators (A or B), a data-memory operand from data bus
CB, or the value in the T register.
When a 16-bit data-memory operand is fed through data bus CB or DB, the 40-bit ALU input is
constructed in one of two ways:
1. If bits 15 through 0 contain the data-memory operand, bits 39 through 16 are zero filled
(SXM = 0) or sign extended (SXM = 1).
2. If bits 31 through 16 contain the data-memory operand, bits 15 through 0 are zero filled,
and bits 39 through 32 are either zero filled (SXM = 0) or sign extended (SXM = 1).
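The two construction rules above can be checked with a small bit-level sketch. The function names alu_input_low and alu_input_high are hypothetical and only model the rules as stated here; they are not part of any TI tool or library.

```python
MASK40 = (1 << 40) - 1  # the ALU input is 40 bits wide


def alu_input_low(operand16, sxm):
    """Rule 1: operand in bits 15..0, bits 39..16 zero filled or sign extended."""
    operand16 &= 0xFFFF
    if sxm and operand16 & 0x8000:
        # Copy the sign bit (bit 15) into bits 39..16.
        return (operand16 | ~0xFFFF) & MASK40
    return operand16


def alu_input_high(operand16, sxm):
    """Rule 2: operand in bits 31..16, bits 15..0 zero, bits 39..32 by SXM."""
    operand16 &= 0xFFFF
    value = operand16 << 16
    if sxm and operand16 & 0x8000:
        # Copy the sign bit into the eight guard bits 39..32.
        value |= 0xFF << 32
    return value


low_unsigned = alu_input_low(0x8001, sxm=0)    # 0x0000008001
low_signed = alu_input_low(0x8001, sxm=1)      # 0xFFFFFF8001
high_unsigned = alu_input_high(0x8001, sxm=0)  # 0x0080010000
high_signed = alu_input_high(0x8001, sxm=1)    # 0xFF80010000
```

A positive operand such as 0x7FFF gives the same result for SXM = 0 and SXM = 1, because its sign bit is 0.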
Answer 3
Separate program and data spaces allow simultaneous access to program instructions and
data, providing a high degree of parallelism. Two reads and one write operation can be
performed in a single cycle. Instructions with parallel store and application-specific instructions
can fully utilize this architecture. In addition, data can be transferred between data and program
spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation
operations that can all be performed in a single machine cycle. Also included are the control
mechanisms to manage interrupts, repeated operations, and function calls.
The CPU also includes, among other elements:
- A barrel shifter
- A 17 × 17-bit multiplier/adder
Bus Structure
The ’54x device architecture is built around eight major 16-bit buses:
- One program-read bus (PB) which carries the instruction code and immediate operands from
program memory
- Two data-read buses (CB, DB) and one data-write bus (EB), which interconnect to various
elements, such as the CPU, data-address generation logic (DAGEN), program-address
generation logic (PAGEN), on-chip peripherals, and data memory
- The CB and DB carry the operands read from data memory. The EB carries the data to be
written to memory.
- Four address buses (PAB, CAB, DAB, and EAB), which carry the addresses needed for
instruction execution
Memory
The minimum memory address range for the ’54x devices is 192K words, composed of 64K
words in program space, 64K words in data space, and 64K words in I/O space. Selected
devices also provide extended program memory space of up to 8M words. The program
memory space contains the instructions to be executed as well as tables used in execution. The
data memory space stores data used by the instructions. The I/O memory space interfaces to
external memory-mapped peripherals and can also serve as extra data storage space.
The ’54x DSPs provide both on-chip RAM and ROM to improve system performance and
integration.
On-Chip Peripherals
All the ’54x devices have the same CPU structure; however, they have different on-chip
peripherals connected to their CPUs. The on-chip peripheral options provided are:
Answer 4
1. Circular Buffer Size Register -
A circular buffer, circular queue, cyclic buffer or ring buffer is a data structure that uses a single,
fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering
data streams.
Circular buffers are useful in DSP programming because most implementations include a loop
of some sort. In the filter example, all the coefficients are processed, and then the coefficient
pointer is reset when the loop is finished. Using circular buffering, the coefficient pointer will
automatically wrap around to the beginning when the end of the loop is encountered. Therefore,
the time that it takes to update the pointers is saved. Setting up circular buffers usually involves
writing to some registers to tell the DSP the buffer start address, buffer size, and a bit to tell the
DSP to use circular buffers.
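The pointer update that the hardware performs can be modelled as modular arithmetic on the buffer start address and size. This is a sketch under stated assumptions: the function next_address, the start address 0x100 and the buffer size of 5 are hypothetical, and the circular flag stands in for the enable bit mentioned above.

```python
def next_address(addr, step, start, size, circular=True):
    """Return the pointer value after adding `step`.

    With circular=False the pointer advances linearly, as in ordinary
    indirect addressing. With circular=True it stays inside the range
    start .. start + size - 1 and wraps at either end.
    """
    if not circular:
        return addr + step
    return start + (addr - start + step) % size


# Hypothetical coefficient buffer: 5 words starting at address 0x100.
START, SIZE = 0x100, 5
addr = START
visited = []
for _ in range(7):
    visited.append(addr)
    addr = next_address(addr, 1, START, SIZE)
```

The pointer visits 0x100 to 0x104 and then returns to 0x100 without any explicit reset instruction in the loop, which is the saving the paragraph above describes.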
2. Block-repeat Register -
Several registers are used for block repeats, that is, blocks of instructions that are
executed several times in a row. The block repeat counter BRC0 counts block repeat
iterations. The block repeat start and end registers RSA0L and REA0L keep track of the start
and end points of the block.
The block repeat counter 1 (BRC1) and block repeat save register 1 (BRS1) are also used to
repeat blocks of instructions. There are two repeat start address registers, RSA0 and RSA1.
Each is divided into low and high parts: RSA0L and RSA0H, for example.
3. Interrupt Register -
When an interrupt occurs, firmware reads the interrupt register and then has to look, bit by bit,
for which interrupt occurred. Once it finds a bit, it services that interrupt. Once that interrupt is
serviced, it will continue to scan the rest of the bits for more pending interrupts.
Several registers control interrupts. The interrupt enable registers 0 and 1, named IER0 and
IER1, determine which interrupts will be recognized. The interrupt flag registers 0 and 1,
named IFR0 and IFR1, keep track of currently pending interrupts. Two other registers, DBIER0
and DBIER1, are used for debugging. Two registers, the interrupt vector pointer DSP (IVPD)
and interrupt vector pointer host (IVPH), provide the base addresses for the interrupt vector tables.
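The bit-by-bit scan described above can be sketched as follows. This is a simplified software model, not the C55x hardware priority scheme: the function service_pending, the scan order from bit 0 upward, the bit assignments and the handler names are all assumptions made for illustration.

```python
def service_pending(ifr, ier, handlers):
    """Scan the flag register and service every pending, enabled interrupt.

    ifr: interrupt flag bits (1 = pending)
    ier: interrupt enable bits (1 = recognized)
    handlers: dict mapping a bit number to a callable (hypothetical ISRs)
    Returns the list of serviced bit numbers and the updated flag register.
    """
    serviced = []
    pending = ifr & ier          # only enabled interrupts are recognized
    bit = 0
    while pending >> bit:        # stop once no higher bits remain set
        mask = 1 << bit
        if pending & mask:
            handlers.get(bit, lambda: None)()
            serviced.append(bit)
            ifr &= ~mask         # clear the flag once it has been serviced
        bit += 1
    return serviced, ifr


log = []
handlers = {2: lambda: log.append("timer"), 3: lambda: log.append("serial")}
# Bits 2, 3 and 5 are pending, but only bits 1 to 3 are enabled.
serviced, remaining = service_pending(0b101100, 0b001110, handlers)
```

Bit 5 stays set in the returned flag value because it is masked by the enable register, so it remains pending until it is enabled.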
4. Status Registers -
The architecture provides several status registers. Three of them, ST0, ST1, and the
processor mode status register PMST, are inherited from the C54x architecture. The C55x
adds four registers: ST0_55, ST1_55, ST2_55, and ST3_55. These registers provide arithmetic
and bit manipulation flags, a data page pointer and auxiliary register pointer, and processor
mode bits, among other features.
PMST is a memory-mapped register that contains status and control bits. The Processor Status
Register (abbreviated as P) is a hardware register which records the condition of the CPU as a
result of arithmetic, logical or command operations. The purpose of the Processor Status
Register is to hold information about the most recently performed ALU operation, control the
enabling and disabling of interrupts and set the CPU operating mode.
Answer 5
1. Central Processing Unit (CPU)
● The main components of the CPU of TMS320C55x are:
○ Internal Data and Address Buses
○ Memory Interface Unit
○ Instruction Buffer Unit (I Unit)
○ Program Flow Unit (P Unit)
○ Address-Data Flow Unit (A Unit)
○ Data Computation Unit (D Unit)
● The main registers present in the CPU of the TMS320C55x are:
○ Accumulators - AC0 to AC3
○ Auxiliary Registers - AR0 to AR7
○ Circular Buffer Size Registers - BK03, BK47, BKC
○ Block Repeat Counters - BRC0, BRC1
○ CFCT - Control Flow Context Register
○ IER0, IER1 - Interrupt Enable Registers
○ IFR0, IFR1 - Interrupt Flag Registers
○ PC - Program Counter
○ ST0 to ST3 - Status Registers
● The functions of internal data and address buses are as follows:
○ Data-Read Data Buses (BB, CB, DB): These three buses carry 16-bit
data from data space or I/O space to functional units of the CPU. BB
only carries data from internal memory to the D unit (primarily to the
dual multiply-and-accumulate (MAC) unit).
○ Data-Read Address Buses (BAB, CAB, DAB): These three buses
carry 23-bit word data addresses to the memory interface unit, which
then fetches the data from memory and transfers the requested values
to the data-read data buses.
○ Program-Read Data Bus (PB): PB carries 32 bits (4 bytes) of program
code at a time to the I unit, where instructions are decoded.
○ Program-Read Address Bus (PAB): PAB carries the 24-bit byte
program address of the program code that is carried to the CPU by
PB.
○ Data-Write Data Buses (EB, FB): These two buses carry 16-bit data
from functional units of the CPU to data space or I/O space. EB and
FB receive data from the P unit, the A unit, and the D unit.
○ Data-Write Address Buses (EAB, FAB): These two buses carry 23-bit
addresses to the memory interface unit, which then receives the
values driven on the data-write data buses.
● The memory interface unit mediates all data transfers between the CPU and
program/data space or I/O space.
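The bus widths listed above imply the sizes of the address ranges, which can be worked out directly. The sketch below is plain arithmetic on the figures in the text. The helper word_to_byte_address and the assumption that one 16-bit word occupies two consecutive byte addresses are illustrative, not taken from the text.

```python
WORD_ADDRESS_BITS = 23   # data-read and data-write address buses
BYTE_ADDRESS_BITS = 24   # program-read address bus (PAB)
DATA_BUS_BITS = 16       # BB, CB, DB, EB, FB
PROGRAM_BUS_BITS = 32    # PB

addressable_words = 2 ** WORD_ADDRESS_BITS   # 8M 16-bit words
addressable_bytes = 2 ** BYTE_ADDRESS_BITS   # 16M bytes
bytes_per_word = DATA_BUS_BITS // 8          # 2 bytes per data word
program_bytes_per_fetch = PROGRAM_BUS_BITS // 8


def word_to_byte_address(word_address):
    # Assumption: each 16-bit word spans two consecutive byte addresses.
    return word_address * bytes_per_word


last_word = addressable_words - 1
last_byte_of_last_word = word_to_byte_address(last_word) + bytes_per_word - 1
```

Under that assumption, the 23-bit word address range and the 24-bit byte address range cover the same amount of memory.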
Answer 6
● The C6x processors are closer to traditional very long instruction word
(VLIW) processors because they seek to exploit the high levels of
instruction-level parallelism (ILP) in many signal processing algorithms.
● Superscalar processors excel on the desktop partly because they can run existing
binaries unchanged. For the embedded space, code compatibility is less of a
problem, and so new applications can be either hand tuned or recompiled for the
newest generation of processor. The other reason superscalar excels on the
desktop is that the compiler cannot predict memory latencies at compile time. In
embedded systems, however, memory latencies are often much more predictable.
In fact, hard real-time constraints force memory latencies to be statically
predictable. Of course, a superscalar processor would also perform well under
these constraints, but the extra hardware needed to schedule instructions
dynamically is wasteful in terms of both precious chip area and power
consumption. Thus VLIW is a natural choice for high-performance embedded
systems.
● The C6x family employs different pipeline depths depending on the family
member. For the C64x, for example, the pipeline has 11 stages. The first
four stages of the pipeline perform instruction fetch, followed by two stages
for instruction decode, and finally up to five stages for instruction execution.
● The C6x family’s execution stage is divided into two parts, the left or “1”
side and the right or “2” side. The L1 and L2 units perform logical and
arithmetic operations. The D units, in contrast, perform a subset of logical and
arithmetic operations but also perform memory accesses (loads and stores).
The two M units perform multiplication and related operations (e.g., shifts).
Finally, the S units perform comparisons, branches, and some SIMD
operations. Each side has its own 32-entry, 32-bit register file (the A file for
the 1 side, the B file for the 2 side). A side may access the other side’s
registers, but with a 1-cycle penalty. Thus, an instruction executing on side
1 may access B5, for example, but it will take one extra cycle to execute
because of this.
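● Therefore, the partitioned register files in the TI C6x, together with features
such as pipelining, help make this family faster and more efficient than many
other DSP processors: each side’s functional units work mainly from their own
register file, so the two sides can execute in parallel.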
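The cross-side register access rule described in this answer can be expressed as a small cost model. This is only a sketch of the rule as stated here: the functions register_side and execution_cycles and the base cost of one cycle are assumptions, not a description of the actual C6x instruction timing.

```python
def register_side(register):
    """Map a register name to its side: 'A5' -> 1 (A file), 'B5' -> 2 (B file)."""
    return {"A": 1, "B": 2}[register[0].upper()]


def execution_cycles(unit_side, operands, base_cycles=1):
    """Cycles for an instruction on `unit_side` that reads `operands`.

    Assumption: reading any register from the other side's file adds
    exactly one cycle, as described in the text; base_cycles is hypothetical.
    """
    crosses = any(register_side(r) != unit_side for r in operands)
    return base_cycles + (1 if crosses else 0)


same_side = execution_cycles(1, ["A3", "A4"])   # both operands in the A file
cross_side = execution_cycles(1, ["A3", "B5"])  # B5 comes from the other side
```

In this model, a compiler or programmer who keeps the operands of side-1 instructions in the A file and those of side-2 instructions in the B file avoids the extra cycle.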