ES Assignment 3

1. Circular buffers are useful in DSP programming as they allow coefficient pointers to automatically wrap around when processing loops are finished, saving time compared to updating pointers. 2. Several registers control circular buffers and block repeats of instructions, including the circular buffer size register, block repeat register, and block repeat start/end registers. 3. Interrupt handling involves registers like the interrupt mask and flag registers to determine which interrupts are pending and recognize interrupts, while the interrupt vector registers provide the base address for the interrupt vector table.

Uploaded by

satinder singh

Embedded systems

Assignment no.3
TEJASV GUPTA
2K17/EC/175

Answer 1
1. Circular addressing is used to create a circular buffer
2. Buffer is created in hardware and is very useful for applications like digital filtering
3. Used with a circular buffer, this addressing mode updates the sample window by moving
a pointer rather than shifting the data, which avoids the overhead of direct shifting.
4. When a pointer that has reached the bottom location is incremented, it is
automatically wrapped around to the top location.
5. Two independent buffers are available, with their block sizes set by the BK0 and BK1 fields of the AMR register.
6. Registers A4-A7 and B4-B7, in conjunction with the .D units, can be used as circular pointers.
7. MVC (move between the control file and the register file) is the only instruction that can access AMR and the other control registers.
8. At the beginning of each sample period, a new sample is read into
the circular buffer, overwriting the oldest sample. The newest sample x(n) is stored
at the memory location pointed to by auxiliary register AR(i).
9. The need to process digital signals in real time gives rise to circular
buffering. Circular buffers are used to store the most recent values of a continually
updated signal. Circular buffering allows a processor to access a block of data
sequentially and then automatically wrap around to the beginning address, which is exactly the
pattern used to access coefficients in an FIR filter.
10. Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly
used for I/O and for FIR delay lines.
11. Most DSPs implement circular addressing in hardware in order to conserve memory and
minimize software overhead.
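
The wrap-around behaviour in points 4 and 8 can be sketched in software. This is only a minimal model, assuming a four-entry buffer; the class name `CircularBuffer` and its methods are hypothetical, and the modulo operation stands in for what the DSP address hardware does automatically.

```python
class CircularBuffer:
    """Fixed-size delay line whose write index wraps like a circular pointer."""

    def __init__(self, size):
        self.data = [0] * size
        self.size = size
        self.newest = size - 1  # index of the most recent sample

    def push(self, sample):
        # Advance the pointer; past the bottom location it wraps to the top.
        self.newest = (self.newest + 1) % self.size
        # The slot being written held the oldest sample, which is overwritten.
        self.data[self.newest] = sample

    def recent(self, k):
        """Return x(n - k), the sample written k pushes ago."""
        return self.data[(self.newest - k) % self.size]


buf = CircularBuffer(4)
for x in [10, 20, 30, 40, 50]:
    buf.push(x)
```

After the fifth push the sample 50 has replaced 10 in the first slot, and no other sample has moved in memory.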

Answer 2
The 40-bit ALU implements a wide range of arithmetic and logical functions, most of which execute in a single
clock cycle.
After an operation is performed in the ALU, the result is usually transferred to a destination
accumulator (accumulator A or B). The ALU can also function as two separate 16-bit ALUs and
perform two 16-bit operations simultaneously.
The ALU inputs come from several sources. The X input to the ALU is either
of two values: the shifter output (a 32-bit or 16-bit data-memory operand or a shifted
accumulator value) or a data-memory operand from data bus DB. The Y input to the ALU
is any of three values: the value in one of the accumulators (A or B), a data-memory operand
from data bus CB, or the value in the T register.
When a 16-bit data-memory operand is fed through data bus CB or DB, the 40-bit ALU input is
constructed in one of two ways:

1. If bits 15 through 0 contain the data-memory operand, bits 39 through 16 are zero filled
(SXM=0) or sign-extended (SXM=1).

2. If bits 31 through 16 contain the data-memory operand, bits 15 through 0 are zero filled,
and bits 39 through 32 are either zero filled (SXM=0) or sign-extended (SXM=1).
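
The two construction rules above can be checked with a small bit-level model. This is a sketch only: the function name `alu_input` and the `high` flag (which selects rule 2) are hypothetical, and Python integers stand in for the 40-bit hardware word.

```python
MASK40 = (1 << 40) - 1  # all 40 bits set


def alu_input(operand16, sxm, high=False):
    """Build a 40-bit ALU input from a 16-bit data-memory operand.

    high=False places the operand in bits 15-0 (rule 1).
    high=True places it in bits 31-16 (rule 2).
    """
    operand16 &= 0xFFFF
    negative = bool(operand16 & 0x8000)  # sign bit of the 16-bit operand
    if not high:
        word = operand16
        if sxm and negative:
            word |= MASK40 & ~0xFFFF       # sign-extend into bits 39-16
    else:
        word = operand16 << 16             # bits 15-0 are zero filled
        if sxm and negative:
            word |= MASK40 & ~0xFFFFFFFF   # sign-extend into bits 39-32
    return word
```

For the operand 0x8001, rule 1 gives 0x0000008001 with SXM=0 and 0xFFFFFF8001 with SXM=1.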

Answer 3

The ’54x DSPs use an advanced, modified Harvard architecture that maximizes processing power by
maintaining one program memory bus and three data memory buses. These processors also
provide an arithmetic logic unit (ALU) that has a high degree of parallelism, application-specific
hardware logic, on-chip memory, and additional on-chip peripherals. These DSP families also
provide a highly specialized instruction set, which is the basis of the operational flexibility and
speed of these DSPs.

Separate program and data spaces allow simultaneous access to program instructions and
data, providing a high degree of parallelism. Two reads and one write operation can be
performed in a single cycle. Instructions with parallel store and application-specific instructions
can fully utilize this architecture. In addition, data can be transferred between data and program
spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation
operations that can all be performed in a single machine cycle. Also included are the control
mechanisms to manage interrupts, repeated operations, and function calls.

Central Processing Unit (CPU)


The CPU of the ’54x devices contains:

- A 40-bit arithmetic logic unit (ALU)

- Two 40-bit accumulators

- A barrel shifter

- A 17 × 17-bit multiplier/adder

- A compare, select, and store unit (CSSU)

Bus Structure

The ’54x device architecture is built around eight major 16-bit buses:

- One program-read bus (PB) which carries the instruction code and immediate operands from
program memory

- Two data-read buses (CB, DB) and one data-write bus (EB), which interconnect to various
elements, such as the CPU, data-address generation logic (DAGEN), program-address
generation logic (PAGEN), on-chip peripherals, and data memory. CB and DB carry the
operands read from data memory; EB carries the data to be written to memory.

- Four address buses (PAB, CAB, DAB, and EAB), which carry the addresses needed for
instruction execution

Memory

The minimum memory address range for the ’54x devices is 192K words — composed of 64K
words in program space, 64K words in data space, and 64K words in I/O space. Selected
devices also provide extended program memory space of up to 8M words. The program
memory space contains the instructions to be executed as well as tables used in execution. The
data memory space stores data used by the instructions. The I/O memory space interfaces to
external memory-mapped peripherals and can also serve as extra data storage space.

The ’54x DSPs provide both on-chip RAM and ROM to improve system performance and
integration.

On-Chip Peripherals

All the ’54x devices have the same CPU structure; however, they have different on-chip
peripherals connected to their CPUs. The on-chip peripheral options provided are:

- Software-programmable wait-state generator


- Programmable bank-switching
- Parallel I/O ports
- DMA controller
- Host-port interface (standard 8-bit, enhanced 8-bit, and 16-bit)
- Serial ports (standard, TDM, BSP, and McBSP)
- General-purpose I/O pins
- 16-bit timer with 4-bit prescaler
- Phase-locked loop (PLL) clock generator

Answer 4
1. Circular Buffer Size Register -

A circular buffer, circular queue, cyclic buffer or ring buffer is a data structure that uses a single,
fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering
data streams.

Circular buffers are useful in DSP programming because most implementations include a loop
of some sort. In the filter example, all the coefficients are processed, and then the coefficient
pointer is reset when the loop is finished. Using circular buffering, the coefficient pointer will
automatically wrap around to the beginning when the end of the loop is encountered. Therefore,
the time that it takes to update the pointers is saved. Setting up circular buffers usually involves
writing to some registers to tell the DSP the buffer start address, buffer size, and a bit to tell the
DSP to use circular buffers.
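
The setup described above (a buffer start address, a buffer size, and an enable bit) can be modelled as a pointer object that wraps on its own. This is a sketch under assumptions: `CircularPointer`, `fir_output`, and the toy `memory` list are hypothetical names, and real hardware performs the wrap inside the address generator rather than in software.

```python
class CircularPointer:
    """Pointer that wraps inside [start, start + size) when circular mode is on."""

    def __init__(self, start, size, circular=True):
        self.start = start
        self.size = size
        self.circular = circular
        self.addr = start

    def post_increment(self):
        # Return the current address, then advance, wrapping at the buffer end.
        current = self.addr
        nxt = self.addr + 1
        if self.circular and nxt == self.start + self.size:
            nxt = self.start
        self.addr = nxt
        return current


def fir_output(memory, coeff_ptr, samples):
    """One filter output: sum of coefficient * sample over the whole loop."""
    acc = 0
    for x in samples:
        acc += memory[coeff_ptr.post_increment()] * x
    return acc


memory = [0, 0, 1, 2, 3, 0]             # coefficients 1, 2, 3 at addresses 2..4
ptr = CircularPointer(start=2, size=3)
first = fir_output(memory, ptr, [4, 5, 6])
second = fir_output(memory, ptr, [1, 1, 1])
```

The second call needs no pointer reset, because the coefficient pointer has already wrapped back to address 2 when the first loop ends.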
2. Block-repeat Register -

Several registers are used for block repeats, in which a block of instructions is executed several
times in a row. The block-repeat counter BRC0 counts block-repeat iterations. The block-repeat start and
end address registers RSA0L and REA0L keep track of the start and end points of the block.

Block-repeat counter 1 (BRC1) and block-repeat save register 1 (BRS1) are used to repeat
a nested block of instructions. There are two repeat start address registers, RSA0 and RSA1. Each is
divided into low and high parts: RSA0L and RSA0H, for example.
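
The interaction of the counter with the start and end registers can be illustrated by a toy interpreter. This is a sketch, not the actual hardware sequence: `run_block_repeat` is a hypothetical helper, instruction indices stand in for addresses, and it assumes that a counter value of n repeats the block n + 1 times.

```python
def run_block_repeat(program, rsa, rea, brc):
    """Return the list of executed instruction names.

    rsa, rea: indices of the first and last instruction of the repeated block.
    brc: block-repeat counter; the block runs brc + 1 times (an assumption).
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])
        if pc == rea and brc > 0:
            brc -= 1    # one more pass through the block remains
            pc = rsa    # jump back to the block start without a branch instruction
        else:
            pc += 1
    return trace


trace = run_block_repeat(["setup", "load", "mac", "store"], rsa=1, rea=2, brc=2)
```

Here the `load`/`mac` pair executes three times before control falls through to `store`.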

3. Interrupt Register -

When an interrupt occurs, firmware reads the interrupt register and then has to look, bit by bit,
for which interrupt occurred. Once it finds a bit, it services that interrupt. Once that interrupt is
serviced, it will continue to scan the rest of the bits for more pending interrupts.

Several registers control interrupts. The interrupt enable (mask) registers IER0 and
IER1 determine which interrupts will be recognized. The interrupt flag registers IFR0 and
IFR1 keep track of currently pending interrupts. Two other registers, DBIER0 and
DBIER1, are used for debugging. Two interrupt vector pointers, IVPD (DSP) and
IVPH (host), supply the base addresses for the interrupt vector table.
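
The bit-by-bit scan and the roles of the flag and enable registers can be sketched as follows. The function names, the 16-bit width, and the lowest-bit-first service order are assumptions made for illustration; they are not taken from the C55x documentation.

```python
def pending_interrupts(ifr, ier, width=16):
    """Scan bit by bit for interrupts that are both flagged and enabled."""
    active = ifr & ier
    return [bit for bit in range(width) if active & (1 << bit)]


def service_all(ifr, ier):
    """Service every recognized interrupt and return (new_ifr, serviced_bits)."""
    serviced = []
    for bit in pending_interrupts(ifr, ier):
        serviced.append(bit)   # a real handler would run here
        ifr &= ~(1 << bit)     # clear the flag once the interrupt is serviced
    return ifr, serviced


# Bits 2, 3, and 5 are flagged, but only bits 1, 2, and 3 are enabled.
remaining, order = service_all(ifr=0b101100, ier=0b001110)
```

Bit 5 stays set in the flag register because it is pending but masked, so it is never recognized.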

4. Processor Mode Status Register -

The architecture provides several status registers. Three of them, ST0, ST1, and
the processor mode status register PMST, are inherited from the C54x architecture. The C55x
adds four registers: ST0_55, ST1_55, ST2_55, and ST3_55. These registers provide arithmetic
and bit-manipulation flags, a data page pointer and auxiliary register pointer, and processor
mode bits, among other features.

PMST is a memory-mapped register that contains status and control bits. The Processor Status
Register (abbreviated as P) is a hardware register that records the condition of the CPU as a
result of arithmetic, logical, or command operations. The purpose of the Processor Status
Register is to hold information about the most recently performed ALU operation, control the
enabling and disabling of interrupts and set the CPU operating mode.

Answer 5
1. Central Processing Unit (CPU)
● The main components of the CPU of TMS320C55x are:
○ Internal Data and Address Buses
○ Memory Interface Unit
○ Instruction Buffer Unit (I Unit)
○ Program Flow Unit (P Unit)
○ Address-Data Flow Unit (A Unit)
○ Data Computation Unit (D Unit)
● The main registers present in the CPU of the TMS320C55x are:
○ Accumulators - AC0 to AC3
○ Auxiliary Registers - AR0 to AR7
○ Circular Buffer Size Registers - BK03, BK47, BKC
○ Block Repeat Counters - BRC0, BRC1
○ CFCT - Control Flow Context Register
○ IER0, IER1 - Interrupt Enable Registers
○ IFR0, IFR1 - Interrupt Flag Registers
○ PC - Program Counter
○ ST0 to ST3 - Status Registers
● The functions of internal data and address buses are as follows:
○ Data-Read Data Buses (BB, CB, DB): These three buses carry 16-bit
data from data space or I/O space to functional units of the CPU. BB
only carries data from internal memory to the D unit (primarily to the
dual multiply-and-accumulate (MAC) unit).
○ Data-Read Address Buses (BAB, CAB, DAB): These three buses
carry 23-bit word data addresses to the memory interface unit, which
then fetches the data from memory and transfers the requested values
to the data-read data buses.
○ Program-Read Data Bus (PB): PB carries 32 bits (4 bytes) of program
code at a time to the I unit, where instructions are decoded.
○ Program-Read Address Bus (PAB): PAB carries the 24-bit byte
program address of the program code that is carried to the CPU by
PB.
○ Data-Write Data Buses (EB, FB): These two buses carry 16-bit data
from functional units of the CPU to data space or I/O space. EB and
FB receive data from the P unit, the A unit, and the D unit.
○ Data-Write Address Buses (EAB, FAB): These two buses carry 23-bit
addresses to the memory interface unit, which then receives the
values driven on the data-write data buses.
● The memory interface unit mediates all data transfers between the CPU and
program/data space or I/O space.

2. Program Flow Unit (P Unit)


● The P unit generates all program-space addresses and sends them out on
PAB. It also controls the sequence of instructions by directing operations
such as hardware loops, branches, and conditional execution.
● The program-address generation logic is responsible for generating 24-bit
addresses for fetches from program memory. Normally, it generates
sequential addresses. However, for instructions that require reads from
nonsequential addresses, the program-address generation logic can accept
immediate data from the I unit and register values from the D unit.
● The program control logic accepts immediate values from the I unit and test
results from the A unit or the D unit and performs the following actions:
○ Tests whether a condition is true for a conditional instruction and
communicates the result to the program-address generation logic.
○ Initiates interrupt servicing when an interrupt is requested and
properly enabled.
○ Controls the repetition of a single instruction preceded by a single-
repeat instruction, or a block of instructions preceded by a block-
repeat instruction.
○ Manages instructions that are executed in parallel. Parallelism within
the C55x DSP enables the execution of program-control instructions
at the same time as data processing instructions.
● The main registers of the Program Flow Unit are:
○ PC - Program counter
○ RETA - Return address register
○ CFCT - Control flow context register

3. Address-Data Flow Unit


● The A unit contains all the logic and registers necessary to generate the data-
space and I/O space addresses. It also contains an arithmetic logic unit
(ALU) that can perform arithmetic, logical, shift, and saturation operations.
● Data-Address Generation Unit (DAGEN): DAGEN generates all addresses
for reads from or writes to data space and I/O space. In doing so, it can
accept immediate values from the I unit and register values from the A unit.
● The A unit contains a 16-bit ALU that accepts immediate values from the I
unit and communicates bidirectionally with memory, I/O space, the A-unit
registers, the D-unit registers, and the P-unit registers. The A-unit ALU
performs the following actions:
○ Performs additions, subtractions, comparisons, Boolean logic
operations, signed shifts, logical shifts, and absolute value
calculations.
○ Tests, sets, clears, and complements A-unit register bits and memory
bits.
○ Modifies and moves register values.
○ Rotates register values.
○ Moves certain results from the shifter to an A-unit register
● The registers used by the A-Unit are:
○ Data Page Registers
○ Pointers
○ Circular Buffer Registers
○ Temporary Registers
4. Data Computation Unit (D Unit)
● The D unit contains the primary computational units of the CPU.
● The D-unit shifter: accepts immediate values from the I unit and
communicates bidirectionally with memory, I/O space, the A-unit registers,
the D-unit registers, and the P-unit registers. The shifter performs the
following actions:
○ Shifts 40-bit accumulator values up to 31 bits to the left or up to 32
bits to the right. The shift count can be read from one of the temporary
registers (T0–T3) or it can be supplied as a constant in the instruction.
○ Shifts 16-bit register, memory, or I/O-space values up to 31 bits to the
left or up to 32 bits to the right. The shift count can be read from one
of the temporary registers (T0–T3) or it can be supplied as a constant
in the instruction.
○ Shifts 16-bit immediate values up to 15 bits to the left. You supply the
shift count as a constant in the instruction.
○ Normalizes accumulator values
● D-Unit Arithmetic Logic Unit (D-Unit ALU): The CPU contains a 40-bit
ALU in the D unit that accepts immediate values from the I unit and
communicates bidirectionally with memory, I/O space, the A-unit registers,
the D-unit registers, and the P-unit registers. In addition, it receives results
from the shifter. The D-unit ALU performs the following actions:
○ Performs additions, subtractions, comparisons, rounding, saturation,
Boolean logic operations, and absolute value calculations.
○ Performs two arithmetical operations simultaneously when a dual 16-
bit arithmetic instruction is executed
○ Tests, sets, clears, and complements D-unit register bits
○ Moves register values
● Two Multiply-and-Accumulate Units (MACs): Two MACs support
multiplication and addition/subtraction. In a single cycle, each MAC can
perform a 17-bit × 17-bit multiplication (fractional or integer) and a 40-bit
addition or subtraction with optional 32-/40-bit saturation. The accumulators
(which are D-unit registers) receive all the results of the MACs. The MACs
accept immediate values from the I unit; accept data values from memory,
I/O space, and the A-unit registers; and communicate bi-directionally with
the D-unit registers and the P-unit registers. Status register bits (in the P
unit) are affected by MAC operations. Overflow detection is only performed
for the final operation of a calculation.
● The D unit contains and uses the accumulators (AC0 to AC3) and the transition registers.
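
The multiply-and-accumulate step with optional saturation can be modelled numerically. This is a rough integer-mode sketch: `saturate` and `mac` are hypothetical helper names, fractional mode and status bits are ignored, and Python integers stand in for the 17-bit operands and the 40-bit accumulator.

```python
def saturate(value, bits):
    """Clamp value to the signed range representable in the given bit width."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def mac(acc, x, y, sat_bits=40):
    """Return acc + x * y, saturated to sat_bits (32 or 40)."""
    for operand in (x, y):
        # Each multiplier input must fit in a signed 17-bit value.
        if not -(1 << 16) <= operand < (1 << 16):
            raise ValueError("operand does not fit in 17 bits")
    return saturate(acc + x * y, sat_bits)
```

With 32-bit saturation selected, adding 1 to an accumulator that already holds 2**31 - 1 leaves it at 2**31 - 1 instead of wrapping around.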

Answer 6

● The C6x processors are closer to traditional very long instruction word
(VLIW) processors because they seek to exploit the high levels of
instruction-level parallelism (ILP) in many signal processing algorithms.
● In the embedded space, code compatibility is less of a problem, so new
applications can be either hand-tuned or recompiled for the newest
generation of processor. Besides code compatibility, superscalar processors excel on the desktop
because the compiler cannot predict memory latencies at compile time. In
embedded systems, however, memory latencies are often much more predictable. In
fact, hard real-time constraints force memory latencies to be statically
predictable. A superscalar processor would also perform well under
these constraints, but the extra hardware needed to schedule instructions
dynamically is wasteful in terms of both precious chip area and
power consumption. VLIW is therefore a natural choice for
high-performance embedded processors.
● The C6x family employs different pipeline depths depending on the family
member. For the C64x, for example, the pipeline has 11 stages. The first
four stages of the pipeline perform instruction fetch, followed by two stages
for instruction decode, and finally five stages for instruction execution.
● The C6x family’s execution stage is divided into two parts, the left or “1”
side and the right or “2” side. The L1 and L2 units perform logical and
arithmetic operations. D units in contrast perform a subset of logical and
arithmetic operations but also perform memory accesses (loads and stores).
The two M units perform multiplication and related operations (e.g., shifts).
Finally the S units perform comparisons, branches, and some SIMD
operations. Each side has its own 32-entry, 32-bit register file (the A file for
the 1 side, the B file for the 2 side). A side may access the other side’s
registers, but with a 1-cycle penalty. Thus, an instruction executing on side
1 may access B5, for example, but it will take 1 extra cycle to execute
because of this.
● The partitioned register files in the TI C6x therefore support several
features, pipelining among them, that make this family faster and more efficient than many other DSP
processors.
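
The cross-path cost described above can be expressed as a small timing model. This is a sketch only: `execute_cycles` is a hypothetical helper, the base cost of one cycle is an assumption, and the penalty is charged once per instruction no matter how many operands cross sides.

```python
def execute_cycles(unit_side, source_registers, base_cycles=1):
    """Estimate cycles for one instruction on side "1" or side "2".

    source_registers: register names such as "A3" or "B5".
    """
    home_file = {"1": "A", "2": "B"}[unit_side]
    # Any operand from the other side's register file uses the cross path.
    crosses = any(reg[0] != home_file for reg in source_registers)
    return base_cycles + (1 if crosses else 0)
```

For example, `execute_cycles("1", ["A3", "B5"])` returns 2, while the same instruction reading only A registers returns 1.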
