
Module3_DSPAA

Module 3:
Programmable Digital Signal Processors:
Introduction, Commercial Digital Signal-Processing Devices, Data Addressing Modes of
TMS320C54xx, Memory Space of TMS320C54xx Processors, Program Control. Detailed Study of
TMS320C54x and 54xx Instructions and Programming, On-Chip Peripherals, Interrupts of
TMS320C54xx Processors, Pipeline Operation of TMS320C54xx Processor. L1, L2, L3

3.1 Introduction:

Leading manufacturers of integrated circuits, such as Texas Instruments (TI), Analog
Devices, and Motorola, manufacture digital signal processor (DSP) chips. These manufacturers
have developed a range of DSP chips of varied complexity.

The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit
floating-point. These DSPs possess the operational flexibility of high-speed controllers and the
numerical capability of array processors.

3.2 Commercial Digital Signal-Processing Devices:

There are several families of commercial DSP devices. Since the early eighties, when
these devices began to appear in the market, they have been used in numerous applications, such as
communication, control, computers, instrumentation, and consumer electronics. The architectural
features and the processing power of these devices have been constantly upgraded based on
advances in technology and application needs. In their basic versions, however, most of them have
a Harvard architecture, a single-cycle hardware multiplier, an address generation unit with dedicated
address registers, special addressing modes, and on-chip peripheral interfaces.

Of the various families of programmable DSP devices that are commercially available, the
three most popular ones are those from Texas Instruments, Motorola, and Analog Devices. Texas
Instruments was one of the first to come out with a commercial programmable DSP with the
introduction of its TMS32010 in 1982.

Prof. N Ajay Kumar, Dept. of ECE, SVIT 1



Table 3.1 Summary of the architectural features of three fixed-point DSPs

Architectural feature        TMS320C25                 DSP56000                      ADSP2100
Data representation format   16-bit fixed point        24-bit fixed point            16-bit fixed point
Hardware multiplier          16 x 16                   24 x 24                       16 x 16
ALU                          32 bits                   56 bits                       40 bits
Internal buses               16-bit program bus        24-bit program bus            24-bit program bus
                             16-bit data bus           2 x 24-bit data buses         16-bit data bus
                                                       24-bit global data bus        16-bit result bus
External buses               16-bit program/data bus   24-bit program/data bus       24-bit program bus
                                                                                     16-bit data bus
On-chip memory               544 words RAM             512 words PROM                -
                             4K words ROM              2 x 256 words data RAM
                                                       2 x 256 words data ROM
Off-chip memory              64K words program         64K words program             16K words program
                             64K words data            2 x 64K words data            16K words data
Cache memory                 -                         -                             16 words program
Instruction cycle time       100 ns                    97.5 ns                       125 ns
Special addressing modes     Bit reversed              Modulo                        Modulo
                                                       Bit reversed                  Bit reversed
Data address generators      1                         2                             2
Interfacing features         Synchronous serial I/O    Synchronous and asynchronous  DMA
                             DMA                       serial I/O
                                                       DMA

3.3. The architecture of TMS320C54xx digital signal processors:

TMS320C54xx processors retain the basic Harvard architecture of their predecessor,
TMS320C25, but have several additional features, which improve their performance over it. Figure
3.1 shows a functional block diagram of TMS320C54xx processors. They have one program and
three data memory spaces with separate buses, which provide simultaneous access to a program
instruction and two data operands and enable writing of a result at the same time. Part of the memory
is implemented on-chip and consists of combinations of ROM, dual-access RAM, and single-access
RAM. Transfers between the memory spaces are also possible.


The central processing unit (CPU) of TMS320C54xx processors consists of a 40-bit
arithmetic logic unit (ALU), two 40-bit accumulators, a barrel shifter, a 17 x 17 multiplier, a 40-bit
adder, data address generation logic (DAGEN) with its own arithmetic unit, and program address
generation logic (PAGEN). These major functional units are supported by a number of registers
and logic in the architecture.

A powerful instruction set with hardware-supported single-instruction repeat and block
repeat operations, block memory move instructions, instructions that pack two or three
simultaneous reads, and arithmetic instructions with parallel store and load makes these devices very
efficient for running high-speed DSP algorithms.

Several peripherals, such as a clock generator, a hardware timer, a wait state generator,
parallel I/O ports, and serial I/O ports, are also provided on-chip. These peripherals make it
convenient to interface the signal processors to the outside world.

In the following sections, we examine in detail the various architectural features of the
TMS320C54xx family of processors.


Figure 3.1. Functional architecture for TMS320C54xx processors.

3.3.1 Bus Structure:

The performance of a processor is enhanced by providing multiple buses that give
simultaneous access to various parts of memory or peripherals. The '54xx architecture is
built around four pairs of 16-bit buses, each pair consisting of an address bus and a data bus.
As shown in Figure 3.1, these are:

The program bus pair (PAB, PB), which carries the instruction code from the program memory.

Three data bus pairs (CAB, CB; DAB, DB; and EAB, EB), which interconnect the various units
within the CPU. In addition, the pairs CAB, CB and DAB, DB are used to read from the data
memory, while the pair EAB, EB carries the data to be written to the memory.

The '54xx can generate up to two data-memory addresses per cycle using the two auxiliary
register arithmetic units (ARAU0 and ARAU1) in the DAGEN block. This enables accessing two
operands simultaneously.

3.3.2 Central Processing Unit (CPU):

The ‘54xx CPU is common to all the ‘54xx devices. The ’54xx CPU contains a 40-bit
arithmetic logic unit (ALU); two 40-bit accumulators (A and B); a barrel shifter; a 17 x 17-bit
multiplier; a 40-bit adder; a compare, select and store unit (CSSU); an exponent encoder (EXP); a
data address generation unit (DAGEN); and a program address generation unit (PAGEN).

The ALU performs 2's complement arithmetic operations and bit-level Boolean operations
on 16-, 32-, and 40-bit words. It can also function as two separate 16-bit ALUs and perform two
16-bit operations simultaneously. Figure 3.2 shows the functional diagram of the ALU of the
TMS320C54xx family of devices.

Accumulators A and B store the output from the ALU or the multiplier/adder block and
provide a second input to the ALU. Each accumulator is divided into three parts: guard bits (bits
39-32), high-order word (bits 31-16), and low-order word (bits 15-0), which can be stored and
retrieved individually.

Each accumulator is memory-mapped and partitioned, and either one can be configured as the
destination register. The guard bits are used as a head margin for computations.

AG (39-32) AH (31-16) AL (15-0)

BG (39-32) BH (31-16) BL (15-0)
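
The partitioning above can be illustrated with a short Python sketch. It only models the bit layout (guard bits 39-32, high word 31-16, low word 15-0); the function name split_accumulator is hypothetical and is not part of any TI tool.

```python
def split_accumulator(acc):
    """Split a 40-bit accumulator value into (guard, high, low) parts."""
    acc &= (1 << 40) - 1          # keep only the 40 accumulator bits
    guard = (acc >> 32) & 0xFF    # bits 39-32 (AG or BG)
    high = (acc >> 16) & 0xFFFF   # bits 31-16 (AH or BH)
    low = acc & 0xFFFF            # bits 15-0  (AL or BL)
    return guard, high, low


guard, high, low = split_accumulator(0xFF80001234)
print(f"{guard:02X} {high:04X} {low:04X}")  # prints: FF 8000 1234
```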


Figure 3.2. Functional diagram of the central processing unit of the TMS320C54xx
processors.

Barrel shifter: provides the capability to scale the data during an operand read or write. No
overhead is required to implement the shift needed for the scaling operations. The '54xx barrel
shifter can produce a left shift of 0 to 31 bits or a right shift of 0 to 16 bits on the input data. The
shift count is specified in the shift count field of the instruction, in the shift count field (ASM) of
status register ST1, or in the temporary register T. Figure 3.3 shows the functional diagram of the
barrel shifter of TMS320C54xx processors.

The barrel shifter and the exponent encoder normalize the values in an accumulator in a
single cycle. The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or
sign extended, depending on the state of the sign-extension mode bit (SXM) in the status register
ST1. An additional shift capability enables the processor to perform numerical scaling, bit
extraction, extended arithmetic, and overflow prevention operations.


Figure 3.3. Functional diagram of the barrel shifter

Multiplier/adder unit: The kernel of the DSP device architecture is the multiplier/adder unit. The
multiplier/adder unit of TMS320C54xx devices performs a 17 x 17 2's complement multiplication
with a 40-bit addition, effectively in a single instruction cycle. In addition to the multiplier and
adder, the unit consists of control logic for integer and fractional computations and a 16-bit
temporary storage register, T. Figure 3.4 shows the functional diagram of the multiplier/adder unit
of TMS320C54xx processors.
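
A rough Python model of one multiply/accumulate step may make this concrete. It is a sketch under stated assumptions: both operands are signed 16-bit values (which fit in the 17-bit multiplier inputs), the 40-bit accumulator simply wraps on overflow, and fractional mode is modelled as a 1-bit left shift of the product (the FRCT bit of ST1). The names to_signed and mac are illustrative only; rounding and saturation are ignored.

```python
MASK40 = (1 << 40) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, x, y, frct=0):
    """Return the 40-bit accumulator after acc + x * y (assumed model)."""
    product = to_signed(x, 16) * to_signed(y, 16)
    if frct:
        product <<= 1  # fractional mode: drop the extra sign bit
    return (to_signed(acc, 40) + product) & MASK40


# 0.5 * 0.5 in Q15 format: 4000h * 4000h
print(hex(mac(0, 0x4000, 0x4000, frct=0)))  # prints: 0x10000000
print(hex(mac(0, 0x4000, 0x4000, frct=1)))  # prints: 0x20000000 (0.25 in Q31)
```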

The compare, select, and store unit (CSSU) is a hardware unit specifically incorporated to
accelerate the add/compare/select operation. This operation is essential to implement the Viterbi
algorithm used in many signal-processing applications.

The exponent encoder unit supports the EXP instruction, which stores in the T register the
number of leading redundant bits of the accumulator content. This information is useful when
shifting the accumulator content for the purpose of scaling.
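
The idea of counting redundant leading bits can be sketched in Python. Note the assumptions: the text above only says that the count of leading redundant bits is stored in T; the subtraction of 8 (to account for the guard bits, so that a left shift by the result normalizes the value into the 32-bit high/low words) and the special case for zero are this sketch's assumptions about the device behaviour, and exp_value is a hypothetical name.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def exp_value(acc):
    """Assumed model of the exponent written to T for a 40-bit accumulator."""
    value = to_signed(acc, 40)
    if value == 0:
        return 0  # assumption: a zero accumulator yields 0
    # Bits needed for the magnitude, plus one sign bit.
    needed = (value if value >= 0 else ~value).bit_length() + 1
    redundant = 40 - needed      # leading redundant sign bits
    return redundant - 8         # assumption: guard bits are subtracted


print(exp_value(0x0000004000))   # 16: shifting left by 16 gives 4000 0000h
```

A negative result means the value has grown into the guard bits and would need a right shift to fit in 32 bits.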


Figure 3.4. Functional diagram of the multiplier/adder unit of TMS320C54xx processors.

3.3.3 Internal Memory and Memory-Mapped Registers:

The amount and the types of memory of a processor have direct relevance to the efficiency
and performance obtainable in implementations with the processor. The '54xx memory is
organized into three individually selectable spaces: program, data, and I/O spaces. All '54xx
devices contain both RAM and ROM. RAM can be either of the dual-access type (DARAM) or
the single-access type (SARAM). The on-chip RAM for these processors is organized in pages
having 128 word locations on each page.

The '54xx processors have a number of CPU registers to support operand addressing and
computations. The CPU registers and peripheral registers are all located on page 0 of the data
memory. Figures 3.5(a) and (b) show the internal CPU registers and peripheral registers with their
addresses. The processor mode status (PMST) register is used to configure the processor. It
is a memory-mapped register located at address 1Dh on page 0 of the RAM. A part of the on-chip ROM
may contain a boot loader and look-up tables for functions such as sine, cosine, µ-law, and A-law.

Figure 3.5(a) Internal memory-mapped registers of TMS320C54xx processors.


Figure 3.5(b). Peripheral registers for the TMS320C54xx processors


3.3.4 Status registers (ST0, ST1):

ST0: Contains the status of flags (OVA, OVB, C, TC) produced by arithmetic operations and bit
manipulations.

ST1: Contains the status of various conditions and modes. Bits of the ST0 and ST1 registers can be
set or cleared with the SSBX and RSBX instructions.

PMST: Contains memory-setup status and control information.

Status register 0 diagram:

The status register ST0 is shown in Figure 3.6(a).

ARP       TC     C      OVA    OVB    DP
(15-13)   (12)   (11)   (10)   (9)    (8-0)

Figure 3.6(a). ST0 diagram

ARP: Auxiliary register pointer.
TC: Test/control flag.
C: Carry bit.
OVA: Overflow flag for accumulator A.
OVB: Overflow flag for accumulator B.
DP: Data-memory page pointer.

Status register 1 diagram:

The status register ST1 is shown in Figure 3.6(b).

BRAF  CPL   XF    HM    INTM  0     OVM   SXM   C16   FRCT  CMPT  ASM
(15)  (14)  (13)  (12)  (11)  (10)  (9)   (8)   (7)   (6)   (5)   (4-0)

Figure 3.6(b). ST1 diagram


BRAF: Block repeat active flag
BRAF=0: block repeat is deactivated.
BRAF=1: block repeat is activated.
CPL: Compiler mode
CPL=0: the relative direct addressing mode using the data page pointer (DP) is selected.
CPL=1: the relative direct addressing mode using the stack pointer (SP) is selected.
XF: External flag; controls a general-purpose output pin, used for example in multiprocessor
configurations. Set with SSBX; reset with RSBX.
HM: Hold mode; determines whether the CPU continues internal execution or halts when it
acknowledges a hold request from the external interface.
INTM: Interrupt mode; globally masks or enables all the maskable interrupts.
0 (bit 10): always read as zero.
OVM: Overflow mode
OVM=0: the overflowed result is left as is in the destination accumulator.
OVM=1: on overflow, the destination accumulator is set to either the most positive value or the
most negative value.
SXM: Sign extension mode
SXM=0: sign extension is suppressed.
SXM=1: the data is sign extended.

C16: Dual 16-bit or double-precision arithmetic mode
C16=0: the ALU operates in the double-precision arithmetic mode.
C16=1: the ALU operates in the dual 16-bit arithmetic mode.

FRCT: Fractional mode bit
FRCT=1: the multiplier output is left shifted by 1 bit to compensate for the extra sign bit.

CMPT: Compatibility mode
CMPT=0: the ARP is not updated in the indirect addressing mode.
CMPT=1: the ARP is updated in the indirect addressing mode.

ASM: Accumulator shift mode. A 5-bit field that specifies a shift value in the range -16 to 15.
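
The ST1 layout of Figure 3.6(b) can be turned into a small decoder. This is only an illustrative Python sketch; the names ST1_FIELDS and decode_st1 are hypothetical. The one non-obvious step is that the 5-bit ASM field is read as a 2's complement number, which gives the -16 to 15 range stated above.

```python
# name: (least significant bit position, field width), from Figure 3.6(b)
ST1_FIELDS = {
    "BRAF": (15, 1), "CPL": (14, 1), "XF": (13, 1), "HM": (12, 1),
    "INTM": (11, 1), "OVM": (9, 1), "SXM": (8, 1), "C16": (7, 1),
    "FRCT": (6, 1), "CMPT": (5, 1), "ASM": (0, 5),
}


def decode_st1(word):
    """Return a dict of ST1 field values for a 16-bit register word."""
    fields = {
        name: (word >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in ST1_FIELDS.items()
    }
    if fields["ASM"] & 0x10:      # 5-bit 2's complement: bit 4 is the sign
        fields["ASM"] -= 32
    return fields


st1 = decode_st1(0x291F)
print(st1["XF"], st1["INTM"], st1["SXM"], st1["ASM"])  # prints: 1 1 1 -1
```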

3.3.5 Processor Mode Status Register (PMST):


The Processor mode status register PMST is shown in figure 3.6 (c).

IPTR (15-7) MP/MC (6) OVLY (5) AVIS (4) DROM (3) CLKOFF (2) SMUL (1) SST (0)

Figure 3.6(c). PMST register diagram

• IPTR (bits 15-7): Interrupt vector pointer.
• The 9-bit IPTR field points to the 128-word program page where the interrupt vectors reside.
• The interrupt vectors can be remapped to RAM for boot-loaded operations.
• At reset, these bits are all set to 1 (IPTR = 1FFh): the reset vector always resides at address
FF80h in the program memory space.
• The RESET instruction does not affect this field.
• MP/MC: Microprocessor/microcomputer mode.
• MP/MC enables or disables the on-chip ROM to be addressable in the program memory space.
• MP/MC=0: the on-chip ROM is enabled and addressable.
• MP/MC=1: the on-chip ROM is not available.
• This bit can be set or reset by software.

• OVLY: RAM overlay.
• It enables on-chip dual-access data RAM blocks to be mapped into the program space.
• OVLY=0: the on-chip RAM is addressable in data space but not in program space.
• OVLY=1: the on-chip RAM is mapped into both program space and data space.

• AVIS: Address visibility mode.
• It enables the internal program address to be visible at the address pins.
• AVIS=0: the external address lines do not change with the internal program address. Control
and data lines are not affected, and the address bus is driven with the last address on the bus.
• AVIS=1: the internal program address appears at the address pins so that it can be traced. It
also allows the interrupt vector to be decoded in conjunction with IACK when the interrupt
vectors reside in on-chip memory.

• DROM: It enables the on-chip DARAM (4-7) to be mapped into data space.
• DROM=0: the on-chip DARAM (4-7) is not mapped into the data space.
• DROM=1: the on-chip DARAM (4-7) is mapped into the data space.
• CLKOFF: CLKOUT off. CLKOFF=1: the output of CLKOUT is disabled and remains at a high level.
• SMUL: Saturation on multiplication.
• SMUL=1: saturation of a multiplication result occurs before the accumulation is performed in a
MAC or MAS instruction. The SMUL bit applies only when OVM=1 and FRCT=1.
• SST: Saturation on store.
• SST=1: saturation of the data from the accumulator is enabled before storing in memory.
• The saturation is performed after the shift operation.
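
As an illustration of the PMST layout in Figure 3.6(c), the following Python sketch extracts the fields and computes the start of the 128-word interrupt vector page from IPTR. The function names decode_pmst and vector_table_base are hypothetical. Because IPTR supplies the upper 9 bits of the page address, the reset value 1FFh corresponds to FF80h, which matches the reset vector address quoted above.

```python
def decode_pmst(word):
    """Return a dict of PMST field values for a 16-bit register word."""
    return {
        "IPTR": (word >> 7) & 0x1FF,  # bits 15-7
        "MP_MC": (word >> 6) & 1,     # bit 6
        "OVLY": (word >> 5) & 1,      # bit 5
        "AVIS": (word >> 4) & 1,      # bit 4
        "DROM": (word >> 3) & 1,      # bit 3
        "CLKOFF": (word >> 2) & 1,    # bit 2
        "SMUL": (word >> 1) & 1,      # bit 1
        "SST": word & 1,              # bit 0
    }


def vector_table_base(iptr):
    """Program-memory address of the 128-word page selected by IPTR."""
    return (iptr & 0x1FF) << 7


print(hex(vector_table_base(0x1FF)))  # prints: 0xff80
```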

3.4 Data Addressing Modes of TMS320C54X Processors:

Data addressing modes provide various ways to access operands to execute instructions and place
results in the memory or the registers. The '54xx devices offer seven basic addressing modes:

1. Immediate addressing
2. Absolute addressing
3. Accumulator addressing
4. Direct addressing
5. Indirect addressing
6. Memory-mapped register addressing
7. Stack addressing

3.4.1 Immediate addressing:

The instruction contains the specific value of the operand. The operand can be short (3, 5, 8, or
9 bits in length) or long (16 bits in length). An instruction with a short operand occupies one
memory location, while an instruction with a long operand occupies two.

Example: LD #20, DP
         RPT #0FFFFh

3.4.2 Absolute Addressing:

The instruction contains a specified address in the operand. There are four types:

1. dmad addressing:  MVDK Smem, dmad    MVDM dmad, MMR
2. pmad addressing:  MVDP Smem, pmad    MVPD pmad, Smem
3. PA addressing:    PORTR PA, Smem
4. *(lk) addressing

Example:

MVKD 1000h, *AR5    ; 1000h → *AR5 (dmad addressing)
MVPD 1000h, *AR7    ; 1000h → *AR7 (pmad addressing)
PORTR 05h, *AR3     ; 05h → *AR3 (PA addressing)
LD *(1000h), A      ; *(1000h) → A (*(lk) addressing)

3.4.3 Accumulator Addressing:

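The accumulator content is used as the address for transferring data between program memory and
data memory.
Example: READA *AR2 ; the program-memory word addressed by accumulator A is stored at the
data-memory location pointed to by AR2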

3.4.4 Direct Addressing:

The block diagram of the direct addressing mode for TMS320C54xx processors is shown in Figure 3.7.
A 16-bit address is formed from a base address and the 7-bit offset contained in the instruction, so
a page of 128 locations can be accessed without changing DP or SP. The compiler mode bit (CPL) in
the ST1 register selects the base:
CPL = 0 selects DP: the 9-bit DP forms the upper bits of the address and the 7-bit offset the lower bits.
CPL = 1 selects SP: the effective address is computed by adding the 7-bit offset to the 16-bit SP.
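
A minimal Python sketch of this address calculation follows. It assumes that, in DP mode, the 9-bit DP (bits 8-0 of ST0) is concatenated with the 7-bit offset, and that in SP mode the offset is added to the 16-bit SP with wrap-around at 16 bits. The function name direct_address and its parameters are hypothetical.

```python
def direct_address(offset7, cpl, dp=0, sp=0):
    """Return the 16-bit data address for direct addressing (assumed model)."""
    offset7 &= 0x7F                            # 7-bit offset from the instruction
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset7   # DP : offset concatenation
    return (sp + offset7) & 0xFFFF             # SP + offset


print(hex(direct_address(0x25, cpl=0, dp=2)))       # prints: 0x125
print(hex(direct_address(0x05, cpl=1, sp=0x3FF0)))  # prints: 0x3ff5
```

With DP = 0 the same calculation reaches the memory-mapped registers on data page 0.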


Figure 3.7 Block diagram of the direct addressing mode for TMS320C54xx Processors.

3.4.5 Indirect Addressing:

The block diagram of the indirect addressing mode for TMS320C54xx processors is shown in
Figure 3.8.
• Data space is accessed using the address held in an auxiliary register.
• The '54xx has eight 16-bit auxiliary registers (AR0 to AR7) and two auxiliary register arithmetic
units (ARAU0 and ARAU1).
• The auxiliary registers can step through memory locations in a fixed step size. The AR0 register
is used for indexed and bit-reversed addressing modes.
• For single-operand addressing, the instruction contains two fields:
MOD → type of indirect addressing
ARF → AR used for addressing
• The use of ARP depends on the compatibility (CMPT) bit in ST1:
CMPT = 0: standard mode; ARP is set to zero.
CMPT = 1: compatibility mode; the particular AR is selected by ARP.


Figure 3.8 Block diagram of the indirect addressing mode for TMS320C54xx Processors.

Sl. no.  Operand syntax  Function
1        *ARx            addr = ARx
2        *ARx-           addr = ARx; ARx = ARx - 1
3        *ARx+           addr = ARx; ARx = ARx + 1
4        *+ARx           addr = ARx + 1; ARx = ARx + 1
5        *ARx-0B         addr = ARx; ARx = B(ARx - AR0)
6        *ARx-0          addr = ARx; ARx = ARx - AR0
7        *ARx+0          addr = ARx; ARx = ARx + AR0
8        *ARx+0B         addr = ARx; ARx = B(ARx + AR0)
9        *ARx-%          addr = ARx; ARx = circ(ARx - 1)
10       *ARx+%          addr = ARx; ARx = circ(ARx + 1)
11       *(lk)           addr = lk
12       *ARx(lk)        addr = ARx + lk
13       *+ARx(lk)       ARx = ARx + lk; addr = ARx
14       *+ARx(lk)%      ARx = circ(ARx + lk); addr = ARx
15       *ARx+0%         addr = ARx; ARx = circ(ARx + AR0)
16       *ARx-0%         addr = ARx; ARx = circ(ARx - AR0)

Table 3.2 Indirect addressing options with a single data-memory operand.

Circular Addressing Mode:

• The block diagram of the circular addressing mode for TMS320C54xx processors is shown in
Figure 3.9.
• It is used in convolution, correlation, and FIR filters.
• A circular buffer is a sliding window that contains the most recent data. A circular buffer of
size R must start on an N-bit boundary (the N LSBs of its base address are zero), where N is the
smallest integer such that 2^N > R.
• The circular buffer size register (BK) specifies the size of the circular buffer.
• Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected AR (ARx).
• End of buffer address (EOB): obtained by replacing the N LSBs of ARx with the N LSBs of BK.
• The index (the N LSBs of ARx) is updated with the step as follows:

if 0 ≤ index + step < BK:     index = index + step
else if index + step ≥ BK:    index = index + step - BK
else if index + step < 0:     index = index + step + BK

• The circular addressing mode implementation for TMS320C54xx processors is shown in
Figure 3.10.
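
The index update rule above can be sketched in Python as follows. This is an illustrative model, not device code: circ_update is a hypothetical name, N is taken as the smallest integer with 2^N > BK, and the step is assumed to satisfy |step| ≤ BK so that a single wrap is enough.

```python
def circ_update(arx, step, bk):
    """Return the new ARx after a circular update by `step` with buffer size bk."""
    n = bk.bit_length()            # smallest N such that 2**N > bk (for bk >= 1)
    mask = (1 << n) - 1
    base = arx & ~mask & 0xFFFF    # effective base address: N LSBs zeroed
    index = (arx & mask) + step    # index + step
    if index >= bk:
        index -= bk                # wrapped past the end of the buffer
    elif index < 0:
        index += bk                # wrapped past the start of the buffer
    return base | index


# AR3 = 1020h, BK = 40h, AR0 = 25h (the values used in Problem P3.2)
print(hex(circ_update(0x1020, 0x25, 0x40)))  # prints: 0x1005
```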


Figure 3.9 Block diagram of the circular addressing mode for TMS320C54xx Processors.

Figure 3.10 Circular addressing mode implementation for TMS320C54xx Processors.


Bit-Reversed Addressing:

• Used for FFT algorithms.
• AR0 specifies one half of the size of the FFT.
• The value of AR0 = 2^(N-1), where N is an integer and the FFT size is 2^N.
• AR0 is added to (or subtracted from) the selected auxiliary register using bit-reversed arithmetic.
• The carry bit propagates from left to right instead of from right to left.

Problem: Assuming the contents of AR3 to be 200h, what will be its contents after each of
the following TMS320C54xx addressing modes is used? Assume that the contents of AR0
are 20h.

a. *AR3+0B
b. *AR3-0B

Solution:

a. AR3 ← AR3 + AR0 with reverse carry propagation;
   AR3 = 200h + 20h (with reverse carry propagation) = 220h
b. AR3 ← AR3 - AR0 with reverse carry propagation;
   AR3 = 200h - 20h (with reverse carry propagation) = 23Fh
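
Reverse carry propagation can be modelled in Python by reversing the 16 bits of both operands, doing an ordinary add or subtract, and reversing the result. This is one way to emulate the behaviour, not a description of the ARAU hardware; the names rev16, bitrev_add, and bitrev_sub are hypothetical.

```python
def rev16(x):
    """Reverse the order of the 16 bits of x."""
    return int(f"{x & 0xFFFF:016b}"[::-1], 2)


def bitrev_add(arx, ar0):
    """ARx + AR0 with the carry propagating from left to right."""
    return rev16((rev16(arx) + rev16(ar0)) & 0xFFFF)


def bitrev_sub(arx, ar0):
    """ARx - AR0 with the borrow propagating from left to right."""
    return rev16((rev16(arx) - rev16(ar0)) & 0xFFFF)


print(hex(bitrev_add(0x200, 0x20)))  # prints: 0x220
print(hex(bitrev_sub(0x200, 0x20)))  # prints: 0x23f

# Access order for an 8-point FFT buffer at address 0 (AR0 = 4):
addr, order = 0, []
for _ in range(8):
    order.append(addr)
    addr = bitrev_add(addr, 4)
print(order)  # prints: [0, 4, 2, 6, 1, 5, 3, 7]
```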

Dual-Operand Addressing:

• Dual data-memory operand addressing is used for instructions that simultaneously
perform two reads (32-bit read), or a single read (16-bit read) and a parallel store
(16-bit store) indicated by two vertical bars, ||.
• These instructions access operands using the indirect addressing mode.
• If, in an instruction with a parallel store, the source operand and the destination operand
point to the same location, the source is read before the destination is written.
• Only 2 bits are available in the instruction code for selecting each auxiliary register
in this mode.
• Thus, just four of the auxiliary registers, AR2-AR5, can be used. The ARAUs,
together with these registers, provide the capability to access two operands in a single
cycle.
• Figure 3.11 shows how an address is generated using dual data-memory operand
addressing.

Name     Function
Opcode   Contains the operation code for the instruction
Xmod     Defines the type of indirect addressing mode used for accessing the Xmem operand
Xar      Xmem AR selection field; defines the AR that contains the address of Xmem
Ymod     Defines the type of indirect addressing mode used for accessing the Ymem operand
Yar      Ymem AR selection field; defines the AR that contains the address of Ymem

Table 3.3. Functions of the different fields in dual data-memory operand addressing

Figure 3.11 Block diagram of the indirect addressing options with a dual data-memory
operand.

3.4.6. Memory-Mapped Register Addressing:

• Used to modify the memory-mapped registers without affecting the current data-page
pointer (DP) or stack pointer (SP).

– Overhead for writing to a register is minimal.
– Works for direct and indirect addressing.
– Scratch-pad RAM located on data page 0 can be modified.

• Examples:
STM #x, DIRECT
STM #tbl, AR1

The required address is generated by taking only the 7 LSBs of the 16-bit direct address, or of
the value of the AR used for indirect addressing, with the upper 9 bits set to zero.

Example 1: If AR1 is used in MMR addressing mode to point to a memory-mapped register and its
contents are 3825h, then AR1 points to the timer period register (PRD), since the 7 LSBs
of AR1 are 25h, which is the address of the PRD register.
After execution, AR1 contains 0025h.

Example 2: LDM AR4, A
Here the data stored at 0014h, which is the memory address of AR4, is loaded into A.

Figure 3.12 shows 16-bit memory-mapped register address generation.
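
The address generation just described amounts to a 7-bit mask. The short Python sketch below (with the hypothetical names mmr_address and KNOWN_MMRS) reproduces Example 1; the small lookup table contains only the register addresses that are quoted in these notes.

```python
# Register addresses quoted in the text (data page 0).
KNOWN_MMRS = {0x14: "AR4", 0x1D: "PMST", 0x25: "PRD"}


def mmr_address(pointer):
    """Keep the 7 LSBs of a direct address or AR value; clear the upper 9 bits."""
    return pointer & 0x007F


addr = mmr_address(0x3825)
print(f"{addr:04X}h -> {KNOWN_MMRS.get(addr, 'unknown')}")  # prints: 0025h -> PRD
```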

Figure 3.12. 16-bit memory-mapped register address generation.

3.4.7 Stack Addressing:

• Used to automatically store the program counter during interrupts and subroutines.
• Can be used to store additional items of context or to pass data values.
• Uses a 16-bit memory-mapped register, the stack pointer (SP).
• The values of the stack and SP before and after an operation are shown in Figure 3.13.
• Example: PSHD X2

Figure 3.13. Values of stack &SP before and after operation.
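
A push such as PSHD X2 can be pictured with a small Python model. It is a sketch under an explicit assumption: the stack grows toward lower addresses, with SP decremented before a push and incremented after a pop, so that SP always points to the last value pushed. The class name SoftwareStack, its method names, and the sample addresses are hypothetical.

```python
class SoftwareStack:
    """Toy model of a descending stack addressed through a 16-bit SP."""

    def __init__(self, sp):
        self.sp = sp & 0xFFFF
        self.memory = {}              # address -> 16-bit word

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF      # assumed: pre-decrement SP
        self.memory[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF      # assumed: post-increment SP
        return value


stack = SoftwareStack(sp=0x0012)
stack.push(0x1234)                    # plays the role of PSHD X2 with X2 = 1234h
print(hex(stack.sp), hex(stack.memory[stack.sp]))  # prints: 0x11 0x1234
```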

Example Problem P3.1: Assuming the current content of AR3 to be 200h, what will be its contents
after each of the following TMS320C54xx addressing modes is used? Assume that the contents of
AR0 are 20h.
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3-
e. *AR3
f. *+AR3(40h)
g. *+AR3(-40h)
Solution:

a. AR3 ← AR3 + AR0;
   AR3 = 200h + 20h = 220h
b. AR3 ← AR3 - AR0;
   AR3 = 200h - 20h = 1E0h
c. AR3 ← AR3 + 1;
   AR3 = 200h + 1 = 201h
d. AR3 ← AR3 - 1;
   AR3 = 200h - 1 = 1FFh
e. AR3 is not modified.
   AR3 = 200h
f. AR3 ← AR3 + 40h;
   AR3 = 200h + 40h = 240h
g. AR3 ← AR3 - 40h;
   AR3 = 200h - 40h = 1C0h
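
The answers to P3.1 can be cross-checked with a small Python sketch of the register updates from Table 3.2. Only the non-circular, non-bit-reversed options are modelled, the mode strings are simply labels chosen for this sketch, and indirect_update is a hypothetical name.

```python
def indirect_update(mode, arx, ar0=0, lk=0):
    """Return the new 16-bit ARx value after one access in the given mode."""
    updates = {
        "*ARx": arx,              # no modification
        "*ARx-": arx - 1,         # post-decrement
        "*ARx+": arx + 1,         # post-increment
        "*+ARx": arx + 1,         # pre-increment
        "*ARx-0": arx - ar0,      # post-decrement by AR0
        "*ARx+0": arx + ar0,      # post-increment by AR0
        "*+ARx(lk)": arx + lk,    # pre-modify by long offset lk
    }
    return updates[mode] & 0xFFFF


for mode, kwargs in [("*ARx+0", {"ar0": 0x20}), ("*ARx-0", {"ar0": 0x20}),
                     ("*ARx+", {}), ("*ARx-", {}), ("*ARx", {}),
                     ("*+ARx(lk)", {"lk": 0x40}), ("*+ARx(lk)", {"lk": -0x40})]:
    print(mode, hex(indirect_update(mode, 0x200, **kwargs)))
```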

Problem P3.2: Assume that the register AR3 with contents 1020h is selected as the pointer for the
circular buffer. Let BK = 40h to specify the circular buffer size as 40h. Determine the start and the
end addresses for the buffer. What will be the contents of register AR3 after the execution of the
instruction LD *AR3+0%, A, if the contents of register AR0 are 0025h?

Solution:

AR3 = 1020h means that it currently points to location 1020h. Since BK = 40h = 64, the
smallest N with 2^N > 64 is N = 7. Setting the lower 7 bits to zero gives the start address of
the buffer as 1000h. Replacing the same bits with the BK gives the end address as 1040h.

The instruction LD *AR3+0%, A modifies AR3 by adding AR0 to it and applying the circular
modification. It yields
AR3 = circ(1020h + 0025h) = circ(1045h) = 1045h - 40h = 1005h.
Thus the location 1005h is the one pointed to by AR3.

Problem P3.3: Assuming the current contents of AR3 to be 200h, what will be its contents after each
of the following TMS320C54xx addressing modes is used? Assume that the contents of AR0 are 20h.

a. *AR3+0B
b. *AR3-0B
Solution:

a. AR3 ← AR3 + AR0 with reverse carry propagation;
   AR3 = 200h + 20h (with reverse carry propagation) = 220h.

b. AR3 ← AR3 - AR0 with reverse carry propagation;
   AR3 = 200h - 20h (with reverse carry propagation) = 23Fh.

3.5. Memory Space of TMS320C54xx Processors

• A total of 128K words, extendable up to 8192K words.
• Total memory includes RAM, ROM, EPROM, EEPROM, or memory-mapped peripherals.
• Data memory: stores the data required to run programs and holds external memory-mapped
registers.
• Program memory: stores program instructions and tables used in the execution
of programs.
• Figure 3.15 shows the memory map for the TMS320C5416 processor.
• The extended memory (8192K words) is organized into 128 pages, each of 64K words.


Fig. 3.14 Division of Size 64 K words memory


Table 3.4. Function of the different PMST bits

PMST bit   Logic   On-chip memory configuration
MP/MC      0       ROM enabled
           1       ROM not available
OVLY       0       RAM in data space
           1       RAM in program space
DROM       0       ROM not in data space
           1       ROM in data space


Figure 3.15 Memory map for the TMS320C5416 Processor.

3.6. Program Control

❑ The program control unit contains the program counter (PC), the program-counter-related
hardware, the hardware stack, repeat counters, and status registers.

❑ The PC addresses memory in several ways, namely:

➢ Branch: The PC is loaded with the immediate value following the branch instruction.

➢ Subroutine call: The PC is loaded with the immediate value following the call instruction.

➢ Interrupt: The PC is loaded with the address of the appropriate interrupt vector.

➢ Instructions such as BACC, CALA, etc.: The PC is loaded with the contents of the
accumulator low word.

➢ End of a block repeat loop: The PC is loaded with the contents of the block-repeat
start address register.

➢ Return: The PC is loaded from the top of the stack.

Unit 2: Instruction and programming


3.7 Assembly language instructions can be classified as:

❑ Load and store instructions
❑ Arithmetic operations
❑ Logical operations
❑ Program-control operations

Operators Used in Instruction Set:

Table 3.5. Operator used in instruction set


3.7.1. Load and store instructions

➢ LD: Load accumulator with shift
   Loads the accumulator with a data-memory value or an immediate value. This instruction
   supports shifting.
   Syntax:
   o LD Smem, dst
   o LD Smem, TS, dst
   o LD Smem, 16, dst
   o LD Xmem, SHFT, dst
   o LD #lk, 16, dst
➢ LD: Load T/DP/ASM/ARP
   The loaded value can be a single data-memory operand or a constant.
   Syntax:
   o LD Smem, T
   o LD Smem, DP
   o LD Smem, ASM
   o LD #k3, ARP
➢ LDR: Load memory value in accumulator high with rounding
➢ LDM: Load memory-mapped register
   Syntax: LDM MMR, dst
➢ LD||MAC[R]: Load accumulator with parallel multiply accumulate with/without rounding
➢ LD||MAS[R]: Load accumulator with parallel multiply subtract with/without rounding
➢ LDU: Load unsigned memory value
➢ LMS: Least mean square
➢ LTD: Load T and insert delay

Store Instructions

Instruction  Description                                        Syntax
ST           Store T, TRN, or an immediate value into memory    ST T, Smem
                                                                ST TRN, Smem
                                                                ST #lk, Smem
STH          Store accumulator high into memory                 STH src, Smem
                                                                STH src, ASM, Smem
STL          Store accumulator low into memory                  STL src, Smem
                                                                STL src [, SHIFT], Smem
STLM         Store accumulator low to MMR                       STLM src, MMR
STM          Store immediate value to MMR                       STM #lk, MMR
CMPS         Compare, select max, and store. Compares the two   CMPS src, Smem
             16-bit signed values in the high and low parts
             of the source accumulator.
ST||LD       Store accumulator with parallel load               ST src, Ymem
                                                                || LD Xmem, dst
STRCD        Store T conditionally. Stores the contents of T    STRCD Xmem, cond
             into Xmem if the condition is true; if false,
             the content of Xmem is written back to the Xmem
             location itself.
LD||MAC      Multiply accumulate with parallel load.            LD Xmem, dst
             Multiplies a dual data-memory operand by the       || MAC Ymem, dst2
             contents of T and adds the result to dst2. In      (dst and dst2 can each
             parallel, it loads the high part of the            be either A or B)
             destination accumulator (bits 31-16) with a
             dual data-memory operand.
ST||ADD      Store accumulator with parallel add                ST src, Ymem
                                                                || ADD Xmem, dst
ST||SUB      Store accumulator with parallel subtract           ST src, Ymem
                                                                || SUB Xmem, dst
ST||MAC      Store accumulator with parallel multiply           ST src, Ymem
             accumulate                                         || MAC Xmem, dst
ST||MPY      Store accumulator with parallel multiply           ST src, Ymem
                                                                || MPY Xmem, dst

ST||MAC[R]: Store accumulator with parallel multiply accumulate with/without rounding
ST||MAS[R]: Store accumulator with parallel multiply subtract with/without rounding
3.7.2. Arithmetic Instructions
ADD instructions (status bits: affected by SXM and OVM; affect C and OVdst, or OVsrc when the
result goes to src):
Add a 16-bit value to the contents of the selected accumulator, or to a 16-bit Xmem operand in
dual data-memory operand addressing mode.
The 16-bit value can be:
i) the contents of a single data-memory operand
ii) the contents of a dual data-memory operand
iii) a 16-bit long immediate operand
iv) the shifted value in the source accumulator
NOTE: If a destination is specified, the result is stored in the destination accumulator; otherwise
it is stored in the source accumulator itself.

Syntax                              Expression
ADD Smem, src                       src = src + Smem
ADD Smem, TS, src                   src = src + Smem << TS
ADD Smem, 16, src [, dst]           dst = src + Smem << 16
ADD Smem [, SHIFT], src [, dst]     dst = src + Smem << SHIFT
ADD Xmem, SHFT, src                 src = src + Xmem << SHFT
ADD Xmem, Ymem, dst                 dst = Xmem << 16 + Ymem << 16
ADD #lk [, SHFT], src [, dst]       dst = src + #lk << SHFT
ADD #lk, 16, src [, dst]            dst = src + #lk << 16
ADD src [, SHIFT] [, dst]           dst = dst + src << SHIFT
ADD src, ASM [, dst]                dst = dst + src << ASM
ADDC Smem, src                      src = src + Smem + C
ADDM #lk, Smem                      Smem = Smem + #lk

Operands:
Smem: single data-memory operand
Xmem: Dual data-memory operands
src,dst: A (accumulator A)
B (accumulator B)

-32768<=lk<=32767
-16<=SHIFT<=15
0<=SHFT<=15
ADD: Add to accumulator
ADDM: Add long immediate value to a 16-bit single data-memory operand
ADDC: Add to accumulator with carry
ADDS: Add to accumulator with sign extension suppressed
SUB: Subtract from accumulator
SUBB: Subtract from accumulator with borrow
SUBS: Subtract from accumulator with sign extension suppressed
SUBC: Subtract conditionally; used for division
NOTE: Subtraction instructions have the same format as the ADD instructions, and their operands are obtained in the same way as in addition.

MPY[R]: Multiply With/Without Rounding
MPYA: Multiply by Accumulator A
MPYU: Multiply Unsigned
SQUR: Square
SQURA: Square and Accumulate
SQURS: Square and Subtract
MAC[R]: Multiply Accumulate With/Without Rounding
MACA[R]: Multiply by Accumulator A and Accumulate With/Without Rounding
MACD: Multiply by Program Memory and Accumulate With Delay
MACP: Multiply by Program Memory and Accumulate
MACSU: Multiply Signed by Unsigned and Accumulate

MAS[R]: Multiply and Subtract With/Without Rounding


MASA[R]: Multiply by Accumulator A and Subtract With/Without Rounding
MAX: Accumulator Maximum
MIN: Accumulator Minimum
ABDST: Absolute Distance
ABS: Absolute Value of Accumulator
CMPL: Complement Accumulator
CMPM: Compare Memory With Long Immediate
CMPS: Compare, Select and Store Maximum
EXP: Accumulator Exponent
SAT: Saturate Accumulator
NORM: Normalization

3.7.3. Logical operations


AND: AND With Accumulator
ANDM: AND Memory With Long Immediate
OR: OR With Accumulator
ORM: OR Memory With Constant
XOR: Exclusive OR With Accumulator
XORM: Exclusive OR Memory With Constant
ROL: Rotate Accumulator Left
ROLTC: Rotate Accumulator Left Using TC
ROR: Rotate Accumulator Right
SFTA: Shift Accumulator Arithmetically
SFTC: Shift Accumulator Conditionally
SFTL: Shift Accumulator Logically
BIT: Test Bit
BITF: Test Bit Field Specified by Immediate Value
BITT: Test Bit Specified by T

3.7.4. Miscellaneous Load-Type and Store-Type Instructions


MVDD: Move Data From Data Memory to Data Memory With X, Y addressing


MVDK: Move Data From Data Memory to Data Memory With Destination Addressing
MVDM: Move Data From Data Memory to Memory-Mapped Register
MVDP: Move Data from Data Memory to Program Memory
MVKD: Move Data From Data Memory to Data Memory With Source Addressing
MVMD: Move Data From Memory-Mapped Register to Data Memory
MVMM: Move Data From Memory-Mapped Register to Memory-Mapped Register
MVPD: Move Data From Program Memory to Data Memory
PORTR: Read Data from Port
PORTW: Write Data to Port
READA: Read Program Memory addressed by Accumulator A and Store in Data
Memory
WRITA: Write Data to Program Memory Addressed by Accumulator A

Branch Instructions
B[D]: Branch Unconditionally
BACC[D]: Branch to Location Specified by Accumulator
BANZ[D]: Branch on Auxiliary Register Not Zero
BC[D]: Branch Conditionally
FB[D]: Far Branch Unconditionally
FBACC[D]: Far Branch to Location Specified by Accumulator

CALA[D]: Call Subroutine at Location Specified by Accumulator


CALL[D]: Call Unconditionally
CC[D]: Call Conditionally
FCALA[D]: Far Call Subroutine at Location Specified by Accumulator
FCALL[D]: Far Call Unconditionally

3.7.5. Interrupt Instructions


INTR: Software Interrupt
Allows the user program to execute any interrupt service routine.
TRAP: Software Interrupt
A nonmaskable software interrupt that is not affected by INTM.

3.7.6. Return Instructions


FRET[D]: Far Return
FRETE[D]: Enable Interrupts and Far Return From Interrupt
RC[D]: Return Conditionally
RET[D]: Return
RETE[D]: Enable Interrupts and Return From Interrupt.
RETF[D]: Enable Interrupts and Fast Return From Interrupt
3.7.7. Repeat Instructions
RPT: Repeat Next Instruction


RPTB[D]: Block Repeat


RPTZ: Repeat Next Instruction and Clear Accumulator
3.7.8. Stack Manipulating Instructions
FRAME: Stack Pointer Immediate Offset
POPD: Pop Top of Stack to Data Memory


POPM: Pop Top of Stack to Memory-Mapped Register


PSHD: Push Data-Memory Value onto Stack

PSHM: Push Memory-Mapped Register onto Stack

3.7.9. Miscellaneous Program Control Instructions


SSBX: Set Status Register Bit


RSBX: Reset Status Register Bit

NOP: No Operation
RESET: Software Reset


NOTE:

The following instructions are very important:


Double instructions
1). DADD: Double-precision add or dual 16-bit add to accumulator. DADD Lmem, src [, dst]
2). DLD: Long-word load to accumulator. DLD Lmem, dst
3). DST: Store accumulator in long word. DST src, Lmem
4). DSUB: Double-precision subtract or dual 16-bit subtract from accumulator. DSUB Lmem, src
5). DSUBT: Double-precision load with T subtract or dual 16-bit load with T subtract. DSUBT Lmem, dst

Multiply instruction (MPY)

MPY Xmem, Ymem, dst; where Xmem and Ymem are dual data-memory operands and dst is
accumulator A or B.

The instruction multiplies a data-memory value by another data-memory value and stores the result in
accumulator A or B.
Register T is loaded with the Xmem value in the read phase.

dst ← (Xmem) x (Ymem); T ← (Xmem)

In the indirect addressing mode, the instruction can also modify the contents of the auxiliary
registers used for indirect addressing.

Problem P3.4: Describe the operation of the following MPY instructions.

1). MPY 13, B
2). MPY #01234h, A
3). MPY *AR2-, *AR4+0, B

Solution: Instruction 1) multiplies the current content of the T register by the content of data
memory location 13 in the current data page. The result is placed in accumulator B.

Instruction 2) multiplies the current content of the T register by the constant 1234h and places the
result in accumulator A.

Instruction 3) multiplies the content of the memory location pointed to by AR2 by the content of the
memory location pointed to by AR4. The result is placed in accumulator B. During execution,
register T is loaded with the content of the data memory location pointed to by AR2. AR2 is then
decremented by 1, and AR4 is updated by adding the content of AR0 to it.
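
The behaviour described for instruction 3) can be checked with a small behavioural model. The following Python sketch is only an illustration, not a cycle-accurate simulator: the memory contents, the register values and the helper names (to_signed16, mpy_dual) are assumptions made for this example, and effects such as fractional mode and 16-bit address wrap-around are ignored.

```python
def to_signed16(word):
    """Interpret a 16-bit memory word as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy_dual(mem, regs, x_reg, x_step, y_reg, y_step):
    """Model MPY Xmem, Ymem, dst: return the product, load T, update pointers."""
    x = to_signed16(mem[regs[x_reg]])
    y = to_signed16(mem[regs[y_reg]])
    regs["T"] = x              # T <- (Xmem)
    regs[x_reg] += x_step      # post-modify the X pointer
    regs[y_reg] += y_step      # post-modify the Y pointer
    return x * y               # dst <- (Xmem) x (Ymem)


# Hypothetical memory contents and register values, chosen for illustration.
mem = {0x100: 3, 0x200: 0xFFFE}          # 0xFFFE is -2 as a signed 16-bit word
regs = {"AR0": 4, "AR2": 0x100, "AR4": 0x200, "T": 0}

# MPY *AR2-, *AR4+0, B : AR2 steps by -1, AR4 steps by the content of AR0
B = mpy_dual(mem, regs, "AR2", -1, "AR4", regs["AR0"])
print(B, regs)
```

With these assumed values the product in B is -6, T holds 3, AR2 ends at 0FFh and AR4 at 204h, which matches the register updates described in the solution.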

Multiply and Accumulate instruction (MAC)


This instruction extends the MPY instruction. One of the several forms that this
instruction can take is

MAC Xmem, Ymem, src, dst;


where Xmem and Ymem are dual data-memory operands and src and dst are accumulators A and B.
The instruction multiplies a data-memory value by another data-memory value and adds the product to
the content of the source, which may be either of the two accumulators A and B. The result is stored
in the other accumulator. Register T is loaded with the Xmem value.

dst ← (Xmem) x (Ymem) + (src); T ← (Xmem)


Similar to the MPY instruction, this instruction can modify the content of the auxiliary registers used
in indirect addressing.

Problem P3.5: Describe the operation of the following MAC instructions.

1). MAC *AR5+, #1234h, A

2). MAC *AR3-, *AR4+, B, A

Solution: Instruction 1) multiplies the content of the data memory location pointed to by AR5 by the
constant 1234h and adds the product to the content of accumulator A. During execution, register T is
loaded with the content of the data memory location pointed to by AR5. AR5 is then incremented by one.

Instruction 2) multiplies the content of the data memory location pointed to by AR3 by the content of
the data memory location pointed to by AR4. The content of accumulator B is added to the product and
the result is placed in accumulator A. Register T is loaded with the content of the data memory location
pointed to by AR3. AR3 is decremented by 1 and AR4 is incremented by 1.

The MAC instruction is used for computing the sum of a series of product terms.
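
The sum-of-products use of MAC can be sketched in a few lines of Python. This is a behavioural illustration only: the function names mac and sum_of_products are made up for the example, and accumulator width, overflow and saturation are not modelled.

```python
def mac(x, y, src):
    """Model dst = (Xmem) x (Ymem) + (src); also return the value loaded into T."""
    dst = x * y + src
    t = x
    return dst, t


def sum_of_products(xs, ys):
    """Accumulate x[i] * y[i] by applying the MAC model once per pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc, _t = mac(x, y, acc)
    return acc


print(mac(3, 4, 10))                            # one MAC step
print(sum_of_products([1, 2, 3], [4, 5, 6]))    # series of product terms
```

Each call to mac corresponds to one execution of the instruction; feeding the result back in as src is what turns a single multiply into a running sum.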

Multiply and Subtract instruction (MAS)

This instruction is similar to the MAC instruction.

MAS Xmem, Ymem, src, dst; where Xmem and Ymem are dual data-memory operands and src and
dst are accumulators A and B.

The instruction multiplies a data-memory value by another data-memory value and subtracts the
product from the content of the source, which may be either of the two accumulators A and B. The
result is stored in the other accumulator. Register T is loaded with the Xmem value in the read
phase.

dst ← (src) - (Xmem) x (Ymem); T ← (Xmem)

In the indirect addressing mode, in addition to the multiply operation, the instruction can modify the
content of the auxiliary registers used for indirect addressing.

Problem P3.6: Describe the operation of the following MAS instruction.

1). MAS *AR3-, *AR4+, B, A

Solution: This instruction multiplies the content of the data memory location pointed to by AR3 by the
content of the data memory location pointed to by AR4.
The product is subtracted from the content of accumulator B and the result is placed in
accumulator A.
During this instruction, register T is loaded with the content of the data memory location
pointed to by AR3.
AR3 is then decremented by one and AR4 is incremented by one. The MAS instruction is used for
computing the butterflies in FFT implementations.

Multiply accumulate and delay instruction (MACD)

This instruction carries out all the functions of the MAC instruction and, in addition, copies the
content of the current data memory address to the next higher data memory address.
However, the two operands of the multiplier are required to be a single data-memory value and a
program-memory value.

This feature is equivalent to implementing the z^-1 delay encountered in digital signal processing
algorithms.
For this reason, the MACD instruction is often used for implementing FIR filters.
The format and all other features of the MACD instruction are the same as those of the MAC
instruction.
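
To see why the copy to the next higher address acts as a z^-1 delay, the following Python sketch models one FIR output computed with MACD-style steps. It is a minimal illustration under stated assumptions: the list line stands in for a data-memory buffer with line[i] holding x(n-i) plus one spare word, coeffs stands in for the coefficient table in program memory, and the function name fir_macd is hypothetical.

```python
def fir_macd(line, coeffs, new_sample):
    """Compute one FIR output and age the delay line, one MACD-like step per tap.

    line   : list of len(coeffs) + 1 words; line[i] holds x(n - i)
    coeffs : filter coefficients h(0) .. h(N-1)
    """
    taps = len(coeffs)
    line[0] = new_sample                 # newest sample x(n)
    acc = 0
    # Start at the oldest sample so each copy lands on a word already used.
    for i in range(taps - 1, -1, -1):
        acc += line[i] * coeffs[i]       # multiply and accumulate
        line[i + 1] = line[i]            # copy to the next higher address (delay)
    return acc


h = [10, 20, 30]                         # coefficients as in the examples of these notes
delay_line = [0, 0, 0, 0]                # three taps plus one spare word
outputs = [fir_macd(delay_line, h, s) for s in (1, 2, 3)]
print(outputs)
```

After each call, every sample has moved one location up, so the buffer is already prepared for the next input sample without a separate data-move loop.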

Repeat instruction
The format of the instruction is

RPT Smem ; Smem is a single data-memory operand


RPT #k ; k is a short or long constant

This instruction loads the operand into the repeat counter, RC.


The instruction following the RPT instruction is repeated k+1 times, where k is the initial value of
RC.
Because of the dedicated hardware support, the repeat instruction repeats an instruction a given
number of times without any looping penalty.
It may be used to compute the sum of products required in the implementation of FIR filters.
Problem P3.7: Explain what is accomplished by the following instruction sequence.
RPT #2
MAC *AR1+, *AR2-, A

Solution:
The first instruction loads the register RC with 2.
This number is the repeat count for the next MAC instruction.
The MAC instruction executes 3 times.
It multiplies the contents of the data locations pointed to by AR1 and AR2 and accumulates the
products in A.
After each multiply-accumulate, pointer AR1 is incremented and pointer AR2 is decremented.
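
The sequence in Problem P3.7 can be traced with a short Python model. This is a sketch under assumptions: the memory contents and start addresses are invented for illustration, the function name rpt_mac is hypothetical, and only the repeat count and pointer updates described above are modelled.

```python
def rpt_mac(mem, ar1, ar2, k, acc=0):
    """Model RPT #k followed by MAC *AR1+, *AR2-, A (executes k + 1 times)."""
    for _ in range(k + 1):               # RC = k gives k + 1 executions
        acc += mem[ar1] * mem[ar2]       # A <- A + (*AR1) x (*AR2)
        ar1 += 1                         # AR1 post-increment
        ar2 -= 1                         # AR2 post-decrement
    return acc, ar1, ar2


# Hypothetical data: three values at 100h..102h and three at 1FEh..200h.
mem = {0x100: 1, 0x101: 2, 0x102: 3, 0x1FE: 6, 0x1FF: 5, 0x200: 4}
result = rpt_mac(mem, 0x100, 0x200, 2)
print(result)
```

With these values the three products are 1x4, 2x5 and 3x6, so A ends at 32, AR1 at 103h and AR2 at 1FDh.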

Block Repeat instruction (RPTB)


The RPTB instruction has the format

RPTB pmad, where pmad is the program memory address denoting the end of the block of
instructions to be repeated.

• This instruction is similar to the RPT instruction, except that it repeats a block of code a
given number of times without any penalty for looping.
• One less than the number of times the block of instructions is to be repeated is initially
loaded into the memory-mapped block repeat counter register, BRC.

3.7.1 Programming Examples Assembly Files

❑ Describe steps to create executable output files

❑ Create an assembly file containing:

➢ Code

➢ Constants (initialized data)

➢ Variables

❑ Create a linker command file which:


➢ Identifies input and output files

➢ Describes a system’s available memory

➢ Indicates where code and data shall be located

❑ Develop multi-file systems


Figure 3.16. Assembly conventions diagram.

❑ Any ASCII text is O.K.

❑ Use .asm extension

❑ Instructions & directives cannot be in first column

❑ Comments O.K. in any column after semicolon

❑ Mnemonics

➢ Lines of 320 code

➢ Generally written in upper case

➢ Become components of program memory

❑ Directives

➢ Begin with a period (.) and are lower case

➢ Can create constants and variables

➢ May occupy no memory space when used to control ASM and LNK process


Table 3.6 Different COFF data types

The .bss Directive:

❑ Only directive with assembly label defined in the operand field

❑ Use separate .bss statements for each named variable

❑ Remember .bss by thinking:

➢ Block - reserves a block of memory

➢ Symbol - beginning at address symbol

➢ Size - of the specified size

❑ Example: Create a 5-word array 'x'

        .bss x, 5

Table 3.7 Different Assembler Directives

Table 3.8. Different Named section


Table 3.9. Summary of COFF Directive

Example 1: Write a program to find the sum of a series of signed numbers stored at successive
locations in the data memory and place the result in the accumulator.
Solution:

➢ AR1 as pointer to the numbers.

➢ AR2 as counter for the numbers.

➢ Accumulator value set to zero.

➢ Sign extension mode is selected.

➢ Add each number into accumulator.

➢ Increment the pointer & decrement the counter.

➢ Repeat until count in AR2 reaches zero.

➢ Accumulator contains the sum of the numbers.

This program computes the signed sum of the data memory locations from address 410h to
41Fh. The result is placed in A.
A = dmad(410h) + dmad(411h) + ... + dmad(41Fh)

        .mmregs
        .global _c_int00
        .text
_c_int00:
        STM   #0Fh, AR2        ; Initialize counter AR2 = 0Fh (16 values; the loop runs AR2 + 1 times)
        STM   #410h, AR1       ; Initialize pointer AR1 = 410h
        LD    #0h, A           ; Initialize sum A = 0
        SSBX  SXM              ; Select sign extension mode
START:
        ADD   *AR1+, A         ; Add the next data value
        BANZ  START, *AR2-     ; Repeat if not done
        NOP                    ; No operation
        .end

Example 2: Program to compute multiply and accumulate using direct addressing mode:
y(n) = h0 x(n) + h1 x(n-1) + h2 x(n-2)

Solution:

➢ h0 x(n), h1 x(n-1) & h2 x(n-2) are computed using the MPY instruction:
(T) * (dmad) → Acc A or B

➢ The accumulator contains the output value, which is stored in data memory as:
Acc (15-0) → dmad
Acc (31-16) → dmad+1

        .global _c_int00

x       .usect "Input Samples", 3
y       .usect "output", 2
h       .usect "coefficient", 3

        .text

_c_int00:
        SSBX  SXM              ; Select sign extension mode
        LD    #h, DP           ; Select the data page for coefficients
        LD    @h, T            ; Get the coefficient h(0)
        LD    #x, DP           ; Select the data page for input samples
        MPY   @x, A            ; A = x(n) * h(0)
        LD    #h, DP           ; Select the data page for coefficients
        LD    @h+1, T          ; Get the coefficient h(1)
        LD    #x, DP           ; Select the data page for input samples
        MPY   @x+1, B          ; B = x(n-1) * h(1)
        ADD   A, B             ; B = x(n)*h(0) + x(n-1)*h(1)
        LD    #h, DP           ; Select the data page for coefficients
        LD    @h+2, T          ; Get the coefficient h(2)
        LD    #x, DP           ; Select the data page for input samples
        MPY   @x+2, A          ; A = x(n-2) * h(2)
        ADD   A, B             ; B = x(n)*h(0) + x(n-1)*h(1) + x(n-2)*h(2)

        LD    #y, DP           ; Select the data page for outputs
        STL   B, @y            ; Save low part of output
        STH   B, @y+1          ; Save high part of output
        NOP                    ; No operation
        .end

Example 3: Program to compute multiply and accumulate using indirect addressing mode

        .global _c_int00

h       .int  10, 20, 30

        .text

_c_int00:
        SSBX  SXM                  ; Select sign extension mode
        STM   #310H, AR2           ; Initialize pointer AR2 for x(n) stored at 310H
        STM   #h, AR3              ; Initialize pointer AR3 for coefficients
        MPY   *AR2+, *AR3+, A      ; A = x(n) * h(0)
        MPY   *AR2+, *AR3+, B      ; B = x(n-1) * h(1)
        ADD   A, B                 ; B = x(n)*h(0) + x(n-1)*h(1)
        MPY   *AR2+, *AR3+, A      ; A = x(n-2) * h(2)
        ADD   A, B                 ; B = x(n)*h(0) + x(n-1)*h(1) + x(n-2)*h(2)
        STL   B, *AR2+             ; Save low part of result
        STH   B, *AR2+             ; Save high part of result
        NOP                        ; No operation
        .end

Example 4: Program to compute multiply and accumulate using the MAC instruction:

        .global _c_int00
        .data
        .bss  x, 3
        .bss  y, 2

h       .int  10, 20, 30

        .text

_c_int00:
        SSBX  SXM                  ; Select sign extension mode
        STM   #x, AR2              ; Initialize AR2 to point to x(n)
        STM   #h, AR3              ; Initialize AR3 to point to h(0)
        LD    #0H, A               ; Initialize result in A = 0
        RPT   #2                   ; Repeat the next operation 3 times
        MAC   *AR2+, *AR3+, A      ; y(n) computed
        STM   #y, AR2              ; Point AR2 to y(n)
        STL   A, *AR2+             ; Save the low part of y(n)
        STH   A, *AR2+             ; Save the high part of y(n)
        NOP                        ; No operation
        .end
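
The result that Example 4 leaves in memory can be cross-checked off-line. The Python sketch below is only an illustration: the input samples are invented, the helper names fir_output and split_accumulator are hypothetical, and only bits 31-0 of the accumulator are considered.

```python
def fir_output(h, x):
    """y(n) = h(0) x(n) + h(1) x(n-1) + h(2) x(n-2), with x[i] holding x(n-i)."""
    return sum(hi * xi for hi, xi in zip(h, x))


def split_accumulator(acc):
    """Return (low word, high word) of bits 31-0, as STL and STH would store them."""
    acc &= 0xFFFFFFFF
    return acc & 0xFFFF, (acc >> 16) & 0xFFFF


h = [10, 20, 30]                 # coefficients used in the examples
x = [1000, 2000, 3000]           # hypothetical input samples x(n), x(n-1), x(n-2)
y = fir_output(h, x)
low, high = split_accumulator(y)
print(y, hex(low), hex(high))
```

For these samples y(n) is 140000 (222E0h), so the low word stored first is 22E0h and the high word stored next is 0002h. This shows why two store instructions are needed: the sum of products no longer fits in a single 16-bit word.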

3.8 On chip peripherals:

On-chip peripherals facilitate interfacing with external devices.

The peripherals are:

• General purpose I/O pins

• A software programmable wait state generator.

• Hardware timer

• Host port interface (HPI)

• Clock generator

• Serial port

3.8.1. General-purpose I/O pins:

The device has two general-purpose I/O pins:

• BIO → input pin, used to monitor the status of external devices.

• XF → output pin, software controlled, used to signal external devices.


3.8.2. Software programmable wait state generator:

• Extends external bus cycles up to seven machine cycles.

3.8.3. Hardware Timer

• An on-chip down counter

• Used to generate a signal to initiate an interrupt or any other process

• Consists of 3 memory-mapped registers:

➢ The timer register (TIM)

➢ Timer period register (PRD)

➢ Timer control register (TCR)

• Prescaler block (PSC)

• TDDR (timer divide-down ratio)

• TINT & TOUT

o The timer register (TIM) is a 16-bit memory-mapped register that decrements at every
pulse from the prescaler block (PSC).
o The timer period register (PRD) is a 16-bit memory-mapped register whose contents
are loaded into the TIM whenever the TIM decrements to zero or the device is reset
(SRESET).
o The timer can also be independently reset using the TRB signal. The timer control
register (TCR) is a 16-bit memory-mapped register that contains status and control bits.
o Table 3.10 shows the functions of the various bits in the TCR. The prescaler block is
also an on-chip counter.
o Whenever the prescaler bits count down to 0, a clock pulse is given to the TIM register
that decrements the TIM register by 1.
o The TDDR bits contain the divide-down ratio, which is loaded into the prescaler block
each time the prescaler bits count down to 0.
o That is to say, the 4-bit value of TDDR determines the divide-by ratio of the timer
clock with respect to the system clock.
o In other words, the TIM decrements either at the rate of the system clock or at a
slower rate, as decided by the value of the TDDR bits.
o TOUT and TINT are the output signals generated as the TIM register decrements to 0.
TOUT can trigger the start-of-conversion signal in an ADC interfaced to the DSP.
o The sampling frequency of the ADC determines how frequently it receives the TOUT
signal.
o TINT is used to generate interrupts, which are required to service a peripheral such as
a DRAM controller periodically.
o The timer can also be stopped, restarted, reset, or disabled by specific status bits.
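
The interaction of PSC, TDDR, TIM and PRD described above can be sketched as a small Python model. It is a simplified illustration, not a description of the exact hardware timing: the function names are hypothetical, both counters are assumed to start fully loaded, and the control bits are ignored. Under these assumptions TIM reaches zero and reloads once every (TDDR + 1) x (PRD + 1) clock cycles.

```python
def tint_period_cycles(prd, tddr):
    """Clock cycles between timer events for given PRD and TDDR values."""
    return (tddr + 1) * (prd + 1)


def simulate_timer(prd, tddr, clocks):
    """Return the clock numbers at which TIM has counted down and reloads."""
    psc, tim = tddr, prd                 # both counters start fully loaded
    events = []
    for clk in range(1, clocks + 1):
        if psc > 0:
            psc -= 1                     # prescaler counts down on every clock
            continue
        psc = tddr                       # prescaler reloads from TDDR
        if tim > 0:
            tim -= 1                     # one pulse to TIM per prescaler rollover
        else:
            tim = prd                    # TIM reloads from PRD
            events.append(clk)           # TINT / TOUT would be generated here
    return events


print(tint_period_cycles(prd=2, tddr=1))
print(simulate_timer(prd=2, tddr=1, clocks=18))
```

With PRD = 2 and TDDR = 1 the model produces an event every 6 clocks, in agreement with the formula. To obtain a required sampling rate, PRD and TDDR are chosen so that this product equals the system clock frequency divided by the sampling frequency.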

Bit      Name       Function
15-12    Reserved   Reserved; always read as 0.
11       Soft       Used in conjunction with the Free bit to determine the state of the timer.
                    Soft = 0: the timer stops immediately.
                    Soft = 1: the timer stops when the counter decrements to 0.
10       Free       Used in conjunction with the Soft bit.
                    Free = 0: the Soft bit selects the timer mode.
                    Free = 1: the timer runs free.
9-6      PSC        Timer prescaler counter; specifies the count for the on-chip timer.
5        TRB        Timer reload; resets the on-chip timer.
4        TSS        Timer stop status; stops or starts the on-chip timer.
3-0      TDDR       Timer divide-down ratio.

Table 3.10. Functions of the bits in the timer control register (TCR)

Figure 3.17. Logical block diagram of timer circuit.

3.8.4 Host port interface (HPI):

➢ Allows the DSP to interface to 8-bit or 16-bit host devices or a host processor

➢ Signals in the HPI are:

➢ Host interrupt (HINT)

➢ HRDY

➢ HCNTL0 & HCNTL1

➢ HBIL

➢ HR/W

Fig 3.18. A generic diagram of the host port interface (HPI)

Important signals in the HPI are as follows:


▪ The 16-bit data bus and the 18-bit address bus.
▪ The host interrupt, HINT, for the DSP to signal the host when its attention is
required.
▪ HRDY, a DSP output indicating that the DSP is ready for transfer.
▪ HCNTL0 and HCNTL1, control signals that indicate the type of transfer to carry
out. The transfer types are data, address, etc.
▪ HBIL. If this is low, it indicates that the current byte is the first byte; if it is high, it
indicates that it is the second byte.
▪ HR/W indicates whether the host is carrying out a read operation or a write operation.

3.8.5. Clock Generator:

The clock generator on TMS320C54xx devices has two options: an external clock and an
internal clock. In the case of the external clock option, a clock source is directly connected
to the device. The internal clock option, on the other hand, uses an internal clock
generator and a phase-locked loop (PLL) circuit. The PLL, in turn, can be hardware
configured or software programmed. Not all devices of the TMS320C54xx family have all
these clock options; they vary from device to device.

3.8.6. Serial I/O Ports:

Three types of serial ports are available:

➢ Synchronous ports.

➢ Buffered ports.

➢ Time-division multiplexed ports.

The synchronous serial ports are high-speed, full-duplex ports that provide direct
communication with serial devices, such as codecs and analog-to-digital (A/D) converters. A
buffered serial port (BSP) is a synchronous serial port that is provided with an autobuffering unit
and is clocked at the full clock rate. The autobuffering unit reduces the overhead of servicing
interrupts. A time-division multiplexed (TDM) serial port is a synchronous serial port that is
provided to allow time-division multiplexing of the data. The functioning of each of these on-chip
peripherals is controlled by memory-mapped registers assigned to the respective peripheral.

3.9 Interrupts of TMS320C54xx Processors:

Often, when the CPU is in the midst of executing a program, a peripheral device may
require a service from the CPU. In such a situation, the main program may be interrupted by a signal
generated by the peripheral device. This results in the processor suspending the main program in
order to execute another program, called an interrupt service routine, to service the peripheral device.
On completion of the interrupt service routine, the processor returns to the main program and
continues from where it left off.

An interrupt may be generated either by an internal or an external device. It may also be
generated by software. Not all interrupts are serviced when they occur. Only those interrupts that
are called nonmaskable are serviced whenever they occur. Other interrupts, which are called
maskable interrupts, are serviced only if they are enabled. There is also a priority scheme to determine
which interrupt gets serviced first if more than one interrupt occurs simultaneously.

Almost all the devices of the TMS320C54xx family have 32 interrupts. However, the types and
the number under each type vary from device to device. Some of these interrupts are reserved for
use by the CPU.
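
The servicing rule described above (nonmaskable interrupts are always eligible, maskable ones only when enabled, and priority decides among simultaneous requests) can be expressed as a short Python sketch. Everything specific in it is an assumption made for illustration: the interrupt names, the priority numbers, the convention that a smaller number means higher priority, and the helper name next_to_service. It does not reproduce the actual interrupt table of any '54xx device.

```python
from dataclasses import dataclass


@dataclass
class Interrupt:
    name: str
    priority: int          # assumed convention: smaller number = higher priority
    maskable: bool
    enabled: bool = False  # only meaningful for maskable interrupts


def next_to_service(pending):
    """Pick the pending interrupt that would be serviced first, or None."""
    eligible = [i for i in pending if not i.maskable or i.enabled]
    return min(eligible, key=lambda i: i.priority, default=None)


# Hypothetical simultaneous requests.
pending = [
    Interrupt("timer", priority=5, maskable=True, enabled=True),
    Interrupt("serial", priority=3, maskable=True, enabled=False),
    Interrupt("nmi", priority=2, maskable=False),
]
chosen = next_to_service(pending)
print(chosen.name)
```

In this example the disabled serial interrupt is ignored even though its priority is higher than that of the timer, and the nonmaskable request wins overall.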

3.10. Pipeline operation of TMS320C54xx Processors:

The CPU of '54xx devices has a six-level-deep instruction pipeline. The six stages of the
pipeline are independent of each other. This allows overlapping execution of
instructions. During any given cycle, up to six different instructions can be active, each at a different
stage of processing. The six levels of the pipeline structure are program prefetch, program fetch,
decode, access, read and execute.

1 During program prefetch, the program address bus, PAB, is loaded with the address of
the next instruction to be fetched.
2 In the fetch phase, an instruction word is fetched from the program bus, PB, and loaded
into the instruction register, IR. These two phases form the instruction fetch sequence.
3 During the decode stage, the contents of the instruction register, IR, are decoded to
determine the type of memory access operation and the control signals required for the
data-address generation unit and the CPU.
4 The access phase outputs the read operand's address on the data address bus, DAB. If a
second operand is required, the other data address bus, CAB, is also loaded with an
appropriate address. Auxiliary registers in indirect addressing mode and the stack pointer
(SP) are also updated.
5 In the read phase, the data operand(s), if any, are read from the data buses, DB and CB.
This phase completes the two-phase read process and starts the two-phase write
process. The data address of the write operand, if any, is loaded into the data write
address bus, EAB.
6 The execute phase writes the data using the data write bus, EB, and completes the
operand write sequence. The instruction is executed in this phase.
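
The overlap of instructions across the six stages can be tabulated with a few lines of Python. This is a sketch of an ideal pipeline only: it assumes one instruction enters per cycle and that there are no stalls, multi-word instructions or pipeline conflicts, and the names STAGES and pipeline_table are made up for the example.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_table(instructions):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    table = {}
    for n, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            # Instruction n (0-based) is in stage s during cycle n + s + 1.
            table.setdefault(n + s + 1, {})[stage] = instr
    return table


program = ["LD *AR3+, A", "ADD #1000h, A", "STL A, *AR3+"]
table = pipeline_table(program)
for cycle in sorted(table):
    print(cycle, table[cycle])
```

For three instructions the table spans 8 cycles (3 + 6 - 1). In cycle 3, for instance, the first instruction is being decoded while the second is fetched and the third is prefetched, which is the overlap shown in the pipe flow diagrams.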

Figure 3.19. Pipeline operation of TMS320C54xx Processors


Figure 3.20. Pipe flow diagram

Example 1: Show the pipeline operation of the following sequence of instructions if the initial
value of AR3 is 80 and the values stored in memory locations 80, 81 and 82 are 1, 2 and 3.

        LD    *AR3+, A
        ADD   #1000h, A
        STL   A, *AR3+

Figure 3.21. Pipeline operation for above example1



Example 2: Show the pipeline operation of the following sequence of instructions if the initial
values of AR1, AR3 and A are 84, 81 and 1, and the values stored in memory locations 81, 82, 83
and 84 are 2, 3, 4 and 6. Also provide the values of registers AR3, AR1, T and accumulator A after
the completion of each cycle.

        ADD   *AR3+, A
        LD    *AR1+, T
        MPY   *AR3+, B
        ADD   B, A
        .
        .
        .

Figure 3.22. Pipeline operation for above example2
