Module 3

The document discusses programmable digital signal processors (DSPs), particularly focusing on the TMS320 family from Texas Instruments, which includes various architectures and features that enhance performance for applications in communication, control, and consumer electronics. It details the architecture of the TMS320C54xx processors, highlighting their multiple memory spaces, advanced arithmetic units, and efficient instruction sets that support high-speed DSP algorithms. Additionally, it covers the internal memory organization, status registers, and various addressing modes that facilitate operand access and execution of instructions.

Uploaded by

chiragbengre10

Programmable digital signal processors

Introduction
 Leading manufacturers of integrated circuits, such as Texas Instruments (TI), Analog Devices, and Motorola, manufacture digital signal processor (DSP) chips
 These manufacturers have developed a range of DSP chips of varied complexity
 The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit floating-point
 These DSPs possess the operational flexibility of high-speed controllers and the numerical capability of array processors


Commercial Digital Signal-Processing Devices
 There are several families of commercial DSP devices

 Since these devices began to appear on the market in the early eighties, they have been used in numerous applications, such as communication, control, computers, instrumentation, and consumer electronics
 The architectural features and the processing power of these devices have been constantly upgraded based on advances in technology and application needs
 However, in their basic versions, most of them have a Harvard architecture, a single-cycle hardware multiplier, an address generation unit with dedicated address registers, special addressing modes, and on-chip peripheral interfaces
 Of the various families of programmable DSP devices that are commercially available, the three most popular ones are those from Texas Instruments, Motorola, and Analog Devices
 Texas Instruments was one of the first to come out with a commercial programmable DSP with the introduction of its TMS32010 in 1982
Table 1: Summary of the Architectural Features of Three Fixed-Point DSPs

Architectural feature       TMS320C25                DSP 56000                  ADSP 2100
Data representation format  16-bit fixed point       24-bit fixed point         16-bit fixed point
Hardware multiplier         16 x 16                  24 x 24                    16 x 16
ALU                         32 bits                  56 bits                    40 bits
Internal bus                16-bit program bus       24-bit program bus         24-bit program bus
                            16-bit data bus          2 x 24-bit data buses      16-bit data bus
                                                     24-bit global data bus     16-bit result bus
External bus                16-bit program/data bus  24-bit program/data bus    24-bit program bus
                                                                                16-bit data bus
On-chip memory              544 words RAM            512 words PROM             -
                            4K words ROM             2 x 256 words data RAM
                                                     2 x 256 words data ROM
Off-chip memory             64K words program        64K words program          16K words program
                            64K words data           2 x 64K words data         16K words data
Cache memory                -                        -                          16 words program
Instruction cycle time      100 ns                   97.5 ns                    125 ns
Special addressing modes    Bit reversed             Modulo                     Modulo
                                                     Bit reversed               Bit reversed
Data address modes          1                        2                          2
Interfacing features        Synchronous serial I/O   Synchronous and            DMA
                            DMA                      asynchronous serial I/O
                                                     DMA
The architecture of TMS320C54xx digital signal processors
 TMS320C54xx processors retain the basic Harvard architecture of their predecessor, the TMS320C25, but have several additional features that improve their performance over it
 Figure 3.1 shows a functional block diagram of TMS320C54xx processors
 They have one program and three data memory spaces with separate buses, which provide simultaneous access to a program instruction and two data operands and enable writing of a result at the same time
 Part of the memory is implemented on-chip and consists of combinations of ROM, dual-access RAM, and single-access RAM
 Transfers between the memory spaces are also possible
 The central processing unit (CPU) of TMS320C54xx processors consists of
1. A 40-bit arithmetic logic unit (ALU)
2. Two 40-bit accumulators
3. A barrel shifter
4. A 17 x 17 multiplier
5. A 40-bit adder
6. Data address generation logic (DAGEN) with its own arithmetic unit
7. Program address generation logic (PAGEN)
 These major functional units are supported by a number of registers and logic in the architecture
 A powerful instruction set makes these devices very efficient for running high-speed DSP algorithms. It includes hardware-supported single-instruction repeat and block repeat operations, block memory move instructions, instructions that pack two or three simultaneous reads, and arithmetic instructions with parallel store and load
 Several peripherals, such as a clock generator, a hardware timer, a wait state generator, parallel I/O ports, and serial I/O ports, are also provided on-chip
 These peripherals make it convenient to interface the signal processors to the outside world
 In the following sections, we examine in detail the various architectural features of the TMS320C54xx family of processors

Functional architecture for TMS320C54xx processors


Bus Structure
 The performance of a processor is enhanced by providing multiple buses that give simultaneous access to various parts of memory or peripherals
 The 54xx architecture is built around four pairs of 16-bit buses, with each pair consisting of an address bus and a data bus
 As shown in the figure above, these are:
1. The program bus pair (PAB, PB), which carries the instruction code from the program memory
2. Three data bus pairs (CAB, CB; DAB, DB; and EAB, EB), which interconnect the various units within the CPU
3. In addition, the pairs CAB, CB and DAB, DB are used to read from the data memory, while the pair EAB, EB carries the data to be written to the memory
 The 54xx can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1) in the DAGEN block
 This enables two operands to be accessed simultaneously
Central Processing Unit (CPU)
 The 54xx CPU is common to all the 54xx devices
 The 54xx CPU contains
1. A 40-bit arithmetic logic unit (ALU)
2. Two 40-bit accumulators (A and B)
3. A barrel shifter
4. A 17 x 17-bit multiplier
5. A 40-bit adder
6. A compare, select, and store unit (CSSU)
7. An exponent encoder (EXP)
8. A data address generation unit (DAGEN)
9. A program address generation unit (PAGEN)
 The ALU performs 2's complement arithmetic operations and bit-level Boolean operations on 16-, 32-, and 40-bit words
 It can also function as two separate 16-bit ALUs and perform two 16-bit operations simultaneously
 The figure below shows the functional diagram of the ALU of the TMS320C54xx family of devices


Functional diagram of the central processing unit of the
TMS320C54xx processors
Accumulators A and B
 The accumulators store the output from the ALU or the multiplier/adder block and provide a second input to the ALU
 Each accumulator is divided into three parts: guard bits (bits 39-32), high-order word (bits 31-16), and low-order word (bits 15-0), which can be stored and retrieved individually
 Each accumulator is memory-mapped and partitioned
 Either accumulator can be configured as the destination register
 The guard bits are used as a head margin (headroom) for computations
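The three-part layout described above can be illustrated with a small Python model. This is only a sketch: `split_accumulator` is a hypothetical helper name, and the accumulator is represented as a plain 40-bit integer rather than as the device registers.

```python
MASK40 = (1 << 40) - 1  # a 54xx accumulator is 40 bits wide


def split_accumulator(acc):
    """Return the (guard, high, low) fields of a 40-bit accumulator value."""
    acc &= MASK40
    guard = (acc >> 32) & 0xFF    # guard bits 39-32
    high = (acc >> 16) & 0xFFFF   # high-order word, bits 31-16
    low = acc & 0xFFFF            # low-order word, bits 15-0
    return guard, high, low


# Adding the largest positive 32-bit value 256 times still fits in 40 bits;
# the excess beyond 32 bits lands in the guard bits.
total = 256 * 0x7FFFFFFF
print([hex(field) for field in split_accumulator(total)])
```

The second part of the sketch shows why the guard bits act as a head margin: intermediate sums may exceed 32 bits without overflowing the accumulator.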


Barrel shifter
 The barrel shifter provides the capability to scale the data during an operand read or write
 No overhead is required to implement the shift needed for the scaling operations
 The 54xx barrel shifter can produce a left shift of 0 to 31 bits or a right shift of 0 to 16 bits on the input data
 The shift count is specified in the shift-count field of the instruction, in the shift count field (ASM) of status register ST1, or in the temporary register T
 The figure below shows the functional diagram of the barrel shifter of TMS320C54xx processors
 The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle
 The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended, depending on the state of the sign-extension mode bit (SXM) in the status register ST1
 An additional shift capability enables the processor to perform numerical scaling, bit extraction, extended arithmetic, and overflow prevention operations
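The fill rules described above (zeros in the LSBs, zero fill or sign extension in the MSBs depending on SXM) can be sketched in Python. The function name `barrel_shift` and the choice of a 16-bit input with a 40-bit output are assumptions made for illustration; the real shifter also accepts accumulator inputs.

```python
MASK40 = (1 << 40) - 1


def barrel_shift(value16, shift, sxm):
    """Shift a 16-bit operand into a 40-bit result.

    shift: +0..31 is a left shift, -1..-16 is a right shift.
    sxm:   True sign-extends the operand first, False zero-fills the MSBs.
    """
    if not -16 <= shift <= 31:
        raise ValueError("shift count outside the -16..31 range")
    value = value16 & 0xFFFF
    if sxm and value & 0x8000:
        value -= 0x10000           # interpret as a negative 2's complement number
    if shift >= 0:
        result = value << shift    # vacated LSBs are filled with 0s
    else:
        result = value >> -shift   # Python's >> keeps the sign, as sign extension does
    return result & MASK40


print(hex(barrel_shift(0x8000, 16, sxm=True)))
print(hex(barrel_shift(0x8000, 16, sxm=False)))
```

With SXM set, the operand 8000h is treated as -32768 and the upper bits of the result are filled with ones; with SXM cleared, the same operand is treated as a positive value.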
Functional diagram of the barrel shifter
Multiplier/adder unit
 The kernel of the DSP device architecture is the multiplier/adder unit
 The multiplier/adder unit of TMS320C54xx devices performs a 17 x 17 2's complement multiplication with a 40-bit addition, effectively in a single instruction cycle
 In addition to the multiplier and adder, the unit consists of control logic for integer and fractional computations and a 16-bit temporary storage register, T
 The figure below shows the functional diagram of the multiplier/adder unit of TMS320C54xx processors
 The compare, select, and store unit (CSSU) is a hardware unit specifically incorporated to accelerate the add/compare/select operation
 This operation is essential to implement the Viterbi algorithm used in many signal-processing applications
 The exponent encoder unit supports the EXP instruction, which stores in the T register the number of leading redundant bits of the accumulator content
 This information is useful while shifting the accumulator content for the purpose of scaling


Internal Memory and Memory-Mapped Registers
 The amount and the types of memory of a processor have direct relevance to the efficiency and performance obtainable in implementations with the processor
 The 54xx memory is organized into three individually selectable spaces: program, data, and I/O
 All 54xx devices contain both RAM and ROM
 RAM can be either dual-access type (DARAM) or single-access type (SARAM)
 The on-chip RAM for these processors is organized in pages of 128 word locations each
 The 54xx processors have a number of CPU registers to support operand addressing and computations
 The CPU registers and peripheral registers are all located on page 0 of the data memory
 Figures (a) and (b) show the internal CPU registers and peripheral registers with their addresses
 The processor mode status (PMST) register is used to configure the processor
 It is a memory-mapped register located at address 1Dh on page 0 of the RAM
 A part of the on-chip ROM may contain a boot loader and look-up tables for functions such as sine, cosine, μ-law, and A-law


(a) Internal memory-mapped registers of TMS320C54xx processors
(b) Peripheral registers for the TMS320C54xx processors
Status registers (ST0, ST1)
 ST0: Contains the status flags (OVA, OVB, C, TC) produced by arithmetic operations and bit manipulations
 ST1: Contains the status of various conditions and modes
 Bits of the ST0 and ST1 registers can be set or cleared with the SSBX and RSBX instructions
 PMST: Contains memory-setup status and control information

Status register0 diagram


 ARP: Auxiliary register pointer

 TC: Test/control flag

 C: Carry bit

 OVA: Overflow flag for accumulator A

 OVB: Overflow flag for accumulator B

 DP: Data-memory page pointer

Status register1 diagram


 BRAF: Block repeat active flag
 BRAF=0: the block repeat is deactivated
 BRAF=1: the block repeat is activated
 CPL: Compiler mode
 CPL=0: the relative direct addressing mode using the data page pointer is selected
 CPL=1: the relative direct addressing mode using the stack pointer is selected
 HM: Hold mode; indicates whether the processor continues internal execution while it acknowledges a hold request from the external interface
 INTM: Interrupt mode; globally masks or enables all interrupts
 INTM=0: all unmasked interrupts are enabled
 INTM=1: all maskable interrupts are disabled
 0: Always read as 0
 OVM: Overflow mode
 OVM=1: on overflow, the destination accumulator is set to either the most positive value or the most negative value
 OVM=0: the overflowed result is left in the destination accumulator
 SXM: Sign extension mode
 SXM=0: sign extension is suppressed
 SXM=1: data is sign extended
 C16: Dual 16-bit/double-precision arithmetic mode
 C16=0: the ALU operates in double-precision arithmetic mode
 C16=1: the ALU operates in dual 16-bit arithmetic mode
 FRCT: Fractional mode
 FRCT=1: the multiplier output is left-shifted by 1 bit to compensate for an extra sign bit
 CMPT: Compatibility mode
 CMPT=0: ARP is not updated in the indirect addressing mode
 CMPT=1: ARP is updated in the indirect addressing mode
 ASM: Accumulator shift mode; a 5-bit field that specifies a shift value in the range -16 to 15
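As an illustration of how these fields pack into one 16-bit register, the Python sketch below decodes an ST1 value. The bit positions are not given in the text above; they are an assumption taken from TI's published C54x register layout (BRAF in bit 15 down to ASM in bits 4-0, with the XF output flag in bit 13). `decode_st1` is a hypothetical helper name.

```python
# Assumed layout: field name -> (least significant bit, width in bits)
ST1_FIELDS = {
    "BRAF": (15, 1), "CPL": (14, 1), "XF": (13, 1), "HM": (12, 1),
    "INTM": (11, 1), "OVM": (9, 1), "SXM": (8, 1), "C16": (7, 1),
    "FRCT": (6, 1), "CMPT": (5, 1), "ASM": (0, 5),
}


def decode_st1(value):
    """Split a 16-bit ST1 value into its named fields."""
    fields = {}
    for name, (lsb, width) in ST1_FIELDS.items():
        fields[name] = (value >> lsb) & ((1 << width) - 1)
    # ASM is a 5-bit 2's complement number, so it covers -16 to 15
    if fields["ASM"] >= 16:
        fields["ASM"] -= 32
    return fields


print(decode_st1(0x2900))
```

Under the assumed layout, the value 2900h corresponds to XF, INTM, and SXM set with every other field cleared.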


Processor Mode Status Register (PMST)

 IPTR: Interrupt vector pointer; points to the 128-word program page where the interrupt vectors reside
 MP/MC: Microprocessor/microcomputer mode
 MP/MC=0: the on-chip ROM is enabled
 MP/MC=1: the on-chip ROM is disabled
 OVLY: RAM overlay; enables on-chip dual-access data RAM blocks to be mapped into program space
 AVIS: Address visibility; enables or disables visibility of the internal program address at the address pins
 DROM: Data ROM; enables on-chip ROM to be mapped into data space
 CLKOFF: CLKOUT off
 SMUL: Saturation on multiplication
 SST: Saturation on store


Data Addressing Modes of TMS320C54X Processors
 Data addressing modes provide various ways to access operands to execute instructions and place results in the memory or the registers


 The 54XX devices offer seven basic addressing modes

1. Immediate addressing
2. Absolute addressing
3. Accumulator addressing
4. Direct addressing
5. Indirect addressing
6. Memory mapped addressing
7. Stack addressing
1. Immediate addressing
 The instruction contains the specific value of the operand
 The operand can be short (3, 5, 8, or 9 bits in length) or long (16 bits in length)
 An instruction with a short operand occupies one memory location, while an instruction with a long operand occupies two
Example: LD #20, DP
RPT #0FFFFh
2. Absolute Addressing
 The instruction contains a specified address in the operand
(i) Dmad addressing
 MVDK Smem, dmad
 MVDM dmad, MMR
(ii) Pmad addressing
 MVDP Smem, pmad
 MVPD pmad, Smem
(iii) PA addressing
 PORTR PA, Smem
(iv) *(lk) addressing
Example:
 MVKD 1000h, *AR5 ; data memory (1000h) → *AR5 (dmad addressing)
 MVPD 1000h, *AR7 ; program memory (1000h) → *AR7 (pmad addressing)
 PORTR 05h, *AR3 ; port (05h) → *AR3 (PA addressing)
 LD *(1000h), A ; data memory (1000h) → A (*(lk) addressing)

3. Accumulator Addressing
 The accumulator content is used as the address to transfer data between program and data memory
Ex: READA *AR2
4. Direct Addressing
 Base address + 7-bit value contained in the instruction = 16-bit address
 A page of 128 locations can be accessed without changing DP or SP
 The compiler mode bit (CPL) in the ST1 register selects the base address
 CPL = 0 selects DP
 CPL = 1 selects SP
 It should be remembered that when SP is used instead of DP, the effective address is computed by adding the 7-bit offset to SP
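The address computation for both settings of CPL can be sketched in Python as follows. The sketch assumes a 9-bit DP that supplies the upper bits of the address (consistent with pages of 128 words in a 64K-word data space); `direct_address` is a hypothetical name.

```python
def direct_address(offset7, cpl, dp=0, sp=0):
    """Form the 16-bit data address used in direct addressing.

    offset7: 7-bit offset encoded in the instruction
    cpl:     0 selects DP as the base, 1 selects SP
    """
    offset = offset7 & 0x7F
    if cpl == 0:
        # DP-relative: the (assumed 9-bit) DP is concatenated with the offset
        return ((dp & 0x1FF) << 7) | offset
    # SP-relative: the offset is added to the 16-bit stack pointer
    return (sp + offset) & 0xFFFF


print(hex(direct_address(0x10, cpl=0, dp=8)))
print(hex(direct_address(0x10, cpl=1, sp=0x3FF0)))
```

With DP = 8, the page starts at 400h, so offset 10h addresses location 410h. With SP = 3FF0h, the same offset addresses 4000h, which shows that SP-relative addresses are not confined to a page boundary.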


Block diagram of the direct addressing mode for TMS320C54xx
Processors
5. Indirect Addressing
 The data space is accessed using the address held in an auxiliary register
 The 54xx has eight 16-bit auxiliary registers (AR0-AR7)
 Two auxiliary register arithmetic units (ARAU0 and ARAU1) operate on them
 They can be used to access memory locations in fixed step sizes
 The AR0 register is used for the indexed and bit-reversed addressing modes
 For single-operand addressing, the instruction contains two fields:
 MOD → type of indirect addressing
 ARF → AR used for addressing
 The role of ARP depends on the compatibility (CMPT) bit in ST1
 CMPT = 0: standard mode; ARP is set to zero
 CMPT = 1: compatibility mode; the particular AR is selected by ARP


Block diagram of the indirect addressing mode for TMS320C54xx
Processors
Table: Indirect addressing options with a single data-memory operand
Operand syntax    Function
*ARx              Addr = ARx
*ARx-             Addr = ARx; ARx = ARx - 1
*ARx+             Addr = ARx; ARx = ARx + 1
*+ARx             Addr = ARx + 1; ARx = ARx + 1
*ARx-0B           Addr = ARx; ARx = B(ARx - AR0)
*ARx-0            Addr = ARx; ARx = ARx - AR0
*ARx+0            Addr = ARx; ARx = ARx + AR0
*ARx+0B           Addr = ARx; ARx = B(ARx + AR0)
*ARx-%            Addr = ARx; ARx = circ(ARx - 1)
*ARx-0%           Addr = ARx; ARx = circ(ARx - AR0)
*ARx+%            Addr = ARx; ARx = circ(ARx + 1)
Here B( ) denotes modification with reverse carry propagation (bit-reversed addressing) and circ( ) denotes circular modification

Circular Addressing

 Used in convolution, correlation, and FIR filters
 A circular buffer is a sliding window that contains the most recent data
 A circular buffer of size R must start on an N-bit boundary, where N is the smallest integer such that 2^N > R
 The circular buffer size register (BK) specifies the size of the circular buffer
 Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected AR (ARx)
 End of buffer address (EOB): obtained by replacing the N LSBs of ARx with the N LSBs of BK
 The index (the N LSBs of ARx) is updated by the step as follows:
if 0 ≤ index + step < BK: index = index + step
else if index + step ≥ BK: index = index + step - BK
else if index + step < 0: index = index + step + BK
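The index update rule can be modelled in a few lines of Python. This is a sketch under the definitions given above (N is the smallest integer with 2^N > BK, and the index is the N LSBs of ARx); `circ_update` is a hypothetical name, and the step is assumed to be smaller in magnitude than BK.

```python
def circ_update(arx, step, bk):
    """Return the new ARx after a circular modification by `step`."""
    n = bk.bit_length()            # smallest N such that 2**N > BK
    mask = (1 << n) - 1
    base = arx & ~mask & 0xFFFF    # effective base address: N LSBs zeroed
    index = arx & mask             # current position inside the buffer
    new_index = index + step
    if new_index >= bk:
        new_index -= bk            # wrapped past the end of the buffer
    elif new_index < 0:
        new_index += bk            # wrapped before the start of the buffer
    return base | new_index


print(hex(circ_update(0x1020, 0x25, 0x40)))
```

For ARx = 1020h, a step of 25h, and BK = 40h, the index 20h + 25h = 45h exceeds BK, so it wraps to 05h and ARx becomes 1005h.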

Block diagram of the circular addressing mode for TMS320C54xx Processors


Circular addressing mode implementation for TMS320C54xx Processors
Bit-Reversed Addressing
 Used for FFT algorithms
 AR0 specifies one half of the size of the FFT
 AR0 = 2^(N-1), where N is an integer and the FFT size is 2^N
 AR0 is added to the selected AR (ARx) to generate the bit-reversed address
 The addition uses reverse carry propagation: the carry bit propagates from left to right
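Reverse carry propagation can be modelled by mirroring the bit order of both operands, performing an ordinary addition or subtraction, and mirroring the result back. The Python sketch below takes that approach for 16-bit registers; the function names are hypothetical and this is only one way to model the behaviour.

```python
def reverse_bits16(value):
    """Mirror the bit order of a 16-bit word (bit 0 <-> bit 15, and so on)."""
    result = 0
    for bit in range(16):
        if value & (1 << bit):
            result |= 1 << (15 - bit)
    return result


def reverse_carry_add(arx, ar0):
    """ARx + AR0 with the carry propagating from left to right."""
    total = (reverse_bits16(arx) + reverse_bits16(ar0)) & 0xFFFF
    return reverse_bits16(total)


def reverse_carry_sub(arx, ar0):
    """ARx - AR0 with the borrow propagating from left to right."""
    diff = (reverse_bits16(arx) - reverse_bits16(ar0)) & 0xFFFF
    return reverse_bits16(diff)


# 8-point FFT: N = 3, so AR0 = 2**(N-1) = 4; the buffer is assumed to start at 0100h
addr, order = 0x0100, []
for _ in range(8):
    order.append(addr & 0x7)
    addr = reverse_carry_add(addr, 4)
print(order)
```

The loop visits the eight buffer offsets in bit-reversed order, which is the access pattern an FFT needs for reordering its data.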


Dual-Operand Addressing
 Dual data-memory operand addressing is used for instructions that simultaneously perform two reads (32-bit read) or a single read (16-bit read) and a parallel store (16-bit store), indicated by two vertical bars, ||
 These instructions access operands using the indirect addressing mode
 If, in an instruction with a parallel store, the source operand and the destination operand point to the same location, the source is read before the destination is written
 Only 2 bits are available in the instruction code for selecting each auxiliary register in this mode
 Thus, just four of the auxiliary registers, AR2-AR5, can be used
 The ARAUs, together with these registers, provide the capability to access two operands in a single cycle
 The figure below shows how an address is generated using dual data-memory operand addressing


Table: Function of the different fields in dual data-memory operand addressing

Name      Function
Opcode    Contains the operation code for the instruction
Xmod      Defines the type of indirect addressing mode used for accessing the Xmem operand
Xar       Xmem AR selection field; defines the AR that contains the address of Xmem
Ymod      Defines the type of indirect addressing mode used for accessing the Ymem operand
Yar       Ymem AR selection field; defines the AR that contains the address of Ymem
Block diagram of the indirect addressing options with a dual data-memory operand
6. Memory-Mapped Register Addressing
 Used to modify the memory-mapped registers without affecting the current data page pointer (DP) or stack pointer (SP)
– Overhead for writing to a register is minimal
– Works for direct and indirect addressing
– Scratch-pad RAM located on data page 0 can be modified
 STM #x, DIRECT
 STM #tbl, AR1


16 bit memory mapped register address generation

7. Stack Addressing
 Used to automatically store the program counter during interrupts and subroutine calls
 Can be used to store additional items of context or to pass data values
 Uses a 16-bit memory-mapped register, the stack pointer (SP)
 PSHD X2
Values of stack & SP before and after operation
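A push such as PSHD and the matching pop can be modelled in Python as shown below. The sketch assumes that the stack grows toward lower addresses, with SP decremented before a push and incremented after a pop; that detail is not stated in the text above. `StackModel` is a hypothetical class, and the data memory is represented as a dictionary.

```python
class StackModel:
    """Minimal model of a stack addressed through a 16-bit stack pointer."""

    def __init__(self, sp):
        self.sp = sp & 0xFFFF
        self.memory = {}              # data memory: address -> 16-bit word

    def push(self, value):
        # Assumed behaviour: SP is decremented first, then the word is stored
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value & 0xFFFF

    def pop(self):
        # The word at SP is read, then SP is incremented
        value = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return value


stack = StackModel(sp=0x4000)
stack.push(0x1234)                    # comparable to pushing the contents of X2
print(hex(stack.sp), hex(stack.memory[stack.sp]))
```

After the push, SP holds 3FFFh and that location contains the pushed word; a pop restores SP to 4000h.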

1. Assuming the current contents of AR3 to be 200h, what will be its contents after each of the following TMS320C54xx addressing modes is used? Assume that the contents of AR0 are 20h
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3-
e. *AR3
f. *+AR3(40h)
g. *+AR3(-40h)

Solution:
a. AR3 ← AR3 + AR0;
AR3 = 200h + 20h = 220h
b. AR3 ← AR3 - AR0;
AR3 = 200h - 20h = 1E0h
c. AR3 ← AR3 + 1;
AR3 = 200h + 1 = 201h
d. AR3 ← AR3 - 1;
AR3 = 200h - 1 = 1FFh
e. AR3 is not modified;
AR3 = 200h
f. AR3 ← AR3 + 40h;
AR3 = 200h + 40h = 240h
g. AR3 ← AR3 - 40h;
AR3 = 200h - 40h = 1C0h
2. Assume that the register AR3, with contents 1020h, is selected as the pointer for the circular buffer. Let BK = 40h to specify the circular buffer size as 40h. Determine the start and the end addresses for the buffer. What will be the contents of register AR3 after the execution of the instruction LD *AR3+0%, A, if the contents of register AR0 are 0025h?

Solution:
 AR3 = 1020h means that it currently points to location 1020h
 With BK = 40h, N = 7 (the smallest N such that 2^N > 40h); zeroing the lower 7 bits of AR3 gives the start address of the buffer as 1000h
 Replacing the same bits with those of BK gives the end address as 1040h
 The instruction LD *AR3+0%, A modifies AR3 by adding AR0 to it and applying the circular modification
 It yields AR3 = circ(1020h + 0025h) = circ(1045h) = 1045h - 40h = 1005h
 Thus the location 1005h is the one pointed to by AR3

3. Assuming the current contents of AR3 to be 200h, what will be its contents after each of the following TMS320C54xx addressing modes is used? Assume that the contents of AR0 are 20h
a. *AR3+0B
b. *AR3-0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation;
AR3 = 200h + 20h (with reverse carry propagation) = 220h
b. AR3 ← AR3 - AR0 with reverse carry propagation;
AR3 = 200h - 20h (with reverse carry propagation) = 23Fh

 Program memory: stores program instructions and tables used in the execution of programs
 Organized into 128 pages, each of 64K words
Table: Function of the different PMST register bits

PMST bit    Logic    On-chip memory configuration
MP/MC       0        ROM enabled
            1        ROM not available
OVLY        0        RAM in data space
            1        RAM in program space
DROM        0        ROM not in data space
            1        ROM in data space
Memory map for the TMS320C5416 Processor
Program Control
 The program control unit contains the program counter (PC), the program-counter-related hardware, the hardware stack, repeat counters, and status registers
 The PC addresses memory in several ways, namely:
 Branch: The PC is loaded with the immediate value following the branch instruction
 Subroutine call: The PC is loaded with the immediate value following the call instruction
 Interrupt: The PC is loaded with the address of the appropriate interrupt vector
 Instructions such as BACC, CALA, etc.: The PC is loaded with the contents of the accumulator low word
 End of a block repeat loop: The PC is loaded with the contents of the block repeat program address start register
 Return: The PC is loaded from the top of the stack
TMS320C54xx Instructions and programming
Assembly language instructions can be classified as:
 Arithmetic operations
1. Addition instructions: e.g. ADD, ADDC
2. Subtract instructions: e.g. SUB, SUBB
3. Multiply instructions: e.g. MPY, MPYA
4. Multiply-accumulate instructions: e.g. MAC, MACD
5. Multiply-subtract instructions: e.g. MAS, MASA
6. Double (32-bit operand) instructions: e.g. DADD, DSUB
7. Application-specific instructions: e.g. EXP, LMS
 Load and store instructions
1. Load instructions: e.g. LD, LDM
2. Store instructions: e.g. ST, STM
3. Conditional store instructions: e.g. CMPS, STRCD
4. Parallel load and store instructions: e.g. LD||ST
5. Parallel load and multiply instructions: e.g. LD||MPY
6. Parallel store and add/subtract instructions: e.g. ST||ADD, ST||SUB
7. Parallel store and multiply instructions: e.g. ST||MPY, ST||MAC
8. Miscellaneous load-type instructions: e.g. MVDD, MVPD
 Logical operations
1. AND instructions: e.g. AND, ANDM
2. OR instructions: e.g. OR, ORM
3. XOR instructions: e.g. XOR, XORM
4. Shift instructions: e.g. ROL, SFTL
5. Test instructions: e.g. BIT, CMPM
 Program-control operations
1. Branch instructions: e.g. B, BACC
2. Call instructions: e.g. CALL, CALA
3. Interrupt instructions: e.g. INTR, TRAP
4. Return instructions: e.g. RET, FRET
5. Repeat instructions: e.g. RPT, RPTB
6. Stack-manipulating instructions: e.g. PSHD, POPD
7. Miscellaneous program-control instructions: e.g. IDLE, RESET

MPY: Multiply With/Without Rounding
Syntax:                        Operation:
1: MPY[R] Smem, dst            (T) x (Smem) → dst
2: MPY Xmem, Ymem, dst         (Xmem) x (Ymem) → dst
                               (Xmem) → T
3: MPY #lk, dst                (T) x lk → dst
4: MPY Smem, #lk, dst          (Smem) x lk → dst
                               (Smem) → T
Operands:
 Smem: Single data-memory operand
 Xmem, Ymem: Dual data-memory operands
 dst: A (accumulator A) or B (accumulator B)
 -32 768 ≤ lk ≤ 32 767
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst
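The effect of the FRCT bit on a multiply of the first form can be sketched in Python. The sketch treats both operands as signed 16-bit values and ignores rounding and the OVM saturation logic; `mpy` and `to_signed16` are hypothetical helper names.

```python
MASK40 = (1 << 40) - 1


def to_signed16(word):
    """Interpret a 16-bit word as a 2's complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy(t, smem, frct):
    """Model (T) x (Smem) -> dst, returning the 40-bit accumulator value."""
    product = to_signed16(t) * to_signed16(smem)
    if frct:
        product <<= 1   # fractional mode: shift out the extra sign bit
    return product & MASK40


# Q15 example: 0.5 x 0.5, where 4000h represents 0.5
print(hex(mpy(0x4000, 0x4000, frct=True)))
```

With FRCT set, the high-order word of the result is 2000h, which is 0.25 in Q15 format. Without the shift, the high-order word would be 1000h and would have to be interpreted in Q14 format.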
MPYA: Multiply by Accumulator A
Syntax:                        Operation:
1: MPYA Smem                   (Smem) x (A(32-16)) → B
                               (Smem) → T
2: MPYA dst                    (T) x (A(32-16)) → dst
Operands:
 Smem: Single data-memory operand
 dst: A (accumulator A) or B (accumulator B)
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst (OVB in syntax 1)

MPYU: Multiply Unsigned
Syntax:                        Operation:
MPYU Smem, dst                 unsigned(T) x unsigned(Smem) → dst
Operands:
 Smem: Single data-memory operand
 dst: A (accumulator A) or B (accumulator B)
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst
MAC[R]: Multiply Accumulate With/Without Rounding
Syntax:                              Operation:
1: MAC[R] Smem, src                  (Smem) x (T) + (src) → src
2: MAC[R] Xmem, Ymem, src[, dst]     (Xmem) x (Ymem) + (src) → dst
                                     (Xmem) → T
3: MAC #lk, src[, dst]               (T) x lk + (src) → dst
4: MAC Smem, #lk, src[, dst]         (Smem) x lk + (src) → dst
                                     (Smem) → T
Operands:
 Smem: Single data-memory operand
 Xmem, Ymem: Dual data-memory operands
 src, dst: A (accumulator A) or B (accumulator B)
 -32 768 ≤ lk ≤ 32 767
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst (or OVsrc, if dst is not specified)

MACA[R]: Multiply by Accumulator A and Accumulate With/Without Rounding
Syntax:                              Operation:
1: MACA[R] Smem[, B]                 (Smem) x (A(32-16)) + (B) → B
                                     (Smem) → T
2: MACA[R] T, src[, dst]             (T) x (A(32-16)) + (src) → dst
Operands:
 Smem: Single data-memory operand
 src, dst: A (accumulator A) or B (accumulator B)
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst (or OVsrc, if dst is not specified) and OVB in syntax 1
MACD: Multiply by Program Memory and Accumulate With Delay
Syntax:
 MACD Smem, pmad, src
Operands:
 Smem: Single data-memory operand
 src: A (accumulator A) or B (accumulator B)
 0 ≤ pmad ≤ 65 535
Execution:
 pmad → PAR
 If (RC) ≠ 0
 Then
 (Smem) x (Pmem addressed by PAR) + (src) → src
 (Smem) → T
 (Smem) → Smem + 1
 (PAR) + 1 → PAR
 Else
 (Smem) x (Pmem addressed by PAR) + (src) → src
 (Smem) → T
 (Smem) → Smem + 1
Status Bits:
 Affected by FRCT and OVM
 Affects OVsrc

MACP: Multiply by Program Memory and Accumulate
Syntax:
 MACP Smem, pmad, src
Operands:
 Smem: Single data-memory operand
 src: A (accumulator A) or B (accumulator B)
 0 ≤ pmad ≤ 65 535
Execution:
 pmad → PAR
 If (RC) ≠ 0
 Then
 (Smem) x (Pmem addressed by PAR) + (src) → src
 (Smem) → T
 (PAR) + 1 → PAR
 Else
 (Smem) x (Pmem addressed by PAR) + (src) → src
 (Smem) → T
Status Bits:
 Affected by FRCT and OVM
 Affects OVsrc
MACSU: Multiply Signed by Unsigned and Accumulate
Syntax:                              Execution:
MACSU Xmem, Ymem, src                unsigned(Xmem) x signed(Ymem) + (src) → src
                                     (Xmem) → T
Operands:
 Xmem, Ymem: Dual data-memory operands
 src: A (accumulator A) or B (accumulator B)
Status Bits:
 Affected by FRCT and OVM
 Affects OVsrc

MAS[R]: Multiply and Subtract With/Without Rounding
Syntax:                              Execution:
1: MAS[R] Smem, src                  (src) - (Smem) x (T) → src
2: MAS[R] Xmem, Ymem, src[, dst]     (src) - (Xmem) x (Ymem) → dst
                                     (Xmem) → T
Operands:
 Smem: Single data-memory operand
 Xmem, Ymem: Dual data-memory operands
 src, dst: A (accumulator A) or B (accumulator B)
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst (or OVsrc, if dst=src)

MASA[R]: Multiply by Accumulator A and Subtract With/Without Rounding
Syntax:                              Execution:
1: MASA Smem[, B]                    (B) - (Smem) x (A(32-16)) → B
                                     (Smem) → T
2: MASA[R] T, src[, dst]             (src) - (T) x (A(32-16)) → dst
Operands:
 Smem: Single data-memory operand
 src, dst: A (accumulator A) or B (accumulator B)
Status Bits:
 Affected by FRCT and OVM
 Affects OVdst (or OVsrc, if dst is not specified) and OVB in syntax 1


Repeat Instructions
 RPT: Repeat Next Instruction
 RPTB[D]: Block Repeat
 RPTZ: Repeat Next Instruction and Clear Accumulator
 Programming Examples

 Basic assembler directives


Example 1: Write a program to find the sum of a series of signed numbers stored at successive locations in the data memory and place the result in the accumulator

Solution:
 AR1 is used as the pointer to the numbers
 AR2 is used as the counter for the numbers
 The accumulator value is set to zero
 Sign extension mode is selected
 Each number is added into the accumulator
 The pointer is incremented and the counter is decremented
 The addition is repeated until the count in AR2 reaches zero
 The accumulator then contains the sum of the numbers

This program computes the signed sum of the data memory locations from address 410h to 41Fh
The result is placed in A
A = dmad(410h) + dmad(411h) + ... + dmad(41Fh)

        .mmregs
        .global _c_int00
        .text
_c_int00:
        STM   #0Fh, AR2      ; Initialize counter AR2 = 0Fh (the loop runs AR2 + 1 = 16 times)
        STM   #410h, AR1     ; Initialize pointer AR1 = 410h
        LD    #0h, A         ; Initialize sum A = 0
        SSBX  SXM            ; Select sign extension mode
START:
        ADD   *AR1+, A       ; Add the next data value
        BANZ  START, *AR2-   ; Repeat if not done
        NOP                  ; No operation
        .end

Example 2: Program to compute multiply and accumulate using the direct addressing mode: y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2)
Solution:
 h(0)x(n), h(1)x(n-1), and h(2)x(n-2) are computed using the MPY instruction
 (T) x (dmad) → accumulator A or B
 The accumulator contains the output value
 Acc(15-0) → dmad
 Acc(31-16) → dmad+1

        .global _c_int00
x       .usect "Input Samples", 3
y       .usect "output", 2
h       .usect "coefficients", 3
        .text
_c_int00:
        SSBX  SXM            ; Select sign extension mode
        LD    #h, DP         ; Select the data page for coefficients
        LD    @h, T          ; Get the coefficient h(0)
        LD    #x, DP         ; Select the data page for input samples
        MPY   @x, A          ; A = x(n) * h(0)
        LD    #h, DP         ; Select the data page for coefficients
        LD    @h+1, T        ; Get the coefficient h(1)
        LD    #x, DP         ; Select the data page for input samples
        MPY   @x+1, B        ; B = x(n-1) * h(1)
        ADD   A, B           ; B = x(n)*h(0) + x(n-1)*h(1)
        LD    #h, DP         ; Select the data page for coefficients
        LD    @h+2, T        ; Get the coefficient h(2)
        LD    #x, DP         ; Select the data page for input samples
        MPY   @x+2, A        ; A = x(n-2) * h(2)
        ADD   A, B           ; B = x(n)*h(0) + x(n-1)*h(1) + x(n-2)*h(2)
        LD    #y, DP         ; Select the data page for outputs
        STL   B, @y          ; Save low part of output
        STH   B, @y+1        ; Save high part of output
        NOP                  ; No operation
        .end
Example 3: Program to compute multiply and accumulate using the indirect addressing mode

        .global _c_int00
        .data
h       .int  10, 20, 30
        .text
_c_int00:
        SSBX  SXM                 ; Select sign extension mode
        STM   #310H, AR2          ; Initialize pointer AR2 for x(n) stored at 310H
        STM   #h, AR3             ; Initialize pointer AR3 for coefficients
        MPY   *AR2+, *AR3+, A     ; A = x(n) * h(0)
        MPY   *AR2+, *AR3+, B     ; B = x(n-1) * h(1)
        ADD   A, B                ; B = x(n)*h(0) + x(n-1)*h(1)
        MPY   *AR2+, *AR3+, A     ; A = x(n-2) * h(2)
        ADD   A, B                ; B = x(n)*h(0) + x(n-1)*h(1) + x(n-2)*h(2)
        STL   B, *AR2+            ; Save low part of result
        STH   B, *AR2+            ; Save high part of result
        NOP                       ; No operation
        .end

Example 4: Program to compute multiply and accumulate using the MAC instruction

        .global _c_int00
        .data
        .bss  x, 3
        .bss  y, 2
h       .int  10, 20, 30
        .text
_c_int00:
        SSBX  SXM                 ; Select sign extension mode
        STM   #x, AR2             ; Initialize AR2 to point to x(n)
        STM   #h, AR3             ; Initialize AR3 to point to h(0)
        LD    #0H, A              ; Initialize result in A = 0
        RPT   #2                  ; Repeat the next operation 3 times
        MAC   *AR2+, *AR3+, A     ; y(n) computed
        STM   #y, AR2             ; Point AR2 to y(n)
        STL   A, *AR2+            ; Save the low part of y(n)
        STH   A, *AR2+            ; Save the high part of y(n)
        NOP                       ; No operation
        .end
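The result expected from Example 4 can be cross-checked with a short Python model of the repeated multiply-accumulate. The coefficients 10, 20, 30 come from the listing; the input samples are hypothetical values chosen for illustration, and `fir_output` is a hypothetical name. FRCT is assumed to be 0 and overflow handling is ignored.

```python
MASK40 = (1 << 40) - 1


def to_signed16(word):
    """Interpret a 16-bit word as a 2's complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def fir_output(h, x):
    """Accumulate h(k) * x(n-k) into a 40-bit accumulator, as repeated MAC does."""
    acc = 0
    for hk, xk in zip(h, x):
        acc = (acc + to_signed16(hk) * to_signed16(xk)) & MASK40
    low = acc & 0xFFFF            # word saved by STL
    high = (acc >> 16) & 0xFFFF   # word saved by STH
    return acc, low, high


h = [10, 20, 30]                  # coefficients from the listing
x = [30000, 30000, 30000]         # hypothetical input samples x(n), x(n-1), x(n-2)
acc, low, high = fir_output(h, x)
print(acc, hex(low), hex(high))
```

For these samples the sum is 1 800 000, which does not fit in 16 bits. That is why the listing stores both the low and the high word of the accumulator.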
On-chip peripherals
 They facilitate interfacing with external devices
 The peripherals are:
 General purpose I/O pins
 A software programmable wait state generator
 Hardware timer
 Host port interface (HPI)
 Clock generator
 Serial port

1. Two general purpose I/O pins:
 BIO → input pin, used to monitor the status of external devices
 XF → output pin, software controlled, used to signal external devices
2. Software programmable wait state generator:
 Extends external bus cycles by up to seven machine cycles

3. Hardware timer:
 An on-chip down counter
 Used to generate a signal that initiates an interrupt or any other process
 Consists of three memory-mapped registers:
i. Timer register (TIM)
ii. Timer period register (PRD)
iii. Timer control register (TCR)
 Prescaler block (PSC)
 TDDR (timer divide-down ratio)
 TINT & TOUT
 The timer register (TIM) is a 16-bit memory-mapped register that decrements at every pulse from the prescaler block (PSC)
 The timer period register (PRD) is a 16-bit memory-mapped register whose contents are loaded into the TIM whenever the TIM decrements to zero or the device is reset (SRESET)
 The timer can also be independently reset using the TRB signal
 The timer control register (TCR) is a 16-bit memory-mapped register that contains status and control bits


 Table shows the functions of the various bits in the TCR

 The prescaler block is also an on-chip counter
 Whenever the prescaler bits count down to 0, a clock pulse is given to the TIM register, which decrements it by 1
 The TDDR bits contain the divide-down ratio, which is loaded into the prescaler block each time the prescaler bits count down to 0
 That is, the 4-bit value of TDDR determines the divide-by ratio of the timer clock with respect to the system clock
 In other words, the TIM decrements either at the rate of the system clock or at a slower rate, as decided by the value of the TDDR bits
 TOUT and TINT are the output signals generated as the TIM register decrements to 0
 TOUT can trigger the start-of-conversion signal in an ADC interfaced to the DSP
 The sampling frequency of the ADC determines how frequently it receives the TOUT signal
 TINT is used to generate interrupts, which are required to service a peripheral such as a DRAM controller periodically
 The timer can also be stopped, restarted, reset, or disabled by specific status bits
Logical block diagram of timer circuit
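The interaction of the prescaler and the TIM register described above can be sketched as a small counter model. With both counters reloading on reaching zero, the timer interrupt rate works out to the clock rate divided by (TDDR + 1) × (PRD + 1). The Python sketch below is a simplified model written for illustration: the names `simulate_timer` and `tint_rate` and the 100 MHz clock are assumptions, and the exact cycle in which the hardware asserts TINT may differ from this model.

```python
def simulate_timer(prd, tddr, cycles):
    """Model the PSC and TIM down counters; return the cycles at which TINT fires."""
    psc, tim = tddr, prd                  # counters start from their reload values
    tint_cycles = []
    for cycle in range(1, cycles + 1):
        if psc == 0:
            psc = tddr                    # prescaler reloads from TDDR
            if tim == 0:
                tim = prd                 # TIM reloads from PRD
                tint_cycles.append(cycle) # timer interrupt is generated
            else:
                tim -= 1                  # one pulse from the prescaler
        else:
            psc -= 1
    return tint_cycles


def tint_rate(clock_hz, prd, tddr):
    """Timer interrupt rate for a given clock, PRD and TDDR."""
    return clock_hz / ((tddr + 1) * (prd + 1))


print(simulate_timer(prd=2, tddr=1, cycles=12))
print(tint_rate(100e6, prd=9999, tddr=0))
```

In this model, PRD = 2 and TDDR = 1 give one interrupt every (1 + 1) × (2 + 1) = 6 clock cycles.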
4. Host port interface (HPI):
 Allows the DSP to interface to 8-bit or 16-bit host devices or a host processor
 Signals in the HPI are:
 Host interrupt (HINT)
 HRDY
 HCNTL0 & HCNTL1
 HBIL
 HR/W
A generic diagram of the host port interface (HPI)


Important signals in the HPI are as follows:
 The 16-bit data bus and the 18-bit address bus
 The host interrupt, HINT, for the DSP to signal the host when its attention is required
 HRDY, a DSP output indicating that the DSP is ready for transfer
 HCNTL0 and HCNTL1, control signals that indicate the type of transfer to carry out
 The transfer types are data, address, etc.
 HBIL, which indicates that the current byte is the first byte if it is low and the second byte if it is high
 HR/W, which indicates whether the host is carrying out a read operation or a write operation
5. Clock generator:
 The clock generator on TMS320C54xx devices has two options: an external clock and the internal clock
 In the case of the external clock option, a clock source is directly connected to the device
 The internal clock source option, on the other hand, uses an internal clock generator and a phase locked loop (PLL) circuit
 The PLL, in turn, can be hardware configured or software programmed
 Not all devices of the TMS320C54xx family have all these clock options; they vary from device to device


6. Serial I/O ports:
 Three types of serial ports are available:
i. Synchronous ports
ii. Buffered ports
iii. Time-division multiplexed ports
 The synchronous serial ports are high-speed, full-duplex ports that provide direct communication with serial devices such as codecs and analog-to-digital (A/D) converters
 A buffered serial port (BSP) is a synchronous serial port that is provided with an autobuffering unit and is clocked at the full clock rate
 A time-division multiplexed (TDM) serial port is a synchronous serial port that is provided to allow time-division multiplexing of the data
 The functioning of each of these on-chip peripherals is controlled by memory-mapped registers assigned to the respective peripheral


Interrupts of TMS320C54xx Processors
 Often, while the CPU is in the midst of executing a program, a peripheral device may require service from the CPU
 In such a situation, the main program may be interrupted by a signal generated by the peripheral device
 This results in the processor suspending the main program in order to execute another program, called the interrupt service routine, to service the peripheral device
 On completion of the interrupt service routine, the processor returns to the main program and continues from where it left off
 An interrupt may be generated by either an internal or an external device
 It may also be generated by software
 Not all interrupts are serviced when they occur
 Only those interrupts that are called nonmaskable are serviced whenever they occur
 Other interrupts, which are called maskable interrupts, are serviced only if they are enabled
 There is also a priority that determines which interrupt gets serviced first if more than one interrupt occurs simultaneously
 Almost all the devices of the TMS320C54xx family have 32 interrupts
 However, the types and the number under each type vary from device to device
 Some of these interrupts are reserved for use by the CPU

Pipeline operation of TMS320C54xx Processors


 The CPU of ‘54xx devices has a six-level-deep instruction pipeline
 The six stages of the pipeline are independent of each other
 This allows overlapping execution of instructions
 During any given cycle, up to six different instructions can be active, each at a different stage of processing
 The six levels of the pipeline structure are program prefetch, program fetch, decode, access, read and execute


1. During program prefetch, the program address bus, PAB, is loaded with the address of the next instruction to be fetched
2. In the fetch phase, an instruction word is fetched from the program bus, PB, and loaded into the instruction register, IR
 These two phases form the instruction fetch sequence
3. During the decode stage, the contents of the instruction register, IR, are decoded to determine the type of memory access operation and the control signals required for the data-address generation unit and the CPU
4. The access phase outputs the read operand's address on the data address bus, DAB
 If a second operand is required, the other data address bus, CAB, is also loaded with an appropriate address
 Auxiliary registers in indirect addressing mode and the stack pointer (SP) are also updated
5. In the read phase, the data operand(s), if any, are read from the data buses, DB and CB
 This phase completes the two-phase read process and starts the two-phase write process
 The data address of the write operand, if any, is loaded into the data write address bus, EAB
6. The execute phase writes the data using the data write bus, EB, and completes the operand write sequence
 The instruction is executed in this phase
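The overlap of the six stages can be tabulated with a short script. The Python sketch below assumes an ideal pipeline: every instruction is a single word, spends exactly one cycle in each stage, and never stalls. The function name `pipeline_table` and the instruction labels I1 to I3 are hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_table(instructions):
    """Map each cycle (1-based) to {stage: instruction} for an ideal pipeline."""
    total_cycles = len(instructions) + len(STAGES) - 1
    table = {}
    for cycle in range(1, total_cycles + 1):
        row = {}
        for s, stage in enumerate(STAGES):
            i = cycle - 1 - s             # index of the instruction in this stage
            if 0 <= i < len(instructions):
                row[stage] = instructions[i]
        table[cycle] = row
    return table


table = pipeline_table(["I1", "I2", "I3"])
for cycle, row in table.items():
    print(cycle, row)
```

Under these assumptions, three instructions complete in 3 + 6 - 1 = 8 cycles, and the first result appears in cycle 6, when I1 reaches the execute stage.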

Pipe flow diagram
Example 1: Show the pipeline operation of the following sequence of instructions if the initial value of AR3 is 80 and the values stored in memory locations 80, 81 and 82 are 1, 2 and 3
LD *AR3+, A
ADD #1000h, A
STL A, *AR3+
Pipeline operation for Example 1

Example 2: Show the pipeline operation of the following sequence of instructions if the initial values of AR1, AR3 and A are 84, 81 and 1, and the values stored in memory locations 81, 82, 83 and 84 are 2, 3, 4 and 6. Also provide the values of registers AR3, AR1, T and accumulator A after completion of each cycle
ADD *AR3+, A
LD *AR1+, T
MPY *AR3+, B
ADD B, A
Pipeline operation for Example 2
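As a check on the register values in Example 2, the four instructions can be stepped through one at a time on a host. The Python sketch below is not a pipeline model: it records the state after each instruction completes, whereas in the pipeline the auxiliary registers are updated in the access phase, before the accumulator changes in the execute phase. The initial values of B and T (taken as 0), the treatment of addresses as plain integers, and the helper name `read_postinc` are assumptions.

```python
mem = {81: 2, 82: 3, 83: 4, 84: 6}
reg = {"AR1": 84, "AR3": 81, "A": 1, "B": 0, "T": 0}   # B and T assumed 0


def read_postinc(ar):
    """Read memory at the address held in ar, then increment ar (*ARx+)."""
    value = mem[reg[ar]]
    reg[ar] += 1
    return value


trace = []
reg["A"] += read_postinc("AR3")             # ADD *AR3+, A
trace.append(dict(reg))
reg["T"] = read_postinc("AR1")              # LD  *AR1+, T
trace.append(dict(reg))
reg["B"] = reg["T"] * read_postinc("AR3")   # MPY *AR3+, B
trace.append(dict(reg))
reg["A"] += reg["B"]                        # ADD B, A
trace.append(dict(reg))

for step, state in enumerate(trace, start=1):
    print(step, state)
```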


Some assembler directives:
Assembler directive   Description
.mmregs               Permits the memory-mapped registers to be referred to by names such as AR0, SP, etc.
.include "XX"         Informs the assembler to insert the list of instructions in the file XX at this place and assemble it
.end                  Marks the end of the assembly language program
.data                 Assemble into the data memory area
.text                 Assemble into the program memory area
.equ                  Equate a symbol to a constant
.word x, y, ..., z    Reserves 16-bit locations and initializes them with the values x, y, ..., z; may be used in both the text and data sections
.space n              Reserves and initializes n bits of memory; when a label is used with this directive, it is assigned the address of the first word of the block reserved
.bes n                Reserves and initializes n bits of memory; when a label is used with this directive, it is assigned the address of the last word of the block reserved
