Unit-5 DSP Processor
7.1 Introduction
DSP processors were introduced in the early 1980s to handle information
available in digital form. Since then they have progressed rapidly and can now
handle very complex problems. DSP processors fall into two broad categories:
general purpose and special purpose processors. The special purpose
programmable digital signal processors (P-DSPs) are designed with features that
are specifically required for digital signal processing applications. These
features include a multiplier and multiplier accumulator (MAC), modified bus
structures, multiple access memory, multi-port memory and pipelining. A
conventional general purpose microprocessor does not have these features. An
advanced microprocessor or a RISC processor may use some of the techniques of
P-DSPs, may even have instructions that are specifically required for DSP
applications, or may have performance close to that of P-DSPs for certain
operations. However, in terms of low power requirement, cost, real time
input/output capability and availability of on-chip memories, the P-DSPs have
an advantage over the advanced microprocessors. Some of the P-DSPs include
fixed point devices such as Texas Instruments TMS320C5x, TMS320C54x and Motorola
DSP563x and floating point processors such as Texas Instruments TMS320C4x and
TMS320C67xx. Some of the features specifically required for performing digital
signal processing operations are described ahead.
Digital Signal Processor 7.3
During the first clock period the instruction code is fed from the program
memory to the control unit. In the second clock period one of the operands is
fed to the processing unit from the program memory. In the third clock period
the second operand is fed to the processing unit from the data memory. In the
fourth clock period, the content of the data memory location with address dma
is written to the location with address dma + 1. The execution of the MACD
instruction in a machine with "Von-Neumann Architecture" is shown in Figure 7.2.
It requires four clock cycles because the machine has a single address bus and
a single data bus for accessing the program as well as the data memory area.

Figure 7.2 Von-Neumann architecture.

In a computer with Von-Neumann architecture, the CPU can be either reading
an instruction or reading/writing data from/to the memory. Both cannot occur at
the same time since the instruction and data use the same signal pathways and
memory. The Von-Neumann architecture consists of three buses, namely the data
bus, the address bus and the control bus.
• Data bus: Transfers the data between the CPU and its peripherals. It is
bidirectional. The CPU can read or write data in the peripheral.
• Address bus: The CPU uses the address bus to indicate which peripheral it
wants to access and, within each peripheral, which specific register. It is
unidirectional. The CPU always writes the address, which is read by the
peripherals.
• Control bus: It carries signals that are used to manage and synchronize the
exchange between the CPU and its peripherals, and that indicate whether the
CPU wants to read or write the peripheral.

One of the ways by which the number of clock cycles required for the memory
access can be reduced is to use more than one bus for both address and data.
That is implemented in "Harvard Architecture".

... from the data memory. The processing unit consists of the registers and
processing elements such as MAC units, multiplier, ALU, shifter, etc. With the
Harvard architecture the number of memory accesses/clock cycle is two. This can
be increased further by using more buses. Several P-DSPs follow the modified
Harvard architecture which is shown in Figure 7.4. One set of buses is used to
access a memory that has both program and data while another has data alone.
Data can also be transferred from one memory to another.

Figure 7.3 Harvard architecture.

Figure 7.4 Modified Harvard architecture.
This type of architecture is used in P-DSPs from Texas Instruments and Analog
Devices.
DARAM - Dual access RAM, permits two memory accesses/clock period.
If DARAM is connected to a processing unit of the P-DSP (by using Har-
vard architecture) with two independent data and address buses, four memory
accesses/clock period can be achieved.
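The access counts quoted above can be turned into a rough cycle estimate. The following Python sketch is only an illustration: it assumes that the four memory accesses of the MACD example can be overlapped perfectly whenever the buses allow it, and the function name `cycles_needed` is hypothetical.

```python
import math

def cycles_needed(accesses: int, accesses_per_cycle: int) -> int:
    """Clock cycles needed when at most `accesses_per_cycle` memory
    accesses can happen in one clock period (idealised overlap)."""
    return math.ceil(accesses / accesses_per_cycle)

# MACD as described in the text: instruction fetch, operand from program
# memory, operand from data memory, write of dma to dma + 1.
MACD_ACCESSES = 4

for name, per_cycle in [("Von Neumann", 1),
                        ("Harvard", 2),
                        ("Harvard with DARAM", 4)]:
    print(f"{name}: {cycles_needed(MACD_ACCESSES, per_cycle)} cycle(s)")
```

Under these assumptions the estimate drops from four cycles to two and then to one, which is the motivation the text gives for adding buses and dual access memory.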
Another architecture used for P-DSPs is the Very Long Instruction Word (VLIW)
architecture. In this architecture, P-DSPs have a number of processing units
such as ALUs, MAC units, shifters, etc. The VLIW is accessed from memory and is
used to specify the operands and operations to be performed by each of the data
paths. The block diagram of VLIW architecture is shown in Figure 7.6.
• Multiple functional units share a common multi-ported register file for
fetching the operands and storing the results.

[Figure 7.6: VLIW architecture, showing the control unit, functional units 1
to n and the multi-ported register file]

Disadvantages
1. It may not always be possible to have independent streams of data for
processing.
2. The number of functional units is also limited by the hardware cost for the
multi-ported register file and cross bar switch.
3. High power consumption and high program memory bandwidth.
4. Misleading MIPS ratings.
One of the approaches adopted for increasing the efficiency of the advanced
microprocessors as well as P-DSPs is instruction pipelining. An instruction
cycle, starting with the fetching of an instruction and ending with the
execution of the instruction including the storage of the results, can be split
into a number of microinstructions. Execution of each of the microinstructions
is also referred to as a phase. For example, an instruction cycle requiring
four microinstructions can be split into four phases as follows:

1. Fetch phase, in which the instruction is fetched from the program memory.
2. Decode phase, in which the instruction is decoded in the instruction
register.
3. Memory read phase, in which the operand required for the execution of the
instruction is read from memory.
4. Execute phase, in which the instruction is executed.

In a processor with pipelining, the functional units can be kept busy almost
all the time by processing a number of instructions simultaneously in the CPU.
Consider a processor with four functional units. Four instructions I1, I2, I3
and I4 can be processed simultaneously as shown in Figure 7.8. When I1 enters
the decode phase, I2 can enter the opcode fetch phase. When I1 enters the
operand read phase, I2 enters the decode phase and I3 enters the opcode fetch
phase. When I1 enters the execute phase, I2 enters the operand read phase, I3
enters the decode phase and I4 enters the opcode fetch phase. The pipeline is
fully loaded now and all the functional units have useful work to do.

[Figure 7.8: table with columns Value of T, Fetch Phase, Decode Phase, Read
Phase, Execute Phase]
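The overlap described for I1 to I4 can be tabulated with a few lines of Python. This is a minimal sketch of an ideal four-phase pipeline with no stalls; the function name `pipeline_table` and the phase labels are illustrative choices, not taken from any processor manual.

```python
PHASES = ["fetch", "decode", "read", "execute"]

def pipeline_table(instructions):
    """Return one row per clock period T; each row maps a phase name to the
    instruction occupying it (None if the phase is idle)."""
    n_cycles = len(instructions) + len(PHASES) - 1
    table = []
    for t in range(n_cycles):
        row = {}
        for p, phase in enumerate(PHASES):
            i = t - p  # instruction index that reached phase p at time t
            row[phase] = instructions[i] if 0 <= i < len(instructions) else None
        table.append(row)
    return table

for t, row in enumerate(pipeline_table(["I1", "I2", "I3", "I4"]), start=1):
    print(f"T={t}:", row)
```

At T = 4 every phase holds a different instruction, which corresponds to the fully loaded pipeline described in the text.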
7.6 Architecture of TMS320C5x

TMS320C5x is a 16 bit fixed point processor. The DSP chips have IC numbers
with the prefix TMS320 (Texas Instruments). If the next letter is C (e.g.,
TMS320C5x), it indicates that CMOS technology is used for the IC and the
on-chip non-volatile memory is a ROM. If the next letter is E (e.g.,
TMS320E5x), it indicates that the technology used is CMOS and the on-chip
non-volatile memory is an EPROM. If it is neither of these (e.g., TMS3205x), it
indicates that NMOS technology is used for the IC and the on-chip non-volatile
memory is a ROM. Under C5x itself there are three processors, 'C50, 'C51 and
'C54, that have an identical instruction set but differ in the capacity of
on-chip ROM and RAM. The instruction set of TMS320C5x and other DSP chips is
superior to the instruction set of conventional microprocessors such as 8085,
Z80, etc., as most of the instructions require only a single cycle for
execution. The block diagram of the internal architecture of TMS320C5x is shown
in Figure 7.9. It has an advanced Harvard architecture because it has separate
memory buses for program and data and has instructions that enable data
transfer between the program and data memory areas.

Figure 7.9 Internal architecture of TMS320C5x.

The program and data buses can work together to transfer data from on-chip data
memory and internal or external program memory to the multiplier for
single-cycle multiply/accumulate operations.

7.6.2 Central Arithmetic Logic Unit (CALU)
It consists of the following elements:
(i) (16 x 16) bit parallel multiplier: It performs 16 x 16 multiplication of
numbers represented in 2's complement form. The 16 bit temporary register 0
(TREG0) holds the multiplicand. The other operand for the multiplication can
be specified using one of the addressing modes. The 32 bit PREG (product
register) holds the result of the multiplication.
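The 16 x 16 two's complement multiplication into a 32 bit product register can be mimicked in Python. This is only a behavioural sketch: the names `to_signed16` and `multiply_16x16` are hypothetical, and the product shift modes (PM bits) are ignored.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16 bit pattern as a 2's complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def multiply_16x16(treg0: int, operand: int) -> int:
    """Return the 32 bit pattern that would be held in PREG after a
    signed 16 x 16 multiplication (no product shift applied)."""
    product = to_signed16(treg0) * to_signed16(operand)
    return product & 0xFFFFFFFF

print(hex(multiply_16x16(0x1234, 0x0002)))
print(hex(multiply_16x16(0xFFFF, 0x0002)))  # -1 * 2
```

Masking the result to 32 bits is safe here because the product of two 16 bit signed numbers always fits in 32 bits.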
7.6.3 Auxiliary Register ALU (ARAU)
It consists of the following elements:
(i) Eight 16 bit auxiliary registers (ARs) AR0-AR7.
(ii) 3 bit auxiliary register pointer (ARP).
(iii) Unsigned 16 bit ALU.

The ARAU calculates indirect addresses by using inputs coming from the ARs, the
16 bit index register (INDX) and the auxiliary register compare register
(ARCR). The ARAU can auto index the current AR while the data memory location
is being addressed and can index either by ±1 or by the contents of the INDX.
As a result, accessing data does not require the CALU for address manipulation.
Therefore, the CALU is free for other operations in parallel. This makes the
instructions execute faster compared to the conventional microprocessor. For
example:

In the 8085: MOV A, M; INX H

These instructions load the accumulator using the indirect addressing mode and
increment the HL register that is used as the address pointer. These two
instructions can be replaced by a single instruction in C5x:

LACC *+, 0 - any one of the auxiliary registers can be used as the address
pointer and incremented. The register that is used is specified by the content
of the ARP. Some of the other registers of the ARAU and their functions are as
follows.

INDEX Register (INDX)
The 16 bit INDX is used by the ARAU as a step value (addition or subtraction
by more than 1) to modify the address in the ARs during indirect addressing.
For example, when the ARAU steps across a row of a matrix, the indirect address
is incremented by 1. However, when the ARAU steps down a column, the address is
incremented by the dimension of the matrix.

All these registers are 16 bit wide.
1. Repeat counter register (RPTC) holds the repeat count in a repeat single
instruction operation and is loaded by the RPT and RPTZ instructions.
2. Block repeat counter register (BRCR) holds the count value for the block
repeat feature. This value is loaded before a block repeat operation is
initiated.
3. Block repeat program address start register (PASR) indicates the 16 bit
address where the repeated block of code starts.
4. Block repeat program address end register (PAER) indicates the 16 bit
address where the repeated block of code ends. The PASR and PAER are loaded
by the RPTB instruction.

7.6.4 Parallel Logic Unit (PLU)
It performs the boolean operations or the bit manipulation required of a high
speed controller. The PLU can set, clear, test or toggle the bits in a status
register, control register or any data memory location. The PLU allows logic
operations to be performed on data memory values directly without affecting the
contents of the ACC or PREG. Results of a PLU function are written back to the
original data memory location.

7.6.5 Memory Mapped Registers
The 'C5x has 96 memory mapped registers mapped into page 0 of the data memory
space. All 'C5x DSPs have 28 CPU registers and 16 input/output (I/O) port
registers but have different numbers of peripheral and reserved registers.
Since the memory mapped registers are a component of the data memory space,
they can be written to and read from in the same way as any other data memory
location. The memory mapped registers are used for indirect data address
pointers, temporary storage, CPU status and control, or integer arithmetic
processing through the ARAU.
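The row and column stepping described for the INDX register can be illustrated with a short Python sketch. It assumes a matrix stored row by row in data memory; the base address 0300h, the 3 x 4 matrix size and the helper name `step_addresses` are all hypothetical.

```python
def step_addresses(start: int, step: int, count: int):
    """Addresses produced when an auxiliary register starts at `start`
    and is post-modified by `step` after each access (16 bit wrap)."""
    addresses = []
    ar = start
    for _ in range(count):
        addresses.append(ar)
        ar = (ar + step) & 0xFFFF
    return addresses

BASE, ROWS, COLS = 0x0300, 3, 4   # hypothetical 3 x 4 matrix, row-major

# Across row 1: the address is incremented by 1.
row1 = step_addresses(BASE + 1 * COLS, 1, COLS)
# Down column 2: the address is incremented by INDX = row length.
col2 = step_addresses(BASE + 2, COLS, ROWS)

print([hex(a) for a in row1])
print([hex(a) for a in col2])
```

Here the step down a column equals the number of elements per row, which is the "dimension of the matrix" the text refers to for this storage layout.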
7.6.6 Program Controller
The program controller contains logic circuits that perform the following
operations:
1. Decodes the instructions.
2. Manages the CPU pipeline.
3. Stores the status of CPU operation.
4. Decodes the conditional operations.

Parallelism of the architecture lets the processor perform three concurrent
memory operations in any given machine cycle: fetching an instruction, reading
an operand and writing an operand. It consists of the following elements:
1. 16 bit Program Counter (PC).

Data Memory Page Pointer Bits (DP): These bits specify the address of the
current data memory page.
Auxiliary Register Buffer (ARB): It holds the previous value contained in the
ARP in ST0. Whenever the ARP is loaded, the previous ARP value is copied to
the ARB, except when using the LST #0 instruction. When the ARB is loaded
using the LST #1 instruction, the same value is also copied to the ARP.
On-chip RAM Configuration Control Bit (CNF): It enables the on-chip dual
access RAM block 0 (DARAM B0) to be addressable in data memory space
or program memory space.
(i) If CNF = 0, the on-chip DARAM B0 is mapped into data memory space.
The CNF bit can be cleared by the CLRC CNF instruction.
(ii) If CNF = 1, the on-chip DARAM B0 is mapped into program memory
space. The CNF bit can be set by the SETC CNF instruction.
• SAMM - Store accumulator in memory-mapped register.
• SMMR - Store memory-mapped register.

[Figure: before and after execution, with ARP, AR1 = 325h, data memory
location 25h = 2345h and ACC]

After execution of the LAMM * instruction, the value in data memory location
25h is loaded into the ACC. 25h corresponds to the lower-order 7 bits of AR1
and the higher bits of PAB are made to be 0 as the MMR corresponds to page 0.

7.7.5 Dedicated-Register Addressing
The dedicated-register addressing mode operates like the long immediate
addressing mode except that the address comes from one of two special-purpose
memory mapped registers in the CPU:
(i) Block Move Address Register (BMAR).
(ii) Dynamic Bit Manipulation Register (DBMR).
The advantage of this addressing mode is that the address of the block of
memory to be acted upon can be changed during execution of the program.

The 8 bit CBCR enables and disables the circular buffer operation. First, the
start and end addresses are loaded into the corresponding buffer registers.
Next, a value between the start and end registers for the circular buffer is
loaded into an AR. The corresponding circular buffer enable bit in the CBCR
should be set.

7.8 Instruction Sets
7.8.1 Addition/Subtraction Instructions
In the addition/subtraction instructions of 'C5x, one of the operands is the
ACC. The other operand can be PREG, ACCB or the content of memory fetched using
one of the addressing modes.

7.8.1.1 Addition Instruction
1. ADD dma, [shift] (direct addressing): The contents of the data memory
address (dma) or a 16 bit constant are shifted left as defined by the shift
code and added to the contents of the accumulator, and the result is stored in
the ACC. For example:
ADD 55h, 2: The content of the data memory location with dma 55h in the current
page is shifted left by two positions and added to the ACC.
Other forms: ADD {ind}, [shift], [ARn] (indirect addressing); short immediate:
ADD #k; long immediate: ADD #lk, [shift].
[Figure: before and after execution. ARP = 1 before execution; AR1 = 2100h and
data memory location 2100h = 4563h before and after execution; ACC = 1234h
before execution and 9CFAh after execution]

2. ADRK #K: The 8 bit immediate constant value is added to the current
auxiliary register (AR). The result is stored in the AR. For example:

ADRK #25h

[Figure: ARP and AR contents before and after execution]

[Figure: before and after execution of the SBB instruction]

... (dma) in the current page after shifting it left by the specified shift
position. For example:
(i) SUB 25h, 2: The content of the data memory location with dma 25h is shifted
left by two positions and subtracted from the ACC.
(ii) SUB *, 2: The content of the location pointed to by the current AR is
shifted left by two positions and subtracted from the ACC.
Other subtraction instructions: SBBB, SUBB, SUBC, SUBS, SUBT and SBRK.

7.8.2 Multiplication Instruction
In multiply instructions, one of the operands is taken from TREG0 and the other
operand is specified using one of the addressing modes.
1. MPY: Multiply numbers in 2's complement form.
Direct addressing: MPY dma - The contents of TREG0 are multiplied by the
contents of the data memory address and the result is stored in the PREG.
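The shifted addition and subtraction forms described above can be checked numerically with a small Python sketch. It models the ACC as a 32 bit value and, as a simplifying assumption, treats the memory operand as unsigned (no sign extension); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add_shifted(acc: int, operand: int, shift: int = 0) -> int:
    """ACC + (16 bit operand << shift), wrapped to 32 bits."""
    return (acc + ((operand & 0xFFFF) << shift)) & MASK32

def sub_shifted(acc: int, operand: int, shift: int = 0) -> int:
    """ACC - (16 bit operand << shift), wrapped to 32 bits."""
    return (acc - ((operand & 0xFFFF) << shift)) & MASK32

# Values from the before/after figure: memory 4563h, ACC 1234h.
print(hex(add_shifted(0x1234, 0x4563, 1)))
print(hex(sub_shifted(0x9CFA, 0x4563, 1)))
```

With these values, a left shift of one position reproduces the accumulator contents 9CFAh read from the figure, and the matching subtraction restores 1234h.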
PREG is shifted as specified by the PM bits and added to the ACC before the
PREG is loaded with the product. For example:

MACD 0FF0Fh, 15h

[Figure: before and after execution, showing data memory locations 315h and
316h, program memory location FF0Fh, TREG0 = 1234h and PREG = 4563h]

Other AND instruction is ANDB: The contents of the ACC are ANDed with the
contents of the ACCB. The result is stored in the ACC and the contents of the
ACCB are unaffected.

OR instruction
The ACC is ORed with a long constant K or with the content of a dma or pma, and
the result is stored in the ACC.
1. Direct addressing: OR dma.
2. Indirect addressing: OR {ind} [, ARn].
3. Long immediate: OR #lk, [shift].

For example: OR *, AR1

[Figure: before and after execution]
... ACCH.
SACB: Store the contents of the ACC in ACCB.
SACH: Store ACCH, with left shift, in data memory address.
SACL: Store ACCL, with left shift, in data memory address.
SAMM: Store ACCL in memory-mapped register.
LAR: Load data memory value to ARx (auxiliary register).
SAR: Store ARx in data memory location.
LDP: Load data memory value to DP (data page pointer) bits.
MAR: Modify auxiliary register.
SPLK: Store long immediate in data memory location.
LPH: Load data memory value to the higher half of PREG.
LT: Load data memory value to TREG0.
PAC: Load PREG, with shift specified by PM bits, to ACC.
SPH: Store the higher half of PREG, with shift specified by PM bits, in data
memory location.
LST: Load data memory value to ST0 or to ST1.
SST: Store ST0 or ST1 in data memory location.

[Figure: before and after execution, showing ST0, ST1 and ARP]

(iv) LT 300h

[Figure: before and after execution, showing data memory location 300h = 60h
and TREG0]

(v) LDP #30h

[Figure: DP before and after execution]
(vi) LACC *, 4

[Figure: data memory and ACC before and after execution]

(vii) LACB

Before execution: ACC = 0000 1234h, ACCB = 1234 5678h
After execution: ACC = 1234 5678h, ACCB = 1234 5678h

For example:
(i) BLDP 00h

[Figure: before and after execution, showing program memory location 1850h]

(ii) BLPD 00h

[Figure: before and after execution, showing AR4 = 300h, data memory location
300h and program memory location 2850h]
7.8.9 Repeat Instructions
RPT: Repeat next instruction. The iteration count can be specified using direct
addressing, indirect addressing, and short as well as long immediate
addressing.
1. Direct addressing: RPT dma.
2. Indirect addressing: RPT *.
3. Short immediate addressing: RPT #7.
4. Long immediate addressing: RPT #2345h.
RPTB: Block of instructions repeated. The number of repeats is specified in the
block repeat count register (BRCR).
RPTZ: Repeat preceded by clearing both ACC and PREG.
For example:
RPTZ #K: Clears both ACC and PREG and then executes the next instruction K + 1
times.

7.8.10 IN and OUT Instruction
IN: Reads a 16 bit number from the input port and stores it in the data memory
location.
OUT: Reads a 16 bit number from data memory and writes it onto the output port.

'C54x consists of eight 16 bit buses, i.e., four program/data buses and four
address buses. The program bus (PB) carries the instruction code and immediate
operands from program memory.

Data buses (CB, DB and EB)
CB and DB carry the operands that are read from data memory. EB carries the
data to be written to memory. Address buses (PAB, CAB, DAB and EAB) carry the
addresses needed for instruction execution.

7.9.2 Internal Memory Organization
The '54x memory is organized into three individually selectable spaces:
program, data and input/output. It contains both RAM (DARAM and SARAM) and ROM.
On-chip ROM: The on-chip ROM is part of the program memory space (20K words)
and, in some cases, part of the data memory. Therefore the ROM may be mapped
into both data and program space (8K words).
On-chip dual access RAM (DARAM): It consists of several blocks and each block
can be accessed twice per machine cycle, so the CPU can read from and write to
a single block of DARAM in the same cycle. It is mapped into program space
(5K words).
Table 7.5 Comparison of the features of 5x and 54x.

Description | 5x | 54x
Name of program bus | One, PB | One, PB
Name of the data bus | One, DB | DB and CB (for read), EB (for write)
Name of address buses | PAB, DAB | PAB, CAB, DAB, EAB
Main ALU | 32 bit ALU | 40 bit ALU
Accumulators | 32 bit ACC | 40 bit ACCA and ACCB
Barrel shifter | 0-16 bit left shift, 0-16 bit right shift | 40 bit: 0-31 left shift, 0-16 right shift
Multiplier | 16 x 16 bit | 17 x 17 bit
Adder | 32 bit | 40 bit
Auxiliary Register ALU | ARAU | ARAU0 and ARAU1
Auxiliary registers | AR0-AR7 | AR0-AR7
Stack pointer (SP) | Not available | 16 bit SP
Circular buffer register | Two 16 bit start and end registers | 16 bit BK
Status registers | 16 bit PMST, ST0, ST1 | 16 bit PMST, ST0, ST1
Block repeat registers | 16 bit BRCR, PASR, PAER | 16 bit BRC, RSA, REA
Program counter | 16 bit PC | 16 bit PC
Extended program memory | Not available | 7 bit XPC
Interrupt registers | 16 bit IMR and IFR | 16 bit IMR and IFR
General purpose I/O | BIO and XF | Same as that of 5x
Wait state generator | PDWSR | SWWSR
Hardware timer | 16 bit timer | Same as that of 5x
Clock generator | PLL based | Same as that of 5x
Synchronous serial port | Full duplex and double buffered | Same as that of 5x
TDM serial ports | Up to 7 devices using TDM can communicate serially | Same as that of 5x
Buffered serial port | Standard 5x serial port with additional auto buffering unit | Same as that of 5x
Host port interface | 8 bit standard HPI | 8 bit standard HPI or enhanced 8 bit and 16 bit HPI
Multichannel buffered serial port including internal programmable clock and other advanced features | Not available | Available
On-chip ROM for look up table for A law, μ law companding, sine wave generation | Not available | Available

Figure 7.12 Block diagram of 54x internal hardware.
[Blocks shown: Program Address Generation Logic (PAGEN) with PC, IPTR, RC, BRC,
RSA, REA; Data Address Generation Logic (DAGEN) with ARAU0, ARAU1, AR0-AR7,
ARP, BK, DP, SP; Memory and External Interface; Peripheral Interface;
Multiplier (17 x 17)]
ARP (Auxiliary register pointer): It selects the auxiliary register to be used
in indirect addressing.
TC (Test/control flag): TC stores the results of the arithmetic logic unit
(ALU) test bit operations. TC is affected by the BIT, BITF, BITT, CMPM, CMPR,
CMPS and SFTC instructions. The status (set or cleared) of TC determines if
the conditional branch, call, execute and return instructions are executed.
C (carry): C = 1 if the result of an addition generates a carry. C = 0 if the
result of a subtraction generates a borrow.
OVA (overflow flag for ACCA): OVA = 1 when an overflow occurs in either the
ALU or the multiplier's adder in ACCA.
OVB (overflow flag for ACCB): OVB = 1 when an overflow occurs in either the
ALU or the multiplier's adder in ACCB.
DP (Data-memory page pointer): This 9 bit field is concatenated with the seven
LSBs of an instruction word to form a direct memory address of 16 bits for
single data memory operand addressing. This operation is done if the compiler
mode bit in ST1 is CPL = 0.
CMPT (Compatibility mode): It determines the compatibility mode for the ARP.
ASM (Accumulator shift mode): Specifies a shift value within the range -16
through 15, coded in 2's complement form.
Processor Mode Status Register (PMST): It is loaded with memory-mapped
register instructions such as STM.

Auxiliary Registers
'C54x also includes eight auxiliary registers and a software stack to enable a
highly-optimized C compiler. The eight 16 bit auxiliary registers (AR0-AR7) can
be accessed by the CPU and modified by the auxiliary register arithmetic units
(ARAUs). The primary function of the auxiliary registers is to generate 16 bit
addresses for data space.

Barrel Shifter
The 40 bit barrel shifter of 'C54x can perform arithmetic and logical shifts by
up to 31 bits left or by up to 16 bits right in a single instruction cycle.
Shifter inputs can come directly from data memory or from either of the two
accumulators. Shifter outputs can be sent to the ALU or stored in memory.
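The way the 9 bit DP field and the seven LSBs of the instruction word combine into a 16 bit direct address can be written out in Python. This is a minimal sketch of the concatenation only; the function name `direct_address` is hypothetical.

```python
def direct_address(dp: int, dma7: int) -> int:
    """Concatenate the 9 bit data page pointer with the 7 bit offset
    taken from the instruction word to form a 16 bit data address."""
    return ((dp & 0x1FF) << 7) | (dma7 & 0x7F)

# DP = 100h with offsets 00h and 01h, as set up by LDP #100H in the
# worked examples of this chapter.
print(hex(direct_address(0x100, 0x00)))
print(hex(direct_address(0x100, 0x01)))
```

Each data page therefore spans 128 words, and changing DP moves the same 7 bit offsets to a different page.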
The barrel shifter is also used for scaling operations such as:
• Pre-scaling an input data memory operand or the ACC value before an ALU
operation.
• Performing a logical or arithmetic shift of the ACC value.
• Normalizing the ACC.
• Post-scaling the ACC before storing the accumulator value into data memory.

Compare, Select and Store Unit (CSSU)

[Figure 7.15 content: old states 2J (met 1) and 2J + 1 (met 2) feed new states
J and J + STNB/2. If (met 1 + D1) > (met 2 + D2) then new met 1 = met 1 + D1,
else new met 1 = met 2 + D2]

Figure 7.15 Viterbi operator (STNB = no. of states, met = path metrics and
D = branch metrics).

The compare, select and store unit (CSSU) is an application specific hardware
unit dedicated to the add/compare/select (ACS) operations of the Viterbi
operator. Figure 7.15 shows the Viterbi operator. The CSSU is used with the ALU
to perform fast ACS operations. The CSSU allows the 'C54x to support various
Viterbi butterfly algorithms used in equalizers and channel decoders. The add
function of the Viterbi operator is performed by the ALU. This function
consists of a double addition function (Met1 + D1 and Met2 + D2). Double
addition is completed in one machine cycle if the ALU is configured for dual
16 bit mode by setting the C16 bit in ST1.
The CSSU implements the compare and select operation via the CMPS instruction,
a comparator and the 16 bit transition (TRN) register. This operation compares
two 16 bit parts of the specified accumulator and shifts the decision into bit
0 of TRN. This decision is also stored in the TC bit of ST0. Based on the
decision, the corresponding 16 bit part of the accumulator is stored in data
memory. The TRN register contains information on the path transition decisions
to new states. This information can be used for a back-tracking routine that
finds the optimal path, which results in decoding the code.

7.9.4 Pipeline
The 'C54x DSP has a six-level deep instruction pipeline. The six stages of the
pipeline are independent of each other, which allows overlapping execution of
instructions. During any given cycle, from one to six different instructions
can be active, each at a different stage of completion. The six levels of the
pipeline structure are:
• Program pre-fetch.
• Program fetch.
• Decode.
• Access.
• Read.
• Execute.

7.9.5 On-Chip Peripherals
The 'C54x has the following on-chip peripherals:
• General-purpose input/output pins.
• Software-programmable wait state generator.
• Programmable bank-switching logic.
• Host port interface.
• Hardware timer.
• Clock generator.
• Serial ports:
  • Synchronous serial ports.
  • Buffered serial ports.
  • Time Division Multiplexed (TDM) serial ports.

7.9.6 Data Addressing
The 'C54x offers seven basic data addressing modes:
• Immediate addressing: uses the instruction to encode a fixed value.
• Absolute addressing: uses the instruction to encode a fixed address.
• Accumulator addressing: uses accumulator A to access a location in program
memory as data.
• Direct addressing: uses 7 bits of the instruction to encode the lower 7 bits
of an address. The 7 bits are used with the data page pointer (DP) or the stack
pointer (SP) to determine the actual memory address.
• Indirect addressing: uses the ARs to access memory.
• Memory mapped register addressing: uses the memory-mapped registers without
modifying either the current DP value or the current SP value.
• Stack addressing: manages adding and removing items from the stack.
During the execution of instructions using direct, indirect or memory mapped
register addressing, the data-address generation logic (DAGEN) computes the
addresses of data-memory operands.
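The add/compare/select step of the Viterbi operator described for the CSSU can be sketched in Python. This is a behavioural illustration only, not a model of the CMPS instruction: the function name `acs` is hypothetical, and the convention that decision bit 0 means "first candidate selected" is an assumption made for this sketch.

```python
def acs(met1: int, d1: int, met2: int, d2: int, trn: int):
    """One add/compare/select step.

    Returns (new_metric, new_trn). The decision bit is shifted into
    bit 0 of the 16 bit transition register `trn`
    (assumed convention: 0 = met1 + d1 selected, 1 = met2 + d2 selected).
    """
    cand1 = met1 + d1            # add
    cand2 = met2 + d2
    if cand1 > cand2:            # compare
        new_metric, decision = cand1, 0
    else:
        new_metric, decision = cand2, 1
    new_trn = ((trn << 1) | decision) & 0xFFFF   # record the decision
    return new_metric, new_trn

metric, trn = acs(10, 3, 8, 4, trn=0)
print(metric, bin(trn))
metric, trn = acs(5, 1, 8, 4, trn=trn)
print(metric, bin(trn))
```

Keeping every decision bit in the transition register is what lets a later back-tracking routine reconstruct the selected path.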
Example 7.1

Solution
    X ==> X1 X0
    Y ==> Y1 Y0
  {+}     Z1 Z0
where X1 and Y1 are the MSBs (16 bit) of X and Y respectively; X0 and Y0 are
the LSBs (16 bit) of X and Y respectively.

Program:
    LDP   #100H       ; Load the data page pointer
    LACC  0001, 10H   ; Load the higher accumulator with X1; ACC = X1 00
    ADDS  0000        ; Add the ACC with X0; ACC = X1 X0
    ADDS  0002        ; Add the ACC with Y0; ACC = X1X0 + 00Y0
    ADD   0003, 10H   ; Add the ACC with Y1 with shift; ACC = X1X0 + Y1Y0
    SACL  0004        ; Store the lower accumulator in dma 8004; ACCL = Z0
    SACH  0005        ; Store the higher accumulator in dma 8005; ACCH = Z1
H:  B     H           ; Halt by branching to the same location

For example
Inputs               Outputs
8000 - 2439 (X0)     8004 - 355F (Z0)
8001 - 5523 (X1)     8005 - C6A8 (Z1)
8002 - 1126 (Y0)
8003 - 7185 (Y1)

Example 7.2

Solution In this example, two 64 bit numbers X and Y will be added as follows:

Program:
    LDP   #100H       ; Load the data page pointer
; First 32 bit addition
    LACC  0001, 10H   ; Load the higher accumulator with X1; ACC = X1 00
    ADDS  0000        ; Add the accumulator with X0; ACC = X1 X0
    ADDS  0004        ; Add the lower accumulator with Y0; ACC = X1X0 + 00Y0
    ADD   0005, 10H   ; Add the higher accumulator with Y1; ACC = X1X0 + Y1Y0
    SACL  0008        ; Store the lower accumulator (ACCL) in dma 8008; ACCL = Z0
    SACH  0009        ; Store the higher accumulator (ACCH) in dma 8009; ACCH = Z1
; Second 32 bit addition
    LACC  0003, 10H   ; Load the higher accumulator with X3; ACC = X3 00
    ADDC  0002        ; Add X2 and the CARRY of the first 32 bit addition;
                      ; ACC = X3X2 + CARRY
    ADDS  0006        ; ACC = X3X2 + 00Y2 + CARRY
    ADD   0007, 10H   ; ACC = X3X2 + Y3Y2 + CARRY
    SACL  0010        ; Store ACCL in dma 8010; ACCL = Z2
    SACH  0011        ; Store ACCH in dma 8011; ACCH = Z3
H:  B     H           ; Halt by branching to the same location
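The carry chain that the multiword addition programs rely on can be checked with a short Python sketch. It is an illustration under simplifying assumptions: it adds one 16 bit word at a time, whereas the assembly programs work on 32 bit halves in the accumulator, and the function name `add_words` is hypothetical. Words are listed least significant first.

```python
def add_words(x_words, y_words):
    """Add two multiword numbers given as lists of 16 bit words,
    least significant word first. Returns (sum_words, final_carry)."""
    carry = 0
    z_words = []
    for x, y in zip(x_words, y_words):
        total = x + y + carry
        z_words.append(total & 0xFFFF)   # keep the low 16 bits
        carry = total >> 16              # propagate the carry
    return z_words, carry

# 32 bit case with the input values listed for Example 7.1.
z32, c32 = add_words([0x2439, 0x5523], [0x1126, 0x7185])
print([f"{w:04X}" for w in z32], c32)

# 64 bit case with the input values listed for the 64 bit addition.
z64, c64 = add_words([0x1123, 0x8279, 0xA453, 0xCB21],
                     [0x5342, 0xD152, 0x8745, 0x2312])
print([f"{w:04X}" for w in z64], c64)
```

The second case shows why ADDC is needed: the lower 32 bit addition produces a carry that must be added into the upper half.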
7.42 Digital Signal Processing Digital Signal Processor· 7.43
For example

Inputs              Outputs
8000 - 1123 (X0)    8008 - 6465 (Z0)
8001 - 8279 (X1)    8009 - 53CB (Z1)
8002 - A453 (X2)    8010 - 2B99 (Z2)
8003 - CB21 (X3)    8011 - EE34 (Z3)
8004 - 5342 (Y0)
8005 - D152 (Y1)
8006 - 8745 (Y2)
8007 - 2312 (Y3)

Example 7.3
To write an assembly language program for 32 bit subtraction.

Solution In this example two 32 bit numbers X and Y will be subtracted as follows:

X ==> X1 X0
Y ==> Y1 Y0
{-}   Z1 Z0

Program:

        LDP   #100H       Load data page pointer
        LACC  0001,10H    Load the higher accumulator with X1
                          ACC = X1 00
        ADDS  0000        Add the accumulator with X0
                          ACC = X1X0
        SUBS  0002        Subtract Y0 from ACC
                          ACC = X1X0 - 00Y0
        SUB   0003,10H    Subtract Y1 from ACC with shift
                          ACC = X1X0 - Y1Y0
        SACL  0004        Store ACCL in dma 8004
                          ACCL = Z0
        SACH  0005        Store ACCH in dma 8005
                          ACCH = Z1
H:      B     H

For example

Inputs              Outputs
8000 - 7725 (X0)    8004 - 2311 (Z0)
8001 - 894A (X1)    8005 - 4422 (Z1)
8002 - 5414 (Y0)
8003 - 4528 (Y1)

Example 7.4
To write an assembly language program for 64 bit subtraction.

Solution In this example, two 64 bit numbers X and Y will be subtracted as follows:

X ==> X3 X2 X1 X0
Y ==> Y3 Y2 Y1 Y0
{-}   Z3 Z2 Z1 Z0

Program:

        LDP   #100H       Load the data page pointer
First 32 bit subtraction
        LACC  0001,10H    Load the higher accumulator with X1
                          ACC = X1 00
        ADDS  0000        Add the accumulator with X0
                          ACC = X1X0
        SUBS  0004        Subtract Y0 from the accumulator
                          ACC = X1X0 - 00Y0
        SUB   0005,10H    Subtract Y1 from accumulator with shift
                          ACC = X1X0 - Y1Y0
        SACL  0008        Store ACCL in dma 8008
                          ACCL = Z0
        SACH  0009        Store ACCH in dma 8009
                          ACCH = Z1
Second 32 bit subtraction
        LACC  0003,10H    Load the higher accumulator with X3
                          ACC = X3 00
        ADDS  0002        Add the accumulator with X2
                          ACC = X3X2
        SUBB  0006        Subtract Y2 and the CARRY (borrow of the first 32 bit subtraction) from ACC
                          ACC = X3X2 - 00Y2 - CARRY
        SUB   0007,10H    ACC = X3X2 - Y3Y2 - CARRY
        SACL  0010        Store ACCL in dma 8010
                          ACCL = Z2
        SACH  0011        Store ACCH in dma 8011
                          ACCH = Z3
H:      B     H
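The subtraction programs above follow the same word-by-word pattern as addition, with a borrow taking the place of the carry. The Python sketch below models only that arithmetic; the function name `sub_words` and the least-significant-word-first lists are assumptions for illustration, and the sample values are the inputs of Example 7.3.

```python
def sub_words(x_words, y_words, width=16):
    """Subtract y from x, both given as lists of words, least significant word first.

    Returns (result_words, final_borrow). Hypothetical helper for illustration only.
    """
    mask = (1 << width) - 1
    borrow = 0
    z_words = []
    for x, y in zip(x_words, y_words):
        diff = x - y - borrow            # subtract the word and the borrow from the word below
        z_words.append(diff & mask)      # wrap the result into 'width' bits
        borrow = 1 if diff < 0 else 0    # a negative difference borrows from the next word
    return z_words, borrow


if __name__ == "__main__":
    # X = 894A 7725, Y = 4528 5414 (X0 and Y0 are the low words)
    z, borrow = sub_words([0x7725, 0x894A], [0x5414, 0x4528])
    print([f"{w:04X}" for w in z], borrow)   # ['2311', '4422'] 0
```

Passing four words per operand gives the 64 bit case, where the borrow out of the lower 32 bits is what the second half of the program has to take into account.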
For example

Inputs              Outputs
8000 - 894A (X0)    8008 - 4422 (Z0)
8001 - 7725 (X1)    8009 - 2311 (Z1)
8002 - 6525 (X2)    8010 - 1223 (Z2)
8003 - BC13 (X3)    8011 - 2411 (Z3)
8004 - 4528 (Y0)
8005 - 5414 (Y1)
8006 - 5302 (Y2)
8007 - 9802 (Y3)

Example 7.5
To write a program for 32 bit integer multiplication.

Solution

Program:

        LDP   #100H       Load data page pointer
        LACC  #037AH,0    Load the ACC with multiplicand
                          i.e., ACC = 0000037A
        SACL  0000,0      Store ACCL in memory location 8000
                          8000 = 037A
        LACC  #012EH,0    Load the ACC with multiplier
                          i.e., ACC = 0000012E
        SACL  0001,0      Store ACCL in memory location 8001
                          8001 = 012E
        LT    0000        Load the temporary register (TREG) with the content of
                          memory location 8000
                          TREG = 037A
        MPY   0001        Multiply the content of TREG with the content of memory
                          location 8001; the result is stored in the product register (PREG)
        PAC               Move the content of the product register to ACC
                          ACC = 000419EC
        SACL  0002,0      Store the content of ACCL in memory location 8002
        SACH  0003,0      Store the content of ACCH in memory location 8003
H:      B     H

Outputs

8002    19EC
8003    0004

Example 7.6
To write a program to calculate the value of the function

Y = A * X1 + B * X2 + C * X3

(Anna University, November/December, 2006)

Solution

Program: The constants A, B, C, X1, X2 and X3 are to be stored in dma from 8000 to 8005 as shown below:

Data memory address (DMA)    Data
8000                         A
8001                         B
8002                         C
8003                         X1
8004                         X2
8005                         X3

        LDP   #100H       Load data page pointer
        LACL  #0          Clear the accumulator
        LT    0000        Load the T register with constant A; T = A
        MPY   0003        Multiply the T register with X1; P = AX1
        LTA   0001        Loads the T register from data memory, adds the content of
                          the P register to ACC and stores the result in the accumulator
                          ACC = AX1; T = B
        MPY   0004        Multiply the T register with X2
                          P = BX2; ACC = AX1
        LTA   0002        T = C; ACC = AX1 + BX2
        MPY   0005        P = CX3; ACC = AX1 + BX2
        APAC              Adds the content of the P register to the accumulator and stores
                          the result in the accumulator; ACC = AX1 + BX2 + CX3
        SACL  0006,0      Store the ACCL in dma 8006

Output

If A = 1, B = 2, C = 3, X1 = 4, X2 = 5 and X3 = 6, the program gives 8006: 0020.

Note:
LTA: Instruction is equivalent to two instructions: LT and APAC.
LT: Loads the T register from data memory.
APAC: Adds the content of the P register to the accumulator and stores the result in the accumulator.
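The way LT, MPY, LTA and APAC cooperate in the sum-of-products program above can be followed with a small register-level model in Python. It is a simplified sketch under stated assumptions: the class `MacModel`, its method names and the helper `sum_of_products` are hypothetical, values are treated as unsigned, and product shifting, overflow modes and status bits are ignored.

```python
class MacModel:
    """Toy model of the T register, P register and accumulator (illustration only)."""

    def __init__(self, memory):
        self.mem = dict(memory)   # data memory: address offset -> 16 bit value
        self.t = 0                # T register (multiplier input)
        self.p = 0                # P register (product)
        self.acc = 0              # 32 bit accumulator

    def lt(self, addr):           # LT: load T from data memory
        self.t = self.mem[addr]

    def mpy(self, addr):          # MPY: P = T * memory operand
        self.p = self.t * self.mem[addr]

    def apac(self):               # APAC: ACC = ACC + P
        self.acc = (self.acc + self.p) & 0xFFFFFFFF

    def lta(self, addr):          # LTA: accumulate the previous product, then load T
        self.apac()
        self.lt(addr)

    def sacl(self, addr):         # SACL: store the low 16 bits of ACC
        self.mem[addr] = self.acc & 0xFFFF


def sum_of_products(a, b, c, x1, x2, x3):
    """Run the instruction sequence of the program on the toy model."""
    m = MacModel({0: a, 1: b, 2: c, 3: x1, 4: x2, 5: x3})
    m.lt(0)
    m.mpy(3)
    m.lta(1)
    m.mpy(4)
    m.lta(2)
    m.mpy(5)
    m.apac()
    m.sacl(6)
    return m.mem[6]


if __name__ == "__main__":
    print(f"{sum_of_products(1, 2, 3, 4, 5, 6):04X}")   # 0020
```

Because each LTA accumulates the product of the previous MPY while loading the next coefficient, the final product still has to be added with a separate APAC, which is why the program ends with APAC before SACL.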
Example 7.7
To write a program that explains the usage of the RPT instruction.

Solution RPT instruction is an immediate addressing mode instruction. The eight bit constant N specified by the RPT instruction is loaded into the repeat counter of the 'C50. This causes the instruction following the RPT instruction to be executed N + 1 times.
In this example, the value 4 is added 5 times in succession and the result is stored in data memory 8000.

Program:

        LDP   #100H
        LACL  #4          Load the lower accumulator with 4
        SACL  0000,0      Store ACCL in dma 8000
        LACL  #0          Clear the ACC
        RPT   #4          Set the repeat count to 4
        ADD   0000,0      4 is added with ACC 5 times
                          ACC = 14
        SACL  0000,0      Store ACCL in dma 8000
H:      B     H

Example 7.8
To find the two's complement of a given number.

Solution CMPL instruction replaces the content of the accumulator with its logical inversion, i.e., its one's complement. By adding one to the one's complement we get the 2's complement of the number.

Program:

        LDP   #100H
        LACL  #7          Load 7 to ACC
                          ACC = 0007
        CMPL              Complements the accumulator contents (1's complement)
                          ACC = FFF8
        ADD   #1          Add ACC with 1 (2's complement)
                          ACC = FFF9
        SACL  0000,0      Store ACCL in dma 8000
H:      B     H

Example 7.9
To write a program for square waveform and sawtooth waveform generation.

Solution

Square Waveform Generation

        .MMREGS
        .TEXT
START:
        LDP   #120H
        LACC  #0H         Load the ACC with lower amplitude
Loop:   SACL  0           Store ACCL in dma
                          ACC = 0000
        RPT   #0FFH       Frequency of the square wave
        OUT   0,04        Address for DAC
        CMPL              Complements the ACC contents
                          ACC = FFFF
        B     Loop        Go to pma Loop
        .END

Sawtooth Waveform Generation

        .MMREGS
        .TEXT
START:
        LDP   #120H
        LACC  #0H         Load the accumulator with lower amplitude
        SACL  0           Store ACCL in dma
        OUT   0,04H       Address for DAC
Loop:   LACC  0           Load the ACC with content in dma
        OUT   0,04H       Address for DAC
        ADD   #05H        Change this value for frequency
        SACL  0           Store ACCL in dma
        SUB   #0FFFH      Change upper amplitude
        BCND  Loop,LEQ
        B     START
        .END
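The arithmetic behind Examples 7.7 and 7.8 can be checked with a short Python sketch. It is only a model of the values involved, not 'C5x code: the function names `repeat_add`, `ones_complement` and `twos_complement` are assumptions for illustration, and results are shown in 16 bits to match the values quoted in the listings.

```python
def repeat_add(value, n):
    """Model RPT #n followed by ADD: the ADD is executed n + 1 times."""
    acc = 0
    for _ in range(n + 1):
        acc += value
    return acc


def ones_complement(value, width=16):
    """Model CMPL: invert every bit of the value within 'width' bits."""
    mask = (1 << width) - 1
    return ~value & mask


def twos_complement(value, width=16):
    """One's complement plus one, wrapped to 'width' bits."""
    mask = (1 << width) - 1
    return (ones_complement(value, width) + 1) & mask


if __name__ == "__main__":
    print(f"{repeat_add(4, 4):04X}")        # 0014 (4 added 5 times)
    print(f"{ones_complement(7):04X}")      # FFF8
    print(f"{twos_complement(7):04X}")      # FFF9
```

Adding a number to its two's complement gives zero in the chosen word width, which is a quick way to confirm a result such as FFF9 for 7.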
Example 7.10
To write a program for linear convolution

y(n) = x(n) * h(n).

Solution

Program:

        .text
        .mmregs
START:
        LDP   #0002H
        LAR   AR3,#0200H      y(n) starting
        LAR   AR4,#0007       N1 + N2 - 1 (length of linear convolution sequence)
        LAR   AR1,#0100H      x(n) data array
Loop:   MAR   *,AR1           modify the auxiliary register
        LACC  *+
        SACL  050H            starting of the scope of multiplication
        LAR   AR2,#0153H      end of the array, to be multiplied with h(n) {150 + N1 - 1}
        MAR   *,AR2
        ZAP
        RPT   #0003           N1 - 1 times so that N1 times
        MACD  0C100H,*-
        APAC                  to accumulate the final product sample
        MAR   *,AR3
        SACL  *+
        MAR   *,AR4
        BANZ  Loop,*-
H:      B     H

Input:

x(n) data memory:    8100 - 0001
                     8101 - 0003
                     8102 - 0001
                     8103 - 0003
h(n) data memory:    C100 - 0001
                     C101 - 0002
                     C102 - 0001

Output:

Data memory location:    8200 - 0001
                         8201 - 0005
                         8202 - 0008
                         8203 - 0008
                         8204 - 0007
                         8205 - 0003

Example 7.11
To write a program for circular convolution

Solution

Program:

        .MMREGS               program initialisation
        .TEXT
        LDP   #100H
        LACC  0000H           length of the input is given in 8000
        SUB   #0001H
        SACL  0001H
        LAR   AR0,1H
        LAR   AR2,#0010H
Loop3:  LAR   AR1,#0060H      give the inputs x1(n) in AR1
        LAR   AR3,#0050H      give the inputs x2(n) in AR3
        LAR   AR4,1H
        ZAP
Loop:   MAR   *,AR3           multiply x1(n) and x2(n) and add the multiplication output
        LT    *+,AR1
        MPY   *+
        SPL   5H
        ADD   5H
        MAR   *,AR4
        BANZ  Loop,*-,AR2     outputs of the convolution are stored using AR2
        SACL  *+
        CALL  ROTATE
Loop2:  MAR   *,AR0
        BANZ  Loop3,*-
H:      B     H
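The results of Examples 7.10 and 7.11 can be cross-checked with direct Python implementations of the two convolution sums. This is a reference sketch, not a translation of the assembly listings: the function names are assumptions, and the sample sequences are the values given in the input listings of the two examples (four input samples 1, 3, 1, 3 with h(n) = 1, 2, 1 for the linear case; 2, 1, 2, 1 and 1, 2, 3, 4 for the circular case).

```python
def linear_convolution(x, h):
    """y(n) = sum over k of x(k) * h(n - k); output length is len(x) + len(h) - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk       # each product lands at output index n + k
    return y


def circular_convolution(x1, x2):
    """y(m) = sum over k of x1(k) * x2((m - k) mod N) for two sequences of length N."""
    if len(x1) != len(x2):
        raise ValueError("both sequences must have the same length")
    n = len(x1)
    return [sum(x1[k] * x2[(m - k) % n] for k in range(n)) for m in range(n)]


if __name__ == "__main__":
    print(linear_convolution([1, 3, 1, 3], [1, 2, 1]))          # [1, 5, 8, 8, 7, 3]
    print(circular_convolution([2, 1, 2, 1], [1, 2, 3, 4]))     # [14, 16, 14, 16]
```

In hexadecimal the circular result is 000E, 0010, 000E, 0010. The index (m - k) mod N plays the role that rotating one of the sequences between output samples plays in the assembly program.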
ROTATE:
        LDP   #100H           rotate the values of x2(n)
        LACC  0001H
        SUB   #1H
        SACL  0002H
        LACC  0050H
        SACB
        LAR   AR3,#8051H
        LAR   AR5,#8070H
        LAR   AR6,2H
Loop1:  MAR   *,AR3
        LACC  *+,0,AR5
        SACL  *+,0,AR6
        BANZ  Loop1,*-
        LACB
        MAR   *,AR5
        SACL  *+
        LACC  #8070H
        SAMM  BMAR
        LAR   AR3,#0050H
        MAR   *,AR3
        RPT   0H
        BLDD  BMAR,*-
        RET

Input:

Length:    8000 - 0004
x1(n):     8050 - 0002
           8051 - 0001
           8052 - 0002
           8053 - 0001
x2(n):     8060 - 0001
           8061 - 0002
           8062 - 0003
           8063 - 0004

Output:

x3(n):     8010 - 000E
           8011 - 0010
           8012 - 000E
           8013 - 0010

Summary

• DSP algorithms require extensive use of arithmetic operations such as multiplications and additions, and therefore the amount of data flow through the CPU is very high. Standard microprocessors do not possess the required hardware architecture and instruction set for this purpose.
• To overcome the limitations of the standard microprocessors, digital signal processors are designed which include Harvard architecture, pipelining, dedicated hardware such as a fast hardware multiplier-accumulator and shifters, fast internal memories and special instructions oriented to DSP.
• In the latest generation DSP processors, new architectures such as Very Long Instruction Word (VLIW) and Static Super Scalar are used. These architectures have multiple data paths and arithmetic units. Parallelism at the instruction level enhances the performance of the processors.
• Digital signal processors are classified as general purpose and special purpose processors. The functioning of general purpose DSP processors is similar to standard microprocessors except that they have specially designed architecture and instruction sets. On the other hand, special purpose processors are used to perform certain specific algorithms such as digital FIR filtering and for execution of application dependent operations. While special purpose processors are faster in execution, they are not as flexible as general purpose processors.

Short Questions and Answers

1. What are the classifications of digital signal processors?
The digital signal processors are classified into two categories. They are:
(i) General purpose digital signal processors.
(ii) Special purpose digital signal processors.
2. What is meant by general purpose digital signal processors?
General purpose digital signal processors are basically high speed microprocessors with architecture and instruction sets optimized for DSP operation. Examples are:
(i) Fixed point processors such as TMS320C5x, TMS320C54x, ADSP-219x and ADSP-219xx.
(ii) Floating point processors such as TMS320C3x, TMS320C67x and ADSP-21xxxx.
3. What is meant by special purpose digital signal processor?
A special purpose digital signal processor consists of hardware designed for a specific DSP algorithm such as FFT, or for specific DSP applications such as PCM and filtering. Examples are:
(i) MT93001 - Mitel's multichannel telephony voice echo canceller.
(ii) P-DSP 16515A, TM-44 and TM-66 - FFT processor.
(iii) UPDSP 16256 and model 3092 - programmable FIR filter.
- New kind of programmer/compiler complexity.
- Increased memory use.
- Program must keep track of instruction scheduling.
- High power consumption.
- Misleading MIPS ratings.
7. What is pipelining?
Pipelining a processor means breaking down its instructions into a series of discrete pipeline stages which can be completed in sequence by specialized hardware.
8. What are the different stages in pipelining? Explain.
(i) Fetch phase: Next instruction is fetched from the address stored in the program counter.
(ii) Decode phase: Instruction in the instruction register is decoded and the address in the program counter is incremented.
(iii) Memory read phase: Reads the data from the data buses and also writes data to the data buses.
(iv) Execute phase: Executes the instruction currently in the instruction register and also completes the write process.
9. What is pipeline depth?
The number of pipeline stages is referred to as the pipeline depth.
10. What is the pipeline depth of TMS320C50 and TMS320C54x?
TMS320C50 - 4 and TMS320C54x - 6.
13. What is the advantage of Harvard architecture of TMS320 series?
(Anna University, November/December, 2006)
The Harvard architecture has two separate memories for instructions and data. It is capable of simultaneously reading an instruction code and reading or writing a memory or peripheral.
14. Differentiate between Von-Neumann and Harvard architectures.
In Von-Neumann architectures the CPU can be either reading an instruction or reading/writing data from/to memory. Both cannot occur at the same time since the instruction and data use the same signal pathways and memory, whereas the Harvard architecture has two memories for instructions and data, requiring dedicated buses for each of them.
15. State the merits and demerits of multi-ported memories.
(Anna University, May/June, 2007)
Merit: It increases the number of accesses/clock period. For example, in dual ported memory, program and data memory can be accessed simultaneously.
Demerits: It requires a larger number of pins and a larger chip area, and it is more expensive.
16. List the various registers used with ARAU.
- Eight auxiliary registers (AR0-AR7).
- Auxiliary register pointer (ARP).
- Unsigned 16 bit ALU.
17. What are the elements of the central processing unit of TMS320C5x?
- Central arithmetic logic unit (CALU).
- Parallel logic unit (PLU).
- Auxiliary register arithmetic unit (ARAU).
- Memory mapped registers.
- Program controller.
18. What is the function of parallel logic unit?
The parallel logic unit is a second logic unit that executes logic operations on data without affecting the contents of the accumulator.
19. What are the arithmetic instructions of 'C5x?
ADD, ADDB, ADDC, SUB, SUBB, MPY, MPYU
20. What are the logical instructions of 'C5x?
AND, ANDB, OR, ORB, XOR, XORB
21. What are load/store instructions?
LACB, LACC, LACL, LAMM, LAR, SACB, SACH, SACL, SAR, SAMM
22. What are the shift instructions?
ROR, ROL, ROLB, RORB, BSAR

Long Answer Type Questions (questions 12 to 18)
12. Write short notes on:
(i) 32 bit accumulator.
(ii) 16 x 16 bit parallel multiplier.
(iii) Shifter.
13. Give the key features of the digital signal processor.
14. Explain the memory mapped addressing mode used in P-DSPs.
15. What are the different ways in which the auxiliary register pointer can be updated in 'C5x?
16. Explain the immediate addressing mode of 'C5x with examples.
17. Explain the arithmetic instructions of 'C5x.
18. Write short notes on:
(i) Barrel shifter.
(ii) Exponent encoder.
(iii) Compare Select and Store Unit (CSSU).

Long Answer Type Questions
1. Explain how Harvard architecture as used by the TMS320 family differs from
the strict Harvard architecture. Compare this with the architecture of a standard
Von-Neumann processor.
2. A multiplier-accumulator, with three pipe stages is required for a digital signal
processor. Sketch a block diagram of a suitable configuration for the MAC.
With the aid of a timing diagram explain how the MAC works.
3. In relation to DSP processor, explain SIMD and VLIW techniques. In each
case, clearly point out the advantages and disadvantages of the technique in
signal processing.
4. Explain the operation of the CSSU of TMS320C54x and explain its use
considering the Viterbi operator.
5. Explain what is meant by instruction pipelining. Explain with an example, how
pipelining increases the throughput efficiency.
6. Explain the operation of TDM serial ports in P-DSPs.
7. With a suitable diagram describe the functions of the multiplier/adder unit of
TMS320C54x.
8. Explain the function of auxiliary registers in the indirect addressing mode to
point the data memory location.
9. Write a program to use the auxiliary register in the memory pointing and
looping.
10. Write a program to compute the following equation: