Unit-5 DSP Processor

The document discusses digital signal processors (DSPs). DSPs are specialized processors designed for digital signal processing tasks like filtering and FFT. They have hardware features like multiplier-accumulator units that are optimized for repetitive math operations required in DSP algorithms. Common DSP processors include Texas Instruments TMS320C5x series and Motorola DSP563x series.

Uploaded by

CCE notes
UNIT-5

Digital Signal Processor

7.1 Introduction
DSP processors were introduced in the early 1980s to handle information
available in digital form, and they have since progressed rapidly to handle
very complex problems. DSP processors are divided into two broad categories:
general purpose and special purpose processors. Special purpose programmable
digital signal processors (P-DSPs) are designed with features that are
specifically required for digital signal processing applications. These
features include a multiplier and multiplier accumulator (MAC), modified bus
structures, multiple access memory, multi-ported memory and pipelining. A
conventional general purpose microprocessor does not have these features. An
advanced microprocessor or a RISC processor may use some of the techniques of
P-DSPs, may even have instructions that are specifically required for DSP
applications, or may have performance close to that of P-DSPs for certain
operations. However, in terms of low power requirement, cost, real time
input/output capability and availability of on-chip memories, P-DSPs have an
advantage over advanced microprocessors. P-DSPs include fixed point devices
such as the Texas Instruments TMS320C5x and TMS320C54x and the Motorola
DSP563x, and floating point processors such as the Texas Instruments
TMS320C4x and TMS320C67xx. Some of the features specifically required for
performing digital signal processing operations are described ahead.
7.2 Multiplier Accumulator (MAC) Unit

Most digital signal processing computations, such as convolution and
correlation, require array multiplication. Array multiplication can be done
using a single multiplier and adder, as shown in Figure 7.1. One of the
important requirements of these array multipliers is that they have to process
the signals in real time, i.e., the array multiplication should be completed
before the next sample of the input signal arrives at the input of the array.
There are two approaches to solve this problem. One approach is to implement a
dedicated MAC unit in hardware, which integrates the multiplier and
accumulator in a single hardware unit. It is adopted in the Motorola DSP5600x.
The other approach is to have a separate multiplier and accumulator. This
approach is adopted in Texas Instruments DSP processors: the output of the
multiplier is stored in the product register, and the content of this product
register can be added to the accumulator register ACC in the central ALU. In
both approaches, the MAC operation can be completed in one clock cycle.

• y(n), the output at the nth sampling instant, is obtained by multiplying the
  array x_n = [x(n), x(n-1), x(n-2), ..., x(n-M+3), x(n-M+2), x(n-M+1)],
  corresponding to the present and the past M-1 samples of the input, with the
  array h = [h(0), h(1), h(2), ..., h(M-3), h(M-2), h(M-1)], which corresponds
  to the impulse response sequence.
• y(n+1), the output at the (n+1)th sampling instant, is obtained by
  multiplying the array x_{n+1} with h.
• The input signal array x_{n+1} is obtained by shifting the array x_n towards
  the right, so that the (n+1)th sample of the input data, x(n+1), becomes the
  first element and all the elements of x_n are shifted towards the right by 1
  position. Instead of shifting the elements all at one time after finishing
  the vector multiplication, each element can be shifted separately soon after
  the MAC operation that uses it is over. This can be done using a special
  instruction called Multiply Accumulate with Data shift (MACD).

For example, MACD pgm, dma multiplies the content of the program memory
location (pgm) with the content of the data memory location with address (dma)
and stores the result in the product register. The content of the product
register is added to the accumulator before the new product is stored.
Further, the content of dma is copied to the next location, whose address is
dma + 1.

The efficient operation of a digital signal processor depends on its
architectural features, execution speed, type of arithmetic and word length.
There are two types of special purpose hardware used in DSP. To execute DSP
algorithms such as digital filters and FFT, algorithm specific DSP hardware is
designed. For DSP applications in telecommunications, digital audio and
control, application specific DSP hardware is designed. General purpose
processors in use today perform operations sequentially, while DSP
applications need them performed faster. Further, many DSP algorithms require
repetitive arithmetic operations. The standard microprocessors available do
not meet these requirements. In DSP hardware design, however, the hardware
architecture and the instruction set for DSP operations are given utmost
priority. This is achieved using the concepts of parallelism, which include
the following:

1. Harvard architecture.
2. VLIW architecture.
3. Pipelining.
4. On-chip memory/cache.
5. Instruction sets dedicated to DSP.

The architectural features of DSP are described below.

7.3 Bus Structures and Memory Access Schemes

The MACD (MAC operation with data move) instruction requires four memory
accesses per instruction cycle. The instruction cycle is the time that elapses
from when an instruction is fetched until that instruction completes
execution, including the time taken for writing the result into a register or
memory. The four memory accesses/clock periods required for the MACD
instruction are as follows:
(i) Fetch the MACD instruction from the program memory.
(ii) Fetch one of the operands from the program memory.
(iii) Fetch the second operand from the data memory.
(iv) Write the content of the data memory with address dma into the location
with the address dma + 1.
The impulse response coefficients are stored in program memory and the samples
of input data are stored in the data memory.
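
The MACD-based convolution described above can be sketched in Python. This is
only an illustrative model, not processor code: the function name fir_step,
the list h standing in for program memory and the list data standing in for
data memory are all assumptions made for the example. The loop walks from the
oldest sample to the newest, so each sample is copied to address dma + 1 right
after the MAC step that uses it, and four memory accesses are counted per step.

```python
def fir_step(h, data, new_sample):
    """Compute one FIR output sample with MACD-style multiply, accumulate and data move.

    h          -- impulse response h(0)..h(M-1) (models program memory)
    data       -- list of length M + 1 (models data memory); data[k] holds x(n-k)
    new_sample -- the newest input sample x(n)
    Returns (output sample, number of memory accesses counted).
    """
    m = len(h)
    data[0] = new_sample
    acc = 0
    accesses = 0
    # Start at the oldest sample so that the copy to dma + 1 never
    # overwrites a sample that is still needed.
    for dma in range(m - 1, -1, -1):
        acc += h[dma] * data[dma]   # multiply and accumulate
        data[dma + 1] = data[dma]   # data move: content of dma copied to dma + 1
        accesses += 4               # instruction fetch, two operand fetches, one write
    return acc, accesses


if __name__ == "__main__":
    memory = [0, 0, 0, 0]           # M + 1 locations for M = 3 taps
    for sample in [1, 0, 0, 0]:     # unit impulse as input
        print(fir_step([1, 2, 3], memory, sample))
```

Feeding a unit impulse returns the coefficients themselves as successive
outputs, which is a quick check that the shifting and accumulation agree with
the convolution described in Section 7.2.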

Figure 7.1 Implementation of convolution with array multiplier (single
multiplier/adder).

7.3.1 Von-Neumann Architecture

The Von-Neumann concept, in which operations are performed sequentially, was
introduced into computer architecture design in the year 1946.

During the first clock period, the instruction code is fed from the program
memory to the control unit. In the second clock period, one of the operands is
fed to the processing unit from the program memory. In the third clock period,
the second operand is fed to the processing unit from the data memory. In the
fourth clock period, the content of the data memory with address dma is
written into the location with the address dma + 1. The execution of the MACD
instruction in a machine with "Von-Neumann Architecture" is shown in Figure
7.2. It requires four clock cycles because the machine has a single address
bus and a single data bus for accessing the program as well as the data memory
area.

Figure 7.2 Von-Neumann architecture. [Block diagram: control unit, processing
unit and a single data and program memory sharing one address bus and one
data/instruction bus.]

In a computer with Von-Neumann architecture, the CPU can be either reading
an instruction or reading/writing data from/to the memory. Both cannot occur
at the same time since the instruction and data use the same signal pathways
and memory. The Von-Neumann architecture consists of three buses, namely the
data bus, the address bus and the control bus.

• Data bus: Transfers the data between the CPU and its peripherals. It is
  bidirectional. The CPU can read or write data in the peripherals.
• Address bus: The CPU uses the address bus to indicate which peripheral it
  wants to access and, within each peripheral, which specific register. It is
  unidirectional. The CPU always writes the address, which is read by the
  peripherals.
• Control bus: It carries signals that are used to manage and synchronize the
  exchange between the CPU and its peripherals, as well as signals that
  indicate whether the CPU wants to read or write the peripheral.

One of the ways by which the number of clock cycles required for the memory
access can be reduced is to use more than one bus for both address and data.
That is implemented in the "Harvard Architecture".

7.3.2 Harvard Architecture

The Harvard architecture, which has two separate buses for the program and
data memory, is shown in Figure 7.3. Hence, the content of program memory and
data memory can be accessed in parallel. The instruction code can be fed from
the program memory to the control unit while the operand is fed to the
processing unit from the data memory. The processing unit consists of the
registers and processing elements such as MAC units, multiplier, ALU, shifter,
etc. With the Harvard architecture, the number of memory accesses/clock cycle
is two. This can be increased further by using a larger number of buses.

Figure 7.3 Harvard architecture. [Block diagram: control unit with its own
program memory and address bus, processing unit with its own data memory and
address bus.]

Several P-DSPs follow the modified Harvard architecture, which is shown in
Figure 7.4. One set of buses is used to access a memory that has both program
and data, while another accesses a memory that has data alone. Data can also
be transferred from one memory to another. This type of architecture is used
in P-DSPs from Texas Instruments and Analog Devices.

Figure 7.4 Modified Harvard architecture.
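
The difference between the two bus structures can be illustrated by counting
clock periods for the four MACD memory accesses listed in Section 7.3. The
sketch below is a simplified model under stated assumptions: the names
MACD_ACCESSES and clock_periods are hypothetical, each bus carries one access
per clock period, and accesses to different memories are assumed to overlap
perfectly.

```python
from collections import Counter

# The four memory accesses of one MACD instruction, tagged by memory space.
MACD_ACCESSES = [
    ("program", "fetch the MACD instruction"),
    ("program", "fetch the first operand (coefficient)"),
    ("data", "fetch the second operand (input sample)"),
    ("data", "write the content of dma to dma + 1"),
]


def clock_periods(accesses, separate_buses):
    """Clock periods needed when every bus carries one access per period.

    separate_buses=False models a single shared bus (Von-Neumann);
    separate_buses=True models one bus per memory space (Harvard).
    """
    if not separate_buses:
        return len(accesses)            # every access queues on the one bus
    per_memory = Counter(space for space, _ in accesses)
    return max(per_memory.values())     # the busiest memory sets the pace


if __name__ == "__main__":
    print(clock_periods(MACD_ACCESSES, separate_buses=False))  # 4
    print(clock_periods(MACD_ACCESSES, separate_buses=True))   # 2
```

With two accesses per clock period the Harvard machine needs two periods for
MACD instead of four, which is why further techniques are needed to reach
single-cycle execution.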
7.3.3 Multiple Access Memory

The number of memory accesses/clock period can also be increased by using
high speed memory that permits more than one memory access/clock period. For
example:

DARAM - Dual access RAM, permits two memory accesses/clock period.

If DARAM is connected to a processing unit of the P-DSP (by using Harvard
architecture) with two independent data and address buses, four memory
accesses/clock period can be achieved.

7.3.4 Multi-ported Memory

Another technique that is used for increasing the number of accesses/clock
period is to use multi-ported memory. It has two independent data and address
buses, as shown in Figure 7.5. Using this, two memory accesses can be achieved
in a clock period. It removes the need to store the program and data in two
different memory chips in order to permit simultaneous access to both program
and data memory.

Figure 7.5 Dual ported memory. [Block diagram: dual port memory with address
bus 1/data bus 1 and address bus 2/data bus 2.]

Limitation
It increases the cost compared to two single port memories, since multi-ported
memory requires a larger number of pins and a larger chip area. A larger
number of pins requires a more expensive package and a larger die size.

7.4 VLIW Architecture

Another architecture used for P-DSPs is the Very Long Instruction Word (VLIW)
architecture. In this architecture, P-DSPs have a number of processing units
such as ALUs, MAC units, shifters, etc. The VLIW is accessed from memory and
is used to specify the operands and operations to be performed by each of the
data paths. The block diagram of the VLIW architecture is shown in Figure 7.6.
• Multiple functional units share a common multi-ported register file for
  fetching the operands and storing the results.
• Parallel random access by the functional units to the register file is
  facilitated by the read/write cross bar.
• Execution of the operations in the functional units is carried out with
  load/store operation of data between a RAM and the register file.

Figure 7.6 Block diagram of the VLIW architecture. [Block diagram:
multi-ported register file, read/write cross bar, functional units 1 to n,
program control unit and instruction cache.]

Advantages
1. Increased performance can be achieved with VLIW architecture; the gain
   depends on the degree of parallelism in the algorithm and the number of
   functional units.
2. The throughput is higher since the algorithm involves execution of
   independent operations.
3. Potentially easier to program.
4. Potentially scalable.
5. Better programming efficiency.

Disadvantages
1. It may not always be possible to have independent streams of data for
   processing.
2. The number of functional units is also limited by the hardware cost for the
   multi-ported register file and cross bar switch.
3. High power consumption and high program memory bandwidth.
4. Misleading MIPS ratings.
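
The dependence of VLIW performance on the parallelism in the algorithm and on
the number of functional units can be illustrated with a small packing sketch.
It is not how any particular compiler schedules code; the function pack_vliw,
the greedy strategy and the example operations OPS and DEPS are assumptions
chosen for illustration. Each instruction word holds at most n_units
operations, and an operation may be issued only after every operation it
depends on has been issued in an earlier word.

```python
def pack_vliw(ops, deps, n_units):
    """Greedily pack operations into instruction words of at most n_units slots.

    ops  -- operation names in program order
    deps -- dict mapping an operation to the set of operations it depends on
    Returns a list of instruction words, each a list of operation names.
    """
    done = set()
    remaining = list(ops)
    words = []
    while remaining:
        word = []
        for op in remaining:
            # Issue only if a slot is free and all inputs come from earlier words.
            if len(word) < n_units and deps.get(op, set()) <= done:
                word.append(op)
        if not word:
            raise ValueError("dependencies can never be satisfied")
        done.update(word)
        remaining = [op for op in remaining if op not in done]
        words.append(word)
    return words


# Example: four independent multiplies followed by an adder tree.
OPS = ["m0", "m1", "m2", "m3", "a0", "a1", "a2"]
DEPS = {"a0": {"m0", "m1"}, "a1": {"m2", "m3"}, "a2": {"a0", "a1"}}

if __name__ == "__main__":
    for units in (1, 2, 4):
        print(units, len(pack_vliw(OPS, DEPS, units)))
```

In this example one functional unit needs seven instruction words, two units
need four and four units need three. Adding units beyond four would not help,
because the adder tree leaves no further independent operations to issue.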

7.5 Pipelining

One of the approaches adopted for increasing the efficiency of advanced
microprocessors as well as P-DSPs is instruction pipelining. An instruction
cycle, starting with the fetching of an instruction and ending with the
execution of the instruction, including the time for storage of the results,
can be split into a number of microinstructions. Execution of each of the
microinstructions is also referred to as a phase. For example, an instruction
cycle requiring four microinstructions can be divided into four phases as
follows:

1. Fetch phase, in which the instruction is fetched from the program memory.
2. Decode phase, in which the instruction is decoded in the instruction
   register.
3. Memory read phase, in which the operand required for the execution of the
   instruction may be read from the data memory.
4. Execution phase, in which execution as well as the storage of the results
   in either one of the registers or memory is carried out.

Each phase may be carried out separately by one of four functional units. Let
us assume that each of the four phases takes equal time for completion.

Processor with No Pipelining

The instruction cycles of a processor with no pipelining are shown in Figure
7.7. In a conventional microprocessor with no pipelining, each of the
functional units is busy only 25% of the time, since only one instruction is
processed by the CPU at a time. Figure 7.7 shows when each functional unit is
busy while a program containing three instructions I1, I2 and I3 is executed.

Value of T   Fetch Phase   Decode Phase   Read Phase   Execute Phase
 1           I1
 2                         I1
 3                                        I1
 4                                                     I1
 5           I2
 6                         I2
 7                                        I2
 8                                                     I2
 9           I3
10                         I3
11                                        I3
12                                                     I3

Figure 7.7 Instruction cycles of a processor with no pipelining.

Processor with Pipelining

In a processor with pipelining, the functional units can be kept busy almost
all the time by processing a number of instructions simultaneously in the CPU.
Consider a processor with four functional units. Four instructions I1, I2, I3
and I4 can be processed simultaneously, as shown in Figure 7.8. When I1 enters
the decode phase, I2 can enter the opcode fetch phase. When I1 enters the
operand read phase, I2 enters the decode phase and I3 enters the opcode fetch
phase. When I1 enters the execute phase, I2 enters the operand read phase, I3
enters the decode phase and I4 enters the opcode fetch phase. The pipeline is
fully loaded now and all the functional units have useful work to do.

Value of T   Fetch Phase   Decode Phase   Read Phase   Execute Phase
 1           I1
 2           I2            I1
 3           I3            I2             I1
 4           I4            I3             I2           I1
 5           I5            I4             I3           I2
 6           I6            I5             I4           I3
 7           I7            I6             I5           I4
 8           I8            I7             I6           I5
 9           I9            I8             I7           I6
10                         I9             I8           I7
11                                        I9           I8
12                                                     I9

Figure 7.8 Instruction cycle of a processor with pipelining.

Let T denote the time required for each phase of the instruction. One clock
cycle of the processor corresponds to T. In a period of 12T, only three
instructions can be executed in a machine without pipelining, whereas in a
machine with pipelining nine instructions can be executed in the same period.
Hence, the throughput is increased by a factor of 3 in this case. In general,
for executing a program with N instructions, the time required is (N + 3)T in
a machine with four-phase pipelining, whereas in a machine without pipelining
the time required for executing N instructions is 4NT.

The number of instructions that are processed simultaneously in the CPU, also
referred to as the depth of pipelining, differs in different families of
P-DSPs. The pipeline depths of some of the P-DSPs are given in Table 7.1.

Table 7.1 Pipeline depth of some P-DSPs.

P-DSPs name/family     Pipeline depth
Analog Devices         2
Motorola DSP5600x      3
TMS320C5x              4
TMS320C54x             5
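
The pipeline timing above can be reproduced with a short Python sketch. It
assumes an ideal pipeline with no stalls, and the function names are
illustrative only. For a pipeline of depth d, N instructions need N + d - 1
clock periods, so a four-phase pipeline needs N + 3, which agrees with the
nine instructions completing in 12T in Figure 7.8.

```python
PHASES = ("Fetch", "Decode", "Read", "Execute")


def time_without_pipelining(n_instructions, depth=len(PHASES)):
    """Clock periods when each instruction finishes before the next one starts."""
    return depth * n_instructions


def time_with_pipelining(n_instructions, depth=len(PHASES)):
    """Clock periods for an ideal pipeline: fill time plus one period per instruction."""
    return n_instructions + depth - 1


def pipeline_schedule(n_instructions, phases=PHASES):
    """Return one row per clock period mapping each phase to an instruction number.

    A phase that is idle in a given period maps to None.
    """
    rows = []
    for t in range(1, time_with_pipelining(n_instructions, len(phases)) + 1):
        row = {}
        for stage, phase in enumerate(phases):
            instruction = t - stage     # instruction k enters stage s at period k + s
            row[phase] = instruction if 1 <= instruction <= n_instructions else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(time_without_pipelining(3))   # 12
    print(time_with_pipelining(9))      # 12
    for t, row in enumerate(pipeline_schedule(9), start=1):
        print(t, row)
```

For large N the ratio 4N / (N + 3) approaches 4, so the factor of 3 observed
over 12 clock periods is a lower figure caused by the time needed to fill the
pipeline.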

7.6 Architecture of TMS320C5x

The TMS320C5x is a 16 bit fixed point processor. The DSP chips have an IC
number with the prefix TMS320 (Texas Instruments). If the next letter is C
(e.g., TMS320C5x), it indicates that CMOS technology is used for the IC and
the on-chip non-volatile memory is a ROM. If the next letter is E (e.g.,
TMS320E5x), it indicates that the technology used is CMOS and the on-chip
non-volatile memory is an EPROM. If it is neither of these (e.g., TMS3205x),
it indicates that NMOS technology is used for the IC and the on-chip
non-volatile memory is a ROM. Under C5x itself there are three processors,
'C50, 'C51 and 'C54, that have an identical instruction set but differ in the
capacity of on-chip ROM and RAM. The instruction set of the TMS320C5x and
other DSP chips is superior to the instruction set of conventional
microprocessors such as the 8085, Z80, etc., as most of the instructions
require only a single cycle for execution. The block diagram of the internal
architecture of the TMS320C5x is shown in Figure 7.9. It has an advanced
Harvard architecture because it has separate memory buses for program and data
and has instructions that enable data transfer between the program and data
memory areas.

Figure 7.9 Internal architecture of TMS320C5x. [Block diagram: program ROM,
data/program SARAM, data/program DARAM B0 (512 x 16), data DARAM B1 (512 x 16)
and B2 (32 x 16) connected through the program bus and data bus to the program
controller, CALU, PLU, ARAU, memory mapped registers and the on-chip
peripherals (serial ports, TDM serial port, buffered serial port, timer, host
port interface, test/emulation).]

On-chip memory sizes given in Figure 7.9 (words):

Device    Program ROM    Data/Program SARAM
'C50      2k             9k
'C51      2k             1k
'C52      4k             -
'C53      16k            3k
'C56      32k            6k
'C57S     2k             6k
'LC57     32k            6k

7.6.1 Bus Structure

Separate program and data buses allow simultaneous access to program
instructions and data, which provides a high degree of parallelism. For
example, in a multiply accumulate operation, while data is multiplied, a
previous product can be loaded into, added to or subtracted from the
accumulator and, at the same time, a new address can be generated. Thus
everything is executed in one cycle. Such parallelism supports a powerful set
of arithmetic, logic and bit-manipulation operations that can all be performed
in a single machine cycle.

The TMS320C5x architecture has four buses and their functions are as follows:

(i) Program Bus (PB): It carries the instruction code and immediate operands
    from program memory to the CPU.
(ii) Program Address Bus (PAB): It provides addresses to program memory for
    both reads and writes.
(iii) Data Read Bus (DB): It interconnects various elements of the CPU to data
    memory space.
(iv) Data Read Address Bus (DAB): It provides the address to access the data
    memory space.

The program and data buses can work together to transfer data from on-chip
data memory and internal or external program memory to the multiplier for
single-cycle multiply/accumulate operations.

7.6.2 Central Arithmetic Logic Unit (CALU)

It consists of the following elements:

(i) (16 x 16) bit parallel multiplier: It performs 16 x 16 multiplication of
    numbers represented in 2's complement form. The 16 bit temporary register
    0 (TREG0) holds the multiplicand. The other operand for the multiplication
    can be specified using one of the addressing modes. The 32 bit PREG
    (Product register) holds the result of multiplication.


(ii) Accumulator (ACC): It is a 32 bit register used for arithmetic and
    logical operations. The higher order word and lower order word of ACC are
    referred to as ACCH and ACCL respectively. Either the higher order word or
    the lower order word of ACC can be loaded from memory.
(iii) Arithmetic Logic Unit (ALU): One of the operands for the ALU operation
    comes from ACC. The other can be loaded either directly or [...]. The
    results of operations performed in the central ALU are stored in ACC.
(iv) Accumulator Buffer (ACCB): It is a 32 bit register used for temporary
    storage of ACC.
(v) 0-16 bit left barrel shifter and right barrel shifter: It permits the
    contents of memory to be left shifted by 0 to 16 bits before they are
    either fed to the ALU or stored from the ALU to memory.

7.6.3 Auxiliary Register ALU (ARAU)

It consists of the following elements:

(i) Eight 16 bit auxiliary registers (ARs) AR0-AR7.
(ii) 3 bit auxiliary register pointer (ARP).
(iii) Unsigned 16 bit ALU.

The ARAU calculates indirect addresses by using inputs coming from the ARs,
the 16 bit index register (INDX) and the auxiliary register compare register
(ARCR). The ARAU can auto index the current AR while the data memory location
is being addressed, and can index either by ±1 or by the contents of the INDX.
As a result, accessing data does not require the CALU for address
manipulation. Therefore, the CALU is free for other operations in parallel.
This allows the instructions to be executed faster compared to the
conventional microprocessor. For example:

In the 8085 (MOV A, M; INX H)

These instructions load the accumulator using the indirect addressing mode,
with the HL register pair used as the address pointer, and then increment the
pointer. These two instructions can be replaced by a single instruction in the
C5x:

LACC *+, 0

Any one of the auxiliary registers can be used as the address pointer and
incremented. The register that is used is specified by the content of the ARP.
Some of the other registers of the ARAU and their functions are as follows.

Index Register (INDX)

The 16 bit INDX is used by the ARAU as a step value (addition or subtraction
by more than 1) to modify the address in the ARs during indirect addressing.
For example, when the ARAU steps across a row of a matrix, the indirect
address is incremented by 1. However, when the ARAU steps down a column, the
address is incremented by the dimension of the matrix.

Auxiliary Register Compare Register (ARCR)

It is a 16 bit register used for address boundary comparison. For example:

CMPR instruction: Compares the ARCR to the selected AR and places the result
of the comparison in the TC bit of ST1 (Status Register 1).

Block Move Address Register (BMAR)

The 16 bit BMAR holds an address value to be used with block moves and
multiply/accumulate operations. This register provides the 16 bit address for
an indirect addressed second operand.

Block Repeat Registers (RPTC, BRCR, PASR, PAER)

All these registers are 16 bits wide.
1. Repeat counter register (RPTC) holds the repeat count in a repeat single
   instruction operation and is loaded by the RPT and RPTZ instructions.
2. Block repeat counter register (BRCR) holds the count value for the block
   repeat feature. This value is loaded before a block repeat operation is
   initiated.
3. Block repeat program address start register (PASR) indicates the 16 bit
   address where the repeated block of code starts.
4. Block repeat program address end register (PAER) indicates the 16 bit
   address where the repeated block of code ends. The PASR and PAER are loaded
   by the RPTB instruction.

7.6.4 Parallel Logic Unit (PLU)

It performs boolean operations or the bit manipulation required of high speed
controllers. The PLU can set, clear, test or toggle the bits in a status
register, control register or any data memory location. The PLU allows logic
operations to be performed on data memory values directly without affecting
the contents of the ACC or PREG. Results of a PLU function are written back to
the original data memory location.

7.6.5 Memory Mapped Registers

The 'C5x has 96 memory mapped registers mapped into page 0 of the data memory
space. All 'C5x DSPs have 28 CPU registers and 16 input/output (I/O) port
registers but have different numbers of peripheral and reserved registers.
Since the memory mapped registers are a component of the data memory space,
they can be written to and read from in the same way as any other data memory
location. The memory mapped registers are used for indirect data address
pointers, temporary storage, CPU status and control, or integer arithmetic
processing through the ARAU.
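
The auxiliary register indexing described in Section 7.6.3 (a step of 1 across
a row, a step of INDX down a column) can be illustrated with a small Python
model. This is a sketch, not C5x code: the function indirect_walk, the list
standing in for data memory and the choice of INDX equal to the row length of
a matrix stored row by row are assumptions made for the example.

```python
def indirect_walk(memory, ar, step, count):
    """Read `count` words through a pointer that is post-modified by `step`.

    memory -- list modelling data memory
    ar     -- starting address held in the (modelled) auxiliary register
    step   -- 1 for auto-increment, or the INDX value for indexed stepping
    Returns (values read, final pointer value).
    """
    values = []
    for _ in range(count):
        values.append(memory[ar])   # operand fetched at the current address
        ar = (ar + step) & 0xFFFF   # pointer update wraps at 16 bits
    return values, ar


# A 3 x 4 matrix stored row by row, starting at address 0.
ROWS, COLS = 3, 4
MEMORY = list(range(100, 100 + ROWS * COLS))
INDX = COLS                         # assumed: step down a column = row length

if __name__ == "__main__":
    print(indirect_walk(MEMORY, ar=1 * COLS, step=1, count=COLS))  # row 1
    print(indirect_walk(MEMORY, ar=2, step=INDX, count=ROWS))      # column 2
```

Because the pointer update happens alongside the read, the model mirrors the
point made above: no separate instruction is needed to advance the address.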
7.6.6 Program Controller

The program controller contains logic circuits that perform the following
operations:

1. Decodes the instructions.
2. Manages the CPU pipeline.
3. Stores the status of CPU operation.
4. Decodes the conditional operations.

Parallelism of architecture lets the processor perform three concurrent memory
operations in any given machine cycle: it fetches an instruction, reads an
operand and writes an operand. The program controller consists of the
following elements:

1. 16 bit Program Counter (PC).
2. 16 bit status registers ST0, ST1, Processor Mode Status Register (PMST) and
   Circular Buffer Control Register (CBCR).
3. (16 x 16) bit hardware stack.
4. Address generation logic.
5. Instruction register.
6. Interrupt flag register and interrupt mask register.

7.6.7 Status Registers

The status registers can be stored into data memory and loaded from data
memory, thereby allowing the processor status to be saved and restored for
subroutines. The bit assignment details for ST0 and ST1 are given in Figure
7.10.

(a) ST0
Bits    15-13   12    11    10   9      8-0
Field   ARP     OV    OVM   1    INTM   DP

(b) ST1
Bits    15-13   12    11   10    9   8   7   6    5   4    3   2   1-0
Field   ARB     CNF   TC   SXM   C   1   1   HM   1   XF   1   1   PM

Figure 7.10 (a) Status register (ST0) bit assignment and (b) status register
(ST1) bit assignment.

The significance of the various bits of ST0 and ST1 is as follows:

Auxiliary Register Pointer (ARP): It selects the AR to be used in indirect
  addressing. When the ARP is loaded, the previous ARP value is copied to the
  auxiliary register buffer (ARB) in ST1.
Overflow Flag Bit (OV): This bit indicates that an arithmetic operation
  overflows in the ALU.
Overflow Mode Bit (OVM): This bit enables/disables the accumulator overflow
  saturation mode in the ALU.
Interrupt Mode Bit (INTM): This bit globally masks or enables all interrupts.
Data Memory Page Pointer Bits (DP): These bits specify the address of the
  current data memory page.
Auxiliary Register Buffer (ARB): It holds the previous value contained in the
  ARP in ST0. Whenever the ARP is loaded, the previous ARP value is copied to
  the ARB, except when using the LST #0 instruction. When the ARB is loaded
  using the LST #1 instruction, the same value is also copied to the ARP.
On-chip RAM Configuration Control Bit (CNF): It enables the on-chip dual
  access RAM block 0 (DARAM B0) to be addressable in data memory space or
  program memory space.
  (i) If CNF = 0, the on-chip DARAM B0 is mapped into data memory space.
      The CNF bit can be cleared by the CLRC CNF instruction.
  (ii) If CNF = 1, the on-chip DARAM B0 is mapped into program memory
      space. The CNF bit can be set by the SETC CNF instruction.
Test/Control Flag Bit (TC): It stores the results of the ALU or parallel logic
  unit (PLU) test bit operation. The status of the TC bit determines if the
  conditional branch, call and return instructions are to be executed.
Sign Extension Mode Bit (SXM): It enables/disables sign extension of an
  arithmetic operation. The SXM bit does not affect the operation of certain
  arithmetic or logic instructions: the ADDC, ADDS, SUBB and SUBS
  instructions.
Carry Bit (C): This 1 bit field indicates an arithmetic operation carry or
  borrow in the ALU. The single bit shift and rotate instructions affect the C
  bit.
Hold Mode Bit (HM): This 1 bit field determines whether the central processing
  unit (CPU) stops or continues execution when acknowledging an active HOLD
  signal.
Pin Status Bit (XF): It determines the level of the external flag (XF) output
  pin.
Product Shift Mode Bits (PM): These bits determine the product shifter
  (P-SCALER) mode and shift value for the PREG output into the ALU.

Table 7.2 gives the functions of the PM bits.

Table 7.2 PM bits and their functions.

PM bits (b1 b0)   P-scaler mode for PREG output
0 0               No shift
0 1               Left shifted by 1 bit, LSB zero-filled
1 0               Left shifted by 4 bits, LSBs zero-filled
1 1               Right shifted by 6 bits, 6 LSBs lost

7.6.8 On-Chip Memory

The 'C5x has a total address range of 224K words x 16 bits. The memory space
is divided into four individually selectable memory segments:
(i) 64K-word program memory space.
(ii) 64K-word local data memory space.

(iii) 64K-word input/output ports.
(iv) 32K-word global data memory space.

The large on-chip memory of the 'C5x includes:
(i) Program Read-Only Memory (ROM).
(ii) Data/Program Dual-Access RAM (DARAM).
(iii) Data/Program Single-Access RAM (SARAM).

Program ROM

The 'C5x DSPs carry a 16 bit on-chip maskable programmable ROM. The program
memory can reside both on and off chip. The device configuration can be
changed by setting and clearing the MP/MC bit in the processor mode status
register (PMST). If the MP/MC pin is low, then the 'C5x selects on-chip ROM.
If the MP/MC pin is high, then 'C5x devices start execution from off-chip
memory.

Data/Program Dual-Access RAM

All 'C5x DSPs carry a 1056-word x 16 bit on-chip dual access RAM (DARAM).
The DARAM is divided into three memory blocks:

(i) 512-word data or program DARAM block B0.
(ii) 512-word data DARAM block B1.
(iii) 32-word data DARAM block B2.

DARAM blocks B1 and B2 are always configured as data memory. DARAM block B0
can be configured as data or program memory.

Data/Program Single-Access RAM

All 'C5x DSPs except the 'C52 carry a 16 bit on-chip single access RAM of
various sizes, which is divided into 2K word and 1K word blocks that are
contiguous in program or data memory space. The SARAM can be configured by
software in one of the following three ways:

(i) All SARAM are configured as data memory.
(ii) All SARAM are configured as program memory.
(iii) SARAM are configured as both data memory and program memory.

All 'C5x CPUs support parallel accesses to these SARAM blocks. However, one
SARAM block can be accessed only once per machine cycle. In other words, the
CPU can read from or write to one SARAM block while accessing another SARAM
block.

7.6.9 On-Chip Peripherals

The on-chip peripherals available in the 'C5x are as follows:
(i) Clock generator.
(ii) Hardware timer.
(iii) Software-programmable wait-state generators.
(iv) Parallel input/output ports.
(v) Host Port Interface (HPI).
(vi) Serial port.
(vii) Buffered Serial Port (BSP).
(viii) Time-Division Multiplexed (TDM) serial port.
(ix) User-maskable interrupts.

Clock Generator

The clock generator consists of an internal oscillator and a Phase Locked Loop
(PLL) circuit. The generator can be driven internally by a crystal resonator
circuit or driven externally by a clock source. The PLL circuit can generate
an internal CPU clock by multiplying the clock source by a specific factor, so
a clock source with a frequency lower than that of the CPU can be used.

Hardware Timer

A 16 bit hardware timer with a 4 bit prescaler is available. The timer can be
stopped, restarted, reset or disabled by specific status bits. The timer
counter register (TIM) gives the present count of the timer. The Timer Period
Register (PRD) defines the period for the timer. The Timer Control Register
(TCR) controls the operations of the timer.

Software-Programmable Wait-State Generators

Software-programmable wait-state logic is incorporated in the 'C5x, allowing
wait-state generation without any external hardware for interfacing with
slower off-chip memory and I/O devices. It consists of multiple wait-state
generating circuits. Each circuit is user-programmable to operate in different
wait states for off-chip memory accesses.

Parallel Input/Output Ports

The 'C5x has 64K input/output ports. Sixteen of these ports are memory mapped
in data memory space. Each of the input/output ports can be addressed by the
IN or OUT instruction or any instruction that reads from or writes to data
memory. The IS signal indicates a read or write operation through an
input/output port.

On-Chip Memory Protection


Serial Port
The' C5x DSPs have a maskable option that protects the contents of on-chip mem- Three different kinds of serial ports are available:
ories. When the related bit is set, no externally originating instruction can access
the on-chip memory spaces. (i) General - purpose serial port.
(ii) Time-division multiplexed serial port.
(iii) Buffered serial port.

Each 'C5x contains at least one general-purpose, high-speed, synchronous,
full-duplex serial port interface that provides direct communication with serial
devices such as codecs, serial analog-to-digital converters and other serial
systems. It is capable of operating at up to one-fourth the machine cycle rate.
Five 16 bit registers that control and operate the serial port interface are as
follows:
(i) Serial Port Control (SPC): It contains the mode control and status bits of the
serial port.
(ii) Data Receive Register (DRR): It holds the incoming serial data.
(iii) Data Transmit Register (DXR): It holds the outgoing serial data.
(iv) Data Transmit Shift Register (XSR): It controls the shifting of the data from
the DXR to the output pin.
(v) Data Receive Shift Register (RSR): It controls the storing of the data from the
input pin to the DRR.

Host Port Interface

It is available on the 'C57S and 'LC57. It is an 8 bit parallel input/output port that
provides an interface to a host processor. Information is exchanged between the
DSP and the host processor through on-chip memory that is accessible to both the
host processor and the 'C57.

Buffered Serial Port

It is available on the 'C56 and 'C57. It is a full-duplex, double-buffered serial
port with an auto-buffering unit. It provides flexibility on the data stream length. It
supports high speed data transfer and reduces interrupt latencies.

TDM Serial Port

It is available on the 'C50, 'C51 and 'C53. It is a full-duplex serial port that can
be configured by software either for synchronous operations or time-division
multiplexed operations. It provides a simple and efficient interface for
multiprocessing applications.

User-Maskable Interrupts

The 'C5x has four external interrupts (INT1 - INT4) and five internal interrupts
(a timer interrupt and four serial port interrupts), all of which are user maskable.
When an interrupt service routine (ISR) is executed, the contents of the program
counter are saved on an 8-level hardware stack and the contents of 11 specific CPU
registers (ACC, ACCB, PREG, ST0, ST1, PMST, TREG0, TREG1, TREG2, INDX and ARCR)
are saved in a one-deep stack (shadow registers). When a return from interrupt
instruction is executed, the CPU register contents are restored.

7.7 Addressing Modes

The 'C5x supports the following six addressing modes:
1. Direct addressing.
2. Indirect addressing.
3. Immediate addressing.
4. Dedicated-register addressing.
5. Memory-mapped register addressing.
6. Circular addressing.

7.7.1 Direct Addressing

The data memory used in 'C5x processors is split into 512 pages, each 128 words
long. In the direct addressing mode of the 'C5x, only the lower-order 7 bits of the
address are specified in the instruction. The upper 9 bits are taken from the data
memory page pointer (DP). The DP in ST0 holds the address of the current data
memory page, as shown in Figure 7.11.

[Figure: the 9 bit DP supplies the MSBs (bits 15-7) and the 7 LSBs of the 16 bit instruction register supply bits 6-0 (the direct memory address).]
Figure 7.11 16 bit data memory address bus.

7.7.2 Indirect Addressing

The auxiliary registers (AR0 - AR7) are used for accessing data using the indirect
addressing mode. In indirect addressing, out of the eight ARs, the one which is
currently used for accessing data is denoted by the register ARP. The contents of
ARP can be temporarily stored in the ARB register. The indirect addressing mode
permits the AR used for addressing to be updated automatically either after or
before the operand is fetched. Hence, a separate instruction is not required to
update the AR. However, incrementing or decrementing the contents of an AR by an
8 bit constant requires the SBRK and ADRK instructions, i.e.,

SBRK #K → subtracts the constant K from the AR pointed to by ARP.
ADRK #K → adds the constant K to the AR pointed to by ARP.

The symbols used to indicate the indirect addressing mode and the action taken
after executing the instruction are given in Table 7.3.
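
The address formation used by direct addressing (Section 7.7.1) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name `direct_address` is hypothetical, and the sketch simply concatenates the 9 bit DP with the 7 LSBs of the instruction word as the text describes.

```python
def direct_address(dp: int, instruction_word: int) -> int:
    """Form a 16 bit data memory address from DP and an instruction word.

    Hypothetical helper for illustration only: the 9 bit data page
    pointer supplies bits 15-7 and the 7 LSBs of the instruction word
    (the dma field) supply bits 6-0.
    """
    page = dp & 0x1FF                  # DP is a 9 bit field (512 pages)
    dma = instruction_word & 0x7F      # 7 bit offset within a 128-word page
    return (page << 7) | dma


if __name__ == "__main__":
    # Page 6, offset 0Ah selects data memory location 030Ah.
    print(hex(direct_address(6, 0x0A)))
```

With DP = 6 and an offset of 0Ah the sketch selects location 030Ah, which is the location used in the OPL example of Section 7.7.5.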
Table 7.3 Various options in the indirect addressing.

Symbol    Value of AR after instruction execution
*         AR unaltered
*+        AR incremented by 1
*-        AR decremented by 1
*0+       AR incremented by the content of INDX
*0-       AR decremented by the content of INDX
*BR0+     AR incremented by the content of INDX with reverse carry propagation
*BR0-     AR decremented by the content of INDX with reverse carry propagation

For example, let ARP = 2, AR2 = 1250h and INDX = 2h, and let the data memory
locations 1240h - 1260h be filled with the data 2345h. The contents of ACC and
AR2 after each instruction in the following sequence of LACC instructions is
executed are given below:

Instruction executed    Contents after execution
                        ACC        AR2
(i)   LACC *, 0         2345h      1250h
(ii)  LACC *+, 1        468Ah      1251h
(iii) LACC *-, 2        8D14h      1250h
(iv)  LACC *0+, 4       23450h     1252h
(v)   LACC *0-, 3       11A28h     1250h

7.7.3 Immediate Addressing

The immediate addressing mode can be used to load either a 16 bit constant or a
constant of length 13, 9 or 8 bits. Accordingly, it is referred to as long immediate
or short immediate addressing mode. This mode is indicated by the symbol #. For
example:

ADD #55h → adds 55h to the accumulator (ACC) [short immediate addressing].
ADD #1267h → adds 1267h to ACC [long immediate addressing].

Therefore, in immediate addressing, the instruction word(s) contain the value
of the immediate operand. The 'C5x has both 1-word (8 bit, 9 bit and 13 bit
constant) short immediate instructions and 2-word (16 bit constant) long immediate
instructions. Table 7.4 lists the instructions that support immediate addressing.

Table 7.4 Instructions that support immediate addressing.

Short immediate addressing
  8 bit constant:   ADD, ADRK, LACL, LAR, RPT, SBRK, SUB
  9 bit constant:   LDP
  13 bit constant:  MPY
Long immediate addressing (16 bit constant)
  ADD, AND, APL, CPL, LACC, LAR, MPY, OPL, OR, RPT, RPTZ, SPLK, SUB, XOR, XPL

Short Immediate Addressing

In short immediate instructions, the operand is contained within the instruction
machine code. For example:
ADD #0FEh - the lower 8 bits are the operand and are added to the ACC.

Long Immediate Addressing

In long immediate instructions, the operand is contained in the second word of a
2-word instruction. There are two long immediate addressing modes: one-operand
instructions and two-operand instructions. For example:
ADD #1245h - the second word (1245h) of the 2-word instruction is added to ACC.
BLDD #2345h, 012h - the source address (operand 1) is fetched via PAB, the
destination address (operand 2) uses the direct addressing mode. Bits 15-8 of the
machine code contain the opcode. Bit 7 with a value of 0 defines direct
addressing and bits 6-0 contain the dma.

7.7.4 Memory-Mapped Register Addressing

The registers corresponding to data memory page 0 are referred to as
memory-mapped registers (MMRs). With memory-mapped register addressing, the MMRs
can be modified without affecting the current data page pointer value. The
memory-mapped register addressing mode operates like the direct addressing mode,
except that the 9 MSBs of the address are forced to 0 instead of being loaded with
the contents of the DP. This allows the memory-mapped registers of data page 0 to
be modified directly without the overhead of changing the DP or auxiliary register.
The following instructions operate in the memory-mapped register addressing
mode.
• LAMM - Load accumulator with memory-mapped register.
• LMMR - Load memory-mapped register.
• SAMM - Store accumulator in memory-mapped register.
• SMMR - Store memory-mapped register.

For example:

(i) LMMR AR0, #1500h

                      Before execution    After execution
Data memory 1500h     6785h               6785h
AR0                   (illegible)         6785h

(ii) SMMR AR0, #1600h

                      Before execution    After execution
Data memory 1600h     (illegible)         1325h
AR0                   1325h               1325h

(iii) LAMM *

                      Before execution    After execution
ARP                   1                   1
AR1                   0325h               0325h
Data memory 325h      1234h               1234h
Data memory 25h       2345h               2345h
ACC                   (illegible)         2345h

After execution of the LAMM * instruction, the value in data memory location 25h
is loaded into ACC. 25h corresponds to the lower-order 7 bits of AR1, and the
higher bits of the address are made 0 as the MMRs correspond to page 0.

7.7.5 Dedicated-Register Addressing

The dedicated-register addressing mode operates like the long immediate
addressing mode, except that the address comes from one of two special-purpose
memory-mapped registers in the CPU:
(i) Block Move Address Register (BMAR).
(ii) Dynamic Bit Manipulation Register (DBMR).

The advantage of this addressing mode is that the address of the block of memory
to be acted upon can be changed during execution of the program. For example:

1. BLDD BMAR, DAT100 ; DP = 0
If BMAR contains the value 300h, then the content of data memory location
300h is copied to data memory location 100 on the current data page.
2. OPL DAT10 ; DP = 6
If DBMR contains the value 0FFF0h and the address 030Ah contains the
value 01h, then the content of data memory location 030Ah is ORed with
the content of DBMR. The resulting value 0FFF1h is stored back in memory
location 030Ah.

7.7.6 Circular Addressing

Circular addressing is the most sophisticated 'C5x addressing mode. Many
algorithms such as convolution, correlation and finite impulse response (FIR)
filters can use a circular buffer in memory to implement a sliding window, which
contains the most recent data to be processed. There are five memory-mapped
registers that control the circular buffer operation and they are:
• CBSR1 - circular buffer 1 start register.
• CBSR2 - circular buffer 2 start register.
• CBER1 - circular buffer 1 end register.
• CBER2 - circular buffer 2 end register.
• CBCR - circular buffer control register.
The 8 bit CBCR enables and disables the circular buffer operation. First, the start
and end addresses are loaded into the corresponding buffer registers. Next, a value
between the start and end registers for the circular buffer is loaded into an AR. The
corresponding circular buffer enable bit in the CBCR should then be set.

7.8 Instruction Sets

7.8.1 Addition/Subtraction Instructions

In the addition/subtraction instructions of the 'C5x, one of the operands is ACC. The
other operand can be PREG, ACCB or the content of memory fetched using one of
the addressing modes.

7.8.1.1 Addition Instruction

1. ADD dma, [shift] (direct addressing): The contents of the data memory address
(dma) or a 16 bit constant are shifted left as defined by the shift code and added
to the contents of the accumulator and the result is stored in ACC. For example:
ADD 55h, 2: ACC is added with the content of the data memory with dma 55h in
the current page after shifting it left by two positions.
ADD {ind}, [shift], [ARn] (indirect addressing); short immediate -
ADD #K and long immediate - ADD #lk, [shift].
Examples

(a) ADD #23h: ACC is added with the immediate constant 23h.
(b) ADD #2345h, 2: the data 2345h is left shifted by two positions before it is
added to ACC.

                      Before execution    After execution
ACC                   1234h               9F48h

(c) ADD *, 1, AR2

                      Before execution    After execution
ARP                   1                   2
AR1                   2100h               2100h
Data memory 2100h     4563h               4563h
ACC                   1234h               9CFAh

2. ADRK #K: The 8 bit immediate constant value is added to the current auxiliary
register (AR). The result is stored in the AR. For example:

ADRK #25h

                      Before execution    After execution
ARP                   2                   2
AR2                   4200h               4225h

3. ADCB: The contents of the accumulator buffer (ACCB) and the value of the C
bit are added to the contents of the ACC and the result is stored in ACC. The
contents of ACCB are unaffected. The C bit is set if the result of the addition
generates a carry; otherwise the C bit is cleared. For example:

                      Before execution    After execution
C                     1                   0
ACC                   1235h               1238h
ACCB                  2h                  2h

Other addition instructions: ADDB, ADDC, ADDS and ADDT.

7.8.1.2 Subtraction Instructions

1. SBB: The contents of the accumulator buffer (ACCB) are subtracted from the
contents of the ACC. The result is stored in the ACC and the contents of the
ACCB are unaffected. The C bit is cleared if the result generates a borrow,
otherwise the C bit is set. For example:

                      Before execution    After execution of SBB instruction
C                     0                   1
ACC                   2000h               1000h
ACCB                  1000h               1000h

2. SUB dma, [shift]: The content of the data memory address (dma) in the current
page, shifted left by the specified shift count, is subtracted from the ACC. For
example:
(i) SUB 25h, 2: The content of the data memory with dma 25h, shifted left by two
positions, is subtracted from the ACC.
(ii) SUB *, 2: The content of the location pointed to by the current AR, shifted
left by two positions, is subtracted from the ACC.
Other subtraction instructions: SBBB, SUBB, SUBC, SUBS, SUBT and SBRK.

7.8.2 Multiplication Instruction

In multiply instructions, one of the operands is taken from TREG0 and the other
operand is specified using one of the addressing modes.
1. MPY: Multiply numbers in 2's complement form.
- Direct addressing: MPY dma - The contents of TREG0 are multiplied by the
contents of the data memory address and the result is stored in the PREG.
- Indirect addressing: MPY {ind}, [ARn]
- Short immediate: MPY #K
- Long immediate: MPY #lk
2. MAC: Multiply and accumulate.
MAC pma, dma (direct addressing): The contents of the product register
(PREG) are shifted, as defined by the PM bits, and added to the ACC. The result
is stored in the ACC. The contents of the data memory address (dma) are loaded
into TREG0. The contents of the dma are multiplied by the contents of pma.
The result is stored in PREG. The C bit is set if the result of the addition
generates a carry; otherwise the C bit is cleared.
3. MACD: Multiply and accumulate with data move.
MACD pma, dma (direct addressing): The contents of dma are multiplied by the
contents of pma and the result is stored in PREG. The previous value of the
PREG is shifted as specified by the PM bits and added to the ACC before the
PREG is loaded with the product. For example:

MACD 0FF0Fh, 15h

                        Before execution    After execution
Data memory 315h        0045h               0045h
Data memory 316h        (illegible)         0045h
Program memory FF0Fh    (illegible)         (illegible)
TREG0                   1234h               0045h
PREG                    4563h               (illegible)
ACC                     1234h               5797h

Other multiplication instructions:
1. MPYU: multiply unsigned numbers.
2. MPYA: multiply and add the previous product to ACC.
3. MPYS: multiply and subtract the previous product from ACC.
4. MADS: multiply and add the product to ACC, with the address of an operand
given by BMAR.
5. MADD: multiply and add the product to ACC, with the address of an operand
given by BMAR, and move the data in on-chip RAM by one word.

7.8.3 Shift/Logical Instructions

7.8.3.1 Logical Instructions

AND Instruction

1. Direct addressing: AND dma.
2. Indirect addressing: AND {ind}, [ARn].
3. Long immediate: AND #lk, [shift].

One of the operands for the AND instruction is ACC. The other operand can
be the content of a memory location specified using direct or indirect
addressing. Alternatively, a long constant can be specified using immediate
addressing. For example:

AND #2345h, 2

                      Before execution    After execution
ACC                   1234h               0014h

Other AND instruction is ANDB: The contents of the ACC are ANDed with the
contents of the ACCB. The result is stored in the ACC and the contents of the ACCB
are unaffected.

OR Instruction

The ACC is ORed with a long constant K or with the content of a dma or pma, and
the result is stored in the ACC.

1. Direct addressing: OR dma.
2. Indirect addressing: OR {ind}, [ARn].
3. Long immediate: OR #lk, [shift].

For example: OR *, AR1

                      Before execution    After execution
ARP                   0                   1
AR0                   1100h               1100h
Data memory 1100h     1235h               1235h
ACC                   (illegible)         2346h

Other OR instruction is ORB: The contents of the ACC are ORed with the contents
of the accumulator buffer.

XOR Instruction

The ACC is XORed with a long constant K or with the content of a dma or pma, and
the result is stored in the ACC.

1. Direct addressing: XOR dma.
2. Indirect addressing: XOR {ind}, [ARn].
3. Long immediate: XOR #lk, [shift].

For example: XOR 50h

                      Before execution    After execution
DP                    1FFh                1FFh
Data memory FFD0h     4563h               4563h
ACC                   1234h               5757h

Other XOR instruction is XORB: The contents of the ACC are XORed with the
contents of the accumulator buffer (ACCB).
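
The AND and XOR worked examples above can be checked with a short Python model. This is a minimal sketch, not the processor's actual datapath: the helper names are hypothetical, the accumulator is modelled as a plain 32 bit unsigned value, and the long-immediate operand is assumed to be shifted left with zeros filling the vacated bits.

```python
MASK32 = 0xFFFFFFFF  # the 'C5x accumulator is modelled as 32 bits wide


def and_long_immediate(acc: int, lk: int, shift: int = 0) -> int:
    """Model of AND #lk, shift: AND the ACC with the left-shifted constant."""
    operand = (lk << shift) & MASK32
    return acc & operand


def xor_data(acc: int, data_word: int) -> int:
    """Model of XOR dma: XOR the ACC with a 16 bit data memory word."""
    return (acc ^ (data_word & 0xFFFF)) & MASK32


if __name__ == "__main__":
    print(hex(and_long_immediate(0x1234, 0x2345, 2)))  # AND #2345h, 2
    print(hex(xor_data(0x1234, 0x4563)))               # XOR 50h example
```

For AND #2345h, 2 the shifted operand is 8D14h, so only the bits common to 1234h and 8D14h survive; the XOR example combines 1234h with the word 4563h held at FFD0h.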
Shift Instructions

ROL: Rotate ACC left once: rotate the contents of the ACC left by 1 bit. The
value of the carry bit is shifted into the LSB of the ACC. The MSB of the
original ACC is shifted into the C bit.
ROLB: Rotate the contents of both ACC and ACCB left once.
ROR: Rotate the contents of the ACC right once.
RORB: Rotate the contents of both ACC and ACCB right once.

Other Shift/Logical Instructions

BSAR: Shift ACC right by n (1 - 16) bits.
EXAR: Exchange the contents of ACC and ACCB.
NEG: Find 2's complement of ACC.
CMPL: Find 1's complement of ACC.
BIT: Copy bit n of a memory location onto TC.
BITT: Copy bit n of a memory location onto the TC bit in ST1. n is given by the
4 LSBs of TREG2.

7.8.4 Load/Store Instructions

LACB: Load the contents of the ACCB to ACC.
LACC: Load data memory value, with left shift, to ACC (LACC dma, shift).
    : Load long immediate, with left shift, to ACC (LACC #lk, shift).
LACL: Load data memory value to ACCL and zeros filled to ACCH (LACL dma).
    : Load short immediate to ACCL and zeros filled to ACCH (LACL #K).
LACT: Load data memory value with left shift specified by TREG1 to ACC.
LAMM: Load contents of memory-mapped register to ACCL; zeros filled to ACCH.
SACB: Store the contents of the ACC in ACCB.
SACH: Store ACCH, with left shift, in data memory address.
SACL: Store ACCL, with left shift, in data memory address.
SAMM: Store ACCL in memory-mapped register.
LAR: Load data memory value to ARx (auxiliary register).
SAR: Store ARx in data memory location.
LDP: Load data memory value to DP (data page pointer) bits.
MAR: Modify auxiliary register.
SPLK: Store long immediate in data memory location.
LPH: Load data memory value to PREG high-order word.
LT: Load data memory value to TREG0.
PAC: Load PREG, with shift specified by PM bits, to ACC.
SPH: Store PREG high-order word, with shift specified by PM bits, in data memory
location.
LST: Load data memory value to ST0.
   : Load data memory value to ST1.
SST: Store ST0 in data memory location.
   : Store ST1 in data memory location.

For example:

(i) LST #0, 00h

                      Before execution    After execution
Data memory 300h      (illegible)         (illegible)
ST0                   (illegible)         (illegible)
ST1                   (illegible)         (illegible)

(ii) LST #1, 00h

                      Before execution    After execution
Data memory 300h      1025h               1025h
ST0                   5E07h               1E07h
ST1                   1234h               1025h

(iii) LST #1, *, AR3 (indirect addressing)

                      Before execution    After execution
Data memory 300h      0587h               0587h
ST0                   5E07h               1E07h
ST1                   09A0h               0587h
ARP                   (illegible)         (illegible)

(iv) LT 300h

                      Before execution    After execution
Data memory 300h      60h                 60h
TREG0                 (illegible)         60h

(v) LDP #30h

                      Before execution    After execution
DP                    (illegible)         30h
(vi) LACC *, 4

                      Before execution    After execution
ARP                   (illegible)         (illegible)
AR3                   (illegible)         (illegible)
Data memory 100h      (illegible)         (illegible)
ACC                   (illegible)         (illegible)

(vii) LACB

                      Before execution    After execution
ACC                   0000 1234h          1234 5678h
ACCB                  1234 5678h          1234 5678h

(viii) LAR AR4-

                      Before execution    After execution
ARP                   (illegible)         (illegible)
AR4                   300h                (illegible)
Data memory 300h      (illegible)         (illegible)

The instructions SAMM, LAMM, SMMR and LMMR have already been
explained in Section 7.7.4.

7.8.5 Move Instructions

The data move instruction copies the data from one memory location to the next
higher location. It can use either direct or indirect addressing mode.

BLDP: Block move data from data memory to program memory.
BLDD: Block move data from one data memory to another.
BLPD: Block move data from program memory to data memory.
TBLR: Block move data from program memory to data memory. The program
memory address is contained in the ACC lower-order word. The dma can be
specified using either direct or indirect addressing.
TBLW: Block move data from data memory to program memory. The program
memory address is contained in ACCL. The dma can be specified using
either direct or indirect addressing.

For example:

(i) BLDP 00h

                        Before execution    After execution
Data memory 400h        2523h               2523h
BMAR                    1850h               1850h
Program memory 1850h    1254h               2523h

(ii) BLPD 00h

                        Before execution    After execution
Data memory 300h        (illegible)         1100h
Program memory 2850h    1100h               1100h
BMAR                    2850h               2850h

(iii) BLPD #250h, 00h

                        Before execution    After execution
Data memory 300h        (illegible)         (illegible)
Program memory 250h     (illegible)         (illegible)

7.8.6 Branch and Call Instructions

The 'C5x has both conditional and unconditional branch and call instructions. The
branch and call instructions permit more than one condition to be tested using a
single instruction. Branching occurs only if all the conditions are satisfied.

B: Branch unconditionally.
BACC: Branch unconditionally to the address given by ACC.
BACCD: Delayed branch to pma specified by ACCL.
BCND: Branch conditionally.
BCNDD: Delayed branch conditionally to pma.
BANZ: Branch conditionally if ARn not zero.
BANZD: Delayed branch to pma if AR not zero.
BD: Delayed branch unconditionally to pma.
CALA: Call a subroutine using indirect addressing.
CALAD: Delayed call to subroutine addressed by ACCL.
CALL: Call a subroutine unconditionally.
CALLD: Delayed call to subroutine unconditionally.
CC: Call a subroutine conditionally.
CCD: Delayed call to subroutine conditionally.

7.8.7 PUSH and POP Instructions

PUSH: Pushes the values down one level in the seven lower locations of the
stack. The contents of ACCL are copied to the top of the stack.
PUSHD: Pushes a data memory location to the top of the stack instead of ACC,
after pushing the contents of the stack one level down.
POP: Pops the top of the stack to ACC.
POPD: Pops the top of the stack to a data memory location.

When the stack is popped, the bottom word is left unaffected and hence the bottom
two words contain the same values.

7.8.8 RET Instructions

RET: Return from subroutine: The contents of the top of the stack are copied
to the program counter. The stack is popped one level after the contents
are copied.
RETCD: Delayed return from subroutine conditionally.
RETD: Delayed return from subroutine.

7.8.9 Repeat Instructions

RPT: Repeat next instruction: The iteration count can be specified using direct
addressing, indirect addressing, short as well as long immediate addressing.
1. Direct addressing: RPT dma.
2. Indirect addressing: RPT *.
3. Short immediate addressing: RPT #7.
4. Long immediate addressing: RPT #2345h.
RPTB: Block of instructions repeated. The number of repeats is specified in the
block repeat count register (BRCR).
RPTZ: Repeat preceded by clearing both ACC and PREG.
For example:
RPTZ #K: Clears both ACC and PREG and then executes the next instruction K + 1
times.

7.8.10 IN and OUT Instruction

IN: Reads a 16 bit number from the input port and stores it in the data memory
location.
OUT: Reads a 16 bit number from data memory and writes it onto the output port.

7.8.11 NORM Instruction

It is useful for converting a fixed point number into a floating point number. The
number to be converted is stored in ACC. In a sign-extended representation, the
MSB is the sign bit and the bits that merely repeat it carry no information; only
the remaining bits denote the magnitude. Every time the NORM instruction is
executed, it removes one extra bit in ACC that corresponds to sign extension. By
repeated use of the NORM instruction, the ACC can be made to contain only the
magnitude. The exponent is stored in the current AR.

7.9 Architecture of 54x

The block diagram of the TMS320C54x internal hardware is shown in Figure 7.12. It
consists of the CPU containing the various functional units such as the ALU, MAC
unit, Exp encoder, barrel shifter, memory-mapped registers, system control
interface, peripheral interface, memory and external interface, program address
generation logic (PAGEN) and data address generation logic (DAGEN), and eight
16 bit buses which interconnect these units. The comparison of the features of 5x
and 54x is tabulated in Table 7.5.

7.9.1 Bus Structure

The 'C54x consists of eight 16 bit buses, i.e., four program/data buses and four
address buses. The program bus (PB) carries the instruction code and immediate
operands from program memory.

Data buses (CB, DB and EB)

CB and DB carry the operands that are read from data memory. EB carries the data
to be written to memory. Address buses (PAB, CAB, DAB and EAB) carry the
addresses needed for instruction execution.

7.9.2 Internal Memory Organization

The '54x memory is organized into three individually selectable spaces: program,
data and input/output spaces. It contains both RAM (DARAM and SARAM) and
ROM.
On-chip ROM: The on-chip ROM is part of the program memory space (20K word)
and in some cases, part of the data memory. Therefore the ROM may be mapped
into both data and program space (8K words).
On-chip Dual-Access RAM (DARAM): It consists of several blocks and each
block can be accessed twice per machine cycle; the CPU can read from and write
to a single block of DARAM in the same cycle. It is mapped into program space
(5K word).
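
The repeated normalization performed by NORM (Section 7.8.11) can be sketched as follows. This is a simplified model under stated assumptions, not a cycle-accurate description: the function names are hypothetical, the accumulator is treated as a 32 bit two's-complement value, one step shifts left only while bits 31 and 30 are equal (a redundant sign bit is present), and the exponent is assumed to be counted by incrementing a counter that stands in for the current AR.

```python
MASK32 = 0xFFFFFFFF


def norm_step(acc: int, exponent: int):
    """One NORM-like step: drop one redundant sign bit if there is one.

    Returns (acc, exponent, done). 'done' becomes True once bits 31 and 30
    differ (the value is normalized) or the accumulator is zero.
    """
    acc &= MASK32
    bit31 = (acc >> 31) & 1
    bit30 = (acc >> 30) & 1
    if acc != 0 and bit31 == bit30:
        return (acc << 1) & MASK32, exponent + 1, False
    return acc, exponent, True


def normalize(acc: int):
    """Repeat norm_step until the accumulator holds a normalized mantissa."""
    exponent = 0
    done = False
    while not done:
        acc, exponent, done = norm_step(acc, exponent)
    return acc, exponent


if __name__ == "__main__":
    mantissa, exponent = normalize(0x00001234)
    print(hex(mantissa), exponent)
```

In this model the value 00001234h needs 18 left shifts before bit 30 is set, so the mantissa becomes 48D00000h and the shift count 18 plays the role of the exponent kept in the AR.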
[Figure: block diagram of the CPU showing the program address generation logic (PAGEN: PC, IPTR, RC, BRC, RSA, REA), the data address generation logic (DAGEN: ARAU0, ARAU1, AR0-AR7, ARP, BK, DP, SP), the memory and external interface, the peripheral interface and the 17 x 17 multiplier.]
Figure 7.12 Block diagram of 54x internal hardware.

Table 7.5 Comparison of the features of 5x and 54x.

Description | 5x | 54x
Name of program bus | One, PB | One, PB
Name of the data bus | One, DB | DB and CB (for read), EB (for write)
Name of address buses | PAB, DAB | PAB, CAB, DAB, EAB
Main ALU | 32 bit ALU | 40 bit ALU
Accumulators | 32 bit ACC | 40 bit ACCA and ACCB
Barrel shifter | 0-16 bit left shift, 0-16 bit right shift | 40 bit: 0-31 left shift, 0-16 right shift
Multiplier | 16 x 16 bit | 17 x 17 bit
Adder | 32 bit | 40 bit
Auxiliary register ALU | ARAU | ARAU0 & ARAU1
Auxiliary registers | AR0-AR7 | AR0-AR7
Stack pointer (SP) | Not available | 16 bit: SP
Circular buffer register | Two 16 bit start & end registers | 16 bit BK
Status registers | 16 bit PMST, ST0, ST1 | 16 bit PMST, ST0, ST1
Block repeat registers | 16 bit BRCR, PASR, PAER | 16 bit BRC, RSA, REA
Program counter | 16 bit PC | 16 bit PC
Extended program memory | Not available | 7 bit XPC
Interrupt registers | 16 bit IMR and IFR | 16 bit IMR and IFR
General purpose I/O | BIO and XF | Same as that of 5x
Wait state generator | PDWSR | SWWSR
Hardware timer | 16 bit timer | Same as that of 5x
Clock generator | PLL based | Same as that of 5x
Synchronous serial port | Full duplex and double buffered | Same as that of 5x
TDM serial ports | Up to 7 devices using TDM can communicate serially | Same as that of 5x
Buffered serial port | Standard 5x serial port with additional auto buffering unit | Same as that of 5x
Host port interface | 8 bit standard HPI | 8 bit standard HPI or enhanced 8 bit and 16 bit HPI
Multichannel buffered serial port including internal programmable clock and other advanced features | Not available | Available
On-chip ROM for look up table for A law, μ law companding, sine wave generation | Not available | Available
7.36 Digital Signal Processing

Digital Signal Processor 7.37


On-chip Single-Access RAM (SARAM): It consists of several blocks. Each block is accessible once per machine cycle for either a read or a write. It is mapped into program space.

7.9.3 Central Processing Unit (CPU)

The 'C54x CPU contains:
• 40 bit Arithmetic Logic Unit (ALU).
• Two 40 bit accumulator registers.
• Barrel shifter.
• Multiply/Accumulate Block.
• 16 bit Temporary Register.
• 16 bit Transition Register (TRN).
• Compare, Select and Store Unit (CSSU).
• Exponent encoder.

Status Register (ST0)
The status register ST0 bits are shown in Figure 7.13.

Bit     15-13   12   11   10    9     8-0
Field   ARP     TC   C    OVA   OVB   DP

Figure 7.13 Status register ST0.

ARP (Auxiliary register pointer): It selects the auxiliary register to be used in indirect addressing.
TC (Test/control flag): TC stores the results of the arithmetic logic unit (ALU) test bit operations. TC is affected by the BIT, BITF, BITT, CMPM, CMPR, CMPS and SFTC instructions. The status (set or cleared) of TC determines if the conditional branch, call, execute and return instructions are executed.
C (Carry): C = 1 if the result of an addition generates a carry. C = 0 if the result of a subtraction generates a borrow.
OVA (Overflow flag for ACCA): OVA = 1 when an overflow occurs in either the ALU or the multiplier's adder in ACCA.
OVB (Overflow flag for ACCB): OVB = 1 when an overflow occurs in either the ALU or the multiplier's adder in ACCB.
DP (Data-memory page pointer): This 9 bit field is concatenated with the seven LSBs of an instruction word to form a direct memory address of 16 bits for single data memory operand addressing. This operation is done if the compiler mode bit in ST1 (CPL) = 0.

Status Register ST1
The status register ST1 bits are shown in Figure 7.14.

Bit     15     14    13   12   11     10   9     8     7     6      5      4-0
Field   BRAF   CPL   XF   HM   INTM   0    OVM   SXM   C16   FRCT   CMPT   ASM

Figure 7.14 Status register ST1.

BRAF (Block repeat active flag):
- if BRAF = 1, block repeat is active.
- if BRAF = 0, block repeat is deactivated. BRAF is cleared when the block-repeat counter (BRC) decrements below 0.
CPL (Compiler mode): If CPL = 0, the relative direct addressing mode using the data page pointer is selected; otherwise the mode using the stack pointer is selected.
XF (XF status): It indicates the status of the external flag (XF) pin. The SSBX instruction can set XF and the RSBX instruction can reset XF.
HM (Hold mode): HM indicates whether the processor continues internal execution when acknowledging an active HOLD signal.
INTM (Interrupt mode): If INTM = 0, all unmasked interrupts are enabled. If INTM = 1, all maskable interrupts are disabled.
OVM (Overflow mode): OVM determines what is loaded into the destination accumulator when an overflow occurs.
SXM (Sign-extension mode): If SXM = 0, sign extension is suppressed. If SXM = 1, data is sign extended before being used by the ALU.
C16 (Dual 16 bit/double precision arithmetic mode): It determines the arithmetic mode of the ALU's operation. C16 = 0 selects double precision arithmetic mode and C16 = 1 selects dual 16 bit arithmetic mode.
CMPT (Compatibility mode): It determines the compatibility mode for the ARP.
ASM (Accumulator shift mode): It specifies a shift value within the -16 through 15 range, coded in 2's complement form.

Processor Mode Status Register (PMST): It is loaded with memory-mapped register instructions such as STM.

Auxiliary Registers
The 'C54x also includes eight auxiliary registers and a software stack to enable a highly-optimized C compiler. The eight 16 bit auxiliary registers (AR0-AR7) can be accessed by the CPU and modified by the auxiliary register arithmetic units (ARAUs). The primary function of the auxiliary registers is to generate 16 bit addresses for data space.

Barrel Shifter
The 40 bit barrel shifter of the 'C54x can perform arithmetic and logical shifts by up to 31 bits left or by up to 16 bits right in a single instruction cycle. Shifter inputs can come directly from data memory or from either of the two accumulators. Shifter outputs can be sent to the ALU or stored in memory.
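The shift range quoted above and the ASM field of ST1 can be tried out on a host machine. The following is only a minimal Python sketch, not TI code: the function names are invented for illustration, and the 40 bit value is simply masked after each shift, which is an assumption about how bits shifted out are discarded.

```python
MASK40 = (1 << 40) - 1  # accumulators and the shifter are 40 bits wide


def to_signed40(value):
    """Interpret a 40 bit pattern as a two's complement integer."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def decode_asm(st1):
    """Return the shift count held in ST1 bits 4-0 (5 bit two's complement)."""
    asm = st1 & 0x1F
    return asm - 32 if asm & 0x10 else asm


def barrel_shift(acc, shift, arithmetic=True):
    """Shift a 40 bit value left (shift > 0) or right (shift < 0)."""
    if not -16 <= shift <= 31:
        raise ValueError("shift must lie in the range -16..31")
    acc &= MASK40
    if shift >= 0:
        return (acc << shift) & MASK40
    if arithmetic:
        # Arithmetic right shift copies the sign bit into the vacated bits.
        return (to_signed40(acc) >> -shift) & MASK40
    # Logical right shift fills the vacated bits with zeros.
    return acc >> -shift
```

For instance, an ASM field of 1FH decodes to a shift of -1, that is, one bit to the right.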
The barrel shifter is also used for scaling operations such as:
• Pre-scaling an input data memory operand or the ACC value before an ALU operation.
• Performing a logical or arithmetic shift of the ACC value.
• Normalizing the ACC.
• Post-scaling the ACC before storing the accumulator value into data memory.

Compare, Select and Store Unit (CSSU)

[Figure: old states 2J (met 1) and 2J+1 (met 2) feed new states J (new met 1) and J+STNB/2 (new met 2). If (met 1 + D1) > (met 2 + D2) then new met 1 = met 1 + D1, else new met 1 = met 2 + D2.]

Figure 7.15 Viterbi operator (STNB = no. of states, met = path metrics and D = branch metrics).

The compare, select and store unit (CSSU) is an application specific hardware unit dedicated to add/compare/select (ACS) operations of the Viterbi operator. Figure 7.15 shows the CSSU, which is used with the ALU to perform fast ACS operations. The CSSU allows the 'C54x to support various Viterbi butterfly algorithms used in equalizers and channel decoders. The add function of the Viterbi operator is performed by the ALU. This function consists of a double addition function (Met1 ± D1 and Met2 ± D2). Double addition is completed in one machine cycle if the ALU is configured for dual 16 bit mode by setting the C16 bit in ST1.
The CSSU implements the compare and select operation via the CMPS instruction, a comparator and the 16 bit transition (TRN) register. This operation compares two 16 bit parts of the specified accumulator and shifts the decision into bit 0 of TRN. This decision is also stored in the TC bit of ST0. Based on the decision, the corresponding 16 bit part of the accumulator is stored in data memory. The TRN register contains information on the path transition decisions to new states. This information can be used for a back-tracking routine that finds the optimal path, which results in decoding the code.

7.9.4 Pipeline

The 'C54x DSP has a six-level deep instruction pipeline. The six stages of the pipeline are independent of each other, which allows overlapping execution of instructions. During any given cycle, from one to six different instructions can be active, each at a different stage of completion. The six levels and functions of the pipeline structure are:
• Program pre-fetch.
• Program fetch.
• Decode.
• Access.
• Read.
• Execute.

7.9.5 On-Chip Peripherals

The 'C54x has the following on-chip peripherals:
• General-purpose input/output pins.
• Software-programmable wait state generator.
• Programmable bank-switching logic.
• Host port interface.
• Hardware timer.
• Clock generator.
• Serial ports.
  • Synchronous serial ports.
  • Buffered serial ports.
  • Time Division Multiplexed (TDM) serial ports.

7.9.6 Data Addressing

The 'C54x offers seven basic data addressing modes:
• Immediate addressing: uses the instruction to encode a fixed value.
• Absolute addressing: uses the instruction to encode a fixed address.
• Accumulator addressing: uses accumulator A to access a location in program memory as data.
• Direct addressing: uses 7 bits of the instruction to encode the lower 7 bits of an address. The 7 bits are used with the data page pointer (DP) or the stack pointer (SP) to determine the actual memory address.
• Indirect addressing: uses the ARs to access memory.
• Memory mapped register addressing: uses the memory-mapped registers without modifying either the current DP value or the current SP value.
• Stack addressing: manages adding and removing items from the stack.
During the execution of instructions using direct, indirect or memory mapped register addressing, the data-address generation logic (DAGEN) computes the addresses of data-memory operands.
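Two of the mechanisms described in this section, direct address formation and the add/compare/select step of the Viterbi operator, can be illustrated with a short host-side model. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, path metrics are treated as small non-negative integers, and the polarity of the decision bit (0 when the first path is selected) is an assumption rather than something stated in the text.

```python
def direct_address(offset7, dp=0, sp=0, cpl=0):
    """Form a 16 bit data address from the 7 bit offset in the instruction."""
    offset7 &= 0x7F
    if cpl == 0:
        # DP-relative: 9 bit page pointer concatenated with the 7 bit offset.
        return ((dp & 0x1FF) << 7) | offset7
    # SP-relative: the offset is added to the stack pointer.
    return (sp + offset7) & 0xFFFF


def acs(met1, d1, met2, d2, trn):
    """One add/compare/select step; returns (new metric, new TRN, TC)."""
    path1 = met1 + d1          # add (performed by the ALU)
    path2 = met2 + d2
    if path1 > path2:          # compare and select (rule of Figure 7.15)
        selected, decision = path1, 0
    else:
        selected, decision = path2, 1
    # The decision is shifted into bit 0 of the 16 bit TRN register.
    trn = ((trn << 1) | decision) & 0xFFFF
    return selected, trn, decision
```

With DP = 100H and an offset of 01H the first function returns 8001H, which is why the programs that follow refer to data memory addresses of the form 80xx after loading the data page pointer with 100H.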

7.10 Simple Assembly Language Program

Example 7.1
To write an assembly language program for 32 bit addition.

Solution In this example, two 32 bit numbers X and Y will be added as follows:

    X ==> X1 X0
    Y ==> Y1 Y0
    {+}   Z1 Z0

where X1 and Y1 are MSB (16 bit) of X and Y respectively; X0 and Y0 are LSB (16 bit) of X and Y respectively.

Program:
        LDP   #100H       Load the data page pointer
        LACC  0001,10H    Load the higher accumulator with X1
                          ACC = X1 00
        ADDS  0000        Add the ACC with X0
                          ACC = X1X0
        ADDS  0002        Add the ACC with Y0
                          ACC = X1X0 + 00Y0
        ADD   0003,10H    Add the ACC with Y1 with shift
                          ACC = X1X0 + Y1Y0
        SACL  0004        Store the content of lower accumulator in dma 8004
                          ACCL = Z0
        SACH  0005        Store the content of higher accumulator in dma 8005
                          ACCH = Z1
H:      B     H

For example
    Inputs                 Outputs
    8000 - 2439 (X0)       8004 - 355F (Z0)
    8001 - 5523 (X1)       8005 - C6A8 (Z1)
    8002 - 1126 (Y0)
    8003 - 7185 (Y1)

Example 7.2
To write a program for 64 bit addition.

Solution In this example, two 64 bit numbers X and Y will be added as follows:

    X ==> X3 X2 X1 X0
    Y ==> Y3 Y2 Y1 Y0
    {+}   Z3 Z2 Z1 Z0

Program:
        LDP   #100H       Load the data page pointer
                          First 32 bit addition
        LACC  0001,10H    Load the higher accumulator with X1
                          ACC = X1 00
        ADDS  0000        Add the accumulator with X0
                          ACC = X1X0
        ADDS  0004        Add the lower accumulator with Y0
                          ACC = X1X0 + 00Y0
        ADD   0005,10H    Add the higher accumulator with Y1
                          ACC = X1X0 + Y1Y0
        SACL  0008        Store the content of lower accumulator (ACCL) in dma 8008
                          ACCL = Z0
        SACH  0009        Store the content of higher accumulator (ACCH) in dma 8009
                          ACCH = Z1
                          Second 32 bit addition
        LACC  0003,10H    Load the higher accumulator with X3
                          ACC = X3 00
        ADDC  0002        Add the accumulator with X2 and first 32 bit addition CARRY
                          ACC = X3X2 + CARRY
        ADDS  0006        ACC = X3X2 + 00Y2 + CARRY
        ADD   0007,10H    ACC = X3X2 + Y3Y2 + CARRY
        SACL  0010        Store ACCL in dma 8010
                          ACCL = Z2
        SACH  0011        Store ACCH in dma 8011
                          ACCH = Z3
H:      B     H
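The carry chaining used in the two addition programs can be cross-checked on a host machine. The following is a minimal Python sketch, not a model of the 'C5x itself: the function name is illustrative, and it propagates the carry between 16 bit words, whereas the programs above work 32 bits at a time and pass the carry between the two halves with ADDC.

```python
def add_words(x, y):
    """Add two equal-length numbers stored as 16 bit words, low word first."""
    result, carry = [], 0
    for xw, yw in zip(x, y):
        total = xw + yw + carry        # word sum plus carry from the word below
        result.append(total & 0xFFFF)  # keep 16 bits, as SACL/SACH would store
        carry = total >> 16            # carry into the next higher word
    return result, carry
```

Calling `add_words([0x2439, 0x5523], [0x1126, 0x7185])` returns the words 355F and C6A8 with no final carry, in agreement with the values listed for Example 7.1.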
For example
    Inputs                 Outputs
    8000 - 1123 (X0)       8008 - 6465 (Z0)
    8001 - 8279 (X1)       8009 - 53CB (Z1)
    8002 - A453 (X2)       8010 - 2B99 (Z2)
    8003 - CB21 (X3)       8011 - EE34 (Z3)
    8004 - 5342 (Y0)
    8005 - D152 (Y1)
    8006 - 8745 (Y2)
    8007 - 2312 (Y3)

Example 7.3
To write an assembly language program for 32 bit subtraction.

Solution In this example, two 32 bit numbers X and Y will be subtracted as follows:

    X ==> X1 X0
    Y ==> Y1 Y0
    {-}   Z1 Z0

Program:
        LDP   #100H       Load data page pointer
        LACC  0001,10H    Load the higher accumulator with X1
                          ACC = X1 00
        ADDS  0000        Add the accumulator with X0
                          ACC = X1X0
        SUBS  0002        Subtract Y0 from ACC
                          ACC = X1X0 - 00Y0
        SUB   0003,10H    Subtract Y1 from ACC with shift
                          ACC = X1X0 - Y1Y0
        SACL  0004        Store ACCL in dma 8004
                          ACCL = Z0
        SACH  0005        Store ACCH in dma 8005
                          ACCH = Z1

For example
    Inputs                 Outputs
    8000 - 7725 (X0)       8004 - 2311 (Z0)
    8001 - 894A (X1)       8005 - 4422 (Z1)
    8002 - 5414 (Y0)
    8003 - 4528 (Y1)

Example 7.4
To write an assembly language program for 64 bit subtraction.

Solution In this example, two 64 bit numbers X and Y will be subtracted as follows:

    X ==> X3 X2 X1 X0
    Y ==> Y3 Y2 Y1 Y0
    {-}   Z3 Z2 Z1 Z0

Program:
        LDP   #100H       Load the data page pointer
                          First 32 bit subtraction
        LACC  0001,10H    Load the higher accumulator with X1
                          ACC = X1 00
        ADDS  0000        Add the accumulator with X0
                          ACC = X1X0
        SUBS  0004        Subtract Y0 from the accumulator
                          ACC = X1X0 - 00Y0
        SUB   0005,10H    Subtract Y1 from accumulator with shift
                          ACC = X1X0 - Y1Y0
        SACL  0008        Store ACCL in dma 8008
                          ACCL = Z0
        SACH  0009        Store ACCH in dma 8009
                          ACCH = Z1
                          Second 32 bit subtraction
        LACC  0003,10H    Load the higher accumulator with X3
                          ACC = X3 00
        ADDS  0002        Add the accumulator with X2
                          ACC = X3X2
        SUBB  0006        Subtract Y2 and the CARRY (borrow) from ACC
                          ACC = X3X2 - 00Y2 - CARRY
        SUB   0007,10H    ACC = X3X2 - Y3Y2 - CARRY
        SACL  0010        Store ACCL in dma 8010
                          ACCL = Z2
        SACH  0011        Store ACCH in dma 8011
                          ACCH = Z3
H:      B     H
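The borrow handling in the subtraction programs follows the convention given for the carry bit of ST0, where C = 0 signals that a subtraction generated a borrow. A minimal Python sketch of that convention is given below; the function name is illustrative, and the word-by-word loop is an assumption made for clarity (the programs above subtract 32 bits at a time and use SUBB to bring in the borrow).

```python
def sub_words(x, y):
    """Subtract y from x, both stored as 16 bit words with the low word first."""
    result = []
    carry = 1  # C = 1 means no borrow is pending
    for xw, yw in zip(x, y):
        diff = xw - yw - (1 - carry)   # a pending borrow removes one more unit
        result.append(diff & 0xFFFF)   # keep 16 bits of the difference
        carry = 0 if diff < 0 else 1   # C = 0 flags a borrow for the next word
    return result, carry
```

Calling `sub_words([0x7725, 0x894A], [0x5414, 0x4528])` returns the words 2311 and 4422, in agreement with the values listed for Example 7.3.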
For example
    Inputs                 Outputs
    8000 - 894A (X0)       8008 - 4422 (Z0)
    8001 - 7725 (X1)       8009 - 2311 (Z1)
    8002 - 6525 (X2)       8010 - 1223 (Z2)
    8003 - BC13 (X3)       8011 - 2411 (Z3)
    8004 - 4528 (Y0)
    8005 - 5414 (Y1)
    8006 - 5302 (Y2)
    8007 - 9802 (Y3)

Example 7.5
To write a program for 32 bit integer multiplication.

Solution

Program:
        LDP   #100H       Load data page pointer
        LACC  #037AH,0    Load the ACC with multiplicand
                          i.e., ACC = 0000037A
        SACL  0000,0      Store ACCL in memory location 8000
                          8000 = 037A
        LACC  #012EH,0    Load the ACC with multiplier
                          i.e., ACC = 0000012E
        SACL  0001,0      Store ACCL in memory location 8001
                          8001 = 012E
        LT    0000        Load the temporary register (TREG) with content of memory location 8000
                          TREG = 037A
        MPY   0001        Multiply the content of TREG with content of memory location 8001; the result is stored in the product register
        PAC               Move the content from product register to ACC
                          ACC = 000419EC
        SACL  0002,0      Store the content of ACCL in memory location 8002
        SACH  0003,0      Store the content of ACCH in memory location 8003
H:      B     H

Outputs
    8002   19EC
    8003   0004

Example 7.6
To write a program to calculate the value of the function

    Y = A * X1 + B * X2 + C * X3

(Anna University, November/December, 2006)

Solution
Program: The constants A, B, C, X1, X2 and X3 are to be stored in dma from 8000 to 8005 as shown below:

    Data memory
    Address (DMA)    Data
    8000             A
    8001             B
    8002             C
    8003             X1
    8004             X2
    8005             X3

        LDP   #100H       Load data page pointer
        LACL  #0          Clear the accumulator
        LT    0000        Load the T register with constant A; T = A
        MPY   0003        Multiply the T register with X1; product = AX1
        LTA   0001        Loads the T register from data memory, adds the product of the previous multiplication to the ACC and stores the result in the accumulator
                          ACC = AX1; T = B
        MPY   0004        Multiply the T register with X2
                          product = BX2; ACC = AX1
        LTA   0002        T = C; ACC = AX1 + BX2
        MPY   0005        product = CX3; ACC = AX1 + BX2
        APAC              Adds the product to the accumulator and stores the result in accumulator
                          ACC = AX1 + BX2 + CX3
        SACL  0006,0      Store the ACCL in dma 8006.

Output
If A = 1, B = 2, C = 3, X1 = 4, X2 = 5 and X3 = 6, the program gives 8006: 0020

Note:
LTA: Instruction is equivalent to two instructions: LT and APAC.
LT: Loads the T register from data memory.
APAC: Adds the content of the product register to the accumulator and stores the result in accumulator.
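The way LT, MPY, LTA and APAC interleave in Example 7.6 can be followed step by step with a small host-side model. This is a minimal Python sketch: the function and variable names are illustrative, and the operands are assumed to be small unsigned values so that no overflow or sign handling is needed.

```python
def sum_of_products(coeffs, samples):
    """Mimic the LT / MPY / LTA / APAC sequence of Example 7.6."""
    acc = 0    # accumulator, cleared as by LACL #0
    preg = 0   # product register written by MPY
    for i, (c, x) in enumerate(zip(coeffs, samples)):
        if i > 0:
            acc += preg    # LTA: add the previous product before reloading T
        treg = c           # LT / LTA: load T with the next coefficient
        preg = treg * x    # MPY: T times the data word goes to the product register
    acc += preg            # APAC: add the final product
    return acc & 0xFFFF    # SACL stores only the low 16 bits
```

With the values quoted in the example, `sum_of_products([1, 2, 3], [4, 5, 6])` returns 0020H.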
Example 7.7
To write a program that explains the usage of the RPT instruction.

Solution The RPT instruction is an immediate addressing mode instruction. The eight bit constant N specified by the RPT instruction is loaded into the repeat counter of the 'C50. This causes the instruction following the RPT instruction to be executed N + 1 times.
In this example, the value 4 is added 5 times in succession and the result is stored in data memory 8000.

Program:
        LDP   #100H
        LACL  #4          Load the lower accumulator with 4
        SACL  0000,0      Store ACCL in dma 8000
        LACL  #0          Clear the ACC
        RPT   #4          Set repeat count to 4
        ADD   0000,0      4 is added with ACC 5 times
                          ACC = 14
        SACL  0000,0      Store ACCL in dma 8000
H:      B     H

Example 7.8
To find the two's complement of a given number.

Solution The CMPL instruction replaces the content of the accumulator with its logical inversion, i.e., one's complement. By adding one to the complemented number we get the 2's complement of the number.

Program:
        LDP   #100H
        LACL  #7          Load 7 to ACC
                          ACC = 0007
        CMPL              Complements the accumulator contents (1's complement)
                          ACC = FFF8
        ADD   #1          Add ACC with 1 (2's complement)
                          ACC = FFF9
        SACL  0000,0      Store ACCL in dma 8000
H:      B     H

Example 7.9
To write a program for square waveform and sawtooth waveform generation.

Solution

Square Waveform Generation
        .MMREGS
        .TEXT
START:
        LDP   #120H
        LACC  #0H         Load the ACC with lower amplitude
Loop:   SACL  0           Store ACCL in dma
                          ACC = 0000
        RPT   #0FFH       Frequency of the square wave
        OUT   0,04H       Address for DAC
        CMPL              Complements the ACC contents
                          ACC = FFFF
        B     Loop        Go to pma Loop
        .END

Sawtooth Waveform Generation
        .MMREGS
        .TEXT
START:
        LDP   #120H
        LACC  #0H         Load the accumulator with lower amplitude
        SACL  0           Store ACCL in dma
        OUT   0,04H       Address for DAC
Loop:   LACC  0           Load the ACC with content in dma
        OUT   0,04H       Address for DAC
        ADD   #05h        Change this value for frequency
        SACL  0           Store ACCL in dma
        SUB   #0FFFh      Change upper amplitude
        BCND  Loop, LEQ
        B     START
        .END
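The sample sequences that the two waveform programs send to the DAC can be previewed on a host machine. The following is a minimal Python sketch under simplifying assumptions: the generator names are illustrative, the sawtooth model restarts directly at zero (the program also repeats the zero sample while passing through START), and timing is ignored.

```python
from itertools import islice


def sawtooth(step=0x05, upper=0x0FFF):
    """Yield DAC samples of a ramp that restarts once it passes `upper`."""
    value = 0
    while True:
        yield value
        value += step          # ADD #05h
        if value > upper:      # SUB #0FFFh followed by the LEQ test fails
            value = 0          # B START reloads the lower amplitude


def square(low=0x0000, half_period=256):
    """Yield `half_period` samples of `low`, then of its complement, forever."""
    level = low
    while True:
        for _ in range(half_period):   # RPT #0FFH repeats OUT 256 times
            yield level
        level = ~level & 0xFFFF        # CMPL inverts all 16 bits
```

For example, `list(islice(sawtooth(), 4))` gives the first four ramp samples 0, 5, 10 and 15. Changing `step` alters the ramp frequency and `upper` the peak amplitude, as the comments in the program suggest.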
Example 7.10
To write a program for linear convolution

    y(n) = x(n) * h(n).

Solution
Program:
        .text
        .mmregs
START:
        LDP   #0002H
        LAR   AR3, #0200H    y(n) starting
        LAR   AR4, #0007     N1 + N2 - 1 (length of linear convolution sequence)
        LAR   AR1, #0100H    x(n) data array
Loop:   MAR   *,AR1          modify the auxiliary register
        LACC  *+
        SACL  050H           starting of the scope of multiplication
        LAR   AR2, #0153H    end of the array, to be multiplied with h(n) {150 + N1 - 1}
        MAR   *,AR2
        ZAP
        RPT   #0003          N1 - 1 times so that N1 times
        MACD  0C100H, *-
        APAC                 to accumulate the final product sample
        MAR   *,AR3
        SACL  *+
        MAR   *,AR4
        BANZ  Loop, *-
H:      B     H

Input:
    x(n) data memory:   8100 - 0001
                        8101 - 0003
                        8102 - 0001
                        8103 - 0003
    h(n) data memory:   C100 - 0001
                        C101 - 0002
                        C102 - 0001

Output:
    Data memory location:   8200 - 0001
                            8201 - 0005
                            8202 - 0008
                            8203 - 0008
                            8204 - 0007
                            8205 - 0003

Example 7.11
To write a program for circular convolution

Solution
Program:
        .MMREGS              program initialisation
        .TEXT
        LDP   #100H
        LACC  0000H          length of the input is given in 8000
        SUB   #0001H
        SACL  0001H
        LAR   AR0, 1H
        LAR   AR2, #0010H
Loop3:  LAR   AR1, #0060H    give the inputs x1(n) in AR1
        LAR   AR3, #0050H    give the inputs x2(n) in AR3
        LAR   AR4, 1H
        ZAP
Loop:   MAR   *,AR3          multiply x1(n) and x2(n) and add the multiplication output
        LT    *+,AR1
        MPY   *+
        SPL   5H
        ADD   5H
        MAR   *,AR4
        BANZ  Loop, *-, AR2  outputs are stored in AR2
        SACL  *+
        CALL  ROTATE
Loop2:  MAR   *,AR0
        BANZ  Loop3, *-
H:      B     H
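The main loop above forms each output sample as a sum of products and then calls the ROTATE routine to shift one sequence circularly before the next output is computed. As a cross-check on a host machine, both convolution examples can be reproduced with the minimal Python sketch below. The function names are illustrative, and the circular version assumes that both sequences have the same length.

```python
def linear_convolution(x, h):
    """Return y(n) = x(n) * h(n), of length N1 + N2 - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv    # each product lands at index i + j
    return y


def circular_convolution(x1, x2):
    """Return the N point circular convolution of two length N sequences."""
    n = len(x1)
    # The index (m - k) % n plays the role of rotating x2 between outputs.
    return [sum(x1[k] * x2[(m - k) % n] for k in range(n)) for m in range(n)]
```

With x(n) = 1, 3, 1, 3 and h(n) = 1, 2, 1 the first function returns 1, 5, 8, 8, 7, 3, which agrees with the output listed for Example 7.10.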
ROTATE:
        LDP   #100H          rotate the values of x2(n)
        LACC  0001H
        SUB   #1H
        SACL  0002H
        LACC  0050H
        SACB
        LAR   AR3, #8051H
        LAR   AR5, #8070H
        LAR   AR6, 2H
Loop1:  MAR   *,AR3
        LACC  *+,0,AR5
        SACL  *+,0,AR6
        BANZ  Loop1, *-
        LACB
        MAR   *,AR5
        SACL  *+
        LACC  #8070H
        SAMM  BMAR
        LAR   AR3, #0050H
        MAR   *,AR3
        RPT   0H
        BLDD  BMAR, *-
        RET

Input:
    Length:   8000 - 0004
    x1(n):    8050 - 0002
              8051 - 0001
              8052 - 0002
              8053 - 0001
    x2(n):    8060 - 0001
              8061 - 0002
              8062 - 0003
              8063 - 0004

Output:
    x3(n):    8010 - 000E
              8011 - 0010
              8012 - 000E
              8013 - 0010

Summary

• DSP algorithms require extensive use of arithmetic operations such as multiplications and additions and therefore the amount of data flow through the CPU is very high. Standard microprocessors do not possess the required hardware architecture and instruction set for this purpose.
• To overcome the limitations of standard microprocessors, digital signal processors are designed to include Harvard architecture, pipelining, dedicated hardware such as a fast hardware multiplier-accumulator and shifters, fast internal memories and special DSP-oriented instructions.
• In the latest generation DSP processors, new architectures such as Very Long Instruction Word (VLIW) and Static Super Scalar are used. These architectures have multiple data paths and arithmetic units. Parallelism at the instruction level enhances the performance of the processors.
• Digital signal processors are classified as general purpose and special purpose processors. The functioning of general purpose DSP processors is similar to standard microprocessors except that they have specially designed architecture and instruction sets. On the other hand, special purpose processors are used to perform certain specific algorithms such as digital FIR filtering and for execution of application dependent operations. While special purpose processors are faster in execution, they are not as flexible as general purpose processors.

Short Questions and Answers

1. What are the classifications of digital signal processors?
The digital signal processors are classified into two categories. They are:
(i) General purpose digital signal processors.
(ii) Special purpose digital signal processors.
2. What is meant by general purpose digital signal processors?
General purpose digital signal processors are basically high speed microprocessors with architecture and instruction sets optimized for DSP operation. Examples are:
(i) Fixed point processors such as TMS320C5x, TMS320C54x, ADSP-219x and ADSP-219xx.
(ii) Floating point processors such as TMS320C3x, TMS320C67x and ADSP-21xxxx.
3. What is meant by special purpose digital signal processor?
A special purpose digital signal processor consists of hardware designed for a specific DSP algorithm such as FFT, or for specific DSP applications such as PCM and filtering. Examples are:
(i) MT93001 - Mitel's multichannel telephony voice echo canceller.
(ii) P-DSP 16515A, TM-44 and TM-66 - FFT processor.
(iii) UPDSP 16256 and model 3092 - programmable FIR filter.
4. What are the factors that influence the selection of DSPs?
- Architectural features.
- Execution speed.
- Type of arithmetic.
- Word length.
5. What are the applications of P-DSPs?
(Anna University, November/December, 20…)
The applications of P-DSPs are digital cell phones, automated inspection, voice mail, motor control, video conferences, noise cancellation, medical imaging, speech synthesis, satellite communication etc.
6. What are the advantages and disadvantages of VLIW architecture?
Advantages:
- Increased performance.
- Better compiler targets.
- Potentially easier to program.
- Potentially scalable.
- Can add more execution units, allowing more instructions to be packed into the VLIW instruction.
Disadvantages:
- New kind of programmer/compiler complexity.
- Increased memory use.
- Program must keep track of instruction scheduling.
- High power consumption.
- Misleading MIPS ratings.
7. What is pipelining?
Pipelining a processor means breaking down its instructions into a series of discrete pipeline stages which can be completed in sequence by specialized hardware.
8. What are the different stages in pipelining and explain?
(i) Fetch phase: Next instruction is fetched from the address stored in the program counter.
(ii) Decode phase: Instruction in the instruction register is decoded and the address in the program counter is incremented.
(iii) Memory read phase: Reads the data from the data buses and also writes data to the data buses.
(iv) Execute phase: Executes the instruction currently in the instruction register and also completes the write process.
9. What is pipeline depth?
The number of pipeline stages is referred to as the pipeline depth.
10. What is the pipeline depth of TMS320C50 and TMS320C54x?
TMS320C50 - 4 and TMS320C54x - 6.
11. What are the different buses of TMS320C5x and their functions?
The TMS320C5x processor has four buses.
(i) Program bus (PB): It carries the instruction code and immediate operands from program memory to the CPU.
(ii) Program address bus (PAB): It provides addresses to program memory space for both read and write.
(iii) Data read bus (DB): It interconnects various elements of the CPU to data memory space.
(iv) Data read address bus (DAB): It provides the address to access the data memory space.
12. What are the addressing modes of TMS320C5x?
The addressing modes in TMS320C5x are:
(i) Immediate addressing.
(ii) Indirect addressing.
(iii) Register addressing.
(iv) Memory mapped register addressing.
(v) Direct addressing.
(vi) Circular addressing mode.
13. What is the advantage of Harvard architecture of TMS320 series?
(Anna University, November/December, 2006)
The Harvard architecture has two separate memories for instructions and data. It is capable of simultaneously reading an instruction code and reading or writing a memory or peripheral.
14. Differentiate between Von-Neumann and Harvard architectures.
In Von-Neumann architectures the CPU can be either reading an instruction or reading/writing data from/to memory. Both cannot occur at the same time since the instruction and data use the same signal pathways and memory, whereas the Harvard architecture has two memories for instructions and data, requiring dedicated buses for each of them.
15. State the merits and demerits of multi-ported memories.
(Anna University, May/June, 2007)
Merit: It increases the number of accesses per clock period. For example, in dual ported memory, program and data memory can be accessed simultaneously.
Demerits: It requires a larger number of pins and larger chip area, and it is more expensive.
16. List the various registers used with ARAU.
- Eight auxiliary registers (AR0-AR7).
- Auxiliary register pointer (ARP).
- Unsigned 16 bit ALU.
17. What are the elements of the central processing unit of TMS320C5x?
- Central arithmetic logic unit (CALU).
- Parallel logic unit (PLU).
- Auxiliary register arithmetic unit (ARAU).
- Memory mapped registers.
- Program controller.
18. What is the function of parallel logic unit?
The parallel logic unit is a second logic unit that executes logic operations on data without affecting the contents of the accumulator.
19. What are the arithmetic instructions of 'C5x?
ADD, ADDB, ADDC, SUB, SUBB, MPY, MPYU
20. What are the logical instructions of 'C5x?
AND, ANDB, OR, ORB, XOR, XORB
21. What are load/store instructions?
LACB, LACC, LACL, LAMM, LAR, SACB, SACH, SACL, SAR, SAMM
22. What are the shift instructions?
ROR, ROL, ROLB, RORB, BSAR

Long Answer Type Questions

1. Explain how Harvard architecture as used by the TMS320 family differs from the strict Harvard architecture. Compare this with the architecture of a standard Von-Neumann processor.
2. A multiplier-accumulator with three pipe stages is required for a digital signal processor. Sketch a block diagram of a suitable configuration for the MAC. With the aid of a timing diagram explain how the MAC works.
3. In relation to DSP processors, explain SIMD and VLIW techniques. In each case, clearly point out the advantages and disadvantages of the technique in signal processing.
4. Explain the operation of the CSSU of the TMS320C54x and explain its use considering the Viterbi operator.
5. Explain what is meant by instruction pipelining. Explain with an example how pipelining increases the throughput efficiency.
6. Explain the operation of TDM serial ports in P-DSPs.
7. With a suitable diagram describe the functions of the multiplier/adder unit of TMS320C54x.
8. Explain the function of auxiliary registers in the indirect addressing mode to point to the data memory location.
9. Write a program to use the auxiliary register in the memory pointing and looping.
10. Write a program to compute the following equation:

11. Write a program to perform addition of two 64 bit numbers.
12. Write short notes on:
(i) 32 bit accumulator.
(ii) 16 x 16 bit parallel multiplier.
(iii) Shifter.
13. Give the key features of the digital signal processor.
14. Explain the memory mapped addressing mode used in P-DSPs.
15. What are the different ways in which the auxiliary register pointer can be updated in 'C5x?
16. Explain the immediate addressing mode of 'C5x with examples.
17. Explain the arithmetic instructions of 'C5x.
18. Write short notes on:
(i) Barrel shifter.
(ii) Exponent encoder.
(iii) Compare, Select and Store Unit (CSSU).
