DSP Notes Unit1 and 2

The document discusses the architectural features of programmable DSP processors including basic features like registers, memories and instructions. It describes the key computational blocks like multipliers, barrel shifters and multiply-accumulate units. It also discusses design aspects like speed, precision and preventing overflows.

Uploaded by

Hemapriya

CEC337 – DSP ARCHITECTURE AND PROGRAMMING

UNIT-1

Architectures for Programmable DSP Processors

2.1 Basic Architectural Features

A programmable DSP device should provide instructions similar to those of a conventional
microprocessor. The instruction set of a typical DSP device should include the following:
a. Arithmetic operations such as ADD, SUBTRACT, MULTIPLY, etc.
b. Logical operations such as AND, OR, NOT, XOR, etc.
c. Multiply and Accumulate (MAC) operation
d. Signal scaling operation
In addition to the above provisions, the architecture should also include:
a. On-chip registers to store intermediate results
b. On-chip memories to store signal samples (RAM)
c. On-chip memories to store filter coefficients (ROM)

2.2 DSP Computational Building Blocks

Each computational block of the DSP should be optimized for functionality and speed. At the
same time, the design should be sufficiently general that it can be easily integrated with other blocks
to implement overall DSP systems.

2.2.1 Multipliers
The advent of single-chip multipliers paved the way for implementing DSP functions on a VLSI
chip. Parallel multipliers have now replaced the traditional shift-and-add multipliers. Parallel multipliers
take a single processor cycle to fetch and execute the instruction and to store the result. They are also
called array multipliers. The key features to be considered for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed

The number of bits used to represent the operands decides the accuracy and the dynamic range
of the multiplier, whereas the speed is decided by the architecture employed. If the multipliers are
implemented in hardware, the speed of execution will be very high, but the circuit complexity will
also increase considerably. Thus there is a trade-off between the speed of execution and the
circuit complexity, and the choice of architecture normally depends on the application.

2.2.2 Parallel Multipliers

Consider the multiplication of two unsigned numbers A and B. Let A be represented using m
bits as (A_{m-1} A_{m-2} ... A_1 A_0) and B be represented using n bits as (B_{n-1} B_{n-2} ... B_1 B_0). Then
the product of these two numbers is given by

$$P = A \cdot B = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} A_i B_j \, 2^{i+j}$$

This operation can be implemented in parallel using a Braun multiplier, whose hardware structure is
shown in Figure 2.1.

Fig 2.1 Braun Multiplier for a 4X4 Multiplication
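
As a rough software illustration of what the array of AND gates and adders computes, the sketch below forms every partial product bit A_i AND B_j and adds it in with weight 2^(i+j). The function name `array_multiply` is hypothetical, and the loop only models the arithmetic, not the gate-level carry structure of Figure 2.1.

```python
def array_multiply(a, b, m, n):
    """Unsigned m-bit by n-bit product built from single-bit partial products."""
    if not (0 <= a < (1 << m) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in m and n bits")
    product = 0
    for i in range(m):
        a_i = (a >> i) & 1
        for j in range(n):
            b_j = (b >> j) & 1
            # One AND gate per (i, j) pair, weighted by 2**(i + j).
            product += (a_i & b_j) << (i + j)
    return product


print(array_multiply(13, 11, 4, 4))  # 4x4 multiplication of 13 and 11
```

In hardware all the partial products are formed at once, which is why the delay is set by the longest combinational path rather than by a cycle count.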

2.2.3 Multipliers for Signed Numbers

The Braun multiplier does not take the sign of the numbers into account. To
implement a multiplier for signed numbers, additional hardware is required to modify the Braun
multiplier. The modified multiplier is called the Baugh-Wooley multiplier.

Consider two signed numbers A and B,

2.2.4 Speed
The conventional shift-and-add technique of multiplication requires n cycles to perform the
multiplication of two n-bit numbers, whereas in parallel multipliers the time required is the longest
path delay in the combinational circuit used. As DSP applications generally require very high speed, it
is desirable to have multipliers operating at the highest possible speed by using a parallel
implementation.
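
For comparison, here is a minimal sketch of the shift-and-add technique that counts one cycle per multiplier bit. The name `shift_and_add` and the cycle counter are illustrative assumptions, not a model of any particular processor.

```python
def shift_and_add(x, y, n):
    """Multiply two unsigned n-bit numbers, examining one bit of y per cycle."""
    product = 0
    cycles = 0
    for k in range(n):
        if (y >> k) & 1:
            product += x << k   # add the shifted multiplicand
        cycles += 1             # one cycle per bit, whether or not we add
    return product, cycles


print(shift_and_add(13, 11, 4))  # product and number of cycles used
```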

2.2.5 Bus Widths

Consider the multiplication of two n-bit numbers X and Y. The product Z can be at most 2n bits
long. To perform the whole operation in a single execution cycle, we require two buses of width
n bits each to fetch the operands X and Y, and a bus of width 2n bits to store the result Z to memory.
Although this performs the operation faster, it is expensive and therefore not an efficient implementation.

Many alternatives to the above method have been proposed. One such method is to use the
program bus itself to fetch one of the operands after fetching the instruction, thus requiring only one bus
to fetch the operands. The result Z can then be stored back to memory using the same operand bus.
The problem is that the result Z is 2n bits long whereas the operand bus is only n bits wide. There
are two ways to solve this problem:
a. Use the n-bit operand bus and save Z at two successive memory locations. Although this stores the
exact value of Z in memory, it takes two cycles to store the result.
b. Discard the lower n bits of the result Z and store only the higher-order n bits in memory. This is
not suitable for applications where an accurate result is required.

Another alternative can be used for applications where speed is not a major concern: latches are
used for the inputs and outputs, so that a single bus suffices to fetch the operands and to store the
result (Fig 2.2).
Fig 2.2: A Multiplier with Input and Output Latches

2.2.6 Shifters

Shifters are used to scale operands or results either down or up. The following scenarios
show why a shifter is necessary:
a. While adding N numbers, each n bits long, the sum can grow up to n + log2 N
bits long. If the accumulator is only n bits long, an overflow error will occur. This can be overcome
by using a shifter to scale down each operand by log2 N bits.
b. Similarly, while calculating the product of two n-bit numbers, the product can grow up to 2n bits
long. Generally the lower n bits are discarded and the sign bit is shifted to preserve the sign of the product.
c. Finally, when adding two floating-point numbers, one of the operands has to be shifted
appropriately to make the exponents of the two numbers equal.
From the above cases it is clear that a shifter is required in the architecture of a DSP.
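
The bit-growth rule in case (a) can be checked with a few lines of code. This is a minimal sketch with hypothetical helper names (`sum_width`, `prescale_shift`); it assumes unsigned operands and rounds log2 N up when N is not a power of two.

```python
def sum_width(n_bits, count):
    """Bits needed to hold the sum of `count` unsigned n_bits-wide numbers."""
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1.
    return n_bits + (count - 1).bit_length()


def prescale_shift(n_bits, count, acc_bits):
    """Right shift to apply to each operand so the sum fits in acc_bits."""
    return max(0, sum_width(n_bits, count) - acc_bits)


print(sum_width(16, 64))           # width of the sum of 64 16-bit numbers
print(prescale_shift(16, 64, 16))  # shift needed for a 16-bit accumulator
```

Scaling the operands down before the addition avoids overflow at the cost of the discarded low-order bits.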

2.2.7 Barrel Shifters

In conventional microprocessors, ordinary shift registers are used for shift operations. As they require
one clock cycle for each shift, they are not desirable for DSP applications, which generally involve many
shifts. Because speed is the crucial issue in DSP applications, several shifts have to be
accomplished in a single execution cycle. This can be done using a barrel shifter, which
connects the input lines representing a word to a group of output lines, with the required shift determined
by its control inputs. For an input of length n, log2 n control lines are required, and an additional control
line is required to indicate the direction of the shift.
The block diagram of a typical barrel shifter is shown in Figure 2.3.

Fig 2.3 A Barrel Shifter

Fig 2.4 Implementation of a 4 bit Shift Right Barrel Shifter

Figure 2.4 depicts the implementation of a 4 bit shift right barrel shifter. Shift to right by 0, 1, 2 or 3
bit positions can be controlled by setting the control inputs appropriately.
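
One common way to see why log2 n control lines are enough is to model the shifter as a cascade of stages, where control bit k selects a shift by 2^k positions. The sketch below follows that idea; it is an assumed behavioural model with a hypothetical function name, and it is not necessarily the switch arrangement drawn in Figure 2.4.

```python
def barrel_shift_right(word, shift, width=4):
    """Logical right shift of a width-bit word using log2(width) stages."""
    if not 0 <= shift < width:
        raise ValueError("shift must be between 0 and width - 1")
    stages = (width - 1).bit_length()   # number of control lines
    word &= (1 << width) - 1            # keep only the input lines
    for k in range(stages):
        if (shift >> k) & 1:            # control bit k enables a shift by 2**k
            word >>= 1 << k
    return word


print(format(barrel_shift_right(0b1011, 2), "04b"))  # 1011 shifted right by 2
```

Every stage is combinational, so any shift amount from 0 to 3 is produced in the same single pass through the logic.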

2.3 Multiply and Accumulate Unit

Most DSP applications require the computation of the sum of the products of a series of
successive multiplications. To implement such functions, a special unit called a Multiply and
Accumulate (MAC) unit is required. A MAC consists of a multiplier and a special register called the
accumulator. MACs are used to implement functions of the type A + BC. A typical MAC unit is
shown in Figure 2.5.
Fig 2.5 A MAC Unit

Although addition and multiplication are two different operations, they can be performed in parallel. While
the multiplier is computing a product, the accumulator can accumulate the product of the previous
multiplication. Thus, if N products are to be accumulated, N-1 multiplications can overlap with N-1
additions. During the very first multiplication the accumulator will be idle, and during the last accumulation
the multiplier will be idle. Thus N+1 clock cycles are required to compute the sum of N products.

2.3.1 Overflow and Underflow

While designing a MAC unit, attention has to be paid to the word sizes encountered at the input
of the multiplier and to the sizes of the add/subtract unit and the accumulator, as there is a possibility of
overflow and underflow. Overflow/underflow can be avoided by using any of the following methods:
a. Using shifters at the input and the output of the MAC
b. Providing guard bits in the accumulator
c. Using saturation logic

Shifters
Shifters can be provided at the input of the MAC to normalize the data and at the output to
denormalize it.

Guard bits
As the normalization process does not yield an accurate result, it is not desirable for some
applications. In such cases an alternative is to provide additional bits, called guard bits, in
the accumulator so that no overflow error occurs. The add/subtract unit also has to be
modified appropriately to handle the additional bits of the accumulator.

Saturation Logic
Overflow/underflow will occur if the result goes beyond the most positive number or below the
most negative number the accumulator can handle. The overflow/underflow error can therefore be resolved
by loading the accumulator with the most positive number it can handle at the time of overflow
and the most negative number it can handle at the time of underflow. This method is called
saturation logic. A schematic diagram of saturation logic is shown in Figure 2.7. In saturation logic, as
soon as an overflow or underflow condition is detected, the accumulator is loaded with the most
positive or most negative number, overriding the result computed by the MAC unit.
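
The behaviour described above can be sketched as a saturating addition. The example assumes a two's complement accumulator of `bits` bits; the function name and the 16-bit default are illustrative choices, not taken from the notes.

```python
def saturating_add(acc, value, bits=16):
    """Add value to acc, clamping to the two's complement range of `bits` bits."""
    max_val = (1 << (bits - 1)) - 1    # most positive representable number
    min_val = -(1 << (bits - 1))       # most negative representable number
    result = acc + value
    if result > max_val:               # overflow: load the most positive value
        return max_val
    if result < min_val:               # underflow: load the most negative value
        return min_val
    return result


print(saturating_add(32000, 1000))    # clamps at the positive limit
print(saturating_add(-32000, -1000))  # clamps at the negative limit
```

Without saturation a two's complement adder would wrap around, turning a large positive result into a negative one.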

Fig 2.7: Schematic Diagram of the Saturation Logic

2.4 Arithmetic and Logic Unit

A typical DSP device should be capable of handling arithmetic instructions like ADD, SUB,
INC, DEC, etc. and logical operations like AND, OR, NOT, XOR, etc. The block diagram of a typical
ALU for a DSP is shown in Figure 2.8.
It consists of a status flag register, a register file and multiplexers.
Fig 2.8 Arithmetic Logic Unit of a DSP

Status Flags
ALU includes circuitry to generate status flags after arithmetic and logic operations. These flags
include sign, zero, carry and overflow.

Overflow Management
Depending on the status of overflow and sign flags, the saturation logic can be used to limit the
accumulator content.

Register File
Instead of moving data in and out of memory during an operation, a large set of
general-purpose registers is provided to store the intermediate results, which gives better speed.

2.5 Bus Architecture and Memory

Conventional microprocessors use the Von Neumann architecture for memory management, in which
the same memory is used to store both the program and the data (Fig 2.9). Although this architecture is
simple, it takes more processor cycles to execute a single instruction, as the same
bus is used for both data and program.
Fig 2.9 Von Neumann Architecture

To increase the speed of operation, separate memories are used to store program and
data, and a separate set of data and address buses is given to each memory. This architecture
is called the Harvard architecture. It is shown in Figure 2.10.

Fig 2.10 Harvard Architecture

Although the use of separate memories for data and instructions speeds up the processing,
it does not completely solve the problem. As many DSP instructions require more than one
operand, a single data memory forces the operands to be fetched one after the other, thus increasing
the processing delay. This problem can be overcome by using two separate data memories for storing
the operands, so that both operands can be fetched together in a single clock cycle (Figure 2.11).
Fig 2.11 Harvard Architecture with Dual Data Memory

Although the above architecture improves the speed of operation, it requires more hardware and
interconnections, thus increasing the cost and complexity of the system. Therefore there is a
trade-off between cost and speed when selecting a memory architecture for a DSP.

2.5.1 On-chip Memories

For faster execution of DSP functions, it is desirable to have some memory
located on-chip. As dedicated buses are used to access it, on-chip memory is faster. Speed
and size are the two key parameters to be considered for on-chip memories.
Speed
On-chip memories should match the speed of the ALU operations in order to maintain the single-cycle
instruction execution of the DSP.
Size
In a given area of the DSP chip, it is desirable to implement as many DSP functions as possible. Thus
the area occupied by the on-chip memory should be kept to a minimum so that there is scope for
implementing more DSP functions on-chip.

2.5.2 Organization of On-chip Memories

Ideally, the whole memory required for the implementation of any DSP algorithm should reside on-chip
so that the whole processing can be completed in a single execution cycle. Although this looks like a
better solution, it consumes more space on the chip, reducing the scope for implementing other functional
blocks on-chip, which in turn reduces the speed of execution. Hence other alternatives have to be
considered. The following are some other ways in which the on-chip memory can be organized.
a. As many DSP algorithms require instructions to be executed repeatedly, the instruction can
be stored in the external memory and, once fetched, can reside in the instruction cache.
b. The access times for on-chip memories should be sufficiently small that they can be accessed
more than once in every execution cycle.
c. On-chip memories can be configured dynamically so that they can serve different purposes
at different times.

2.6 Data Addressing Capabilities

Data accessing capability of a programmable DSP device is configured by means of its
addressing modes. The summary of the addressing modes used in DSP is as shown in the table below.

2.6.1 Immediate Addressing Mode

In this addressing mode, data is included in the instruction itself.

2.6.2 Register Addressing Mode

In this mode, one of the registers will be holding the data and the register has to be specified in
the instruction.

2.6.3 Direct Addressing Mode

In this addressing mode, instruction holds the memory location of the operand.

2.6.4 Indirect Addressing Mode

In this addressing mode, the operand is accessed using a pointer. A pointer is generally a
register that holds the address of the location where the operand resides. Indirect addressing mode
can be extended to include automatic increment or decrement capabilities, which has led to the
following addressing modes.
2.7 Special Addressing Modes
For the implementation of some real time applications in DSP, normal addressing modes will
not completely serve the purpose. Thus some special addressing modes are required for such
applications.

2.7.1 Circular Addressing Mode

Circular buffers are used while processing data samples that arrive continuously in a sequential
manner. In a circular buffer the data samples are stored sequentially from the initial location until the
buffer is full. Once the buffer is full, the next data samples are stored starting again from
the initial location. This process can go on forever as long as the data samples are processed at a rate faster
than the incoming data rate.
Circular addressing mode requires three registers:
a. Pointer register to hold the current location (PNTR)
b. Start Address Register to hold the starting address of the buffer (SAR)
c. End Address Register to hold the ending address of the buffer (EAR)

There are four special cases in this addressing mode. They are
a. SAR < EAR and updated PNTR > EAR
b. SAR < EAR and updated PNTR < SAR
c. SAR > EAR and updated PNTR > SAR
d. SAR > EAR and updated PNTR < EAR
The buffer length in the first two cases is (EAR-SAR+1), whereas for the next two cases it is
(SAR-EAR+1).
The pointer updating algorithm
Fig 2.12 Special Cases in Circular Addressing Mode
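
The four special cases can be expressed as a short pointer-wrapping routine. This is a minimal sketch, assuming the updated pointer overshoots the buffer by less than one buffer length; `wrap_pointer` is a hypothetical name, and the correction by plus or minus the buffer length follows the worked problems later in these notes.

```python
def wrap_pointer(updated, sar, ear):
    """Bring an updated pointer back inside the circular buffer [SAR, EAR]."""
    if sar < ear:
        length = ear - sar + 1
        if updated > ear:          # case a: ran past the end address
            return updated - length
        if updated < sar:          # case b: ran below the start address
            return updated + length
    else:
        length = sar - ear + 1
        if updated > sar:          # case c
            return updated - length
        if updated < ear:          # case d
            return updated + length
    return updated                 # already inside the buffer


print(hex(wrap_pointer(0x0212, 0x0200, 0x020F)))
print(hex(wrap_pointer(0x01FC, 0x0200, 0x020F)))
```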

2.7.2 Bit Reversed Addressing Mode

To implement FFT algorithms we need to access the data in bit-reversed order. Hence a
special addressing mode called bit-reversed addressing mode is used to calculate the index of the next
data item to be fetched. It works as follows. Start with index 0. The present index is calculated by adding
half the FFT length to the previous index in a bit-reversed manner, that is, with the carry propagated
from the MSB towards the LSB:
Current index = Previous index +B (FFT size / 2)
where +B denotes addition with this reversed carry propagation.
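
The reversed carry propagation can be sketched bit by bit in software. The helper names `reverse_carry_add` and `bit_reversed_indices` are hypothetical, and the sketch assumes the FFT size is a power of two.

```python
def reverse_carry_add(index, increment, bits):
    """Add two bits-wide numbers with the carry moving from MSB towards LSB."""
    carry = 0
    result = 0
    for pos in range(bits - 1, -1, -1):    # start at the MSB
        total = ((index >> pos) & 1) + ((increment >> pos) & 1) + carry
        result |= (total & 1) << pos
        carry = total >> 1                 # carry goes to the next lower bit
    return result


def bit_reversed_indices(fft_size):
    """Index sequence obtained by repeatedly adding fft_size / 2 with reverse carry."""
    bits = fft_size.bit_length() - 1
    half = fft_size // 2
    indices = [0]
    for _ in range(fft_size - 1):
        indices.append(reverse_carry_add(indices[-1], half, bits))
    return indices


print(bit_reversed_indices(8))
```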

2.8 Address Generation Unit

The main job of the Address Generation Unit is to generate the addresses of the operands required
to carry out an operation. It has to work fast in order to satisfy the timing constraints. As the address
generation unit has to perform some arithmetic operations in order to calculate the operand address,
it is provided with a separate ALU.
Address generation typically involves one of the following operations.
a. Getting a value from an immediate operand, a register or a memory location
b. Incrementing/decrementing the current address
c. Adding/subtracting an offset to/from the current address
d. Adding/subtracting an offset to/from the current address and generating a new address according
to circular addressing mode
e. Generating a new address using bit-reversed addressing mode

The block diagram of a typical address generation unit is as shown in figure 2.13.

Fig 2.13 Address generation unit

2.9 Programmability and program Execution
A programmable DSP device should provide the programming capability involving branching,
looping and subroutines. The implementation of repeat capability should be hardware based so that it
can be programmed with minimal or zero overhead. A dedicated register can be used as a counter. In a
normal subroutine call, return address has to be stored in a stack thus requiring memory access for storing
and retrieving the return address, which in turn reduces the speed of operation. Hence a LIFO memory
can be directly interfaced with the program counter.
2.9.1 Program Control
Like microprocessors, a DSP also requires a control unit to provide the necessary control and timing
signals for the proper execution of instructions. In microprocessors, the control is microcoded:
each instruction is divided into microinstructions stored in a micro memory. As this
mechanism is slower, it is not suitable for DSP applications. Hence in a DSP the control is
hardwired: the control unit is designed as a single, comprehensive hardware unit. Although
it is more complex, it is faster.

2.9.2 Program Sequencer

It is a part of the control unit used to generate instruction addresses in sequence needed to access
instructions. It calculates the address of the next instruction to be fetched. The next address can be from
one of the following sources.
a. Program Counter
b. Instruction register in case of branching, looping and subroutine calls
c. Interrupt Vector table
d. Stack which holds the return address
The block diagram of a program sequencer is as shown in figure 2.14.

Fig 2.14 Program Sequencer

Program sequencer should have the following circuitry:
a. PC has to be updated after every fetch
b. Counter to hold count in case of looping
c. A logic block to check conditions for conditional jump instructions
d. Condition logic-status flag
Problems:
1). Investigate the basic features that should be provided in the DSP architecture to be used
to implement the following Nth order FIR filter.

Solution:
$y(n) = \sum_{i} h(i)\, x(n-i), \quad n = 0, 1, 2, \ldots$
In order to implement the above operation in a DSP, the architecture requires the
following features:

i. A RAM to store the signal samples x(n)
ii. A ROM to store the filter coefficients h(n)
iii. A MAC unit to perform the multiply and accumulate operation
iv. An accumulator to store the result immediately
v. A signal pointer to point to the signal sample in memory
vi. A coefficient pointer to point to the filter coefficient in memory
vii. A counter to keep track of the count
viii. A shifter to shift the input samples appropriately
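
To connect the feature list to the filter equation, the sketch below computes the FIR output with one multiply-accumulate per coefficient. It is only an illustration: Python lists stand in for the sample RAM and coefficient ROM, the loop variables play the role of the pointers and counter, and samples before n = 0 are assumed to be zero.

```python
def fir_filter(h, x):
    """y(n) = sum over i of h(i) * x(n - i), taking x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0                          # accumulator cleared for each output
        for i in range(len(h)):          # i acts as coefficient pointer and counter
            if n - i >= 0:
                acc += h[i] * x[n - i]   # one MAC operation
        y.append(acc)
    return y


print(fir_filter([1, 2, 3], [1, 0, 0, 0]))  # impulse response reproduces h
```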

2). It is required to find the sum of 64 16-bit numbers. How many bits should
the accumulator have so that the sum can be computed without the occurrence
of overflow error or loss of accuracy?
The sum of 64 16-bit numbers can grow up to (16 + log2 64) = 22 bits long. Hence
the accumulator should be 22 bits long in order to avoid an overflow error.

1. In the previous problem, it is decided to have an accumulator with only 16
bits but shift the numbers before the addition to prevent overflow. By how many bits
should each number be shifted?
As the length of the accumulator is fixed, the operands have to be shifted right by
log2 64 = 6 bits prior to the addition, in order to avoid overflow.
2. If all the numbers in the previous problem are fixed-point integers, what is
the actual sum of the numbers?
The actual sum can be obtained by shifting the result left by 6 bits after the sum
has been computed. Therefore
Actual Sum = Accumulator content × 2^6

3. If a sum of 256 products is to be computed using a pipelined MAC unit, and if the MAC
execution time of the unit is 100 ns, what will be the total time required to complete
the operation?
As N = 256 in this case, the MAC unit requires N+1 = 257 execution cycles. As the single MAC
execution time is 100 ns, the total time required will be 257 × 100 ns = 25.7 µs.

4. Consider a MAC unit whose inputs are 16-bit numbers. If 256 products are to be
summed up in this MAC, how many guard bits should be provided for the
accumulator to prevent an overflow condition from occurring?
As it is required to calculate the sum of 256 16-bit numbers, the sum can be as
long as (16 + log2 256) = 24 bits. Hence the accumulator should be capable of handling
these 24 bits. Thus the guard bits required will be (24 - 16) = 8 bits.
The block diagram of the modified MAC after considering the guard or extension bits is as shown in
the figure.

5. What are the memory addresses of the operands in each of the following cases of indirect
addressing modes? In each case, what will be the content of the addreg after the memory
access? Assume that the initial contents of the addreg and the offsetreg are 0200h and
0010h, respectively.
a. ADD *addreg
b. ADD +*addreg
c. ADD offsetreg+,*addreg
d. ADD *addreg,offsetreg-

6. A DSP has a circular buffer with the start and the end addresses as 0200h and 020Fh
respectively. What would be the new values of the address pointer of the buffer if, in the
course of address computation, it gets updated to
a. 0212h
b. 01FCh
Buffer Length = (EAR - SAR + 1) = 020Fh - 0200h + 1 = 10h
a. New Address Pointer = Updated Pointer - Buffer Length = 0212h - 10h = 0202h
b. New Address Pointer = Updated Pointer + Buffer Length = 01FCh + 10h = 020Ch

7. Repeat the previous problem for SAR = 0210h and EAR = 0201h

Buffer Length = (SAR - EAR + 1) = 0210h - 0201h + 1 = 10h
c. New Address Pointer = Updated Pointer - Buffer Length = 0212h - 10h = 0202h
d. New Address Pointer = Updated Pointer + Buffer Length = 01FCh + 10h = 020Ch

9. Compute the indices for an 8-point FFT using bit-reversed addressing mode.
Start with index 0. Therefore the first index is (000).
The next index is calculated by adding half the FFT length, in this case (100),
to the previous index, i.e. Present Index = (000) +B (100) = (100)
Similarly, the next index is calculated as
Present Index = (100) +B (100) = (010)
The process continues until all the indices are calculated. The following table summarizes
the calculation.
UNIT-2

TMS320C5X Programmable DSP Processors

3.1 Introduction:
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and
Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a
range of DSP chips with varied complexity.
The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit floating-point.
These DSPs possess the operational flexibility of high-speed controllers and the numerical
capability of array processors.

3.2 Commercial Digital Signal-Processing Devices:

There are several families of commercial DSP devices. Right from the early eighties, when these
devices began to appear in the market, they have been used in numerous applications, such as
communication, control, computers, instrumentation, and consumer electronics. The architectural
features and the processing power of these devices have been constantly upgraded based on advances
in technology and application needs. However, in their basic versions, most of them have a Harvard
architecture, a single-cycle hardware multiplier, an address generation unit with dedicated address
registers, special addressing modes, and on-chip peripheral interfaces. Of the various families of
programmable DSP devices that are commercially available, the three most popular ones are those from
Texas Instruments, Motorola, and Analog Devices. Texas Instruments was one of the first to come out
with a commercial programmable DSP with the introduction of its TMS32010 in 1982.

Summary of the Architectural Features of three fixed-Points DSPs

3.3. The architecture of TMS320C54xx digital signal processors:

TMS320C54xx processors retain the basic Harvard architecture of their predecessor,
TMS320C25, but have several additional features that improve their performance over it. Figure 3.1
shows a functional block diagram of TMS320C54xx processors. They have one program and three data
memory spaces with separate buses, which provide simultaneous access to a program instruction and
two data operands and enable writing of a result at the same time. Part of the memory is implemented
on-chip and consists of combinations of ROM, dual-access RAM, and single-access RAM. Transfers
between the memory spaces are also possible.
The central processing unit (CPU) of TMS320C54xx processors consists of a 40- bit arithmetic
logic unit (ALU), two 40-bit accumulators, a barrel shifter, a 17x17 multiplier, a 40-bit adder, data
address generation logic (DAGEN) with its own arithmetic unit, and program address generation logic
(PAGEN). These major functional units are supported by a number of registers and logic in the
architecture. A powerful instruction set with a hardware-supported, single-instruction repeat and block
repeat operations, block memory move instructions, instructions that pack two or three simultaneous
reads, and arithmetic instructions with parallel store and load make these devices very efficient for
running high-speed DSP algorithms.
Several peripherals, such as a clock generator, a hardware timer, a wait state generator, parallel
I/O ports, and serial I/O ports, are also provided on-chip. These peripherals make it convenient to
interface the signal processors to the outside world. In the following sections, we examine in detail
the various architectural features of the TMS320C54xx family of processors.
Figure 3.1.Functional architecture for TMS320C54xx processors.

3.3.1 Bus Structure:

The performance of a processor is enhanced by the provision of multiple buses that provide
simultaneous access to various parts of memory or peripherals. The '54xx architecture is built around
four pairs of 16-bit buses, each pair consisting of an address bus and a data bus. As shown in Figure
3.1, these are:
a. The program bus pair (PAB, PB), which carries the instruction code from the program memory.
b. Three data bus pairs (CAB, CB; DAB, DB; and EAB, EB), which interconnect the various
units within the CPU.
In addition, the pairs CAB, CB and DAB, DB are used to read from the data
memory, while the pair EAB, EB carries the data to be written to memory. The '54xx can generate
up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and
ARAU1) in the DAGEN block. This enables accessing two operands simultaneously.

3.3.2 Central Processing Unit (CPU):

The '54xx CPU is common to all the '54xx devices. The '54xx CPU contains a 40-bit arithmetic
logic unit (ALU); two 40-bit accumulators (A and B); a barrel shifter; a 17 x 17-bit multiplier; a
40-bit adder; a compare, select and store unit (CSSU); an exponent encoder (EXP); a data address
generation unit (DAGEN); and a program address generation unit (PAGEN).
The ALU performs 2's complement arithmetic operations and bit-level Boolean operations on
16, 32, and 40-bit words. It can also function as two separate 16-bit ALUs
and perform two 16-bit operations simultaneously. Figure 3.2 shows the functional diagram of the ALU
of the TMS320C54xx family of devices.

Accumulators A and B store the output from the ALU or the multiplier/adder block and provide a
second input to the ALU. Each accumulator is divided into three parts: guard bits (bits 39-32), high-order
word (bits 31-16), and low-order word (bits 15-0), which can be stored and retrieved individually.
Each accumulator is memory-mapped and partitioned. It can be configured as the destination register.
The guard bits are used as a head margin for computations.
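
The three-part layout of a 40-bit accumulator can be illustrated by splitting a value at the bit positions given above. This is a sketch only: `split_accumulator` and the returned field order are hypothetical, and the value is treated as a raw 40-bit pattern.

```python
def split_accumulator(acc):
    """Split a 40-bit accumulator value into (guard, high word, low word)."""
    acc &= (1 << 40) - 1          # keep only the 40 accumulator bits
    guard = (acc >> 32) & 0xFF    # bits 39-32
    high = (acc >> 16) & 0xFFFF   # bits 31-16
    low = acc & 0xFFFF            # bits 15-0
    return guard, high, low


guard, high, low = split_accumulator(0x123456789A)
print(hex(guard), hex(high), hex(low))
```

The eight guard bits give room for up to 256 accumulations of 32-bit products before the accumulator can overflow.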
Figure 3.2. Functional diagram of the central processing unit of the TMS320C54xx processors.
Barrel shifter: provides the capability to scale the data during an operand read or write.
No overhead is required to implement the shift needed for the scaling operations. The '54xx barrel shifter
can produce a left shift of 0 to 31 bits or a right shift of 0 to 16 bits on the input data. The shift count
can be specified in the instruction itself, in the shift count field of status register ST1, or in the temporary
register T. Figure 3.3 shows the functional diagram of the barrel shifter of TMS320C54xx processors.
The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle.
The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended,
depending on the state of the sign-extension mode bit in the status register ST1. An additional shift
capability enables the processor to perform numerical scaling, bit extraction, extended arithmetic, and
overflow prevention operations.
Figure 3.3.Functional diagram of the barrel shifter

Multiplier/adder unit: The kernel of the DSP device architecture is the multiplier/adder unit. The
multiplier/adder unit of TMS320C54xx devices performs a 17 x 17 2's complement multiplication with
a 40-bit addition, effectively in a single instruction cycle.
In addition to the multiplier and adder, the unit consists of control logic for integer and fractional
computations and a 16-bit temporary storage register, T. Figure 3.4 shows the functional diagram of the
multiplier/adder unit of TMS320C54xx processors. The compare, select, and store unit (CSSU) is a
hardware unit specifically incorporated to accelerate the add/compare/select operation. This operation
is essential to implement the Viterbi algorithm used in many signal-processing applications. The
exponent encoder unit supports the EXP instruction, which stores in the T register the number of leading
redundant bits of the accumulator content. This information is useful while shifting the accumulator
content for the purpose of scaling.
Figure 3.4. Functional diagram of the multiplier/adder unit of TMS320C54xx processors.
3.3.3 Internal Memory and Memory-Mapped Registers:
The amount and the types of memory of a processor have direct relevance to the efficiency and
performance obtainable in implementations with the processors. The ‘54xx memory is organized into
three individually selectable spaces: program, data, and I/O spaces. All ‘54xx devices contain both RAM
and ROM. RAM can be either dual-access type (DARAM) or single-access type (SARAM). Theon-chip
RAM for these processors is organized in pages having 128 word locations on each page.
The ‘54xx processors have a number of CPU registers to support operand addressing and computations.
The CPU registers and peripheral registers are all located on page 0 of the data memory. Figure 3.5(a)
and (b) show the internal CPU registers and peripheral registers with their addresses. The processor
mode status (PMST) register is used to configure the processor. It is a memory-mapped register
located at address 1Dh on page 0 of the RAM. A part of the on-chip ROM may contain a boot loader and
look-up tables for functions such as sine, cosine, μ-law, and A-law.

Figure 3.5(a) Internal memory-mapped registers of TMS320C54xx processors

Figure 3.5(b). Peripheral registers for the TMS320C54xx processors

Status registers (ST0,ST1):

ST0: Contains the status of flags (OVA, OVB, C, TC) produced by arithmetic operations
and bit manipulations.
ST1: Contains the status of various conditions and modes. Bits of the ST0 and ST1 registers can be set
or cleared with the SSBX and RSBX instructions.
PMST: Contains memory-setup status and control information.
Figure 3.6(a). ST0 diagram

ARP: Auxiliary register pointer.

TC: Test/control flag.
C: Carry bit.
OVA: Overflow flag for accumulator A.
OVB: Overflow flag for accumulator B.
DP: Data-memory page pointer.

Figure 3.6(b). ST1 diagram

BRAF: Block repeat active flag.
BRAF=0: the block repeat is deactivated.
BRAF=1: the block repeat is activated.

CPL: Compiler mode.

CPL=0: the relative direct addressing mode using the data page pointer (DP) is selected.
CPL=1: the relative direct addressing mode using the stack pointer (SP) is selected.

HM: Hold mode. Indicates whether the processor continues internal execution while acknowledging a
hold request on the external interface.

INTM: Interrupt mode. Globally masks or enables all maskable interrupts.

INTM=0: all unmasked interrupts are enabled.
INTM=1: all maskable interrupts are disabled.
0: Always read as 0.

OVM: Overflow mode.

OVM=0: the overflowed result is left in the destination accumulator.
OVM=1: the destination accumulator is set to either the most positive value or the most negative value.

SXM: Sign extension mode.

SXM=0: sign extension is suppressed.
SXM=1: data is sign extended.

C16: Dual 16-bit/double-precision arithmetic mode.

C16=0: the ALU operates in double-precision arithmetic mode.
C16=1: the ALU operates in dual 16-bit arithmetic mode.

FRCT: Fractional mode.

FRCT=1: the multiplier output is left-shifted by 1 bit to compensate for an extra sign bit.

CMPT: Compatibility mode.

CMPT=0: ARP is not updated in the indirect addressing mode.
CMPT=1: ARP is updated in the indirect addressing mode.

ASM: Accumulator shift mode.

5-bit field that specifies a shift value in the range -16 to 15.
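As a rough illustration of how the ST1 fields listed above might be unpacked from a 16-bit register value, consider the Python sketch below. The bit positions in `ST1_FLAGS`, and the presence of the XF bit at position 13, are assumptions taken from the usual ST1 layout and are not shown in these notes; the function name `decode_st1` is hypothetical. The part that follows directly from the text is the ASM field: 5 bits interpreted as a two's-complement value from -16 to 15.

```python
# Assumed bit positions of the single-bit ST1 fields (bit 10 always reads 0).
ST1_FLAGS = {"BRAF": 15, "CPL": 14, "XF": 13, "HM": 12, "INTM": 11,
             "OVM": 9, "SXM": 8, "C16": 7, "FRCT": 6, "CMPT": 5}

def decode_st1(st1):
    """Split a 16-bit ST1 value into named fields."""
    fields = {name: (st1 >> bit) & 1 for name, bit in ST1_FLAGS.items()}
    asm = st1 & 0x1F                                   # low 5 bits
    fields["ASM"] = asm - 32 if asm & 0x10 else asm    # 5-bit two's complement
    return fields

print(decode_st1(0x2900))
print(decode_st1(0x001F)["ASM"])
```

The ASM conversion is the only non-trivial step: bit 4 of the field acts as the sign bit, so the raw values 16 to 31 map to -16 to -1.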

Processor Mode Status Register (PMST):

IPTR: Interrupt vector pointer. Points to the 128-word program page where the interrupt vectors
reside.
MP/MC: Microprocessor/Microcomputer mode,
MP/MC=0, the on-chip ROM is enabled (microcomputer mode).
MP/MC=1, the on-chip ROM is not available (microprocessor mode).

OVLY: RAM OVERLAY, OVLY enables on chip dual access data RAM blocks to be mapped into
program space.

AVIS: It enables/disables the internal program address to be visible at the address pins.
DROM: Data ROM, DROM enables on-chip ROM to be mapped into data space.
CLKOFF: CLOCKOUT off.

SMUL: Saturation on Multiplication

SST: Saturation on Store.
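Saturation, as controlled by the OVM bit in ST1 and referred to by the SMUL and SST bits above, can be sketched as follows. This is a simplified model under stated assumptions: results are saturated to the signed 32-bit range, and the function name `accumulate` is hypothetical. It does not model the guard bits or the exact conditions under which SMUL and SST apply.

```python
MAX32 = 0x7FFFFFFF      # most positive 32-bit value
MIN32 = -0x80000000     # most negative 32-bit value

def accumulate(acc, operand, ovm):
    """Add operand to acc and report overflow of the signed 32-bit range.

    ovm=1: on overflow the result is replaced by MAX32 or MIN32.
    ovm=0: the overflowed result is returned unchanged.
    """
    total = acc + operand
    overflow = not (MIN32 <= total <= MAX32)
    if overflow and ovm:
        total = MAX32 if total > 0 else MIN32
    return total, overflow

print(accumulate(0x7FFFFFFF, 1, ovm=1))
print(accumulate(0x7FFFFFFF, 1, ovm=0))
```

Saturating keeps a filter output pinned at full scale instead of wrapping around to a value of the opposite sign.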

3.4 Data Addressing Modes of TMS320C54X Processors:

Data addressing modes provide various ways to access operands to execute instructions and place results
in the memory or the registers. The ‘54xx devices offer seven basic addressing modes:
1. Immediate addressing.
2. Absolute addressing.
3. Accumulator addressing.
4. Direct addressing.
5. Indirect addressing.
6. Memory mapped addressing
7. Stack addressing.

3.4.1 Immediate addressing:

The instruction contains the specific value of the operand. The operand can be short (3, 5, 8, or 9
bits in length) or long (16 bits in length). An instruction with a short operand occupies one memory
location; an instruction with a long operand occupies two.
Example: LD #20, DP
RPT #0FFFFh

3.4.2 Absolute Addressing:

The instruction contains a specified address in the operand.
1. dmad addressing: MVDK Smem, dmad and MVDM dmad, MMR
2. pmad addressing: MVDP Smem, pmad and MVPD pmad, Smem
3. PA addressing: PORTR PA, Smem
4. *(lk) addressing.

3.4.3 Accumulator Addressing:

Accumulator content is used as address to transfer data between Program and Data memory.
Ex: READA *AR2

3.4.4 Direct Addressing:

Base address + 7-bit value contained in the instruction = 16-bit address. A page of 128 locations
can be accessed without changing DP or SP. The compiler mode bit (CPL) in the ST1 register selects
the base:

CPL = 0 selects DP
CPL = 1 selects SP
When SP is used instead of DP, the effective address is computed by adding the 7-bit offset to SP.
Figure 3.7 Block diagram of the direct addressing mode for TMS320C54xx Processors.
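A minimal Python sketch of the address formation described above follows. The function name `direct_address` is hypothetical, and the DP case assumes a 9-bit page pointer that supplies the upper bits of the address (consistent with 128-word pages in a 16-bit address space).

```python
def direct_address(offset7, cpl, dp=0, sp=0):
    """Form a 16-bit data address from a 7-bit instruction offset.

    cpl=0: the 9-bit DP supplies the upper bits (page number).
    cpl=1: the offset is added to the 16-bit SP.
    """
    offset7 &= 0x7F
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset7
    return (sp + offset7) & 0xFFFF

print(hex(direct_address(0x05, cpl=0, dp=4)))        # page 4, offset 5
print(hex(direct_address(0x05, cpl=1, sp=0x1000)))   # SP-relative
```

Because the page number is concatenated rather than added in the DP case, changing DP moves the access window in steps of exactly 128 words.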
3.4.5 Indirect Addressing:

The TMS320C54xx has eight 16-bit auxiliary registers (AR0 – AR7) and two auxiliary register arithmetic
units (ARAU0 & ARAU1). Indirect addressing is used to access memory locations in fixed step sizes.
The AR0 register is used for the indexed and bit-reversed addressing modes.
Single-operand addressing:
MOD: type of indirect addressing
ARF: AR used for addressing
The use of ARP depends on the CMPT bit in ST1:
CMPT = 0: standard mode, ARP is set to zero
CMPT = 1: compatibility mode, the particular AR is selected by ARP
Table 3.2 Indirect addressing options with a single data-memory operand.
Circular Addressing:
• Used in convolution, correlation and FIR filters.
• A circular buffer is a sliding window containing the most recent data. A circular buffer of size R
must start on an N-bit boundary, where 2^N > R.

• Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected AR (ARx).

With BK holding the circular buffer size, the index is updated as follows:
If 0 ≤ index + step < BK: index = index + step;
else if index + step ≥ BK: index = index + step - BK;
else if index + step < 0: index = index + step + BK
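The index update rule above can be sketched in Python as follows. The function name `circular_update` is hypothetical, and the sketch assumes that the magnitude of the step does not exceed the buffer size BK.

```python
def circular_update(ar, step, bk):
    """Return the auxiliary-register value after one circular-addressing step.

    ar   : current address (buffer assumed aligned on an N-bit boundary)
    step : signed step size, with abs(step) <= bk assumed
    bk   : circular buffer size
    """
    n = bk.bit_length()            # smallest N such that 2**N > bk
    low_mask = (1 << n) - 1
    base = ar & ~low_mask          # effective base address: N LSBs zeroed
    index = (ar & low_mask) + step
    if index >= bk:
        index -= bk                # wrapped past the end of the buffer
    elif index < 0:
        index += bk                # wrapped past the start of the buffer
    return base | index

ar = 0x100
for _ in range(4):
    print(hex(ar))
    ar = circular_update(ar, 2, 6)
```

Starting at the base of a 6-word buffer with a step of 2, the address returns to the base after three updates.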
Bit-Reversed Addressing:
o Used for FFT algorithms.
o AR0 specifies one half of the size of the FFT.
o The value of AR0 = 2^(N-1), where N is an integer and the FFT size = 2^N.
o AR0 is added to the selected AR using bit-reversed addition.
o The carry bit propagates from left to right.
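The left-to-right carry can be made concrete with a small Python sketch. Both function names are hypothetical. The sketch works on offsets from a suitably aligned buffer base, and it limits the addition to log2(FFT size) bits, which is a simplification of the 16-bit hardware adder.

```python
def reverse_carry_add(a, b, bits=16):
    """Add a and b with the carry propagating from the MSB towards the LSB."""
    result = 0
    carry = 0
    for bit in range(bits - 1, -1, -1):          # start at the leftmost bit
        total = ((a >> bit) & 1) + ((b >> bit) & 1) + carry
        result |= (total & 1) << bit
        carry = total >> 1                       # carry moves one bit to the right
    return result

def bit_reversed_offsets(fft_size):
    """Sequence of buffer offsets visited when AR0 = fft_size / 2."""
    ar0 = fft_size // 2
    width = fft_size.bit_length() - 1            # log2(fft_size)
    ar, order = 0, []
    for _ in range(fft_size):
        order.append(ar)
        ar = reverse_carry_add(ar, ar0, bits=width)
    return order

print(bit_reversed_offsets(8))
```

For an 8-point FFT the offsets come out in the familiar bit-reversed order, so no separate reordering pass is needed.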

Dual-Operand Addressing:
Dual data-memory operand addressing is used for instructions that simultaneously
perform two reads (32-bit read) or a single read (16-bit read) and a parallel store (16-bit
store) indicated by two vertical bars, ||. These instructions access operands using indirect addressing
mode.
If, in an instruction with a parallel store, the source operand and the destination operand point to the
same location, the source is read before writing to the destination. Only 2 bits are available in the
instruction code for selecting each auxiliary register in this mode. Thus, just four of the auxiliary
registers, AR2-AR5, can be used. The ARAUs, together with these registers, provide the capability to
access two operands in a single cycle. Figure 3.11 shows how an address is generated using dual
data-memory operand addressing.
3.4.6. Memory-Mapped Register Addressing:
• Used to modify the memory-mapped registers without affecting the current data page
pointer (DP) or stack pointer (SP)
o Overhead for writing to a register is minimal
o Works for direct and indirect addressing
o Scratch-pad RAM located on data page 0 can be modified
• STM #x, DIRECT
• STM #tbl, AR1

3.4.7 Stack Addressing:

• Used to automatically store the program counter during interrupts and subroutines.
• Can be used to store additional items of context or to pass data values.
• Uses a 16-bit memory-mapped register, the stack pointer (SP).
• PSHD X2
3.5. Memory Space of TMS320C54xx Processors
• A total of 128k words extendable up to 8192k words.
• Total memory includes RAM, ROM, EPROM, EEPROM or memory-mapped peripherals.
• mapped registers.
Figure 3.14 Memory map for the TMS320C5416 Processor.
3.6. Program Control
• It contains the program counter (PC), the program counter related hardware, the hardware stack,
repeat counters, and status registers.
• The PC addresses memory in several ways, namely:
• Branch: The PC is loaded with the immediate value following the branch instruction.
• Subroutine call: The PC is loaded with the immediate value following the call instruction.
• Interrupt: The PC is loaded with the address of the appropriate interrupt vector.
• Instructions such as BACC, CALA, etc.: The PC is loaded with the contents of the accumulator
low word.
• End of a block repeat loop: The PC is loaded with the contents of the block-repeat start
address register.
• Return: The PC is loaded from the top of the stack.
Problems:

1. Assuming the current content of AR3 to be 200h, what will be its contents
after each of the following TMS320C54xx addressing modes is used? Assume that the
contents of AR0 are 20h.
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3-
e. *AR3
f. *+AR3(40h)
g. *+AR3(-40h)
Solution:
a. AR3 ← AR3 + AR0; AR3 = 200h + 20h = 220h
b. AR3 ← AR3 - AR0; AR3 = 200h - 20h = 1E0h
c. AR3 ← AR3 + 1; AR3 = 200h + 1 = 201h
d. AR3 ← AR3 - 1; AR3 = 200h - 1 = 1FFh
e. AR3 is not modified; AR3 = 200h
f. AR3 ← AR3 + 40h; AR3 = 200h + 40h = 240h
g. AR3 ← AR3 - 40h; AR3 = 200h - 40h = 1C0h
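The answers above can be cross-checked with a short Python sketch. The function name `ar3_after` and the string keys are hypothetical labels for the modifiers in the problem, with `*AR3-` standing for the post-decrement case of part d; the sketch only models the register update, not the memory access itself.

```python
def ar3_after(mode, ar3=0x200, ar0=0x20, lk=0):
    """Value of AR3 after one access with the given indirect-addressing modifier."""
    updates = {
        "*AR3+0": ar3 + ar0,      # post-increment by AR0
        "*AR3-0": ar3 - ar0,      # post-decrement by AR0
        "*AR3+": ar3 + 1,         # post-increment by 1
        "*AR3-": ar3 - 1,         # post-decrement by 1
        "*AR3": ar3,              # no modification
        "*+AR3(lk)": ar3 + lk,    # pre-modify by a signed 16-bit offset
    }
    return updates[mode] & 0xFFFF

for mode in ("*AR3+0", "*AR3-0", "*AR3+", "*AR3-", "*AR3"):
    print(mode, hex(ar3_after(mode)))
print("*+AR3(40h)", hex(ar3_after("*+AR3(lk)", lk=0x40)))
print("*+AR3(-40h)", hex(ar3_after("*+AR3(lk)", lk=-0x40)))
```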
2. Assuming the current contents of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that
the contents of AR0 are 20h.
a. *AR3+0B
b. *AR3-0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation;
AR3 = 200h + 20h (with reverse carry propagation) = 220h.
b. AR3 ← AR3 - AR0 with reverse carry propagation;
AR3 = 200h - 20h (with reverse carry propagation) = 23Fh.
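The result 23Fh in part b is not obvious at first sight. One way to check it, sketched below in Python, relies on the fact that arithmetic with reverse carry (or borrow) propagation is equivalent to bit-reversing both operands, doing ordinary arithmetic, and bit-reversing the result. The helper names are hypothetical and a 16-bit register width is assumed.

```python
def reverse16(x):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{x & 0xFFFF:016b}"[::-1], 2)

def rc_add(a, b):
    """16-bit addition with reverse carry propagation."""
    return reverse16((reverse16(a) + reverse16(b)) & 0xFFFF)

def rc_sub(a, b):
    """16-bit subtraction with reverse borrow propagation."""
    return reverse16((reverse16(a) - reverse16(b)) & 0xFFFF)

print(hex(rc_add(0x200, 0x20)))   # part a
print(hex(rc_sub(0x200, 0x20)))   # part b
```

In part b the borrow generated at bit 5 travels towards the LSB, setting bits 5 down to 0, while bit 9 is left untouched.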

3.7 On chip peripherals:

The on-chip peripherals facilitate interfacing with external devices. The peripherals are:
• General purpose I/O pins
• A software programmable wait state generator
• Hardware timer
• Host port interface (HPI)
• Clock generator
• Serial port

3.7.1 General purpose I/O pins:

The device has two general purpose I/O pins:

• BIO: input pin used to monitor the status of external devices.

• XF: software-controlled output pin used to signal external devices.

3.7.2 Software programmable wait state generator:

• Extends external bus cycles up to seven machine cycles.

3.7.3 Hardware Timer

The hardware timer consists of 3 memory-mapped registers:
• The timer register (TIM)
• The timer period register (PRD)
• The timer control register (TCR)
• Prescaler block (PSC)
• TDDR (timer divide-down ratio)
• TINT & TOUT

The timer register (TIM) is a 16-bit memory-mapped register that decrements at
every pulse from the prescaler block (PSC).
The timer period register (PRD) is a 16-bit memory-mapped register whose
contents are loaded into the TIM whenever the TIM decrements to zero or the
device is reset (SRESET).
The timer can also be independently reset using the TRB signal. The timer
control register (TCR) is a 16-bit memory-mapped register that contains status and
control bits. Table shows the functions of the various bits in the TCR.
The prescaler block is also an on-chip counter. Whenever the prescaler bits
count down to 0, a clock pulse is given to the TIM register, which decrements the
TIM register by 1. The TDDR bits contain the divide-down ratio, which is loaded
into the prescaler block each time the prescaler bits count down to 0.
That is to say, the 4-bit value of TDDR determines the divide-by ratio
of the timer clock with respect to the system clock. In other words, the TIM
decrements either at the rate of the system clock or at a slower rate, as
decided by the value of the TDDR bits. TOUT and TINT are the output signals
generated as the TIM register decrements to 0. TOUT can trigger the start of
conversion signal in an ADC interfaced to the DSP.
The sampling frequency of the ADC determines how frequently it receives
the TOUT signal. TINT is used to generate interrupts, which are required to service
a peripheral such as a DRAM controller periodically. The timer can also be
stopped, restarted, reset, or disabled by specific status bits.
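The relation between the system clock, TDDR, PRD and the timer output rate can be sketched in Python. The function name `timer_interrupt_rate` is hypothetical. The "+1" terms are an assumption that follows from the description above: each counter counts down to 0 before it is reloaded, so a reload value of n gives a period of n + 1 input pulses.

```python
def timer_interrupt_rate(clock_hz, tddr, prd):
    """Approximate TINT/TOUT rate in Hz for a given system clock.

    tddr : 4-bit divide-down ratio reloaded into the prescaler
    prd  : 16-bit period value reloaded into TIM
    """
    if not 0 <= tddr <= 0xF:
        raise ValueError("TDDR is a 4-bit field")
    if not 0 <= prd <= 0xFFFF:
        raise ValueError("PRD is a 16-bit register")
    return clock_hz / ((tddr + 1) * (prd + 1))

# Example: 100 MHz clock, prescaler divides by 10, TIM divides by 10000
print(timer_interrupt_rate(100e6, tddr=9, prd=9999))
```

With TDDR = 0 and PRD = 0 the timer output runs at the system clock rate, which matches the statement that TIM decrements at the system clock rate or slower.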
3.8 Interrupts of TMS320C54xx Processors:
Often, while the CPU is in the midst of executing a program, a peripheral
device may require a service from the CPU. In such a situation, the main program
may be interrupted by a signal generated by the peripheral device. This results in
the processor suspending the main program in order to execute another program,
called the interrupt service routine, to service the peripheral device. On completion of
the interrupt service routine, the processor returns to the main program to continue
from where it left off.
An interrupt may be generated either by an internal or an external device. It may also
be generated by software. Not all interrupts are serviced when they occur. Only
those interrupts that are called nonmaskable are serviced whenever they occur.
Other interrupts, which are called maskable interrupts, are serviced only if they are
enabled. There is also a priority to determine which interrupt gets serviced first if
more than one interrupt occurs simultaneously.
Almost all the devices of the TMS320C54xx family have 32 interrupts. However, the
types and the number under each type vary from device to device. Some of these
interrupts are reserved for use by the CPU.
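The selection logic described above (nonmaskable interrupts always serviced, maskable ones only when enabled, priority deciding between simultaneous requests) can be sketched in Python. Everything here is illustrative: the function name `next_interrupt`, the use of sets, and the convention that a lower interrupt number means higher priority are assumptions, not details given in these notes.

```python
def next_interrupt(pending, enabled, intm, nonmaskable=frozenset()):
    """Pick the interrupt to service next, or None if none qualifies.

    pending     : set of interrupt numbers currently requested
    enabled     : set of maskable interrupt numbers that are individually enabled
    intm        : global interrupt mask bit (1 disables all maskable interrupts)
    nonmaskable : set of interrupt numbers that can never be masked
    """
    serviceable = {
        n for n in pending
        if n in nonmaskable or (intm == 0 and n in enabled)
    }
    # Assumed priority rule: the lowest number wins.
    return min(serviceable) if serviceable else None

print(next_interrupt({5, 3}, enabled={3, 5}, intm=0))
print(next_interrupt({5, 3}, enabled={3, 5}, intm=1))
print(next_interrupt({5, 1}, enabled={5}, intm=1, nonmaskable={1}))
```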

3.9 Pipeline operation of TMS320C54xx Processors:

The CPU of ‘54xx devices has a six-level-deep instruction pipeline. The
six stages of the pipeline are independent of each other. This allows overlapping
execution of instructions. During any given cycle, up to six different instructions
can be active, each at a different stage of processing. The six levels of the pipeline
structure are program prefetch, program fetch, decode, access, read and execute.
1. During program prefetch, the program address bus, PAB, is loaded with the
address of the next instruction to be fetched.
2. In the fetch phase, an instruction word is fetched from the program bus, PB,
and loaded into the instruction register, IR. These two phases form the instruction
fetch sequence.
3. During the decode stage, the contents of the instruction register, IR, are
decoded to determine the type of memory access operation and the control
signals required for the data-address generation unit and the CPU.
4. The access phase outputs the address of the read operand on the data address bus,
DAB. If a second operand is required, the other data address bus, CAB, is also loaded
with an appropriate address. Auxiliary registers in indirect addressing mode and the
stack pointer (SP) are also updated.
5. In the read phase, the data operand(s), if any, are read from the data buses, DB
and CB. This phase completes the two-phase read process and starts the two-phase
write process. The data address of the write operand, if any, is loaded
into the data write address bus, EAB.
6. The execute phase writes the data using the data write bus, EB, and completes
the operand write sequence. The instruction is executed in this phase.
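The overlap of instructions across the six stages can be visualized with a small Python sketch. It models an ideal pipeline with no stalls or conflicts, which is a simplifying assumption; the function name `pipeline_table` is hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]

def pipeline_table(n_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal, stall-free pipeline."""
    table = {}
    for cycle in range(n_instructions + len(STAGES) - 1):
        active = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth          # instruction occupying this stage
            if 0 <= instr < n_instructions:
                active[stage] = instr
        table[cycle] = active
    return table

for cycle, active in pipeline_table(8).items():
    print(cycle, active)
```

From the sixth cycle onward all six stages are busy, so one instruction completes per cycle even though each instruction takes six cycles to pass through the pipeline.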
