
Unit 2 Architectures For Programmable Digital Signal-Processors

This document discusses the architecture of programmable digital signal processors. It covers basic architectural features like arithmetic and logical operations supported. It also describes key computational building blocks like multipliers, parallel multipliers, barrel shifters and multiply-accumulate units. Issues like speed, word sizes, and preventing overflow/underflow in multiply-accumulate units are discussed. The document provides examples to illustrate concepts like number of control lines required for a barrel shifter and number of guard bits needed to prevent overflow in a multiply-accumulate unit.

Unit 2 ARCHITECTURES FOR PROGRAMMABLE DIGITAL SIGNAL-PROCESSORS
TEXT BOOK:
1. “Digital Signal Processing”, Avatar Singh and S. Srinivasan, Thomson Learning, 2004. (Refer to Chapter 4, Sections 4.1 to 4.7.)
Learning objectives:
To understand the basic architectural features, DSP computational building blocks and data addressing capabilities.
To study data addressing capabilities, programmability and program execution.
To analyze the speed issues and the features for external interfacing.
Lesson plan:
Sl No   Topic                                    Date planned   Date engaged   Hours
1       Introduction                                                           1st
2       Basic Architectural Features                                           2nd
3       DSP Computational Building Blocks                                      3rd
4       Bus Architecture and Memory                                            4th
5       Data Addressing Capabilities                                           5th
6       Address Generation Unit                                                6th
7       Programmability and Program Execution                                  7th
8       Features for External Interfacing                                      8th


Contents
Introduction
Basic Architectural Features
DSP Computational Building Blocks
Bus Architecture and Memory
Data Addressing Capabilities
Address Generation Unit
Programmability and Program Execution
Features for External Interfacing
2.1 Basic Architectural Features 
A programmable DSP device should provide instructions
similar to a conventional microprocessor. The instruction set
of a typical DSP device should include the following,
a. Arithmetic operations such as ADD, SUBTRACT,
MULTIPLY etc
b. Logical operations such as AND, OR, NOT, XOR etc
c. Multiply and Accumulate (MAC) operation
d. Signal scaling operation
In addition to the above provisions, the architecture should also
include,
a. On chip registers to store intermediate results
b. On chip memories to store signal samples (RAM) 
c. On chip memories to store filter coefficients (ROM)
 
2.2 DSP Computational Building Blocks
2.2.1 Multipliers
The advent of single chip multipliers paved the way for
implementing DSP functions on a VLSI chip. Parallel
multipliers have now replaced the traditional shift and add
multipliers. Parallel multipliers take a single processor cycle
to fetch and execute the instruction and to store the result.
They are also called array multipliers.
The key features to be considered for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed
2.2.2 Parallel Multipliers
Consider the multiplication of two unsigned numbers A and
B. Let A be represented using m bits as $(A_{m-1} A_{m-2} \ldots A_1 A_0)$
and B be represented using n bits as $(B_{n-1} B_{n-2} \ldots B_1 B_0)$.
Then the product of these two numbers is given by
$$P = A \cdot B = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} A_i B_j \, 2^{i+j}$$
Braun multiplier for a 4X4 Multiplication
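The sum-of-partial-products idea behind an array multiplier can be sketched in software. The following is only an illustrative model, not a hardware description: the function name `braun_multiply` and the default 4 x 4 size are assumptions made for this example, and the nested loops stand in for the AND gates and adders that work in parallel in the real circuit.

```python
def braun_multiply(a: int, b: int, m: int = 4, n: int = 4) -> int:
    """Multiply two unsigned integers by summing partial products A_i * B_j * 2^(i+j).

    m and n are the operand widths in bits (hypothetical defaults for a 4x4 array).
    """
    if not (0 <= a < (1 << m)) or not (0 <= b < (1 << n)):
        raise ValueError("operands must fit in m and n bits")
    product = 0
    for i in range(m):
        a_i = (a >> i) & 1
        for j in range(n):
            b_j = (b >> j) & 1
            # One AND gate of the array forms the partial-product bit A_i * B_j,
            # which carries the weight 2^(i+j).
            product += (a_i & b_j) << (i + j)
    return product
```

For two 4-bit operands the result always fits in 8 bits, which matches the m + n bit width of the product.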
2.2.3 Multipliers for Signed Numbers 
In the Braun multiplier the signs of the numbers are not
taken into account. In order to implement a multiplier
for signed numbers, additional hardware is required to modify
the Braun multiplier. The modified multiplier is called the
Baugh-Wooley multiplier.
Consider two signed numbers A and B
2.2.4 Speed 
The conventional shift and add technique of multiplication
requires n cycles to multiply two n bit numbers, whereas in
parallel multipliers the time required is the longest path
delay in the combinational circuit used.
As DSP applications generally require very high speed, it
is desirable to have multipliers operating at the highest
possible speed by using a parallel implementation.
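For comparison, the n-cycle behaviour of the shift and add technique can be sketched as below. This is a simplified software model under the assumption that one loop iteration corresponds to one processor cycle; the function name `shift_and_add_multiply` is introduced only for this illustration.

```python
def shift_and_add_multiply(x: int, y: int, n: int):
    """Multiply two unsigned n-bit numbers one multiplier bit per cycle.

    Returns (product, cycles); cycles is always n in this simplified model.
    """
    product = 0
    cycles = 0
    for i in range(n):
        if (y >> i) & 1:
            # Add the multiplicand shifted to the weight of the current bit.
            product += x << i
        cycles += 1  # one multiplier bit is examined per cycle
    return product, cycles
```

A parallel multiplier produces the same product in one pass through its combinational logic, which is why its delay is set by the longest path rather than by n.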
2.2.5 Bus Widths
Consider the multiplication of two n bit numbers X and Y. The
product Z can be at most 2n bits long. In order to perform the
whole operation in a single execution cycle, we require two
buses of width n bits each to fetch the operands X and Y and a
bus of width 2n bits to store the result Z to the memory.
Although this performs the operation faster, it is not an efficient
way of implementation as it is expensive.
An alternative can be used for applications where speed is not
a major concern: latches are used for the inputs and outputs,
so that a single bus suffices to fetch the operands and to store
the result (Fig 2.2).
2.2.6 Shifters
Shifters are used to either scale down or scale up operands or
results. The following scenarios show the necessity of a shifter:
a. While adding N numbers, each n bits long, the sum can grow up
to $n + \log_2 N$ bits long. If the accumulator is n bits long, an
overflow error will occur. This can be overcome by using a shifter
to scale down each operand by $\log_2 N$ bits.
b. Similarly, while calculating the product of two n bit numbers,
the product can grow up to 2n bits long. Generally the lower n bits
are neglected and the sign bit is shifted to preserve the sign of the
product.
c. Finally, in the case of addition of two floating-point numbers, one
of the operands has to be shifted appropriately to make the
exponents of the two numbers equal.
2. It is required to find the sum of 64 16-bit numbers. How many bits should
the accumulator have so that the sum can be computed without the occurrence
of overflow error or loss of accuracy?
The sum of 64 16-bit numbers can grow up to $(16 + \log_2 64) = 22$ bits long. Hence
the accumulator should be 22 bits long in order to avoid an overflow error.

3. In the previous problem, it is decided to have an accumulator with only 16
bits but shift the numbers before the addition to prevent overflow. By how many
bits should each number be shifted?
As the length of the accumulator is fixed, the operands have to be shifted right by
$\log_2 64 = 6$ bits prior to the addition, in order to avoid overflow.

4. If all the numbers in the previous problem are fixed point integers, what is
the actual sum of the numbers?
The actual sum can be obtained by shifting the result 6 bits to the left
after the sum has been computed. Therefore
Actual Sum = Accumulator content $\times\ 2^6$
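The arithmetic in problems 2 to 4 can be checked with a short sketch. The helper names below are assumptions made for this example, and the worst case used in the usage note (all inputs equal to the largest unsigned 16-bit value) is likewise just an illustrative choice.

```python
def growth_bits(count: int) -> int:
    """Extra bits needed when `count` numbers are added: ceil(log2(count))."""
    return (count - 1).bit_length()


def accumulator_bits(word_bits: int, count: int) -> int:
    """Accumulator width that holds the sum of `count` numbers of `word_bits` bits."""
    return word_bits + growth_bits(count)


def prescaled_sum(values, shift: int) -> int:
    """Shift every operand right before adding, then shift the sum back left.

    The bits shifted out of each operand are lost, so the result is only an
    approximation of the true sum.
    """
    accumulator = sum(v >> shift for v in values)
    return accumulator << shift
```

With 64 inputs of value FFFFh, the exact sum needs 22 bits, while the pre-scaled sum fits in a 16-bit accumulator before the final left shift but comes out slightly smaller than the exact sum. This is the loss of accuracy that the shifting approach trades for a narrower accumulator.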
2.2.7 Barrel Shifters
In DSP applications speed is the crucial issue, so a shift by
several bit positions has to be accomplished in a single
execution cycle. This can be done using a barrel
shifter, which connects the input lines representing a word to a
group of output lines with the required shift determined by its
control inputs. For an input of length n, $\log_2 n$ control lines
are required to select the shift amount.
The block diagram of a typical barrel shifter is as shown in
figure 2.3.
An additional control line is required to indicate the
direction of the shift.
The direction of shift is usually fixed, with the result
that only $\log_2 n$ lines are required for the control
inputs.
Bits shifted out of the input word are discarded, and in the
case of a left shift the new bit positions are filled with
zeros.
In the case of a right shift, the new bit positions are
filled with copies of the most significant bit to maintain the
sign of the shifted result.
Fig 2.4 Implementation of a 4 bit Shift Right Barrel Shifter
5. A Barrel Shifter is to be designed with 16 inputs for left
shifts from 0 to 15 bits. How many control lines are required
to implement the shifter?
As the number of bits used to represent the input is 16,
$\log_2 16 = 4$ control inputs are required.
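A barrel shifter's behaviour can be modelled in software as a cascade of stages, each controlled by one control line and shifting by a power of two. This is only a sketch under that assumption (a real barrel shifter is combinational and may be organised differently); the function names are hypothetical.

```python
def control_lines(width: int) -> int:
    """Number of control lines needed to select shifts of 0 .. width-1 bits."""
    return (width - 1).bit_length()


def barrel_shift_left(word: int, shift: int, width: int = 16) -> int:
    """Left shift: bits shifted out are discarded, vacated positions get zeros."""
    if not 0 <= shift < width:
        raise ValueError("shift must be between 0 and width-1")
    mask = (1 << width) - 1
    result = word & mask
    for stage in range(control_lines(width)):
        if (shift >> stage) & 1:  # control line `stage` enables a shift by 2^stage
            result = (result << (1 << stage)) & mask
    return result


def barrel_shift_right(word: int, shift: int, width: int = 16) -> int:
    """Right shift: vacated positions are filled with copies of the sign bit."""
    if not 0 <= shift < width:
        raise ValueError("shift must be between 0 and width-1")
    mask = (1 << width) - 1
    result = word & mask
    sign = result >> (width - 1)
    for stage in range(control_lines(width)):
        if (shift >> stage) & 1:
            k = 1 << stage
            fill = (((1 << k) - 1) << (width - k)) if sign else 0
            result = (result >> k) | fill
    return result
```

For a 16-bit input the model uses four stages (shifts by 1, 2, 4 and 8), which corresponds to the four control inputs found in problem 5.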

2.3 Multiply and Accumulate Unit


Most of the DSP applications require the computation of the
sum of the products of a series of successive multiplications. In
order to implement such functions a special unit called a
Multiply and Accumulate (MAC) unit is required.
A MAC consists of a multiplier and a special register called
Accumulator. MACs are used to implement the functions of the
type A+BC. A typical MAC unit is as shown in the figure 2.5.
Fig 2.5 A MAC Unit
6. If a sum of 256 products is to be computed using a pipelined
MAC unit, and if the MAC execution time of the unit is 100nsec,
what will be the total time required to complete the operation? 
As N = 256 in this case, the MAC unit requires N + 1 = 257 execution
cycles. As the single MAC execution time is 100 nsec, the total time
required will be (257 × 100 nsec) = 25.7 µsec.
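The sum-of-products operation and the timing result of problem 6 can be sketched as follows. The function names are assumptions for this example, and the timing helper simply encodes the N + 1 cycle count used in the solution above.

```python
def mac(samples, coefficients, accumulator: int = 0) -> int:
    """Accumulate the products of paired samples and coefficients (A + B*C, repeated)."""
    for x, h in zip(samples, coefficients):
        accumulator += x * h  # one multiply and accumulate step
    return accumulator


def pipelined_mac_time_ns(num_products: int, cycle_ns: int) -> int:
    """Total time for a pipelined MAC that needs N + 1 cycles for N products."""
    return (num_products + 1) * cycle_ns
```

For 256 products at 100 nsec per cycle the helper returns 25700 nsec, i.e. 25.7 µsec.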

2.3.1 Overflow and Underflow 


While designing a MAC unit, attention has to be paid to the word
sizes encountered at the input of the multiplier and the sizes of the
add/subtract unit and the accumulator, as there is a possibility of
overflow and underflow. Overflow/underflow can be avoided by
using any of the following methods:
a. Using shifters at the input and the output of the MAC
b. Providing guard bits in the accumulator
c. Using saturation logic
Shifters
Shifters can be provided at the input of the MAC to normalize the data
and at the output to denormalize the same.
Guard bits
As the normalization process does not yield accurate result, it is not
desirable for some applications. In such cases we have another
alternative by providing additional bits called guard bits in the
accumulator so that there will not be any overflow error. Here the
add/subtract unit also has to be modified appropriately to manage the
additional bits of the accumulator.
7. Consider a MAC unit whose inputs are 16 bit numbers. If 256
products are to be summed up in this MAC, how many guard bits
should be provided for the accumulator to prevent overflow condition
from occurring? 
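As it is required to calculate the sum of 256 numbers, the sum can grow
by up to $\log_2 256 = 8$ bits beyond the width of each number; for 16-bit
numbers it can be as long as (16 + 8) = 24 bits. Hence the accumulator
should be capable of handling these 24 bits. Thus the guard bits required
will be (24 - 16) = 8 bits. (Each product of two 16-bit inputs is itself up
to 32 bits wide, so an accumulator holding full-precision products would
need 32 + 8 = 40 bits; the number of guard bits is 8 in either case.)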
The block diagram of the modified MAC after considering the
guard or extension bits is as shown in the figure 2.6.
8. What should be the minimum width of the accumulator in a DSP
device that receives 10 bit A/D samples and is required to add 64 of
them without causing an overflow?
As it is required to calculate the sum of 64 10-bit numbers, the sum can
be as long as $(10 + \log_2 64) = 16$ bits. Hence the accumulator should be
capable of handling these 16 bits. Thus the guard bits required will be
(16 - 10) = 6 bits.
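The effect of guard bits can be illustrated with a small model of a fixed-width accumulator. This is a sketch for unsigned values only; the names `accumulate` and `guard_bits` are assumptions made for the example.

```python
def guard_bits(num_terms: int) -> int:
    """Guard bits needed to add `num_terms` values without overflow: ceil(log2(N))."""
    return (num_terms - 1).bit_length()


def accumulate(values, acc_bits: int):
    """Add unsigned values in an accumulator that is `acc_bits` wide.

    Returns (accumulator content, overflow flag). On overflow the content
    wraps around, as it would in a plain binary adder without saturation.
    """
    mask = (1 << acc_bits) - 1
    accumulator = 0
    overflowed = False
    for value in values:
        total = accumulator + value
        if total > mask:
            overflowed = True
        accumulator = total & mask
    return accumulator, overflowed
```

Adding 64 samples of the largest 10-bit value (1023) overflows a 10-bit accumulator, but completes without overflow once the 6 guard bits from problem 8 are added.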
Saturation Logic
Overflow/underflow will occur if the result goes beyond the most
positive number or below the most negative number the accumulator
can handle. Thus the overflow/underflow error can be resolved by
loading the accumulator with the most positive number which it can
handle at the time of overflow and the most negative number that it can
handle at the time of underflow. This method is called saturation
logic. A schematic diagram of saturation logic is as shown in figure 2.7.
In saturation logic, as soon as an overflow or underflow
condition is detected, the accumulator is loaded with the
most positive or most negative number, overriding the result
computed by the MAC unit.

Fig 2.7 A Schematic Diagram of the Saturation Logic
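Saturation can be sketched for a signed two's complement accumulator as follows. The function name and the default 16-bit width are assumptions chosen for illustration.

```python
def saturating_add(a: int, b: int, bits: int = 16) -> int:
    """Add two signed values and clamp the result to the accumulator's range."""
    most_positive = (1 << (bits - 1)) - 1
    most_negative = -(1 << (bits - 1))
    total = a + b
    if total > most_positive:   # overflow: load the most positive number
        return most_positive
    if total < most_negative:   # underflow: load the most negative number
        return most_negative
    return total
```

Unlike wraparound, a saturated result keeps the correct sign, so the error introduced by an overflow stays bounded.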
2.4 Arithmetic and Logic Unit
A typical DSP device should be capable of handling arithmetic instructions like
ADD, SUB, INC, DEC etc. and logical operations like AND, OR, NOT, XOR
etc. The block diagram of a typical ALU for a DSP is as shown in figure 2.8.
Status Flags 
ALU includes circuitry to generate status flags after arithmetic
and logic operations. These flags include sign, zero, carry and
overflow. 
Overflow Management
Depending on the status of overflow and sign flags, the
saturation logic can be used to limit the accumulator content.
Register File
Instead of moving data in and out of the memory during the
operation, for better speed, a large set of general purpose
registers are provided to store the intermediate results.
2.5 Bus Architecture and Memory
Conventional microprocessors use the Von Neumann architecture for
memory management, wherein the same memory is used to store
both the program and the data (Fig 2.9). Although this architecture is
simple, it takes more processor cycles to execute a single
instruction, as the same bus is used for both data and
program.

In order to increase the speed of operation, separate memories
are used to store program and data, and a separate set of data
and address buses is given to each memory. This
architecture is called the Harvard architecture. It is as shown in
figure 2.10.
As many DSP instructions require more than one operand,
use of a single data memory forces the operands to be fetched one
after the other, thus increasing the processing delay. This
problem can be overcome by using two separate data memories for
storing the operands separately, so that both operands can be
fetched together in a single clock cycle (Figure 2.11).
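The difference between the three arrangements can be made concrete with a deliberately simplified cycle count. The model below is an assumption for illustration only: every memory access takes one cycle, accesses on separate buses overlap completely, and pipelining and caches are ignored. The function name is hypothetical.

```python
def access_cycles(program_fetches: int, operand_fetches: int,
                  shared_bus: bool, data_memories: int = 1) -> int:
    """Cycles spent on memory accesses for one instruction in a simplified model."""
    if shared_bus:
        # Von Neumann: instruction and operands share one bus, so accesses queue up.
        return program_fetches + operand_fetches
    # Harvard: program and data accesses overlap; operands are spread over
    # the available data memories (ceiling division).
    data_cycles = -(-operand_fetches // data_memories)
    return max(program_fetches, data_cycles)
```

For an instruction with one instruction fetch and two operands, this model gives 3 cycles on a shared bus, 2 with separate program and data memories, and 1 with two data memories.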
2.5.1 On-chip Memories 
In order to have faster execution of the DSP functions, it is
desirable to have some memory located on chip. As dedicated
buses are used to access this memory, on-chip memories are faster.
Speed and size are the two key parameters to be considered with
respect to the on-chip memories.

Speed
On-chip memories should match the speed of the ALU operations
in order to maintain the single cycle instruction execution of the
DSP.
Size
In a given area of the DSP chip, it is desirable to implement as
many DSP functions as possible. Thus the area occupied by the
on-chip memory should be kept to a minimum so that there is scope
for implementing more DSP functions on chip.
2.5.2 Organization of On-chip Memories
a. As many DSP algorithms require instructions to be
executed repeatedly, an instruction stored in the
external memory can, once it is fetched, reside in the
instruction cache.
b. The access times for on-chip memories should be
sufficiently small that they can be accessed more than once in
every execution cycle.
c. On-chip memories can be configured dynamically so that
they can serve different purposes at different times.
2.6 Data Addressing Capabilities 
Data accessing capability of a programmable DSP device is
configured by means of its addressing modes. The summary
of the addressing modes used in DSP is as shown in the table
below.
Table 2.1 DSP Addressing Modes
Addressing Mode     Operand                                         Sample Format   Operation
Immediate           Immediate value                                 ADD #imm        #imm + A → A
Register            Register contents                               ADD reg         reg + A → A
Direct              Memory address                                  ADD mem         mem + A → A
Register Indirect   Memory contents with address in the register    ADD *addreg     *addreg + A → A
2.6.1 Immediate Addressing Mode: ADD #imm
In this addressing mode, the data is included in the instruction
itself.
#imm + A → A

2.6.2 Register Addressing Mode: ADD reg
In this mode, one of the registers holds the data and
the register has to be specified in the instruction.
reg + A → A
2.6.3 Direct Addressing Mode: ADD mem
In this addressing mode, the instruction holds the memory location
of the operand.
mem + A → A
2.6.4 Indirect Addressing Mode: ADD *addreg
In this addressing mode, the operand is accessed using a
pointer. A pointer is generally a register which holds the
address of the location where the operand resides.
*addreg + A → A
Indirect addressing mode can be extended with automatic
increment or decrement capabilities, which has led to the
following addressing modes.
Table 2.2 Indirect Addressing Modes
Addressing Mode    Sample Format              Operation
Post Increment     ADD *addreg+               A ← A + *addreg; addreg ← addreg + 1
Post Decrement     ADD *addreg-               A ← A + *addreg; addreg ← addreg - 1
Pre Increment      ADD +*addreg               addreg ← addreg + 1; A ← A + *addreg
Pre Decrement      ADD -*addreg               addreg ← addreg - 1; A ← A + *addreg
Post_Add_Offset    ADD *addreg, offsetreg+    A ← A + *addreg; addreg ← addreg + offsetreg
Post_Sub_Offset    ADD *addreg, offsetreg-    A ← A + *addreg; addreg ← addreg - offsetreg
Pre_Add_Offset     ADD offsetreg+,*addreg     addreg ← addreg + offsetreg; A ← A + *addreg
Pre_Sub_Offset     ADD offsetreg-,*addreg     addreg ← addreg - offsetreg; A ← A + *addreg
9. What are the memory addresses of the operands in each of the following
cases of indirect addressing modes? In each case, what will be the content of
the addreg after the memory access? Assume that the initial contents of the
addreg and the offsetreg are 0200h and 0010h, respectively.
a. ADD *addreg-               b. ADD +*addreg
c. ADD offsetreg+,*addreg     d. ADD *addreg,offsetreg-

Instruction                 Addressing Mode    Operand Address          addreg Content after Access
ADD *addreg-                Post Decrement     0200h                    0200 - 01 = 01FFh
ADD +*addreg                Pre Increment      0200 + 01 = 0201h        0201h
ADD offsetreg+,*addreg      Pre_Add_Offset     0200 + 0010 = 0210h      0210h
ADD *addreg,offsetreg-      Post_Sub_Offset    0200h                    0200 - 0010 = 01F0h
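The pre/post update rules can be captured in a small sketch that reproduces the answers to problem 9. The mode names, the `MODES` table and the function `indirect_access` are assumptions made for this illustration, as is the 16-bit address width.

```python
# For each mode: (is the address register updated before the access?, step function).
MODES = {
    "post_increment":  (False, lambda offset: 1),
    "post_decrement":  (False, lambda offset: -1),
    "pre_increment":   (True,  lambda offset: 1),
    "pre_decrement":   (True,  lambda offset: -1),
    "post_add_offset": (False, lambda offset: offset),
    "post_sub_offset": (False, lambda offset: -offset),
    "pre_add_offset":  (True,  lambda offset: offset),
    "pre_sub_offset":  (True,  lambda offset: -offset),
}


def indirect_access(mode: str, addreg: int, offsetreg: int = 0, addr_bits: int = 16):
    """Return (operand address, addreg content after the access)."""
    update_first, step = MODES[mode]
    mask = (1 << addr_bits) - 1
    updated = (addreg + step(offsetreg)) & mask
    # Pre modes use the updated address; post modes use the original one.
    operand_address = updated if update_first else addreg
    return operand_address, updated
```

Calling `indirect_access("post_decrement", 0x0200, 0x0010)` gives the operand address 0200h and the new register content 01FFh, matching the first row of the solution.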
2.7.1 Circular Addressing Mode
A circular buffer allows one to handle a continuous stream of incoming
data samples.
In a circular buffer, successive data samples are stored in sequential
buffer locations until the end of the buffer is reached.
After reaching the end, storage starts again from the beginning of the
buffer.
This process can go on forever as long as the data samples are
processed at a rate faster than the rate of the incoming data.
To access a data sample from a circular buffer, a circular addressing
mode is a great help.
The implementation of such an addressing mode in hardware requires
three registers:
a. Pointer register to hold the current location (PNTR)
b. Start Address Register to hold the starting address of the buffer (SAR)
c. End Address Register to hold the ending address of the buffer (EAR)
There are four special cases in this addressing mode.
They are
a. SAR < EAR & updated PNTR > EAR
b. SAR < EAR & updated PNTR < SAR
c. SAR > EAR & updated PNTR > SAR
d. SAR > EAR & updated PNTR < EAR
The buffer length in the first two cases will be (EAR - SAR + 1),
whereas for the next two cases it will be (SAR - EAR + 1).
Pointer updating algorithm for the circular addressing mode

Updated PNTR ← PNTR ± increment
If SAR < EAR
    and if updated PNTR > EAR, then new PNTR ← updated PNTR - buffer size
    and if updated PNTR < SAR, then new PNTR ← updated PNTR + buffer size
If SAR > EAR
    and if updated PNTR > SAR, then new PNTR ← updated PNTR - buffer size
    and if updated PNTR < EAR, then new PNTR ← updated PNTR + buffer size
Else new PNTR ← updated PNTR
Case 3: SAR > EAR & updated PNTR > SAR
Case 4: SAR > EAR & updated PNTR < EAR
11. A DSP has a circular buffer with the start and the end addresses
as 0200h and 020Fh respectively. What would be the new values of
the address pointer of the buffer if, in the course of address
computation, it gets updated to
a. 0212h b. 01FCh 
Buffer Length= (EAR-SAR+1)= 020F-0200+1=10h
a. New Address Pointer= Updated Pointer-buffer length = 0212-
10=0202h
b. New Address Pointer= Updated Pointer+buffer length =
01FC+10=020Ch
 
12. Repeat the previous problem for SAR = 0210h and EAR = 0201h
Buffer Length = (SAR - EAR + 1) = 0210 - 0201 + 1 = 10h
a. New Address Pointer = Updated Pointer - buffer length = 0212 -
10 = 0202h
b. New Address Pointer = Updated Pointer + buffer length =
01FC + 10 = 020Ch
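The pointer updating rules for the four special cases can be sketched in a few lines. The function name `wrap_pointer` is an assumption for this example, and the sketch assumes the updated pointer leaves the buffer by less than one buffer length, so a single correction is enough.

```python
def wrap_pointer(updated: int, sar: int, ear: int) -> int:
    """Map an updated pointer back into the circular buffer bounded by SAR and EAR."""
    if sar < ear:
        size = ear - sar + 1
        if updated > ear:
            return updated - size
        if updated < sar:
            return updated + size
    elif sar > ear:
        size = sar - ear + 1
        if updated > sar:
            return updated - size
        if updated < ear:
            return updated + size
    # Pointer is still inside the buffer: no correction needed.
    return updated
```

With SAR = 0200h and EAR = 020Fh, the updated pointers 0212h and 01FCh map to 0202h and 020Ch, and the same values result for SAR = 0210h and EAR = 0201h, in agreement with problems 11 and 12.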
2.7.2 Bit Reversed Addressing Mode
To implement FFT algorithms we need to access the data
in a bit reversed manner. Hence a special addressing mode
called bit reversed addressing mode is used to calculate the
index of the next data to be fetched.
It works as follows. Start with index 0. The present index is
calculated by adding half the FFT length to the previous
index in a bit reversed manner, i.e. with the carry propagated
from MSB to LSB.

Current index = Previous index +B (FFT size / 2)

where +B denotes addition with the carry propagated from MSB to LSB.


13. Compute the indices for an 8-point FFT using Bit
reversed Addressing Mode
Start with index 0. Therefore the first index is (000).
The next index is calculated by adding half the FFT length,
in this case (100), to the previous index,
i.e. Present Index = (000) +B (100) = (100).
Similarly, the next index is calculated as
Present Index = (100) +B (100) = (010).
The process continues till all the indices are calculated. The
following table summarizes the calculation.
Index in Binary   Decimal value   Bit reversed index   Decimal value
000               0               000                  0
001               1               100                  4
010               2               010                  2
011               3               110                  6
100               4               001                  1
101               5               101                  5
110               6               011                  3
111               7               111                  7
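The reverse-carry addition can be sketched as follows to generate the index sequence. The function names are assumptions made for this illustration, and the FFT size is assumed to be a power of two.

```python
def reverse_carry_add(a: int, b: int, bits: int) -> int:
    """Add two `bits`-wide numbers with the carry propagating from MSB to LSB."""
    result = 0
    carry = 0
    for position in range(bits - 1, -1, -1):  # start at the MSB
        total = ((a >> position) & 1) + ((b >> position) & 1) + carry
        result |= (total & 1) << position
        carry = total >> 1
    return result  # a carry out of the LSB is discarded


def bit_reversed_indices(fft_size: int):
    """Indices in the order produced by repeatedly adding fft_size / 2 with reverse carry."""
    bits = fft_size.bit_length() - 1
    half = fft_size // 2
    index = 0
    indices = []
    for _ in range(fft_size):
        indices.append(index)
        index = reverse_carry_add(index, half, bits)
    return indices
```

For an 8-point FFT, `bit_reversed_indices(8)` returns `[0, 4, 2, 6, 1, 5, 3, 7]`, the same order as the bit reversed column of the table.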
2.8 Address Generation Unit 
The main job of the Address Generation Unit is to generate the
addresses of the operands required to carry out the operation. It
has to work fast in order to satisfy the timing constraints.
As the address generation unit has to perform some mathematical
operations in order to calculate the operand address, it is provided
with a separate ALU.
Address generation typically involves one of the following
operations.
a. Getting value from immediate operand, register or a memory
location
b. Incrementing/ decrementing the current address
c. Adding/subtracting the offset from the current address
d. Adding/subtracting the offset from the current address and
generating new address according to circular addressing mode
e. Generating new address using bit reversed addressing mode
The block diagram of a typical address generation unit is as shown in Fig 2.13.
2.9 Programmability and Program Execution
A programmable DSP device should provide the programming capability
involving branching, looping and subroutines.
The implementation of repeat capability should be hardware based so that it
can be programmed with minimal or zero overhead. A dedicated register can
be used as a counter.
In a normal subroutine call, return address has to be stored in a stack
thus requiring memory access for storing and retrieving the return address,
which in turn reduces the speed of operation. Hence a LIFO memory can be
directly interfaced with the program counter.

2.9.1 Program Control 


Like microprocessors, DSPs also require a control unit to provide the necessary
control and timing signals for the proper execution of the
instructions. In microprocessors, the control is microcode based, where
each instruction is divided into microinstructions stored in micro memory. As
this mechanism is slower, it is not suitable for DSP applications. Hence in
DSPs the control is hardwired, where the control unit is designed as a
single, comprehensive hardware unit. Although it is more complex, it is faster.
2.9.2 Program Sequencer 
It is a part of the control unit used to generate instruction
addresses in sequence needed to access instructions. It
calculates the address of the next instruction to be fetched. The
next address can be from one of the following sources. 
a. Program Counter
b. Instruction register in case of branching, looping and
subroutine calls
c. Interrupt Vector table
d. Stack which holds the return address 
The block diagram of a program sequencer is as shown in figure
2.14.
Fig 2.14 Program Sequencer
Program sequencer should have the following circuitry
a. PC has to be updated after every fetch
b. Counter to hold count in case of looping
c. A logic block to check conditions for conditional jump
instructions
d. Condition logic-status flag
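How a sequencer might choose among the four address sources can be sketched as below. This is a hypothetical model, not the circuit of figure 2.14: the function name, the parameter names and the priority order (interrupt first, then return, then branch, then the incremented program counter) are all assumptions made for illustration, and subroutine calls that push onto the stack are not modelled.

```python
def next_address(pc: int, *, branch_target=None, branch_taken: bool = False,
                 interrupt_vector=None, return_stack=None, is_return: bool = False) -> int:
    """Select the address of the next instruction to fetch."""
    if interrupt_vector is not None:
        return interrupt_vector          # source: interrupt vector table
    if is_return and return_stack:
        return return_stack.pop()        # source: stack holding the return address
    if branch_taken and branch_target is not None:
        return branch_target             # source: address field of the instruction
    return pc + 1                        # source: program counter, updated after each fetch
```

With no branch, return or interrupt pending, the function simply steps the program counter, which is the common case the hardware is optimised for.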
Questions
1. Explain implementation of 8- tap FIR filter, (i) pipelined using MAC units and (ii)
parallel using two MAC units. Draw block diagrams.
2. What is the role of a shifter in DSP? Explain the implementation of 4-bit shift
right barrel shifter, with a diagram.
3. Identify the addressing modes of the operands in each of the following
instructions & their operations
i)ADD B ii) ADD #1234h iii) ADD 5678h iv) ADD +*addreg
4. Draw the schematic diagram of the saturation logic and explain the same.
5. Explain how the circular addressing mode and bit reversal addressing mode are
implemented in a DSP.
6. Explain the purpose of program sequencer. 
7. Give the structure of a 4X4 Braun multiplier, Explain its concept. What
modification is required to carry out multiplication of signed numbers? Comment
on the speed of the multiplier.
8. Explain guard bits in a MAC unit of DSP. Consider a MAC unit whose inputs are 24-
bit numbers. How many guard bits should be provided if 512 products have to be
added in the accumulator to prevent overflow condition? What is the overall size of
the accumulator required?
9. With a neat block diagram explain ALU of DSP system.
11. 256 unsigned numbers, 16 bits each, are to be summed up in a processor. How many
guard bits are needed to prevent overflow?
12. How will you implement an 8X8 multiplier using 4X4 multipliers as the building
blocks?
13. Describe the basic features that should be provided in the DSP architecture to be used to
implement the Nth order FIR filter, where x(n) denotes the input sample, y(n) the
output sample and h(i) denotes ith filter coefficient.(Dec.09-Jan.10, 8m)
14. Explain the issues to be considered in designing and implementing a DSP system, with
the help of a neat block diagram. (May/June10 , 6m)
15. Briefly explain the major features of programmable DSPs. (May/June10, 8m)
16. Explain the operation used in DSP to increase the sampling rate. The sequence
x(n) = [0,2,4,6,8] is interpolated using the interpolation sequence bk = [1/2,1,1/2] and an
interpolation factor of 2. Find the interpolated sequence y(m). (May/June10, 8m)
17. Explain with the help of mathematical equations how signed numbers can be
multiplied. (Dec.10-Jan.11, 8m)
18. The sequence x(n) = [3,2,-2,0,7].It is interpolated using interpolation sequence
bk=[0.5,1,0.5] and the interpolation factor of 2. Find the interpolated sequence y(m).
(Dec.10-Jan.11, 6m)
19. Why is signal sampling required? Explain the sampling process. (Dec.12, 5m)
20. Define decimation and interpolation process. Explain them using block diagrams and
equations. (Dec.12, 6m).
Thank You
