Imp 22
Subject Code:10EC751
Date:24.8.2014
Unit-02
•Multipliers
The advent of single-chip multipliers paved the way for
implementing DSP functions on a VLSI chip. Parallel multipliers
have now replaced the traditional shift-and-add multipliers.
A parallel multiplier takes a single processor cycle to fetch and
execute the instruction and to store the result. Parallel multipliers
are also called array multipliers. The key features to be considered
for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed
•Parallel multipliers:
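Consider the multiplication of two unsigned numbers A and B. Let A be
represented using m bits as (Am-1 Am-2 …….. A1 A0) and B be
represented using n bits as (Bn-1 Bn-2 …….. B1 B0). Then the product of
these two numbers is given by
P = A × B = Σ(i = 0 to m-1) Σ(j = 0 to n-1) Ai Bj 2^(i+j),
which is at most m+n bits long. A parallel multiplier that implements this
sum directly for unsigned numbers is the Braun multiplier.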
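The bit-level sum that an array multiplier evaluates can be sketched in software. The following is a minimal Python illustration, not a hardware description; the function name braun_multiply and its arguments are assumptions made for this example. Each AND term Ai·Bj is added with weight 2^(i+j), as the array of AND gates and adders would do.

```python
def braun_multiply(a: int, b: int, m: int, n: int) -> int:
    """Multiply unsigned m-bit a by unsigned n-bit b by summing bit-level partial products."""
    if not (0 <= a < (1 << m) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in m and n bits respectively")
    product = 0
    for i in range(m):
        a_i = (a >> i) & 1          # bit Ai of the multiplicand
        for j in range(n):
            b_j = (b >> j) & 1      # bit Bj of the multiplier
            # One AND gate of the array: Ai AND Bj, weighted by 2^(i+j).
            product += (a_i & b_j) << (i + j)
    return product


print(braun_multiply(13, 11, 4, 4))  # 143
```

The largest 4-bit by 4-bit product, 15 × 15 = 225, fits in m+n = 8 bits.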
•Multipliers for signed numbers:
Consider two signed numbers A and B,
•Bus Widths
1.Consider the multiplication of two n-bit numbers X and Y. The product Z can
be at most 2n bits long. To perform the whole operation in a single
execution cycle, we require two buses of width n bits each to fetch the operands
X and Y and a bus of width 2n bits to store the result Z to the memory. Although
this performs the operation fastest, it is expensive to implement. Two cheaper
alternatives are:
a. Use the n-bit operand bus and save Z at two successive memory locations.
Although this stores the exact value of Z in the memory, it takes two cycles to store
the result.
b. Discard the lower n bits of the result Z and store only the higher-order n bits
into the memory. This is not suitable for applications where an accurate result is
required.
Another alternative can be used for applications where speed is not a major
concern: latches are used for the inputs and outputs, so that a single bus suffices to
fetch the operands and to store the result.
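Options (a) and (b) above can be compared numerically. This is a small Python sketch under the assumption of unsigned operands; the function name multiply_and_split is hypothetical. It splits the 2n-bit product into the two n-bit words that an n-bit bus could carry.

```python
def multiply_and_split(x: int, y: int, n: int):
    """Return the (high, low) n-bit words of the 2n-bit product of two unsigned n-bit numbers."""
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned n-bit numbers")
    z = x * y
    high = z >> n                   # higher-order n bits
    low = z & ((1 << n) - 1)        # lower-order n bits
    return high, low


high, low = multiply_and_split(200, 150, 8)
exact = (high << 8) | low           # option (a): both words stored, two store cycles
truncated = high << 8               # option (b): only the high word stored
print(exact, truncated)             # 30000 29952
```

In this example, option (b) loses the value of the low word (48), which is the price paid for the single-cycle store.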
Shifters
Shifters are used to scale operands or results either down or up. The
following scenarios show why a shifter is needed:
a. While adding N numbers, each n bits long, the sum can
grow up to n + log2 N bits long. If the accumulator is only n bits long, an
overflow error will occur. This can be overcome by using a shifter to scale down
each operand by log2 N bits.
b. Similarly, while calculating the product of two n-bit numbers, the product can
grow up to 2n bits long. Generally the lower n bits are discarded and the sign bit
is shifted to preserve the sign of the product.
c. Finally, when adding two floating-point numbers, one of the operands has
to be shifted appropriately to make the exponents of the two numbers equal.
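Scenario (a) can be illustrated with a short Python sketch. It assumes unsigned operands and rounds log2 N up to a whole number of bits; the function name scaled_sum is hypothetical. Each operand is shifted right before it is added, so the running sum always fits in an n-bit accumulator, at the cost of the discarded low-order bits.

```python
def scaled_sum(values, n: int):
    """Add unsigned n-bit values in an n-bit accumulator after scaling each down by ceil(log2 N) bits."""
    shift = (len(values) - 1).bit_length()   # ceil(log2 N) for N >= 1
    acc = 0
    for v in values:
        acc += v >> shift                    # scale the operand down before adding
    if acc >= (1 << n):
        raise OverflowError("accumulator overflow")
    return acc, shift


acc, shift = scaled_sum([255, 255, 255, 255], 8)
print(acc, shift)        # 252 2
print(acc << shift)      # 1008, compared with the exact sum 1020
```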
Barrel Shifters
For an input of length n, log2 n control lines are required to select the shift amount. An additional
control line is required to indicate the direction of the shift.
•A barrel shifter is to be designed with 16 inputs for left shifts from 0 to 15 bits.
How many control lines are required to implement the shifter?
As the input is 16 bits wide, log2 16 = 4 control inputs
are required. No direction line is needed because only left shifts are supported.
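One common way to build such a shifter is as log2 n cascaded stages, where control line k shifts by 2^k positions or passes the data through unchanged. The Python sketch below models that structure for a left shift only; the names control_lines and barrel_shift_left are assumptions for illustration, and a real barrel shifter does this combinationally in a single cycle.

```python
def control_lines(width: int) -> int:
    """Number of control lines needed to select a shift of 0 .. width-1 bits."""
    return (width - 1).bit_length()          # log2(width) when width is a power of two


def barrel_shift_left(value: int, shift: int, width: int = 16) -> int:
    """Left-shift a width-bit value using one stage per control line."""
    if not 0 <= shift < width:
        raise ValueError("shift amount out of range")
    mask = (1 << width) - 1
    for k in range(control_lines(width)):
        if (shift >> k) & 1:                 # control line k is asserted
            value = (value << (1 << k)) & mask   # stage k shifts by 2^k bits
    return value


print(control_lines(16))                     # 4
print(hex(barrel_shift_left(0x00FF, 4)))     # 0xff0
```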
•It is required to find the sum of 64 numbers, each 16 bits long. How many bits should the
accumulator have so that the sum can be computed without
overflow or loss of accuracy?
The sum of 64 16-bit numbers can grow up to (16 + log2 64) = 22 bits long. Hence
the accumulator should be 22 bits long to avoid overflow.
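The result can be checked by computing the worst case directly. The Python sketch below assumes unsigned operands; accumulator_bits is a hypothetical helper name.

```python
def accumulator_bits(count: int, n: int) -> int:
    """Accumulator width needed to add `count` unsigned n-bit numbers without overflow."""
    return n + (count - 1).bit_length()      # n + ceil(log2 count)


worst_case_sum = 64 * ((1 << 16) - 1)        # all 64 operands at their maximum value
print(accumulator_bits(64, 16))              # 22
print(worst_case_sum.bit_length())           # 22
```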
Multiply and Accumulate Unit
Overflow and Underflow
While designing a MAC unit, attention has to be paid to the word sizes
at the inputs of the multiplier and to the sizes of the add/subtract unit
and the accumulator, as overflow and underflow are possible.
Overflow/underflow can be avoided by using any of the following methods:
a. Using shifters at the input and the output of the MAC
b. Providing guard bits in the accumulator
c. Using saturation logic
Saturation logic
Overflow occurs if the result goes beyond the most positive number, and underflow if it goes
below the most negative number, that the accumulator can handle. The
error can be contained by loading the accumulator with the most
positive number it can handle on overflow and the most negative
number it can handle on underflow. This method is called saturation
logic.
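Saturation can be sketched in a few lines of Python. The example assumes a two's-complement accumulator of a given width (16 bits by default); the function name saturating_add is hypothetical.

```python
def saturating_add(acc: int, x: int, bits: int = 16) -> int:
    """Add x to acc, clamping the result to the two's-complement range of `bits` bits."""
    max_val = (1 << (bits - 1)) - 1          # most positive value, 32767 for 16 bits
    min_val = -(1 << (bits - 1))             # most negative value, -32768 for 16 bits
    result = acc + x
    if result > max_val:
        return max_val                       # overflow: load the most positive number
    if result < min_val:
        return min_val                       # underflow: load the most negative number
    return result


print(saturating_add(32000, 1000))           # 32767
print(saturating_add(-32000, -1000))         # -32768
```

Without saturation, a 16-bit two's-complement accumulator would wrap 33000 around to a negative value, which is usually a far larger error than clamping.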
Harvard Architecture
Von Neumann Architecture
• program and data reside in the same memory
• a single bus is used to access both
Implications:
• slows down program execution, since the processor has to wait for data even
after the instruction is made available
Harvard Architecture
• program and data reside in separate memories with two independent buses
Implications:
• faster program execution because of simultaneous memory
access capability
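The difference between the two architectures can be shown with a deliberately simple cycle count. The Python sketch below is a toy model, not a description of any particular processor: it assumes every instruction needs exactly one instruction fetch and one data access, each taking one memory cycle. The function name memory_access_cycles is hypothetical.

```python
def memory_access_cycles(num_instructions: int, separate_memories: bool) -> int:
    """Memory cycles needed under the toy model of one fetch and one data access per instruction."""
    if separate_memories:
        # Harvard: instruction fetch and data access overlap on independent buses.
        return num_instructions
    # Von Neumann: both accesses share one bus and must happen one after the other.
    return 2 * num_instructions


print(memory_access_cycles(100, separate_memories=False))  # 200
print(memory_access_cycles(100, separate_memories=True))   # 100
```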
On-Chip Memory
• on-chip = on-processor
• helps run DSP algorithms faster than when memory is off-chip, because dedicated
address and data buses are available
• speed: on-chip memories should match the speed of the ALU
operations
• size: the more chip area the memory takes, the less area is available for
other DSP functions
Data addressing modes:
• Immediate
• Register
• Direct
• Indirect
Parallelism
Parallelism means:
• provision of multiple function units, which may operate in parallel to increase throughput
• multiple memories
• different ALUs for data and address computations
advantage: algorithms can perform more than one operation at a time, increasing speed
disadvantage: complex hardware is required to control the units and make sure instructions and
data can be fetched simultaneously
Pipelining
• an architectural feature in which an instruction is broken into a number of steps
• a separate unit performs each step; all units work at the same time, each usually
on a different piece of data
advantage: if the instruction is used repeatedly, then after an initial latency
the throughput becomes one instruction per unit time
disadvantages: pipeline latency, and having to break instructions up into equally timed
steps
Pipelining example:
Five steps:
Step 1: instruction fetch
Step 2: instruction decode
Step 3: operand fetch
Step 4: execute
Step 5: save
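The timing of the five-step example can be worked out with a short Python sketch. It assumes an ideal pipeline in which every step takes one cycle and there are no stalls; the function names are hypothetical. With S stages and N instructions, sequential execution takes N × S cycles, whereas the pipeline takes S + N - 1 cycles.

```python
STAGES = ["instruction fetch", "instruction decode", "operand fetch", "execute", "save"]


def sequential_cycles(num_instructions: int) -> int:
    """Cycles needed when each instruction finishes all steps before the next one starts."""
    return num_instructions * len(STAGES)


def pipelined_cycles(num_instructions: int) -> int:
    """Cycles needed in an ideal pipeline: fill latency plus one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1


def schedule(num_instructions: int):
    """Return one dict per cycle mapping each busy stage to the instruction index it holds."""
    table = []
    for cycle in range(pipelined_cycles(num_instructions)):
        row = {}
        for s, name in enumerate(STAGES):
            instr = cycle - s                # instruction i enters stage s at cycle i + s
            if 0 <= instr < num_instructions:
                row[name] = instr
        table.append(row)
    return table


print(sequential_cycles(10), pipelined_cycles(10))   # 50 14
```

Calling schedule(3) shows the fill phase: in the third cycle, instruction 2 is being fetched while instruction 1 is decoded and instruction 0 fetches its operands.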
Pipelining for speeding up the execution of an instruction
[Figure labels: Multiplier, MAC unit, multiplexer]
•Pipelined implementation of an 8-tap FIR filter using eight MACs
•Parallel implementation using two MAC units
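The arithmetic behind both implementations is the same sum of products, y = Σ h[k]·x[k] over the 8 taps; only the way the multiply-accumulate operations are distributed differs. The Python sketch below computes one output sample with a single MAC and with two MACs that each handle half of the taps. It is an illustration only: the function names and the way the taps are split are assumptions, since the original figures are not available, and Python runs the two accumulators one after the other rather than truly in parallel.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate operation: acc + a * b."""
    return acc + a * b


def fir_output_single_mac(h, window) -> int:
    """One FIR output sample using a single MAC unit, one tap per step."""
    if len(h) != len(window):
        raise ValueError("need one sample per coefficient")
    acc = 0
    for coeff, sample in zip(h, window):
        acc = mac(acc, coeff, sample)
    return acc


def fir_output_two_macs(h, window) -> int:
    """Same output using two MAC units, each accumulating half of the taps."""
    if len(h) != len(window) or len(h) % 2:
        raise ValueError("need an even number of taps and one sample per coefficient")
    half = len(h) // 2
    acc0 = acc1 = 0
    for k in range(half):                    # in hardware both MACs work in the same step
        acc0 = mac(acc0, h[k], window[k])
        acc1 = mac(acc1, h[k + half], window[k + half])
    return acc0 + acc1                       # final add combines the partial sums


h = [1, 2, 3, 4, 5, 6, 7, 8]                 # example coefficients
window = [8, 7, 6, 5, 4, 3, 2, 1]            # example input samples, one per tap
print(fir_output_single_mac(h, window), fir_output_two_macs(h, window))  # 120 120
```

With two MAC units the loop runs 4 steps instead of 8, at the cost of one extra addition to combine the partial sums.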