An Optimized Modified Parallel Implementation Design of Multiplier and Accumulator Operator

The document describes a new optimized parallel implementation of a multiplier and accumulator (MAC) operator. It proposes combining multiplication and accumulation into a hybrid carry-save adder tree structure. This improves performance by merging the accumulator, which has the longest delay, into the partial product compression. The design uses a 1's complement Booth encoding and modified arrays to increase operand density and reduce final adder inputs. It analyzes the proposed design against standard and Elguibaly MAC architectures in terms of hardware resources and performance when pipelined. The MAC was implemented on FPGA using Xilinx tools and for ASIC using Cadence design suites.

Uploaded by

VigneshInfotech

JYOTHISHMATHI INSTITUTE OF TECHNOLOGY AND SCIENCES

Nustulapur, Karimnagar.
Department of Electronics and Communication Engineering.

AN OPTIMIZED MODIFIED PARALLEL IMPLEMENTATION DESIGN
OF MULTIPLIER AND ACCUMULATOR OPERATOR

By
UNDER THE GUIDANCE
E. EVANGELINE,
J. RAMESH
M.Tech(VLSI Design),
18271D5701.
Agenda:

• Abstract
• Introduction
• Design Analysis
• Tools Used
• FPGA Implementation
• ASIC Implementation
• Simulation Results
• Conclusion
Abstract
• This work presents a new architecture of multiplier-and-accumulator (MAC) for
high-speed arithmetic. By combining multiplication with accumulation and devising
a hybrid type of carry save adder (CSA), the performance was improved.
• Since the accumulator that has the largest delay in MAC was merged into CSA,
the overall performance was elevated. The proposed CSA tree uses 1’s-
complement-based radix-2 modified Booth’s algorithm (MBA) and has the
modified array for the sign extension in order to increase the bit density of the
operands.
• The CSA propagates the carries to the least significant bits of the partial products
and generates the least significant bits in advance to decrease the number of the
input bits of the final adder.
Introduction :

• In many DSP applications such as filtering and convolution, the multiplier and
the multiplier-and-accumulator (MAC) are the most essential elements.

• The current trend in ALU design is to implement the addition and
multiplication operations in one hardware component, i.e., the MAC
unit.

• MAC = Multiplication + Accumulation
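
As a minimal behavioural sketch of the relation above (not the hardware described in these slides), one MAC step can be modelled as adding a product to a running accumulator. The function names mac and fir_output are hypothetical and are used only for illustration; the filter example is an assumption based on the DSP applications mentioned earlier.

```python
def mac(acc: int, x: int, y: int) -> int:
    """One MAC step: multiply x by y and add the product to the accumulator."""
    return acc + x * y


def fir_output(coeffs, samples):
    """Compute one filter output as a chain of MAC steps (a dot product)."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc = mac(acc, c, s)
    return acc


if __name__ == "__main__":
    # 1*4 + 2*5 + 3*6
    print(fir_output([1, 2, 3], [4, 5, 6]))
```

Each loop iteration corresponds to one multiplication followed by one accumulation, which is the pair of operations a MAC unit performs in a single hardware component.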


Contd…

• Types of multipliers:
• Binary serial multiplier
• Parallel multiplier
• Booth encoding
• Modified Booth encoding
Binary Serial Multiplier :
• The last adder in the multiplier has a carry chain. The earlier additions
are performed by full adders, which reduce three one-bit inputs
to two one-bit outputs.

Disadvantage :
• The critical path is long.

To reduce the critical path, we use parallel multipliers,
which use the Booth encoding concept.
Parallel multipliers :
Booth encoding :
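Steps involved in Booth encoding (m is the multiplicand, r is the multiplier,
and x and y are the numbers of bits in m and r, respectively):
Step 1:
Determine the values of A and S, and the initial value of P. All of
these numbers should have a length equal to (x + y + 1).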
• A: Fill the most significant (leftmost) bits with the value of m. Fill
the remaining (y + 1) bits with zeros.
• S: Fill the most significant bits with the value of (−m) in two's
complement notation. Fill the remaining (y + 1) bits with zeros.
• P: Fill the most significant x bits with zeros. To the right of this,
append the value of r. Fill the least significant (rightmost) bit with
a zero.
Step 2:
Determine the two least significant (rightmost) bits of P.
• If they are 01, find the value of P + A. Ignore any overflow.
• If they are 10, find the value of P + S. Ignore any overflow.
• If they are 00, do nothing. Use P directly in the next step.
• If they are 11, do nothing. Use P directly in the next step.
Step 3:
Arithmetically shift the value obtained in the 2nd step by a single
place to the right. Let P now equal this new value.
Step 4: Repeat steps 2 and 3 until they have been done y times.
Step 5: Drop the least significant (rightmost) bit from P. This is the
product of m and r.
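
The five steps above can be sketched in software as follows. This is a bit-level illustration under stated assumptions, not the VHDL used in this work: the function name booth_multiply is hypothetical, m and r are signed integers that fit in x and y bits of two's complement, and m is assumed not to be the most negative x-bit value (because −m would then not fit in x bits).

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Multiply signed m (x bits) by signed r (y bits) using the steps above."""
    width = x + y + 1
    mask = (1 << width) - 1
    x_mask = (1 << x) - 1
    y_mask = (1 << y) - 1

    # Step 1: build A, S and the initial P, each (x + y + 1) bits wide.
    A = (m & x_mask) << (y + 1)       # m in the top x bits, zeros below
    S = ((-m) & x_mask) << (y + 1)    # -m (two's complement) in the top x bits
    P = (r & y_mask) << 1             # x zeros, then r, then a single 0

    for _ in range(y):                # Step 4: repeat y times
        low2 = P & 0b11               # Step 2: inspect the two rightmost bits
        if low2 == 0b01:
            P = (P + A) & mask        # ignore overflow
        elif low2 == 0b10:
            P = (P + S) & mask        # ignore overflow
        # Step 3: arithmetic right shift by one place (keep the sign bit).
        sign = P >> (width - 1)
        P = (P >> 1) | (sign << (width - 1))

    P >>= 1                           # Step 5: drop the rightmost bit
    n = x + y
    if P >= 1 << (n - 1):             # reinterpret the (x + y)-bit result as signed
        P -= 1 << n
    return P


if __name__ == "__main__":
    print(booth_multiply(3, -4, 4, 4))
```

With m = 3, r = −4 and x = y = 4, the sketch starts from A = 0011 0000 0, S = 1101 0000 0 and P = 0000 1100 0, and after four iterations the remaining bits 1111 0100 represent −12.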
Types of Adders:

• Ripple carry adder


• Carry Look Ahead adder
• Carry Save adder

By taking the advantages of both the multiplier and adder
architectures, a hybrid type of CSA structure is used in this
MAC design.
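
To illustrate why a carry save adder is attractive here, the sketch below models one carry-save stage on non-negative integers: three operands are reduced to a sum word and a carry word with no carry propagation between bit positions. The function names carry_save_add and add_many are hypothetical, and this models a plain CSA only, not the hybrid CSA of this design.

```python
def carry_save_add(a: int, b: int, c: int):
    """Reduce three operands to (sum, carry) without propagating carries."""
    partial_sum = a ^ b ^ c                       # per-bit sum of a full adder
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, weighted by 2
    return partial_sum, carry


def add_many(values):
    """Add several operands with CSA stages and one final ordinary addition."""
    rows = list(values)
    while len(rows) > 2:
        a, b, c = rows[:3]
        rows = rows[3:] + list(carry_save_add(a, b, c))
    return sum(rows)   # the only place where carries ripple


if __name__ == "__main__":
    s, c = carry_save_add(5, 6, 7)
    print(s, c, s + c)
    print(add_many([1, 2, 3, 4, 5]))
```

Only the last addition needs a carry-propagating adder such as a ripple carry or carry look-ahead adder; every earlier stage has the delay of a single full adder regardless of the word length.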
Design Analysis:
Overview of MAC :

Fig. Hardware architecture of general MAC.

Fig. Basic arithmetic steps of multiplication and accumulation.


Contd…

• In general, for N × N bit multiplication,
the required number of partial products is N.
• Execution time is proportional to N.
• For faster multiplication, the architecture uses Booth encoding, which halves
the number of partial products.
• This architecture uses a hybrid type of CSA to add the partial products.
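
The halving of the partial products can be sketched with the common modified Booth recoding, which examines overlapping 3-bit groups of the multiplier and produces one digit in {−2, −1, 0, 1, 2} per pair of bits. Many textbooks call this radix-4 recoding; that naming, the function names booth_digits and multiply_via_booth, and the restriction to an even bit width n are assumptions made for this illustration.

```python
def booth_digits(y: int, n: int):
    """Recode an n-bit two's-complement multiplier into n/2 signed digits."""
    if n % 2:
        raise ValueError("n must be even for this sketch")
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0                                  # implicit bit to the right of bit 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def multiply_via_booth(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products d_i * x * 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, n)))


if __name__ == "__main__":
    print(booth_digits(-4, 8))
    print(multiply_via_booth(3, -4, 8))
```

For an 8-bit multiplier the recoding yields 4 digits, so only 4 partial products (each 0, ±x or ±2x, suitably shifted) have to be added instead of 8.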
Different types of Parallel MAC architectures are :

• Standard Design
• Elguibaly’s Architecture
• Proposed Architecture
Standard Design :

Fig. Standard design


Contd…

Hardware architecture for the standard design :


[Block diagram: Booth encoder, CSA tree, final addition, and accumulation blocks.
Inputs X and Y; CSA tree outputs S and C; final-addition output P;
accumulated output Z (2n+1 bits).]
Drawbacks:
• There are two bottlenecks to be considered to increase the speed of the
MAC:
the partial product reduction network and
the accumulator.

• Since the accumulation has the longest delay in the MAC operation, the
independent accumulation operation is removed and
merged into the compression process of the partial products,
so that the overall MAC performance is improved.
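
The idea of merging accumulation into the partial product compression can be sketched as follows. This is a simplified behavioural model under stated assumptions: operands are unsigned, partial products are plain shifted copies of x rather than Booth-encoded rows, and the names csa, compress and mac_merged are hypothetical. The accumulated value is kept as a sum word and a carry word that are fed back into the compression as two extra rows, so no separate accumulator adder is needed.

```python
def csa(a: int, b: int, c: int):
    """Carry-save stage: three rows in, (sum, carry) out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress(rows):
    """Reduce any number of rows to two rows using carry-save stages."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(csa(a, b, c))
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def mac_merged(pairs, n: int) -> int:
    """Accumulate x*y over all pairs; the running total stays in carry-save form."""
    s, c = 0, 0
    for x, y in pairs:
        # Unsigned partial products: one shifted copy of x per set bit of y.
        partial_products = [x << i for i in range(n) if (y >> i) & 1]
        # The previous sum and carry are compressed together with the new rows.
        s, c = compress(partial_products + [s, c])
    return s + c   # a single carry-propagating addition at the very end


if __name__ == "__main__":
    print(mac_merged([(3, 5), (7, 9), (255, 255)], 8))
```

Compared with a model that finishes each product with a full addition and then adds it to an accumulator, this version performs a carry-propagating addition only once, which mirrors the motivation stated above.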
One of the most advanced types of MAC for general-purpose
digital signal processing has been proposed by Elguibaly.

• The critical path is reduced.
• The number of input bits to the final adder is reduced.
• But the output rate is poor.
Elguibaly’s Architecture:

Fig. Parallel MAC architecture proposed by Elguibaly.


Contd…

[Block diagram: Booth encoder, CSA & accumulator, and final adder blocks.
Inputs X and Y; CSA outputs S and C; outputs P[n-2:0] and P[2n:n-1].]

Fig. Hardware structure proposed by Elguibaly


Limitations :

• Even though it has better performance because of the reduced
critical path, the output rate is poor.

To improve both the output rate and the performance, we use the
proposed architecture.
Proposed parallel MAC architecture :

Fig. Proposed arithmetic operation of multiplication and accumulation.


Contd…

Fig. Hardware architecture of the proposed MAC.


Characteristics of CSA tree:

                  Standard design      Elguibaly’s design   Proposed design

Number system     2’s complement       1’s complement       1’s complement
Sign extension    Used                 Used                 Not used
Accumulation      Result data of       Result data of       Sum and carry of
                  final addition       final addition       CSA
CSA tree          FA, HA               FA, 2-bit CLA        FA, HA, 2-bit CLA
Final adder       2n bits              n+2 bits             n bits


Table. Calculation of Hardware Resources

Component         Standard design        Elguibaly’s design       Proposed design
                  General       8-bit    General         8-bit    General       8-bit

FA                n²/2 + n      40       n²/2 + 2n + 3   51       n²/2 + n/2    36
HA                0             0        0               0        3n/2          12
2-bit CLA         0             0        n/2 − 1         3        n/2           4
4-bit CLA         0             0        0               -        n/4           2
Accumulator CLA   (2n+1) bits   1        -               -        -             -
Final adder       2n bits       16       (n+2) bits      10       n bits        8
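
As a quick check of the general formulas against the 8-bit column, the counts can be evaluated directly. The function name hardware_resources and the dictionary layout are hypothetical; the sketch assumes n is a multiple of 4 so that n/2 and n/4 are whole numbers, and it covers only the rows that have a formula in every column.

```python
def hardware_resources(n: int) -> dict:
    """Evaluate the component-count formulas from the table for operand width n."""
    if n % 4:
        raise ValueError("this sketch assumes n is a multiple of 4")
    return {
        "standard": {
            "FA": n * n // 2 + n,
            "HA": 0,
            "2-bit CLA": 0,
            "final adder bits": 2 * n,
        },
        "elguibaly": {
            "FA": n * n // 2 + 2 * n + 3,
            "HA": 0,
            "2-bit CLA": n // 2 - 1,
            "final adder bits": n + 2,
        },
        "proposed": {
            "FA": n * n // 2 + n // 2,
            "HA": 3 * n // 2,
            "2-bit CLA": n // 2,
            "4-bit CLA": n // 4,
            "final adder bits": n,
        },
    }


if __name__ == "__main__":
    for design, counts in hardware_resources(8).items():
        print(design, counts)
```

For n = 8 the full-adder counts evaluate to 40, 51 and 36, matching the 8-bit entries of the table.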
Disadvantage:

• The delay is greater than that of Elguibaly’s architecture.

• However, the overall performance is increased when pipelining is
applied to both Elguibaly’s architecture and the proposed
architecture.
Pipelining Scheme:

Fig. Pipelined Hardware structure a) Elguibaly’s design b) proposed design


• LANGUAGE USED: VHDL
• TOOLS REQUIRED:
Simulation: ModelSim 5.8c
Synthesis: Xilinx 9.1
3. Proposed Architecture:

RTL schematic diagram


5. Proposed Architecture with pipelining:

RTL schematic diagram


ASIC Implementation:
RTL Synthesis diagrams:
1. Elguibaly’s architecture with pipelining:
Contd…

2. Proposed Architecture with pipelining:


1. Standard design

2. Elguibaly’s Architecture
3. Proposed Architecture:
4. Elguibaly’s architecture with pipelining:
5. Proposed architecture with pipelining:
Conclusion:

• The MAC unit is proposed and designed by combining a hybrid type of
CSA structure with the Modified Booth’s Algorithm, using the Xilinx ISE Design
Suite for FPGA implementation and the Cadence Semi-Custom Design
Suite for ASIC design in TSMC 180 nm.

• The overall performance of the proposed MAC unit with
pipelining is increased by 49.05% compared to Elguibaly’s MAC
unit with pipelining.
Future scope

• The MAC unit can be extended by replacing the Booth-2 algorithm with the
Booth-3 algorithm. With the Booth-2 algorithm, the number of partial
products is reduced to half. Similarly, with the Booth-3 algorithm, the number
of partial products is reduced to n/3, so that the delay will be reduced.
The Booth-3 extension comes at the cost of additional
hardware components.
Thank You
