Multiplier Vlsi Paper

This document summarizes a research paper that proposes a new design for Wallace tree multipliers using parallel prefix adders (PPAs) to improve speed. The paper describes how traditional Wallace tree multipliers perform additions in stages to calculate partial product sums. It then introduces five PPA designs - Kogge Stone, Sklansky, Brent Kung, Ladner Fischer, and Han Carlson - that can be used in the final addition stage for faster performance. Verilog is used to implement and test the proposed PPA-based Wallace tree multiplier designs. Analysis shows the new designs have improved speed over traditional designs without degrading area.
IEEE - 49239

Design and Analysis of High Speed Wallace Tree Multiplier Using Parallel Prefix Adders for VLSI Circuit Designs
Yamini devi Ykuntam Katta Pavani Krishna Saladi
Department of ECE Department of ECE Department of ECE
Aditya Engineering college Aditya Engineering college Aditya Engineering college
Surampalem,India Surampalem,India Surampalem,India
[email protected] [email protected] [email protected]

Abstract: The multiplier is a major operational block in any processing unit. Many multiplication algorithms have been proposed, from which multiplier structures can be designed. Among them, the Wallace tree multiplication algorithm is beneficial in terms of speed of operation. With the advancement of technology, demand for circuits with high speed and low area is increasing. In order to improve the speed of the Wallace tree multiplier without degrading its area, a new structure of Wallace tree multiplier is proposed in this paper. In the proposed structure, the final addition stage of partial products is performed by parallel prefix adders (PPAs). Five Wallace tree multiplier structures are proposed, using the Kogge Stone adder, Sklansky adder, Brent Kung adder, Ladner Fischer adder and Han Carlson adder. All the multiplier structures are designed using Verilog HDL in the Xilinx ISE 13.2 design suite. The proposed structures are simulated using the ISim simulator and synthesized using the XST synthesizer. The proposed designs are analyzed with respect to the traditional multiplier design in terms of area (number of LUTs) and delay (ns).

Keywords: Wallace tree multiplier, Parallel Prefix adders (PPAs), Kogge Stone adder, Sklansky adder, Brent-Kung adder, Ladner Fischer adder, Han Carlson adder

I. INTRODUCTION

At present, technology is advancing very rapidly. The circuits being designed have billions of components and must offer low area, high speed and low power consumption. Hence area, speed and power play a crucial role in the design of any circuit [1], [2]. To satisfy this demand, a circuit must be designed under low area and low delay constraints. Arithmetic units, which perform various arithmetic operations, are major blocks in any processing unit [3]. Multiplication is important among all arithmetic operations. Several multiplication algorithms are studied in the literature on multiplier design, such as the binary multiplier, array multiplier, Booth's multiplier, Dadda multiplier and Wallace tree multiplier [4]. Among these types of multipliers, the Wallace tree multiplier is advantageous [5].

The operation of the Wallace tree multiplier is the same as that of other multipliers in the first stage of multiplication, which generates the partial products. In the second stage, the Wallace tree multiplier adds the first three rows of partial products. The generated sum and carry are then added to the next row of partial products. This addition process continues until the final product is generated. Half and full adders are employed for this row-wise addition. Thus, adders play a very important role in generating the final product terms, and the speed of addition affects the speed of the multiplication.

To improve the performance of multiplication, the adder structure used in the Wallace tree multiplier has a major role. In this paper, a new structure of Wallace tree multiplier is proposed in which PPAs are used to add the final row of partial products to the sum and carry terms generated by the previous stage, producing the final product terms. PPAs are derived from the carry look ahead adder concept of generating and propagating carry bits.

In PPAs, a carry generation tree generates the carries for all stages, which improves the speed of operation. The carry generation tree mainly consists of two components, the black cell and the grey cell, which are interconnected to form the carry tree network. The carry generation tree is also called the parallel carry generation block, as it generates carry bits for all stages in parallel [6]. There are different types of PPAs [7], whose classification mainly depends on two factors:
i) Number of black and grey cells in the carry generation tree
ii) Interconnection of black and grey cells in the carry generation tree.

The Kogge Stone adder, Sklansky adder, Brent-Kung adder, Ladner-Fischer adder and Han Carlson adder are the PPAs used in the design of the proposed Wallace tree multiplier in this paper. The paper is organized as follows. Section II explains in detail the operation of the traditional Wallace tree multiplier. Section III consists of sub-sections that describe the operation of PPAs, the different PPAs used in the design of the multiplier, and the structure of the proposed multiplier using PPAs. The results are analyzed in Section IV. Section V gives the conclusion and future scope of the proposed design.

II. TRADITIONAL WALLACE TREE MULTIPLIER

The Wallace tree structure is the most extensively used multiplier design in many processors and memory units [4]. The Wallace tree multiplication process has two main phases. In phase 1, the input bits are applied to AND gates to produce partial products. In phase 2, these partial products are added step by step using half and full adders to obtain the final product. The detailed multiplication process of the Wallace tree multiplier is illustrated in Fig. 1 for an input size of 4 bits.
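The two phases described above can be sketched in a few lines of Python. This is only a behavioural illustration under stated assumptions, not the authors' Verilog design: the names `partial_products`, `carry_save_add` and `wallace_multiply` are hypothetical, each row is held as a Python integer so that one bitwise expression stands in for a whole layer of adders, and the built-in `+` stands in for whatever adder performs the final addition.

```python
def partial_products(a, b, n):
    """Phase 1: AND every bit of b with a, shifted to its column weight."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(n)]


def carry_save_add(x, y, z):
    """One layer of full adders: three rows in, one sum row and one carry row out."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (y & z) | (x & z)) << 1
    return sum_row, carry_row


def wallace_multiply(a, b, n=4, final_add=lambda s, c: s + c):
    """Phase 2: reduce the rows three at a time, then add the last two rows.

    final_add is a hypothetical hook for the final adder; by default it is
    ordinary integer addition.
    """
    assert n >= 3, "this sketch assumes at least three rows of partial products"
    rows = partial_products(a, b, n)
    # Add the first three rows to get a sum row and a carry row.
    s, c = carry_save_add(rows[0], rows[1], rows[2])
    # Add each remaining row to the current sum and carry rows.
    for row in rows[3:]:
        s, c = carry_save_add(s, c, row)
    # Final addition of the sum row and the carry row.
    return final_add(s, c)


print(wallace_multiply(13, 11))  # 143
```

Passing a different `final_add` function is one way to model swapping the final adder while leaving the earlier reduction untouched.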

11th ICCCNT 2020, July 1-3, 2020, IIT Kharagpur, Kharagpur, India
[Figure: four panels for 4-bit inputs A3..A0 and B3..B0. Phase 1: partial products generation. Phase 2.1: addition of the first three rows of partial products. Phase 2.2: addition of the row 4 partial products to the phase 2.1 result. Phase 2.3: addition of the phase 2.2 result to obtain the final product.]

Fig. 1. Various phases of multiplication in a Wallace tree multiplier

Phase 1 comprises the generation of partial products by multiplying every bit of one input number with every bit of the other. Four rows of partial products are generated, as the input size is 4 bits. Phase 2 comprises several sub-phases in which the partial products obtained in phase 1 are added. The addition is carried out using half and full adders. Initially in phase 2, the first three rows of partial products generated in phase 1 are added, which yields two rows: sum terms in the first row and carry terms in the second row. Then, the last row of partial products from phase 1 is added to the sum and carry rows, which again results in one sum row and one carry row. To obtain the final product, the sum row and the carry row are added.

In the entire multiplication process, addition plays the major role. To perform fast addition, the carry must be propagated quickly. A Wallace tree multiplier using a carry select adder is designed in [8], but the carry propagation delay of that methodology is high, which is its major drawback. To overcome this drawback, parallel prefix adders are used in place of half and full adders in the final stage of addition in phase 2 of this multiplier. The main motivation of the proposed design is to achieve a Wallace tree multiplier architecture with higher speed than the existing designs.

III. PROPOSED WALLACE TREE MULTIPLIER
A. Parallel Prefix adders:
In this paper, the Wallace tree multiplier structure is modified by using parallel prefix adders to add the partial products in the final addition phase to obtain the final product. The reason for using a PPA in place of full adders is to improve the speed of operation. In a PPA, the carry inputs for the following bits are generated at the same time with the help of a parallel prefix carry tree, which consists of black cells and grey cells. There are many types of PPAs, whose basic design idea originates from the carry look ahead adder. The PPA consists of three main blocks, as shown in Fig. 2.

The first block, also called the pre-processing block, generates the propagate ($P_i$) and generate ($G_i$) signals by

$P_i = A_i \text{ XOR } B_i$
$G_i = A_i \text{ AND } B_i$

where $A_i$ and $B_i$ are the input bits.

[Figure: three stacked blocks: propagate and generate signal generation block, parallel carry generation tree, sum generation block]
Fig. 2. Parallel prefix adder block diagram

[Figure: black cell with inputs $P_{i:k}$, $G_{i:k}$, $P_{k-1:j}$, $G_{k-1:j}$ and outputs $CP_{i:j}$, $CG_{i:j}$]
Fig. 3. Black cell

[Figure: grey cell with inputs $P_{i:k}$, $G_{i:k}$, $G_{k-1:j}$ and output $CG_{i:j}$]
Fig. 4. Grey cell
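As a small illustration of the pre-processing block, the sketch below computes the per-bit propagate and generate signals and then evaluates the familiar carry look ahead recurrence $C_i = G_i \text{ OR } (P_i \text{ AND } C_{i-1})$ one bit at a time. The function names `preprocess` and `serial_carries` are hypothetical, bits are listed least significant first, and the serial loop is only a reference for the values that a parallel carry tree would have to produce; it is not how a PPA computes them.

```python
def preprocess(a, b, n):
    """First block: P_i = A_i XOR B_i and G_i = A_i AND B_i, listed LSB first."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    return p, g


def serial_carries(p, g, c_in=0):
    """Reference recurrence C_i = G_i OR (P_i AND C_{i-1}), one bit at a time."""
    carries, c = [], c_in
    for p_i, g_i in zip(p, g):
        c = g_i | (p_i & c)
        carries.append(c)
    return carries


p, g = preprocess(0b1011, 0b0110, 4)   # 11 + 6
print(p)                     # [1, 0, 1, 1]
print(g)                     # [0, 1, 0, 0]
print(serial_carries(p, g))  # [0, 1, 1, 1]
```

The last carry is the carry out of the most significant bit, which is why 11 + 6 needs a fifth result bit.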

The second block is the parallel carry generation tree, which generates carry bits for all input bits with the help of grey and black cells, whose inputs are the $P_i$ and $G_i$ signals generated in the first block. The black cell calculates the carry generate and carry propagate signals by

$CP_{i:j} = P_{i:k} \text{ AND } P_{k-1:j}$
$CG_{i:j} = G_{i:k} \text{ OR } (P_{i:k} \text{ AND } G_{k-1:j})$

The grey cell calculates only the carry generate signal by

$CG_{i:j} = G_{i:k} \text{ OR } (P_{i:k} \text{ AND } G_{k-1:j})$

The third block is the sum generation block, which generates the sum by XORing the propagate signal with the carry signal generated by block 2, i.e.,

$S_i = P_i \text{ XOR } C_{i-1}$

In this paper, five different PPAs are used to design the Wallace tree multiplier: 1. Kogge Stone adder, 2. Sklansky adder, 3. Brent Kung adder, 4. Han Carlson adder and 5. Ladner Fischer adder. The structures of these five adders are the same; only the second block varies, in terms of the black and grey cells and their interconnections.

a) Kogge stone adder:
The Kogge Stone adder is very attractive for high-speed applications, which comes at the cost of high area and more power [9]-[14]. The delay of the structure is given by $\log_2 n$ with $[n(\log_2 n) - n + 1]$ computation nodes. This adder has a regular layout and a controlled fan-out of two. Its drawback is that the designed circuit is complex, with a large number of interconnects. Fig. 6 shows the 16-bit Kogge Stone adder structure, which consists of 34 black cells and 15 grey cells.

[Figure: inputs A[15:0] and B[15:0]; generate and propagate signal generation block; carry tree producing C15 to C0; sum generation block; outputs COUT and Sum[15:0]]
Fig. 6. Structure of 16-bit Kogge stone adder

b) Sklansky adder:
The carry prefix network of this adder has minimum logic depth, but at the cost of high fan-out for some computation nodes. The delay of the structure is given by $\log_2 n$ with $(n/2)\log_2 n$ computation nodes. The fan-out of this adder increases sharply from the inputs to the outputs along the critical path, leading to a large amount of latency [14]. Because of this, adder performance degrades as the number of input bits increases. Fig. 7 shows the 16-bit Sklansky adder structure, which consists of 17 black cells and 15 grey cells. Its area is therefore less than that of the Kogge Stone adder, but the growing fan-out of the computation nodes increases the latency of the adder.

[Figure: inputs A[15:0] and B[15:0]; generate and propagate signal generation block; carry tree producing C15 to C0; sum generation block; outputs COUT and Sum[15:0]]
Fig. 7. Structure of 16-bit Sklansky adder

c) Brent Kung adder:
This adder has fewer computation nodes but the maximum depth, which accounts for increased latency [10]-[16]. The interconnection complexity of black and grey cells is less than in the Kogge Stone adder. The delay of the structure is given by $[2(\log_2 n) - 2]$ with $[2n - 2 - \log_2 n]$ computation nodes. Fig. 8 shows the 16-bit Brent Kung adder structure, which consists of 12 black cells and 15 grey cells, so its area is also less than that of the Kogge Stone and Sklansky adders. The structure of this adder is simpler than that of the Kogge Stone adder, and its fan-out is lower than that of the Sklansky adder.
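The black cell, the grey cell and two of the carry networks described above can be modelled behaviourally as follows. This is a sketch under assumptions, not the paper's Verilog: every function name is hypothetical, each signal pair is stored as a tuple `(G, P)`, the carry input is taken as 0, and the wiring rules for the Kogge Stone and Sklansky networks follow the usual textbook constructions rather than anything stated in the text. A cell is counted as grey when its output span reaches bit 0, because only the generate output is needed from then on.

```python
def black_cell(hi, lo):
    """Combine (G, P) of span i:k with (G, P) of span k-1:j into span i:j."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def grey_cell(hi, lo):
    """Same generate equation as the black cell; the propagate output is dropped."""
    g_hi, p_hi = hi
    return (g_hi | (p_hi & lo[0]), 0)


def kogge_stone(gp):
    """Level with distance d combines bit i with bit i-d, for d = 1, 2, 4, ..."""
    n, d, cells = len(gp), 1, {"black": 0, "grey": 0}
    while d < n:
        nxt = list(gp)
        for i in range(d, n):
            kind = "grey" if i < 2 * d else "black"   # span reaches bit 0 -> grey
            cell = grey_cell if kind == "grey" else black_cell
            nxt[i] = cell(gp[i], gp[i - d])
            cells[kind] += 1
        gp, d = nxt, 2 * d
    return [g for g, _ in gp], cells


def sklansky(gp):
    """Each level merges blocks pairwise; the upper half reads the top bit of the lower half."""
    n, half, cells = len(gp), 1, {"black": 0, "grey": 0}
    while half < n:
        nxt = list(gp)
        for i in range(n):
            if i & half:
                lo = (i & ~(half - 1)) - 1            # top bit of the lower half block
                kind = "grey" if i < 2 * half else "black"
                cell = grey_cell if kind == "grey" else black_cell
                nxt[i] = cell(gp[i], gp[lo])
                cells[kind] += 1
        gp, half = nxt, 2 * half
    return [g for g, _ in gp], cells


def prefix_add(a, b, n, network):
    """Three blocks: pre-processing, carry network, then S_i = P_i XOR C_{i-1}."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    carries, cells = network(gp)
    total, c_prev = 0, 0
    for i in range(n):
        total |= (gp[i][1] ^ c_prev) << i
        c_prev = carries[i]
    return total | (c_prev << n), cells


print(prefix_add(40000, 30000, 16, kogge_stone))
print(prefix_add(40000, 30000, 16, sklansky))
```

For 16 bits this model counts 34 black and 15 grey cells for the Kogge Stone network and 17 black and 15 grey cells for the Sklansky network, which agrees with the cell counts quoted for Fig. 6 and Fig. 7.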

[Figure: inputs A[15:0] and B[15:0]; generate and propagate signal generation block; carry tree producing C15 to C0; sum generation block; outputs COUT and Sum[15:0]]
Fig. 8. Structure of 16-bit Brent Kung adder

d) Ladner Fischer adder:
This adder is derived from the Sklansky adder design and has a delay of $(\log_2 n) + 1$ with $[(n/2)\log_2 n]$ computation nodes [14]. Fig. 10 shows the 16-bit Ladner Fischer adder structure, which consists of 12 black cells and 15 grey cells, the same as the Brent Kung adder. It has fewer stages than the Brent Kung adder, so it is faster than the Brent Kung adder.

[Figure: inputs A[15:0] and B[15:0]; generate and propagate signal generation block; carry tree producing C15 to C0; sum generation block; outputs COUT and Sum[15:0]]
Fig. 10. Structure of 16-bit Ladner Fischer adder

e) Han Carlson adder:
This adder is a hybrid design that combines the Kogge Stone and Brent Kung adders [17]. The second block consists of five stages, in which the first stage is like the Brent Kung adder and the middle three stages are like the Kogge Stone adder. The delay of the structure is given by $(\log_2 n) + 1$ with $[(n/2)\log_2 n]$ computation nodes. Fig. 9 shows the 16-bit Han Carlson adder structure, which consists of 17 black cells and 15 grey cells, so its area and interconnection complexity are less than those of the Kogge Stone adder, but its logic depth is greater, which increases delay.
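To compare the five adders side by side, the delay and node-count expressions quoted in the sub-sections above can simply be evaluated for a given width. The sketch below does only that arithmetic; the dictionary `FORMULAS` and the function `evaluate` are hypothetical names, the delay is in logic levels as given by the expressions, and the results are values of the quoted expressions rather than cell counts read from the figures.

```python
from math import log2

# (delay in logic levels, number of computation nodes) as functions of
# n and k = log2(n), transcribed from the expressions quoted in the text.
FORMULAS = {
    "Kogge Stone":    (lambda n, k: k,         lambda n, k: n * k - n + 1),
    "Sklansky":       (lambda n, k: k,         lambda n, k: (n // 2) * k),
    "Brent Kung":     (lambda n, k: 2 * k - 2, lambda n, k: 2 * n - 2 - k),
    "Ladner Fischer": (lambda n, k: k + 1,     lambda n, k: (n // 2) * k),
    "Han Carlson":    (lambda n, k: k + 1,     lambda n, k: (n // 2) * k),
}


def evaluate(n):
    """Evaluate every quoted expression for an n-bit adder (n a power of two)."""
    k = int(log2(n))
    assert 2 ** k == n, "the expressions assume n is a power of two"
    return {name: (delay(n, k), nodes(n, k))
            for name, (delay, nodes) in FORMULAS.items()}


for name, (delay, nodes) in evaluate(16).items():
    print(f"{name:15s} delay = {delay} levels, nodes = {nodes}")
```

For n = 16 the Kogge Stone, Sklansky and Han Carlson expressions give 49, 32 and 32 nodes, which match the black plus grey cell totals quoted for those adders.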
[Figure: inputs A[15:0] and B[15:0]; generate and propagate signal generation block; carry tree producing C15 to C0; sum generation block; outputs COUT and Sum[15:0]]
Fig. 9. Structure of 16-bit Han Carlson adder

B. Wallace tree multiplier using PPAs:
In this paper, five different Wallace tree multiplier structures are designed using five different PPAs, namely the Kogge Stone adder, Sklansky adder, Brent Kung adder, Han Carlson adder and Ladner Fischer adder. These adders are used in the final addition process in phase 2 to improve the performance of the multiplier.

The multiplier operation is the same and is performed in two phases, as explained in Section II. In phase 1, partial products are generated with the help of AND gates. In phase 2, the partial products generated in phase 1 are added step by step using half and full adders. In the proposed design, the final addition of partial products in phase 2 is performed using a PPA.

[Figure: flow from inputs A[15:0] and B[15:0] through partial product generation (using AND gates), addition of partial products using half and full adders, and addition of final row partial products using PPA (Kogge stone adder/Brent Kung adder/Han Carlson adder/Ladner Fischer adder), to the final output of product]
Fig. 11. Block diagram of proposed Wallace tree multiplier using PPAs

IV. SIMULATED AND SYNTHESIZED RESULT ANALYSIS

Five Wallace tree multiplier structures using five different PPAs (Sklansky adder, Kogge Stone adder, Brent Kung adder, Han Carlson adder and Ladner Fischer adder) are designed for an input size of 16 bits in this paper using Verilog HDL. All the designed multipliers are simulated and synthesized using Xilinx ISE 13.2. The simulation waveforms for the proposed multipliers are shown in Fig. 12 to Fig. 16. The synthesis results of the proposed multipliers are compared with the traditional Wallace tree multiplier in terms of area (number of LUTs) and delay (ns).

Fig. 12. Simulation waveform for 16-bit Wallace tree multiplier using Kogge stone adder
Fig. 13. Simulation waveform for 16-bit Wallace tree multiplier using Sklansky adder
Fig. 14. Simulation waveform for 16-bit Wallace tree multiplier using Brent Kung adder
Fig. 15. Simulation waveform for 16-bit Wallace tree multiplier using Ladner Fischer adder
Fig. 16. Simulation waveform for 16-bit Wallace tree multiplier using Han Carlson adder

Table 1. Area (No. of LUTs) and delay (ns) of different Wallace tree multiplier structures using different PPAs

Input size | Multiplier structure                               | Area (No. of LUTs) | Delay (ns)
16-bit     | Traditional Wallace tree multiplier                | 590                | 36.7
16-bit     | Wallace tree multiplier using Kogge stone adder    | 634                | 29.44
16-bit     | Wallace tree multiplier using Sklansky adder       | 599                | 30
16-bit     | Wallace tree multiplier using Brent Kung adder     | 598                | 32.37
16-bit     | Wallace tree multiplier using Ladner Fischer adder | 596                | 31.57
16-bit     | Wallace tree multiplier using Han Carlson adder    | 601                | 30.19

As shown in Table 1, a total of six multipliers, the five proposed multipliers using PPAs and one traditional multiplier, are synthesized using the Xilinx XST synthesizer. The synthesis report gives the area in terms of the number of LUTs occupied and the delay in nanoseconds. From Table 1, it can be seen that the Wallace tree multiplier using the Kogge Stone adder has the least delay (29.44 ns) but occupies more LUTs (634) than the other structures.

The traditional multiplier has the highest delay (36.7 ns) and occupies the fewest LUTs (590), but the difference in LUTs between the proposed and traditional structures is small (between 6 and 44 LUTs) and can be ignored. Among the proposed structures, the Wallace tree multiplier using the Ladner Fischer adder occupies the fewest LUTs (596), while its delay is moderate at 31.57 ns, which is less than that of the traditional structure.
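The trade-off discussed above can be put in relative terms by computing, for each proposed structure, the delay reduction and the LUT overhead with respect to the traditional multiplier. The sketch below uses only the figures from Table 1; the names `TABLE_1` and `compare` are hypothetical, and the percentages are derived here rather than reported by the authors.

```python
# (LUTs, delay in ns) for the 16-bit designs, copied from Table 1.
TABLE_1 = {
    "Traditional":    (590, 36.7),
    "Kogge Stone":    (634, 29.44),
    "Sklansky":       (599, 30.0),
    "Brent Kung":     (598, 32.37),
    "Ladner Fischer": (596, 31.57),
    "Han Carlson":    (601, 30.19),
}


def compare(table, baseline="Traditional"):
    """Return {name: (delay reduction %, LUT overhead %)} relative to the baseline."""
    base_luts, base_delay = table[baseline]
    result = {}
    for name, (luts, delay) in table.items():
        if name == baseline:
            continue
        delay_reduction = 100 * (base_delay - delay) / base_delay
        lut_overhead = 100 * (luts - base_luts) / base_luts
        result[name] = (delay_reduction, lut_overhead)
    return result


for name, (faster, bigger) in compare(TABLE_1).items():
    print(f"{name:15s} delay -{faster:.1f}%  LUTs +{bigger:.1f}%")
```

On these numbers the Kogge Stone variant is roughly 20% faster than the traditional design for about 7.5% more LUTs, while the Ladner Fischer variant costs about 1% more LUTs.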

V. CONCLUSION AND FUTURE SCOPE

A total of six multiplier structures are designed in this paper: five proposed Wallace tree multipliers using five different PPAs and one traditional Wallace tree multiplier. The delay objective is achieved, while for area the difference in LUTs occupied by the traditional and proposed structures is small, which means the area objective is also achieved to some extent, as can be seen in Table 1. Among the proposed multipliers, the Wallace tree multiplier using the Kogge Stone adder has the least delay, but its area is high compared to the other multiplier structures in this paper. In terms of area, the Wallace tree multiplier using the Ladner Fischer adder has the least area among the proposed structures.

So it can be concluded that the proposed multipliers are better in delay than the traditional multiplier structure, at the cost of a small increase in area. Among the proposed structures, the Wallace tree multiplier using the Ladner Fischer adder can be used for application circuits that require low area with moderate speed, whereas for high speed application circuits the Wallace tree multiplier using the Kogge Stone adder can be used, at the cost of somewhat more area than the other proposed structures. The proposed multiplier structures can be used in high speed and low area application circuits, depending on the requirement. As future scope of this work, the multiplier structures can be designed for larger input sizes (e.g., 32-bit, 64-bit). Designing the multiplier for larger input sizes may lead to multiplier structures with fewer LUTs and better delay values.

REFERENCES
[1] N. Weste and D. Harris, CMOS VLSI Design. Reading, MA: Addison Wesley, 2004.
[2] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed., Addison-Wesley, 1985-1993.
[3] Milos D. Ercegovac and Thomas Lang, "Digital Arithmetic," Morgan Kaufmann, Elsevier Inc., 2004.
[4] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
[5] Himanshu Bansal, K. G. Sharma and Tripti Sharma, "Wallace Tree Multiplier Designs: A Performance Comparison Review", Innovative Systems Design and Engineering, Vol. 5, No. 5, 2014.
[6] K. Nehru, A. Shanmugam and S. Vadivel, "Design of 64-bit low power parallel prefix VLSI adder for high speed arithmetic circuits", in 2012 International Conference on Computing, Communication and Applications, pp. 1-4, IEEE, 2012.
[7] Rakesh S. and K. S. Vijula Grace, "A comprehensive review on the VLSI design performance of different Parallel Prefix Adders", Materials Today: Proceedings 11 (2019), pp. 1001-1009, ScienceDirect.
[8] R. Bala Sai Kesava, B. Lingeswara Rao, K. Bala Sindhuri and N. Udaya Kumar, "Low power and area efficient Wallace tree multiplier using carry select adder with binary to excess-1 converter", in 2016 Conference on Advances in Signal Processing (CASP), pp. 248-253, IEEE, 2016.
[9] Sunil M, Ankith R D, Manjunatha G D and Premananda B S, "Design and Implementation of Faster Parallel Prefix Kogge-Stone adder", Int. Journal of Electrical and Electronic Engineering & Telecommunications, Vol. 4, No. 1, pp. 116-121, 2015.
[10] Raghumanohar Adusumilli and Vinod Kumar K, "Design and Implementation of a high speed 64 bit Kogge-Stone adder using Verilog HDL", Int. Journal of Electrical and Electronic Engineering & Telecommunications, Vol. 3, No. 1, pp. 13-18, 2014.
[11] Nurdiani Zamhari, Peter Voon, Kuryati Kipli, Kho Lee Chin and Maimun Huja Husin, "Comparison of Parallel Prefix Adder (PPA)", Proc. World Congress on Engineering 2012 (WCE 2012), Vol. II, London, July 2012.
[12] David H. K. Hoe, Chris Martinez and Sri Jyothsna Vundavalli, "Design and Characterization of Parallel Prefix Adders using FPGAs", IEEE 43rd Southeastern Symposium on System Theory (SSST), USA, pp. 168-172, March 2011.
[13] Jasmine Saini, Somya Agarwal and Aditi Kansal, "Performance, Analysis and Comparison of Digital Adders", Proc. IEEE Int. Conference on Advances in Computer Engineering and Applications (ICACEA), Ghaziabad, India, pp. 80-84, 2015.
[14] Sudheer Kumar Yezerla and B Rajendra Naik, "Design and Estimation of delay, power and area for Parallel prefix adders", Proc. IEEE 2014 RAECS, UIET Panjab University, Chandigarh, March 2014.
[15] R. Brent and H. Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, vol. C-31, no. 3, pp. 260-264, March 1982.
[16] K. Babulu and Y. Gowthami, "Implementation and Performance Evaluation of Prefix Adders using FPGAs", IOSR Journal of VLSI and Signal Processing, Vol. 1, Iss. 1, pp. 51-57, 2012.
[17] S. Sri Katyayani, Dr. M. Chandramohan Reddy and Murali K, "Design of Efficient Han-Carlson-Adder", International Journal of Innovations in Engineering and Technology, Special issue on ETiCE, pp. 69-75, 2016.
