A Comparative Study of Different Multiplier Designs
Aisha Abdallah
Department of Electrical and Computer Engineering, University of Sharjah
ABSTRACT

This paper presents a comparative study of different multiplier designs. It reviews various multiplication techniques and discusses the modifications applied to them. An overview of four selected multiplier designs is provided. The designs are compared in terms of VLSI design performance (speed, power, and area), and the best performance in each category is highlighted. The selected CSA multiplier shows both the lowest power consumption and the lowest delay, 0.198 mW and 6 ps, respectively. Hence, it is suited to low-power applications. On the other hand, the smallest area, 4030 μm², is achieved by the selected MBE multiplier.

1. INTRODUCTION

Arithmetic logic operations, such as multiplication and addition, are among the dominant concerns in digital signal processing (DSP), multimedia, 3D graphics, microprocessors, application-specific integrated circuits (ASICs), etc. [1-3]. In the DSP field, binary multiplication is the most important hardware operation, so the multiplication method must not consume much of the hardware's execution time. Many DSP algorithms depend on multiplication, such as digital filters, the Discrete Cosine Transform (DCT), and the Fast Fourier Transform (FFT). The search for an ideal multiplier centers on three features: high speed, low power, and small area. These features trade off against one another, but a compromise design can be achieved. Multiplication involves three basic steps: partial product generation, partial product addition (reduction), and final addition [4]. Many different and modified multiplier architectures exist. This paper compares different multipliers with respect to three important considerations of VLSI design: power, area, and delay.

The paper is organized as follows. Section 2 gives basic background on several multiplication techniques. Section 3 provides an overview of four different multiplier designs. Section 4 summarizes the comparison of the discussed multipliers. The conclusion is drawn in Section 5.

2. MULTIPLICATION TECHNIQUES

There are various techniques for binary multiplication, and many modifications have been made to them. The binary array multiplier, shown in Figure 1, consists of AND gates, half adders, and full adders. For N-bit×N-bit unsigned numbers, it needs N² AND gates, N half adders, and N(N-2) full adders [5]. It uses ripple carry adders, in which the carry-out of each adder ripples to become the carry-in of the next adder.

The radix-4 Modified Booth Encoder (MBE) is one method for multiplying both unsigned and signed numbers. It groups three consecutive bits of the multiplier and matches each group to an entry in the partial product table given in Table 1 [1]. Thus, it reduces the number of partial products and hence the number of adders.

The Carry Save Adder (CSA) multiplier consists of half adders and full adders. It is similar to the ripple carry adder, but the carry-out vector is saved to be combined with the sum output later. The half adder can be designed as depicted in Figure 2, using what is known as the multiplexing method. It is built using CPL (Complementary Pass-transistor Logic), which reduces the number of transistors and consists only of NMOS devices. Thus, it provides low capacitance, which yields high speed (low delay) and low power. The half adder can also be implemented using MCIT (Multiplexing Control Input Technique), which uses fewer transistors than CMOS but more than CPL [2].

Another type of multiplier is the Braun multiplier, which consists of an array of N² AND gates and an array of N(N-1) full adders. It is used only for unsigned numbers. The generated partial products are inputs to cascaded CSAs. A modified version of the Braun multiplier, the Baugh-Wooley multiplier, handles both unsigned and signed numbers represented in 2’s complement [2].

3. OVERVIEW OF FOUR DIFFERENT MULTIPLIER DESIGNS

Four types of multipliers, proposed in [1], [2], [3], and [4], are discussed and compared in terms of the VLSI design parameters (area, delay, and power) and the technology used. Each design applies a modification to the conventional multiplier to enhance its performance.
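The radix-4 MBE grouping described in Section 2 and listed in Table 1 can be illustrated with a short software sketch. This is only a behavioural model under stated assumptions, not the hardware design of [1]: the names BOOTH_TABLE, booth_digits, and booth_multiply are hypothetical, the multiplier is assumed to be an even-width 2’s complement value, and a zero is assumed below the LSB so that the first group is complete.

```python
# Radix-4 Booth recoding table (Table 1): 3-bit group -> multiple of the multiplicand.
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_digits(multiplier: int, n_bits: int) -> list[int]:
    """Recode an n_bits-wide 2's complement multiplier into radix-4 digits, LSB first."""
    if n_bits % 2:
        raise ValueError("n_bits must be even for radix-4 grouping")
    # Keep n_bits of the multiplier and append the assumed 0 below the LSB.
    bits = (multiplier & ((1 << n_bits) - 1)) << 1
    # Overlapping 3-bit groups, advancing two bit positions per group.
    return [BOOTH_TABLE[(bits >> i) & 0b111] for i in range(0, n_bits, 2)]


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Sum the n_bits/2 partial products selected by the Booth digits."""
    digits = booth_digits(multiplier, n_bits)
    return sum(d * multiplicand * 4**k for k, d in enumerate(digits))
```

For an 8-bit multiplier the sketch produces four partial products instead of eight, which is the reduction the text refers to. An unsigned 8-bit multiplier can be handled in this model by zero-extending it, for example by calling booth_multiply with n_bits=10.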
Figure 1. Binary array multiplier [5].

Table 1. Radix-4 Modified Booth Encoder [1].

Groups of multiplier digits    Partial products
000                            0
001                            1*multiplicand
010                            1*multiplicand
011                            2*multiplicand
100                            -2*multiplicand
101                            -1*multiplicand
110                            -1*multiplicand
111                            0

Figure 2. Half adder using CPL [2].

3.1. A logarithmic time method of 2’s complement representation

A fast MBE is achieved by using a new 2’s complement representation, given in [1]. It starts by finding a conversion signal which, starting from the LSB, groups bits pairwise at each level, giving group sizes of 2, 4, 8, …, 2^n. In each group, if the rightmost digit is ‘1’, the bits to its left are masked to ‘1’; otherwise they are unchanged. The 2’s complement is then obtained by adding the final conversion signal to the input. This leads to a well-organized, diamond-shaped partial product tree. The last row in the multiplication scheme is the 2’s complement with partial product selection. Figure 3 summarizes the new technique for finding the 2’s complement. A 3-5 coder selects the possible input value using Table 1. The 2’s complement is generated concurrently with the MBE and the 3-5 coder. This technique reduces the number of partial product rows and minimizes the total multiplication time [1].

Figure 3. A new technique for finding the 2’s complement [1].

3.2. Multipliers using a Shannon-based adder

The Shannon-based adder relies on Shannon's theorem, which uses multiplexers to represent a function as a sum of two sub-functions in terms of one or more input variables and their negations. In [2], this type of adder is applied to Braun, Baugh-Wooley, and CSA multipliers. Figure 4 shows the Shannon-based adder used in [2]. Using XOR and XNOR gates constructed in CPL gives fewer transistors in the sum circuit, but the overall number of transistors is large [2]. For the same multiplier type, the designed adder achieves the best performance among the other adders (MCIT, CPL-12T, and mixed Shannon), except in area and number of transistors, where mixed Shannon is preferable. The lowest power dissipation and delay are obtained by the CSA multiplier because it uses the half-adder circuit. The significant advantages of the proposed multipliers are low power dissipation and high speed compared to other 8x8 multipliers.

Figure 4. The proposed Shannon-based adder cell [2].

3.3. Pair-wise algorithm

The pair-wise algorithm introduced in [3] simplifies the mathematically complex 8x8-bit multiplication into two steps without using an encoder: generation of partial products and partial product addition. Each operand is separated into an even part and an odd part, each n bits wide, whose sum gives the original operand. The two operands, represented in even and odd parts, are then multiplied pairwise. The results are Pee, Peo, Poe, and Poo. Each result that has even partial products is divided into a sum of even and odd parts, which are added using a 3:2 adder to reduce the number of partial products. The results at this stage are A and B. The sparse bits x7y1, x2y7, x2y8, and x7y2 are added to M and N, which are initially 13-bit and 15-bit all-zero words, respectively. Poo is grouped with M and N to get another two numbers. The final result is obtained as the output of a carry look-ahead adder (CLA). The addition step is depicted in Figure 5. The multiplier consists of full adders, half adders, and a CLA. The proposed full adder circuit is XOR-based and constructed using CPL-10T. This design gives low power dissipation and high speed, according to the comparison done in [3].

Figure 5. The partial product addition [3].

3.4. Modified binary array multiplier

In [5], the traditional binary array multiplier is altered by replacing the last full adder with a CLA adder, which provides fast computation of the carry-in for each stage. The sum and carry for a four-bit adder are expressed in (1) and (2), respectively. Equations (3) and (4) define two terms, the generate carry and the propagate carry, respectively.

$$\mathrm{Sum} = X_i \oplus Y_i \oplus C_i \tag{1}$$
$$C_{out,i+1} = G_i + P_i \cdot C_i \tag{2}$$
$$G_i = X_i \cdot Y_i \tag{3}$$
$$P_i = X_i \oplus Y_i \tag{4}$$

$G_i$ is asserted when a carry-out is generated internally in the full adder, regardless of the carry-in. $P_i$ is asserted when an incoming carry is propagated to the carry-out. For a 4-bit adder, this can be written as (5)-(8).

$$C_{out,1} = G_0 + P_0 \cdot C_0 \tag{5}$$
$$C_{out,2} = G_1 + P_1 \cdot C_{out,1} \tag{6}$$
$$C_{out,3} = G_2 + P_2 \cdot C_{out,2} \tag{7}$$
$$C_{out,4} = G_3 + P_3 \cdot C_{out,3} \tag{8}$$

The speed of the conventional array multiplier depends on the propagation time of the carries through all stages, whereas the CLA adder eliminates the propagation delay of the carries in the addition. The carry-out bits are found ahead of time from the inputs and the initial carry-in, without waiting for the full adder output of each stage. Thus, partial products are generated faster and a speedier adder is obtained.

4. COMPARISON OF THE FOUR TYPES OF MULTIPLIER DESIGNS

Different techniques and modifications for performing multiplication have been discussed. Each one offers advantages and tradeoffs among several performance parameters. Power, area, and delay are the three essential performance measures in VLSI design. The comparison among the discussed multipliers is summarized in Table 2 and Table 3. All references simulate and test an 8-bit×8-bit multiplier except [4]. The lowest delay and lowest power dissipation are obtained in [2] using the CSA multiplier, with a delay of 6 ps and a power dissipation of 0.198 mW. Thus, this multiplier is preferable for high-speed, low-power devices. Reference [1] is the better choice when area is the main constraint, with an occupied area of 4030 μm².

Table 2. Comparison of discussed multipliers showing their types, modification, and process.

Reference  Type of multiplier                  Type of modification                                   Process
[1]        MBE (8-bit×8-bit)                   Fast multiplication in 2’s complement representation   Artisan TSMC 0.13 μm
[2]        Baugh-Wooley (8-bit×8-bit)          Shannon-based adder                                    90 nm CMOS
[2]        Braun (8-bit×8-bit)                 Shannon-based adder                                    90 nm CMOS
[2]        CSA (8-bit×8-bit)                   Shannon-based adder                                    90 nm CMOS
[3]        Pair-wise algorithm (8-bit×8-bit)   10-transistor full adder cell                          6 Metal 0.18 μm Digital CMOS
[4]        Array (4-bit×4-bit)                 CLA adder                                              -

Table 3. Comparison of discussed multipliers from VLSI design parameter performances.

Type of multiplier                  Power consumption (mW)   Delay (ps)   Area (μm²)
MBE (8-bit×8-bit)                   6.3876                   3030         4030
Baugh-Wooley (8-bit×8-bit)          0.816                    37           6230
Braun (8-bit×8-bit)                 0.307                    16           21672
CSA (8-bit×8-bit)                   0.198                    6            32316
Pair-wise algorithm (8-bit×8-bit)   0.64                     1256         14267
Array (4-bit×4-bit)                 68.49                    16620        -

5. CONCLUSION

In this paper, different types of multipliers are compared. Each one presents a new method or a modification of traditional multipliers, and they are implemented using different technologies. The CSA multiplier [2] has both the lowest power consumption and the lowest delay, 0.198 mW and 6 ps, respectively. Hence, it is suited to low-power applications. The smallest area, 4030 μm², is achieved by the MBE multiplier [1].
REFERENCES
[1] J.-Y. Kang, and J.-L. Gaudiot, “A Simple High-Speed
Multiplier Design,” IEEE Transactions on Computers,
vol. 55, no. 10, pp. 1253-1258, Oct. 2006.
[2] C. Senthilpari, K. Diwakar, and A. K. Singh, “High
Speed and High Throughput 8x8 Bit Multiplier using a
Shannon-Based Adder Cell,” in TENCON 2009-2009
IEEE Region 10 Conference, Singapore, pp. 1-5, Jan.
2009.
[3] L. Jayaraju, B. N. Rao, and A. V. Rao, “0.69mW,
700MHz Novel 8x8 Digital Multiplier,” International
Journal of Computer Theory and Engineering, vol. 3, no.
5, pp. 662-665, Oct. 2011.
[4] R. Dhanabal, V. Bharathi, N. Anand, G. Joseph, S.
Oommen, and S. Sahoo, “Comparison of Existing
Multipliers and Proposal of a New Design for Optimized
Performance,” International Journal of Engineering and
Technology (IJET), vol. 5, no. 2, pp. 1704-1709, May
2013.
[5] S. R. Vaidya, and D. R. Dandekar, “Performance
Comparison of Multipliers for Power-Speed Trade-off in
VLSI Design,” in Recent Advances in Networking, VLSI
and Signal Processing, University of Cambridge, UK,
pp. 262-266, Feb. 2010.