
Low Power and High Speed 8x8 Bit Multiplier Using Non-Clocked Pass Transistor Logic

The document discusses the design and simulation of an 8-bit multiplier circuit using non-clocked pass transistor logic to achieve low power and high speed. It implements a carry save array multiplier technique with full adders designed using multiplexing control input. Simulation results show reductions in power dissipation and propagation delay compared to other pass transistor logic designs.


International Conference on Intelligent and Advanced Systems 2007

Low power and high speed 8x8 bit multiplier using non-clocked Pass Transistor Logic
C. Senthilpari, Member IEEE, Ajay Kumar Singh, Member IEEE, and K. Diwakar, Member IEEE
Faculty of Engineering & Technology,
Multimedia University,
Jalan Ayer Keroh lama,
75450 Melaka, Malaysia
Email:[email protected]

Abstract: In this paper we analyze an 8-bit multiplier circuit built from non-clocked pass gate logic families using the carry save array (CSA) multiplier technique. The multiplier cell of the adder is designed using pass transistors (n-transistors), with p-transistors used as cross-coupled devices. The adder cell is designed using the multiplexing control input technique. A combination of n- and p-transistors is used in the mirror logic and inverters of the full adder circuit. These multipliers are useful for energy-efficient operation in portable battery-operated multimedia devices. The 8-bit multiplier circuit has been simulated using the Microwind 3 VLSI layout CAD tool. We have analyzed the power dissipation, propagation delay, power delay product (PDP) and energy per instruction (EPI), and compared our results with other pass transistor logics as well as published results. The simulated results show that the power dissipation and propagation delay are low in our non-clocked pass transistor logic designs. Our multiplier circuit shows a power dissipation improvement of 97.6% over Amir et al. and of 46.30%, 23.24% and 0.15% over Rizwan et al. Our multiplier also gives a better propagation delay than those of Rizwan et al., by 89.56%, 88.39% and 88.31%.

[Keywords: 8-bit multiplier, multiplexing control input technique, power dissipation, propagation delay, VLSI CAD tools, energy per instruction.]

I. INTRODUCTION

Multiplication is a fundamental function in arithmetic logic operations. Since multiplication dominates the execution time of most DSP algorithms, a high-speed multiplier is much desired [1]. Currently, multiplication time is still the dominant factor in determining the instruction cycle time of a DSP chip. With an ever-increasing quest for greater computing power on battery-operated mobile devices, design emphasis has shifted from optimizing the conventional delay time and area to minimizing power dissipation while still maintaining high performance [2]. Low-power, high-speed VLSI can be implemented with different logic styles. The three important considerations in VLSI design are power, area and delay. Many logic styles have been proposed for low power dissipation and high speed, and each has its own advantages in terms of speed and power [3-4].

Pass-transistor logic (PTL) is reported as an alternative logic style that can enhance circuit performance [4]. Since a pass transistor can propagate signals using both the source and the gate, its high functionality can reduce the number of transistors when the multiplexing control input technique is used, which yields high performance in the critical path [3-4]. As a PTL-based circuit can consist of only one type of MOS transistor (generally an nMOS transistor), it has a low node capacitance. As a result, PTL enables high-speed and low-power circuits [4].

This paper describes the design and simulation results of an 8-bit multiplier based on CMOS design rules for low-power and high-speed applications. Non-clocked pass gate logic has been chosen to implement the 8-bit multiplier circuit. The multiplier uses full adder circuits designed with the multiplexing control input technique [4-6]. We have compared our results across five pass gate logics in terms of speed, area, energy per instruction and power dissipation. We observed that the power dissipation and delay are considerably reduced in the designed 8-bit multiplier circuits.

II. CIRCUIT ARCHITECTURE, IMPLEMENTATION AND LAYOUT

Our multiplier circuit has been designed using the carry save array (CSA) multiplier technique. The basic architecture of the CSA multiplier uses half adder, AND gate and full adder blocks. The half adder comprises EX-OR and AND gates, which are implemented in pass transistor logic. One efficient implementation of the carry save array multiplier is the regular layout of the adder array [6]. This multiplier performs well for unsigned/signed operands. The carry save array multiplier is designed entirely from full adder circuits, for regularity of the structure. The full adder block is designed using the multiplexing control input technique. In this technique, the inputs A and A', B and B' logically control the differential nodes, which carry the sum output and its complement. The inputs Cin and Cin' are applied as the inputs of the differential nodes of the sum and its complement. The optional cross-coupled PFET devices pull up whichever output is high from VDD - VT [6]. The logic tree does not pass any direct current after the latch sets. Since the inputs drive only NMOS transistors, the input gate capacitance loading is typically three times smaller than in CMOS circuits, which require complementary NMOS and PMOS transistors to be driven [7]. If the function has n input variables, then n-1 of them can be used as control input signals and only one as input data. An expression is written that includes the product terms of the control signals, each ANDed with the result of the minimized sum-of-products expression (this leaves either 0, 1, input data, or input data'). A full adder designed using this method needs far fewer transistors [1, 6].

II.1 Architecture of 4-bit multiplier

The basic 4x4 bit carry save array multiplier is shown in Fig. 1. All the partial products of A and B are computed in parallel, and then collected through a cascade of carry save adders. The output of the array is in carry save notation, so an additional adder converts it (by means of carry propagation) into the classical notation. The completion time is limited by the depth of the carry save array and by the carry propagation in the adder [7]. Real-time computer applications require fast multiplication using AND gates and full adders. Multiplication can be implemented on the processor in much the same way as it is done by hand. The carry save multiplier circuit is analyzed for different non-clocked pass transistor adder techniques. The five types of non-clocked adder circuit are shown

1-4244-1355-9/07/$25.00 ©2007 IEEE

Authorized licensed use limited to: Purdue University. Downloaded on April 16,2024 at 02:20:27 UTC from IEEE Xplore. Restrictions apply.

in Fig. 2(a) to Fig. 2(e). The CPL non-clocked pass transistor logic adder circuits have two kinds of nodes: differential nodes and swing restoration nodes. The Swing Restored Pass Gate Logic (SRPL) is derived from the basic pass transistor logic full adder circuit; the pass gate restoration nodes of the sum and its complement are connected reciprocally to the output CMOS inverters. DCVS logic with the pass gate is a means of extending the performance benefits associated with DCVSL into pass gate topologies. Static DCVSL is a differential style of logic requiring both true and complementary signals to be routed to gates. Two complementary NFET switching trees (sum and sum complement, or carry and carry complement) are connected to cross-coupled PFET transistors. Depending on the differential inputs, one of the outputs is pulled down by the corresponding NFET network. The cross-coupled PFET transistors then latch the differential output. EEPL reduces power consumption and delay by interrupting the feedback of the latches forming the load circuit in the structure and allowing reduction in the width of the NFET devices which comprise the evaluate tree. The device width reduction further contributes to the power reduction. The circuit action simultaneously provides regenerative positive feedback, providing shorter delays than comparable CPL circuits. EEPL will be a valuable logic element in low power applications where performance is still essential. Push-Pull Pass transistor Logic (PPPL) is a newer approach to recovering full VDD levels while still exploiting the speed of differential pass gate trees. In both styles, however, the PFET and NFET devices in the load circuit must still include the output node to obtain a full-rail output signal of either polarity. The push-pull (pull-up and pull-down) gate tree at the output node makes the tree into a differential complementary pass gate function. Single-Ended Pass Gate Logic (SEPG) concentrates on synthesizing the function of full logic blocks rather than individual logic functions. Another name for single-ended pass transistor logic is LEAP logic. The LEAN buffer circuit of the full adder produces the output of the circuit. The specialized LEAN inverter is essentially a simple inverter with a half-latched "keeper" PFET device, which brings the inverter input the rest of the way up from VDD - VT to VDD.

Fig. 1. 4x4 Carry save array multiplier

Assume that A and B are two n-bit numbers, where A is the multiplicand and B is the multiplier. The multiplier circuit can be expressed as follows [8]:

$$A = \sum_{i=0}^{n-1} A_i 2^i \qquad (1)$$

$$B = \sum_{j=0}^{n-1} B_j 2^j \qquad (2)$$

The product of A and B is P, and it can be written in the following form:

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} A_i B_j 2^{(i+j)} \qquad (3)$$

To illustrate further, the multiplicand A and multiplier B can be represented as follows:
A  Multiplicand  $x_{n-1} x_{n-2} x_{n-3} \ldots x_1 x_0$
B  Multiplier  $y_{n-1} y_{n-2} y_{n-3} \ldots y_1 y_0$
P  Product (A*B)  $p_{2n-1} p_{2n-2} p_{2n-3} \ldots p_1 p_0$
Each of the partial product terms $P_n = A_i B_j$ is called a summand. Each partial product is then stored in the arithmetic logic unit register, where it occupies memory space until the final partial product is obtained [9].

III. RESULTS AND DISCUSSIONS

The five types of pass gate multiplier circuit were designed and verified using the DSCH3 CAD tool, and the layout design and simulation were carried out with the Microwind 3 CAD tool. The propagation delay is calculated for the worst-case pattern, from 11111111 x 10000000 to 11111111 x 11111111. The maximum and average output currents, power dissipation and delays are calculated at P15, which is the critical path output [10-11]. The propagation delay, power dissipation and power delay product (PDP) are determined for CMOS design rule feature sizes of 0.12μm, 0.18μm, 0.25μm and 0.35μm. Our simulated results for the various pass logics using the multiplexing control input technique are given in Table I, which clearly suggests that the power consumption, propagation delay and power delay product are all lower for our designed circuit than for the other pass logics, irrespective of the gate length. This shows the superiority of our designed circuit. The number of transistors is 416 in our 4x4 multiplier circuit, whereas Rakshith Krishnappa et al. [10] used 528 transistors to implement a 4-bit multiplier circuit using CPL. From Table I, it is clear that CPL is dominant in terms of power dissipation and propagation delay. Due to the cross-coupling of inverters at the output node, the CPL logic transition from low to high (or high to low) is faster and the consumed energy is low. CPL circuits are dominant in ultra deep submicron processes, due to the shrinkage of wire capacitance, load capacitance and stray capacitances.

The high-speed, low-power carry save array multiplier provides improved functionality due to the implementation of the sum and carry processing logic, improved performance and improved power usage. The process of multiplying two numbers, A and B, requires the generation of a set of partial products followed by the summation of these partial products. The carry save array multiplier is used in DSP circuits to increase the execution speed of data instructions. Code transformations that keep one operand constant or reduce the number of "1" bits in a carry save array multiplier are ways that the compiler can change the data to reduce the multiplier's energy. To gauge the importance of data, we used cycle-accurate simulation to measure the power in the ALU when each MAC operation was active for the workloads [12]. The average power consumed by each instruction is calculated when bits are transferred


from ALU to DSP circuits. The FFT (Fast Fourier Transform) has the largest standard deviation and the least accurate ALU estimates under all models. The FFT showed a bimodal distribution of energy in one of its MAC instructions, probably due to a large number of multiplications by zero. Improving accuracy for this problem would require moving to a model based on execution traces. The energy per instruction (EPI) of the non-clocked pass transistor 8x8 bit multiplier, with the corresponding speeds for the various non-clocked pass gate logics, is given in Table II. The EPI decreases as the feature size decreases, as the values in Table II show. This clearly shows that, as instructions pass from the ALU, each instruction consumes less power at smaller feature sizes. Due to longer interconnection wires, load capacitance and layout capacitance, the power consumption is higher at deep submicron feature sizes than at ultra deep submicron feature sizes.
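
The architecture described in Section II can be sanity-checked in software. The following Python sketch is a behavioural model only, not the authors' transistor-level design: it builds a full adder in the multiplexing control input style (A and B act as control signals that select among 0, 1, Cin and Cin'), wires it into an unsigned carry save array, and compares the result with the double sum of equation (3). The function names `mux_full_adder`, `csa_multiply` and `equation_3`, and the choice of a ripple-carry final adder, are assumptions made for illustration.

```python
from itertools import product


def mux_full_adder(a, b, cin):
    """Full adder with A and B as control inputs and Cin as the data input.

    For every (A, B) pair each output reduces to one of 0, 1, Cin or Cin'.
    """
    ncin = cin ^ 1
    # Sum is Cin when A == B, otherwise Cin'.
    s = {(0, 0): cin, (0, 1): ncin, (1, 0): ncin, (1, 1): cin}[(a, b)]
    # Carry is 0 when A = B = 0, 1 when A = B = 1, otherwise Cin.
    c = {(0, 0): 0, (0, 1): cin, (1, 0): cin, (1, 1): 1}[(a, b)]
    return s, c


def csa_multiply(a, b, n=8):
    """Unsigned n x n carry save array multiplication (behavioural model)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    width = 2 * n
    sums = [0] * width     # running sum vector
    carries = [0] * width  # running carry vector, aligned to bit weights
    for j in range(n):
        # Partial-product row j: A_i AND B_j sits at weight i + j.
        row = [0] * width
        for i in range(n):
            row[i + j] = a_bits[i] & b_bits[j]
        new_sums = [0] * width
        new_carries = [0] * width
        for k in range(width):
            s, c = mux_full_adder(sums[k], carries[k], row[k])
            new_sums[k] = s
            if k + 1 < width:
                new_carries[k + 1] = c  # carry is saved, not propagated
        sums, carries = new_sums, new_carries
    # Final carry-propagate stage (assumed here to be a ripple adder).
    result, carry = 0, 0
    for k in range(width):
        s, carry = mux_full_adder(sums[k], carries[k], carry)
        result |= s << k
    return result


def equation_3(a, b, n=8):
    """Direct evaluation of P = sum_i sum_j A_i B_j 2^(i+j)."""
    return sum(((a >> i) & 1) * ((b >> j) & 1) * 2 ** (i + j)
               for i, j in product(range(n), repeat=2))


if __name__ == "__main__":
    # Endpoints of the worst-case pattern quoted in Section III.
    for a, b in [(0b11111111, 0b10000000), (0b11111111, 0b11111111)]:
        assert csa_multiply(a, b) == equation_3(a, b) == a * b
        print(a, "x", b, "=", csa_multiply(a, b))
```

The model keeps sums and carries in separate vectors between rows, which is the defining property of the carry save scheme; only the last stage propagates carries.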

TABLE I. 8X8 BIT MULTIPLIER RESULTS OF POWER DISSIPATION, PROPAGATION DELAY AND PDP OF CPL, DCVS, SRPG, EEPL AND PPPL CIRCUITS

Feature size | Supply voltage | Variable | CPL | DCVS | SRPG | EEPL | PPPL
0.35μm | 3.5V | P.Dissipation (PD), W | 8.081x10^-3 | 9.532x10^-3 | 19.631x10^-3 | 12.732x10^-3 | 38.99x10^-3
0.35μm | 3.5V | Delay (td), s | 1.2124x10^-9 | 1.1181x10^-9 | 1.096x10^-9 | 1.0691x10^-9 | 1.0731x10^-9
0.35μm | 3.5V | PDP (PD x td), W-s | 9.7974x10^-12 | 1.0657x10^-11 | 2.1515x10^-11 | 1.3611x10^-11 | 4.184x10^-11
0.25μm | 2.5V | P.Dissipation (PD), W | 4.039x10^-3 | 4.36x10^-3 | 9.699x10^-3 | 9.227x10^-3 | 23.026x10^-3
0.25μm | 2.5V | Delay (td), s | 7.4267x10^-10 | 7.5280x10^-10 | 7.4267x10^-10 | 7.099x10^-10 | 7.1708x10^-10
0.25μm | 2.5V | PDP (PD x td), W-s | 2.9996x10^-12 | 3.2822x10^-12 | 7.2031x10^-12 | 6.5502x10^-12 | 1.6511x10^-11
0.18μm | 2.0V | P.Dissipation (PD), W | 1.4138x10^-3 | 1.512x10^-3 | 3.673x10^-3 | 2.496x10^-3 | 10.738x10^-3
0.18μm | 2.0V | Delay (td), s | 1.3539x10^-10 | 1.3539x10^-10 | 4.8527x10^-10 | 4.554x10^-10 | 4.8157x10^-10
0.18μm | 2.0V | PDP (PD x td), W-s | 1.9141x10^-13 | 2.047x10^-13 | 1.7823x10^-12 | 1.1366x10^-12 | 5.1710x10^-12
0.12μm | 1.2V | P.Dissipation (PD), W | 0.103x10^-3 | 0.293x10^-3 | 0.137x10^-3 | 0.478x10^-3 | 2.565x10^-3
0.12μm | 1.2V | Delay (td), s | 1.0273x10^-10 | 1.0273x10^-10 | 3.3674x10^-10 | 3.2170x10^-10 | 3.3349x10^-10
0.12μm | 1.2V | PDP (PD x td), W-s | 1.0581x10^-14 | 3.0099x10^-14 | 4.6133x10^-14 | 1.5377x10^-13 | 8.5540x10^-13
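
The PDP rows of Table I follow directly from the definition PDP = PD x td. As a quick check, the short Python snippet below recomputes the CPL column. The dictionary name `cpl` and the helper `pdp` are illustrative choices; the numbers are copied from the table.

```python
# (power dissipation in W, propagation delay in s) for the CPL column of Table I
cpl = {
    "0.35um": (8.081e-3, 1.2124e-9),
    "0.25um": (4.039e-3, 7.4267e-10),
    "0.18um": (1.4138e-3, 1.3539e-10),
    "0.12um": (0.103e-3, 1.0273e-10),
}


def pdp(power_w, delay_s):
    """Power delay product in watt-seconds (joules)."""
    return power_w * delay_s


if __name__ == "__main__":
    for size, (power_w, delay_s) in cpl.items():
        print(f"{size}: PDP = {pdp(power_w, delay_s):.4e} W-s")
```

The computed values agree with the tabulated CPL PDP entries to the printed precision.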

TABLE II. 8X8 BIT MULTIPLIER RESULTS OF EPI OF CPL, DCVS, SRPG, EEPL AND PPPL CIRCUITS

Feature size | Variable | CPL | DCVS | SRPG | EEPL | PPPL
0.35μm (3.5V) | Max. operating frequency, MHz | 824.81 | 894.37 | 912.4 | 935.36 | 931.87
0.35μm (3.5V) | EPI, pJ | 739.92 | 649.348 | 622.954 | 643.014 | 653.588
0.25μm (2.5V) | Max. operating frequency, GHz | 1.346 | 1.328 | 1.346 | 1.408 | 1.394
0.25μm (2.5V) | EPI, pJ | 355.643 | 380.256 | 255.844 | 279.595 | 353.613
0.18μm (2.0V) | Max. operating frequency, GHz | 7.386 | 7.386 | 2.060 | 2.195 | 2.076
0.18μm (2.0V) | EPI, pJ | 181.527 | 181.527 | 104.884 | 170.894 | 178.562
0.12μm (1.2V) | Max. operating frequency, GHz | 9.737 | 9.734 | 2.969 | 3.108 | 2.998
0.12μm (1.2V) | EPI, pJ | 60.258 | 59.260 | 31.243 | 47.460 | 50.995
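
The maximum operating frequencies in Table II appear to be the reciprocals of the Table I propagation delays (for example, 1 / 1.2124 ns is about 824.8 MHz). The source does not state this relation explicitly, so the Python sketch below treats it as an assumption and simply checks it against the CPL column. `max_frequency_hz` is a hypothetical helper name, and the EPI values are not recomputed because the paper does not give their formula.

```python
def max_frequency_hz(delay_s):
    """Assumed relation: maximum operating frequency = 1 / propagation delay."""
    return 1.0 / delay_s


# CPL column: (feature size, delay from Table I in s, frequency from Table II in Hz)
cpl_rows = [
    ("0.35um", 1.2124e-9, 824.81e6),
    ("0.25um", 7.4267e-10, 1.346e9),
    ("0.18um", 1.3539e-10, 7.386e9),
    ("0.12um", 1.0273e-10, 9.737e9),
]

if __name__ == "__main__":
    for size, delay_s, f_table in cpl_rows:
        f_calc = max_frequency_hz(delay_s)
        rel_err = abs(f_calc - f_table) / f_table
        print(f"{size}: computed {f_calc / 1e9:.3f} GHz, "
              f"tabulated {f_table / 1e9:.3f} GHz, relative error {rel_err:.1e}")
```

Under this assumption the computed and tabulated CPL frequencies differ by well under 0.1% at every feature size.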

TABLE III. POWER COMPARISON AT 0.18μm FEATURE SIZE

Multiplier type | Power (mW) | % improvement | Propagation delay (s) | % improvement
Our designed CSA | 1.4138 | ---- | 1.3539x10^-10 | ----
Amir et al. [13], pipeline | 59.91 | 97.64 | ---- | ----
Rizwan et al. [14], array | 2.633 | 46.30 | 1.298x10^-9 | 89.56
Rizwan et al. [14], array architecture-I | 1.842 | 23.24 | 1.167x10^-9 | 88.39
Rizwan et al. [14], array architecture-II | 1.416 | 0.15 | 1.159x10^-9 | 88.31


Fig. 2. Non-clocked adder circuits: (a) CPL, (b) DCVS, (c) EEPL, (d) PPPL, (e) SEPG.

Our non-clocked 8x8 bit multiplier circuits, designed using the CSA multiplier technique, are compared with circuits from other authors. The simulated results of our 8x8 bit multiplier using the CPL technique and of other authors' 8x8 multiplier types are given in Table III. Our designed circuit shows approximately 98% improvement in power compared with the pipelined circuit of Amir et al. [13]. Similarly, our circuit shows 46.30%, 23.24% and 0.15% improvement in power dissipation in comparison with the different architectures of Rizwan et al. [14]. The propagation delay improvements of our CPL multiplier circuit over those of Rizwan et al. [14] are 89.56%, 88.39% and 88.31% respectively.
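
The percentages quoted for Table III are consistent with the usual relative-improvement formula, (reference - ours) / reference x 100. The source does not give the formula, so the Python sketch below is an assumption checked against the published numbers; the names `improvement_percent`, `power_refs_mw` and `delay_refs_s` are illustrative. Differences in the last digit appear to come from truncation rather than rounding in the paper.

```python
def improvement_percent(reference, ours):
    """Relative improvement of `ours` over `reference`, in percent (assumed formula)."""
    return (reference - ours) / reference * 100.0


OUR_POWER_MW = 1.4138      # our CPL multiplier, Table III
OUR_DELAY_S = 1.3539e-10   # our CPL multiplier, Table III

# Reference values copied from Table III.
power_refs_mw = {
    "Amir et al. [13], pipeline": 59.91,
    "Rizwan et al. [14], array": 2.633,
    "Rizwan et al. [14], array architecture-I": 1.842,
    "Rizwan et al. [14], array architecture-II": 1.416,
}
delay_refs_s = {
    "Rizwan et al. [14], array": 1.298e-9,
    "Rizwan et al. [14], array architecture-I": 1.167e-9,
    "Rizwan et al. [14], array architecture-II": 1.159e-9,
}

if __name__ == "__main__":
    for name, ref in power_refs_mw.items():
        print(f"power, {name}: {improvement_percent(ref, OUR_POWER_MW):.2f}%")
    for name, ref in delay_refs_s.items():
        print(f"delay, {name}: {improvement_percent(ref, OUR_DELAY_S):.2f}%")
```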
CONCLUSION

We have designed different non-clocked pass transistor logic circuits with the CSA technique for low-power, high-performance multiplier circuits. We have analyzed power dissipation, PDP, propagation delay and EPI, which are calculated from the simulation results. These non-clocked pass transistor multipliers are used in DSP circuits with inter-instruction effects when building instruction-level energy


models for a specific DSP design. From the simulated results we have seen that our proposed multiplier circuit is very fast and consumes less power than other existing multiplier circuits.

REFERENCES

[1] Kiat-Seng Yeo and Kaushik Roy, "Low-Voltage, Low Power VLSI Subsystems," McGraw-Hill.

[2] Jong Duk Lee, Yong Jin Yoony, Kyoung Hwa Leez and Byung-Gook Park, "Application of Dynamic Pass-Transistor Logic to an 8-Bit Multiplier," Journal of the Korean Physical Society, Vol. 38, No. 3, pp. 220-223, March 2001.

[3] C. F. Law, S. S. Rofail, and K. S. Yeo, "A Low-Power 16×16-b Parallel Multiplier Utilizing Pass-Transistor Logic," IEEE Journal of Solid-State Circuits, Vol. 34, No. 10, pp. 1395-1399, October 1999.

[4] Oscal T.C. Chen, Sandy Wang, and Yi-Wen Wu, "Minimization of Switching Activities of Partial Products for Designing Low-Power Multipliers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 11, No. 3, pp. 418-433, June 2003.

[5] Suhwan Kim, Conrad H. Ziesler, and Marios C. Papaefthymiou, "A True Single-Phase Energy-Recovery Multiplier," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 11, No. 2, pp. 194-207, April 2003.

[6] Kerry Bernstein, Keith M. Carring, Christopher M. Durham, Patrick R. Hansen, David Hogenmiller, Edward J. Nowak and Norman J. Rohrer, "High Speed CMOS Design Styles," Kluwer Academic Publishers, London.

[7] Nhon T. Quach, Naofumi Takagi, and Michael J. Flynn, "Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 5, May 2004.

[8] Kyung-Ju Cho, Kwang-Chul Lee, Jin-Gyun Chung, and Keshab K. Parhi, "Design of Low-Error Fixed-Width Modified Booth Multiplier," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 5, pp. 511-521, May 2004.

[9] Keoncheol Shin and Taewhan Kim, "Tight Integration of Timing-Driven Synthesis and Placement of Parallel Multiplier Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 7, pp. 766-775, July 2004.

[10] Rakshith Krishnappa, "Power & Delay Analysis of a 4-bit Multiplier Implemented in CPL, DPL, CVSL & DCVSPG," Illinois Institute of Technology, ECE529, November 2003.

[11] Etienne Sicard, "Microwind & Dsch User's Manual," Version 2, National Institute of Applied Sciences, May 2002.

[12] Chua-Chin Wang, Yih-Long Tseng, Hsien-Chih She, and Ron Hu, "A 1.2 GHz Programmable DLL-Based Frequency Multiplier for Wireless Applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 12, pp. 1404-1408, December 2004.

[13] Amir Khatibzadeh and Kaamran Raahemifar, "A Novel Design of a 6-GHz 8 X 8-b Pipelined Multiplier," IEEE, Proceedings of the 9th International Database Engineering & Application Symposium (IDEAS'05), pp. 1-5, 2005.

[14] Rizwan Mudassir and Z. Abid, "New Parallel Multipliers Based on Low Power Adders," 2005 IEEE CCECE/CCGEI, Saskatoon, pp. 694-697, May 2005.
