Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic: Reto Zimmermann and Wolfgang Fichtner


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997 1079

Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic
Reto Zimmermann and Wolfgang Fichtner, Fellow, IEEE

Abstract: Recently reported logic style comparisons based on full-adder circuits claimed complementary pass-transistor logic (CPL) to be much more power-efficient than complementary CMOS. However, new comparisons performed on more efficient CMOS circuit realizations and a wider range of different logic cells, as well as the use of realistic circuit arrangements, demonstrate CMOS to be superior to CPL in most cases with respect to speed, area, power dissipation, and power-delay products. An implemented 32-b adder using complementary CMOS has a power-delay product of less than half that of the CPL version. Robustness with respect to voltage scaling and transistor sizing, as well as generality and ease-of-use, are additional advantages of CMOS logic gates, especially when cell-based design and logic synthesis are targeted. This paper shows that complementary CMOS is the logic style of choice for the implementation of arbitrary combinational circuits if low voltage, low power, and small power-delay products are of concern.

Index Terms: Adder circuits, CPL, complementary CMOS, low-voltage low-power logic styles, pass-transistor logic, VLSI circuit design.

Manuscript received November 20, 1996; revised January 29, 1997.
The authors are with the Integrated Systems Laboratory, Swiss Federal Institute of Technology (ETH), CH-8092 Zurich, Switzerland.
Publisher Item Identifier S 0018-9200(97)04363-1.
0018-9200/97$10.00 © 1997 IEEE

I. INTRODUCTION

THE increasing demand for low-power very large scale integration (VLSI) can be addressed at different design levels, such as the architectural, circuit, layout, and the process technology level [1]. At the circuit design level, considerable potential for power savings exists by means of proper choice of a logic style for implementing combinational circuits. This is because all the important parameters governing power dissipation (switching capacitance, transition activity, and short-circuit currents) are strongly influenced by the chosen logic style. Depending on the application, the kind of circuit to be implemented, and the design technique used, different performance aspects become important, disallowing the formulation of universal rules for optimal logic styles. Investigations of low-power logic styles reported in the literature so far, however, have mainly focused on particular logic cells, namely full-adders, used in some arithmetic circuits. In this paper, these investigations are extended to a much wider set of logic gates, and with that, to arbitrary combinational circuits. The power dissipation characteristics of various existing logic styles are compared qualitatively and quantitatively by actual logic gate implementations and simulations under realistic circuit arrangements and operating conditions [2]. Investigations of sequential elements, such as latches and flip-flops, were not included in this work, but can be found elsewhere in the literature [3].

Section II gives a short introduction to the most important existing static logic styles and compares them qualitatively. Results of quantitative comparisons based on simulations of different logic gates as well as of a 32-b adder implementation are given in Sections III and IV, respectively. Some conclusions are finally drawn in Section V.

II. LOGIC STYLES

A. Impact of Logic Style

The logic style used in logic gates basically influences the speed, size, power dissipation, and the wiring complexity of a circuit. The circuit delay is determined by the number of inversion levels, the number of transistors in series, transistor sizes (i.e., channel widths), and intra- and inter-cell wiring capacitances. Circuit size depends on the number of transistors and their sizes and on the wiring complexity. Power dissipation is determined by the switching activity and the node capacitances (made up of gate, diffusion, and wire capacitances), the latter of which in turn is a function of the same parameters that also control circuit size. Finally, the wiring complexity is determined by the number of connections and their lengths and by whether single-rail or dual-rail logic is used. All these characteristics may vary considerably from one logic style to another and thus make the proper choice of logic style crucial for circuit performance.

As far as cell-based design techniques (e.g., standard-cells) and logic synthesis are concerned, ease-of-use and generality of logic gates is of importance as well. Robustness¹ with respect to voltage and transistor scaling as well as varying process and working conditions, and compatibility with surrounding circuitries are important aspects influenced by the implemented logic style.

¹ A robust circuit guarantees correct functioning under a wide range of certain conditions.

B. Logic Style Requirements for Low Power

According to the formula

$$P_{dyn} = V_{dd} \cdot f_{clk} \cdot \sum_{i=1}^{N} V_{swing_i} \cdot C_{load_i} \cdot \alpha_i + V_{dd} \cdot \sum_{i=1}^{N} I_{sc_i} \qquad (1)$$

the dynamic power dissipation of a digital CMOS circuit depends on the supply voltage $V_{dd}$, the clock frequency $f_{clk}$, the node switching activities $\alpha_i$, the node capacitances $C_{load_i}$, the
node short-circuit currents $I_{sc_i}$, and the number of nodes $N$. A reduction of each of these parameters results in a reduction of dissipated power. However, clock frequency reduction is only feasible at the architecture level, whereas at the circuit level frequency is usually regarded as constant in order to fulfill some given throughput requirement. All the other parameters are influenced to some degree by the logic style applied. Thus, some general logic style requirements for low-power circuit implementation can be stated at this point.

1) Switched Capacitance Reduction: Capacitive load, originating from transistor capacitances (gate and diffusion) and interconnect wiring, is to be minimized. This is achieved by having as few transistors and circuit nodes as possible, and by reducing transistor sizes to a minimum. In particular, the number of (high-capacitive) inter-cell connections and their length (influenced by the circuit size) should be kept minimal. Another source for capacitance reduction is found at the layout level [4], which, however, is not discussed in this paper. Transistor downsizing is an effective way to reduce switched capacitance of logic gates on noncritical signal paths [5]. For that purpose, a logic style should be robust against transistor downsizing, i.e., correct functioning of logic gates with minimal or near-minimal transistor sizes must be guaranteed (ratioless logic).

2) Supply Voltage Reduction: The supply voltage and the choice of logic style are indirectly related through delay-driven voltage scaling. That is, a logic style providing fast logic gates to speed up critical signal paths allows a reduction of the supply voltage in order to achieve a given throughput. For that purpose, a logic style must be robust against supply voltage reduction, i.e., performance and correct functioning of gates must be guaranteed at low voltages as well. This becomes a severe problem at very low voltages of around 1 V and lower, where noise margins become critical [6], [7].

3) Switching Activity Reduction: Switching activity of a circuit is predominantly controlled at the architectural and register transfer level (RTL). At the circuit level, large differences are primarily observed between static and dynamic logic styles. On the other hand, only minor transition activity variations are observed among different static logic styles and among logic gates of different complexity, even when glitching is taken into account.

4) Short-Circuit Current Reduction: Short-circuit currents (also called dynamic leakage currents or overlap currents) may vary by a considerable amount between different logic styles. They also strongly depend on input signal slopes (i.e., steep and balanced signal slopes are better) and thus on transistor sizing. Their contribution to the overall power consumption is rather limited but still not negligible (≈10–30%), except for very low voltages ($V_{dd} \le V_{tn} + |V_{tp}|$), where the short-circuit currents disappear. A low-power logic style should have minimal short-circuit currents and, of course, no static currents besides the inherent CMOS leakage currents.

C. Logic Style Requirements for Ease-of-Use

For ease-of-use and generality of gates, a logic style should be highly robust and have friendly electrical characteristics, that is, decoupling of gate inputs and outputs (i.e., at least one inverter stage per gate) as well as good driving capabilities and full signal swings at the gate outputs, so that logic gates can be cascaded arbitrarily and work reliably in any circuit configuration. These properties are prerequisites for cell-based design and logic synthesis, and they also allow for efficient gate modeling and gate-level simulation. Furthermore, a logic style should allow the efficient implementation of arbitrary logic functions and provide some regularity with respect to circuit and layout realization. Both low-power and high-speed versions of logic cells (e.g., by way of transistor sizing) should be supported in order to allow flexible power-delay tuning by the designer or the synthesis tool.

D. Static Versus Dynamic Logic Styles

A major distinction, also with respect to power dissipation, must be made between static and dynamic logic styles. As opposed to static gates, dynamic gates are clocked and work in two phases, a precharge and an evaluation phase. The logic function is realized in a single NMOS pull-down or PMOS pull-up network, resulting in small input capacitances and fast evaluation times. This makes dynamic logic attractive for high-speed applications. However, the large clock loads and the high signal transition activities due to the precharging mechanism result in an excessively high power dissipation. Also, the usage of dynamic gates is not as straightforward and universal as it is for static gates, and robustness is considerably degraded. With the exception of some very special circuit applications, dynamic logic is no viable candidate for low-power circuit design [1], [8], [9] and was therefore not considered any further in this study.

E. Complementary CMOS Logic Style

Logic gates in conventional or complementary CMOS (also simply referred to as CMOS in the sequel) are built from an NMOS pull-down and a dual PMOS pull-up logic network. In addition, pass gates or transmission gates (i.e., the combination of an NMOS and a PMOS pass-transistor) are often used for implementing multiplexers, XOR-gates, and flip-flops efficiently (CMOS with pass-gates will be denoted as CMOS+). Any logic function can be realized by NMOS pull-down and PMOS pull-up networks connected between the gate output and the power lines. Fig. 1(a) and (b) depicts a two-input multiplexer gate (MUX2) in pure CMOS (using tristate inverters) and CMOS with pass gates, respectively. Simple monotonic gates, such as NAND/NOR and AOI/OAI, can be realized very efficiently with only a few transistors (A↓, P↓),² one signal inversion level (T↓), and a few circuit nodes (P↓). Non-monotonic gates, such as XOR and multiplexer, require more complex circuit realizations but are still quite efficient.

² This notation documents the tendency whether circuit area (A), delay (T), and power (P) are increased (↑) or decreased (↓) by the mentioned property.

Other advantages of the CMOS logic style are its robustness against voltage scaling and transistor sizing (high noise margins) and thus reliable operation at low voltages and arbitrary (even minimal) transistor sizes (ratioless logic). Input signals are connected to transistor gates only, which
facilitates the usage and characterization of logic cells. The layout of CMOS gates is straightforward and efficient due to the complementary transistor pairs. Basically, CMOS fulfills all the requirements regarding the ease-of-use of logic gates.

An often mentioned disadvantage of complementary CMOS is the substantial number of large PMOS transistors, resulting in high input loads (P↑, T↑, A↑). However, the best gate performance is achieved with a PMOS/NMOS width ratio of only about 1.5 [10], and this ratio will decrease even further in deep-submicron technologies, where the carrier drift velocities in NMOS and PMOS transistors become almost equal due to velocity saturation [11]. Another drawback of CMOS is the relatively weak output driving capability due to series transistors in the output stage (T↑). This, however, can be corrected by additional output buffers/inverters which are inherent in other logic styles.

A more restrictive approach was taken for the design of low-power low-voltage cells using CMOS branch-based logic in [4], [6]. Here, the transistor networks consist only of branches (i.e., a series of up to three transistors between power line and gate output), thus disallowing the usage of pass-gates. The advantages of transistor branches are higher layout regularity (i.e., smaller diffusion capacitances) and simpler characterization (i.e., branch instead of gate modeling). Other aspects, such as the design of delay-independent flip-flops, were addressed in order to face the massively increasing effects of process, temperature, voltage, and transistor size variations at very low voltages.

Fig. 1. Two-input multiplexer in (a) CMOS, (b) CMOS with pass gates, (c) DPL, (d) LEAP, (e) CPL, (f) EEPL, (g) SRPL, and (h) PPL logic style.

F. Pass-Transistor Logic Styles

The basic difference of pass-transistor logic compared to the CMOS logic style is that the source side of the logic transistor networks is connected to some input signals instead of the power lines. The advantage is that one pass-transistor network (either NMOS or PMOS) is sufficient to perform the logic operation, which results in a smaller number of transistors and smaller input loads, especially when NMOS networks are used (A↓, T↓, P↓). However, the threshold voltage drop ($V_{out} = V_{dd} - V_{tn}$) through the NMOS transistors while passing a logic "1" makes swing (or level) restoration at the gate outputs necessary in order to avoid static currents at the subsequent output inverters or logic gates. Adjusting the threshold voltages (i.e., $V_{tn} = 0$) as a solution at the process technology level is usually not feasible for other reasons. In order to decouple gate inputs and outputs and to provide acceptable output driving capabilities, inverters are usually attached to the gate outputs (A↑, T↑, P↑). Because the MOS networks are connected to variable gate inputs rather than constant power lines, only one signal path through each network must be active at a time in order to avoid shorts between inputs. Therefore, each pass-transistor network must realize a multiplexer structure, which limits the number of logic functions that can be implemented efficiently.³

³ Note that each logic function can be realized in a multiplexer structure, but often at a lower circuit efficiency.

Because these pass-transistor multiplexer structures require complementary control signals, dual-rail logic is usually used in order to provide all signals in complementary form. As a consequence, two MOS networks are again required in addition to the swing restoration and output buffering circuitry (A↑, T↑, P↑), which all in all annihilates the advantage of low transistor count and small input loads of pass-transistor logic. Also, the required double inter-cell wiring increases wiring complexity and capacitance by a considerable amount (A↑, P↑). A small advantage of dual-rail logic is that inverted signals are for free. Layout of pass-transistor cells is not as straightforward and efficient due to rather irregular transistor arrangements and high wiring requirements. Finally, pass-transistor logic with swing restoration circuitry is sensitive to voltage scaling
[12] and transistor sizing with respect to circuit robustness (reduced noise margins), i.e., efficient or reliable operation of logic gates is not necessarily guaranteed at low voltages or small transistor sizes. In other words, transistor sizing is crucial for correct gate operation and therefore more difficult (ratioed logic). Short-circuit currents are rather large due to competing signals in the swing restoration circuitry.

Many different pass-transistor logic styles have been proposed recently. The most important ones are now briefly summarized.

1) Complementary Pass-Transistor Logic (CPL): A CPL gate [1], [13] consists of two NMOS logic networks (one for each signal rail), two small pull-up PMOS transistors for swing restoration, and two output inverters for the complementary output signals. Fig. 1(e) depicts a two-input multiplexer which represents the basic and minimal CPL gate structure (ten transistors). All two-input functions (e.g., AND, OR, XOR) can be implemented by this basic gate structure, which is relatively expensive for simple monotonic gates such as NAND and NOR. The advantages of the CPL style are the small input loads (P↓, T↓), the efficient XOR and multiplexer gate implementations, the good output driving capability due to the output inverters (T↓), and the fast differential stage due to the cross-coupled PMOS pull-up transistors (T↓). This differential stage, on the other hand, leads to considerably larger short-circuit currents (P↑). Other disadvantages of CPL are the substantial number of nodes and high wiring overhead due to the dual-rail signals (P↑, A↑) and the inefficient realization of simple gates (i.e., high transistor count, two signal inversion levels).

2) Swing Restored Pass-Transistor Logic (SRPL): The SRPL style [14] is derived from CPL. Here, the output inverters are cross-coupled to a latch structure which performs swing restoration and output buffering at the same time [Fig. 1(g)]. Note that the pull-up PMOS transistors are not required anymore and that the output nodes of the NMOS network are also the gate outputs. Because the inverters have to drive the outputs and must also be overridden by the NMOS network, transistor sizing becomes very difficult and results in poor output driving capability (T↑, P↑), slow switching (T↑), and large short-circuit currents (P↑). This becomes even worse when cascading SRPL gates. The resulting series of NMOS networks with competing inverters in between leads to very slow switching and unreliable operation. SRPL gates are highly sensitive to transistor sizing and show acceptable performance only in very special circuit arrangements (e.g., no gates in series, small output loads).

3) Double Pass-Transistor Logic (DPL): In the DPL style [7], [15], [16], both NMOS and PMOS logic networks are used in parallel [Fig. 1(c)]. This provides full swing on the output signals (i.e., no level restoration circuitry is needed), and circuit robustness is therefore high. However, the number of transistors, especially large PMOS transistors, and the number of nodes is quite high (A↑, P↑), leading to substantial capacitive loads (T↑, P↑). The combination of large PMOS transistors and inefficient dual-rail logic makes DPL not competitive compared to other pass-transistor logic styles and to complementary CMOS. Note that DPL can be regarded as a dual-rail pass-gate logic, while CMOS+ is a single-rail pass-gate logic.

4) Single-Rail Pass-Transistor Logic (LEAP): A single-rail pass-transistor logic is proposed in the LEAP logic design scheme [12]. As opposed to the dual-rail logic styles, only single inter-cell wiring and single NMOS networks are required (A↓, T↓, P↓), while the required complementary input signals are generated locally by inverters [Fig. 1(d)]. Swing restoration is realized by a fed-back pull-up PMOS transistor which, however, is slower than the cross-coupled PMOS transistors of CPL working in differential mode. Note also that this swing restoration structure only works if the supply voltage is sufficiently high relative to the threshold voltages, because otherwise the threshold voltage drop through the NMOS network for a logic "1" prevents the NMOS of the inverter and with that the pull-up PMOS from turning on. Therefore, robustness at low voltages is only guaranteed if the threshold voltages are appropriately small. On the other hand, ease-of-use of logic gates and compatibility with conventional cell-based design is partly provided in this logic style. The fact that conventional logic networks can be mapped more efficiently onto simple logic gates than on multiplexers is dealt with in the LEAP system by a new synthesis approach which exploits the full functionality of multiplexer structures [12].

5) Other Pass-Transistor Logic Styles: Some other pass-transistor logic styles have been proposed. The differential pass-transistor logic (DPTL) in [17] is a generalized dual-rail pass-transistor logic structure. It consists of the NMOS pass-transistor networks and a buffer circuit for level restoration, which can be a clocked precharging buffer (dynamic) or a static buffer (e.g., as in CPL). In the energy economized pass-transistor logic (EEPL) of [18], the sources of the PMOS pull-up transistors of a CPL gate are connected to the complementary output signal instead of $V_{dd}$ [Fig. 1(f)]. The reputed advantage of shorter delay and smaller power dissipation compared to CPL, however, could not be confirmed in this work. The push–pull pass-transistor logic (PPL) of [19] can be regarded as a CPL gate without output inverters and with complementary transistors on one signal rail [i.e., PMOS pass-transistors followed by an NMOS pull-down transistor, Fig. 1(h)]. Besides its attractively low transistor count, switching and output driving characteristics are even worse than in SRPL (see Section III), and it does not work at low supply voltages.

G. Qualitative Comparisons

Some basic logic style characteristics which influence circuit performance and power dissipation are qualitatively compared in Table I. In particular, the number of MOS logic networks, the output driving capabilities, the presence of input/output decoupling, the need for swing restoration circuitry, the number of signal rails, and the robustness with respect to voltage scaling and transistor sizing are given for the logic styles discussed.

TABLE I
QUALITATIVE LOGIC STYLE COMPARISONS

III. ANALYSIS OF LOGIC GATES

The efficient implementation of logic gates is a prerequisite for the realization of well-performing combinational circuits. This is especially true for high-speed and low-power applications.

Fig. 2. Circuit arrangement for the simulation of full-adders.
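Section II-F states that every pass-transistor network must realize a multiplexer structure and that all two-input functions can be built from the basic CPL multiplexer gate. As a purely behavioral illustration of that statement, the following Python sketch expresses AND, OR, and XOR as configurations of one ideal 2:1 multiplexer per signal rail. It is a minimal sketch under stated assumptions: the function names (`mux2`, `cpl_gate`, `cpl_and`, `cpl_or`, `cpl_xor`) are hypothetical and not taken from the paper, and no electrical effects (threshold voltage drop, swing restoration, output inverters, delays) are modeled.

```python
from typing import Tuple


def mux2(sel: int, d0: int, d1: int) -> int:
    """Ideal 2:1 multiplexer: pass d1 when sel is 1, otherwise pass d0."""
    return d1 if sel else d0


def cpl_gate(sel: int, d0: int, d1: int) -> Tuple[int, int]:
    """Behavioral dual-rail gate (assumed model): one multiplexer per rail.

    The second network is fed the complemented data inputs, so the pair
    (out, out_bar) is complementary for every input combination.
    """
    out = mux2(sel, d0, d1)
    out_bar = mux2(sel, 1 - d0, 1 - d1)
    return out, out_bar


# Two-input functions as multiplexer configurations (illustrative helpers).
def cpl_and(a: int, b: int) -> Tuple[int, int]:
    # b = 1 passes a; b = 0 passes b itself, which is 0.
    return cpl_gate(b, b, a)


def cpl_or(a: int, b: int) -> Tuple[int, int]:
    # b = 1 passes b itself, which is 1; b = 0 passes a.
    return cpl_gate(b, a, b)


def cpl_xor(a: int, b: int) -> Tuple[int, int]:
    # b = 1 passes the complement of a; b = 0 passes a.
    return cpl_gate(b, a, 1 - a)


if __name__ == "__main__":
    # Print the truth tables as (out, out_bar) pairs.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, cpl_and(a, b), cpl_or(a, b), cpl_xor(a, b))
```

The second element of each returned pair is the inverted function (NAND, NOR, XNOR), which mirrors the remark that inverted signals come for free in dual-rail logic. The sketch also shows why simple monotonic gates are comparatively expensive in this style: even an AND needs the full two-network multiplexer structure.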

A. Results from the Literature

Various investigations of logic styles with respect to low power dissipation have recently been carried out and reported in the literature [1], [12]–[14], [19]–[23]. In all these publications (except [23]), CPL or related pass-transistor logic styles are promoted as low-power logic styles. This is basically explained by the fact that CPL gates count fewer transistors, have smaller transistors and smaller capacitances, and are faster than gates in complementary CMOS.

However, some weak points show up in all these investigations. First, all examinations are based only on full-adder circuits. This comparison, however, is not representative because the critical three-input XOR function of the full-adder required for sum bit calculation is perfectly suited for implementation in pass-transistor logic due to its multiplexer structure. On the other hand, the XOR is the logic function with the least efficient implementation in CMOS. Second, rather inefficient CMOS full-adder implementations counting 40 transistors were used throughout except for [12]. More efficient CMOS realizations with only 28 transistors exist which perform better with respect to circuit size, speed, and power dissipation.

Furthermore, full-adders have only limited importance even in arithmetic circuits. Full-adders or the related 4-2 compressors are the basic cells in adder arrays (i.e., carry-save adders) used in multipliers and similar components like dividers. In such applications, efficient full-adder circuits are crucial since these building blocks are often the critical ones. However, in simpler arithmetic circuits such as adders, incrementers/counters, and comparators, full-adders are hardly used. Most fast adder architectures (e.g., carry-lookahead) do not use entire full-adders since their function is broken up in order to speed up carry-propagation. Moreover, the greater part of typical circuit applications is made up of other (nonarithmetic) combinational functions, which require no full-adders at all.

Finally, the simulation conditions and circuit arrangements are often not clearly specified. One has to assume that idealistic and highly specific rather than realistic and more general setups are used in some cases.

B. Improved Investigations

For a more general characterization of logic styles with respect to low-power circuit implementation and standard-cell library development, the investigations have to be extended to a larger set of gates and therefore must include multiplexers and simple gates as well. Realistic circuit and simulation setups have to be chosen in order to capture worst case behavior, which is crucial in synchronous designs. In particular, gate inputs have to be driven by typical gate outputs rather than by the simulator. Similarly, gate outputs have to drive typical gate inputs, thus simulating realistic fan-outs. Several gates have to be cascaded in order to observe their behavior within multilevel logic circuits. A comprehensive set of input stimuli has to be applied during simulation for sensitization of all critical signal paths.

An additional aspect to be considered within pass-gate and pass-transistor circuits is the fact that input signals may connect to transistor gates and transistor sources at the same time. Since current is drawn from a logic gate input at the transistor source, switching of that respective signal is slowed down (i.e., flat signal ramp). If the same signal is connected to a transistor gate somewhere else, switching of that transistor and of the corresponding logic gate is slowed down as well. For simulation of this effect (referred to as source-gate effect), such worst case input combinations must be included in the circuit arrangement as well.

Fig. 3. Circuit arrangement for the simulation of logic gates.

C. Circuit Arrangement and Simulation Conditions

The first set of comparisons was carried out on various simple and complex logic gates. Circuits were designed at the transistor-level in a standard 0.6-μm CMOS process technology (double-metal, $V_{tn}$ = … V, $V_{tp}$ = … V). Layout was carried out for all compared logic gates and for the CMOS and CPL full-adders. It was done in a standard-cell-like manner using symbolic layout and compaction, which allowed for an efficient exploration of layout topologies for the different logic styles. The circuits were simulated using HSPICE at 3.3 V and 1.5 V, 27 °C, 20 MHz, with the capacitances extracted from the layout. All possible transition combinations at the gate inputs were simulated. Worst case gate delays and average
power dissipation (including power from short-circuit currents) were obtained from simulation. PT-products are calculated as a quality measure for power efficiency, giving the energy consumed by a gate per switching event. Transistors were sized carefully by hand with the objective of balanced gate performance, low PT-products, and, to some extent, uniform and regular transistor sizes. Most circuits are depicted in Figs. 1 and 4 together with their transistor widths.

Fig. 2 illustrates the circuit arrangement for simulation of the full-adders. Inverters equivalent to the full-adder output inverters are placed at the inputs and wiring capacitances of 20 fF attached in order to simulate two full-adders connected in series with a fan-out of one, which is typical for full-adder applications (e.g., adder arrays, Wallace trees, and ripple-carry adders). This simple circuit setup allows application of arbitrary signal transition combinations to the full-adder inputs, as well as consideration of output driving and fan-out characteristics.

Fig. 3 shows the general circuit arrangement used for all other logic gates. Several gates of the same type are connected in series with a fan-out of two and with typical interconnect loads attached (50 fF, corresponds to three typical cell pitches [24]). This setup makes sure that all inputs are driven by typical gate outputs and that all possible gate input combinations are simulated (source-gate effect mentioned above).

Fig. 4. Simulated gates in (a), (c), (e), (f), (h), (l), (p) CMOS, (i), (m) CMOS with pass-gates, and (b), (d), (g), (j), (n), (o) CPL logic style, and (k) Wang's XOR.

D. Comparisons and Results

1) Full-Adders (FA): Four different CMOS full-adder circuits were implemented: the mentioned 28-transistor version [25] [Fig. 4(p)], the often-used 40-transistor version [1], a version using branch-based gates [26], and a pure pass-gate version [25]. Pass-transistor full-adders were realized for CPL [Fig. 4(o)], LEAP, EEPL, and DPL. A comparison based on actual layout and extracted capacitances was done only for the CMOS and CPL full-adders. Their layout is given in Fig. 5. Another set of comparisons comprising all logic styles was done without layout and based on estimated diffusion and wiring capacitances.

The simulation results are given in Table II. The comparisons based on cell layouts basically confirm the better delay and PT-product values of CPL full-adders at 3.3 V due to the efficient three-input XOR pass-transistor implementation, while the power dissipation of CMOS and CPL are comparable. However, CMOS has a shorter carry-in to carry-out
(l) (m) (n)

(o) (p)
Fig. 4. (Continued.) Simulated gates in (a), (c), (e), (f), (h), (l), (p) CMOS, (i), (m) CMOS with pass-gates, and (b), (d), (g), (j), (n), (o) CPL logic
style, and (k) Wang’s XOR.

delay ( ) at 3.3 V as well as overall shorter delays transistor count. Note that in all these circuit implementations,
and comparable PT-products at 1.5 V. Similar results were power and delay can be traded off by a considerable amount
reported recently in [23]. Also, the layout size of the CMOS through transistor sizing, while the PT-products remain fairly
full-adder is considerably smaller due to the smaller number of constant, except for minimum-sized transistors where PT-
transistors and, in particular, due to a higher circuit regularity products become typically larger.
(i.e., complementary transistors are easy to layout) and smaller 2) Logic Gates: Two sets of comparisons on logic gates
number of wires (single-rail). were carried out based on the cells’ layout. The first set
The comparisons without cell layouts show a higher per- includes two-input multiplexers (MUX2) for all different logic
formance advantage of CPL over CMOS full-adders. This styles. The circuits are given in Fig. 1 and the results sum-
again documents the worse layout efficiency of CPL. The 28- marized in Table III. Here, the multiplexer in complementary
transistor CMOS full-adder performs considerably better than CMOS outperforms all other implementations with respect to
the 40-transistor version and the other CMOS implementations circuit delay, power, PT-product, and layout size, despite the
in terms of circuit speed, power dissipation, or both. EEPL relatively high transistor count. It is far more efficient than any
proves to be comparable, but not better than CPL, from which pass-transistor solution, also with respect to layout (Fig. 6).
it is derived. The single-rail pass-transistor logic style used This is remarkable since multiplexers are actually the domain
in LEAP does not work at 1.5 V (i.e., , of pass-transistor logic. CPL is the best performing pass-
as mentioned earlier), and its superiority over CMOS [12] at transistor logic style and, in particular, the fastest one. Again,
higher voltages could not be confirmed. Finally, DPL is not EEPL has worse performance than CPL, and the additional sig-
competitive compared to CMOS and CPL due to the very high nal connections required in EEPL gates are sometimes difficult
to lay out. LEAP is quite power-efficient but rather slow. DPL is comparable to CPL in all respects. Finally, SRPL and PPL suffer from the weak output driving capability and the missing input–output decoupling, resulting in increasingly slow signal ramps through a series of gates and, as a consequence, in high short-circuit currents. This is illustrated by the simulated waveforms of Fig. 7 and confirms the well-known fact that gates without input–output decoupling cannot be connected in series to form arbitrary circuits without inserting buffers every few gates. This, however, makes these logic styles difficult to use, and they hardly yield better circuit performance than logic styles with inherent input–output decoupling in each gate.

TABLE II
FULL-ADDER COMPARISONS

TABLE III
MULTIPLEXER COMPARISONS (ALL LOGIC STYLES)

In the second set of gate investigations, the following logic gates were compared between CMOS and CPL: two-input NAND (NAND2), four-input AND (AND4), three-input and-or-invert/or-and-invert (AOI/OAI), two- and four-input multiplexers (MUX2, MUX4), and two-input XOR [Figs. 4(a)–(n)]. The results are given in Table IV. In most cases, complementary CMOS clearly outperforms CPL with respect to circuit delay, power dissipation, power-delay product, and layout size. This especially holds true for the simple gates (NAND2, AND4, AOI/OAI). The only exceptions are the MUX4 and XOR gates where CPL is faster at 3.3 V. The small layout area of MUX4 in CPL was only achieved by relaxing the cell layout rules (i.e., all input metal-2 wires lead only to one side of the cell). Otherwise, its layout size would have been dominated by the large number of input/output wires and thus have been much larger. CMOS also proves to be less sensitive to voltage scaling than CPL. The delays increase by a smaller amount and the PT-product ratios get better for CMOS when scaling down to 1.5 V. Finally, pure CMOS also performs better than the combination of CMOS and pass-gates (CMOS ), which is one basic advantage of branch-based logic [4]. Also, a reduction of short-circuit currents in CMOS compared to pass-gate logic was reported in [23], when comparing tristate inverter selectors (CMOS) with pass-gate selectors (CMOS ). The two CMOS implementations of AND4 further demonstrate that the decomposition of complex gates into simpler ones often improves performance [4], but not always (see CMOS implementations of full-adder). Complex gate decomposition minimizes the number of series transistors (i.e., simpler gates), an important aspect at low supply voltages, at the cost of additional signal inversion levels (i.e., more gates).

E. Discussion

Among the pass-transistor logic styles, CPL proves to have the best performance values and lowest power-delay products.
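As a rough numerical illustration of how the power-delay (PT) product combines the two measures, the following sketch forms PT = power × delay in relative units. It uses only the relative figures quoted for the 32-b adder in Section IV-B (CMOS about 20% slower than CPL while dissipating less than 1/3 the power). The function name `pt_product`, the normalization of the CPL adder to 1, and the reading of "about 20% slower" as a delay factor of exactly 6/5 are assumptions made for this illustration; the measured values are those of Table V.

```python
from fractions import Fraction


def pt_product(power, delay):
    """Power-delay (PT) product: power multiplied by delay."""
    return power * delay


# Relative figures, normalised to the CPL adder (CPL = 1 for both quantities).
# Assumption: "about 20% slower" is taken as exactly 6/5 of the CPL delay, and
# "less than 1/3 the power" is taken at its upper bound of 1/3.
cpl_pt = pt_product(Fraction(1), Fraction(1))
cmos_pt = pt_product(Fraction(1, 3), Fraction(6, 5))

# Ratio of the CMOS PT-product to the CPL PT-product (an upper bound here,
# because the CMOS power was entered at its upper bound).
pt_ratio = cmos_pt / cpl_pt

print(pt_ratio)                   # 2/5
print(pt_ratio < Fraction(1, 2))  # True
```

Under these assumptions the CMOS PT-product is at most 2/5 of the CPL value, which is consistent with the statement in the abstract that the CMOS adder has a power-delay product of less than half that of the CPL version.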
Only the single-rail style of LEAP is a viable alternative if lower power and compatibility with cell-based design are of concern.

Complementary CMOS, however, proves to be superior to all pass-transistor logic styles in performance for all logic gates, with the exception of the full-adder at higher supply voltages. The advantages of efficient circuit and layout implementation of simple gates, the absence of swing restoration circuitry, and the single-rail logic property are predominant in most circuit applications. CMOS also shows the highest robustness and smallest sensitivity to transistor and voltage scaling, which was also documented in [23].

TABLE IV
LOGIC GATES COMPARISONS (CMOS AND CPL)

Fig. 5. Layout of (a) CMOS and (b) CPL full-adder.

Fig. 6. Layout of (a) CMOS and (b) CPL two-input multiplexer.

IV. ANALYSIS OF ADDERS

Binary adders are good examples for circuit performance comparisons because they include a balanced combination of different logic gates and make up the crucial building blocks in many circuit applications.

A. Adder Architecture and Implementation

Adder architecture investigations carried out on cell-based designs showed the best circuit performance measures for the class of parallel-prefix adders (carry-lookahead adders), with
the one using the parallel-prefix structure by Sklansky [27] resulting in the fastest adder circuit implementations [28], [29]. This seems also to hold true for transistor-level circuits, since the area-efficient but slower Manchester chains as a transistor-level alternative do not fit well into the parallel-prefix adder structure.

TABLE V
32-b ADDER COMPARISONS

Fig. 7. Simulation waveforms for two-input multiplexer in CMOS, CPL, and SRPL logic style (@ 1.5 V).

Fig. 8. Buffered parallel-prefix adder structure.

A 32-b adder was realized in a 0.5-μm CMOS process using the unbounded fan-out parallel-prefix adder structure of Fig. 8. One level of buffers was inserted for driving the nodes with large fan-outs and thus for fan-out decoupling on the critical paths (i.e., speed-up). Since the prefix carry-propagation can be realized using AOI/OAI-gates or multiplexers, the more efficient variant was chosen for each logic style. That is, the CMOS implementation makes use of the efficient AOI/OAI-gates while the CPL solution uses two-input multiplexers. Transistors were sized for high speed. Note that these adder architectures do not contain any full-adder circuits, and that the three-input XOR’s are
split into two two-input EXOR’s, one in the preprocessing and one in the postprocessing stage. The adders were simulated at 2.8 V, 110 °C, and 100 MHz with estimated wiring capacitances (layout topology taken into account). The worst case delay on the critical path as well as average power dissipation on a set of random data was measured.

B. Results and Discussion

Table V gives the comparison results. The CMOS solution is about 20% slower than the CPL version, but has a much smaller transistor count and dissipates less than 1/3 the power. A CPL version with downsized transistors still consumes twice as much power and is slower than CMOS. The CMOS adder has 41% fewer transistors and 29% fewer circuit nodes than the CPL version. The reasons for the greater power dissipation of the CPL adder are basically the larger switched capacitance (more transistors, dual-rail wiring), larger short-circuit currents (differential swing-restoration circuitry), and a higher average switching activity than was observed in the CMOS version. On the other hand, the CMOS adder takes advantage of the efficient implementation of the simple AOI/OAI-gates used for carry-propagation and of the single-rail interconnects. Note that the inaccuracies from wiring estimation can be regarded as considerably smaller than the observed differences in circuit performance.

For comparison, the performance figures of a low-power high-performance 32-b conditional-sum adder implementation using the DPL style are given from the literature [30].

V. CONCLUSIONS

In our investigations, CPL was found to be the most efficient pass-transistor logic style. Complementary CMOS, however, proves to be superior to CPL in all respects with only few exceptions. An interesting alternative is represented by the single-rail pass-transistor logic and the proposed synthesis approach used in LEAP in order to better exploit the multiplexer structure of pass-transistor logic.

The advantages of high functionality with few pass-transistors and of small input capacitances in the CPL style are partially undone by the need for swing restoration circuitry, dual-rail encoding, and the resulting wiring overhead. The investigation results presented show that, for all simple and complex logic gates except the full-adder, and under realistic circuit conditions, complementary static CMOS performs much better than CPL and other pass-transistor logic styles if low power is of concern. CMOS also compares favorably with regard to circuit speed and layout efficiency. Its single-rail property is crucial for saving routing resources, which is an important issue in submicron VLSI. Its robustness against transistor downsizing and voltage scaling allows the efficient power optimization of noncritical signal nets and of entire circuit components. As a matter of fact, circuit robustness is becoming a key aspect in deep-submicron VLSI, where variation ranges of many process and environment parameters will increase massively [24]. This, together with its ease-of-use, makes complementary CMOS the logic style of choice for low-power, low-voltage implementation of arbitrary combinational circuits and for design automation (i.e., low-power synthesis and cell-based design), also, particularly in the future [10]. However, other logic styles, such as CPL, may still be viable candidates for low-power high-speed implementation of dedicated circuit applications like multipliers.

ACKNOWLEDGMENT

The authors would like to thank Dr. H. Kaeslin for his encouragement, valuable suggestions, and careful reviewing. They would also like to thank Dr. N. Felber for his support and the reviewers for their constructive comments. This work was partly done in collaboration with R. Gupta and D. Fisher from the DSP Device Design Group of Rockwell Semiconductor Systems in Newport Beach, CA, USA.

REFERENCES

[1] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Norwell, MA: Kluwer, 1995.
[2] R. Zimmermann and R. Gupta, “Low-power logic styles: CMOS versus CPL,” in Proc. 22nd European Solid-State Circuits Conf., Neuchâtel, Switzerland, Sept. 1996, pp. 112–115.
[3] J. Yuan and C. Svensson, “New single-clock CMOS latches and flipflops with improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69, Jan. 1997.
[4] C. Piguet, J.-M. Masgonty, P. Mosch, C. Arm, and V. von Kaenel, “Low-power low-voltage standard cell libraries,” in Proc. Low Voltage–Low Power Workshop, ESSCIRC’95, Lille, France, Sept. 1995.
[5] R. Rogenmoser, H. Kaeslin, and N. Felber, “The impact of transistor sizing on power efficiency in submicron CMOS circuits,” in Proc. 22nd European Solid-State Circuits Conf., Neuchâtel, Switzerland, Sept. 1996, pp. 124–127.
[6] C. Piguet, J.-M. Masgonty, S. Cserveny, and E. Dijkstra, “Low-power low-voltage digital CMOS cell design,” in Proc. PATMOS’94, Barcelona, Spain, Oct. 1994, pp. 132–139.
[7] N. Ohkubo et al., “A 4.4 ns CMOS 54 × 54-b multiplier using pass-transistor multiplexer,” IEEE J. Solid-State Circuits, vol. 30, pp. 251–257, Mar. 1995.
[8] P. Ng, P. T. Balsara, and D. Steiss, “Performance of CMOS differential circuits,” IEEE J. Solid-State Circuits, vol. 31, pp. 841–846, June 1996.
[9] K. Chu and D. Pulfrey, “A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic,” IEEE J. Solid-State Circuits, vol. 22, pp. 528–532, Aug. 1987.
[10] J. M. Rabaey, Digital Integrated Circuits. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[11] N. Arora, MOSFET Models for VLSI Circuit Simulation. Wien, Austria: Springer-Verlag, 1993.
[12] K. Yano, Y. Sasaki, K. Rikino, and K. Seki, “Top-down pass-transistor logic design,” IEEE J. Solid-State Circuits, vol. 31, pp. 792–803, June 1996.
[13] K. Yano et al., “A 3.8-ns CMOS 16 × 16-b multiplier using complementary pass-transistor logic,” IEEE J. Solid-State Circuits, vol. 25, pp. 388–393, Apr. 1990.
[14] A. Parameswar, H. Hara, and T. Sakurai, “A swing restored pass-transistor logic-based multiply and accumulate circuit for multimedia applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 805–809, June 1996.
[15] M. Suzuki, N. Ohkubo, T. Yamanaka, A. Shimizu, and K. Sasaki, “A 1.5ns 32b CMOS ALU in double pass-transistor logic,” in Proc. 1993 IEEE Int. Solid-State Circuits Conf., Feb. 1993, pp. 90–91.
[16] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems. Norwell, MA: Kluwer, 1995.
[17] J. H. Pasternak and C. A. T. Salama, “Differential pass-transistor logic,” IEEE Circuits & Devices, pp. 23–28, July 1993.
[18] M. Song, G. Kang, S. Kim, and B. Kang, “Design methodology for high speed and low power digital circuits with energy economized pass-transistor logic (EEPL),” in Proc. 22nd European Solid-State Circuits Conf., Neuchâtel, Switzerland, Sept. 1996, pp. 120–123.
[19] W.-H. Paik, H.-J. Ki, and S.-W. Kim, “Push-pull pass-transistor logic family for low-voltage and low-power,” in Proc. 22nd European Solid-State Circuits Conf., Neuchâtel, Switzerland, Sept. 1996, pp. 116–119.
[20] T. Kuroda and T. Sakurai, “Overview of low-power ULSI circuit techniques,” IEICE Trans. Electron., vol. E78-C, pp. 334–344, Apr. 1995.
[21] K. Shimohigashi and K. Seki, “Low-voltage ULSI design,” IEEE J. Solid-State Circuits, vol. 28, pp. 408–413, Apr. 1993.
[22] I. S. Abu-Khater, A. Bellaouar, and M. I. Elmasry, “Circuit techniques for CMOS low-power high-performance multipliers,” IEEE J. Solid-State Circuits, vol. 31, no. 10, pp. 1535–1546, Oct. 1996.
[23] M. Izumikawa et al., “A 0.25-μm CMOS 0.9-V 100-MHz DSP core,” IEEE J. Solid-State Circuits, vol. 32, pp. 52–61, Jan. 1997.
[24] J. D. Meindl, “Gigascale integration: Is the sky the limit?,” IEEE Circuits & Devices, vol. 12, pp. 19–32, Nov. 1996.
[25] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison-Wesley, 1985.
[26] J.-M. Masgonty, C. Arm, and C. Piguet, “Technology- and power-supply-independent cell library,” in Proc. IEEE Custom Integrated Circuits Conf., San Diego, CA, May 1991, pp. 25.5/1–4.
[27] J. Sklansky, “Conditional sum addition logic,” IRE Trans. Electron. Comput., vol. EC-9, pp. 226–231, June 1960.
[28] R. Zimmermann and H. Kaeslin, “Cell-based multilevel carry-increment adders with minimal AT- and PT-products,” submitted to IEEE Trans. VLSI Syst.
[29] R. Zimmermann, “Non-heuristic optimization and synthesis of parallel-prefix adders,” in Proc. Int. Workshop on Logic and Architecture Synthesis, Grenoble, France, Dec. 1996, pp. 123–132.
[30] I. S. Abu-Khater and R. H. Yan, “A 1-V low-power high-performance 32-bit conditional sum adder,” in Proc. 1994 IEEE Symp. Low Power Electron., San Diego, Oct. 1994, pp. 66–67.

Reto Zimmermann received the Dipl. Ing. degree in computer science from the Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, in 1991. He is currently working toward the Ph.D. degree in electrical engineering.
He joined the Integrated Systems Laboratory of ETH in 1991 as Research and Teaching Assistant. He was involved in the implementation of VLSI components for cryptographic and spread-spectrum systems and in the design and synthesis of arithmetic units for cell-based VLSI. His research interests include digital VLSI design and synthesis, high-speed and low-power circuit techniques, computer arithmetic, computer-aided design, and artificial intelligence.

Wolfgang Fichtner (M’79–SM’84–F’90) received the Dipl. Ing. degree in physics and the Ph.D. degree in electrical engineering from the Technical University of Vienna, Austria, in 1974 and 1978, respectively.
From 1975 to 1978, he was an Assistant Professor in the Department of Electrical Engineering, Technical University of Vienna. From 1979 through 1985, he worked at AT&T Bell Laboratories, Murray Hill, NJ. Since 1985, he has been Professor and Head of the Integrated Systems Laboratory at the Swiss Federal Institute of Technology (ETH). In 1993, he founded ISE Integrated Systems Engineering AG, a company in the field of technology CAD.
Dr. Fichtner is a member of the Swiss National Academy of Engineering.
