INTEGRATION, The VLSI Journal: Arkadiy Morgenshtein, Viacheslav Yuzhaninov, Alexey Kovshilovsky, Alexander Fish


Full-Swing Gate Diffusion Input logic: Case-study of low-power CLA adder design


Arkadiy Morgenshtein, Viacheslav Yuzhaninov, Alexey Kovshilovsky, Alexander Fish*
Faculty of Engineering, Bar-Ilan University, Ramat-Gan 52900, Israel
Article info
Article history:
Received 15 July 2012
Received in revised form
8 February 2013
Accepted 24 April 2013
Keywords:
Alternative logic family
Carry Look Ahead (CLA) adder
Full-Swing GDI
Gate Diffusion Input (GDI)
Low power
Abstract
Full-Swing Gate Diffusion Input (FS-GDI) methodology is presented. The proposed methodology is
applied to a 40 nm Carry Look Ahead (CLA) adder. The CLA is implemented mainly using GDI full-swing
F1 and F2 gates, which are the counterparts of standard CMOS NAND and NOR gates. A 16-bit GDI CLA
was designed in a 40 nm low power TSMC process. The CLA, implemented according to the proposed
methodology, presents full functionality and robustness under global and local process variations over
a wide range of supply voltages. Simulation results show 2× area reduction, 5× improvement in dynamic
energy dissipation and 4× decrease in leakage, with a slight (24%) degradation in performance, when
compared to the CMOS CLA. Advanced design metrics of GDI cells, such as minimum energy point (MEP)
operation and minimum leakage vector (MLV), are discussed.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
Power consumption and area reduction of logic and memory
have become primary focuses of attention in VLSI digital design
[1–6]. Power is the limiting factor in both high performance
systems and portable applications. Die area directly affects the
device size and cost. Since the introduction of standard CMOS
logic in the early 1980s, many design solutions have been proposed
to improve power dissipation, area and performance of digital
VLSI chips.
Gate Diffusion Input (GDI) design methodology was introduced
as a promising alternative to Static CMOS Logic [7]. Originally
proposed for fabrication in Silicon on Insulator (SOI) and twin-well
CMOS processes, the GDI methodology allowed implementation of a
wide range of complex logic functions using only two transistors
[7]. It was shown that the area and dynamic power of GDI combinatorial
and sequential logic were significantly reduced, as compared
to standard CMOS implementations. Similarly to existing alternatives
to CMOS, such as Pass Transistor Logic (PTL), the GDI gates
presented reduced voltage swing at their outputs due to threshold
drops. These drops usually cause degradation in performance and
increased short circuit power [8]. However, since the GDI circuits
were implemented with far fewer transistors, a significant
overall power reduction was observed, while maintaining a minimal
performance penalty.
Recently, it was shown that any GDI circuit can be implemented
in a standard CMOS process [8]. The efficiency of the GDI method
for both combinatorial and sequential logic was shown by many
groups [8–15]. Various combinatorial circuits, such as adders,
multipliers, comparators, and counters, were implemented in
processes from 0.8 µm down to 65 nm. GDI Flip-Flops were also
presented, showing improvements in both area and power
compared to existing Flip-Flop styles.
In this paper we present an efficient methodology for digital
circuit implementation. The proposed methodology was applied
to a 16-bit adder in a low power standard 40 nm TSMC process. The CLA
adder architecture, which was originally proposed as an alternative
for the speed enhancement of a simple ripple adder, was
chosen as a benchmark circuit in this work. The proposed CLA
implementation utilizes improved full-swing GDI F1 and F2 gates,
which are the counterparts of standard CMOS NAND and NOR
gates. The CLA design is compared with our previously shown GDI
methodology [8], which utilizes swing-restoring buffers with
selective application of high-Vth transistors, as well as with a
standard CMOS implementation. Simulation results show CLA
functionality and robustness under global and local process variations.
The CLA presents 2× area reduction and 4–5× power
reduction, compared to the conventional CMOS implementation.
The contributions of this paper are as follows: (1) The GDI full
swing methodology for a standard nanoscaled CMOS technology is
presented; (2) Leakage reduction in GDI is discussed through
minimum leakage vector (MLV) analysis; (3) GDI robustness is
evaluated through statistical Monte Carlo simulations; and (4) Low
voltage operation of the GDI cells, including minimum energy
point (MEP) operation, is shown.
INTEGRATION, the VLSI journal
0167-9260/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.vlsi.2013.04.002
* Corresponding author. Tel.: +972 54 8044144; fax: +972 3 7384051.
E-mail addresses: [email protected], [email protected] (A. Fish).
Please cite this article as: A. Morgenshtein, et al., Full-Swing Gate Diffusion Input logic: Case-study of low-power CLA adder design,
INTEGRATION, the VLSI journal (2013), http://dx.doi.org/10.1016/j.vlsi.2013.04.002
The paper is organized as follows: Section 2 overviews the
GDI methodology and presents its benefits and limitations. The
proposed CLA implementation is discussed in Section 3. Section 4
presents simulation results of the proposed GDI CLA in 40 nm
standard CMOS process, comparing them to the CMOS CLA.
Section 5 concludes the paper.
2. Overview of GDI
The basic GDI cell is shown in Fig. 1. At first glance, the GDI
cell, which consists of only two transistors, resembles the conventional
CMOS inverter. However, in contrast to the inverter, it
has three inputs: G (common gate input of both the nMOS
and the pMOS), P (input to the source/drain of the pMOS), and N
(input to the source/drain of the nMOS).
It was shown that multiple Boolean functions can be implemented
by a simple GDI cell, as demonstrated in Table 1. This is
achieved by changing the input configuration of the GDI cell.
While implementation of most of these functions is relatively
complex (6–12 transistors) in Static CMOS, it is very efficient (only
2 transistors) with the GDI cells. The Multiplexer (MUX) is the
most complex function that can be implemented with a basic GDI
cell, and it is also the function that is most efficient compared to its CMOS
implementation.
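The input configurations of Table 1 can be checked with a small behavioral model. The sketch below is only an idealized logic-level illustration, assuming full-swing outputs: the pMOS passes P when G = 0 and the nMOS passes N when G = 1. The names `gdi_cell`, `CONFIGS` and `check_table` are hypothetical helpers and are not part of the original work.

```python
from itertools import product

def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: the pMOS passes P when G=0, the nMOS passes N when G=1."""
    return n if g else p

# For each function: how (A, B, C) are wired to (N, P, G), and the expected output.
CONFIGS = {
    "F1":  (lambda a, b, c: (0, b, a), lambda a, b, c: (1 - a) & b),
    "F2":  (lambda a, b, c: (b, 1, a), lambda a, b, c: (1 - a) | b),
    "OR":  (lambda a, b, c: (1, b, a), lambda a, b, c: a | b),
    "AND": (lambda a, b, c: (b, 0, a), lambda a, b, c: a & b),
    "MUX": (lambda a, b, c: (c, b, a), lambda a, b, c: ((1 - a) & b) | (a & c)),
    "NOT": (lambda a, b, c: (0, 1, a), lambda a, b, c: 1 - a),
}

def check_table() -> dict:
    """Exhaustively compare every configuration with its expected Boolean function."""
    results = {}
    for name, (wire, expected) in CONFIGS.items():
        ok = True
        for a, b, c in product((0, 1), repeat=3):
            n, p, g = wire(a, b, c)
            ok = ok and gdi_cell(g, p, n) == expected(a, b, c)
        results[name] = ok
    return results

if __name__ == "__main__":
    for name, ok in check_table().items():
        print(f"{name}: {'matches' if ok else 'MISMATCH'}")
```

The model ignores threshold drops, so it only confirms the logic function of each configuration, not the electrical output level.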
GDI gates may suffer from threshold voltage drops which
reduce current drive and therefore affect the performance of the
gate. These drops also increase direct-path static power dissipation
in the cascaded inverters used for swing restoration. It was shown
that these effects can be significantly reduced by using swing-restoration
buffers with a multiple VTH approach [7], herein
named MVT. This approach suggests using low threshold transistors
in all paths where a voltage drop is expected. This way,
the voltage drop at the output will be minimal. In addition, all
regenerative inverters are implemented using high threshold
transistors. This combination allows minimization of the direct-path
static power in the inverters.
Most of today's static digital designs are based on CMOS NAND
and NOR gates. The reasons for this are known and well explored.
Both NAND and NOR gates are implemented using only four
transistors and each one of these functions is a universal set. The
GDI method, which is very efficient for implementation of various
gates, such as MUX, AND, OR (see Table 1), requires a similar number of
transistors for NAND/NOR gates as the standard CMOS
methodology. However, the GDI technology provides alternative
basic functions, F1 and F2. Consisting of only two transistors (one
GDI cell), each one of these functions represents a universal set.
Moreover, it was shown in [7] that F1 and F2 functions can be used
to synthesize other functions more efficiently than the NAND
and NOR gates. Fig. 2 compares the number of different
functions that can be implemented using the same number of F1
and NAND gates. The strength of the F1 and F2 gates will also be
demonstrated by implementation of a CLA using mainly these GDI
gates (see Section 3).
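The universality claim can be illustrated at the logic level. The following sketch assumes the F1 and F2 definitions of Table 1 and that the constants 0 and 1 are available as diffusion inputs (as they are in the NOT configuration of Table 1). The composition shown is one possible construction chosen for illustration, not necessarily the one used in [7].

```python
from itertools import product

def f1(a: int, b: int) -> int:
    """F1(a, b) = NOT(a) AND b."""
    return (1 - a) & b

def f2(a: int, b: int) -> int:
    """F2(a, b) = NOT(a) OR b."""
    return (1 - a) | b

# Building NOT, AND and NAND from F1 only (constant 1 tied to the diffusion input).
def not_f1(a): return f1(a, 1)
def and_f1(a, b): return f1(not_f1(a), b)
def nand_f1(a, b): return not_f1(and_f1(a, b))

# Building NOT, OR and NOR from F2 only (constant 0 tied to the diffusion input).
def not_f2(a): return f2(a, 0)
def or_f2(a, b): return f2(not_f2(a), b)
def nor_f2(a, b): return not_f2(or_f2(a, b))

def universal() -> bool:
    """True if the F1-only NAND and the F2-only NOR match their truth tables."""
    return all(
        nand_f1(a, b) == 1 - (a & b) and nor_f2(a, b) == 1 - (a | b)
        for a, b in product((0, 1), repeat=2)
    )

if __name__ == "__main__":
    print("F1 and F2 each realize a universal gate:", universal())
```

Since NAND and NOR are each universal, realizing them from F1 alone or F2 alone is sufficient to show that each function forms a universal set under these assumptions.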
3. GDI CLA adder design
3.1. Full-swing GDI cells
In this paper we propose full-swing (FS) GDI cells. The
proposed technique utilizes a single swing restoration (SR) transistor
to improve the output swing of F1 and F2 GDI gates. Fig. 3
shows the structure of full swing F1 and F2 cells. As can be seen,
the SR transistor is activated only in cases when the VTH drop may
occur at the output. Since in F1 and F2 gates the output VTH drop
can occur only at one of the logical levels (VTH instead of 0 V in F1,
and VDD − VTH instead of VDD in F2), only a single SR transistor is
required to ensure full swing operation.
In cases where the gate input signal of a GDI cell has an inverted
representation in the circuit, it can be used to control the swing
restoring transistor. This transistor will have a diffusion input similar
to the diffusion input of GDI, but will be of the opposite type (nMOS
for F1, and pMOS for F2). In this manner, the diffusion input signal
will pass through a pair of transistors of both types (the transistor
of the original GDI cell, and the complementary SR transistor). The
FS GDI cells are an efficient alternative to swing restoration buffers
in designs where inverted signals can be obtained as part of the logic
function implementation. The CLA adder presented in the next subsection
is a good example of such a design. An example of a logic chain
in the CLA, containing full-swing GDI cells, is shown in Fig. 4.
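The effect of the SR transistor on the output levels can be sketched with a first-order pass-transistor model. This is only an illustration under simplifying assumptions: a single threshold value for both device types, no body effect, and DC levels only. The value VTH = 0.3 V is hypothetical, while VDD = 1.1 V is the nominal supply used later in the paper. All function names are illustrative.

```python
def pass_level(device: str, v_in: float, vdd: float, vth: float) -> float:
    """First-order level passed by one fully turned-on pass transistor."""
    if device == "nmos":   # gate at VDD: cannot pass more than VDD - VTH
        return min(v_in, vdd - vth)
    if device == "pmos":   # gate at 0 V: cannot pass less than VTH
        return max(v_in, vth)
    raise ValueError(f"unknown device: {device}")

def f1_low_level(vdd: float, vth: float, swing_restore: bool) -> float:
    """Logic 0 at the F1 output: passed by the pMOS, optionally helped by an nMOS SR."""
    level = pass_level("pmos", 0.0, vdd, vth)
    if swing_restore:      # parallel nMOS pulls the node fully to 0 V
        level = min(level, pass_level("nmos", 0.0, vdd, vth))
    return level

def f2_high_level(vdd: float, vth: float, swing_restore: bool) -> float:
    """Logic 1 at the F2 output: passed by the nMOS, optionally helped by a pMOS SR."""
    level = pass_level("nmos", vdd, vdd, vth)
    if swing_restore:      # parallel pMOS pulls the node fully to VDD
        level = max(level, pass_level("pmos", vdd, vdd, vth))
    return level

if __name__ == "__main__":
    VDD, VTH = 1.1, 0.3    # VTH is an assumed, illustrative value
    for sr in (False, True):
        print(f"SR={sr}: F1 low = {f1_low_level(VDD, VTH, sr):.2f} V, "
              f"F2 high = {f2_high_level(VDD, VTH, sr):.2f} V")
```

In this model the degraded level appears only when the diffusion signal passes through a single transistor of the "wrong" type, which is why one complementary SR transistor per gate is enough.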
3.2. CLA implementation
Two GDI versions of a 16-bit CLA adder were implemented in
this work: one using swing restoration buffers with multiple Vth
(MVT GDI), and the second one with FS GDI gates.
The same conventional CLA architecture, shown in Fig. 5, was
used to implement all the CLA versions. The circuit-level implementation
of the MVT and FS GDI adders was different, in order to
address the specific properties of each technique. The implementation
was based mostly on F1 and F2 cells. The detailed GDI
implementation of the various CLA blocks is given in (1)–(4) and
is depicted in Table 2, assuming the logical functions
$F_1(a, b) = \bar{a}b$ and $F_2(a, b) = \bar{a} + b$.
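The pg unit outputs were implemented as follows:
$$p = a \oplus b \quad \text{(GDI XOR gate)}$$
$$g = ab = \begin{cases} \overline{F_2(a, \bar{b})}, & \text{MVT GDI} \\ F_1(\bar{b}, a), & \text{FS GDI} \end{cases} \qquad (1)$$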
Fig. 1. Basic GDI cell.
Table 1
Boolean function synthesis through input configuration of a simple GDI cell.
N    P    G    Out                 Function
0    B    A    $\bar{A}B$          F1
B    1    A    $\bar{A} + B$       F2
1    B    A    $A + B$             OR
B    0    A    $AB$                AND
C    B    A    $\bar{A}B + AC$     MUX
0    1    A    $\bar{A}$           NOT
Table 2
The transistor-level design of CLA blocks. Columns: Unit, MVT GDI, Full-Swing (FS) GDI. Rows: pg, PG, LCG, CLA.
where a similar GDI XOR gate was used in both versions, as will be
explained below.
The PG unit implementation is the following:
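$$P = p_1 p_0 = F_1(\bar{p}_1, p_0)$$
$$G = g_1 + g_0 p_1 = \begin{cases} \overline{F_1(g_1, F_2(g_0, \bar{p}_1))}, & \text{MVT GDI} \\ F_2(\overline{F_1(\bar{p}_1, g_0)}, g_1), & \text{FS GDI} \end{cases} \qquad (2)$$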
where output P has similar implementation in both the versions.
The local and global carry generator blocks are implemented
according to:
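$$C_{loc} = g_0 + p_0 C_{in} = \begin{cases} \overline{F_1(g_0, F_2(C_{in}, \bar{p}_0))}, & \text{MVT GDI} \\ F_2(\overline{F_1(\bar{p}_0, C_{in})}, g_0), & \text{FS GDI} \end{cases} \qquad (3)$$
$$\begin{aligned} C_{out} &= G_1 + P_1 G_0 + P_1 P_0 C_{in} \\ &= F_2(\overline{P_1 G_0 + G_1},\; P_1 P_0 C_{in}) \\ &= F_2(\overline{F_2(\overline{P_1 G_0}, G_1)},\; F_1(\overline{P_1 P_0}, C_{in})) \\ &= F_2(\overline{F_2(\overline{F_1(\bar{P}_1, G_0)}, G_1)},\; F_1(\overline{F_1(\bar{P}_1, P_0)}, C_{in})) \end{aligned} \qquad (4)$$
The signals $P_1, G_1$ and $P_0, G_0$ in (4) represent the outputs from the
top hierarchy level CLA blocks.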
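The block equations can be checked exhaustively at the logic level. The sketch below assumes $F_1(a, b) = \bar{a}b$ and $F_2(a, b) = \bar{a} + b$, as stated above. The exact placement of the complements in the nested compositions is an assumption, chosen so that each composition reproduces its target expression ($ab$, $p_1 p_0$, $g_1 + g_0 p_1$, $g_0 + p_0 C_{in}$ and $G_1 + P_1 G_0 + P_1 P_0 C_{in}$). The helper names are hypothetical.

```python
from itertools import product

def f1(a, b): return (1 - a) & b      # F1(a, b) = NOT(a) AND b
def f2(a, b): return (1 - a) | b      # F2(a, b) = NOT(a) OR b
def inv(a): return 1 - a              # complemented signal or inverting buffer

# pg unit: g = a AND b
def g_mvt(a, b): return inv(f2(a, inv(b)))
def g_fs(a, b): return f1(inv(b), a)

# PG unit: P = p1 AND p0, G = g1 OR (g0 AND p1)
def big_p(p1, p0): return f1(inv(p1), p0)
def big_g_mvt(g1, g0, p1): return inv(f1(g1, f2(g0, inv(p1))))
def big_g_fs(g1, g0, p1): return f2(inv(f1(inv(p1), g0)), g1)

# Local carry: C_loc = g0 OR (p0 AND c_in)
def c_loc_mvt(g0, p0, c_in): return inv(f1(g0, f2(c_in, inv(p0))))
def c_loc_fs(g0, p0, c_in): return f2(inv(f1(inv(p0), c_in)), g0)

# Global carry: C_out = G1 OR (P1 AND G0) OR (P1 AND P0 AND c_in)
def c_out(G1, P1, G0, P0, c_in):
    left = f2(inv(f1(inv(P1), G0)), G1)            # P1*G0 + G1
    right = f1(inv(f1(inv(P1), P0)), c_in)         # P1*P0*c_in
    return f2(inv(left), right)

def verify_all() -> bool:
    """Compare every composition with its target expression for all input vectors."""
    for a, b, c, d, e in product((0, 1), repeat=5):
        checks = [
            g_mvt(a, b) == (a & b), g_fs(a, b) == (a & b),
            big_p(a, b) == (a & b),
            big_g_mvt(a, b, c) == (a | (b & c)),
            big_g_fs(a, b, c) == (a | (b & c)),
            c_loc_mvt(a, b, c) == (a | (b & c)),
            c_loc_fs(a, b, c) == (a | (b & c)),
            c_out(a, b, c, d, e) == (a | (b & c) | (b & d & e)),
        ]
        if not all(checks):
            return False
    return True

if __name__ == "__main__":
    print("All compositions match their target expressions:", verify_all())
```

This confirms logical equivalence only. It says nothing about which signals are physically available in complemented form in the actual circuit, which is what determines the transistor counts of Table 2.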
It should be noted that in the MVT implementation, the
application of high-Vth (HVT) transistors in swing restoration
buffers is selective and depends on the Vth drop that may occur
Table 2 (continued)
Columns: Unit, MVT GDI, Full-Swing (FS) GDI. Row: Last CLA.
Symbol legend: inverter with HVT nMOS transistor; inverter with HVT pMOS transistor; inverter with both HVT transistors.
Note: F1* and F2* stand for GDI F1 and F2 full swing gates.
Fig. 2. Number of various functions that can be implemented using the same number of F1 and NAND cells (after [7]). Plot title: CMOS NAND cell vs. GDI F1 cell; horizontal axis: Number of Cells; vertical axis: Number of Functions; series: CMOS, GDI.
Fig. 3. Scheme of the FS GDI gates.
in the path between two buffers. In case that the Vth drop at buffer
input may occur only at high (low) voltage, then an asymmetric
buffer will be used with high-Vth nMOS (pMOS) transistor. All
other transistors in the buffers will remain low-Vth (LVT) to
maintain the performance. In all the GDI cells, the transistor that
may cause a Vth drop, will be LVT in order to minimize the Vth
drop. Other GDI transistors will be standard-Vth (SVT).
The FS GDI implementation is based on the F1 and F2 cells, as
described in Fig. 3. The implementation did not require addition of
inverters for driving the SR transistors. All the inverted signals that
were used in SR transistors appeared inherently in the functional
implementation. In the MUX cell implementation, a pair of complementary
SR transistors was used at the output, making the cell
similar to a PTL MUX.
In both versions the XOR implementation was optimized.
The basic implementation of a GDI XOR gate, as was proposed
in [1], consists of 4 transistors comprising a GDI cell used as a
MUX, and an inverter. However, in complex circuits, the diffusion
input to the GDI XOR may already have an inverted representation
elsewhere in the circuit. Thus, instead of implementing an inverter
again as part of the GDI XOR, we can use both signals as diffusion
inputs to the GDI MUX while maintaining the same functionality.
This allows a significant decrease in the number of transistors in
circuits with a high number of XOR/XNOR gates. An example of this
optimization in the GDI CLA adder can be seen in the implementation of
the p function in Table 2.
Both versions of the GDI implementation were compared with a
standard CMOS design of the CLA adder. In order to maintain an optimal
design, the XOR circuits in CMOS were implemented using the
PTL technique. Table 3 summarizes the transistor count and area
estimation of both GDI designs vs. the CMOS counterpart. Note that all
CMOS gates were implemented using minimum sized transistors
with a standard ratio of 2 between the pull-up and pull-down networks.
GDI F1, F2 and XOR gates were sized similarly to a CMOS
inverter (2×). Supplementary SR transistors were a minimum
sized nMOS and a double sized pMOS. The area estimation is
normalized with respect to Wmin × Lmin.
It can be clearly seen that both GDI implementations have a
significant advantage in terms of transistor count and area,
as compared to CMOS. While the FS GDI implementation provides
full swing and improved performance (as will be shown in the
following sections), it implies about 40% area increase as compared
to MVT GDI. Still, it occupies only half of the area of the
CMOS design.
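The area figures quoted above follow directly from the totals in Table 3. The short sketch below recomputes them. The dictionary simply copies the "Total" column of Table 3, and the helper names are illustrative.

```python
# Totals from Table 3: (transistor count, area in units of Wmin x Lmin).
TABLE3_TOTALS = {
    "CMOS":    (934, 2039),
    "MVT GDI": (438, 657),
    "FS GDI":  (627, 925),
}

def area_ratio(design: str, reference: str) -> float:
    """Area of `design` divided by area of `reference`."""
    return TABLE3_TOTALS[design][1] / TABLE3_TOTALS[reference][1]

def transistor_ratio(design: str, reference: str) -> float:
    """Transistor count of `design` divided by that of `reference`."""
    return TABLE3_TOTALS[design][0] / TABLE3_TOTALS[reference][0]

if __name__ == "__main__":
    print(f"FS GDI area vs. MVT GDI: {area_ratio('FS GDI', 'MVT GDI'):.2f}x")
    print(f"FS GDI area vs. CMOS:    {area_ratio('FS GDI', 'CMOS'):.2f}x")
    print(f"FS GDI transistors vs. CMOS: {transistor_ratio('FS GDI', 'CMOS'):.2f}x")
```

The ratios are consistent with the roughly 40% area overhead of FS GDI over MVT GDI and with FS GDI occupying a little under half the area of the CMOS design.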
3.3. Leakage elimination in GDI
As was shown in [7], the unique structure of the GDI cell
provides significant reduction of both the sub-threshold and the
gate leakage components, as compared to a static CMOS gate.
Fig. 5. The general CLA architecture.
Table 3
Transistor count and area estimation comparison between GDI and Static CMOS designs.
Each entry gives the transistor count, with the area in units of Wmin × Lmin in parentheses.
Design     pg        PG        LCG       CLA       Last CLA   Total
CMOS       18 (41)   18 (35)   12 (24)   30 (59)   34 (77)    934 (2039)
MVT GDI    8 (12)    8 (12)    8 (12)    16 (24)   24 (36)    438 (657)
FS GDI     11 (16)   11 (16)   10 (15)   21 (31)   31 (46)    627 (925)
Fig. 4. Example logic chain in CLA, containing full-swing GDI cells.
Since the sub-threshold leakage is still dominant, it is addressed
here in more detail.
A general GDI cell eliminates the sub-threshold leakage in half
of all possible states. This is contrary to static CMOS gates, where
the pull-up and the pull-down networks are always connected to
the supply voltage and ground, respectively.
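The "half of all possible states" statement can be reproduced with an idealized enumeration. The sketch assumes a full-swing output and treats sub-threshold leakage as zero whenever the turned-off transistor sees the same logic level on both of its diffusion nodes. Gate leakage and threshold drops are ignored, and the function names are illustrative.

```python
from itertools import product

def off_transistor_vds(g: int, p: int, n: int) -> int:
    """Logic-level difference across the transistor that is turned off."""
    out = n if g else p                  # ideal full-swing output of the GDI cell
    # G=1: the pMOS (between P and out) is off; G=0: the nMOS (between out and N) is off.
    return abs(p - out) if g else abs(out - n)

def zero_leakage_states() -> list:
    """All (G, P, N) states in which the off transistor has no drain-source potential."""
    return [
        (g, p, n)
        for g, p, n in product((0, 1), repeat=3)
        if off_transistor_vds(g, p, n) == 0
    ]

if __name__ == "__main__":
    states = zero_leakage_states()
    print(f"{len(states)} of 8 states have zero drain-source potential: {states}")
```

Under these assumptions the zero-leakage states are exactly those with P = N, which is four of the eight input combinations.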
Here we demonstrate this advantage by analyzing the minimum
leakage vector (MLV) of a basic two-bit CLA module, consisting of
PG and LCG blocks. The following input vector provides the minimal
leakage in the two-bit FS-GDI CLA:
$$\vec{v}_{in} = (g_0, g_1, p_0, p_1, C_{in}) = (1, 1, 0, 1, 0)$$
As can be seen in Fig. 6, when the input vector is applied to the LCG
block, four transistors are turned off (M1, M3, M4 and the inverter
nMOS). However, as the inputs Cin and g0 are connected to the
diffusions of turned-off transistors, the potential at both diffusion
nodes of these transistors is the same. This leads to elimination of
sub-threshold leakage in transistors M1, M3 and M4.
A similar effect is observed when the input vector $\vec{v}_{in}$ is applied
to the PG block, as shown in Fig. 7. As can be seen, due to the
diffusion input connections, three out of five turned-off transistors
have zero potential between the diffusion nodes. In this case
transistors M2, M8 and M9 exhibit zero sub-threshold leakage.
4. Comparative simulation results
The proposed 16-bit CLA circuits were designed in 40 nm TSMC
process with supply voltage varying from deep sub-threshold
values of 100 mV up to nominal values of 1.1 V. The designs were
simulated using SPICE based Virtuoso simulator. Both GDI designs
were compared to the CMOS counterpart in terms of performance,
power consumption, area estimation and sensitivity to process
variations.
4.1. Nominal voltage operation
The comparative results of performance, static power, energy
per operation and energy-delay product (EDP) in CMOS and GDI
implementations are presented in Table 4.
Delay: As can be seen, the CMOS design has the shortest delay
among all implementations. The FS GDI implementation shows a
24% delay increase as compared to CMOS. The MVT GDI is about
three times slower than CMOS due to voltage drops. The performance
improvement in FS GDI is achieved due to the better driving
capabilities of the modified F1 and F2 gates.
Static power consumption: According to the results presented
in Table 4, the static power of both GDI designs is significantly
lower than in the CMOS design. One of the reasons for leakage
reduction in GDI is the reduced transistor count. In addition, as
shown in the previous section, GDI benefits from an inherent
sub-threshold leakage reduction in half of the input vectors, which lead to
a zero potential between the diffusion inputs of the GDI cell [7]. The FS
GDI implementation presents a reduction in static power consumption
as compared to MVT GDI. This is achieved by maintaining full
swing at the output nodes of all GDI cells, and therefore eliminating
the direct-path currents.
Dynamic Energy and EDP: Table 4 shows a 5–6× reduction of
dynamic energy consumption per operation in both GDI circuits
as compared to CMOS. The main reason for this is the reduced
switching capacitance in GDI. Note that although CMOS presents
an advantage over GDI in terms of delay, it can be clearly seen
that the EDP metric of both GDI designs is better.
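The ratios discussed above can be recomputed from the values reported in Table 4. The sketch below only divides the tabulated numbers. The dictionary keys and helper names are illustrative.

```python
# Values copied from Table 4 (nominal voltage operation).
TABLE4 = {
    "CMOS":    {"t_pd_ps": 134, "p_stat_nw": 78.7, "e_dyn_fj": 123.0, "edp": 16.5},
    "MVT GDI": {"t_pd_ps": 422, "p_stat_nw": 35.9, "e_dyn_fj": 23.5,  "edp": 8.2},
    "FS GDI":  {"t_pd_ps": 167, "p_stat_nw": 18.4, "e_dyn_fj": 19.4,  "edp": 3.9},
}

def improvement(metric: str, design: str) -> float:
    """How many times smaller the metric is in `design` than in CMOS."""
    return TABLE4["CMOS"][metric] / TABLE4[design][metric]

def delay_penalty(design: str) -> float:
    """Relative delay increase of `design` over CMOS (0.25 means 25% slower)."""
    return TABLE4[design]["t_pd_ps"] / TABLE4["CMOS"]["t_pd_ps"] - 1.0

if __name__ == "__main__":
    for design in ("MVT GDI", "FS GDI"):
        print(f"{design}: delay +{delay_penalty(design) * 100:.0f}%, "
              f"static power {improvement('p_stat_nw', design):.1f}x lower, "
              f"dynamic energy {improvement('e_dyn_fj', design):.1f}x lower, "
              f"EDP {improvement('edp', design):.1f}x lower")
```

Presenting every metric as a ratio to CMOS makes the trade-off explicit: a moderate delay penalty for FS GDI against multi-fold savings in static power and dynamic energy.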
Fig. 6. LCG unit with minimum leakage vector.
Fig. 7. PG unit with minimum leakage vector.
Fig. 8. Delay distribution of CMOS design derived by Monte Carlo simulation.
Table 4
Comparison of performance, static and dynamic power in GDI and CMOS CLA
implementations.
Design     tpd [psec]   PStat [nW]   EDyn [fJ]   EDP [J·sec × 10^-24]
CMOS       134          78.7         123         16.5
MVT GDI    422          35.9         23.5        8.2
FS GDI     167          18.4         19.4        3.9
Sensitivity to Process Variations: In order to evaluate the sensitivity
of the designs to local and global process variations, Monte
Carlo simulations have been carried out. Figs. 8–10 present the
delay distribution of CMOS, MVT GDI and FS GDI, respectively.
As expected, FS GDI presents much better immunity to process
variations than MVT GDI, while showing a μ/σ ratio that is very close
to CMOS (only 12% degradation). The MVT GDI adder is much
more sensitive (4× degradation as compared to CMOS), because
of the driving current dependence on the process-sensitive Vth, which is
amplified due to voltage drops at internal nodes.
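The trend described above, in which a reduced internal swing amplifies the effect of Vth variation on delay, can be illustrated with a toy Monte Carlo experiment. This is not the SPICE-based analysis used in the paper. It assumes an alpha-power-law style delay expression, a Gaussian Vth distribution, and hypothetical parameter values (Vth mean 0.4 V, sigma 30 mV, alpha 1.3, and a 0.3 V swing loss).

```python
import random
import statistics

def delay(vdd: float, v_gate: float, vth: float, alpha: float = 1.3) -> float:
    """Alpha-power-law style delay estimate in arbitrary units (assumed model)."""
    return vdd / (v_gate - vth) ** alpha

def mu_over_sigma(v_gate: float, vdd: float = 1.1, vth_mean: float = 0.4,
                  vth_sigma: float = 0.03, runs: int = 5000, seed: int = 1) -> float:
    """Mean-to-standard-deviation ratio of the delay under random Vth variation."""
    rng = random.Random(seed)            # fixed seed keeps the sketch repeatable
    samples = [delay(vdd, v_gate, rng.gauss(vth_mean, vth_sigma))
               for _ in range(runs)]
    return statistics.mean(samples) / statistics.stdev(samples)

if __name__ == "__main__":
    full_swing = mu_over_sigma(v_gate=1.1)           # gate driven to VDD
    reduced_swing = mu_over_sigma(v_gate=1.1 - 0.3)  # gate driven to VDD minus a drop
    print(f"mu/sigma with full swing:    {full_swing:.1f}")
    print(f"mu/sigma with reduced swing: {reduced_swing:.1f}")
```

A lower gate overdrive makes the same Vth spread a larger fraction of (Vgate − Vth), so the delay distribution widens relative to its mean and the μ/σ ratio drops.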
4.2. Low voltage operation
Driven by demand for ultra-low power dissipation, low voltage
operation of digital circuits has recently attracted extensive research
efforts [16–23]. It was shown that the minimum energy operation
point (MEP) is usually achieved in the sub-threshold region. Here
we examine the operation of GDI and CMOS adders at low supply
voltages.
While discussing the MEP, it should be recalled that the
total energy consumption comprises two different components,
$E_{total} = E_{dyn} + E_{leak}$. The dynamic energy component is
proportional to the effective load capacitance and to the square of the supply voltage,
$E_{dyn} \propto C_{eff} V_{DD}^2$, while the leakage component is dominated by
integration of the sub-threshold current along the operation time,
$E_{leak} \propto V_{DD}^2 \exp(-V_{DD}/(n V_t))$, assuming full swing operation [24]. It can
be easily noticed that these two components have opposite
dependences on $V_{DD}$, so their relation determines the MEP.
Fig. 11 presents the dependency of total energy consumption
on the supply voltage. MEPs of the GDI and CMOS adders are
shown. Note that the simulation was carried out at a typical corner
(TT), thus the actual MEPs may be higher due to increased leakage
currents under process variations.
An interesting observation from Fig. 11 is that the same order of
MEP energy dissipation is obtained for all designs, while the MEPs
were achieved at different voltages. The FS GDI has 35% lower MEP
energy, as compared to CMOS.
Both GDI designs have reduced effective load capacitances, thus
the consumed energy at high VDD values is lower than in CMOS.
The sub-threshold leakage in the MVT GDI adder starts dominating
Fig. 9. Delay distribution of MVT GDI design derived by Monte Carlo simulation.
Fig. 10. Delay distribution of FS GDI design derived by Monte Carlo simulation.
Fig. 11. MEPs simulation.
at higher VDD values because of the VTH drops. Thus, the MVT GDI
achieves the MEP at a higher voltage. The FS GDI design has slightly
higher effective capacitances. Therefore, its energy consumption at
large voltages is increased. Since no VTH drops occur in FS GDI, its
leakage component becomes dominant at a lower supply voltage and
therefore the MEP is shifted towards lower VDD. The CMOS adder
MEP appears at lower VDD because of the dominating dynamic
energy caused by higher switching capacitances.
Fig. 12 presents the propagation delay of all adders as a function
of the supply voltage. The delays at MEP are also shown. As can
be seen, the MVT GDI adder exhibits a delay about one order of magnitude
higher than CMOS and FS GDI. However, at MEP, the delay
of MVT GDI is comparable with the MEP delays of the other
techniques.
Table 5 summarizes the characteristics of all the designs at
MEP. As can be seen, FS GDI presents the best energy and delay
at MEP. Moreover, its MEP is achieved at a higher VDD than in CMOS,
which, together with the similar process variation sensitivity,
makes FS GDI beneficial for minimal energy operation.
5. Conclusions
Full Swing Gate Diffusion Input (FS GDI) methodology was
proposed and evaluated on a 16-bit Carry Look Ahead Adder (CLA).
It was shown that the proposed FS-GDI circuits are beneficial in
terms of performance and static power consumption, as compared
to the conventional multiple Vth GDI (MVT-GDI). Three CLA
versions, based on FS-GDI, MVT-GDI and standard CMOS, were
designed and compared in a 40 nm low power TSMC process.
Simulation results showed a clear advantage of the proposed GDI
CLAs in terms of area, dynamic and static energy. The FS-GDI
achieved 2× area reduction, 5× improvement in dynamic energy
dissipation and 4× decrease in leakage, with a slight (24%)
degradation in performance, when compared to the CMOS CLA.
Advanced design metrics of GDI cells, such as minimum energy
point (MEP) operation and minimum leakage vector (MLV), were
discussed. It was shown that FS-GDI achieved better characteristics
at MEP, as compared to the other techniques.
References
[1] M. Alioto, Ultra-low power VLSI circuit design demystified and explained: a
tutorial, IEEE Transactions on Circuits and Systems, Part I (invited) 59 (1)
(2012) 3–29.
[2] G. Gammie, A. Wang, M. Chau, S. Gururajarao, R. Pitts, F. Jumel, S. Engel, P.
Royannez, R. Lagerquist, H. Mair, A 45 nm 3.5 g baseband-and-multimedia
application processor using adaptive body-bias and ultra-low-power techniques,
in: Proceedings of IEEE International Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 2008, pp. 258–611.
[3] D. Bol, Robust and energy-efficient ultra-low-voltage circuit design under
timing constraints in 65/45 nm CMOS, Journal of Low Power Electronics and
Applications 1 (1) (2011) 1–19.
[4] G. Chen, M. Fojtik, D. Kim, D. Fick, J. Park, M. Seok, M.T. Chen, Z. Foo, D. Sylvester,
D. Blaauw, Millimeter-scale nearly perpetual sensor system with stacked battery
and solar cells, in: Proceedings of IEEE International Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), 2010, pp. 288–289.
[5] I. Vaisband, E.G. Friedman, R. Ginosar, A. Kolodny, Low power clock network
design, Journal of Low Power Electronics and Applications 1 (2011) 219–246.
[6] A. Teman, L. Pergament, O. Cohen, A. Fish, Minimum leakage quasi-static RAM
bitcell, Journal of Low Power Electronics and Applications 1 (2011) 204–218.
[7] A. Morgenshtein, A. Fish, I.A. Wagner, Gate-diffusion input (GDI): a power-efficient
method for digital combinatorial circuits, IEEE Transactions on VLSI
Systems 10 (5) (2002).
[8] A. Morgenshtein, I. Shwartz, A. Fish, Gate diffusion input (GDI) logic in
standard CMOS nanoscale process, in: Proceedings of IEEE Convention of
Electrical and Electronics Engineers in Israel, 2010.
[9] M. Kumar, M.A. Hussain, L.L.K. Singh, Design of a low power high speed ALU
in 45 nm using GDI technique and its performance comparison, Communications
in Computer and Information Science 142 (Part 3) (2011) 458–463.
[10] K.K. Chaddha, R. Chandel, Design and analysis of a modified low power CMOS
full adder using gate-diffusion input technique, Journal of Low Power
Electronics 6 (4) (2010) 482–490.
[11] O.P. Hari, A.K. Mai, Low power and area efficient implementation of N-phase
non overlapping clock generator using GDI technique, in: Proceedings of IEEE
International Conference on Electronics Computer Technology (ICECT), 2011.
Fig. 12. Propagation delay of CLA Adders at low supply voltages.
Table 5
Characteristics of the CLA designs at the minimum energy point (MEP).
Design     MEP VDD [V]   MEP energy [fJ]   MEP delay [µsec]
CMOS       0.18          4.7               5.51
MVT GDI    0.32          5.2               5.92
FS GDI     0.22          3.1               4.17
[12] P.M. Lee, C.H. Hsu, Y.-H. Hung, Novel 10-T full adders realized by GDI structure, in:
Proceedings of the IEEE International Symposium on Integrated Circuits, 2007.
[13] F. Moradi, D.T. Wisland, D.T.H. Mahmoodi, H.S. Aunet, T.V. Cao, A. Peiravi, Ultra
low power full adder topologies, in: Proceedings of ISCAS'04, Taipei, Taiwan,
May 2009.
[14] A. Morgenshtein, A. Fish, I.A. Wagner, An efficient implementation of D-flip-flop
using the GDI technique, in: Proceedings of ISCAS'04 Conference, Canada,
May 2004, pp. 673–676.
[15] R. Uma, P. Dhavachelvan, Modified gate diffusion input technique: a new
technique for enhancing performance in full adder circuits, Proceedings of
ICCCS 6 (2012) 74–81.
[16] A. Wang, B.H. Calhoun, A.P. Chandrakasan, Sub-Threshold Design for Ultra
Low-Power Systems, Springer Verlag, 2006.
[17] S. Fisher, A. Teman, D. Vaysman, A. Gertsman, O. Yadid-Pecht, A. Fish, Digital
subthreshold logic design - motivation and challenges, in: Proceedings of the
IEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI),
vol. 35, 2008, pp. 702–706.
[18] P.R. Panda, A. Shrivastava, P.R. Panda, B.V.N. Silpa, K. Gummidipudi, Power-Efficient
System Design, Springer Verlag, 2010.
[19] D. Markovic, C.C. Wang, L.P. Alarcon, J.M. Rabaey, Ultralow-power design in
near-threshold region, Proceedings of the IEEE 98 (2010) 237–252.
[20] B. Zhai, S. Hanson, D. Blaauw, D. Sylvester, Analysis and mitigation of
variability in subthreshold design, in: Proceedings of the 2005 International
Symposium on Low Power Electronics and Design, 2005, pp. 20–25.
[21] N. Verma, J. Kwong, A.P. Chandrakasan, Nanometer MOSFET variation in
minimum energy subthreshold circuits, IEEE Transactions on Electron Devices
55 (2008) 163–174.
[22] D.F. Finchelstein, V. Sze, M.E. Sinangil, A.P. Chandrakasan, A 0.7-V 1.8-mW
H.264/AVC 720p video decoder, IEEE Journal of Solid State Circuits 44 (2009)
2943–2956.
[23] Y. Pu, J.P. de Gyvez, H. Corporaal, Y. Ha, An ultra-low-energy/frame multi-standard
JPEG co-processor in 65 nm CMOS with sub/near-threshold power
supply, in: Proceedings of IEEE International Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 147 a, 2009, pp. 146–147.
[24] B.H. Calhoun, A. Wang, A. Chandrakasan, Modeling and sizing for minimum
energy operation in subthreshold circuits, IEEE Journal of Solid-State Circuits
40 (9) (2005) 1778–1786.
Arkadiy Morgenshtein received the B.Sc. degree in
electrical engineering in 1999, M.Sc. in biomedical
engineering in 2003, MBA in 2006 and Ph.D in elec-
trical engineering in 2008 from Technion, Israel Insti-
tute of Technology. In 2012 he joined the IBM Haifa
Research Lab. Prior to that he worked with Core CAD
Technologies group at Intel Corporation, where he was
researching and developing tools for power optimiza-
tion and estimation at various levels of VLSI design. He
has been a Teaching and Research Assistant at Electrical
Engineering Department, Technion since 1999, where
he is currently an Adjunct Lecturer.
Dr. Morgenshtein's research interests include low-
power design techniques for digital circuits, optimization of on-chip interconnect,
CMOS sensors and EDA tools for power estimation and optimization. He has
authored over 40 scientific papers and patent applications. Dr. Morgenshtein
co-authored a paper that won the IEEE VLSI Transactions (TVLSI) Best Paper award
for 2012. He was honored by Technion President's award and Intel-Technion award
for excellence in study in 1998 and 2007. He supervised projects winning the award
of Oz Moses Foundation by Intel in 2002 and best VLSI project in 2003 and 2005.
Dr. Morgenshtein has served as associate editor for the Journal of Low Power
Electronics and Applications (JLPEA), as session chairman at ICECS'04 Conference
and as referee in multiple journals and conferences.
Viacheslav Yuzhaninov received the B.Sc. degree
in Electrical Engineering from Ben-Gurion University,
Beer Sheva, Israel, in 2012. He has been a Research
Assistant at the Low Power Circuits and Systems Lab,
VLSI Systems Center, Ben-Gurion University, since 2011.
He is currently working on his M.Sc. degree in
Electrical Engineering at Bar-Ilan University. His research
interests are energy efficient logic families and low
voltage high performance digital design.
Alexey Kovshilovsky received the B.Sc. degree in
Electrical Engineering from Ben-Gurion University,
Beer Sheva, Israel, in 2012. He has been a Research
Assistant at the Low Power Circuits and Systems Lab,
VLSI Systems Center, Ben-Gurion University, since 2011.
Currently he is working as an Embedded Software
Engineer at Powermat Technologies in Neve-Ilan, Israel.
Alexander Fish received the B.Sc. degree in Electrical
Engineering from the Technion, Israel Institute of
Technology, Haifa, Israel, in 1999. He completed his
M.Sc. in 2002 and his Ph.D. (summa cum laude) in
2006, respectively, at Ben-Gurion University in Israel.
He was a postdoctoral fellow in the ATIPS laboratory at
the University of Calgary (Canada) from 2006 to 2008. In
2008 he joined the Ben-Gurion University in Israel, as a
faculty member in the Electrical and Computer Engi-
neering Department. There he founded the Low Power
Circuits and Systems (LPC&S) laboratory, specializing in
low power circuits and systems. In July 2011 he was
appointed as a head of the VLSI Systems Center at BGU.
In October 2012 Prof. Fish joined the Bar-Ilan University, Faculty of Engineering as
an Associate Professor and the head of the microelectronics track. Prof. Fish also
leads the new Energy Efficient Electronics and Applications Labs.
Prof. Fish's research interests include development of energy efficient smart
CMOS image sensors, ultra low power SRAM, DRAM and Flash memory arrays and
energy efficient design techniques for low voltage digital and analog VLSI chips. He
has authored over 70 scientific papers in journals and conferences, including IEEE
Journal of Solid State Circuits, IEEE Transactions on Electron Devices, IEEE Transac-
tions on Circuits and Systems and many others. He also submitted 16 patent
applications. Prof. Fish has published two book chapters. He was a co-author
of papers that won the Best Paper Finalist awards at IEEE ISCAS and ICECS
conferences.
Prof. Fish serves as an Editor in Chief for the MDPI Journal of Low Power
Electronics and Applications (JLPEA) and as an Associate Editor for the IEEE Sensors
Journal. He also served as a chair of different tracks of various IEEE conferences. He
was a co-organizer of many special sessions at IEEE conferences, including IEEE
ISCAS, IEEE Sensors and IEEEI conferences. Prof. Fish is a member of Sensory, VLSI
Systems and Applications and Bio-medical Systems Technical Committees of IEEE
Circuits and Systems Society.
A. Morgenshtein et al. / INTEGRATION, the VLSI journal () 9
iPlease cite this article as: A. Morgenshtein, et al., Full-Swing Gate Diffusion Input logicCase-study of low-power CLA adder design,
INTEGRATION, the VLSI journal (2013), http://dx.doi.org/10.1016/j.vlsi.2013.04.002
