Edited by
DIMITRIOS SOUDRIS
Democritus University of Thrace, Xanthi, Greece
CHRISTIAN PIGUET
CSEM, Neuchâtel, Switzerland
COSTAS GOUTIS
University of Patras, Patras, Greece
Series Editors:
Contents
List of Figures
List of Tables
Contributing Authors
Foreword
Introduction
Part I
1
Motivation, Context and Objectives
Dimitrios Soudris, Christian Piguet and Costas Goutis
2
Sources of power dissipation in CMOS circuits
Dimitrios Soudris and Antonios Thanailakis
2.1
Introduction
2.2
The components of power dissipation in CMOS circuits
2.2.1 Dynamic Power dissipation
2.2.2 Short-circuit power dissipation
2.2.3 Leakage Power Dissipation
2.2.4 Static Power dissipation
2.3
Low Power Design with Multiple
and
References
3
Logic level Power Optimization
George Theodoridis and Dimitrios Soudris
3.1
Introduction
3.2
Problem Formulation
3.3
Combinational Circuits Technology-Independent Optimization
3.4
3.5
3.6
References
4
Circuit-Level Low-Power Design
Spiridon Nikolaidis and Alexander Chatzigeorgiou
4.1
Introduction
4.2
Logic Style
4.2.1 Static Logic
4.2.2 Dynamic Logic
4.2.3 Pass-Transistor Logic
4.2.4 Single-Rail Pass-Transistor Logic
4.2.5 Other Logic Styles
4.2.6 Logic Styles: Discussion
4.3
Latches and Flip-Flops
4.3.1 Latches
4.3.2 Flip - Flops
4.4
Transistor Sizing and Ordering
4.4.1 Transistor ordering
4.4.2 Transistor sizing
4.5
Drivers for large loads
4.6
Conclusions
References
5
Circuit Techniques for
Reducing Power Consumption
in Adders and Multipliers
Labros Bisdounis, Dimitrios Gouvetas and Odysseas Koufopavlou
5.1
Introduction
5.2
Power and Delay in CMOS Circuits
5.3
CMOS Circuit Design Styles
5.3.1 Conventional Static CMOS Logic - CSL
5.3.2 Complementary Pass-Transistor Logic - CPL
5.3.3 Double Pass-Transistor Logic - DPL
5.3.4 Static Differential Cascode Voltage Switch Logic - SDCVSL
5.3.5 Static Differential Split-level Logic - SDSL
5.3.6 Dual-Rail Domino Logic - DRDL
5.3.7
5.4
5.5
5.6
5.7
References
6
Computer Arithmetic Techniques for Low-Power Systems
Vassilis Paliouras and Thanos Stouraitis
6.1
Introduction
6.2
Conventional Arithmetic and Low-power design
6.2.1 Basic Arithmetic Operations
6.2.2 Number Representations
6.3
The Logarithmic Number System
6.3.1 LNS Basics
6.3.2 LNS and Power Dissipation
6.4
The Residue Number System
6.4.1 RNS Basics
6.4.2 RNS and Power Dissipation
6.4.3 RNS Signal Activity for Gaussian Input
6.4.4 Algebraic extensions: The Quadratic RNS
6.5
Conclusions
References
7
Reducing Power Consumption in Memories
Alexander Chatzigeorgiou and Spiridon Nikolaidis
7.1
Introduction
7.2
Static Random Access Memories
7.2.1 Sources of power dissipation in SRAMs
7.2.2 Low-Power SRAM Circuit Techniques
7.3
Dynamic Random Access Memories
7.3.1 Sources of power dissipation in DRAMs
7.3.2 Low-Power DRAM Circuit Techniques
7.4
Conclusions
References
8
Low-Power Clock, Interconnect and Layout Designs
Christian Piguet
8.1
Introduction
8.2
Low-Power Clock Design
8.3
8.4
References
9
Logic Level Power Estimation
George Theodoridis and Costas Goutis
9.1
Introduction
9.2
Problem Formulation
9.2.1 Sources of Power Dissipation
9.2.2 Problem Statement
9.3
Background
9.3.1 Signal Correlations
9.3.2 Structural Dependencies
9.3.3 Sequential Correlations
9.3.4 Gate Delay Model
9.4
Classification of Power Estimation Methodologies
9.5
Simulation-based Power Estimation
9.5.1 Monte-Carlo Power Estimation
9.5.2 Advanced Sampling Techniques
9.5.3 Vector Compaction
9.6
Probabilistic methods
9.6.1 Combinational Circuits
9.6.2 Real-Delay Gate Power Estimation
9.6.3 Sequential Circuits
9.7
Conclusions
References
Part II
10
Low-Power Design for Safety-Critical Applications
References
11
Design of a Low Power Ultrasound Beamformer ASIC
Robert Schwann, Thomas Heselhaus, Oliver Weiss and Tobias G. Noll
11.1 Introduction
11.2 The Key to Physically Oriented Design: Automated Datapath Generation
11.3 Beamformers System Overview and Requirements
11.4 Optimization and Benchmarks of Basic Building Blocks
11.4.1 Coarse Delay Unit: Optimization of Customized Memories
11.4.2 Delay Generator
11.4.3 Fine Delay Interpolation Filter
11.4.4 Apodization Multiplier
11.4.5 Accumulation Chain
11.5 Conclusion
References
12
Epilogue
Christian Piguet
Contributing Authors
1990 to 1994 he was at the IBM Thomas J. Watson Research Center, Yorktown
Heights, NY, USA. He is currently an Associate Professor with the Department
of Electrical and Computer Engineering, University of Patras. His research interests include VLSI, low power design, and high performance communication
subsystems architecture and implementation. He served as conference General Chair of the 1999 International Conference on Electronics, Circuits and
Systems, and on organizing committees and technical program committees of
many major conferences. He leads several projects, the Greek government, and
major companies. Dr. Koufopavlou has published 83 journal and conference
papers and received patents and inventions. He is a member of IEEE, IFIP 10.5
WG and the Technical Chamber of Greece.
Spiridon Nikolaidis was born in Eleochori of Kavala, Greece in 1965. He
received the Diploma and Ph.D. degrees in Electrical Engineering from Patras
University, Greece, in 1988 and 1994, respectively. Since September 1996
he has been with the Department of Physics of the Aristotle University of
Thessaloniki, Thessaloniki, Greece. He is now an assistant professor at this
Department in the field of digital design. His research interests include CMOS
gate propagation delay and power-consumption modeling, high speed and low
power CMOS circuit design techniques, power estimation of DSP architectures,
and design of high speed and low power DSP architectures. He is author or coauthor of more than 60 papers and articles in conferences. He also participates
in many European (ESPRIT, IST) and Greek government projects.
Tobias G. Noll received the Ing. (grad.) degree in Electrical Engineering
from the Fachhochschule Koblenz in 1974, the Dipl.-Ing. degree in Electrical
Engineering from the Technical University of Munich in 1982, and the Dr.-Ing.
degree from the Ruhr-University of Bochum in 1989. From 1974 to 1976, he
was with the Max-Planck-Institute of Radio Astronomy, Bonn. From 1976 he
was with the Corporate Research and Development Department of Siemens, where
from 1987 he headed a group of laboratories concerned with CMOS circuits for
digital signal processing. In 1992, he joined the RWTH Aachen University of
Technology where he is a Professor holding the Chair of Electrical Engineering
and Computer Systems. His activities focus on low power deep submicron
CMOS architectures, circuits and design methodologies, as well as digital signal
processing for communications and medical electronics.
Vassilis Paliouras received the Diploma in electrical engineering in 1992 and
the Ph.D. degree in electrical engineering in 1999, from the Electrical and
Computer Engineering Department, University of Patras, Greece. He works
as a researcher at the VLSI Design Laboratory, ECE Dept., while teaching
microprocessor-based system design at the Computer Engineering and Informatics Dept., both at the University of Patras, Greece. His research interests
include computer arithmetic algorithms and circuits, microprocessor architecture, and VLSI signal processing, areas where he has published more than 30
conference and journal articles. Dr. Paliouras received the MEDCHIP VLSI
Design Award in 1997. He is also the recipient of the 2000 IEEE Circuits and
Systems Society Guillemin-Cauer best paper Award. He is a Member of ACM,
SIAM, and the Technical Chamber of Greece.
Kyriakos Papadomanolakis is a postgraduate student at the VLSI Laboratory
of the University of Patras. He received his Diploma in Electrical and Electronics Engineering at the University of Patras. He is currently working on his
Ph.D. thesis in the area of hardware safety in VLSI design at the VLSI
Laboratory of the University of Patras, at Rion. He has work experience
from two ESPRIT European projects: i) ASPIS: contribution to the design of
a DECT/GSM (DCS1800) dual mode mobile phone, in INTRACOM S.A. at
the department of R&D and ii) CoSafe: Design of a Low Power safety-critical
pump for in-vein infusion in collaboration with Micrel Medical Devices Ltd.
Christian Piguet received the M. S. and Ph. D. degrees in electrical engineering from the EPFL, respectively in 1974 and 1981. He joined the Centre
Electronique Horloger S.A., Neuchâtel, Switzerland, in 1974. He is now Head
of the Ultra Low Power Circuits section at the CSEM S.A. He is presently
involved in the design of low-power low-voltage integrated circuits. He is Professor at EPFL and also lectures at the University of Neuchâtel, Switzerland.
He is author or co-author of more than 100 scientific papers and has contributed
to numerous advanced engineering courses.
Robert Schwann received the Dipl.-Ing. degree in 1997 from Aachen University of Technology. Since 1998 he has been working as a research assistant at
the Chair of Electrical Engineering and Computer Systems, Aachen University
of Technology. His fields of research are the signal processing and image quality
of medical ultrasound systems.
Dimitrios Soudris received his Diploma in Electrical Engineering from the
University of Patras, Greece, in 1987. He received the Ph.D. degree in
Electrical Engineering from the University of Patras in 1992. He is currently
working as Assistant Professor in the Dept. of Electrical and Computer Engineering,
Democritus University of Thrace, Greece. His research interests include low
power design, parallel architectures, embedded systems design, and VLSI sig-
Foreword
This book is the fourth in a series on novel low power design architectures,
methods and design practices. It results from a large European project started
in 1997, whose goal is to promote the further development and the faster and
wider industrial use of advanced design methods for reducing the power consumption of electronic systems.
Low power design became crucial with the wide spread of portable information and communication terminals, where a small battery has to last for a
long period. High performance electronics, in addition, suffers from a permanent increase of the dissipated power per square millimeter of silicon, due
to the increasing clock-rates, which causes cooling and reliability problems or
otherwise limits the performance.
The European Union's Information Technologies Programme Esprit
therefore launched a Pilot action for Low Power Design, which eventually grew
to 19 R&D projects and one coordination project, with an overall budget of 14
million EURO. It is meanwhile known as European Low Power Initiative for
Electronic System Design (ESD-LPD) and will be completed in the year 2002.
The R&D projects develop or demonstrate new design methods for power reduction,
while the coordination project takes care that the methods, experiences and
results are properly documented and publicised.
The initiative addresses low power design at various levels. This includes
system and algorithmic level, instruction set processor level, custom processor
level, RT-level, gate level, circuit level and layout level. It covers data dominated
and control dominated as well as asynchronous architectures. 10 projects deal
mainly with digital, 7 with analog and mixed-signal, and 2 with software related
aspects. The principal application areas are communication, medical equipment
and e-commerce devices.
LUCS Low Power Ultrasound Chip Set.
Design methodology on low power ADC, memory and circuit design
Prototype demonstration of a handheld medical ultrasound scanner
ALPINS Analog Low Power Design for Communications Systems
Low-voltage voice band smoothing filters and analog-to-digital and
digital-to-analog converters for an analog front-end circuit of a
DECT system
High linear transconductor-capacitor (gm-C) filter for GSM Analog
Interface Circuit operating at supply voltages as low as 2.5V
Formal verification tools, which will be implemented in the industrial partners' design environment. These tools support the complete
design process from system level down to transistor level
SALOMON System-level analog-digital trade-off analysis for low power
A general top-down design flow for mixed-signal telecom ASICs
High-level models of analog and digital blocks and power estimators
for these blocks
A prototype implementation of the design flow with particular software tools to demonstrate the general design flow
DESCALE Design Experiment on a Smart Card Application for Low Energy
The application of highly innovative handshake technology
Aiming at some 3 to 5 times less power and some 10 times smaller
peak currents compared to synchronously operated solutions
SUPREGE A low power SUPerREGEnerative transceiver for wireless data
transmission at short distances
Design trade-offs and optimisation of the micro power receiver /
transmitter as a function of various parameters (power consumption,
area, bandwidth, sensitivity, etc)
Modulation / demodulation and interface with data transmission
systems
Realisation of the integrated micro power receiver / transmitter
based on the super-regeneration principle
LPGD A Low-Power Design Methodology/Flow and its Application to the
Implementation of a DCS1800-GSM/DECT Modulator/Demodulator
To complete the development of a top-down, low power design
methodology/flow for DSP applications
To demonstrate the methods at the example of an integrated
GFSK/GMSK Modulator-Demodulator (MODEM)
for DCS1800-GSM/DECT applications
SOFLOPO Low Power Software Development for Embedded Applications
Develop techniques and guidelines for mapping a specific algorithm
code onto appropriate instruction subsets
Integrate these techniques into software for the power-conscious
ARM-RISC and DSP code optimisation
I-MODE Low Power RF to Base band Interface for Multi-Mode Portable
Phone
To raise the level of integration in a DECT/DCS1800 transceiver, by
implementing the necessary analog base band low-pass filters and
data converters in CMOS technology using low power techniques
COOL-LOGOS Power Reduction through the Use of Local Don't Care Conditions and Global Gate Resizing Techniques: An Experimental Evaluation.
To apply the developed low power design techniques to the existing
24-bit DSP, which is already fabricated
To assess the merit of the new techniques using experimental silicon
through comparisons of the projected power reduction (in simulation) and actually measured reduction of new DSP; assessment of
the commercial impact
LOVO Low Output VOltage DC/DC converters for low power applications
Development of technical solutions for the power supplies of advanced low power systems, comprising the following topics
New methods for synchronous rectification for very low output voltage power converters
The low power design projects have achieved the following results:
Projects that have designed a prototype chip can demonstrate a power
reduction of 10 to 30 percent.
New low power design libraries have been developed.
New proven low power RF architectures are now available.
New smaller and lighter mobile equipment is developed.
Instead of running a number of Esprit projects at the same time independently of each other, during this pilot action the projects have collaborated
strongly. This is achieved mostly by the novelty of this action, which is the
presence and role of the coordinator: DIMES - the Delft Institute of Microelectronics and Submicron-technology, located in Delft, the Netherlands
(http://www.dimes.tudelft.nl). The task of the coordinator is to coordinate,
facilitate, and organize:
The information exchange between projects.
The systematic documentation of methods and experiences.
The publication and the wider dissemination to the public.
The most important achievements, credited to the presence of the coordinator
are:
New personnel contacts have been made, and as a consequence the resulting synergy between partners resulted in better and faster developments.
The organization of low power design workshops, special sessions at
conferences, and a low power design web site,
http://www.esdlpd.dimes.tudelft.nl. At this site all public reports of the
projects can be found, along with all kinds of information about the initiative
itself.
The used design methodology, design methods and/or design experience
are disclosed, are well documented and available.
Based on the work of the projects, in cooperation with the projects, the
publication of a low power design book series is planned. Written by
members of the projects this series of books on low power design will
disseminate novel design methodologies and design experiences, which
were obtained during the runtime of the European Low Power Initiative
for Electronic System Design, to the general public.
In conclusion, the major contribution of this project cluster is that, besides the
already mentioned technical achievements, the introduction of novel knowledge
on low power design methods into the mainstream development processes is
accelerated.
We would like to thank all project partners from all the different companies
and organizations who make the Low Power Initiative a success.
Rene van Leuken, Reinder Nouta, Alexander de Graaf, Delft, May 2002
Introduction
Modern electronic systems have reached a significant turning point in the last
decade, from low performance products such as wristwatches and calculators
to high performance products such as laptops and personal digital assistants.
The introduction of these devices to the consumer market brought to the surface
a characteristic that had previously been overlooked: low power dissipation.
Gradually, engineers invented novel techniques, which may be included
in efficient design methodologies, for designing and implementing circuits that
are efficient not only in terms of area and performance, as they were used to,
but also in terms of low power consumption.
The material in this book is based on the background and the innovative results of the different partners involved in the AMIED, LPGD, PREST, COSAFE,
and LUCS projects of the European Low Power Initiative for Electronic System Design under the successful coordination of DIMES, Delft. The partners
have been studying the low-power design field for many years, introducing novel
concepts and efficient techniques. Due to the close collaboration of academic and
industrial research groups, the presented material has been influenced by the
many dissemination activities, e.g. public deliverables, technical meetings and workshops, during the projects' execution.
The book consists of two parts: the first part includes the low power design
techniques for power optimization and estimation, while the second one
provides the results from the projects COSAFE and LUCS. Starting from the
description of the power consumption sources, low power optimization and estimation
techniques for logic design level, circuit/transistor design level, and
layout design level are provided in eight chapters (i.e. Chapters 2-9). The
next two chapters describe the novel low power techniques, which were used
during the implementation of the safety-critical Application Specific Instruction
Processor designed in the COSAFE project, and the implementation of the
low power 16-channel ultrasound beamformer application specific integrated
circuit (ASIC) designed for the LUCS project.
A top-down approach with respect to the design level is adopted in the presentation
of the low power design techniques. However, it was not possible to
present in detail all the low power optimization and estimation techniques
from the logic level to the layout level. Only the most important research
contributions, presented in a tutorial manner, are included. The book can also
be used as a textbook for undergraduate and graduate students, VLSI design
engineers, and professionals who have a basic knowledge of VLSI digital
design.
The authors of the chapters of this book together with the editors would
like to use this opportunity to thank the many people, i.e. colleagues and Ph.D.
students, whose dedication and industry during the projects' execution led to the
introduction of novel scientific results and realization of innovative integrated
systems.
The authors of Chapter 3 and 9 would like to thank Dr. S. Theoharis for his
contribution in the software development of logic optimization and estimation
tools, which were necessary for making power measurements.
The authors of Chapter 6 wish to acknowledge the discussions with Dr.
Karagianni, who significantly influenced the particular chapter. Also the authors appreciate the financial support from the "C. Caratheodorys" fund for the
University of Patras.
The authors of Chapter 10 would like to thank V. Spiliotopoulos for his
support in the design and realization of the COSAFE ASIP. Also, many
thanks to V. Kokkinos for his contribution in the implementation of the many
fault-secure multiplier designs.
All authors of this book together with the editors would like to thank DIMES,
Delft, for their continuous support during the running of the low power projects
for the dissemination of the scientific results. This book is one of the activities
of this dissemination task.
The authors of Chapter 4 would like to thank the people, who contributed
to PREST, whose public deliverable reports were used as a source of inspiration for
the chapter's preparation.
Last but not least, D. Soudris would like to thank his parents for being a
constant source of moral support and for firmly instilling in him from a very
young age that perseverantia omnia vincit - it is this perseverance that kept him
going. This book is dedicated to them.
Dimitrios Soudris, Christian Piguet, Costas Goutis, May 2002
Chapter 5
CIRCUIT TECHNIQUES FOR
REDUCING POWER CONSUMPTION
IN ADDERS AND MULTIPLIERS
Labros Bisdounis
INTRACOM S.A., Athens, Greece
[email protected]
Dimitrios Gouvetas
Odysseas Koufopavlou
University of Patras, Rio, Greece
[email protected]
Abstract
An important issue in the design of VLSI Circuits is the choice of the basic circuit
approach and topology for implementing various logic and arithmetic functions
such as adders and multipliers. In this chapter, several static and dynamic CMOS
circuit design styles are evaluated in terms of area, propagation delay and power
dissipation. The different design styles are compared by performing detailed
transistor-level simulations on a benchmark circuit (ripple carry adder) using
HSPICE, and analyzing the results in a statistical way. After the comparison
between the different design styles, a number of well known types of adders
(ripple carry, carry skip, carry lookahead, carry select etc.) are compared in terms
of propagation delay, number of gates and logic transitions average number.
Furthermore, power measurements and comparisons for a number of well-known
multipliers are provided. Based on the results of the provided analysis some of
the tradeoffs that are possible during the design phase in order to improve the
circuit power-delay product are identified.
Keywords:
5.1
Introduction
Much of the research effort of the past years in the area of digital electronics
has been directed towards increasing the speed of digital systems. Recently,
the requirement of portability and the moderate improvement in battery performance
indicate that power dissipation is one of the most critical design
parameters [1]. The three most widely accepted metrics to measure the quality
of a circuit or to compare various circuit styles are area, delay and power
dissipation. Portability imposes a strict limitation on power dissipation while
still demanding high computational speeds. Hence, in recent VLSI systems the
power-delay product becomes the most essential metric of performance.
The reduction of the power dissipation and the improvement of the speed
require optimizations at all levels of the design procedure. In this chapter,
the choice of circuit style and methodology is considered. Since most digital
circuitry is composed of simple and/or complex gates, we study the best way
to implement adders in order to achieve low power dissipation and high speed.
Several circuit design techniques are compared in order to determine their efficiency
in terms of speed and power dissipation. A review of the existing CMOS
circuit design styles is given, describing their advantages and their limitations.
Furthermore, a four-bit ripple carry adder for use as a benchmark circuit was
designed in a full-custom manner by using the different design styles, and
detailed transistor-level simulations using HSPICE [2] were performed. Also,
various designs and implementations of four multipliers are analysed in
terms of delay and power consumption. Two methods of power measurement are
used.
Conventional static CMOS has been the technique of choice in most processor
designs. Alternatively, static pass transistor circuits have also been suggested for
low-power applications [3]. Dynamic circuits, when clocked carefully, can also
be used in low-power, high speed systems [4]. However, several other design
techniques need to be applied and evaluated along with these circuit styles in
order to improve the speed and reduce the power dissipation of VLSI systems.
In this chapter we study eight different CMOS logic styles:
Conventional Static CMOS - CSL,
Complementary Pass-transistor - CPL [5],
Double Pass-transistor - DPL [6],
Static and Dynamic Differential Cascode Voltage Switch - DCVSL [7,8],
Static Differential Split-level - SDSL [9],
Dual-Rail Domino - DRDL [10,11], and
Enable/Disable CMOS Differential - ECDL [12].
The rest of the chapter is structured as follows. In the next section a brief
introduction to the power dissipation and the delay in CMOS circuits is given. In
Section 3, the CMOS adder logic styles and their characteristics are described
in detail. The different adder logic styles are compared in terms of speed,
power dissipation and silicon area in Section 4. Also, the power-delay product
of the designs is considered, due to the importance of this metric in modern
VLSI applications. Comparison results among different realizations of a 16-bit
adder in terms of area, delay, and power are presented in Section 5. The next
section provides results for four implementations of multipliers. Finally, the
main points are summarized in the conclusions of Section 7.
5.2
Power and Delay in CMOS Circuits
Since the objective is to investigate the tradeoffs that are possible at the circuit
level in order to reduce power dissipation while maintaining the overall system
throughput, we must first study the parameters that affect the power dissipation
and the speed of a circuit. It is well known that one of the major advantages
of CMOS circuits over single polarity MOS circuits is that the static power
dissipation is very small and limited to leakage. However, in some cases such as
bias circuitry and pseudo-nMOS logic, static power is dissipated. Considering
that in CMOS circuits the leakage current between the diffusion regions and
the substrate is negligible, the two major sources of power dissipation are the
switching and the short-circuit power dissipation [1],
$$P = \alpha\, C_L\, V_{dd}^{2}\, f + I_{sc}\, V_{dd} \qquad (5.1)$$
where $\alpha$ is the node transition activity factor, $C_L$ is the load capacitance,
$V_{dd}$ is the supply voltage, and $f$ is the switching frequency. $I_{sc}$ is the current which
arises when a direct path from power supply to ground is caused, for a short
period of time during low to high or high to low node transitions [13]. The
switching component of power arises when energy is drawn from the power
supply to charge parasitic capacitors. It is the dominant power component in a
well designed circuit and it can be lowered by reducing one or more of $\alpha$, $C_L$,
$V_{dd}$ and $f$, while retaining the required speed and functionality.
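As a rough numerical illustration of the two power components discussed above, the following Python sketch evaluates the switching term (activity factor times load capacitance times supply voltage squared times frequency) and the short-circuit term (short-circuit current times supply voltage). The function names and all numeric values (50 fF load, 20 MHz, activity 0.5, 1 uA mean short-circuit current) are hypothetical and chosen only for illustration; they are not taken from the chapter's simulations.

```python
def switching_power(alpha, c_load, v_dd, f_clk):
    """Switching power in watts: alpha * C_L * V_dd^2 * f."""
    return alpha * c_load * v_dd ** 2 * f_clk


def short_circuit_power(i_sc, v_dd):
    """Short-circuit power in watts: mean short-circuit current times V_dd."""
    return i_sc * v_dd


def total_power(alpha, c_load, v_dd, f_clk, i_sc):
    """Sum of the switching and short-circuit components (leakage neglected)."""
    return (switching_power(alpha, c_load, v_dd, f_clk)
            + short_circuit_power(i_sc, v_dd))


# Hypothetical node parameters (illustrative only).
ALPHA = 0.5        # node transition activity factor
C_LOAD = 50e-15    # load capacitance in farads
F_CLK = 20e6       # switching frequency in hertz

p_sw_5v = switching_power(ALPHA, C_LOAD, 5.0, F_CLK)
p_sw_3v3 = switching_power(ALPHA, C_LOAD, 3.3, F_CLK)
ratio = p_sw_3v3 / p_sw_5v  # quadratic dependence on the supply voltage

print(f"switching power at 5.0 V: {p_sw_5v * 1e6:.2f} uW")
print(f"switching power at 3.3 V: {p_sw_3v3 * 1e6:.2f} uW")
print(f"ratio (3.3 V / 5.0 V):    {ratio:.3f}")
```

Because the supply voltage enters the switching term quadratically, lowering it is the most effective of the four knobs, which is why the delay penalty of a reduced supply has to be examined next.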
Even though the exact analysis of circuit delay is quite complex, a simple
first-order derivation can be used [14,15] in order to show its dependency on the
circuit parameters
$$t_d = k\, \frac{C_L\, V_{dd}}{(V_{dd} - V_{th})^{a}} \qquad (5.2)$$
where $k$ depends on the transistors' aspect ratio ($W/L$) and other device parameters,
$V_{th}$ is the transistor threshold voltage, and $a$ is the velocity saturation
index which varies between 1 and 2 ($a$ is equal to 1.4 for the 1.5 μm process
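The supply-voltage dependence of the first-order delay model can be explored with a short Python sketch. It assumes the delay is proportional to the load capacitance times the supply voltage, divided by the gate overdrive (supply minus threshold) raised to the velocity saturation index. The function name, the 0.8 V threshold and the unit constants are hypothetical; only the index value 1.4 comes from the text.

```python
def gate_delay(k, c_load, v_dd, v_th, a):
    """First-order delay: k * C_L * V_dd / (V_dd - V_th)^a."""
    if v_dd <= v_th:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * c_load * v_dd / (v_dd - v_th) ** a


# Hypothetical parameters; k and C_L cancel in the ratio below.
K = 1.0
C_LOAD = 1.0
V_TH = 0.8   # assumed threshold voltage in volts
A = 1.4      # velocity saturation index quoted in the text

d_5v = gate_delay(K, C_LOAD, 5.0, V_TH, A)
d_3v3 = gate_delay(K, C_LOAD, 3.3, V_TH, A)
delay_ratio = d_3v3 / d_5v

print(f"relative delay at 3.3 V versus 5.0 V: {delay_ratio:.2f}")
```

Under these assumptions the delay grows as the supply is lowered, which is the tradeoff against the quadratic saving in switching power.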
5.3
CMOS Circuit Design Styles
In the following, the circuit design styles are described using the full adder
circuit, which is the most commonly used cell in arithmetic units. Also, their
characteristics in terms of power dissipation and delay are investigated.
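For reference, the logic function that every design style in this section has to implement can be written down as a minimal behavioural Python sketch: a one-bit full adder producing SUM and CARRY, chained into a ripple carry adder like the four-bit benchmark circuit. This models only the Boolean behaviour, not any transistor-level property; the helper names are illustrative.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (SUM, CARRY) for input bits a, b, c."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def ripple_carry_add(x_bits, y_bits, carry_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant bit first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


# Four-bit example: 11 + 6 = 17, i.e. sum bits 0001 with carry out 1.
bits, carry_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(bits), carry_out)
```

The carry output of each cell feeds the next cell, so the carry path determines the worst-case delay of the ripple carry structure.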
5.3.1
Conventional Static CMOS Logic - CSL
Conventional Static CMOS logic is used in most chip designs in the recent
VLSI applications. The schematic diagram of a conventional static CMOS
full adder cell is illustrated in Figure 5.1. The signals noted with - are the
complementary signals. The pMOSFET network of each stage is the dual
network of the nMOSFET one. In order to obtain a reasonable conducting
current to drive capacitive loads the width of the transistors must be increased.
This results in increased input capacitance and therefore high power dissipation
and propagation delay.
Figure 5.1. Conventional static CMOS full adder (schematic not reproduced).
5.3.2
Complementary Pass-Transistor Logic - CPL
The main concept behind CPL [5] is the use of only an nMOSFET network
for the implementation of logic functions. This results in low input capacitance
and high speed operation. The schematic diagram of the CPL full adder circuit
is shown in Figure 5.2. Because the high voltage level of the pass-transistor
outputs is lower than the supply voltage level by the threshold voltage of the
pass transistors, the signals have to be amplified by using CMOS inverters at
the outputs. CPL circuits consume less power than conventional static circuits
because the logic swing of the pass transistor outputs is smaller than the supply
voltage level. The switching power dissipated from charging or discharging the
pass transistor outputs is given by
$$P_{sw} = \alpha\, C_L\, V_{dd}\, V_{swing}\, f \qquad (5.3)$$
where $V_{swing} = V_{dd} - V_{thn}$. In the case of conventional static CMOS circuits
the voltage swing at the output nodes is equal to the supply voltage, resulting in
higher power dissipation. To minimize the static current due to the incomplete
turn-off of the pMOSFET in the output inverters, a weak pMOSFET feedback
device can also be added in the CPL circuits of Figure 5.2, in order to pull the
pass-transistor outputs to full supply voltage level. However, this will increase
the output node capacitance, leading to higher switching power dissipation and
higher propagation delay.
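The effect of the reduced logic swing can be sketched numerically in Python. The sketch assumes that the switching power of a node scales with the product of supply voltage and voltage swing, and that the pass-transistor output swing is the supply voltage minus one nMOSFET threshold voltage (body effect ignored). The function names and the numeric values are hypothetical.

```python
def swing_power(alpha, c_node, v_dd, v_swing, f_clk):
    """Switching power of a node with reduced swing: alpha*C*V_dd*V_swing*f."""
    return alpha * c_node * v_dd * v_swing * f_clk


def pass_transistor_swing(v_dd, v_thn):
    """High level of an nMOS pass-transistor output is one threshold below V_dd."""
    return v_dd - v_thn


# Hypothetical values (illustrative only).
ALPHA, C_NODE, F_CLK = 0.5, 50e-15, 20e6
V_DD, V_THN = 5.0, 0.8

p_full = swing_power(ALPHA, C_NODE, V_DD, V_DD, F_CLK)  # full-swing node
p_cpl = swing_power(ALPHA, C_NODE, V_DD,
                    pass_transistor_swing(V_DD, V_THN), F_CLK)
saving = 1.0 - p_cpl / p_full

print(f"reduced-swing node dissipates {p_cpl / p_full:.2f} of the full-swing power")
print(f"relative saving: {saving:.2f}")
```

With equal node capacitance the saving equals the threshold voltage divided by the supply voltage, so it becomes relatively larger as the supply is scaled down; adding a feedback pMOSFET restores the full swing and removes this saving on the affected node.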
Figure 5.2. CPL full adder (schematic not reproduced).
5.3.3
Double Pass-Transistor Logic - DPL
DPL [6] is a modified version of CPL. The circuit diagram of the DPL full
adder is given in Figure 5.3. In DPL circuits full-swing operation is achieved by
simply adding pMOSFET transistors in parallel with the nMOSFET transistors.
Hence, the problems of noise margin and speed degradation at reduced supply
voltages which are caused in CPL circuits due to the reduced high voltage level,
are avoided. However, the addition of pMOSFETs results in increased input
capacitances.
Figure 5.3. DPL full adder (schematic not reproduced).
5.3.4
Static Differential Cascode Voltage Switch Logic - SDCVSL
Static DCVSL [7] is a differential style of logic requiring both true and complementary signals to be routed to gates. Figure 5.4 shows the circuit diagram of
the static DCVSL full adder. Two complementary nMOSFET switching trees
are connected to a pair of cross-coupled pMOSFET transistors. Depending on
D R A F T
D R A F T
77
the differential inputs one of the outputs is pulled down by the corresponding
nMOSFET network. The differential output is then latched by the cross-coupled
pMOSFET transistors. Since the inputs drive only the nMOSFET transistors of
the switching trees, the input capacitance is typically two or three times smaller
than that of the conventional static CMOS logic.
Figure 5.4. Static DCVSL full adder circuit.
5.3.5 Static Differential Split-level Logic - SDSL
A variation of the differential logic described above is the Static DSL [9].
The SDSL full adder circuit diagram is illustrated in Figure 5.5. Two nMOSFET transistors with their gates connected to a reference voltage ($V_{ref} = V_{DD}/2 + V_{tn}$, where $V_{tn}$ is the nMOSFET threshold voltage) are added to reduce the
logic swing at the output nodes. The output nodes are clamped at half of the
supply voltage level. Thus, circuit operation becomes faster than in standard
DCVSL circuits. However, due to the incomplete turn-off of the cross-coupled
pMOSFET transistors, SDSL circuits dissipate high static power.
Also, the addition of two extra nMOSFET transistors per gate results in area
overhead.
Figure 5.5. SDSL full adder circuit.
5.3.6 DRDL
Figure 5.6. Domino logic full adder circuit (only the cell for the true carry signal is shown).
5.3.7 Dynamic Differential Cascode Voltage Switch Logic - DDCVSL
Dynamic DCVSL [8] is a combination of domino logic and
static DCVSL. The circuit diagram of the dynamic DCVSL full adder is given
in Figure 5.7. The advantage of this style over domino logic is the ability to
generate any logic function. Domino logic can only generate noninverted forms
of logic. For example, in the design of a ripple carry adder, two cells must be
designed for the carry propagation, one for the true carry signal and another
for the complementary one (Figure 5.6 shows only the cell for the true carry
signal, but the one for the complementary signal is also required). Using
DCVSL to design dynamic circuits eliminates p-logic gates because of the
inherent availability of complementary signals. The p-logic gates usually cause
long delay times and consume large areas.
Figure 5.7. Dynamic DCVSL full adder circuit.
5.3.8 ECDL
Figure 5.8. ECDL full adder circuit.
5.4
The experimental results described in this section were obtained using a four-bit ripple carry adder. A general block diagram of the adder is illustrated in
Figure 5.9.
Figure 5.9. Block diagram of the four-bit ripple carry adder: full adders FA 0 to FA 3 with inputs A0-A3, B0-B3 and C in, outputs S0-S3 and C out, and internal carries C1-C3.
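The structure of Figure 5.9 can be sketched behaviorally in Python as a chain of four full adders. This is only a functional model under our own naming (the functions `full_adder`, `ripple_carry_add` and `to_bits` are hypothetical helpers, not part of the design flow); it says nothing about the transistor-level styles compared in this chapter. The input pattern at the end is one example of a vector that makes the carry ripple through every stage.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two bit lists (least significant bit first) stage by stage."""
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry feeds the next stage
        sums.append(s)
    return sums, carry


def to_bits(value, width=4):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


# Example of a carry that ripples from FA 0 to FA 3:
# A = 1111, B = 0000, C in = 1.
sums, c_out = ripple_carry_add(to_bits(0b1111), to_bits(0b0000), c_in=1)
print("S3..S0 =", sums[::-1], "C out =", c_out)
```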
The circuit was designed in a full custom manner for all the design styles
described in the previous section, using a 1.5 µm CMOS process technology.
The channel width of the transistors was 4.8 µm for the nMOSFETs, and 9.6 µm
for the pMOSFETs. The design was based on the full adder cells presented in
Figures 5.1 to 5.8.
Figure 5.10 shows the layout of the conventional static four-bit ripple carry
adder, as an example of the designed circuits.
Figure 5.10. Layout of the conventional static four-bit ripple carry adder
In Table 5.1 the adder silicon area and the number of the transistors for each
design style are given. Although no extensive attempts were made to minimize
area, the numbers presented are a good indication of the relative areas of the
eight adder implementations, which account not only for the transistors, but for
the interconnections as well. For example, even though the DPL adder has fewer
transistors than the CSL one, it has longer interconnections, which is reflected
by its large area. Dynamic design styles and styles which use control signals
(such as ECDL) occupy extra area for the routing of the clock and the control
signals. The smallest area is occupied by the CPL circuit, which has fewer
transistors and shorter interconnections than the other adder implementations.
After the design of the layouts, circuit equivalents were extracted for a
detailed circuit simulation using HSPICE [2] to obtain the power and delay
measurements. In our experiments, a supply voltage of 5 V was used. All
measurements were obtained with each input supplied through a driver consisting of two minimum-sized inverters in series, and each output node driving a
minimum-sized inverter load.
Table 5.1. Area and number of transistors of the four-bit ripple carry adder implementations

Design Style    Adder Area (×10^[…] µm^2)    No. of Transistors
CSL             5.42                         144
CPL             4.46                         88
DPL             6.52                         136
SDCVSL          5.19                         114
SDSL            6.39                         130
DRDL            6.48                         146
DDCVSL          7.22                         154
ECDL            7.65                         166

The estimation of power dissipation is a difficult problem because of its data
dependency, and has received a lot of attention [17]. Some direct simulative
power estimation methods have been proposed [18,19], but they are expensive in
terms of time. Several power estimation methods have also been proposed
in which probabilities are used to solve the pattern-dependence problem. However,
in order to achieve good accuracy, the spatial and temporal correlations between
internal nodes should be modeled [20,21]. An alternative is the use of
statistical methods [22,23,17], which combine the accuracy of simulation-based
techniques with the speed of probabilistic approaches.
In this chapter, the statistical approach proposed by Burch et al. [22] is used
in order to estimate the power dissipation of our designs. Using the powermeter
sub-circuit proposed by Kang [18], HSPICE can measure the average power
consumed by a circuit given a set of input transitions and a time interval. In
this method, the inputs are randomly generated and statistical mean estimation
techniques are used to determine the final result. In our case, for each adder
design we use 200 independent, pseudorandom input transition samples, and
the power consumed for each sample is monitored by HSPICE. All
simulations were carried out at 27 °C, with an input frequency of 50 MHz in
order to accommodate the slowest adder. The power dissipation measurements do
not include the power consumed by the drivers and the loads. In Figure 5.11, the
probability distributions of the power dissipation per addition derived from the
measurements, for the eight adder implementations, are shown. Since the data
inputs are independent, power can be approximated as normally distributed
[22]. This conclusion can also be drawn from the curves of Figure 5.11.
Hence, the mean power dissipation is given by
$$ P_{av} = \bar{P} \pm \frac{t_{\alpha/2}\, s}{\sqrt{N}} \tag{5.4} $$
where $\bar{P}$ and $s$ are the sample average and the sample standard deviation of the power, $N$ is the number of samples, and $t_{\alpha/2}$ is obtained from the t-distribution for a $(1-\alpha)$ confidence
interval [24]. The mean power dissipation of the eight adder implementations,
obtained from the simulation results and Eq. (5.4), is given in Table 5.2.
The number of required samples is determined by the stopping criterion
[22] of the above method,
$$ \frac{t_{\alpha/2}\, s}{\bar{P}\, \sqrt{N}} < \varepsilon \tag{5.5} $$
where $\bar{P}$ and $s$ are the sample average and standard deviation of the power, $N$ is the number of samples, and $\varepsilon$ is the desired percentage error in the power estimate. The error in our
statistical power analysis for $N$ = 200 and a 95% confidence interval ($t_{\alpha/2}$ = 1.96)
is less than 7%. In Table 5.2, the percentage error for each adder design is also
given. For the last four designs the error is quite small because of the high
normality of their distributions, which leads to a small standard deviation.
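The statistical mean estimation described above can be sketched in a few lines of Python. The sketch assumes the usual form of the method, in which the confidence half-width is t * s / sqrt(N) and the percentage error is that half-width divided by the sample mean; the function names are ours, and the 200 samples are synthetic numbers drawn from a normal distribution purely for illustration, not HSPICE measurements.

```python
import math
import random
import statistics

T_95 = 1.96  # value quoted in the text for a 95% confidence interval


def mean_with_interval(samples, t=T_95):
    """Return (sample mean, confidence half-width t * s / sqrt(N))."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (N - 1 divisor)
    return mean, t * s / math.sqrt(n)


def percentage_error(samples, t=T_95):
    """Half-width of the interval as a percentage of the sample mean."""
    mean, half_width = mean_with_interval(samples, t)
    return 100.0 * half_width / mean


def enough_samples(samples, epsilon_percent, t=T_95):
    """Stopping criterion: stop once the error drops below epsilon."""
    return percentage_error(samples, t) < epsilon_percent


# SYNTHETIC per-sample power values in mW (illustrative only).
rng = random.Random(0)
power_samples = [rng.gauss(0.42, 0.05) for _ in range(200)]

mean, half_width = mean_with_interval(power_samples)
print(f"mean power = {mean:.3f} +/- {half_width:.3f} mW")
print(f"error = {percentage_error(power_samples):.2f} %")
```

In practice the check `enough_samples` would be evaluated as samples accumulate, so that the simulation can stop as soon as the requested accuracy is reached.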
Figure 5.11. Probability distributions of the power dissipation per addition for the eight adder implementations.
The delay of each design was measured directly from the output waveforms
generated by simulating the adder using HSPICE for the worst-case inputs, that
is, inputs which cause the carry to ripple from the least significant bit position to
the most significant bit position. The worst-case delays of the eight adder designs
are listed in the fourth column of Table 5.2. As mentioned in Section 5.1,
the most essential metric of performance in modern VLSI applications is the
power-delay product. By multiplying each power measurement by the worst-case
delay, we can find the mean power-delay product of the designs using a
method similar to that used for the mean power dissipation. Hence, the mean
power-delay product is given by
$$ PD_{av} = \overline{PD} \pm \frac{t_{\alpha/2}\, s_{PD}}{\sqrt{N}} \tag{5.6} $$
where $\overline{PD}$ is the sample average power-delay product, $s_{PD}$ is its sample standard deviation and $N$ is the number of samples. The mean power-delay product values of the eight adder designs are listed in Table 5.2, and the
probability distributions of the power-delay product are shown in Figure 5.12.
Figure 5.12. Probability distributions of the power-delay product for the eight adder implementations.
Table 5.2. Power dissipation, delay and power-delay product of the four-bit ripple carry adder implementations

Adder Design    Mean Power Dissipation    Error    Delay    Mean Power-Delay Product
Style           per addition (mW)         (%)      (ns)     per addition (pJ)
CSL             0.422 ± 0.0302            6.1      6.125    2.585 ± 0.1850
CPL             0.238 ± 0.0208            4.8      4.042    0.962 ± 0.0841
DPL             0.305 ± 0.0263            6.9      3.345    1.020 ± 0.0879
SDCVSL          0.432 ± 0.0362            6.5      7.986    3.450 ± 0.2891
SDSL            2.383 ± 0.0129            0.6      4.606    10.976 ± 0.0594
DRDL            0.641 ± 0.0091            1.4      2.909    1.865 ± 0.0265
DDCVSL          0.957 ± 0.0074            0.8      3.453    3.304 ± 0.0255
ECDL            1.721 ± 0.0096            0.6      2.892    4.977 ± 0.0278
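As a quick consistency check on Table 5.2, the short Python sketch below multiplies each mean power value by the corresponding worst-case delay and compares the result with the tabulated mean power-delay product (1 mW x 1 ns = 1 pJ). The dictionary reflects our reading of the table columns; the variable names are ours.

```python
# style: (mean power in mW, worst-case delay in ns, mean power-delay product in pJ)
TABLE_5_2 = {
    "CSL":    (0.422, 6.125, 2.585),
    "CPL":    (0.238, 4.042, 0.962),
    "DPL":    (0.305, 3.345, 1.020),
    "SDCVSL": (0.432, 7.986, 3.450),
    "SDSL":   (2.383, 4.606, 10.976),
    "DRDL":   (0.641, 2.909, 1.865),
    "DDCVSL": (0.957, 3.453, 3.304),
    "ECDL":   (1.721, 2.892, 4.977),
}

for style, (power_mw, delay_ns, pdp_pj) in TABLE_5_2.items():
    computed = power_mw * delay_ns  # mW * ns = pJ
    print(f"{style:7s} computed {computed:7.3f} pJ, tabulated {pdp_pj:7.3f} pJ")

# Styles ranked by power-delay product, lowest (best) first.
ranking = sorted(TABLE_5_2, key=lambda style: TABLE_5_2[style][2])
print("best two styles:", ranking[:2])
```

The two lowest products belong to the pass-transistor styles, in line with the discussion of CPL and DPL that follows.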
faster than the CPL, because the addition of pMOSFET transistors in parallel
with the nMOSFET transistors results in higher circuit drivability. Also, DPL
avoids the problems of noise margin and speed degradation at reduced supply
voltages which are caused in CPL circuits. As shown in Figure 5.12 and in
Table 5.2, the two styles exhibit similar power-delay product characteristics,
and they are the most efficient for low-power and high-speed applications.
The mean power dissipation and the propagation delay values of the eight
adder implementations are summarized in Figure 5.13. The fast adder circuits
lie to the left of the figure, and those with low power consumption lie toward
the bottom of the figure.
Figure 5.13. Power dissipation versus delay of the adder implementations (horizontal axis: Delay (ns))
5.5 Adders
logic transitions is more desirable as it will require less dynamic power. This is
only a first-order approximation, as the power also depends on switching speed,
gate size, fan-out, output loading, etc.
The following types of adders were simulated: Ripple Carry, Constant Block
Width Single-level Carry Skip, Variable Block Width Multi-level Carry Skip,
Carry Lookahead, Carry Select, and Conditional Sum. Table 5.3 presents the
worst-case number of gate delays, the number of gates, and the average number
of logic transitions for the six 16-bit adder types. All the gates are assumed to
have the same delay, regardless of the fan-in or fan-out.
Table 5.3. Worst Case Delay, Number of Gates, and Average Number of Logic Transitions for a 16-bit Adder

Adder Type                                      Worst Case Delay    Number of    Average Number of
                                                (in gate units)     Gates        Logic Transitions
Ripple Carry                                    36                  144          90
Constant Block Width Single-level Carry Skip    23                  156          102
Variable Block Width Multi-level Carry Skip     17                  170          108
Carry Lookahead                                 10                  200          100
Carry Select                                    14                  284          161
Conditional Sum                                 12                  368          218
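One way to read Table 5.3 is as a two-objective trade-off between worst-case delay and average number of logic transitions. The Python sketch below, which is our own illustration and not part of the original study, keeps only the adders for which no other adder is at least as good in both metrics and strictly better in one. It inherits the first-order assumption that fewer transitions mean less dynamic power.

```python
# adder type: (worst-case delay in gate units, number of gates, avg. logic transitions)
TABLE_5_3 = {
    "Ripple Carry": (36, 144, 90),
    "Constant Block Width Single-level Carry Skip": (23, 156, 102),
    "Variable Block Width Multi-level Carry Skip": (17, 170, 108),
    "Carry Lookahead": (10, 200, 100),
    "Carry Select": (14, 284, 161),
    "Conditional Sum": (12, 368, 218),
}


def dominates(b, a):
    """True if b is no worse than a in delay and transitions, and better in one."""
    no_worse = b[0] <= a[0] and b[2] <= a[2]
    strictly_better = b[0] < a[0] or b[2] < a[2]
    return no_worse and strictly_better


# Adders that no other adder dominates in (delay, transitions).
non_dominated = [
    name for name, metrics in TABLE_5_3.items()
    if not any(dominates(other, metrics) for other in TABLE_5_3.values())
]
print(non_dominated)
```

With the values of Table 5.3, only the ripple carry adder (fewest transitions) and the carry lookahead adder (shortest delay) remain; the gate count, and hence area, is deliberately left out of this comparison.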
5.6 Multipliers
The majority of real-life applications, such as microprocessors and digital processing implementations, require the computation of the multiplication
operation. Specifically, a speed-, area- and power-efficient implementation of a
multiplier is a very challenging problem. Here, four well-known multipliers, i)
the Array Multiplier [25], ii) the Split Array Multiplier [26], iii) the Wallace Tree
Multiplier [27] and iv) the Radix-4 Modified Booth Recoded Wallace Tree
Multiplier [28], are studied in terms of power consumption.
Two kinds of measurements and comparisons in terms of different design
parameters are performed, providing the designer with a plethora of alternative implementations. In particular, we provide SPICE-like measurements with respect
to the average logic transitions as well as the power consumption. The second
5.6.1
The multipliers were described using only AND, OR, and INVERT gates.
The simulation was made using a program called CazM [29], which is similar
to SPICE. Each multiplier was fed with 1,000 pseudorandom inputs. For the
sake of completeness, the carry save array multiplier and the Wallace tree are
presented in the following section, in order to briefly describe the architectures
of the most common multipliers. The former is a representative example of
array multipliers, while the Wallace tree is an efficient way to add multiple
partial products together.
A gate-level simulation with 10,000 pseudorandom inputs enabled the gathering of the average number of gate-output transitions for each multiplier. For
each input, the number of gates that switch output states is recorded, and the
average number of gate-output transitions per multiplication is computed at
the end of the simulation. Table 5.4 presents the results.
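Table 5.4. Average number of gate-output transitions per multiplication

Multiplier Type    8-bit    16-bit    32-bit
Array              570      7224      99906
Split array        569      4874      52221
Wallace            549      3793      20055
Modified Booth     964      3993      19542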
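The counting procedure behind Table 5.4 can be illustrated with a very small Python sketch. It simulates a hypothetical netlist of a single full adder built only from AND, OR and INVERT gates, applies pseudorandom input vectors, and counts how many gate outputs differ between consecutive vectors. The netlist, the names and the zero-delay evaluation are our assumptions: a zero-delay model sees only the final value of each gate, so it does not capture the glitches that a timing-accurate simulator would record.

```python
import random

# (output net, gate type, input nets), listed in topological order.
# HYPOTHETICAL full adder netlist using only AND, OR and NOT (INVERT) gates.
NETLIST = [
    ("na", "NOT", ("a",)),
    ("nb", "NOT", ("b",)),
    ("t1", "AND", ("a", "nb")),
    ("t2", "AND", ("na", "b")),
    ("x", "OR", ("t1", "t2")),        # x = a XOR b
    ("nx", "NOT", ("x",)),
    ("nc", "NOT", ("c",)),
    ("t3", "AND", ("x", "nc")),
    ("t4", "AND", ("nx", "c")),
    ("sum", "OR", ("t3", "t4")),      # sum = x XOR c
    ("t5", "AND", ("a", "b")),
    ("t6", "AND", ("x", "c")),
    ("carry", "OR", ("t5", "t6")),
]


def evaluate(inputs):
    """Zero-delay evaluation: returns the value of every net."""
    values = dict(inputs)
    for out, kind, ins in NETLIST:
        args = [values[name] for name in ins]
        if kind == "NOT":
            values[out] = 1 - args[0]
        elif kind == "AND":
            values[out] = int(all(args))
        else:  # "OR"
            values[out] = int(any(args))
    return values


def average_transitions(num_vectors, seed=0):
    """Average number of gate outputs that change per applied input vector."""
    rng = random.Random(seed)
    previous = evaluate({"a": 0, "b": 0, "c": 0})
    total = 0
    for _ in range(num_vectors):
        vector = {name: rng.randint(0, 1) for name in ("a", "b", "c")}
        current = evaluate(vector)
        total += sum(current[out] != previous[out] for out, _, _ in NETLIST)
        previous = current
    return total / num_vectors


print(f"average gate-output transitions: {average_transitions(10000):.2f}")
```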
5.6.2
Table 5.5.

Multiplier Type    Power (mW)    Logic Transitions
Array              43.5          7224
Split array        38.0          4874
Wallace            32.0          3793
Modified Booth     41.3          3993
into account [30]. Specifically, the activity per node, obtained from the logic-level
simulation that takes place in a second step, is combined with the capacitance
per node to compute the power for a certain input vector, according to
$$ P = \sum_{i} C_i\, V_{dd}^{2}\, f\, E_i \tag{5.7} $$
where $C_i$ is the capacitance at node $i$, $V_{dd}$ is the power supply voltage, $f$ is
the frequency and $E_i$ is the activity factor at node $i$. The term $f E_i$ of Eq. 5.7
is actually the number of transitions from logic 1 to logic 0 per time unit for
the node $i$, which is equal to the number of node transitions from logic
1 to logic 0 divided by the total number of input vectors:
$$ f E_i = \frac{N_i(1 \to 0)}{N_{vectors}} \tag{5.8} $$
Substituting Eq. 5.8 into Eq. 5.7 gives
$$ P = \sum_{i} C_i\, V_{dd}^{2}\, \frac{N_i(1 \to 0)}{N_{vectors}} \tag{5.9} $$
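The combination of per-node capacitance and per-node activity can be sketched in Python as follows. The sketch assumes the usual first-order form, in which each node contributes its capacitance times the square of the supply voltage times the fraction of input vectors on which it makes a 1-to-0 transition; with one vector taken as the time unit this yields the average energy per operation, and multiplying by an assumed vector rate yields an average power. The function names and the node data are hypothetical.

```python
def average_energy_per_operation(nodes, v_dd, num_vectors):
    """Average energy per input vector in joules.

    nodes: iterable of (capacitance in farads, number of 1->0 transitions
    observed at that node over the whole simulation).
    """
    return sum(cap * v_dd ** 2 * count / num_vectors for cap, count in nodes)


def average_power(nodes, v_dd, num_vectors, vector_rate_hz):
    """Average power in watts for an ASSUMED rate of input vectors per second."""
    return average_energy_per_operation(nodes, v_dd, num_vectors) * vector_rate_hz


# HYPOTHETICAL data: two nodes observed over 1000 input vectors.
example_nodes = [
    (10e-15, 500),  # 10 fF node, 500 transitions from 1 to 0
    (20e-15, 250),  # 20 fF node, 250 transitions from 1 to 0
]
energy = average_energy_per_operation(example_nodes, v_dd=5.0, num_vectors=1000)
power = average_power(example_nodes, v_dd=5.0, num_vectors=1000, vector_rate_hz=1e6)
print(f"energy per operation: {energy * 1e12:.3f} pJ")
print(f"power at 1 MHz:       {power * 1e6:.3f} uW")
```

Reporting energy per operation rather than power makes the result independent of the operating frequency, which is convenient when modules simulated at different frequencies are compared.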
Following this procedure, the power estimation errors are in the range of
10-25% [31] compared with a SPICE transistor-level simulator. However, the
accuracy of the estimates suffices for the purpose of comparing alternative
module architectures, since it is the relative evaluation that matters and not the
absolute accuracy. The 8-bit wide input modules were simulated with 50,000
random vectors, the 16-bit modules with 100,000 vectors, the 32-bit modules
with 150,000 random vectors, and the 64-bit modules with 200,000 vectors. It
should be stressed here that the energy figures, given later in the characterization
sections of the arithmetic components, correspond to the average energy per
operation. The difference between the two characterization procedures in the number
of test vectors is significant.
[Figure: characterization flow with the blocks Synthesis Parameters Set, Parametrical VHDL Description, Target Technology, Logic Level Description, Delay Estimation, Area Estimation, Logic Level Simulation, Capacitance per Node, Switching Activity per Node and Power Estimation.]
In this section, power measurements for the synthesized
multipliers, namely the carry save array, the Booth encoded Wallace tree and
the non-Booth encoded Wallace tree, are presented, together with an analysis of the results
and comparisons with previous work.
Table 5.6 presents the power estimates for the synthesized multipliers. The
power measurements are normalized by frequency, because the
simulations used different operating frequencies. In this way, a representative
power measure is given for every kind of multiplier and for every bit width. As
shown in Table 5.6, the most power-efficient multiplier for small bit-widths
(less than 32 bits) is the carry save array, although by a small margin compared
to the Wallace tree with non-Booth encoding. In the 64-bit implementations,
the Wallace tree multiplier with non-Booth encoding is the most power-efficient
choice. This is explained by the glitches arising from the ripple of carries in
the array multiplier for large bit-widths. Finally, for all bit-widths, the Wallace
tree multiplier with Booth encoding has the highest power dissipation.
Table 5.6. Power (mW)

Multiplier Type / Multiplier Width    8-bit     16-bit    32-bit      64-bit
Carry Save Array                      0.3084    2.2484    3.057023    20.96759
Wallace Tree (Booth Encoded)          0.7868    3.1204    5.384       23.7164
Wallace Tree (Non Booth Encoded)      0.5488    2.5992    4.2212      18.8588
For comparison purposes, Table 5.7 shows the results presented in [29], considering 16-bit implementations of array and Wallace tree multipliers. These
designs were described using only AND, OR, and INVERT gates. The implementation technology was a 2-level metal 2-µm process. It can be seen that the
array multiplier consumes more energy than the Wallace tree multiplier, which
contradicts the corresponding values shown in Table 5.6. There, only in the 64-bit
case does the array multiplier consume more energy than the Wallace tree multiplier. The reason for this is the spurious transitions caused by the rippling
of carries in the 64-bit array multiplier, which the interconnect
area/capacitance switched by the array multiplier cannot compensate for.
For the remaining cases, the greater interconnect and cell area of
the Wallace multiplier dominates the power performance of these two kinds of
multipliers.
Table 5.7.

Multiplier    Power (mW)    Logic transitions
Array         43.5          7224
Wallace       32            3793
The Wallace multiplier with Booth encoding dissipates the most power, although
it is not the largest. It is the fastest multiplier for bit-widths larger than 16 bits,
and it can be assumed that Booth encoding is a rather power-hungry operation.
Finally, Table 5.8 shows the Power Delay product of the multipliers at a
frequency of 1 MHz. More specifically, the carry save array multiplier exhibits
the worst product for all bit-widths except 8-bit, due to its large delay
and substantial power consumption. Although the non-Booth Wallace tree
multiplier is the largest, it shows the best Power Delay product for every
bit-width. Where speed is of great interest, especially if large bit widths are
required, and chip area is not a problem, the non-Booth encoded Wallace tree
multiplier is the best candidate for selection.
Table 5.8. Power*Delay (mW*ns)

Multiplier Type / Multiplier Width    8-bit       16-bit      32-bit      64-bit
Carry Save Array                      4.13256     53.51192    141.9987    2090.049
Wallace Tree (Booth Encoded)          4.980444    25.96173    58.09336    312.345
Wallace Tree (Non Booth Encoded)      3.172064    19.8059     44.95578    251.9536
5.7 Conclusions
In this chapter, the most common kinds of adders and multipliers have been
characterized in terms of power, using either a traditional low-level design flow
paradigm, which is rather tedious and incompatible with modern design flows
but provides the most accurate results, or a high-level design flow paradigm,
which is commonly used.
A four-bit ripple carry adder was used as the benchmark circuit. All the circuits were designed in a full-custom manner and simulated using HSPICE.
A statistical approach was used to analyze the simulation results. It
has been shown that the circuits which use pass-transistor logic (CPL and DPL)
exhibit better power and power-delay product characteristics than the
other design styles.
The array multiplier is power-efficient for small bit widths. Its power consumption grows in proportion to the cube of the word size. The Wallace multiplier is less regular but more power-efficient, and its power dissipation
grows with the square of the word size.
The synthesized speed-optimized carry look-ahead adder trades its speed for
the worst power consumption among all the synthesized adders investigated.
In contrast to the speed-optimized architecture of the
carry look-ahead adder, a non-optimized architecture is power-efficient, though
much slower. The ripple carry adder is power-efficient, too.
References
[1] A. Chandrakasan, R. Brodersen, Low Power Digital Design, Kluwer Academic Publishers, 1995.
[2] Meta-Software, HSPICE Users Manual - Version 96.1, 1996.
[3] K. Yano, Y. Sasaki, K. Rikino, K. Seki, Top-Down Pass-Transistor Logic
Design, IEEE Journal of Solid-State Circuits, vol. 31, pp. 792-803, 1996.
[4] MIPS Technologies, R4200 Microprocessor Product Information, MIPS
Technologies Inc., 1994.
[5] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, A.
Shimizu, A 3.8-ns CMOS 16 × 16-b Multiplier Using Complementary
Pass-Transistor Logic, IEEE Journal of Solid-State Circuits, vol. 25, pp.
388-395, 1990.
[6] M. Suzuki, N. Ohkubo, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, Y.
Nakagome, A 1.5-ns 32-b CMOS ALU in Double Pass-Transistor
Logic, IEEE Journal of Solid-State Circuits, vol. 28, pp. 1145-1151, 1993.
[7] L. Heller, W. Griffin, J. Davis, N. Thoma, Cascode Voltage Switch Logic:
A Differential CMOS Logic Family, Proceedings of IEEE International
Solid-State Circuit Conference, pp. 16-17, 1984.
[8] K. Chu, D. Pulfrey, Design Procedures for Differential Cascode Voltage
Switch Circuits, IEEE Journal of Solid-State Circuits, vol. 21, pp. 1082-1087, 1986.
[9] L. Pfennings, W. Mol, J. Bastiaens, J. Van Dijk, Differential Split-level
CMOS Logic for Subnanosecond Speeds, IEEE Journal of Solid-State
Circuits, vol. 20, pp. 1050-1055, 1985.