Joshi Thesis 2016
Devices.
by
A Thesis
In
Electrical Engineering
MASTER OF SCIENCES
IN
ELECTRICAL ENGINEERING
Approved
Mark Sheridan
Dean of the Graduate School
May, 2016
© Ashish Joshi, 2016
Texas Tech University, Ashish Joshi, May 2016
ACKNOWLEDGEMENTS
I would like to sincerely thank my supervisor, Dr. Nikoubin, for giving me the
opportunity to pursue my thesis under his guidance. He has been a steady source of
support throughout the journey, and his thoughtful ideas on the problems I faced have
been a tremendous help. His immense knowledge of VLSI design has been a rich source
that I have drawn on since the beginning of my research.
I am especially indebted to my thesis committee members, Dr. Bayne and Dr. Nutter.
They have been very gracious and generous with their time, ideas and support. I
appreciate Dr. Nutter’s insights in discussing my ideas and the depth to which he pushes
me to think.
I would like to thank Texas Instruments and my colleagues there, Mayank Garg, Jun,
Alex, Amber, William, Wenxiao, Shyam, Toshio and Suchi, for giving me the opportunity
to do a summer internship with them. I continue to be inspired by their hard work and
innovative thinking. I learnt a lot during that time, and it helped me identify my field
of interest. The internship not only helped me with the technical aspects but also built
my confidence to accept challenges and come up with innovative solutions.
I have had the great pleasure of working with Dr. Li. He helped me understand the
intricacies of analog design, which further strengthened my interest in mixed-signal
design and verification. His guidance to his students goes well beyond the regular
duty of a course instructor.
I am highly indebted and thankful to my family, Dr. Surinder Kumar Joshi, Mrs. Renu
Joshi and Ashinder Joshi, for their continual moral support, encouragement and
confidence in me, without which this work would not have been possible.
Lastly, to all my friends, thank you for your understanding and encouragement in many
moments of crisis. I cannot list all the names here, but you are always on my mind.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
1. INTRODUCTION
Appendix A
ABSTRACT
In this thesis, we propose a new methodology for implementing logic cells with
complementary and single outputs. Logic cells with single and complementary outputs
have been designed in the CDM logic style, which is analyzed and compared with the
conventional CMOS logic style using FinFET devices in super-threshold operation. A
comprehensive study of the performance parameters of these logic cells, implemented in
both the CDM and C-CMOS logic styles, is presented. Standard cell libraries with FinFET
logic gates in the CDM and static CMOS logic styles have been developed in five
technology nodes (7nm, 10nm, 14nm, 16nm and 20nm) and used to synthesize the ISCAS’85
benchmark designs to evaluate the performance improvement. The Synopsys SiliconSmart
and Library Compiler tools were used to generate the standard cell libraries from the
PTM FinFET device models, and Design Compiler was used to synthesize the designs with
the developed libraries. The simulation results show that the CDM-based standard cell
library achieves an average power improvement of 17-21% and an average PDP improvement
of 7-26% over the conventional CMOS standard cell library for all benchmark designs
across the 7nm, 10nm, 14nm, 16nm and 20nm technology nodes. Hence we demonstrate
that our low-power standard cell design is comparable to the contemporary custom
design optimization techniques used to save power.
LIST OF TABLES
LIST OF FIGURES
4.18. Test Bench for the 3 input NAND-AND in CMOS and CDM Logic Style
4.19. Output Waveforms for 3 input CDM NAND-AND logic gate
4.20. Output Waveforms for 3 input CMOS NAND-AND logic gate
4.21. PDP for 3-NAND-AND in CMOS and CDM with varying load cap.
4.22. Delay vs Power comparison for 3 Input CMOS and CDM AND-NAND
4.23. Schematic of 3 input NOR-OR gate in CDM logic Style
4.24. Test Bench for 3 input CMOS and CDM NOR-OR gate
4.25. Output waveforms from the CDM NOR-OR Logic cell
4.26. Output Waveforms from CMOS NOR-OR Logic cell
4.27. PDP for the CMOS and CDM NOR-OR with varying load capacitance
4.28. Delay vs Power comparison for CMOS and CDM OR-NOR
4.29. Single Output CDM basic cells (a) Single Level (b)-(d) Two Level
4.30. Schematic of AND gate with CDM single output logic style
4.31. Test Bench for CMOS and CDM AND gate
4.32. Output Waveforms from CMOS and CDM AND gate
4.33. Schematic of the OR gate in the single output CDM logic style
4.34. Test Bench for CMOS and CDM OR gate
4.35. Output waveforms from the CMOS and CDM OR gate
4.36. Schematic of 3 input AND gate in CDM single output logic style
4.37. Test Bench for CMOS and CDM 3 Input AND gate
4.38. Output waveforms from 3 input AND gate in CDM and CMOS
4.39. Schematic of 3 Input OR gate in CDM single output logic style
4.40. Test Bench for 3 Input CMOS and CDM OR gate
4.41. Output Waveforms from 3 Input OR gate in CMOS and CDM
4.42. Schematic of Half Adder in single output CDM logic style
4.43. Test Bench for CDM and CMOS half adder
4.44. Output waveforms from the half adder in CMOS and CDM logic style
4.45. Schematic of Full Adder in CDM single output logic style
4.46. Test Bench for CDM and CMOS Full adder
4.47. Output waveforms from the Full adder in CMOS and CDM
4.48. Schematic of 4:2 compressor in single output CDM logic style
4.49. Test Bench for CMOS and CDM 4:2 compressor design
4.50. Output Waveforms from 4:2 compressor in CMOS and CDM
4.51. Schematic of 4 bit by 4 bit multiplier in CDM logic style
4.52. Test Bench for CDM and CMOS 4 bit by 4 bit multiplier
4.53. Output Waveforms from the Multiplier in CMOS logic style
4.54. Output Waveforms from the Multiplier in CDM logic style
5.1. CDM Logic Cells (a) Single Level (b)-(d) Two Level
5.2. Standard Cell Library Design Flow
5.3. ISCAS-85 c6288 16x16 multiplier
5.4. Full adder module for ISCAS-85 c6288 16x16 multiplier
5.5. Power Improvement with CDM over CMOS standard cell libraries
5.6. PDP Improvement with CDM over CMOS standard cell libraries
5.7. Binary to BCD converter design with CBLD algorithm
5.8. Normalized Pre-Layout synthesis with 90nm technology
5.9. Normalized Post-Layout Synthesis with 90nm technology
5.10. Pre-Layout delay result for 14, 7 and 5nm technology node
5.11. Pre-Layout PDP results for 14nm (CMOS), 7nm & 5nm (FinFET)
5.12. Power Dissipation with CDM and CMOS in 7nm technology
5.13. Power Delay Product with CDM and CMOS in 7nm technology
CHAPTER 1
INTRODUCTION
The primary contribution of this work is a low-power-driven design methodology based
on a standard cell library. We have designed the standard cell library in a new logic
style. The results obtained are comparable to the power-saving figures of various
glitch reduction methodologies tailored for the full custom design flow, thus
reducing the performance gap between the two design styles.
1.1 Motivation
1.1.1 Why Do We Need Low Power
The continual decrease in feature size, the corresponding increase in device density
and high operating frequencies have made power consumption a major concern in VLSI
design. Excessive power consumption in integrated circuits discourages their use in
portable systems. It also causes overheating, which reduces the reliability and
lifetime of the chip. To control the temperature within the chip, specialized cooling
and packaging techniques are used, which further increases the chip cost. The growing
demand for portable communication and computing systems has increased the need for
power optimization within the chip. Hence low-power design is a critical technology
for the semiconductor industry today. At the same time, we need to decrease the
critical path delay while reducing the overall power consumption of the chip.
1.1.2 Why Improve the Standard Cell Based Design Flow
Standard cell design is a semi-custom design style based on a set of prefabricated
standard cells. The design flow uses highly automated synthesis and place-and-route
tools that rely on highly optimized algorithms. This reduces the manual effort
required to complete the design in silicon. The existing semi-custom design flow,
however, leaves no flexibility to optimize power consumption by reducing the glitch
power. Hence there is a strong need to design a standard cell library whose cells have
non-skewed (balanced) outputs, so that glitches and their propagation within the
design are minimized and the power consumption is reduced.
1.2 Contribution of the Thesis
In this thesis, we have designed a standard cell library whose cells have balanced
outputs (for multi-output cells) using a new logic style called CDM, thereby
minimizing the glitches within the design. We have applied the proposed technique to
the ISCAS’85 benchmark circuits and found that our methodology is capable of producing
the minimum transient energy design. The standard cell library has been designed with
the FinFET device models from PTM, based on BSIM-CMG, at five different technology
nodes (7nm, 10nm, 14nm, 16nm and 20nm) and has been used to synthesize the benchmark
designs. Simulation results show a power improvement of 20-21% and an energy (PDP)
improvement of 7-21% compared with the standard cell libraries designed in the
C-CMOS logic style at the same technology nodes.
1.3 Organization of the Thesis
A detailed explanation of the thesis work is provided in the following chapters.
Chapter 2 reviews the basic digital design flows and the main sources of power
consumption in CMOS digital ICs. Chapter 3 demonstrates the advantages of FinFET
devices over planar bulk CMOS devices with short channels, and hence the reason for
using FinFET devices in designing the standard cell library for technology nodes such
as 7nm and 10nm. Chapter 4 introduces the new logic style, Cell Design Methodology
(CDM). Cells are designed in the CDM logic style with both complementary and single
outputs and compared with their C-CMOS implementations. The power and PDP improvement
obtained for each cell in CDM over C-CMOS is shown with various simulation results.
Chapter 5 describes the flow used for the standard cell library design and the
experimental setup used to prove the proposed concept, and presents the results from
the ISCAS’85 benchmark circuits. Finally, Chapter 6 presents the conclusions from our
experiments and the proposed future work.
CHAPTER 2
Where
• Pdyn : Dynamic Power Dissipation of the Gate
• Cload: Load capacitance of the Gate
• Vdd: Power Supply
• f: Clock frequency
• D: Switching probability
The dynamic power dissipation is thus proportional to the number of transitions
occurring at the gate, so an accurate estimate of the switching probability in the
circuit provides an estimate of the dynamic power dissipation. In earlier
technologies, dynamic power accounted for most of the power dissipated within the
circuit, but with the advent of deep sub-micron technologies the other components of
the total power consumption are also becoming significant. Dynamic power can be
divided into the switching activity necessary for correct functionality and the
unnecessary transitions caused by unbalanced paths and skewed (unbalanced) outputs
from the cells in the circuit. The latter component of the dynamic power is called
glitching power and is explained in the next section.
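The effect of extra transitions on dynamic power can be illustrated with a short numerical sketch. The equation itself does not survive in the text above, so the form P_dyn = D * C_load * Vdd^2 * f used here is an assumption based on the listed symbols (some references include an additional factor of 1/2). The operating point and the two switching probabilities are hypothetical values chosen only for illustration.

```python
def dynamic_power(c_load, vdd, freq, activity):
    """Return dynamic power in watts, assuming P_dyn = D * C_load * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


# Hypothetical operating point (not taken from the thesis).
C_LOAD = 1e-15   # 1 fF load capacitance
VDD = 0.7        # supply voltage in volts
FREQ = 1e9       # 1 GHz clock

# Same gate, without and with extra glitch transitions (assumed activities).
p_functional = dynamic_power(C_LOAD, VDD, FREQ, activity=0.10)
p_with_glitches = dynamic_power(C_LOAD, VDD, FREQ, activity=0.15)

print(f"functional only : {p_functional * 1e9:.2f} nW")
print(f"with glitches   : {p_with_glitches * 1e9:.2f} nW")
print(f"glitch overhead : {(p_with_glitches / p_functional - 1) * 100:.0f} %")
```

Because the assumed expression is linear in D, any glitch-induced increase in switching probability translates directly into the same relative increase in dynamic power.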
2.3.1.1 Hazards and Glitch Power
Before the signals of a digital circuit reach steady state, gates can undergo multiple
transitions. Since the power consumed is proportional to the number of transitions,
these unnecessary transitions increase the power dissipation. They are called
glitches or hazards. Glitches occur because signals arrive at the inputs of a gate at
unequal times. Glitch power contributes significantly to the overall power
dissipation in typical cases such as adders.
Consider the example of Fig. 2.3, with each gate having one unit of delay. Because the
inputs arrive at the AND gate at unequal times, the output of the AND gate glitches
and transmits a pulse one unit wide, which equals the inverter delay. This is known
as a static hazard.
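The hazard described above can be reproduced with a small unit-delay simulation. Fig. 2.3 is not available in this text, so the circuit below, an AND gate fed by an input A and by the inverted copy of A, is an assumed stand-in that matches the description (one inverter on one path, unit delay per gate). The function name and the input waveform are hypothetical.

```python
def simulate_hazard(a_wave):
    """Unit-delay simulation of y = AND(a, NOT a).

    Each gate output at time t depends on its inputs at time t - 1.
    The circuit is assumed to be settled at t = 0.
    """
    n_wave = [1 - a_wave[0]]            # inverter output, settled
    y_wave = [a_wave[0] & n_wave[0]]    # AND output, settled (always 0)
    for t in range(1, len(a_wave)):
        n_wave.append(1 - a_wave[t - 1])              # inverter: one unit of delay
        y_wave.append(a_wave[t - 1] & n_wave[t - 1])  # AND gate: one unit of delay
    return n_wave, y_wave


# Input A rises at t = 2.
a = [0, 0, 1, 1, 1, 1]
n, y = simulate_hazard(a)

print("A     :", a)
print("NOT A :", n)
print("Y     :", y)
print("glitch width (time units):", sum(y))
```

In steady state Y is 0 for either value of A, so any 1 that appears on Y is a glitch. Its width equals the delay difference between the two paths into the AND gate, which here is the single inverter delay.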
2.3.2 Short Circuit Power Dissipation
Short circuit power dissipation occurs when the gate switches. During the transition
there is a short time when both the nMOS and the pMOS transistors conduct. This is
equivalent to shorting the power supply to ground for a brief interval. The current
flowing during these transitions dissipates power, called the short circuit power
dissipation.
The value of the short circuit current depends on the capacitance connected to the
output of the gate. Consider the example shown in Fig. 2.4. For a large load
capacitance, the output fall time is significantly larger than the input rise time;
conversely, for a small load capacitance, the output fall time is substantially
smaller than the input rise time. The short circuit power dissipation can be
calculated with the following formula:
$$P_{short\text{-}circuit} = \frac{\beta}{12}\,(V_{dd} - 2V_T)^3\,\frac{\tau}{T} \tag{2}$$
Where
• Pshort-circuit: Short circuit power dissipation
• β: gain factor of the gate
• Vdd: Supply voltage
• VT: threshold voltage of the transistors
• τ: rise or fall time of the input signal
• T: Clock period
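As a minimal sketch, equation (2) can be evaluated directly. The numerical values below (gain factor, supply, threshold, input edge time and clock period) are hypothetical and serve only to show how the terms combine.

```python
def short_circuit_power(beta, vdd, vt, tau, period):
    """Evaluate equation (2): (beta / 12) * (Vdd - 2*VT)^3 * (tau / T), in watts."""
    return (beta / 12.0) * (vdd - 2.0 * vt) ** 3 * (tau / period)


# Hypothetical values, not taken from the thesis.
BETA = 1e-4      # gain factor in A/V^2
VDD = 0.7        # supply voltage in V
VT = 0.2         # threshold voltage in V
TAU = 10e-12     # input rise/fall time in s
PERIOD = 1e-9    # clock period in s

p_sc = short_circuit_power(BETA, VDD, VT, TAU, PERIOD)
p_sc_slow_edge = short_circuit_power(BETA, VDD, VT, 2 * TAU, PERIOD)

print(f"short circuit power         : {p_sc * 1e9:.3f} nW")
print(f"with twice the input edge   : {p_sc_slow_edge * 1e9:.3f} nW")
```

The expression is linear in the input transition time, so slower input edges increase the short circuit power proportionally, while the cubic term makes it very sensitive to the headroom Vdd - 2VT.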
transistor is making the drain to bulk potential –Vdd for the pMOS. The diode current
thus flowing through the junction is given by the expression:
$$I = A_d\,J_s \tag{3}$$
Where
• Ad: area of the diffusion at the junction of drain and body
• Js: leakage current density; set by the technology
It is desirable to reduce both quantities. The leakage current density increases with
temperature.
The other component of the power dissipation is sub-threshold conduction. In the
inverter shown in Fig. 2.6, when the transistor is turned off there is still a current
flowing through the channel due to the drain-source voltage Vds; this current is
called the subthreshold current. The plot of the drain current against Vds is shown in
Fig. 2.7, and it exhibits an exponential relation in the sub-threshold conduction
region. This is due to the decrease in the threshold voltage with increasing Vds. In
other words, the width of the drain junction depletion region increases with
increasing Vds. This effect is known as Drain Induced Barrier Lowering and causes a
sharp increase in the current. The magnitude of the sub-threshold current is a
function of the process, the device size and the supply voltage, and is given by the
following formula:
$$I = \mu_0\,C_{ox}\,\frac{W_{eff}}{L_{eff}}\,V_t^{2}\,e^{1.8}\,e^{\frac{V_{gs}-V_{th}}{nV_t}}\left[1 - e^{-V_{ds}/V_t}\right] \tag{4}$$
In the above equation, the process parameter that affects the value of the
subthreshold current is the threshold voltage. Reducing the threshold voltage
significantly increases the subthreshold current. The subthreshold current also
increases with the supply voltage and the device size.
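The exponential sensitivity to the threshold voltage can be checked numerically. The sketch below evaluates the subthreshold current expression in the commonly used form I = mu0 * Cox * (W/L) * Vt^2 * e^1.8 * exp((Vgs - Vth) / (n * Vt)) * (1 - exp(-Vds / Vt)), where Vt is the thermal voltage kT/q. Reading the garbled source equation this way is an assumption, and all device parameters are hypothetical.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts."""
    return K_OVER_Q * temp_k


def subthreshold_current(mu0, cox, w_over_l, vgs, vth, vds, n=1.5, temp_k=300.0):
    """Subthreshold drain current in amperes (assumed form, see lead-in)."""
    vt = thermal_voltage(temp_k)
    return (mu0 * cox * w_over_l * vt ** 2 * math.exp(1.8)
            * math.exp((vgs - vth) / (n * vt))
            * (1.0 - math.exp(-vds / vt)))


# Hypothetical device: only the threshold voltage differs between the two cases.
PARAMS = dict(mu0=0.03, cox=0.02, w_over_l=2.0, vgs=0.0, vds=0.7)
I_HIGH_VTH = subthreshold_current(vth=0.3, **PARAMS)
I_LOW_VTH = subthreshold_current(vth=0.2, **PARAMS)

print(f"off current at Vth = 0.3 V : {I_HIGH_VTH:.3e} A")
print(f"off current at Vth = 0.2 V : {I_LOW_VTH:.3e} A")
print(f"increase factor            : {I_LOW_VTH / I_HIGH_VTH:.1f}x")
```

Under these assumptions, a 100 mV reduction in threshold voltage raises the off-state current by a factor of exp(0.1 / (n * Vt)), which is more than an order of magnitude at room temperature for n = 1.5.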
Fig 2.6. Leakage Power in the inverter before occurrence of the transition at Input.
CHAPTER 3
Fig 3.2. I-V characteristics of the bulk CMOS (22nm) and FinFET (20nm) devices
Fig. 3.3 shows the Ion/Ioff ratio versus supply voltage for both devices. For lower
supply voltages the Ion/Ioff ratio is higher for the FinFET than for CMOS, while for
higher supply voltages (above 0.72 V) it is higher for CMOS. This is because CMOS has
a lower Ioff at higher supply voltages, whereas at lower supply voltages the Ioff of
CMOS is comparable with that of the FinFET while the FinFET has a higher Ion.
Fig 3.3. Ion/Ioff variation for CMOS and FinFET with different supply voltages
3.2 Drain Induced Barrier Lowering
Fig 3.4. Drain Current vs Gate Source Voltage for the FinFET and bulk CMOS
Fig. 3.4 shows the variation of the drain current with the gate-to-source voltage for
the FinFET and bulk CMOS devices when Vds is 0.1 V and 1.1 V. It can be observed from
the figure that, for short-channel devices, the threshold voltage decreases as the
drain-to-source voltage increases. This effect is called Drain Induced Barrier
Lowering (DIBL). An increase in the drain voltage causes the depletion region at the
drain to penetrate further into the channel. DIBL is higher for the bulk CMOS devices
(124 mV/V) than for the FinFET devices (58 mV/V), which shows that the FinFET devices
have a lower threshold variation due to short-channel effects. Another important
observation from the figure is the lower threshold voltage of the FinFET devices
compared with the CMOS devices, which is one of the reasons for their higher Ion/Ioff
ratio.
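To make the DIBL figures concrete, the following sketch converts the quoted coefficients into the threshold-voltage shift over the two drain biases used for Fig. 3.4. It assumes the usual definition of the DIBL coefficient as the threshold-voltage change per volt of drain bias; the function names and the threshold values passed to `dibl_coefficient` are hypothetical.

```python
def dibl_coefficient(vth_at_low_vds, vth_at_high_vds, vds_low, vds_high):
    """DIBL coefficient in mV/V from threshold voltages (V) at two drain biases (V)."""
    return (vth_at_low_vds - vth_at_high_vds) / (vds_high - vds_low) * 1e3


def vth_shift_mv(dibl_mv_per_v, vds_low, vds_high):
    """Threshold-voltage reduction in mV when Vds rises from vds_low to vds_high."""
    return dibl_mv_per_v * (vds_high - vds_low)


DIBL = {"bulk CMOS": 124.0, "FinFET": 58.0}  # mV/V, as quoted in the text
VDS_LOW, VDS_HIGH = 0.1, 1.1                 # drain biases of Fig. 3.4, in V

shifts = {name: vth_shift_mv(coeff, VDS_LOW, VDS_HIGH) for name, coeff in DIBL.items()}
for name, shift in shifts.items():
    print(f"{name:9s}: Vth drops by about {shift:.0f} mV")

# Hypothetical threshold voltages, only to show the inverse calculation.
example = dibl_coefficient(0.300, 0.176, VDS_LOW, VDS_HIGH)
print(f"example coefficient: {example:.0f} mV/V")
```

Since the two drain biases are 1 V apart, the threshold shift in millivolts is numerically equal to the DIBL coefficient, so the FinFET threshold moves by less than half as much as the bulk CMOS threshold over the same bias range.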
3.3 Subthreshold Swing
Fig. 3.4 also shows that the subthreshold swing of the FinFET is 21% lower than that
of bulk CMOS at room temperature. The subthreshold swing of a device is defined as the
change in gate voltage required to increase the drain current by a decade. A lower
swing indicates a stronger dependence of the drain current on the gate voltage, so the
drain current of the FinFET devices increases at a faster rate with the gate-source
voltage.
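The definition above translates into a simple two-point calculation on a subthreshold I-V curve. The sketch below is illustrative only; the sample points are hypothetical and are not read from Fig. 3.4.

```python
import math


def subthreshold_swing(vgs1, id1, vgs2, id2):
    """Subthreshold swing in mV/decade from two (Vgs, Id) points below threshold."""
    decades = math.log10(id2) - math.log10(id1)
    return (vgs2 - vgs1) / decades * 1e3


# Hypothetical points: the current rises by one decade over 70 mV of gate voltage.
ss = subthreshold_swing(0.10, 1e-9, 0.17, 1e-8)
print(f"subthreshold swing: {ss:.1f} mV/decade")

# Gate-voltage change needed for a three-decade current increase at this swing.
print(f"gate swing for 1000x current: {3 * ss:.0f} mV")
```

A device with a smaller swing needs less gate-voltage change to move between its off and on currents, which is why a lower value is desirable at reduced supply voltages.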
3.4 Gate Induced Drain Leakage
Gate leakage current is a major concern in nanoscale devices. Gate Induced Drain
Leakage (GIDL) in bulk CMOS devices occurs due to the lateral diffusion of the source
and drain regions. To measure GIDL, the gate voltage is swept from negative to
positive values.
Fig 3.5. Drain current versus Gate source voltage while Vds=VDD [13]
It can be seen from Fig. 3.5 that the FinFET devices behave differently from the bulk
CMOS devices for negative gate-source voltages and show better GIDL. As Vgs becomes
negative, the drain current of both devices decreases, but for CMOS it stays constant
for Vgs < -0.1 V and increases rapidly for Vgs < -0.3 V. Larger negative gate voltages
in the bulk CMOS devices cause band bending at the polysilicon, oxide and p-well
interface, as shown in Fig. 3.6. As a result, electrons from the valence band in the
p-well tunnel to the conduction band of the n+ region, increasing the gate leakage
current.
Fig 3.6. Gate Dielectric tunneling current in NMOS bulk planar device.
From all the above results, it can be observed that FinFETs have better
characteristics than bulk CMOS devices for short channels. FinFETs provide better
control over the channel, with less leakage and subthreshold conduction, both of
which contribute significantly to the overall power dissipation. Also, with
independent-gate FinFET devices, the threshold voltage of the device can be changed
dynamically by connecting the back gate to a reverse voltage, further reducing the
off-state current. The better off-state current performance of FinFET devices is the
motivation for designing the standard cell libraries in sub-micron technologies with
FinFET devices.
CHAPTER 4
Fig 4.2. Different feedback circuits to get full swing outputs [4]
In the next section, various logic cells are designed in the complementary output CDM
logic style and compared with the same cells designed in the C-CMOS logic style.
4.3 Performance comparison in CDM and CMOS for complementary outputs
Basic logic cells have been implemented with 7nm FinFET devices in both the CDM and
CMOS logic styles with complementary outputs. The schematics, output waveforms, PDP
variation with load, and power consumption and delay comparisons for the various cells
are shown in the following figures.
4.3.1 AND-NAND gate implemented in CDM and CMOS
Fig 4.5. Test Bench for NAND/AND implemented in CMOS and CDM logic style
Fig 4.8. PDP between CMOS and CDM with varying load capacitance.
Fig 4.9. Delay vs Power plot for AND-NAND in CMOS and CDM logic style.
Table 4.1 shows the performance parameters of the AND-NAND logic cell in CMOS and
CDM with a 1 fF load capacitance.
Table 4.1. Performance parameters for AND/NAND in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        22.98     18.22
Power (nW)        129.6     136.8
PDP (10^-18 J)    2.979     2.492
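The units in the table can be related with a short calculation: a delay in picoseconds multiplied by a power in nanowatts gives an energy in units of 10^-21 J, so dividing by 1000 yields the PDP in units of 10^-18 J. The sketch below applies this conversion to the Table 4.1 entries; the helper name `pdp_1e18` is hypothetical.

```python
def pdp_1e18(delay_ps, power_nw):
    """Power-delay product in units of 1e-18 J from delay in ps and power in nW."""
    # 1 ps * 1 nW = 1e-12 s * 1e-9 W = 1e-21 J = 1e-3 * (1e-18 J)
    return delay_ps * power_nw * 1e-3


# Delay and power values as reported in Table 4.1.
pdp_cdm = pdp_1e18(22.98, 129.6)
pdp_cmos = pdp_1e18(18.22, 136.8)

print(f"CDM  AND-NAND PDP: {pdp_cdm:.3f} x 1e-18 J")
print(f"CMOS AND-NAND PDP: {pdp_cmos:.3f} x 1e-18 J")
```

Both products agree with the tabulated PDP values to within rounding, which confirms that the PDP row is expressed in units of 10^-18 J.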
Fig 4.12. Test Bench for performance comparison between CMOS and CDM
Fig 4.15. PDP between CMOS & CDM OR-NOR with varying load capacitance
Fig 4.16. Delay vs Power plot for NOR-OR in CMOS and CDM logic style.
Table 4.2 shows the performance parameter comparison of the 2 input OR-NOR between
CMOS and CDM with a 1 fF load capacitance.
Table 4.2. Performance parameters for OR/NOR in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        7.609     12.08
Power (nW)        100.1     98.25
PDP (10^-18 J)    0.7617    1.037
Fig 4.18. Test Bench for the 3 input NAND-AND in CMOS and CDM Logic Style
Fig 4.19. Output Waveforms for 3 input CDM NAND-AND logic gate.
Fig 4.20. Output Waveforms for 3 input CMOS NAND-AND logic gate.
Fig 4.21. PDP for 3-NAND-AND in CMOS and CDM with varying load cap.
Fig 4.22. Delay vs Power comparison for 3 Input CMOS and CDM AND-NAND.
4.3.4 3-Input NOR-OR gate implemented in CMOS and CDM Logic Style
Fig 4.24. Test Bench for 3 input CMOS and CDM NOR-OR gate
Fig 4.25. Output waveforms from the CDM NOR-OR Logic cell.
Fig 4.27. PDP for the CMOS and CDM NOR-OR with varying load capacitance
Fig 4.28. Delay vs Power comparison for CMOS and CDM OR-NOR.
Table 4.4 shows the performance parameter comparison of the 3 input OR/NOR between
CMOS and CDM with a 1 fF load capacitance.
Table 4.4. Performance parameters for 3 input OR/NOR in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        10.59     16.04
Power (nW)        131.9     104
PDP (10^-18 J)    1.398     1.667
From the above simulation results, it can be seen that CDM logic is more efficient
than CMOS logic for complementary outputs with balanced and symmetrical designs. The
CMOS logic style works better for NAND-AND logic gates; hence CMOS standard cell
libraries are more efficient for the synthesis of NAND/AND intensive designs.
4.4 Power Saving with CDM
The Power consumption of the circuit can be reduced by considering the following
parameters:
• Switching Activity in the circuit.
• Switching capacitance of each node.
• Supply Voltage
• Short Circuit current
• Leakage current
The advantage of CDM comes from the fact that it is well suited to all of the above
power reduction techniques:
1. Switching activity in the circuit can be reduced by eliminating glitches. CDM
designs provide balanced complementary outputs and non-skewed outputs for cells
with multiple outputs, which reduces the chance of glitches and the power
dissipated by glitch propagation.
2. The switching capacitance of a node in CDM is small compared with the same node
in the CMOS design, because the transistors in the CDM implementation are smaller
and there are fewer transistors in the critical path (less parasitic capacitance).
3. As in CMOS technology, the supply voltage can be reduced, but at the cost of
increased circuit delay.
4. There are fewer ground and power connections, which means fewer VDD-to-GND paths
during switching. The CDM implementation should therefore draw the least short
circuit power.
5. Leakage current contributes significantly as the feature size shrinks. To address
this problem, FinFET devices have been used in place of bulk MOS transistors to
minimize the leakage power. FinFET devices have better Ioff performance than bulk
MOS transistors.
Fig 4.29. Single Output CDM basic cells (a) Single Level (b)-(d) Two Level
The single output CDM basic cell can be seen as half of the complementary output CDM
basic cell. All the feedback networks used with the complementary output CDM cells
require the outputs to be complementary, which is not the case for the single output
CDM cells. Hence we have used an inverter at the output of the cells to obtain full
swing outputs with sufficient drive capability.
Single output CDM cells are observed to be more efficient than static CMOS for
implementing arithmetic circuits such as adders, multipliers and other XOR intensive
circuits. This is supported by the following simulation results for various
arithmetic modules (half adder, full adder, 4 bit multiplier) and various primitive
gates (2 input and 3 input AND, NAND, OR, NOR and XOR gates).
Fig 4.30. Schematic of AND gate with CDM single output logic style
Fig 4.31. Test Bench for CMOS and CDM AND gate
Fig 4.32. Output Waveforms from CMOS and CDM AND gate
Table 4.5 shows the delay and power consumption of the AND gate implemented in both
logic styles.
Table 4.5. Performance parameters for 2 input AND in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        13.15     12.53
Power (nW)        75.96     75.95
PDP (10^-18 J)    0.998     0.951
4.5.2 OR gate
Fig 4.33. Schematic of the OR gate in the single output CDM logic style
Fig 4.35. Output waveforms from the CMOS and CDM OR gate.
Table 4.6 shows the delay and power consumption of the OR gate implemented in both
logic styles.
Table 4.6. Performance parameters for 2 input OR in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        12.21     12.92
Power (nW)        92.81     87.88
PDP (10^-18 J)    1.103     1.135
Fig 4.36. Schematic of 3 input AND gate in CDM single output logic style.
Fig 4.37. Test Bench for CMOS and CDM 3 Input AND gate.
Fig 4.38. Output waveforms from 3 input AND gate in CDM and CMOS.
Table 4.7 shows the delay and power consumption of the 3 input AND gate implemented
in both logic styles.
Table 4.7. Performance parameters for 3 input AND in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        20.89     13.14
Power (nW)        66.1      59.36
PDP (10^-18 J)    1.381     0.780
Fig 4.39. Schematic of 3 Input OR gate in CDM single output logic style.
Fig 4.40. Test Bench for 3 Input CMOS and CDM OR gate.
Fig 4.41. Output Waveforms from 3 Input OR gate in CMOS and CDM.
Table 4.8 shows the delay and power consumption of the 3 input OR gate implemented in
both logic styles.
Table 4.8. Performance parameters for 3 input OR in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        18.1      12.81
Power (nW)        92.95     102.4
PDP (10^-18 J)    1.6       1.3
Fig 4.42. Schematic of Half Adder in single output CDM logic style
Fig 4.43. Test Bench for CDM and CMOS half adder
Fig 4.44. Output waveforms from the half adder in CMOS and CDM logic style.
Table 4.9 shows the delay and power consumption of the half adder implemented in
both logic styles.
Table 4.9. Performance parameters for half adder in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        16.23     14.16
Power (nW)        164       197
PDP (10^-18 J)    2.66      2.79
Fig 4.45. Schematic of Full Adder in CDM single output logic style.
Fig 4.46. Test Bench for CDM and CMOS Full adder.
Fig 4.47. Output waveforms from the Full adder in CMOS and CDM.
Table 4.10 shows the delay and power consumption of the full adder implemented in
both logic styles.
Table 4.10. Performance parameters for full adder in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        28.66     19.08
Power (nW)        462.5     690.2
PDP (10^-18 J)    12.7      13.17
Fig 4.48. Schematic of 4:2 compressor in single output CDM logic style.
Fig 4.49. Test Bench for CMOS and CDM 4:2 compressor design
Fig 4.50. Output Waveforms from 4:2 compressor in CMOS and CDM.
From Fig 4.50, it can be observed that the CMOS outputs have more glitches than those
of the CDM logic cell; hence the CMOS implementation consumes more power.
Table 4.11 shows the delay and power consumption of the 4:2 compressor implemented in
both logic styles.
Table 4.11. Performance parameters for 4:2 compressor in CDM & C-CMOS
Parameter         CDM       CMOS
Delay (ps)        26.71     20.65
Power (nW)        668.4     1092
PDP (10^-18 J)    17.85     22.54
43
Texas Tech University, Ashish Joshi, May 2016
Fig 4.52. Test Bench for CDM and CMOS 4 bit by 4 bit multiplier.
Fig 4.53. Output Waveforms from the Multiplier in CMOS logic style.
Fig 4.54. Output Waveforms from the Multiplier in CDM logic style
Again for the multiplier design, the figures above show that the CMOS multiplier
has more glitches in its output than the CDM multiplier and hence consumes
more power.
Table 4.12 shows the delay and power consumption of the multiplier implemented in
both logic styles.
Table 4.12. Performance parameters for 4 bit multiplier in CDM & C-CMOS
Parameters          CDM      CMOS
Delay (in ps)       30       24
Power (in nW)       2000     2800
PDP (in 10^-18 J)   60       67.2
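The PDP rows in Tables 4.9 to 4.12 follow from multiplying the delay and power rows. The short Python sketch below reproduces that arithmetic for three of the cells; the function name pdp_attojoule and the dictionary layout are illustrative choices, not part of the original flow. The only physical fact assumed is that 1 ps x 1 nW = 10^-21 J.

```python
# Power-delay product from delay (ps) and power (nW).
# 1 ps * 1 nW = 1e-21 J, so dividing by 1000 gives units of 1e-18 J.
def pdp_attojoule(delay_ps: float, power_nw: float) -> float:
    """Return the power-delay product in units of 1e-18 J."""
    return delay_ps * power_nw * 1e-3

# (delay_ps, power_nw) pairs copied from Tables 4.9, 4.11 and 4.12.
cells = {
    "half adder":     {"CDM": (16.23, 164.0),  "CMOS": (14.16, 197.0)},
    "4:2 compressor": {"CDM": (26.71, 668.4),  "CMOS": (20.65, 1092.0)},
    "4 bit multiplier": {"CDM": (30.0, 2000.0), "CMOS": (24.0, 2800.0)},
}

for name, styles in cells.items():
    cdm = pdp_attojoule(*styles["CDM"])
    cmos = pdp_attojoule(*styles["CMOS"])
    saving = 100.0 * (cmos - cdm) / cmos  # positive when CDM uses less energy
    print(f"{name}: CDM {cdm:.2f}, CMOS {cmos:.2f}, PDP saving {saving:.1f}%")
```

In each of these three cells the CDM version is slower but draws less power, and the product comes out lower for CDM.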
A traditional CMOS standard cell library usually consists of thousands of logic cells with
individual layouts, whereas the Single Output CDM standard cell library derives various
complicated logic functions from the same layout by changing the signals (VDD, GND,
and variable) at the input lines. Synthesized designs built from Single Output CDM
standard cell libraries are therefore symmetrical. Because different logic functions are
generated from the same footprint simply by changing the input signals, the manual
design effort for the standard cell library is reduced.
CHAPTER 5
Fig 5.1. CDM Logic Cells (a) Single Level (b)-(d) Two Level
The basic expressions for the outputs of the single level and two level CDM single
output logic cells can be written as follows:
Y(a) = (A'(In1) + A(In2))' (5)
Y(b) = (A'(In1) + A(B'(In2) + B(In3)))' (6)
Y(c) = (A'(B'(In1) + B(In2)) + A(In3))' (7)
Y(d) = (A'(B'(In1) + B(In2)) + A(B'(In3) + B(In4)))' (8)
The major advantage of the CDM logic cells is that they support automatic logic
design. We can define different algorithms to extract different logic functionalities
from just the above two basic cells. From the single level CDM basic cell, a total of
3^2 = 9 different logic functions can be generated, as shown in Table 5.1.
Table 5.1. Logic functions from Single level Single output CDM basic cell
In1   In2   Y
0     0     1
0     1     A'
0     B     (AB)'
1     0     A
1     1     0
1     B     (A' + B)'
C     0     (A'C)'
C     1     (A + C)'
C     B     (A'C + AB)'
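The nine entries of Table 5.1 can be enumerated mechanically. The Python sketch below assumes that the single level cell behaves as a 2:1 selection between In1 (taken when A = 0) and In2 (taken when A = 1) followed by an output inverter, which is the reading consistent with the rows "1 0 A" and "0 0 1" of the table. The helper names are hypothetical.

```python
from itertools import product

def single_level_cell(a: int, in1: int, in2: int) -> int:
    """Assumed cell model: Y = (A'.In1 + A.In2)', a selection plus an inverter."""
    selected = in2 if a else in1
    return 1 - selected

# Each input line is tied to GND ("0"), VDD ("1") or a logic variable.
SOURCES_IN1 = ("0", "1", "C")
SOURCES_IN2 = ("0", "1", "B")

def resolve(source: str, b: int, c: int) -> int:
    """Translate an input-line label into its logic value for one (B, C) pair."""
    return {"0": 0, "1": 1, "B": b, "C": c}[source]

def truth_table(src1: str, src2: str) -> tuple:
    """Output column over all (A, B, C) combinations, A most significant."""
    return tuple(
        single_level_cell(a, resolve(src1, b, c), resolve(src2, b, c))
        for a, b, c in product((0, 1), repeat=3)
    )

# One truth table per (In1, In2) assignment: 3 x 3 = 9 entries.
functions = {
    (s1, s2): truth_table(s1, s2)
    for s1, s2 in product(SOURCES_IN1, SOURCES_IN2)
}

for (s1, s2), column in functions.items():
    print(f"In1={s1} In2={s2} -> Y over ABC=000..111: {column}")
```

Under this model all nine assignments give distinct truth tables, so no entry of the single level table is redundant.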
Similarly, with the two level CDM logic cell, a total of 3^4 = 81 different logic functions
can be generated. Even after removing the repeating/redundant logic cells, a total of 55
different logic cells can be generated in the CDM logic style with just the two level
implementation. The layout of the basic cells remains the same; changing only the input
lines changes the functionality of the cell. This makes the CDM cell library richer than
the CMOS cell library, with reduced area and power consumption and less manual
effort. We have confined the CDM logic cells to two levels, although it is feasible to
extend them to three levels to make the library richer in logic functions. A CMOS
standard cell library has also been generated, consisting of the primitive logic gates
along with arithmetic modules such as the full adder and half adder. Every cell in both
standard cell libraries has a single instance; device sizes have not been scaled to
multiples, which would allow better drive capability at the cost of more power
consumption.
The standard cell libraries are optimized for energy (PDP); the logic cells in both the
static CMOS and the Single Output CDM standard cell libraries are therefore sized for
minimum PDP. Since the standard cell libraries are designed using the FinFET device
models from PTM, only the number of fins of the FinFET devices can be changed; the
other parameters are fixed by the model library. The number of fins for the PFETs and
NFETs in the logic cells is therefore selected for minimum PDP. The SEA algorithm [6]
has been used for FinFET sizing in the Single Output CDM standard cell library to
minimize the PDP. The FinFETs at identical positions in the basic cells are grouped
together as one fin-sizing variable, and a sweep is then performed over all the variables
to find the combination with minimum PDP.
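The grouped sweep described above can be sketched in a few lines of Python. This is a plain exhaustive sweep under stated assumptions, not the SEA algorithm of [6]: the group names, the fin-count range and the toy_model function are hypothetical stand-ins, and in the actual flow the evaluation step would be a circuit simulation of the cell.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

def sweep_fin_counts(
    groups: Sequence[str],
    fin_choices: Sequence[int],
    evaluate: Callable[[Dict[str, int]], Tuple[float, float]],
) -> Tuple[Dict[str, int], float]:
    """Try every fin-count combination and return the one with minimum PDP.

    evaluate(sizing) must return (delay, power) for that sizing.
    """
    best_sizing, best_pdp = None, float("inf")
    for combo in product(fin_choices, repeat=len(groups)):
        sizing = dict(zip(groups, combo))
        delay, power = evaluate(sizing)
        pdp = delay * power
        if pdp < best_pdp:  # strict comparison keeps the first minimum found
            best_sizing, best_pdp = sizing, pdp
    return best_sizing, best_pdp

def toy_model(sizing: Dict[str, int]) -> Tuple[float, float]:
    """Hypothetical stand-in for a simulator: more fins are faster but draw more power."""
    total = sum(sizing.values())
    delay = 10.0 + 40.0 / total
    power = 5.0 + 3.0 * total
    return delay, power

# Two groups of transistors, each sized as one variable with 1 to 4 fins.
best, pdp = sweep_fin_counts(["pull_up", "pull_down"], [1, 2, 3, 4], toy_model)
print(best, round(pdp, 2))
```

Grouping transistors at identical positions keeps the number of sweep variables small, which is what makes an exhaustive search over fin counts affordable.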
5.1 Standard Cell Library Design Flow
Fig 5.2 shows the complete flow for the standard cell library characterization for both
CDM and C-CMOS logic cells.
Once the number of fins had been decided for the Single Output CDM and CMOS logic
cells, the netlists for the logic cells were generated, formatted as per the HSPICE format
and then fed into SiliconSmart together with the device models for standard cell library
characterization using the HSPICE simulator. SiliconSmart generates the standard cell
library in the Liberty format (.lib), which contains all the timing and power information
of the cells included in the library. The Liberty file is then converted into the
database format (.db) using Library Compiler. All the tools mentioned are from
Synopsys. The standard cell libraries in database (.db) format are then used to
synthesize various benchmark circuits using Synopsys Design Vision. All the scripts
used in this flow for the various tools, and the logic cell netlists for both CMOS and
Single Output CDM cells, are included in Appendix A.
BSIM-CMG FinFET device models for feature sizes of 7nm, 10nm, 16nm and 20nm are
available from PTM for HP (high performance) and LSTP (low standby power). Hence
standard cell libraries have been designed for all the available device models in the
different feature sizes. All the designed standard cell libraries have been used to
synthesize the benchmark designs to show that the design methodology is independent
of the technology feature size.
5.2 Benchmark Circuits
Benchmark circuits are collections of circuits used to evaluate the performance of
synthesis tools more objectively. Popular benchmark suites include ISCAS’85,
ISCAS’89 and ITC’02. ISCAS’85 is generally used for combinational logic circuits.
Since the designed standard cell library consists of combinational cells only, we use
ISCAS’85 for our experiments. Table 5.2 shows the functionality of the various
benchmark designs synthesized with the designed standard cell libraries for the
performance comparison.
Table 5.6 shows the synthesis results for the benchmark circuits with 16nm CDM and
CMOS FinFET standard cell libraries.
Table 5.6. Synthesis Results with 16nm Standard Cell Library
Architecture   CMOS Power   CMOS Delay   CDM Power   CDM Delay   CMOS PDP     CDM PDP
C1355          46.3         234.48       26.4        220.8       10856.424    5829.12
c1908a         21.8         213.79       15.7        237.77      4660.622     3732.989
c3540a         40.6         355.32       33.3        338.85      14425.992    11283.71
c499           35.2         189.45       28.4        196.28      6668.64      5574.352
c432           8.23         282.97       6.52        304.22      2328.8431    1983.514
c6288          252.2        1038.66      259.3       1204.05     261950.05    312210.2
c880           14.1         225.92       12          220.7       3185.472     2648.4
c17            0.1736       25.92        0.19918     24.49       4.499712     4.877918
c2670          32.4         264.69       24.2        281.84      8575.956     6820.528
c5315          83.6         260.24       67.3        250.28      21756.064    16843.84
c7552          126          611.92       90          398.18      77101.92     35836.2
Table 5.7 shows the synthesis results for the benchmark circuits with 20nm CDM and
CMOS FinFET standard cell libraries.
Table 5.7. Synthesis Results with 20nm Standard Cell Library
Architecture   CMOS Power   CMOS Delay   CDM Power   CDM Delay   CMOS PDP     CDM PDP
C1355          72.6         336.57       40          301.41      24434.982    12056.4
c1908a         33.8         298.92       24.4        309.5       10103.496    7551.8
c3540a         65.6         528.88       49.4        418.72      34694.528    20684.77
c499           55.3         273.83       41.9        254.83      15142.799    10677.38
c432           13.2         403.29       10.12       431.25      5323.428     4364.25
c6288          396.7        1463.13      371.1       1499.75     580423.67    556557.2
c880           22.1         325.96       18.4        295.56      7203.716     5438.304
c17            0.272        35.25        0.305       31.67       9.588        9.65935
c2670          51.7         375.79       39          390.43      19428.343    15226.77
c5315          133          363.04       100.2       360.83      48284.32     36155.17
c7552          195.1        918.48       140         627.01      179195.45    87781.4
From the above simulation results for the standard cell libraries designed in different
technologies in both the single output CDM and CMOS logic styles, it can be observed
that the CDM standard cell library results in more power- and PDP-efficient designs
than the CMOS standard cell libraries.
5.4 Data Analysis
Power and PDP (Power Delay Product) savings with CDM compared with C-CMOS
have been calculated from the data presented in the tables and are shown in Fig 5.5 &
Fig 5.6. From the figures it can be observed that power and PDP are saved with CDM
for all the benchmark designs except c17 and c6288. Design c17 is a small six-NAND
gate circuit and c6288 is a 16x16 bit multiplier, whose schematic is shown in
Fig 5.3. We have already observed that the CDM NAND/AND gates are not optimized
compared with the C-CMOS logic style; hence c17, being a NAND-intensive design,
has not been optimized in terms of power and energy compared with the C-CMOS
standard cell libraries.
Fig 5.4. Full adder module for ISCAS-85 c6288 16x16 multiplier
The full adder module has been implemented with primitive logic gates (NOR) in the
design. Hence, during synthesis with the CDM standard cell library, even though there
is a full adder cell in the library, the compiler chose to use logic gates to implement the
full adder and used that in the multiplier design. If the compiler had chosen the CDM
full adder directly from the standard cells rather than building it from logic gates, then
power and PDP savings would be possible with the CDM standard cell libraries for
c6288 as well. This requires a change in the HDL model of the c6288 design, but since
we use only the original benchmark design, the results are not optimized compared
with C-CMOS. For the rest of the benchmark designs, the percentage savings in power
and PDP are significant.
Synthesis with the CDM standard cell libraries has resulted in an average power saving
of 17-21% and a PDP saving of 7-26% compared with the C-CMOS standard cell
libraries across all the benchmark designs.
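As a worked illustration of how such percentages are obtained, the Python sketch below computes the power saving of CDM over CMOS for each benchmark using the power columns of Table 5.6 (16nm). The saving formula, (CMOS - CDM) / CMOS, is an assumption about how the figures were derived; the variable names are illustrative.

```python
# (CMOS power, CDM power) per benchmark, copied from Table 5.6 (16nm library).
power_16nm = {
    "C1355": (46.3, 26.4), "c1908a": (21.8, 15.7), "c3540a": (40.6, 33.3),
    "c499": (35.2, 28.4), "c432": (8.23, 6.52), "c6288": (252.2, 259.3),
    "c880": (14.1, 12.0), "c17": (0.1736, 0.19918), "c2670": (32.4, 24.2),
    "c5315": (83.6, 67.3), "c7552": (126.0, 90.0),
}

def percent_saving(cmos: float, cdm: float) -> float:
    """Relative reduction of CDM versus CMOS, in percent (negative means CDM is worse)."""
    return 100.0 * (cmos - cdm) / cmos

savings = {name: percent_saving(*pair) for name, pair in power_16nm.items()}
average_saving = sum(savings.values()) / len(savings)

# Benchmarks where CDM consumes more power than CMOS.
regressions = sorted(name for name, s in savings.items() if s < 0)

for name, s in savings.items():
    print(f"{name}: {s:+.1f}%")
print(f"average: {average_saving:.1f}%, regressions: {regressions}")
```

With these inputs the two benchmarks with negative savings are c17 and c6288, matching the exceptions discussed in the text.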
Fig 5.5. Power Improvement with CDM over CMOS standard cell libraries
Fig 5.6. PDP Improvement with CDM over CMOS standard cell libraries
the CBLD based designs with CDM standard cell libraries results in more savings in
terms of energy and power. The following sections explain the design of the binary to
BCD converter with CBLD and its comparison with various other state of the art
designs. The comparison has been completed in various technologies, and the results
show that the CBLD algorithm is capable of designing fast and energy efficient modules.
Synthesis of all the binary to BCD architectures has then been completed with the
designed C-CMOS and CDM standard cell libraries, and the results show that the CBLD
designs with CDM standard cell libraries achieve 50% energy saving compared with
the best near performance design.
5.5.1 Binary to BCD converter in CBLD
5.5.1.1 Introduction
The goal of this new method of converter design is to optimize the conversion speed,
power dissipation and area consumed. Most of the recently proposed multiplier designs
use 7 bit binary to BCD converters. The binary to BCD converter is a critical
component of these multiplier designs, and hence the proposed algorithm, which is
based on Complement Based Logic Design, has been designed for such multipliers. For
better understanding, let us assume an arbitrary truth table for three inputs A, B, C and
three outputs Y1, Y2, Y3, such that:
Y1 = f1(A, B, C) (9)
Y2 = f2(A, B, C) (10)
Y3 = f3(A, B, C) (11)
Therefore, to implement all the output functions in terms of the inputs, we can multiply
the outputs by the identity matrix and further multiply the output identity matrix by the
inputs, as shown in the following equation:
With functions F11, F12 and F13, output Y1 is expressed in terms of the inputs A, B
and C; out of those three functions, one is selected for lower area, lower power and
higher speed. The proposed algorithm is scalable, and all the possible functions with
respect to the inputs can be generated by integrating MATLAB with Quine-McCluskey
software. Let us assume we select functions F11, F21 and F32 out of all the available
ones because they are simpler and smaller than the others. The final outputs can then
be expressed with the following equations:
Y1=F11⊕A (12)
Y2=F21⊕B (13)
Y3=F32⊕A (14)
Conventional methods of logic realization yield only SOP and POS functions. Here,
the outputs are instead defined with a final XOR gate, which either buffers or
complements the input line, as required by the output function, under control of the
functions defined above.
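The relation behind equations (12) to (14) is that any output Y can be written as F ⊕ X for any input line X, provided the control function is chosen as F = Y ⊕ X. The Python sketch below illustrates this on a small hypothetical output function; the example function, the helper names and the "fewest 1s" selection rule are assumptions for illustration (the thesis selects control functions using MATLAB and Quine-McCluskey minimization, which is not reproduced here).

```python
from itertools import product

INPUT_NAMES = ("A", "B", "C")
rows = list(product((0, 1), repeat=len(INPUT_NAMES)))  # ABC = 000 ... 111

def control_functions(output_column):
    """For one output column Y, return {input name: F column} with F = Y xor input."""
    return {
        name: tuple(y ^ row[i] for y, row in zip(output_column, rows))
        for i, name in enumerate(INPUT_NAMES)
    }

def simplest(candidates):
    """Pick the candidate with the fewest 1s, a crude proxy for SOP size."""
    return min(candidates.items(), key=lambda item: sum(item[1]))

# Hypothetical output function: Y = A xor (B and C).
y = tuple(a ^ (b & c) for a, b, c in rows)

candidates = control_functions(y)
chosen_input, f = simplest(candidates)

# Reconstruct Y from the chosen control function and input line: Y = F xor X.
index = INPUT_NAMES.index(chosen_input)
reconstructed = tuple(fi ^ row[index] for fi, row in zip(f, rows))

print("chosen input:", chosen_input)
print("control function column:", f)
print("reconstruction matches:", reconstructed == y)
```

For this example the control function paired with A has only two minterms (it equals B·C), whereas the ones paired with B or C have four each, which is the kind of difference the selection step exploits.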
F1=AC (15)
F2= AC' + BC'D' (16)
F3= BCG' + CD'E' + BD' + A (17)
F4= BCG + DFG' + (A'CD'E' + C'DE + BC'D) + AD (18)
F5= CDE' + CEF + B'CD'F' + A'B'DE'F' (19)
F6= AC'D' + BFG + A'B'CF' + B'DEF' + CD (20)
F7= BCD'+E'FG+C'D'F+(A'CD'E'+C'DE+BC'D)F'+AD (21)
F8= G (22)
C3 = AC (23)
C2= (AC' + BC'D') ⨁ B (24)
C1= (BCG' + CD'E' + BD' + A) ⨁ C (25)
C0= (BCG + DFG' + (A'CD'E' + C'DE + BC'D) + AD) ⨁ B (26)
B3= (CDE' + CEF + B'CD'F' + A'B'DE'F') ⨁ C (27)
B2= (AC'D' + BFG + A'B'CF' + B'DEF' + CD) ⨁ E (28)
B1= (BCD'+E'FG+C'D'F+(A'CD'E'+C'DE+ BC'D)F'+AD)⨁B (29)
B0 = G (30)
This architecture is based on three stages: the first two stages have a Sum of Products
(SOP) structure that produces the control functions (Fi), and the last stage contains
two-input XOR gates.
[Figure: CBLD converter architecture with control functions F1 to F7 and outputs C3, C2, C1, C0, B3, B2, B1, B0.]
cells libraries from USC [28] for Pre-Layout Analysis (Fig. 5.10 & 5.11). The synthesis
results are shown in Table 5.8 and Tables 5.9-5.10. From the simulation results, we can
confirm that the proposed architecture is consistently fast and energy efficient for all
the technologies from 90nm to 14nm (using CMOS based standard cells) and for 7nm
and 5nm (using FinFET based standard cell libraries).
Table 5.8. Post-Layout synthesis results with 90nm technology node.
Architecture            Area     Power    Delay
3-4 [31]                293      90       1.23
4-3 [31]                295      67.2     1.52
Binary New (BN) [32]    524      85.7     1.18
Shift Add by 3 [30]     257      84.8     2.52
3-3-1 Design [33]       411      97.8     1.62
331 modified 1 [30]     405.5    110.2    1.95
331 modified 2 [30]     387.0    98.1     2.06
Range Detect (RD) [30]  330.8    69       1.8
CBLD [29]               348      70.3     1.03
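The composite metrics used in the comparison charts (PDP, APDP, EDP) can be derived from the three columns of Table 5.8. The Python sketch below assumes the usual definitions PDP = power x delay, APDP = area x power x delay and EDP = power x delay^2, and normalizes each metric to its largest value; both the definitions and the normalization are assumptions, since the text does not spell them out.

```python
# (area, power, delay) per architecture, copied from Table 5.8.
table_5_8 = {
    "3-4": (293, 90, 1.23),
    "4-3": (295, 67.2, 1.52),
    "Binary New": (524, 85.7, 1.18),
    "Shift Add by 3": (257, 84.8, 2.52),
    "3-3-1": (411, 97.8, 1.62),
    "331 modified 1": (405.5, 110.2, 1.95),
    "331 modified 2": (387.0, 98.1, 2.06),
    "Range Detect": (330.8, 69, 1.8),
    "CBLD": (348, 70.3, 1.03),
}

def metrics(area: float, power: float, delay: float) -> dict:
    """Composite figures of merit under the assumed definitions."""
    pdp = power * delay
    return {"PDP": pdp, "APDP": area * pdp, "EDP": pdp * delay}

def normalised(table: dict, key: str) -> dict:
    """Scale one metric so that the worst (largest) architecture equals 1.0."""
    values = {name: metrics(*row)[key] for name, row in table.items()}
    worst = max(values.values())
    return {name: value / worst for name, value in values.items()}

best_pdp = min(table_5_8, key=lambda name: metrics(*table_5_8[name])["PDP"])
best_edp = min(table_5_8, key=lambda name: metrics(*table_5_8[name])["EDP"])

for name, value in normalised(table_5_8, "PDP").items():
    print(f"{name}: normalised PDP {value:.2f}")
```

Because delay enters EDP twice, architectures with short delay gain more in EDP than in PDP, which is why both metrics are worth reporting alongside area.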
Table 5.9. Pre-Layout Synthesis with 90, 45 & 32nm technology node.
                    90nm                     45nm                     32nm
Architecture        Area    Power   Delay    Area    Power   Delay    Area    Power   Delay
                    (µm²)   (µW)    (ns)     (µm²)   (µW)    (ns)     (µm²)   (µW)    (ns)
3-4 [31]            324     94.53   1.41     122.4   51      0.44     93.52   8.13    0.57
4-3 [31]            294     68      1.47     114     40      0.48     76.75   6.42    0.69
Binary New [32]     509     84      1.15     232     62.25   0.43     135     10.4    0.54
Sh-Add-3 [30]       267.2   81.36   2.53     95.2    43.14   0.71     74.71   6.99    1.13
3-3-1 [33]          434     103     1.76     193.3   69      0.58     117.2   10.1    0.77
331 mod 1 [30]      451     108.5   1.62     185     68.4    0.49     124     10.47   0.75
331 mod 2 [30]      428.5   103.8   1.71     174.1   62.12   0.55     119     9.92    0.77
Range Detect [30]   304     54      1.81     151     45      0.64     86.66   6.97    0.74
CBLD [29]           354     76      1.08     158.6   46.24   0.32     98.60   6.38    0.52
Table 5.10. Pre-Layout synthesis with 14nm (CMOS), 7nm & 5nm (FinFET) technology.
                    14nm                     7nm                      5nm
Architecture        Area    Power   Delay    Area    Power   Delay    Area    Power   Delay
                    (µm²)   (µW)    (ps)     (µm²)   (µW)    (ps)     (µm²)   (µW)    (ps)
3-4 [31]            11.44   2.65    121.48   1.27    1.05    63.98    0.468   0.45    12.51
4-3 [31]            11.03   2.48    181      1.28    0.98    74.59    0.476   0.42    13.18
Binary New [32]     18.44   2.4     130.32   2.08    1.38    70.4     0.898   0.52    11.79
Sh-Add-3 [30]       9.62    2.67    266.8    1.08    0.78    120.6    0.396   0.4     22.46
3-3-1 [33]          15.85   3.26    165.8    1.7     1.31    73       0.696   0.61    13.73
331 mod 1 [30]      16.67   4.12    211.14   1.87    1.35    68.74    0.72    0.60    14.44
331 mod 2 [30]      15.35   3.13    175.6    1.72    1.22    68.35    0.706   0.61    14.79
Range Detect [30]   10.81   2.32    160.07   1.25    0.84    70.09    0.498   0.32    17.13
CBLD [29]           12.76   1.50    119.9    1.47    0.99    56.19    0.58    0.30    7.28
[Chart "Pre-Layout_90nm": Area, Power, Delay, PDP, APDP and EDP on a 0 to 1 scale for 3-4 [31], 4-3 [31], Binary New [32], Shift Add by 3 [30], 3-3-1 Design [33], 331 modified 1 [30], 331 modified 2 [30], Range Detection [30] and CBLD [29].]
[Chart "Post_Layout_90nm": Area, Power, Delay, PDP, APDP and EDP on a 0 to 1 scale for 3-4 [31], 4-3 [31], Binary New [32], Shift Add by 3 [30], 3-3-1 Design [33], 331 modified 1 [30], 331 modified 2 [30], Range Detection [30] and CBLD [29].]
[Bar chart; vertical axis: Delay (in ps).]
Fig 5.10 Pre-Layout delay result for 14, 7 and 5nm technology node.
[Bar chart; vertical axis: PDP (µW.ps).]
Fig 5.11 Pre-Layout PDP results for 14nm (cmos), 7nm & 5nm (FinFET).
5.5.1.4 Synthesis Results with CDM standard cell library
The results of further synthesis of all the binary to BCD converter designs with the
newly designed CDM and C-CMOS standard cell libraries in the 7nm technology node
are shown in Table 5.11 & Table 5.12 and in Fig 5.12 & Fig 5.13. It can be observed
that synthesis with the CDM standard cell library results in power savings for all the
referenced architectures, and that the proposed CBLD based binary to BCD converter
design achieves 10% better PDP in CDM than in the C-CMOS logic style.
Table 5.11. Power dissipation with CDM and CMOS in 7nm technology node
Power Dissipation (7nm) (µW)
Architecture    CMOS     CDM
CBLD [29]       0.878    0.676
3-4 [31]        1.268    0.864
BN [32]         1.466    1.078
RD [30]         0.877    0.741
3-3-1 [33]      1.434    1.129
331mod2 [30]    1.32     1.078
4-3 [31]        0.896    0.773
331mod1 [30]    1.489    1.266
Shiftadd [30]   1.085    0.919
Table 5.12. Power Delay Product with CDM and CMOS in 7nm
PDP (7nm) (µW.ps)
Architecture    CMOS     CDM
CBLD [29]       33.90    31.30
3-4 [31]        79.49    51.84
BN [32]         70.87    75.63
RD [30]         60.96    62.93
3-3-1 [33]      108.87   96.47
331mod2 [30]    93.34    99.55
4-3 [31]        59.10    72.73
331mod1 [30]    107.31   113.76
Shiftadd [30]   122.06   139.46
[Bar chart; series: CMOS, CDM.]
Fig 5.12 Power Dissipation with CDM and CMOS in 7nm technology.
[Bar chart; vertical axis: PDP (µW.ps); series: CMOS, CDM.]
Fig 5.13. Power Delay Product with CDM and CMOS in 7nm technology.
CHAPTER 6
CMOS and CDM logic style. Hybrid standard cell libraries can provide PDP efficient
designs by exploiting the power efficiency of CDM cells and the timing efficiency of
C-CMOS cells during logic synthesis.
Appendix A
DESIGN COMPILER SCRIPT
#/**************************************************/
#/* Compile Script for Synopsys */
#/* dc_shell-t -f <name_of_file.tcl> */
#/**************************************************/
#/* All verilog files, separated by spaces */
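set my_verilog_files [list c6288.v]
# Top-level module name (assumed here to match the benchmark file name)
set my_toplevel c6288
# Read the Verilog sources before elaboration; target_library and
# link_library are assumed to have been set before this point
analyze -f verilog $my_verilog_files
elaborate $my_toplevel
current_design $my_toplevel
link
ungroup -all -flatten -simple_names
compile -map_effort medium
quit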
2 Input OR gate
.subckt ORx2 A B VSS VDD OR
M1 net07 B net27 B pfet nfin=2
M0 net27 A VDD A pfet nfin=2
M3 net07 B VSS B nfet nfin=1
M2 net07 A VSS A nfet nfin=1
X0 net07 VSS VDD OR INV
.ends ORx2
3 Input OR gate
.subckt ORx3 A B C VSS VDD OR
M2 net010 C net34 C pfet nfin=3
M1 net34 B net35 B pfet nfin=3
M0 net35 A VDD A pfet nfin=3
M5 net010 C VSS C nfet nfin=1
M4 net010 B VSS B nfet nfin=1
M3 net010 A VSS A nfet nfin=1
X0 net010 VSS VDD OR INV
.ends ORx3
Full Adder
.subckt full_adder1 A B C VSS VDD Carry Sum
X1 net19 C VSS VDD Sum XORx2
X0 A B VSS VDD net19 XORx2
X8 net19 C VSS VDD net25 ANDx2
X9 A B VSS VDD net24 ANDx2
X10 net25 net24 VSS VDD Carry ORx2
.ends full_adder1
Half Adder
.subckt half_adder1 A B VSS VDD Carry Sum
X0 A B VSS VDD Sum XORx2
X7 A B VSS VDD Carry ANDx2
.ends half_adder1
3 Input OR gate
.subckt ORx3 A B C VSS VDD OR
M7 net02 A VSS A nfet nfin=1
M6 net02 A_bar net2 A_bar nfet nfin=4
M5 net2 B_bar C_bar B_bar nfet nfin=4
M4 net2 B VSS B nfet nfin=4
X4 net02 VSS VDD OR INV
X2 C VSS VDD C_bar INV
X1 B VSS VDD B_bar INV
X0 A VSS VDD A_bar INV
.ends ORx3
Full Adder
.subckt full_adder1 A B C VSS VDD Carry Sum
M17 net012 B_bar VSS B_bar pfet nfin=4
Half Adder
.subckt half_adder1 A B VSS VDD Carry Sum
M8 net36 A_bar B_bar A_bar pfet nfin=2
M7 net36 A VDD A pfet nfin=2
M4 net35 A_bar B A_bar pfet nfin=2
M3 net35 A B_bar A pfet nfin=2
X11 net36 VSS VDD Carry INV
X13 net35 VSS VDD Sum INV
X5 B VSS VDD B_bar INV
X4 A VSS VDD A_bar INV
.ends half_adder1
Different functions generated by changing the input lines to single output CDM single
level and two level basic cell
Func1
.subckt func1 A B E VSS VDD output
M4 net11 B VDD B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VSS B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func1
Func2
Func3
.subckt func3 A B E D VSS VDD output
M4 net11 B D B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VSS B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func3
Func4
.subckt func4 A B E D VSS VDD output
M4 net11 B VDD B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 D B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func4
Func5
.subckt func5 A B E D VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VDD A nfet nfin=1
M2 D B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func5
Func6
.subckt func6 A B E C D VSS VDD output
M4 net11 B D B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 C B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func6
Func7
.subckt func7 A B E C D F VSS VDD output
M5 net012 B F B nfet nfin=1
M4 net11 B D B nfet nfin=1
M0 net12 A net012 A nfet nfin=1
M3 E B net012 B pfet nfin=1
M2 C B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output not
.ends func7
Func8
.subckt func8 A B VSS VDD output
M0 net12 A B A nfet nfin=1
M1 VSS A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func8
Func9
.subckt func9 A B VSS VDD output
M0 net12 A B A nfet nfin=1
M1 VDD A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func9
Func10
.subckt func10 A B VSS VDD output
M0 net12 A VSS A nfet nfin=1
M1 B A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func10
Func11
.subckt func11 A B VSS VDD output
M0 net12 A VDD A nfet nfin=1
M1 B A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func11
Func12
.subckt func12 A B E VSS VDD output
M4 net11 B VSS B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
Func13
.subckt func13 A B E VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VSS A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func13
Func14
.subckt func14 A B E VSS VDD output
M4 net11 B E B nfet nfin=1
M0 net12 A VDD A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func14
Func15
.subckt func15 A B E D VSS VDD output
M4 net11 B D B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 VDD B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func15
Func16
.subckt func16 A B E D VSS VDD output
M4 net11 B VSS B nfet nfin=1
M0 net12 A E A nfet nfin=1
M2 D B net11 B pfet nfin=1
M1 net11 A net12 A pfet nfin=1
X0 net12 VSS VDD output INV
.ends func16
Func17
.subckt func17 A B E VSS VDD output
M4 net11 B VDD B nfet nfin=1
M0 net12 A VSS A nfet nfin=1
# HSPICE
set simulator hspice
set simulator_cmd {hspice <input_deck> -o <listing_file>}
# SPECTRE
# set simulator spectre6
# set simulator_cmd {spectremdl -tab -batch <mdl_file> -design <input_deck> <listing_file> >&/dev/null}
# ELDO
# set simulator eldo
# set simulator_cmd {eldo -compat -i <input_deck> > <listing_file> >&/dev/null}
# MSIM
# set simulator msim
# (csh)
# set simulator_cmd {msim -hsp -i <input_deck> -o <listing_file> >&/dev/null}
# (sh)
# set simulator_cmd {msim -hsp -i <input_deck> -o <listing_file> 2>/dev/null}
# Default simulator options for Finesim, Hspice, Spectre, Msim, and Eldo
set simulator_options {
"common,finesim: finesim_mode=spicehd finesim_method=gear
finesim_speed=0 finesim_dvmax=0.1"
# Simulation resolution
set time_res_high 1e-12
# specifies which multi-rail format to be used in Liberty model; none, v1, or v2.
set liberty_multi_rail_format none
############################
# DEFAULT PINTYPE PARAMETERS
############################
pintype default {
#####################################
# LIBERTY MODEL GENERATION PARAMETERS
#####################################
define_parameters liberty_model {
# Add Liberty header attributes here for use with "model -create_new_model"
set_parameter liberty_time_unit "1ps"
set delay_model "table_lookup"
set default_fanout_load 0.0
set default_inout_pin_cap 0.0
set default_input_pin_cap 0.0
set default_output_pin_cap 0.0
set default_cell_leakage_power 0.0
set default_leakage_power_density 0.0
}
#######################
# VALIDATION PARAMETERS
#######################
define_parameters validation {
# Add validation parameters here
}
References
[1] Q. Xie, X. Lin, Y. Wang, S. Chen, M.J. Dousti, and M. Pedram. “Performance
Comparisons between 7nm FinFET and Conventional Bulk CMOS Standard Cell
Libraries,” IEEE Trans. on Circuits and Systems II, Vol. 62, No. 8, Aug. 2015, pp.
761-765.
[2] Q. Xie, X. Lin, Y. Wang, M.J. Dousti, A. Shafaei, M. Ghasemi-Gol, and M. Pedram.
“5nm FinFET standard cell library optimization and circuit synthesis in near- and
super-threshold voltage regimes,” Proc. of IEEE Computer Society Annual Symp.
on VLSI, Jul. 2014.
[3] Shen-Fu Hsiao, Ming-Yu Tsai, Chia-Sheng Wen. "Low Area/Power Synthesis Using
Hybrid Pass Transistor/CMOS Logic Cells in Standard Cell-Based Design
Environment," IEEE Trans. on Circuits and Systems II, Express Briefs, vol. 57,
no. 1, Jan. 2010.
[4] T. Nikoubin, F. Eslami, A. Baniasadi, and K. Navi, “A new cell design methodology
for balanced XOR-XNOR circuits for hybrid-CMOS logic” Journal of Low Power
Electronics 5, 2 (2009).
[5] T. Nikoubin, M. Grailoo, and H. Mozafari, “Cell design methodology based
on transmission gate for low-power high-speed balanced XOR-XNOR circuits in
hybrid-CMOS logic,” Journal of Low Power Electronics, 6, 1–10 (2010).
[6] Tooraj Nikoubin, Poona Bahrebar, Sara Pouri, Keivan Navi, and Vaez
Iravani. "Simple Exact Algorithm for Transistor Sizing of Low-Power High-Speed
Arithmetic Circuits". Hindawi Publishing Corporation, VLSI Design, Volume 2010,
Article ID 264390.
[7] K. Yano, Y. Sasaki, K. Rikino, and K. Seki, “Top-down pass-transistor logic
design,” IEEE J. Solid-State Circuits, vol. 31, no. 6, pp. 792–803, Jun. 1996.
[8] C. Yang and M. Ciesielski, “BDS: a BDD-based logic optimization system,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
vol. 21, no. 7, July 2002.
[9] T. Nikoubin, M. Grailoo, and C. Li, “Cell design methodology (CDM) for balanced
carry-inversecarry circuits in hybrid-CMOS logic style,” International Journal of
Electronics, vol. 101, no. 10, pp. 1357–1374, 2014.
[10] http://web.eecs.umich.edu/~jhayes/iscas.restore/c6288.html
[11] Uppalapati, Siri, Michael L. Bushnell, and Vishwani D. Agrawal. "Glitch-free
design of low power ASICS using customized resistive feedthrough cells." Proc. of
the 9th VLSI Design and Test Symposium. 2005.
[12] http://venividiwiki.ee.virginia.edu/mediawiki/index.php/Main_Page
[13] Farkhani, Hooman, et al. "Comparative study of FinFETs versus 22nm bulk
CMOS technologies: SRAM design perspective." System-on-Chip Conference
(SOCC), 2014 27th IEEE International. IEEE, 2014.
[14] http://ptm.asu.edu/
[15] Brunvand, E. “Digital VLSI Chip Design with Cadence and Synopsys CAD
Tool,” Addison-Wesley, 2010.
[16] Synopsys, Design Compiler User Guide, Product Version 13.3, April 2013.
[17] Liberty User Guides and Reference Manual Suite, Version 2013.03
[18] Synopsys Inc., "Liberty™ ncx user guide," F-2011.06 ed., 2011.
[19] https://www.coursera.org/course/vlsicad
[20] Standard Cell Library design, Lecture Notes Advanced VLSI Design,CMPE-
641,UMBC
[21] http://www.ecs.umass.edu/ece/labs/vlsicad/bds/bds.html
[22] https://embedded.eecs.berkeley.edu/pubs/downloads/sis/
[23] J. Rabaey, Low Power Design Essentials (Integrated Circuits and Systems),
2009.
[24] Sung-Mo Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits
(Analysis and Design), 2nd Edition.
[25] T. Nikoubin, N. Navi, and O. Kavei, “A new method in reorganization of the
timing behavior of symmetric XOR/XNOR circuits”. CSI J. Computer Science and
Engineering 5, 276 (2007).
[26] K. Yano et al., “A 3.8ns CMOS 16×16-b multiplier using complementary pass-
transistor logic”. IEEE J. Solid-State Circuits 25, 388 (1990).
[27] S. Rapolu and T. Nikoubin, "Fast and energy efficient FinFET full adders with
Cell Design Methodology (CDM)," 2015 6th International Conference on
Computing, Communication and Networking Technologies (ICCCNT), Denton, TX,
2015, pp. 1-5.
[28] http://sportlab.usc.edu/
[29] Ashish Joshi, Sri Rathan Rangisetti, Tooraj Nikoubin. "Fast and Energy efficient
binary to BCD converter with Complement based logic design," IEEE Trans. on
Circuits and Systems II, Express Briefs (Submitted).
[30] Sri Rathan Rangisetti, Ashish Joshi, Tooraj Nikoubin, “Area-Efficient and
Power-Efficient Binary to BCD Converters”, IEEE, Sixth International Conference
on Computing, Communications and Networking Technologies 6th ICCCNT–
35239, Denton, U.S.A, July 13 - 15, 2015.
[31] Osama Al-Khaleel, Zakaria Al-Qudah and Mohammad Al-Khaleel, “Fast and
compact binary-to-BCD conversion circuits for decimal multiplication,” IEEE 29th
International Conf. on Computer Design, pp. 226 – 231, Oct. 2011.
[32] Tso-Bing Juang, Yu-Ming Chiu. "Fast Binary to BCD Converters for Decimal
Communications Using New Recoding Circuits". IEEE International Symposium on
Integrated Circuits (ISIC), pp. 188 – 191, 2014.
[33] Arvind Kumar Mehta, Mukesh Gupta, Vipin Jain, Sudhir Kumar. "High
Performance Vedic BCD Multiplier and Modified Binary to BCD Converter". IEEE
Annual India Conference (INDICON), pp. 1 – 6, 2013.
[34] J. Bhattacharya, A. Gupta, and A. Singh. “A high performance binary to BCD
converter for decimal multiplication”. IEEE International Symposium on VLSI
Design, Automation and Test (VLSI-DAT), pp. 315 – 318, 2010.
[35] G. Jaberipur and A. Kaivani, “Improving the Speed of Parallel Decimal
Multiplication” IEEE Transactions on Computers, vol. 58, issue 11, pp. 1539 -
1552. 2009.