32 Bit Kogge Stone Based Hybrid Adder Implemented Using Standard Cells of Different Logic Families
Abstract: Adders play a critical role in all computational operations, so optimizing them with
respect to the design constraints of a system is essential. In this paper, standard cells of different logic families,
namely CMOS, Pseudo NMOS, and MGDI, are designed in Cadence Design Suite Virtuoso 6.1.7 in 180 nm
technology and characterized using Liberate 15.1.3. The resulting standard cell libraries are then applied
to a 32-bit KSA (Kogge-Stone Adder) and a proposed KSA-based hybrid adder, both implemented in Verilog,
functionally verified on Xilinx Vivado 2020.2 and synthesized in Cadence Genus 15.22. Pseudo NMOS logic
shows 14.03% area savings and MGDI offers 54.43% power saving based on area per cell over the traditional
CMOS technology. It is also seen that the proposed adder offers a decrease in power and delay of 32.13% and
13.75%, respectively, over the KSA in CMOS logic. Suitable applications for all
designs are also discussed.
Keywords: CMOS, Pseudo NMOS, MGDI, Kogge-Stone Adder, Cadence Design Suite, Virtuoso,
Liberate 15.1.3, Genus 15.22, Xilinx Vivado 2020.2
1. Introduction
Adders are the fundamental building blocks for all arithmetic and logical operations
performed by the ALU (Arithmetic Logic Unit). The design of these modules
ultimately determines the performance of the system. Thus, based on requirements, adders
must be designed to function efficiently, either prioritizing area, speed, scalability or power,
or optimizing all parameters. High-performance adders typically use a parallel prefix tree
to compute the group generate and group propagate signals, from which the
intermediate and final carries and the resultant sum bits are derived. Therefore, among the
several adder designs, Parallel Prefix Adder (PPA) designs are the most preferred for
their higher speed of operation. In the past few decades, several algorithms for addition
have been proposed to enhance the computational efficiency of PPAs [6]. The Kogge-Stone
adder is widely regarded as the fastest adder, since carry generation is done in
O(log2 N) time, where N is the number of bits of each adder input. The Kogge-Stone adder is
therefore considered in this study to enhance the speed of carry propagation through the adder
circuit while optimizing the area and power consumption overheads [1].
In addition to the optimizations offered by the 32-bit KSA designed in this study, standard
cells for Inverter, NAND and NOR gates were developed using traditional CMOS, Pseudo
NMOS and MGDI techniques [2]. The aim of this study is to perform a comprehensive
analysis of the performance of parallel prefix adders with respect to the standard cells
designed, so that data are available to assess the performance and the suitable
applications of each design. The following sections detail the parameters and design
constraints used in the study, along with brief explanations of the modules and devices used.
2. Adder Architectures
2.1. Kogge-Stone Adder
The KSA performs fast addition and is often referred to as the prefix
form of the Carry Look-Ahead Adder (CLA). It greatly reduces the delay in
generating the carry signals [3], making it a popular choice in DSP (Digital Signal Processing)
applications and the control systems industry for fast arithmetic and logic functions. The
structure of a 32-bit KSA is detailed in Figure 1. The first two stages consist of the
There are three main computational steps involved in the design of a Kogge-Stone Adder,
namely [1],
1. Pre-Computation
This stage involves the computation of the Generate and Propagate signals
from the inputs, which are then passed on to the next stage. The
Generate signal is computed as
$G_i = A_i \cdot B_i$    (1)
where the AND operation is applied to the individual bits of inputs A and B,
with i = 0, 1, 2, …, 31 indexing the input bits and the
associated generate signal. Similarly, the Propagate signal is computed
as
$P_i = A_i \oplus B_i$    (2)
where the XOR operation is applied to the individual bits of inputs A and B,
with i = 0, 1, 2, …, 31 indexing the input bits and the
associated propagate signal.
The Propagate and Generate signals are computed in parallel, which
increases the speed of computation. The area cost of this pre-processing
step is also small, since these signals are computed using basic logic
gates.
2. Prefix Computation
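This stage calculates the group carry signals directly, reducing the carry
propagation delay and reflecting the functions of the CLA, using the
Propagate and Generate signals obtained from the first stage. In their
standard parallel-prefix form, the Carry Propagation (3) and Carry
Generation (4) functions are
$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$    (3)
$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$    (4)
where i ≥ k > j, with $P_{i:i} = P_i$ and $G_{i:i} = G_i$.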
3. Post-Computation
This final stage computes the sum bits of the KSA as the XOR of the
Propagate signal and the previous carry signal:
$S_i = P_i \oplus C_{i-1}$    (5)
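The three steps above can be sketched as a bit-level behavioural model. This is a minimal Python illustration, not the Verilog used in this work: the function name `kogge_stone_add`, the list-based signal representation and the fixed carry-in of 0 are assumptions made for the example.

```python
def kogge_stone_add(a: int, b: int, width: int = 32):
    """Behavioural model of a Kogge-Stone adder with carry-in fixed at 0.

    Returns (sum, carry_out). Illustrative sketch only.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # 1. Pre-computation: bitwise generate (AND) and propagate (XOR).
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]

    # 2. Prefix computation: the span covered by each group doubles per
    #    level, so a 32-bit adder needs log2(32) = 5 levels.
    grp_g, grp_p = g[:], p[:]
    dist = 1
    while dist < width:
        new_g, new_p = grp_g[:], grp_p[:]
        for i in range(dist, width):
            # Combine group [i : i-dist+1] with the group ending at i-dist.
            new_g[i] = grp_g[i] | (grp_p[i] & grp_g[i - dist])
            new_p[i] = grp_p[i] & grp_p[i - dist]
        grp_g, grp_p = new_g, new_p
        dist *= 2

    # grp_g[i] is now the carry out of bit i (group generate over bits i..0).
    # 3. Post-computation: sum bit = bit propagate XOR previous carry.
    total = 0
    for i in range(width):
        carry_in = grp_g[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, grp_g[width - 1]
```

For example, `kogge_stone_add(0xFFFFFFFF, 1)` wraps around to a zero sum with a carry out of 1, which matches ordinary 32-bit unsigned addition.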
The advantage of this adder block is that the sum and carry out are
computed before the carry in is supplied. If this type of adder is placed in the later
stages of computation, i.e., computing the MSBs (Most Significant Bits) of the inputs, that stage
only has to wait for the carry input from the LSBs (Least Significant Bits) of the
inputs to produce its sum and carry out, since its computation occurs in parallel with the
LSB computation. This adder architecture offers a delay of O(√N), which is derived from
uniform sizing: the ideal number of full-adder elements per block is equal to the
square root of the number of bits being added, as that yields an equal number of MUX
delays.
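The behaviour described above, where each block prepares its result before the carry in arrives and a MUX then selects it, can be sketched as follows. This is a hedged Python illustration under stated assumptions: the function name `carry_select_add`, the default block size of floor(√N) and the use of Python integer addition inside each block are choices made for the example, and it is not the structure of the proposed hybrid adder itself.

```python
import math


def carry_select_add(a: int, b: int, width: int = 32, block: int = 0):
    """Behavioural model of a block adder that precomputes both carry cases.

    Returns (sum, carry_out). Illustrative sketch only.
    """
    if block <= 0:
        # Uniform sizing: roughly sqrt(N) bits per block (5 for N = 32).
        block = max(1, math.isqrt(width))
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    total, carry, pos = 0, 0, 0
    while pos < width:
        n = min(block, width - pos)          # last block may be shorter
        block_mask = (1 << n) - 1
        xa = (a >> pos) & block_mask
        xb = (b >> pos) & block_mask

        # Both candidates depend only on the block inputs, so in hardware
        # they can be computed in parallel with the lower-order blocks.
        with_carry0 = xa + xb
        with_carry1 = xa + xb + 1

        # 2:1 MUX: the incoming carry only selects between the candidates.
        chosen = with_carry1 if carry else with_carry0
        total |= (chosen & block_mask) << pos
        carry = chosen >> n
        pos += n
    return total, carry
```

With the default sizing, a 32-bit addition is split into six 5-bit blocks and one 2-bit block, so the carry passes through seven selection steps rather than rippling through 32 full adders.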
3. Logic Families
3.1. Complementary Metal Oxide Semiconductor
CMOS (Complementary Metal Oxide Semiconductor) is a commonly used technique
in the design of digital circuits. It consists of a PUN (Pull Up Network) and a PDN (Pull
Down Network). The PUN consists of PMOS transistors connected to the VDD supply that
set the output to logic high when activated, and the PDN consists of NMOS transistors
connected to GND that set the output to logic low when activated. In steady state, the PUN and
PDN are never both on or both off. Only during logic
switching do both conduct briefly, creating a momentary path from VDD to GND that adds to the dynamic power dissipation.
The static power dissipation of CMOS circuits is extremely low in comparison to that of other
logic families. The general structure of CMOS circuits is given in
Figure 4. This structure is usually implemented with a 2:1 PUN to PDN inverter width ratio.
The main advantages of this technique are full-swing output and low static
power dissipation. However, each gate requires more transistors,
which increases area and power consumption.
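The complementary behaviour of the two networks can be illustrated with a switch-level model of a two-input NAND gate. This is a minimal Python sketch that assumes the textbook CMOS NAND2 topology (two series NMOS transistors in the PDN, two parallel PMOS transistors in the PUN); it is not taken from the cell schematics of this work, and the name `nand2_cmos` is hypothetical.

```python
from itertools import product


def nand2_cmos(a: int, b: int) -> int:
    """Switch-level model of a static CMOS NAND2 gate (steady state)."""
    # PDN: two NMOS devices in series conduct only when both inputs are 1.
    pdn_on = bool(a) and bool(b)
    # PUN: two PMOS devices in parallel conduct when either input is 0.
    pun_on = (not a) or (not b)
    # In steady state exactly one network conducts, so there is no
    # static path from VDD to GND and the output is never floating.
    assert pun_on != pdn_on
    return 1 if pun_on else 0


# Evaluate the gate for every input combination.
truth_table = {(a, b): nand2_cmos(a, b) for a, b in product((0, 1), repeat=2)}
```

The internal assertion holds for all four input combinations, which is the switch-level reason the static power dissipation of this logic family is low.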
To drain the output capacitance before VDD is shorted to
GND, the pull-up PMOS and the PDN
must be sized so that their widths are in the ratio 2:4. It must
be noted that the Noise Margin Low is significantly higher than that of CMOS logic gates. This
technique offers full swing output with a lower number of transistors. The main drawback is
the unavoidable static power dissipation at the PMOS transistor.
This technique reduces threshold and subthreshold leakage current compared to CMOS
technology. It can be fabricated using the traditional p-well
process. Thus, the substrate is permanently connected to GND and the n-well is
connected to the supply and with technology scaling, the influence of source body voltage
The layouts of these logic gates were implemented using the multi-finger layout
technique to reduce parasitics and delay [5][7].
4. Simulation Results
4.1. Design and verification of KSA and KSA based Hybrid Adder Verilog codes
The discussed architectures of KSA and the proposed hybrid adders were implemented
using Verilog Hardware Description Language (HDL) and verified on the Xilinx Vivado
2020.2 platform. Figure 7 and Figure 8 represent the functional output of 32-bit KSA and
the 32-bit Hybrid adder, respectively.
4.2. Design and verification of Standard cell libraries and synthesis of the adders
The verified codes were then synthesized in Cadence Genus 15.22 using the standard cells
characterized in Liberate 15.1.3 after their respective layouts were designed in Cadence
Virtuoso 6.1.7. The cells were designed using the GPDK 180 nm technology library with an
operating temperature of 25 °C and a supply voltage VDD of 1.8 V. The pitch size was
set to 0.6 µm and the cell height to 15 µm. The clock period was set to 50 ns
for the 32-bit KSA and 80 ns for the 32-bit hybrid adder. It must be
noted that the edge-triggered D flip-flops were the same in all three standard cell libraries and were
designed in traditional CMOS logic. All other cells were altered to the structures prescribed
by the respective logic families.
The generic synthesis of the adder architectures resulted in the following schematics. Figure
9 represents the 32-bit KSA and Figure 10 represents the proposed 32-bit hybrid
adder.
Table 1. Area of the adder architectures based on different standard cell libraries (µm²)
Logic Family | 32-bit KSA | 32-bit Hybrid Adder
CMOS | 70908.301 (590 cells) | 77747.101 (721 cells)
Pseudo NMOS | 63769.201 (625 cells) | 70860.301 (760 cells)
MGDI | 112164.302 (1469 cells) | 111822.302 (1449 cells)
The power consumption of the synthesized outputs based on the logic families applied and
the respective adder structure is as given in Table 2.
Table 2. Power consumption of the adder architectures based on different standard cell
libraries (µW)
Logic Family | 32-bit KSA | 32-bit Hybrid Adder
CMOS | 1355.051 | 919.659
Pseudo NMOS | 283655.838 | 328260.390
MGDI | 3813.373 | 2417.383
The delay of the synthesized outputs based on the logic families applied and the respective
adder structure is as given in Table 3.
Table 3. Delay of the adder architectures based on different standard cell libraries (ps)
Logic Family | 32-bit KSA | 32-bit Hybrid Adder
CMOS | 2895 | 2497
Pseudo NMOS | 3059 | 2812
MGDI | 22563 | 13289
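The relative improvement of the hybrid adder over the KSA can be recomputed directly from Tables 2 and 3. The short Python sketch below uses only the tabulated values. The dictionary layout and the helper name `percent_reduction` are assumptions made for illustration.

```python
# (KSA, hybrid adder) pairs copied from Table 2 (µW) and Table 3 (ps).
power_uw = {
    "CMOS": (1355.051, 919.659),
    "Pseudo NMOS": (283655.838, 328260.390),
    "MGDI": (3813.373, 2417.383),
}
delay_ps = {
    "CMOS": (2895, 2497),
    "Pseudo NMOS": (3059, 2812),
    "MGDI": (22563, 13289),
}


def percent_reduction(ksa: float, hybrid: float) -> float:
    """Reduction of the hybrid adder relative to the KSA, in percent.

    A negative value means the hybrid adder is worse than the KSA.
    """
    return 100.0 * (ksa - hybrid) / ksa


cmos_power_saving = round(percent_reduction(*power_uw["CMOS"]), 2)
cmos_delay_saving = round(percent_reduction(*delay_ps["CMOS"]), 2)

for family in power_uw:
    print(
        family,
        f"power: {percent_reduction(*power_uw[family]):.2f}%",
        f"delay: {percent_reduction(*delay_ps[family]):.2f}%",
    )
```

For CMOS this gives 32.13% for power and 13.75% for delay, in agreement with the figures quoted in the abstract.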
6. References
6.1. Journal Article
[1] Samraj Daphni, Kasinadar Sundari Vijula Grace, “Design and Analysis of 32-bit Parallel Prefix Adders
for Low Power VLSI Applications,” Advances in Science, Technology and Engineering Systems Journal,
2019, vol. 4, doi: 10.25046/aj040213.
[2] C. N. Shilpa, K. D. Shinde and H. V. Nithin, "Design, Implementation and Comparative Analysis of Kogge
Stone Adder Using CMOS and GDI Design: A VLSI Based Approach," 2016 8th International Conference
on Computational Intelligence and Communication Networks (CICN), 2016, pp. 570-574, doi:
10.1109/CICN.2016.117.
[3] Megha Talsania and Eugene John, “A Comparative Analysis of Parallel Prefix Adders”, Dept of Electrical
and Computer Engineering, University of Texas at San Antonio Tx 78249.
[4] Sudeshna Sarkar, Monika Jain, Arpita Saha, Amitha Rathi, “Gate Diffusion Input: A Technique for Fast
Digital Circuits (Implementation on 180 nm Technology)”, IOSR Journals of VLSI and Signal Processing,
2014, Vol 4, Issue 2, Ver. IV.
[5] S.-L. Siu, W.-S. Tam, H. Wong, C.-W. Kok, K. Kakusima, H. Iwai, “Influence of multi-finger layout on
the subthreshold behaviour of nanometre MOS transistors,” Microelectronics Reliability, 2012, vol. 52,
doi: 10.1016/j.microrel.2011.09.011.
[6] N. Poornima, V. S. Kanchana Bhaaskaran, “Area Efficient Hybrid Parallel Prefix Adders,” Procedia
Materials Science, 2015, vol. 10, pp. 371-380, doi: 10.1016/j.mspro.2015.06.069.
6.2. Book
[7] Behzad Razavi, “Design of Analog CMOS Integrated Circuits”, Tata McGraw-Hill Edition, (2002).