Parallel Prefix Adder
Abstract: Two-operand addition is an essential unit in many embedded and digital signal processors, as it is the basic building block for other arithmetic operations. However, carry propagation during addition limits the speed of arithmetic operations and is therefore one of the main processor performance issues. Efficient adder design and implementation has thus attracted many digital arithmetic designers, especially for reconfigurable hardware such as Field Programmable Gate Arrays (FPGAs). In this paper, we provide efficient FPGA implementations of five parallel prefix adders, namely the Kogge-Stone Adder (KSA), Brent-Kung Adder (BKA), Han-Carlson Adder (HCA), Sklansky Adder (SkA), and Ladner-Fischer Adder (LFA), using Altera Cyclone IV as the target FPGA device. The comparison between the adders shows that KSA recorded the best critical path delay, 4.504 ns for 64 bits, while BKA recorded the smallest design area, 223 logic elements for the same bit length. Finally, the comparison with previous designs shows that the proposed implementations improve performance over many state-of-the-art designs by more than 200%.

Keywords: Parallel Prefix Adders (PPAs), Kogge-Stone Adder (KSA), Brent-Kung Adder (BKA), Han-Carlson Adder (HCA), Sklansky Adder (SkA), Ladner-Fischer Adder (LFA), FPGA design.

I. INTRODUCTION

Over the past decades, a significant revolution has taken place in the field of digital hardware design. Millions of logic gates and flip-flops can exist in a single digital hardware design, enabling users to configure and implement several functions. At the forefront of these digital hardware design techniques are Field Programmable Gate Arrays (FPGAs). FPGAs are semiconductor devices based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGA devices can be programmed or configured in the field by the designer to perform specific tasks or applications, using hardware description languages such as VHDL (VHSIC (Very High Speed Integrated Circuit) Hardware Description Language) [1] along with computer-aided design tools such as the ModelSim simulator and the Quartus II synthesizer.

FPGA devices are designed to give users much more flexibility to configure and build their own systems. Because of this flexibility, researchers have adopted FPGAs in diverse design environments as a tool to study and analyze their systems. Digital arithmetic designers represent a major part of this adoption, as shown by the literature, which is full of FPGA designs of different arithmetic and number theory operations and techniques. This is due to the substantial role of such operations in building efficient embedded and digital signal processors such as cryptographic processors.

The core operation of these processors is the digital adder unit, which contributes to almost every processor task and unit; for example, adders are used to build the Arithmetic Logic Unit (ALU) in computers, to calculate addresses, and to implement increment and decrement operators [2]. Adders are also used in many processors (in almost all digital systems) to perform complex computations, such as number theory algorithms (e.g., the GCD design in [1]), cryptography systems (e.g., the ECC processor design in [3]), low power applications (e.g., the adder cell design in [4]), communication systems (e.g., the MIMO system design in [5]), and many other applications.

Because of this essential role of adders, researchers have tried to enhance the ordinary Carry-Ripple Adder (CRA) to obtain efficient fast adders. For instance, the Carry-Lookahead Adder (CLA) was the first known fast adder, obtained by manipulating the carry propagation to improve speed [6]. Thereafter, more precise adjustments to the carry generation stage of the CLA were proposed, which finally led to the Parallel Prefix Adders (PPAs).

PPAs are implemented in Very Large-Scale Integration (VLSI) chips, which rely heavily on fast and reliable arithmetic computation; PPAs are therefore very useful in today's technology [7]. In practice, five PPAs were proposed independently by different researchers, differing in the distribution of the carry propagate and generate signals: the Ladner-Fischer Adder (LFA), Brent-Kung Adder (BKA), Kogge-Stone Adder (KSA), Han-Carlson Adder (HCA), and Sklansky Adder (SkA). Each of these adders is superior in a certain aspect such as area, speed, power, or fan-in/fan-out.

In this paper, we propose efficient FPGA implementations of the five prefix adder blocks using an Altera Cyclone IV FPGA kit with variable bit lengths (8 to 64 bits). Larger adders can be implemented by cascading a number of adder blocks in a scalable connection. We also compare the adders in terms of area (number of logic elements) and delay (critical path delay), and compare them with many state-of-the-art designs.

II. PARALLEL PREFIX ADDERS - REVISITED

Parallel prefix adders (PPAs) [8] are fast two-operand adders that perform addition in a parallelized manner. PPAs are similar to the CLA but with an enhancement in the carry propagation stage (called the middle stage). There are five different variations of PPAs, namely the Ladner-Fischer Adder (LFA), Brent-Kung Adder (BKA), Kogge-Stone Adder (KSA), Han-Carlson
Adder (HCA), and Sklansky Adder (SkA). These adders differ in their tree structure, which is designed to optimize certain aspects such as performance, power, area, and fan-in/fan-out. For instance, KSA uses a large area to achieve higher performance than the others, whereas LFA suffers from large fan-out. PPAs compute the addition in three stages, shown in Fig. 1.

Fig. 1. Parallel prefix adder stages.

A. Pre-processing stage

In this step, the generate and propagate signals of each bit are computed from the operands A and B. These signals are given by logic equations (1) and (2):

$P_i = A_i \oplus B_i$    (1)

$G_i = A_i \cdot B_i$    (2)

B. Carry generation network

PPAs differ from each other in the connections of this network. It computes the carry of each bit using the generate and propagate signals from the previous stage. Two kinds of blocks are defined in this stage: a group generation and propagation block, and a group generation-only block, as shown in Fig. 2.

The logic blocks used to calculate the generate and propagate bits can be described by logic equations (3) and (4):

$P_{out} = P_{in1} \cdot P_{in2}$    (3)

$G_{out} = G_{in1} + (P_{in1} \cdot G_{in2})$    (4)

The group generation-only block has only logic equation (5) for carry generation:

$G_{out} = G_{in1} + (P_{in1} \cdot G_{in2})$    (5)

C. Post-processing (calculating the sum)

This is the last step and is common to all adders of this family (carry look-ahead). It computes the sum bits, which are given by the logic in (6), where $G_{i-1}$ is the group generate signal (the carry into bit i) produced by the carry generation network:

$S_i = P_i \oplus G_{i-1}$    (6)

III. DESIGN EVALUATION AND RESULTS

As mentioned before, we targeted the Altera Cyclone IV (EP4CGX-22CF19C7) FPGA device to implement the aforementioned PPAs, using structural VHDL as the hardware description language along with Altera Quartus II and ModelSim-Altera 10.1d for simulation and synthesis. Each PPA design included four blocks: a partial full adder block, a group generation block, a group propagation block, and a sum block that produces the final result. A high-performance multiprocessor platform was used in the coding, simulation, and verification phases as well as for synthesis and testing. The simulation results are given in Tables I and II along with Figs. 3 and 4. The path delay results were generated using the TimeQuest timing analyzer tool of the Quartus II package with the Fast 1200mV 0C model, and the area estimation results were generated using the Analysis & Synthesis tool after post-fitting mapping and port assignments.

TABLE I. COMPARISON BETWEEN ADDERS IN TERMS OF PATH DELAY (NS)

Block size (bits)   KSA     LFA     BKA     HCA     SkA
8                   3.187   3.337   3.337   3.200   3.048
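The three stages of Section II (pre-processing, carry generation network, and post-processing) can be cross-checked against ordinary integer addition with a short behavioural model. The following is a minimal Python sketch, not the structural VHDL used in this work. It assumes the Kogge-Stone wiring (bit i is combined with bit i - d for d = 1, 2, 4, ...), a carry-in of zero, and that input 1 of the group block is the more significant group. The function name kogge_stone_add and its parameters are illustrative only.

```python
def kogge_stone_add(a: int, b: int, width: int = 8):
    """Behavioural model of a Kogge-Stone prefix adder (illustrative sketch).

    Returns (sum, carry_out) for two unsigned `width`-bit operands,
    assuming a carry-in of zero.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Pre-processing stage: per-bit propagate and generate, equations (1) and (2).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Carry generation network, Kogge-Stone wiring (an assumption of this sketch):
    # at distance d, position i is combined with position i - d.
    group_p, group_g = p[:], g[:]
    d = 1
    while d < width:
        next_p, next_g = group_p[:], group_g[:]
        for i in range(d, width):
            # Input 1 = more significant group (ends at bit i),
            # input 2 = less significant group (ends at bit i - d).
            next_p[i] = group_p[i] & group_p[i - d]                 # equation (3)
            next_g[i] = group_g[i] | (group_p[i] & group_g[i - d])  # equation (4)
        group_p, group_g = next_p, next_g
        d *= 2

    # Post-processing stage: sum bit = P_i xor (carry into bit i).
    # The carry into bit 0 is the carry-in, taken as 0 here.
    total = 0
    for i in range(width):
        carry_in = group_g[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i

    carry_out = group_g[width - 1]
    return total, carry_out


if __name__ == "__main__":
    # 200 + 100 = 300 = 256 + 44, so the 8-bit sum is 44 with carry-out 1.
    print(kogge_stone_add(200, 100, 8))  # prints (44, 1)
```

Replacing the index pattern of the inner loop with another prefix tree (for example Brent-Kung) changes only the carry generation network; the pre-processing and post-processing stages stay the same.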
Also, Kumari and Nagendra [11] implemented three different 32-bit PPAs and obtained path delays of 20.2 ns, 23.2 ns, and 22.9 ns for KSA, LFA, and BKA, respectively. These designs are slower than their counterparts among our proposed adders by at least a factor of 4. Moreover, the authors of [12] simulated their 64-bit KSA using a VIRTEX-4 (XC4VFX140 device, 11FF1517 package, -11 speed grade), whereas our chip technology offers the C7 speed grade, resulting in more than a 50% improvement in performance.

Furthermore, N. D. Gundi [13] designed a 32-bit BKA using Complementary Pass Transistor Logic (CPL) and CMOS technology and reported total delays of 21.427 ns and 10.650 ns for CPL and CMOS, respectively. Both are much slower than our 32-bit BKA, which is four and two times faster, respectively. Another comparison can be made with the 64-bit BKA implementation in [14], which recorded a path delay of 13.275 ns and is thus 2.5 times slower than our counterpart BKA.

In addition, Rani and Kumar [15] implemented five different 64-bit PPAs and obtained maximum delays of 17, 18.1, 14.9, 15.1, and 38.2 ns for their KSA, BKA, LFA, HCA, and SkA implementations, respectively. Their synthesized delays are much slower than ours (shown in Table I); for example, our 64-bit SkA improves the delay by 700% compared to its peer in [15]. Likewise, compared with the 8- and 16-bit LFA implementations of Fariddin and Vijay, our LFA implementations show an improvement by a factor of 3.5 for similar operand sizes. Finally, the authors of [16] and [17] implemented various types of PPAs with datapath sizes of 64 and 16 bits, and their best delay results are slower than our designs by more than 400%.

IV. CONCLUSION

In this paper, we proposed efficient FPGA implementations of parallel prefix adders (the Kogge-Stone, Han-Carlson, Brent-Kung, Ladner-Fischer, and Sklansky adders) using Altera FPGA device technology that improve the computation process. The performance of the proposed designs was studied in terms of both critical path delay (ns) and area (LEs) in order to compare the proposed adders with existing implementations and simulations. The synthesis results of this paper targeted Cyclone IV E chip technology with four different datapath sizes (8 to 64 bits). KSA was found to be the fastest FPGA adder among those studied, at the cost of a large number of LUTs, whereas BKA is optimized for area. However, HCA, a hybrid adder that offers a good trade-off between speed (KSA) and low area (BKA), proved to be the most suitable adder for achieving a speed close to that of KSA with low area. As future work, this work can be enhanced by building larger adder units (for instance, by cascading blocks of different sizes from the proposed blocks). Also, the proposed PPA implementations can be synthesized for many other FPGA kits, such as the Virtex and Spartan kits, for the purpose of benchmarking against other kit families.

REFERENCES

[1] Q. A. Al-Haija, M. Al-Ja'fari, and M. Smadi, "A comparative study up to 1024 bit Euclid's GCD algorithm FPGA implementation & synthesizing," 2016 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA), Ras Al Khaimah, United Arab Emirates, pp. 1-4, 2016. doi: 10.1109/ICEDSA.2016.7818535.
[2] B. Brey, The Intel Microprocessors. Pearson Education, Dorling Kindersley Publishing, pp. 323-326, 2007. ISBN 81-317-1428-4.
[3] L. Tawalbeh and Q. Abu Al-Haija, "Enhanced FPGA Implementations for Doubling Oriented and Jacobi-Quartics Elliptic Curves Cryptography," Journal of Information Assurance and Security, USA: Dynamic Publishers, Inc., vol. 6, no. 3, pp. 167-175, 2011.
[4] M. Sabaghi, S. Marjani, and A. Majdabadi, "The Design of Ultra-Low Power Adder Cell in 90 and 180 nm CMOS Technology," Circuits and Systems, vol. 7, pp. 58-67, 2016. doi: 10.4236/cs.2016.72007.
[5] A. J. Pacheco, A. F. Herrero, and J. C. Quirós, "Design and Implementation of a Hardware Module for MIMO Decoding in a 4G Wireless Receiver," VLSI Design, vol. 2008, Article ID 312614, 8 pages, 2008. doi: 10.1155/2008/312614.
[6] G. B. Rosenberger, "Simultaneous Carry Adder," U.S. Patent 2,966,305, Dec. 27, 1960.
[7] N. Zamhari, P. Voon, K. Kipli, K. L. Chin, and M. H. Husin, "Comparison of parallel prefix adder (PPA)," in Proceedings of the World Congress on Engineering, vol. 2, pp. 4-6, 2012.
[8] M. D. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann Publishers, Elsevier, vol. 1, ch. 2, pp. 51-136, 2004.
[9] J. Kaur and P. Kumar, "Analysis of 16 & 32 Bit Kogge Stone Adder Using Xilinx Tool," JECET, vol. 3, no. 3, pp. 1639-1644, 2014.
[10] R. Adusumilli and V. Kumar, "Design and Implementation of a High Speed 64 bit Kogge-Stone Adder using Verilog HDL," International Journal of Electrical and Electronic Engineering & Telecommunications, vol. 4, no. 1, 2015.
[11] P. Kumari and R. Nagendra, "Design of 32 bit Parallel Prefix Adders," IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), pp. 01-06, 2013.
[12] A. Kaul and A. Kumar, "Simulation of 64-bit MAC Unit using Kogge Stone Adder and Ancient Indian Mathematics," Journal of Engineering Research and Application, vol. 6, no. 6, pp. 01-05, 2016.
[13] N. D. Gundi, "Implementation of 32 Bit Brent Kung Adder Using Complementary Pass Transistor Logic," Visvesvaraya Technological University, 2008.
[14] V. Jeevan et al., "Design and Implementation of 64-Bit Parallel Prefix Brent Kung Adder," International Journal of Electrical and Electronic Engineering, vol. 9, no. 1, 2017.
[15] G. Rani and S. Kumar, "Delay Analysis of Parallel-Prefix Adders," International Journal of Science and Research (IJSR), vol. 6, no. 6, 2012.
[16] S. Butchibabu and S. Kishore Bab, "Design and Implementation of Efficient Parallel Prefix Adders on FPGA," International Journal of Engineering Research & Technology, vol. 3, no. 7, 2014.
[17] D. Raj et al., "Design and Implementation of different types of efficient parallel prefix adders," National Conference on Advanced Innovation in Engineering and Technology, vol. 3, special issue no. 1, 2015.