2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA)

Cost Analysis Study of Variable Parallel Prefix Adders Using Altera Cyclone IV FPGA Kit
Ibrahim Marouf, Mohammed Mosab Asad, Ahmad Bakhuraibah and Qasem Abu Al-Haija
Department of Electrical Engineering, King Faisal University,
Al-Ahsa, 31982, P.O. Box 380, Saudi Arabia

Abstract— Two-operand addition is an essential operation in many embedded and digital signal processors, as it is the basic building block for other arithmetic operations. However, carry propagation during addition limits the speed of arithmetic operations and is therefore one of the processor's performance issues. Thus, efficient adder design and implementation has attracted many digital arithmetic designers, especially for reconfigurable hardware such as Field Programmable Gate Array (FPGA) designs. In this paper, we provide efficient FPGA implementations of five parallel prefix adders, namely the Kogge-Stone Adder (KSA), Brent-Kung Adder (BKA), Han-Carlson Adder (HCA), Sklansky Adder (SkA), and Ladner-Fischer Adder (LFA), using the Altera Cyclone IV as the target FPGA device. The comparison between the adders shows that KSA recorded the best critical path delay, 4.504 ns for 64 bits, while BKA recorded the smallest design area, 223 logic elements for the same bit length. Finally, the comparison with previous designs illustrates that the proposed implementations improve performance over many state-of-the-art designs by more than 200%.

Keywords—Parallel Prefix Adders (PPAs), Kogge-Stone Adder (KSA), Brent-Kung Adder (BKA), Han-Carlson Adder (HCA), Sklansky Adder (SkA), Ladner-Fischer Adder (LFA), FPGA design.

I. INTRODUCTION

Over the past decades, a significant revolution has taken place in the field of digital hardware design. Millions of logic gates and flip-flops can exist in one digital hardware design, enabling users to configure and implement several functions. At the top of these digital hardware design techniques are Field Programmable Gate Arrays (FPGAs). FPGAs are semiconductor devices based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGA devices can be programmed/configured in the field by the designer to perform specific tasks or applications, using hardware description languages such as VHDL (VHSIC (Very High Speed Integrated Circuit) Hardware Description Language) [1] along with computer-aided design tools such as the ModelSim simulator and the Quartus II synthesizer.

FPGA devices are designed to give users much more flexibility to configure and build their own systems. Because of this flexibility, researchers have adopted FPGAs in distinctive design environments as a tool to study and analyze their systems. Digital arithmetic designers represent a major part of this adoption, as recognized by the literature, which is full of FPGA designs of different arithmetic and number theory operations and techniques. This is due to the substantial role of such operations in building efficient embedded and digital signal processors such as cryptographic processors.

The core operation of these processors is the digital adder unit, which contributes to almost every processor task and unit, such as the use of adders to build the Arithmetic Logic Unit (ALU) in computers, to calculate addresses, or to implement increment and decrement operators [2]. Adders are also used in many processors to perform complex computations in almost all digital systems, such as number theory algorithms (such as the GCD design in [1]), cryptography systems (such as the ECC processor design in [3]), low-power applications (such as the adder cell design in [4]), communication systems (such as the MIMO system design in [5]), and many other applications.

Because of this essential role of adders, researchers have tried to enhance the ordinary Carry-Ripple Adder (CRA) to obtain efficient fast adders. For instance, the Carry-Lookahead Adder (CLA) was the first known fast adder; it manipulates the carry propagate signals to achieve fast performance [6]. Thereafter, more precise adjustments in the carry generation stage of the CLA were proposed, finally leading to the Parallel Prefix Adders (PPAs).

PPAs are implemented in Very Large-Scale Integration (VLSI) chips, which rely heavily on fast and reliable arithmetic computation; therefore, PPAs are very useful in today's world of technology [7]. Five PPAs were proposed independently by different researchers based on the distribution of the carry propagate and generate signals: Ladner-Fischer Adder (LFA), Brent-Kung Adder (BKA), Kogge-Stone Adder (KSA), Han-Carlson Adder (HCA), and Sklansky Adder (SkA). Each of these adders is superior in one certain aspect such as area, speed, power, or fan-in/fan-out.

In this paper, we propose efficient FPGA implementations of the five prefix adder blocks using the Altera Cyclone IV FPGA kit with variable bit length (8 to 64 bits). Larger adders can be implemented by cascading a number of adder blocks in a scalable connection. We also compare the adders in terms of area (the number of logic elements) and delay (critical path delay), in addition to a comparison with many state-of-the-art designs.

II. PARALLEL PREFIX ADDERS - REVISITED

Parallel prefix adders (PPAs) [8] are fast two-operand adders that execute addition in a parallelized manner. PPAs are just like the CLA but with an enhancement in the carry propagation stage (called the middle stage). There are five different variations of PPAs, namely: Ladner-Fischer Adder (LFA), Brent-Kung Adder (BKA), Kogge-Stone Adder (KSA), Han-Carlson

978-1-5386-0872-2/17/$31.00 ©2017 IEEE



Adder (HCA), and Sklansky Adder (SkA). These adders differ in their tree structure, which is designed to optimize certain aspects such as performance, power, area, and fan-in/fan-out. For instance, KSA uses a large area to achieve higher performance compared to the others, whereas LFA suffers from large fan-out. PPAs compute addition in the three stages shown in Fig. 1.

Fig. 1. Parallel prefix adder stages.

A. Pre-processing stage
The generate and propagate signals of each bit of A and B are computed in this step. These signals are given by the logic equations (1) and (2):

Pi = Ai xor Bi    (1)

Gi = Ai and Bi    (2)

B. Carry generation network
PPAs differ from each other in the connections of this network. It computes the carry of each bit using the generate and propagate signals from the previous block. Two blocks are defined in this network: a group generation and propagation block, and a group generation only block, as shown in Fig. 2.

Fig. 2. Group generation and propagation. (Left: "P, G generate" cell with inputs (Gin1, Pin1) and (Gin2, Pin2) and output (Gout, Pout). Right: "G generate" cell with inputs Gin1 and (Gin2, Pin2) and output Gout.)

The logic blocks used for the calculation of the generate and propagate bits can be described by the logic equations (3) and (4):

Pout = Pin1 and Pin2    (3)

Gout = Gin1 or (Pin1 and Gin2)    (4)

The group generation only block has just the logic equation (5) for carry generation. As labeled in Fig. 2, input 2 of this block carries both generate and propagate signals, while input 1 supplies only a generate signal:

Gout = Gin2 or (Pin2 and Gin1)    (5)

C. Post-processing (calculating the sum)
This is the last step and is common to all adders of this family (carry look-ahead). It involves the computation of the sum bits, which are given by the logic in (6), where Gi-1 is the group generate signal (the carry) out of bit position i-1:

Si = Pi xor Gi-1    (6)

III. DESIGN EVALUATION AND RESULTS

As mentioned earlier, we targeted the Altera Cyclone IV (EP4CGX-22CF19C7) FPGA device to implement the aforementioned PPAs, using structural VHDL as the hardware description language along with Altera Quartus II and ModelSim-Altera 10.1d for simulation and synthesis. Each PPA design includes four blocks: a partial full adder block, a group generation block, a group propagation block, and a sum block that produces the final result. Furthermore, a high-performance multiprocessor platform was used in the phases of coding, simulation, and verification as well as synthesis and testing. The simulation results are given in Tables I and II along with Figs. 3 and 4. The path delay results were generated using the TimeQuest timing analyzer tool of the Quartus II package with the Fast 1200mV 0C model, and the area estimation results were generated using the Analysis and Synthesis tool after post-fitting mapping and port assignments.

TABLE I. COMPARISON BETWEEN ADDERS IN TERMS OF PATH DELAY (NS)

Block size   KSA     LFA     BKA     HCA     SkA
8            3.187   3.337   3.337   3.200   3.048
16           3.572   3.680   4.367   4.144   3.463
32           4.280   4.600   4.045   4.322   4.380
64           4.504   4.855   5.317   4.509   5.259

Table I compares the path delay (in nanoseconds, ns) of the five parallel prefix adders for variable design lengths (8, 16, 32, and 64 bits). The relationship between bit length and delay is plotted in Fig. 3. Fig. 3 shows that BKA recorded the longest propagation delay, 5.317 ns for the 64-bit adder, followed by LFA for almost all bit sizes, whereas the shortest path delay (i.e., the fastest adder unit) belonged to KSA, with a delay of 4.504 ns for 64 bits, and to SkA for small adder sizes (8 and 16 bits). As for the Han-Carlson adder, which is a hybrid of KSA and BKA, its critical path delay approaches the KSA delay values as the number of bits increases.
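The per-width reading of Table I can be cross-checked with a few lines of Python. This is only a sketch over the values transcribed from Table I; the dictionary layout and the helper names `fastest`, `slowest`, and `hca_gap_to_ksa` are illustrative assumptions, not part of the original design flow.

```python
# Path delays in ns, transcribed from Table I (outer key: bit width).
DELAY_NS = {
    8:  {"KSA": 3.187, "LFA": 3.337, "BKA": 3.337, "HCA": 3.200, "SkA": 3.048},
    16: {"KSA": 3.572, "LFA": 3.680, "BKA": 4.367, "HCA": 4.144, "SkA": 3.463},
    32: {"KSA": 4.280, "LFA": 4.600, "BKA": 4.045, "HCA": 4.322, "SkA": 4.380},
    64: {"KSA": 4.504, "LFA": 4.855, "BKA": 5.317, "HCA": 4.509, "SkA": 5.259},
}


def fastest(width):
    """Adder with the smallest path delay at the given bit width."""
    row = DELAY_NS[width]
    return min(row, key=row.get)


def slowest(width):
    """Adder with the largest path delay at the given bit width."""
    row = DELAY_NS[width]
    return max(row, key=row.get)


def hca_gap_to_ksa(width):
    """How much slower HCA is than KSA (in ns) at the given bit width."""
    return DELAY_NS[width]["HCA"] - DELAY_NS[width]["KSA"]


for width in DELAY_NS:
    print(f"{width:>2}-bit: fastest={fastest(width)}, "
          f"HCA-KSA gap={hca_gap_to_ksa(width):.3f} ns")
```

Run on the tabulated values, the sketch agrees with the text for the 8-, 16-, and 64-bit rows (SkA, SkA, and KSA are fastest, respectively). It also shows that in the 32-bit row the smallest tabulated delay belongs to BKA, and that the HCA-to-KSA gap does not shrink monotonically (the 16-bit row is an outlier).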

Fig. 3. Plot of the path delay in the five adders.

On the other hand, Table II compares the design area (in number of logic elements, LEs) of the five parallel prefix adders for variable design lengths (8, 16, 32, and 64 bits). The relationship between bit length and the number of LEs is plotted in Fig. 4. It is clearly seen that BKA recorded the best area results, as it occupies the fewest logic elements. BKA required only 223 LEs to implement the 64-bit adder unit, whereas KSA, HCA, LFA, and SkA are implemented using 569, 336, 292, and 282 LEs, respectively, for the same bit length. Each logic element consists of four look-up tables (LUTs). Thus, the KSA implementation costs 2276 LUTs, HCA costs 1344 LUTs, and LFA and SkA each cost about 1148 LUTs. BKA, on the other hand, compensates for its long path delay with only 892 LUTs and is the least costly.

TABLE II. COMPARISON BETWEEN THE FIVE ADDERS IN TERMS OF AREA (LES)

Block size   KSA   LFA   BKA   HCA   SkA
8            29    22    19    28    24
16           85    56    50    75    55
32           230   125   105   145   130
64           569   292   223   336   282

In short, the graphical plots (Fig. 3 and Fig. 4) show that KSA leads the other adders as it has the smallest time delay. This result is very useful and confirms the theoretical modeling of KSA, which has the least number of logic levels. HCA comes second, with values very close to KSA, especially for larger bit lengths. LFA stands in the middle, whereas BKA and SkA are relatively slow, with a long path delay for larger bit lengths. In terms of area, the situation is reversed: BKA is highly optimized and KSA is not. Next, we provide a practical verification example of our simulation for the 8-bit Kogge-Stone Adder (KSA-8) uploaded to the kit: (C7H) was added to (FCH) and the result is (01C3H). The entered operands and the result are presented on the 7-segment display, as shown in Fig. 5.

Fig. 5. A simulation sample of 8-bit KSA.
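To illustrate why KSA has few logic levels while BKA has few cells, and to reproduce the C7H + FCH example in software, the following Python sketch models the three PPA stages with a pluggable carry network. It is a behavioral model under stated assumptions, not the structural VHDL used in this work: the function names (`kogge_stone`, `brent_kung`, `add_with_network`, `cell_count`) are hypothetical, the two networks follow the usual textbook constructions, bit widths are assumed to be powers of two, and the prefix-cell counts it reports are not FPGA logic-element counts.

```python
def kogge_stone(n):
    """Kogge-Stone prefix network for n bits (n a power of two).

    Returns a list of levels; each level is a list of (i, j) pairs meaning
    "the cell at bit i absorbs the group that ends at bit j".
    """
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


def brent_kung(n):
    """Brent-Kung prefix network for n bits (n a power of two)."""
    levels, d = [], 1
    while d < n:  # up-sweep: build groups of width 2, 4, 8, ...
        levels.append([(i, i - d) for i in range(2 * d - 1, n, 2 * d)])
        d *= 2
    d //= 4  # down-sweep: fill in the prefixes the up-sweep skipped
    while d >= 1:
        levels.append([(i, i - d) for i in range(3 * d - 1, n, 2 * d)])
        d //= 2
    return levels


def add_with_network(a, b, n, network):
    """Add two n-bit integers using the three PPA stages; returns n+1 bits."""
    # Pre-processing: per-bit propagate and generate, eqs. (1) and (2).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    # Carry generation network: apply the group operator level by level.
    G, P = g[:], p[:]
    for level in network:
        prev_G, prev_P = G[:], P[:]  # cells in one level work in parallel
        for i, j in level:
            G[i] = prev_G[i] | (prev_P[i] & prev_G[j])  # eq. (4)
            P[i] = prev_P[i] & prev_P[j]                # eq. (3)
    # Post-processing: sum bit = propagate xor carry out of the bit below.
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total | (G[n - 1] << n)  # carry-out becomes bit n


def cell_count(network):
    """Total number of prefix cells in the network."""
    return sum(len(level) for level in network)


for name, build in (("KSA", kogge_stone), ("BKA", brent_kung)):
    net = build(64)
    print(name, "levels:", len(net), "cells:", cell_count(net))
print(hex(add_with_network(0xC7, 0xFC, 8, kogge_stone(8))))
```

For 64 bits this model gives 6 levels and 321 cells for the Kogge-Stone network against 11 levels and 120 cells for the Brent-Kung network, which matches the qualitative speed/area ordering reported in Tables I and II. The last line evaluates the KSA-8 example and yields 0x1c3.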

Fig. 4. Plot of the area (number of LEs) in the five adders.

Regarding the comparison with other state-of-the-art implementations, our proposed implementation is competitive with many dedicated solutions. For example, the FPGA implementation of KSA in [9] reported an adder delay of 11.260 ns and 12.840 ns at operand sizes of 8 and 16 bits, respectively (generated by the Xilinx 14.1 software tool), which is approximately 3.5 times slower than our 8- and 16-bit KSA adders, respectively. In related work, Adusumilli and Kumar [10] synthesized their KSA design using Verilog HDL for the Xilinx Artix-7, and the design tools reported a maximum delay of 8.959 ns and 10.30 ns for operand sizes of 32 and 64 bits, respectively. In contrast, our design (using the Altera Cyclone IV E instead of the Xilinx Artix-7) computes the addition in 4.3 ns and 4.5 ns for the same operand sizes, which almost doubles the speed of execution.
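The speed-up factors quoted for [9] and [10] follow directly from dividing the reported delays by the corresponding KSA delays in Table I. The short Python sketch below spells out that arithmetic; the `COMPARISONS` layout and the `speedup` helper are illustrative names, and the ratio ignores differences in device family and tool settings between the compared designs.

```python
# (reference, operand width) -> (reported delay in ns, our KSA delay in ns)
COMPARISONS = {
    ("[9]", 8):   (11.260, 3.187),
    ("[9]", 16):  (12.840, 3.572),
    ("[10]", 32): (8.959, 4.280),
    ("[10]", 64): (10.30, 4.504),
}


def speedup(reported_ns, ours_ns):
    """Ratio of a reported delay to our delay (values above 1 mean we are faster)."""
    return reported_ns / ours_ns


for (ref, width), (reported, ours) in COMPARISONS.items():
    print(f"{ref} {width}-bit KSA: {speedup(reported, ours):.2f}x")
```

The ratios come out at roughly 3.5 and 3.6 for the two designs in [9] and roughly 2.1 and 2.3 for the two designs in [10].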

Also, Kumari and Nagendra [11] implemented three different 32-bit PPAs and obtained path delays of 20.2 ns, 23.2 ns, and 22.9 ns for KSA, LFA, and BKA, respectively. These designs are slower than their counterparts among our proposed adders by at least a factor of 4. Moreover, the authors of [12] simulated their 64-bit KSA using a Virtex-4 (XC4VFX140 device, 11FF1517 package, -11 speed grade), whereas our chip technology offers the C7 speed grade, which results in more than 50% improvement in terms of performance.

Furthermore, N. D. Gundi [13] designed a 32-bit BKA using Complementary Pass Transistor Logic (CPL) and CMOS technology and listed total delays of 21.427 ns and 10.650 ns for CPL and CMOS technology, which is much slower than our 32-bit BKA (ours is four and two times faster, respectively). Another comparison can be made with the 64-bit BKA implementation in [14], which recorded a path delay of 13.275 ns and is thus 2.5 times slower than our counterpart BKA.

In addition, Rani and Kumar [15] implemented five different 64-bit PPAs and obtained maximum delays of 17, 18.1, 14.9, 15.1, and 38.2 ns for the KSA, BKA, LFA, HCA, and SkA implementations, respectively. Their synthesized delays are much slower than ours (shown in Table I); for example, our 64-bit SkA adder is roughly seven times faster than its peer in [15]. Likewise, compared with the 8- and 16-bit LFA implementations of Fariddin and Vijay, our LFA implementations show an improvement by a factor of 3.5 for the same operand sizes. Finally, the authors of [16] and [17] implemented various types of PPAs with datapath sizes of 64 and 16 bits, and their best delay results are slower than our designs by more than 400%.

IV. CONCLUSION

In this paper, we proposed efficient FPGA implementations of parallel prefix adders (Kogge-Stone, Han-Carlson, Brent-Kung, Ladner-Fischer, and Sklansky adders) using Altera FPGA device technology that improve the computation process. The performance of the proposed designs was studied in terms of both critical path delay (ns) and area (LEs) in order to compare the proposed adders with each other and with existing implementations and simulations. The synthesis results of this paper targeted Cyclone IV E chip technology with four different datapath sizes (8 to 64 bits). KSA was found to be the fastest among the adders, at the cost of a large number of LUTs, whereas BKA is optimized in area. However, HCA, a hybrid adder that offers a good trade-off between speed (KSA) and low area (BKA), proved to be the most adequate adder for achieving high speed close to KSA together with low area. As future work, this work can be extended by building larger adder units (for example, by cascading blocks of different sizes from the proposed blocks). Also, the proposed PPA implementations can be synthesized for many other FPGA kits, such as the Virtex and Spartan kits, for the purpose of benchmarking against other kit families.

REFERENCES
[1] Q. A. Al-Haija, M. Al-Ja'fari and M. Smadi, (2016). 'A comparative study up to 1024 bit Euclid's GCD algorithm FPGA implementation & synthesizing', 2016 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA), Ras Al Khaimah, United Arab Emirates, pp. 1-4. doi: 10.1109/ICEDSA.2016.7818535.
[2] Brey, Barry (2007). The Intel Microprocessors. Pearson Education, Dorling Kindersley Publishing. pp. 323–326. ISBN 81-317-1428-4.
[3] L. Tawalbeh, Q. Abu Al-Haija, (2011). 'Enhanced FPGA Implementations for Doubling Oriented and Jacobi-Quartics Elliptic Curves Cryptography', Journal of Information Assurance and Security, USA: Dynamic Publishers, Inc., vol. 6, no. 3, pp. 167-175.
[4] M. Sabaghi, S. Marjani, and A. Majdabadi (2016). The Design of Ultra-Low Power Adder Cell in 90 and 180 nm CMOS Technology. Circuits and Systems, 07, 58-67. doi: 10.4236/cs.2016.72007
[5] A. J. Pacheco, A. F. Herrero, and J. C. Quirós, "Design and Implementation of a Hardware Module for MIMO Decoding in a 4G Wireless Receiver," VLSI Design, vol. 2008, Article ID 312614, 8 pages, 2008. doi: 10.1155/2008/312614
[6] G. B. Rosenberger, Simultaneous Carry Adder, U.S. Patent 2,966,305, Dec. 27, 1960.
[7] Zamhari, N., Voon, P., Kipli, K., Chin, K. L., & Husin, M. H. (2012). Comparison of parallel prefix adder (PPA). In Proceedings of the World Congress on Engineering (Vol. 2, pp. 4-6).
[8] M. D. Ercegovac and T. Lang, "Digital Arithmetic," Morgan Kaufmann Publishers, Elsevier, Vol. 1, Ch. 2, pp. 51-136, 2004.
[9] J. Kaur, P. Kumar, (2014). Analysis of 16 & 32 Bit Kogge Stone Adder Using Xilinx Tool, JECET, Vol. 3, No. 3, 1639-1644.
[10] R. Adusumilli, V. Kumar, (2015). Design and Implementation of a High Speed 64 bit Kogge-Stone Adder using Verilog HDL. International Journal of Electrical and Electronic Engineering & Telecommunications, Vol. 4, No. 1.
[11] P. Kumari, R. Nagendra, (2013). Design of 32 bit Parallel Prefix Adders, IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), pp. 01-06.
[12] A. Kaul, A. Kumar, (2016). Simulation of 64-bit MAC Unit using Kogge Stone Adder and Ancient Indian Mathematics, Journal of Engineering Research and Application, Vol. 6, Issue 6, pp. 01-05.
[13] Noel Daniel Gundi, (2008). Implementation of 32 Bit Brent Kung Adder Using Complementary Pass Transistor Logic, Visvesvaraya Technological University.
[14] V. Jeevan, et al. (2017). Design and Implementation of 64-Bit Parallel Prefix Brent Kung Adder, International Journal of Electrical and Electronic Engineering, Vol. 9, Issue No. 1.
[15] G. Rani, S. Kumar, (2012). Delay Analysis of Parallel-Prefix Adders, International Journal of Science and Research (IJSR), Vol. 6, Issue No. 6.
[16] S. Butchibabu, S. Kishore Bab (2014). Design and Implementation of Efficient Parallel Prefix Adders on FPGA, International Journal of Engineering Research & Technology, Vol. 3, Issue No. 7.
[17] D. Raj et al., (2015). Design and Implementation of different types of efficient parallel prefix adders, National Conference on Advanced Innovation in Engineering and Technology, Vol. 3, Special Issue No. 1.
