Keywords-Approximate Multiplier, Power Consumption, Propagation Delay

This document discusses the design and implementation of approximate adders and multipliers in digital circuits to improve energy efficiency and performance in applications that tolerate inexact computations. It introduces single exact single approximate (SESA) and single exact dual approximate (SEDA) adders, which allow for runtime configurability between exact and approximate modes while maintaining bounded error. The proposed 8-Bit approximate multiplier demonstrates significant energy savings, achieving a 26.7% reduction in energy-delay product compared to exact multiplication.

Uploaded by

rowdyh20

Abstract

An adder is the basic computational circuit in digital Very Large Scale Integration
(VLSI) design, and approximate adders have been proposed to improve its design
metrics. The digital multiplier is a fundamental component of many digital signal
processing (DSP) systems and takes up most of their computational resources. As many
DSP applications have an inherent tolerance for inexact computations, approximate
multiplication is an appropriate substitute for obtaining energy-performance-accuracy
tradeoffs, especially in applications that require high energy efficiency. Reducing
the supply voltage has also proven to be an efficient way to further lower the total
energy consumption. In this paper, a multiplexer-based approximate full adder is
proposed for an 8-bit approximate multiplier, together with its circuit
implementation for error-resilient multiplication at a low supply voltage.
Simulation results indicate that the proposed 8-bit approximate multiplier consumes
the least energy per operation at the same computational accuracy when compared with
other multipliers for an operand length of 8 bits. It achieves a 26.7% reduction in
energy-delay product (EDP) when compared with exact multiplication.

Keywords- Single Exact Single Approximate Adder, Single Exact Dual Approximate Adder,
Approximate Multiplier, Power Consumption, Propagation Delay.

INTRODUCTION

1.1 Introduction

Approximate computing plays a crucial role in modern electronic systems. It is a
computational approach that accepts approximate solutions to problems rather than
exact ones. It has gained popularity in numerous industries due to the increasing
demand for high-performance, energy-efficient computing. While approximate
computing offers advantages in resource conservation and efficiency, the balance
between accuracy and resource savings must be evaluated thoroughly. Its use depends
on the requirements of the application and the acceptable level of variance in the
results. Approximate circuits, also known as inexact circuits, are intentionally
designed to produce results that are not exact but fall within a defined margin of
error. These circuits are employed in applications that prioritize energy
conservation and cost reduction over exact results. They are particularly beneficial
in image and signal processors for tasks such as picture scaling, filtering, audio
and speech processing, and audio compression, where a slight decrease in quality is
acceptable. Approximate circuits are also widely used in hardware accelerators for
machine learning and deep learning.

By the mid-20th century, advances in semiconductor device fabrication had shown that
semiconductor devices could perform the functions of vacuum tubes (David B Haviland,
2014). Integrating large numbers of minute transistors on a small chip was an
enormous improvement over the manual assembly of discrete electronic components. The
invention of the integrated circuit (IC) changed the approach to circuit design and
ensured mass-production capability and reliability. This led to the rapid adoption of
standardized ICs in place of designs using discrete transistors. ICs have two main
advantages over discrete circuits: cost and performance. The production cost of an IC
is low because the entire circuit is printed as a single unit by photolithography
rather than constructed one component at a time. Performance is high because the
components are small and close together, so they switch quickly and consume little
power. With the advent of ICs, technology developments aim at ease of use with
improved computational capability. Almost all applications involve basic arithmetic
operations, among which addition and multiplication are the most important and widely
used functional blocks. Hence, adders and multipliers play a major role in these
applications. To achieve compactness and faster operation, the basic arithmetic
mechanism is replaced by techniques that promise minimum hardware with early and
correct results. Adders such as the carry-save, carry-select, carry-lookahead and
carry-skip adders were developed to produce faster outputs with less hardware.
Improving adders alone did not meet modern requirements, so attention turned to the
more complex and time-consuming multiplier blocks to improve efficiency further. This
chapter gives an overview of multiplier basics, their types and effective ways to
optimize them.

Graphics Processing Units (GPUs) are used in diverse domains, including gaming and
scientific simulations. Graphics rendering can use approximate circuits to enhance
real-time performance while keeping quality within a defined range. The use of
approximate circuits is contingent upon the specific needs of the application and the
tolerable margin of error. They become an attractive choice when they offer a
favorable combination of reduced energy consumption, increased speed and acceptable
error. Adders are crucial elements in digital electronics and computer systems, where
they perform addition and serve as building blocks for other arithmetic operations.
There are two types of adders: exact adders and approximate adders. Exact adders may
consume more power and energy than approximate adders, making them less suitable for
energy-efficient circuits. They are essential when precision is of utmost importance,
while approximate adders offer advantages in speed, resource efficiency and energy
conservation [1]. An approximate adder is a digital adder circuit designed to perform
addition with a certain degree of approximation. Its primary objective is to minimize
power consumption, reduce hardware complexity and improve speed. Approximate adders
are particularly useful where a slight reduction in accuracy is acceptable or
imperceptible, such as in multimedia processing, real-time signal processing or
low-power devices like IoT sensors. However, their use must be assessed carefully
against the specific requirements of the application [2].

A multiplier is one of the most commonly used computation modules, and it requires
more computation time and area than an adder. An n x n bit multiplier produces a
2n-bit product. Figure 2.1 shows an example of the basic multiplier structure, with
inputs X and Y and output P. The multiplier X has 8 bits, X0 to X7, and the
multiplicand Y has 8 bits, Y0 to Y7. The multiplication process generates the partial
products Xr Yr, where r denotes the bit position of the inputs, and the output of the
multiplier is given by P0 to P15. As the bit width increases, multiplication becomes
more costly and the results become harder to store. The accuracy of the multiplier is
directly related to how much of its output is used: accuracy is maximum when the
full-width output is taken and decreases as output bits are discarded.
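
The relationship between the partial products and the 2n-bit product can be sketched
in a few lines of Python. This is only an illustrative model, not the circuit of the
figure: the function names are hypothetical, and it assumes unsigned operands, with
the partial-product bit formed from bit i of X and bit j of Y carrying weight 2^(i+j).

```python
def partial_products(x: int, y: int, n: int = 8) -> list[list[int]]:
    """Row i holds bit i of x AND-ed with every bit of y (unsigned operands)."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for j in range(n)] for i in range(n)]


def multiply_from_partial_products(x: int, y: int, n: int = 8) -> int:
    """Sum all partial-product bits; bit (i, j) has weight 2**(i + j)."""
    rows = partial_products(x, y, n)
    return sum(bit << (i + j) for i, row in enumerate(rows) for j, bit in enumerate(row))


product = multiply_from_partial_products(0xB7, 0x5C)
print(f"{product:016b}")  # the 16 output bits, P15 down to P0
```

For 8-bit operands the largest product is 255 x 255 = 65025, which fits in the 16
output bits P0 to P15.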

To maintain the accuracy of the system, the output bit length increases
proportionately with the input bit width. However, it is not easy to implement
architectures with ever-increasing bit width. Certain applications, such as personal
blood pressure monitors, intelligent toys and IP cameras, do not require the full
output. This chapter's approach focuses on multiplier design for such applications.
For systems in which an exact result is not required, techniques are applied that
provide comparable results with a notable reduction in area and power.

A multiplier with an n-bit multiplicand and an n-bit multiplier produces a 2n-bit
output. The partial products generated by the multiplication process can be divided
into two regions, named by position as the Least Significant Region (LSR) and the
Most Significant Region (MSR). The LSR comprises the n columns from P0 to Pn-1, the
less significant portion of the partial product matrix, whereas the columns from Pn
to P2n-1 form the MSR.

The two techniques by which the output of the multiplier can be shortened are word
length reduction (de la Guia Solaz et al. 2010 & Han et al. 2005) and truncation
(Schulte & Swartzlander 1993). Word length reduction aims at decreasing power
consumption by input shifting. Truncation is generally based on array structures such
as Baugh-Wooley or Booth, in which the least significant bits, i.e., the lower
columns of the partial product matrix, are omitted. In this chapter, a truncated
multiplier is proposed for efficient implementation.
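
As a rough illustration of truncation, the sketch below drops every partial-product
bit that falls in the lowest columns of the matrix and sums the rest. It assumes
unsigned operands and a plain AND array (not Baugh-Wooley or Booth), applies no
error compensation, and uses hypothetical names (`truncated_multiply`,
`dropped_columns`); it is not the multiplier proposed in this chapter.

```python
def truncated_multiply(x: int, y: int, n: int = 8, dropped_columns: int = 4) -> int:
    """Sum only the partial-product bits in columns >= dropped_columns."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= dropped_columns:  # column index of this partial-product bit
                total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total


exact = 255 * 255
approx = truncated_multiply(255, 255)
# Column c (for c < n) holds c + 1 bits of weight 2**c, so dropping columns 0..3
# loses at most 1*1 + 2*2 + 3*4 + 4*8 = 49.
print(exact - approx)  # 49
```

Because only non-negative terms are removed, this uncompensated form never
overestimates the product.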

Addition is one of the most widely used operations in many applications [1], [2],
[3], [4], [5], [6]. Because adders are so widely used, several works have focused on
reducing their energy consumption [1], [2], [3], [4], [5], [6]. In recent years, it
has been shown that some applications are resilient to errors in computation [7],
[8]. Most applications that exhibit error resilience lie in the category of
recognition, mining, and synthesis [9], [10], [11]. Since these applications are
inherently robust to erroneous computations, they do not require the exact output to
give the desired results [12], [13]. Thus, circuits that produce the exact output for
every computation are not required for these applications; circuits that provide
nearly correct outputs are sufficient [14], [15], [16].

Modern systems have various modes of operation, such as a high-performance mode and a
low-power mode. Within an application, not all computations can be approximated, and
depending on the output quality required by the user, there is a need to switch
between exact and approximate modes. It has also been shown in many works that an
application consists of both exact and approximate sections, i.e., the entire program
cannot be fully approximated [17], [18], [19]. Thus, there is a need for circuits
that can be configured at runtime between exact and approximate modes.

Various approximate arithmetic circuits such as adders, subtractors, multipliers, and
dividers have been extensively explored [14], [15], [16], [20], [21], [22], [23],
[24]. Prior works on transistor-level approximate adders introduce approximation
using the following strategy. The exact full adder design is converted into an
approximate design by removing some transistors or modifying a group of transistors
[25], [26], [27], [28]. Some of the exact full adders in an n-bit adder are then
replaced by these approximate adders to build the overall approximate adder. These
works therefore follow a static design, i.e., once the number of approximate bits has
been decided, the overall approximate adder design is fixed. Some works allow dynamic
runtime configurability at the gate level [29], [30], but in this work we focus on
introducing approximation at the circuit level. The existing transistor-level designs
do not allow configurability between exact and approximate modes.

In this article, we propose single exact single approximate (SESA) and single exact
dual approximate (SEDA) adders, which not only allow configurability between exact
and approximate modes but also allow the amount of approximation to be varied in
approximate mode. To the best of our knowledge, SESA and SEDA adders are the first
approximate adders to use the same hardware, with some additional circuitry at the
circuit level, to perform either a single (one) n-bit exact addition or single
(one)/dual (two) n-bit approximate additions. Another important requirement for using
approximate adders in general-purpose processors is a bounded maximum error. It was
shown in [28] that approximate adder designs without a bounded maximum error cannot
be used in approximate processors, as they lead to overflow. A full adder has three
inputs and two outputs: SUM and CARRY. SESA and SEDA adders have a bounded maximum
error because they do not introduce any error in the CARRY output. Thus, if m bits of
an n-bit adder are approximated, the maximum error is bounded by 2^m − 1. This makes
them suitable for use in processors.
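
The bound can be checked by brute force under the stated assumption that CARRY is
exact. If every carry is correct, the upper n − m SUM bits and the carry-out are
correct as well, so an approximate result can differ from the exact one only in its
m low bits. The sketch below tries every such difference; the helper name is
hypothetical, and the small n and m are chosen only to keep the search short.

```python
def worst_case_error(n: int, m: int) -> int:
    """Largest |exact - approx| when only the m low SUM bits may be wrong."""
    worst = 0
    for exact in range(1 << (n + 1)):       # every (n+1)-bit result, carry-out included
        for wrong_bits in range(1 << m):    # every pattern of flipped low SUM bits
            approx = exact ^ wrong_bits     # upper bits and carry-out stay untouched
            worst = max(worst, abs(exact - approx))
    return worst


print(worst_case_error(n=6, m=3))  # 7, i.e. 2**3 - 1
```

The worst case occurs when all m low bits are wrong in the same direction, for
example an exact result ending in 000 reported as ending in 111.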

In this article, we focus on works that perform approximation at the lowest level,
i.e., the transistor level. A full adder implementation called the mirror adder is
shown in Fig. 1 [31]. One of the first works on approximate adders was done by Gupta
et al. [27], who proposed four different CMOS-based approximate adders derived from
the mirror adder. The four designs have 2, 3, 4, and 5 errors, counted over both
outputs. Other works used pass transistor logic (PTL) for the approximate adder
implementations [25], [26]. Because both of these works use PTL, they suffer from
logic swing degradation, and PTL-based designs are therefore not suitable for smaller
technology nodes, as shown in [32]. In one of the recent works on approximate adders,
hybrid CMOS logic-based approximate adders were designed to improve the logic swing,
and extensive evaluation was done for mobile processing applications [28].

However, hybrid CMOS logic-based adders suffer from glitches in the output, and
buffers need to be inserted to improve the delay in multibit adders [28], [33]. None
of these prior works allows runtime configurability between exact and approximate
modes. In this work, we present the design of transistor-level runtime-configurable
approximate adders. We first propose SESA, an adder that can perform either one exact
addition or one approximate addition. There are three approximate SESA adder designs.
We then propose the SEDA adder, which can perform either one exact addition or two
approximate additions. SEDA also has a bounded maximum error.

Most image, audio and video processing applications are highly error-tolerant
since humans do not perceive minor variations in them. These applications are often
run in hand-held devices that demand low energy consumption. Also when multiple
images and videos are to be processed, speed is of great importance. In such situations,
error can be introduced deliberately in the application in such a way that we obtain
reduction in power, delay and area, at the cost of reduced accuracy. This deliberate
introduction of error results in a trade-off between accuracy metrics such as PSNR of
the application and performance metrics such as savings in power, area and speed of
implementation. This method of achieving trade-off between accuracy and
performance metrics is referred to as approximate computing.

The basic approximate computing method used in this thesis is to replace accurate
adders in circuits with approximate adders to obtain power savings. In this chapter,
a variety of approximate adders available in the literature are reviewed and the
basic metrics used to characterize the error and power savings are described.

Many approximate adders have been proposed in the literature. They fall into two
major categories: low latency approximate adders (LLAA) and low power approximate
adders (LPAA). Low latency approximate adders include the Variable Latency
Speculative Adder (VLSA), Error Tolerant Adder (ETA-II), Equal Segmentation Adder
(ESA), Accuracy Configurable Adder (ACA-II), Gracefully Degrading Adder (GDA),
Generic Accuracy Configurable Adder (GeAr), adder with low relative error,
Reconfigurable Approximate Carry Look-Ahead Adder (RAP-CLA), Reverse Carry Propagate
Adder (RCPFA), Simple Accuracy-Reconfigurable Adder (SARA) and Block-based Carry
Speculative Adder (BCSA), proposed by Verma et al. (2008), Zhu et al. (2010a), Dutt
et al. (2019), Kahng and Kang (2012), Ye et al. (2013), Shafique et al. (2015), Hu
and Qian (2015), Akbari et al. (2018), Pashaeifar et al. (2018), Xu et al. (2018) and
Ebrahimi-Azandaryani et al. (2019), respectively. In these adders, the length of the
carry chain is reduced in order to decrease the critical path delay. This is done by
dividing the adder into many sub-adders, each with simplified carry prediction logic.
The resulting timing slack can also be used to reduce power consumption by lowering
the supply voltage.
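
The idea of cutting the carry chain can be sketched as follows. This is a generic
illustration in the spirit of the segmented adders listed above, not a model of any
one of them: the function name, the 16-bit width, the 4-bit segment size and the
particular prediction rule (the carry into a segment is taken from the preceding
segment alone, computed as if that segment had no carry-in) are all assumptions.

```python
def segmented_add(a: int, b: int, n: int = 16, seg: int = 4) -> int:
    """n-bit approximate add with the carry chain limited to two segments."""
    mask = (1 << seg) - 1
    result, carry_in = 0, 0
    for shift in range(0, n, seg):
        a_seg, b_seg = (a >> shift) & mask, (b >> shift) & mask
        result |= ((a_seg + b_seg + carry_in) & mask) << shift
        # Carry prediction: look only at this segment and ignore its own carry-in.
        carry_in = (a_seg + b_seg) >> seg
    return result


print(hex(segmented_add(0x1234, 0x4321)))  # 0x5555, no long carry chain, so exact
print(hex(segmented_add(0x00FF, 0x0001)))  # 0x0, the exact sum is 0x100
```

The second call shows the cost of the shortcut: a carry that has to ripple through
more than one full segment is lost.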

All the low power approximate adders belong to the class of approximate
adders called the two-part segmented approximate adders, where the adder is divided
into two sub-adders, with one sub-adder for the most significant bits (MSBs) and
another for the least significant bits (LSBs). The sub-adder that computes the MSB
sum is implemented using an accurate adder. The sub-adder that computes the LSB
sum is implemented using approximate logic to obtain savings in power. In these
adders, the amount of power savings obtained and the accuracy lost is controlled by
the number of bits in the LSB sub-adder.

Approximate Mirror Adders (AMA), XOR/XNOR-based Adders (AXA), Transmission Gate-based
Adders (TGA), Inexact Adders (InXA) and Approximate Full Adders (AFA) are the
approximate full adders proposed in Gupta et al. (2013), Yang et al. (2013), Yang et
al. (2015), Almurib et al. (2016) and Dutt et al. (2017), respectively. In these
adders, the sum and carry are approximated using fewer transistors, achieving low
dynamic power consumption and low area. The Lower-part OR Adder (LOA) and the Error
Tolerant Adder (ETA-I), proposed by Mahdiani et al. (2010) and Zhu et al. (2010b),
respectively, obtain dynamic power savings by simplifying the logic used to compute
the sum and by ignoring the carry ripple in the approximate part of the adder. The
approximate adder proposed by Zhu et al. (2011) is a modification of the ETA-I adder
that includes additional logic to check the range of the input patterns, so that
accuracy can be improved.
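
A behavioural sketch of a lower-part OR adder, as it is commonly described, may make
the two-part structure concrete: the k low bits are OR-ed with no carry ripple, and
a single AND of the two most significant low bits supplies the carry-in of the
accurate upper part. The function name and the default widths are assumptions made
for the example, and k is assumed to be at least 1.

```python
def loa_add(a: int, b: int, k: int = 4) -> int:
    """Lower-part OR addition with k approximate low bits (k >= 1)."""
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                         # approximate part: bitwise OR
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1   # AND of the top low-part bits
    high = (a >> k) + (b >> k) + carry_in            # accurate upper part
    return (high << k) | low


print(loa_add(0b10110110, 0b01011100))  # 270, exact sum is 274
print(loa_add(0b10100101, 0b01001010))  # 239, exact because the low bits never overlap
```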

Approximate computing can be applied to a broad range of applications at various
levels of abstraction, such as the algorithm, architecture and circuit levels. At the
algorithm level, approximate computing can reduce the run time of programs by
skipping some iterations, simplifying the algorithms and approximating the data
types. At the architecture level, accurate functional blocks are replaced with
corresponding approximate blocks. Typically, in signal processing applications,
accurate adders and multipliers are replaced by approximate adders and multipliers.
At the circuit level, approximate computing is achieved using voltage over-scaling or
transistor-level functional approximation. In voltage over-scaling, the circuit is
operated well below its nominal operating voltage, thereby introducing timing-induced
errors. In functional approximation, fewer transistors and gates are used, which
approximates the functionality and introduces errors even at the nominal operating
voltage. Instead of confining approximate computing to a single level of abstraction,
it can be applied at multiple levels to obtain more benefits. In this dissertation,
approximate computing is introduced at the architecture level by replacing accurate
adders with approximate adders to achieve significant savings in power at a given
accuracy. The major steps involved in using approximate computing at this level of
abstraction are elaborated below.

• As the first step, an approximate adder is to be designed. Over the past ten years,
many approximate adders have been proposed in the literature to achieve high-speed
operation and low-power implementation. Two-part segmented approximate adders are
typically used in low-power implementations. In these adders, the inputs are divided
into two parts. The upper part of the sum is computed using accurate adders and the
lower part of the sum is approximated using simplified logic. The number of bits in
the lower part of the sum is termed the level of approximation for these adders. As
the level of approximation is varied, a trade-off between power and accuracy is
obtained. An approximate adder that gives a good trade-off is desirable.
• Once an approximate adder with a good power-accuracy trade-off has been designed
for use in a system, each accurate adder of the system is replaced by the
approximate adder. The level of approximation in each adder can be varied to get
power-accuracy trade-offs. An optimal configuration of approximation levels of the
adders in the system has to be found so that maximum power savings are obtained for
a given accuracy at the output of the system.
• Besides the error, the system typically has other objectives, for example, the
achievable compression of images. It is important to study such trade-offs as well.

1.2 Motivation
Since accurate adders are replaced by approximate adders to get power
savings, the approximate adder used should be able to provide maximum power
savings for a given value of error metric at the output. This involves computation of
the error metrics and optimization to maximize the number of approximate bits.
Typical error metrics used are mean error distance (MED), mean square error (MSE)
or peak signal to noise ratio (PSNR).
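
For reference, the three metrics can be computed from paired exact and approximate
outputs as in the sketch below. The function name and the sample values are
illustrative only; `peak` is the largest representable output value (255 for 8-bit
image data), and PSNR is reported in dB.

```python
import math


def error_metrics(exact: list[int], approx: list[int], peak: int = 255):
    """Return (MED, MSE, PSNR in dB) for paired exact/approximate outputs."""
    n = len(exact)
    med = sum(abs(e - a) for e, a in zip(exact, approx)) / n
    mse = sum((e - a) ** 2 for e, a in zip(exact, approx)) / n
    psnr = math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    return med, mse, psnr


med, mse, psnr = error_metrics([10, 20, 30, 40], [10, 18, 33, 40])
print(med, mse, round(psnr, 2))
```

MED weights all errors linearly, whereas MSE and PSNR penalize a few large errors
more heavily than many small ones.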

The various methods proposed in the literature to find the optimal approximation
levels for adders in low power implementations can be broadly classified into three
major categories as follows.

1. In the first category, the level of approximation, given by the number of
approximate bits and denoted by k, is kept constant across all the adders. Monte
Carlo (MC) simulations are used to compute the error metric, say signal to noise
ratio (SNR), for different values of k. Finally, the value of k that gives an
acceptable value of the error metric is chosen, as in Almurib et al. (2018) and
Soares et al. (2015). In this approach, the possible configurations of approximation
levels across the adders are not explored sufficiently. This results in a limited
range of values for power savings and error metrics.
2. In the second category, an optimization routine is employed that uses a
precomputed table of error metrics of the approximate adder for various values of k,
as in Kadiyala et al. (2019). The values of the error metrics are computed using MC
simulations, assuming that the inputs are independent and uniformly distributed. All
the adders in the circuit are assumed to have uniformly distributed inputs, and the
same precomputed table of error metrics is used for all of them.
3. In the third category, an optimization routine is employed that uses analytical
expressions to compute the error metrics of an approximate adder for a given k, as
in Gupta et al. (2013), Snigdha et al. (2016), Sengupta et al. (2017) and Pashaeifar
et al. (2019). These error metrics are computed assuming that either the adder
inputs are uniformly distributed or the error introduced by the adder is uniformly
distributed.
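
The Monte Carlo procedure shared by the first two categories can be sketched as
below. Everything specific here is an assumption made for illustration: the
placeholder adder (low k bits OR-ed, no carry into the upper part), the use of mean
error distance rather than SNR, the sample count, the fixed seed and the error
budget. None of it is taken from the cited works.

```python
import random


def approx_add(a: int, b: int, k: int) -> int:
    """Placeholder approximate adder: low k bits OR-ed, upper bits added exactly."""
    mask = (1 << k) - 1
    return (((a >> k) + (b >> k)) << k) | ((a | b) & mask)


def monte_carlo_med(k: int, n_bits: int = 8, samples: int = 20000, seed: int = 0) -> float:
    """Estimate the mean error distance for uniform, independent inputs."""
    rng = random.Random(seed)  # fixed seed so that every k sees the same inputs
    total = 0
    for _ in range(samples):
        a, b = rng.randrange(1 << n_bits), rng.randrange(1 << n_bits)
        total += abs((a + b) - approx_add(a, b, k))
    return total / samples


def largest_k_within(budget: float, n_bits: int = 8) -> int:
    """Largest level of approximation whose estimated MED stays within budget."""
    return max(k for k in range(n_bits + 1) if monte_carlo_med(k, n_bits) <= budget)


table = {k: monte_carlo_med(k) for k in range(9)}  # the "precomputed table" of category 2
print(largest_k_within(2.0))
```

The same table is valid for every adder in a circuit only if every adder really sees
uniform, independent inputs, which is the assumption questioned in the text.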

The optimization framework in each case is different, but all of them follow the
spirit of the classical word length optimization problem, in that the error introduced
at each node depends only on the degree of approximation (number of approximate bits)
and is independent of the nature of its inputs. Also, the expressions used for the
mean and variance of the error are derived assuming that the inputs or the error are
uniformly distributed. While the assumption of uniformly distributed lower order bits
may be justified for the primary inputs, in many approximate adders neither the
output nor the error is uniformly distributed. As a result, if the error statistics of
an adder are computed assuming that its inputs are uniformly distributed, the
predicted overall accuracy of the system is inaccurate.
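
A small exhaustive calculation illustrates how the uniform-input assumption can
mislead. It uses a placeholder lower-part OR stage (an assumption for this example,
not a design from the text) with K = 3 approximate bits, and compares the mean error
of a stage fed by uniform inputs with that of a second stage whose first input is
the OR-ed output of an earlier stage.

```python
from itertools import product

K = 3  # number of approximate low bits (hypothetical choice)


def stage_error(x_low: int, y_low: int) -> int:
    """Error of a lower-part OR stage: exact low sum minus the OR of the low bits."""
    return (x_low + y_low) - (x_low | y_low)


values = range(1 << K)

# Mean error when both inputs are uniform, as the tables and formulas assume.
uniform_mean = sum(stage_error(a, b) for a, b in product(values, repeat=2)) / (1 << (2 * K))

# Mean error of a second stage fed by the OR-ed low bits of a first stage.
cascaded_mean = sum(stage_error(a | b, c)
                    for a, b, c in product(values, repeat=3)) / (1 << (3 * K))

print(uniform_mean, cascaded_mean)  # 1.75 2.625
```

The OR-ed output has ones in its low bits three quarters of the time rather than
half, so the second stage makes larger errors than a table built from uniform inputs
would predict.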

A more accurate method of obtaining the probability mass function (PMF) of the error
is proposed by Sengupta et al. (2019). However, including this method within an
optimization routine would require extensive computations. Moreover, in most
applications, an accurate estimate of the mean error and mean square error is
sufficient and the PMF of the error is not needed. Hence, an important requirement is
to derive an accurate and simple error model for approximate adders that takes into
account the distribution of the inputs and the level of approximation.

SESA: SINGLE EXACT SINGLE APPROXIMATE ADDERS

This section addresses the design of SESA adders. Adders in the SESA family can
perform either a single exact addition or a single approximate addition. SESA adders
have a bounded maximum error because no approximation is introduced in the CARRY
bits. Three different SESA designs are introduced, each of which is discussed in
detail in the following sections.

An exact mirror adder is a full adder circuit implemented in static complementary
metal-oxide-semiconductor (CMOS) logic. Like any full adder, it takes two binary
input bits (A and B) and a carry-in bit (Cin) and outputs the sum (Sum) and carry-out
(Cout) of the addition. In binary addition, a carry is generated when the sum of the
bits in a column exceeds 1. The name of the circuit comes from its structure: the
PMOS pull-up network has the same topology as the NMOS pull-down network, i.e., the
two networks are mirror images of each other rather than series-parallel duals.

This structure is possible because of a symmetry in binary addition: for every
combination of input bits there is a mirror combination, obtained by inverting all
three inputs, that produces the inverted Sum and Cout. In addition, the complement of
Sum is computed by reusing the complement of Cout, which reduces the number of
transistors compared with implementing Sum and Cout independently and shortens the
transistor stacks, leading to potential improvements in speed and efficiency. Exact
mirror adders are particularly useful in applications where efficiency and speed are
critical, such as high-performance computing and digital signal processing.
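
The logic behind the mirror adder can be checked at the Boolean level. The sketch
below uses the usual textbook form in which Sum reuses the complement of Cout; it
models only the logic functions, not the transistor network, and the function and
variable names are chosen for this example.

```python
from itertools import product


def mirror_full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry) using the carry-reuse form of the full adder equations."""
    carry = (a & b) | (c_in & (a | b))
    total = (a & b & c_in) | ((a | b | c_in) & (1 - carry))
    return total, carry


inputs = list(product((0, 1), repeat=3))

# The two outputs together must encode a + b + c_in.
is_exact = all(2 * c + s == a + b + cin
               for (a, b, cin) in inputs
               for (s, c) in [mirror_full_adder(a, b, cin)])

# Inverting all three inputs inverts both outputs (self-duality).
is_self_dual = all(mirror_full_adder(1 - a, 1 - b, 1 - cin)
                   == tuple(1 - v for v in mirror_full_adder(a, b, cin))
                   for (a, b, cin) in inputs)

print(is_exact, is_self_dual)  # True True
```

Self-duality is the property that lets the pull-up network reuse the topology of
the pull-down network.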

2.1 SESA1 Adder

In the SESA1 adder, we introduce approximation in the SUM output by making it equal
to the complement of CARRY. As seen from Table 2.1, the SESA1 adder has errors in the
SUM bit for input combinations 000 and 111. The circuit for the SESA1 adder is shown
in Figure 2.1. The CARRY output remains the same irrespective of whether the SESA1
adder operates in exact or approximate mode. In exact mode, the SUM output is
computed using a power-gated (PG) controlled inverter whose input is the complement
of SUM. In approximate mode, the SUM module is power gated, and the approximate
output, which is equal to the complement of CARRY, is generated using another
PG-controlled inverter whose input is connected to the complement of CARRY, as shown
in Figure 2.1.
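
The two error cases follow directly from the truth table, as the behavioural sketch
below shows. It models only the logic of one cell, with a hypothetical `approximate`
flag standing in for the power-gating control, and assumes the input combinations
are written in the order A, B, Cin.

```python
from itertools import product


def sesa1_cell(a: int, b: int, c_in: int, approximate: bool) -> tuple[int, int]:
    """Return (SUM, CARRY) of one SESA1 cell at the logic level."""
    carry = (a & b) | (c_in & (a | b))  # CARRY is exact in both modes
    exact_sum = a ^ b ^ c_in
    total = (1 - carry) if approximate else exact_sum
    return total, carry


error_rows = [f"{a}{b}{c}" for a, b, c in product((0, 1), repeat=3)
              if sesa1_cell(a, b, c, True)[0] != sesa1_cell(a, b, c, False)[0]]
print(error_rows)  # ['000', '111']
```

For the other six input combinations exactly one or two inputs are 1, and there SUM
already equals the complement of CARRY.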

Figure 2.1: Circuit diagram of SESA1

2.2 SESA2 Adder

In the SESA2 adder, we introduce approximation in the SUM output by making it 0 for
all input combinations in approximate mode, as shown in Table II. As a result, the
SESA2 adder has errors in the SUM bit for input combinations 001, 010, 100, and 111.
The circuit for the SESA2 adder is shown in Figure 2.2. The CARRY output remains the
same irrespective of whether the SESA2 adder operates in exact or approximate mode.
In exact mode, the SUM output is computed using the normal SUM module. In approximate
mode, the SUM module is power gated and the approximate output, which is equal to 0,
is generated using a PG-controlled NMOS that pulls the SUM output down to 0, as shown
in Figure 2.2.

Figure 2.2: Circuit diagram of SESA2

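
A multi-bit behavioural model shows how SESA2-style cells behave inside a
ripple-carry adder when the mode is selected at run time. The function name, the
8-bit width and the `approx_bits` parameter (how many least significant cells are
switched to approximate mode, with 0 meaning exact mode) are assumptions made for
this sketch.

```python
def sesa2_add(a: int, b: int, n: int = 8, approx_bits: int = 0) -> int:
    """Ripple-carry add; SUM of the approx_bits lowest cells is forced to 0."""
    carry, result = 0, 0
    for i in range(n):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        sum_bit = 0 if i < approx_bits else a_bit ^ b_bit ^ carry
        carry = (a_bit & b_bit) | (carry & (a_bit | b_bit))  # exact in both modes
        result |= sum_bit << i
    return result | (carry << n)  # carry-out becomes bit n of the result


print(sesa2_add(182, 92))                 # 274, exact mode
print(sesa2_add(182, 92, approx_bits=4))  # 272, low four SUM bits pulled to 0
```

Because the carries stay exact, the approximate result is the exact sum with its low
bits cleared, so it never exceeds the exact sum and differs from it by at most
2^approx_bits − 1.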

2.3 SESA3 Adder

In the SESA3 adder, we introduce approximation in the SUM output by making it 1 for
all input combinations, as shown in Table II. As a result, the SESA3 adder has errors
in the SUM bit for input combinations 000, 011, 101, and 110. Even though the number
of errors is the same as in SESA2, the error in the output is data-dependent, and we
will see in Sections V and VI that this results in different output errors. The
circuit for the SESA3 adder is shown in Figure 2.3. The CARRY output remains the same
irrespective of whether the SESA3 adder operates in exact or approximate mode. In
exact mode, the SUM output is computed using the normal SUM module. In approximate
mode, the SUM module is power gated and the approximate output, which is equal to 1,
is generated using a PG-controlled NMOS that pulls the complement of SUM down to a
permanent 0. Since this node is connected to an inverter, the SUM output in
approximate mode is 1, as shown in the figure.

Thus, we have discussed the design of all three SESA adders. For all the three
designs, there is no error in the CARRY output, and hence, all the designs have
maximum bounded errors. The configurability is introduced by power gating the SUM
portion of the adder design, which helps in reducing the energy as will be discussed
in Section IV. SESA1 adder has the least number of errors in SUM output which is 2.
Both SESA2 adder and SESA3 adders have four errors in the SUM output but they
are at different input combinations. Since the error is approximate circuits, it is
dependent on the data, and we will see in Sections VI and VII that SESA2 adder is
better for some applications while SESA3 adder is beneficial for other applications.
We will show in Section V that SESA allows for fine grain configurability

We implement approximation in the SUM output of the SESA3 adder by

setting it to 1 for each and every combination of inputs. As a consequence of this, the
SESA3 adder exhibits faults in the SUM bits for the input combinations 000, 011, 101,
and 110. Even though we have the same number of errors as in SESA2, the error in
the output is data-dependent and we will see in Sections V and VI that this will result
in different output errors. The circuit for SESA3 adder is shown in Figure. 2.3. The
CARRY output remains same irrespective of whether SESA3 adder operates in exact
or approximate mode. In exact mode, the SUM output is computed using the normal
SUM module. In approximate mode, the SUM module is PG and the approximate
output which is equal to 1 is generated using a PG-controlled NMOS that pulls down
the complement of SUM to a permanent 0. Since this is connected to an inverter, we

Page | 18
get the SUM output in approximate mode to be a 1 as shown in Figure. 2.3. Thus, we
have discussed the design of all three SESA adders. For all the three designs, there is
no error in the CARRY output, and hence, all the designs have maximum bounded
errors. The configurability is introduced by power gating the SUM portion of the adder
design, which helps in reducing the energy as will be discussed. SESA1 adder has the
least number of errors in SUM output which is 2. Both SESA2 adder and SESA3
adders have four errors in the SUM output but they are at different input combinations.
Since the error is approximate circuits, it is dependent on the data, and we will see in
SESA2 adder is better for some applications while SESA3 adder is beneficial for other
applications. That SESA allows for fine grain configurability.

Figure 2.3: Circuit diagram of SESA3

Following are the contributions of our work.

1) We propose the SESA adder, a single exact single approximate adder. SESA can
perform either a single n-bit exact addition or a single n-bit approximate addition.
2) We propose the SEDA adder, a single exact dual approximate adder. SEDA can
perform either a single n-bit exact addition or two n-bit approximate additions.
3) Both the SESA and SEDA adders allow for dynamic configurability at runtime to
switch between exact and approximate modes.
4) Both the SESA and SEDA adders have a bounded maximum error, i.e., for an n-bit
adder in which m bits are approximated, the maximum error is bounded by $2^m - 1$.
5) We evaluate both the SESA and SEDA adders in approximate processors, using image
processing applications and the Moby benchmarks.
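
The bounded-error claim in contribution 4 can be checked by brute force on a small behavioral model. The sketch below is only an illustration, not the authors' model: the function names are hypothetical, the approximation used is the SESA1-style rule (SUM = complement of CARRY) applied to the m least significant bits, and every CARRY is kept exact, as the text describes.

```python
def full_adder(a, b, c):
    """Exact 1-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def ripple_add(x, y, n, m):
    """n-bit ripple-carry addition in which the m least significant SUM bits
    are approximated as NOT(CARRY); the carry chain itself stays exact."""
    result, carry = 0, 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        if i < m:
            s = carry ^ 1  # approximate SUM = complement of this bit's exact CARRY
        result |= s << i
    return result | (carry << n)  # the final carry-out is exact


def max_abs_error(n, m):
    """Worst-case |approximate - exact| over all pairs of n-bit operands."""
    return max(abs(ripple_add(x, y, n, m) - (x + y))
               for x in range(1 << n) for y in range(1 << n))


if __name__ == "__main__":
    for m in range(0, 7):
        print(m, max_abs_error(6, m), (1 << m) - 1)
```

Because only the lower m sum bits can differ while all higher bits and the carry-out are exact, the observed error never exceeds $2^m - 1$ in this model.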

Table 2.1: Truth Table for SESA and SEDA Approximate Adders

Table 2.1 lists the outputs of the full adders used in the Single Exact Single
Approximate (SESA) and Single Exact Dual Approximate (SEDA) designs. In the SESA1
adder, SUM is the complement of CARRY, i.e., when SUM is '0' CARRY is '1' and vice
versa. In the SESA2 adder, SUM is '0' for all input combinations and CARRY is the
same as in the exact full adder. In the SESA3 adder, SUM is '1' for all input
combinations and CARRY is the same as in the exact full adder. The SEDA adder
behaves like SESA1: its SUM is the complement of CARRY.
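
As a cross-check of the description above, the following sketch enumerates the 1-bit truth table and lists the input combinations for which each approximate SUM differs from the exact SUM. The dictionary and function names are illustrative assumptions; the approximation rules are the ones stated in the text.

```python
from itertools import product


def exact(a, b, c):
    """Exact full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


# Approximate SUM as a function of the exact CARRY (CARRY is never approximated).
APPROX_SUM = {
    "SESA1": lambda carry: carry ^ 1,  # SUM = complement of CARRY
    "SESA2": lambda carry: 0,          # SUM forced to 0
    "SESA3": lambda carry: 1,          # SUM forced to 1
    "SEDA": lambda carry: carry ^ 1,   # same rule as SESA1
}


def sum_error_inputs(name):
    """Input combinations 'abc' for which the approximate SUM is wrong."""
    errors = []
    for a, b, c in product((0, 1), repeat=3):
        s, carry = exact(a, b, c)
        if APPROX_SUM[name](carry) != s:
            errors.append(f"{a}{b}{c}")
    return errors


if __name__ == "__main__":
    for name in APPROX_SUM:
        print(name, sum_error_inputs(name))
```

The enumeration gives two erroneous combinations for SESA1 and SEDA and four each for SESA2 and SESA3, in agreement with the combinations listed in Sections 2.2 and 2.3.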

SEDA: SINGLE EXACT DUAL
APPROXIMATE ADDERS

3.1 SEDA

The SEDA adder goes one step further than the SESA adder. It also has a bounded
maximum error and allows for configurability. Unlike SESA adders, which can perform
only one approximate addition, the SEDA adder can perform two approximate additions
when used in approximate mode. In SESA, we power gate the SUM module of the adder
when it is used in approximate mode. In the SEDA adder, we instead use the SUM
module of the adder to generate another CARRY. The SUM module of a mirror adder is
shown in Fig. 3. We observed that the SUM module of the adder can be converted into
a CARRY module by switching off transistors 1 and 2 and switching on transistors 3
and 4. The converted circuit can be used for an additional CARRY computation, which
allows us to perform two approximate computations. For each of the two approximate
computations, the SUM is obtained by inverting the CARRY output. Thus, the truth
table of the SEDA adder is the same as that of the SESA1 adder, as shown in
Table 2.1.

The error in SEDA occurs for input values 000 and 111. The circuit for the SEDA
adder is shown in Figure 3.1. The CARRY output remains exact irrespective of whether
the SEDA adder operates in exact or approximate mode. In exact mode, the SUM output
is computed using the SUM module, as shown in Figure 3.1. In approximate mode, the
SUM module is converted to compute the exact CARRY of a second set of inputs using a
set of MUXes, as shown in Figure 3.1: transistors 1 and 2 are switched off and
transistors 3 and 4 are switched on using the MUXes. Thus, the CARRY module
generates the exact CARRY output for one set of inputs A0, B0, C0, and the converted
SUM module generates the exact CARRY output for the set A1, B1, C1. For both input
sets, the SUM is generated by complementing the respective CARRY output. Thus, the
SEDA adder is capable of performing a single-bit exact addition or two single-bit
approximate additions with minimal changes to the hardware. We will show that the
SEDA adder is suitable for coarse-grain configuration, unlike the SESA adders.
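
The two operating modes of a single SEDA cell can be summarized with a small behavioral sketch. This models only the logic function described above, not the transistor-level circuit; the function names and the return convention are assumptions made for illustration.

```python
def majority(a, b, c):
    """Exact CARRY of a full adder."""
    return (a & b) | (b & c) | (a & c)


def seda_cell(approx, a0, b0, c0, a1=0, b1=0, c1=0):
    """Behavioral model of one SEDA cell.

    Exact mode: returns one (sum, carry) pair for inputs (a0, b0, c0).
    Approximate mode: returns two (sum, carry) pairs; the second CARRY stands
    for the output of the SUM module reconfigured as a CARRY module, and each
    SUM is the complement of its own exact CARRY.
    """
    carry0 = majority(a0, b0, c0)
    if not approx:
        return ((a0 ^ b0 ^ c0, carry0),)
    carry1 = majority(a1, b1, c1)
    return ((carry0 ^ 1, carry0), (carry1 ^ 1, carry1))


if __name__ == "__main__":
    print(seda_cell(False, 1, 1, 0))           # one exact result
    print(seda_cell(True, 1, 1, 0, 0, 0, 0))   # two approximate results
```

In the second call, the input set 000 yields SUM = 1, which is one of the two erroneous combinations noted above, while both CARRY outputs stay exact.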

In the SEDA adder, we do not approximate the carry, in order to keep the error
bounded. In an n-bit addition, an error in the carry of the least significant bit
can propagate all the way to the most significant bit. In SEDA, we approximate only
the sum, setting $s_{uvw} = \overline{c_{uvw}}$. Therefore, for an n-bit
approximation, the maximum possible error is $2^n - 1$. From the truth table of the
full adder, the sum output is in error only when the inputs {u, v, w} are '000' or
'111', while the carry bit remains error-free. With this approximation, the 1-bit
exact mirror adder circuit can be split into two parts, which allows two 1-bit
approximate additions to be carried out.

The SEDA adder produces exact carry outputs for successive 1-bit additions, so the
error stays bounded. The design is made configurable by adding multiplexers
implemented with transmission gates driven by a control input, as illustrated in
Figure 3.1. Using n SEDA adders, we can perform either one exact n-bit addition or
two approximate n-bit additions. Although five MUXes are added, they consume power
only when the gate input of a transmission gate switches, because of its large gate
capacitance. Since this gate input typically switches only when the adder changes
between approximate and exact modes, the resulting power consumption is minimal.

Figure 3.1: Circuit diagram of SEDA adder

3.2 Evaluation of SESA and SEDA Adders:

We implemented the proposed approximate adders in the UMC 28-nm technology node. The
simulation results were obtained using the Cadence Virtuoso tool. The supply voltage
was 1.05 V. The PMOS-to-NMOS width ratio was taken as 2:1 for the CARRY module, as
it is in the critical path of a ripple carry adder. The CARRY module is also sized
such that the same current flows through the pull-up network and the pull-down
network as in a 2:1 sized inverter, to obtain equal rise and fall delays [31]. The
SUM module is minimum sized, as it is not in the critical path [31].

Table 3.1: Evaluation Results Per Bit Computation of Various Adder Designs

We evaluated a single-bit adder with the next stage as the load. We simulated the
designs for all possible input transitions. Since a SESA adder has three inputs in
both exact and approximate modes, the total number of possible input transitions
(000 → 000, 000 → 001, . . . , 111 → 111) for each SESA adder is 64 ($2^6$). For the
SEDA adder in exact mode, the number of inputs is three, and thus the number of
possible input transitions is also 64. When SEDA is used in approximate mode, the
total number of inputs is 6, and the total number of possible input transitions
(000000 → 000000, 000000 → 000001, . . . , 111111 → 111111) is 4096 ($2^{12}$). Even
though none of the prior works on approximate adder design deals with runtime
configurability at the transistor level, we still compare against them. Among the
four prior works [25], [26], [27], [28], we compare our designs with [27]. The
approximate adders proposed in [25] and [26] are based on pass transistor logic
(PTL) and suffer from logic swing degradation. They are not suitable at lower
technology nodes, as studied extensively in [32]. The adder designs proposed in [28]
are based on hybrid CMOS logic. The dynamic energy, leakage power, delay, and
energy-delay product (EDP) for all the designs are shown in Table 3.1.
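
The transition counts quoted above follow from pairing every input state with every possible next state. A minimal enumeration sketch (the function name is an illustrative assumption):

```python
from itertools import product


def input_transitions(num_inputs):
    """All (previous, next) input-state pairs for a cell with num_inputs inputs."""
    states = ["".join(bits) for bits in product("01", repeat=num_inputs)]
    return [(prev, nxt) for prev in states for nxt in states]


if __name__ == "__main__":
    print(len(input_transitions(3)))  # three inputs: SESA, or SEDA in exact mode
    print(len(input_transitions(6)))  # six inputs: SEDA in approximate mode
```

With k inputs there are $2^k$ states and therefore $2^k \cdot 2^k = 2^{2k}$ ordered transitions, which gives $2^6$ for three inputs and $2^{12}$ for six.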

A. Static Designs

We observe that the dynamic energy, delay, EDP, and area of the adders that do not
allow configurability [27] are on average 39%, 19%, 37%, and 29% lower,
respectively, than those of the mirror adder, as shown in Table 3.1. While these
designs give large benefits in energy, delay, and area, they suffer from the
limitation of being static.

B. Naive Configurable Approximate Adders

As discussed in the previous subsection, the existing works do not allow for
configurability at the circuit level. The most naive way to introduce configurable
approximation would be to have both an exact adder and an approximate adder and to
switch between them as needed using MUXes. When exact computations are needed, the
exact section is used while the approximate section is power gated, and vice versa.
We implemented such circuits using the existing approximate adders, and their
energy, delay, and EDP are higher than those of SESA and SEDA, as shown in
Table 3.1. The area overheads, even without counting the area of the MUXes, are also
higher than those of the SESA and SEDA adders. This highlights the need for designs
such as SESA and SEDA. Since SESA and SEDA are better than the naive configurable
circuits in energy, delay, and area, we do not use the naive circuits in further
comparisons.

C. SESA

We observe that the configurability comes with overheads. The configurability
overhead appears when the SESA adders are operated in exact mode. The dynamic energy
of the SESA1, SESA2, and SESA3 adders is 13%, 2%, and 2% more than that of the
mirror adder, respectively. The delay of the SESA1, SESA2, and SESA3 adders is 0.4%,
0.1%, and 0.1% more than that of the mirror adder, respectively. The EDP of the
SESA1, SESA2, and SESA3 adders is 14%, 8%, and 8% more than that of the mirror
adder, respectively. The area of the SESA1, SESA2, and SESA3 adders is 21%, 7%, and
7% more than that of the mirror adder, respectively. When operated in approximate
mode, we see benefits in dynamic energy and EDP. The dynamic energy of the SESA1,
SESA2, and SESA3 adders is 31%, 44%, and 46% less than that of the mirror adder,
respectively. The EDP of the SESA1, SESA2, and SESA3 adders is 30%, 44%, and 45%
less than that of the mirror adder, respectively.

D. SEDA

We observe that the configurability in SEDA also comes with overheads. The
configurability overhead appears when the SEDA adder is operated in exact mode. The
dynamic energy of the SEDA adder is 43% more than that of the mirror adder. The
delay of the SEDA adder is 22% more than that of the mirror adder. The EDP of the
SEDA adder is 76% more than that of the mirror adder. The area of the SEDA adder is
70% more than that of the mirror adder. When operated in approximate mode, we see
benefits in dynamic energy, delay, and EDP. The dynamic energy of the SEDA adder is
52% less than that of the mirror adder. The delay of the SEDA adder is 34% less than
that of the mirror adder. The EDP of the SEDA adder is 45% less than that of the
mirror adder.

Since an application consists of both exact and approximate operations, we want to
identify which applications will benefit from the approximation. SESA and SEDA
adders incur overheads when performing exact additions and give benefits when
performing approximate computations. Figure 3.2 (a) and (b) show what fraction of
the operations in an application needs to be approximate to obtain energy benefits
and EDP benefits, respectively. For the SESA1, SESA2, SESA3, and SEDA adders, the
fraction of approximate computations should be 30%, 10%, 10%, and 50%, respectively,
to obtain benefits in dynamic energy. For the SESA1, SESA2, SESA3, and SEDA adders,
the fraction of approximate computations should be 40%, 20%, 20%, and 60%,
respectively, to obtain benefits in EDP.
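
The energy break-even points can be approximated with a simple linear mix of the per-mode figures reported in subsections C and D. This is only a back-of-the-envelope sketch under the assumption that energy scales linearly with the share of approximate additions; the plotted curves in Figure 3.2 may rest on a more detailed model. If a fraction $f$ of the additions is approximate, the design breaks even with the mirror adder when $(1 - f) \cdot o = f \cdot s$, where $o$ is the exact-mode overhead and $s$ the approximate-mode saving. The names below are illustrative.

```python
import math

# (exact-mode overhead %, approximate-mode saving %) in dynamic energy relative
# to the mirror adder, as reported in subsections C and D.
ENERGY = {
    "SESA1": (13, 31),
    "SESA2": (2, 44),
    "SESA3": (2, 46),
    "SEDA": (43, 52),
}


def break_even_fraction(overhead, saving):
    """Fraction f of approximate additions at which (1 - f) * overhead == f * saving."""
    return overhead / (overhead + saving)


def rounded_up_to_ten_percent(fraction):
    """Round a fraction up to the next multiple of 10%, returned in percent."""
    return math.ceil(fraction * 10) * 10


if __name__ == "__main__":
    for name, (overhead, saving) in ENERGY.items():
        f = break_even_fraction(overhead, saving)
        print(name, round(100 * f, 1), rounded_up_to_ten_percent(f))
```

Rounded up to 10% steps, this linear estimate reproduces the 30%, 10%, 10%, and 50% energy thresholds stated above.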

Figure 3.2 (a): Tradeoff analysis for configurability in Energy savings

Figure 3.2 (b): Tradeoff analysis for configurability in EDP savings

Since the SESA1 and SEDA adders have the same output error, as in both cases the SUM
is approximated as the complement of CARRY, we can see that if more than 60% of the
additions in an application are approximate, the SEDA adder outperforms the SESA1
adder, and vice versa. Overall, we can conclude that while prior works have shown
that static approximate adders give large benefits, in actual applications only
some sections of the program can be approximated. Supporting this through
configurability introduces overheads but is more practical than static designs. We
also see that even though the benefits given by the SESA and SEDA adders are smaller
than those of static adders, they are still significant when compared with exact
computation. We can also conclude that if an application requires configurability
and a bounded maximum error, as required in a general system, these overheads have
to be paid.

3.3. SESA and SEDA for Approximate Processors

In this section, we discuss how SESA and SEDA adders can be used in approximate
processors. In Section 3.2, we showed that introducing configurability in
approximation comes at a price, but applications that have a mix of exact and
approximate computations can still benefit from it despite the overheads.

A. SESA Configuration

SESA adders can work in exact and approximate modes. Hence, SESA adders can be used
in a general-purpose processor setting. The instruction set architecture (ISA) of
the processor can have an additional instruction to support approximate addition.
This instruction can also have bits that select the number of approximated bits
(NAB). For a 32-bit processor, 5 bits of the instruction can be reserved to select
the NAB. Depending on whether exact or approximate addition is required for an
operation, SESA can be configured to perform either one. SESA adders can vary the
amount of approximation by selecting a different number of bits for approximation,
and can thus be configured at even finer granularities.
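
At word level, the effect of the NAB setting on SESA2 and SESA3 can be sketched with bit masks: because every carry is exact, the result equals the exact sum with its lowest NAB bits forced to 0 (SESA2) or to 1 (SESA3). The function name, the `variant` strings, and the `width` parameter are illustrative assumptions, not part of any real ISA.

```python
def sesa_add(a, b, nab, variant, width=32):
    """Word-level behavioral model of a SESA2/SESA3 addition.

    nab is the number of approximated least significant bits; nab = 0 is
    exact mode. Carries are exact, so only the low nab sum bits change.
    """
    if not 0 <= nab <= width:
        raise ValueError("nab must be between 0 and width")
    exact_sum = a + b
    mask = (1 << nab) - 1
    if variant == "SESA2":
        return exact_sum & ~mask  # low nab SUM bits forced to 0
    if variant == "SESA3":
        return exact_sum | mask   # low nab SUM bits forced to 1
    raise ValueError("unknown variant")


if __name__ == "__main__":
    for nab in (0, 2, 4):
        print(nab, sesa_add(11, 6, nab, "SESA2"), sesa_add(11, 6, nab, "SESA3"))
```

Raising `nab` trades accuracy for energy in this model, and the deviation from the exact sum stays within $2^{nab} - 1$.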

B. SEDA Configuration

The SEDA adder can also be used in an approximate processor. Since the SEDA adder
operates on multiple additions together, it is suitable for superscalar processors,
in which multiple instructions wait in the reservation station to be executed [36].
We explain the integration of the SEDA adder in the processor with the help of an
example. Suppose we have four 8-bit addition operations to perform: A1 + B1,
A2 + B2, A3 + B3, and A4 + B4. There are three modes of operation in the SEDA adder,
which are explained as follows.

1) Exact Mode:

In the exact mode of operation, the SEDA adder is used to perform exact addition.
Since the SEDA adder can perform one exact addition at a time, performing four
additions requires four cycles, as shown in Figure. We see from Table 3.1 that the
delay of the exact mode in the SEDA adder is 71.5 ps, which is 22% more than that of
the mirror adder. Since adders are usually not in the critical path in processors, a
higher delay does not lead to a loss in performance [36].

2) Full Approximate Mode:

In full approximate mode, the SEDA adder can perform two approximate additions in
one cycle. We call it full approximate mode because all the bits of the output are
approximated in this mode. Thus, the four additions can be performed in two cycles,
as shown in Figure. In Table 3.1, we have reported the delay of the SEDA adder per
bit of computation. Since the SEDA adder performs 2 bits of computation per cycle,
the overall delay in approximate mode is 76.8 ps, but we save two cycles for
approximate computations when compared with the exact mode.

3) Half Approximate Mode:

In half approximate mode, we configure the SEDA adder to approximate half of the
bits of the adder. Since many applications cannot tolerate approximation in all the
bits of an addition, half approximate mode offers a middle ground between exact and
full approximate modes. First, the SEDA adder is configured in approximate mode and
performs the addition of the LSB halves, as shown in Figure. Since there are four
additions, the SEDA adder is then used in exact mode to perform exact addition on
the MSBs in the other two cycles, as shown in Figure.
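
The cycle counts for the four-addition example can be captured in a small sketch. The generalization to an arbitrary number of additions is an assumption made here for illustration: one exact addition per cycle, two approximate additions per cycle, and, in half approximate mode, one approximate cycle covering the LSB halves of up to four additions followed by one exact cycle per two MSB halves.

```python
from math import ceil


def seda_cycles(num_additions, mode):
    """Estimated cycles for num_additions same-width additions on one SEDA adder."""
    if mode == "exact":
        return num_additions                  # one exact addition per cycle
    if mode == "full":
        return ceil(num_additions / 2)        # two approximate additions per cycle
    if mode == "half":
        lsb_cycles = ceil(num_additions / 4)  # LSB halves of four additions at once
        msb_cycles = ceil(num_additions / 2)  # MSB halves of two additions at once
        return lsb_cycles + msb_cycles
    raise ValueError("mode must be 'exact', 'full', or 'half'")


if __name__ == "__main__":
    for mode in ("exact", "full", "half"):
        print(mode, seda_cycles(4, mode))
```

For the four additions A1 + B1 to A4 + B4, this gives four cycles in exact mode, two in full approximate mode, and three in half approximate mode.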

3.4 Evaluation for Applications

We evaluate the SESA and SEDA adders on image processing applications and Moby
benchmark applications. Before diving into the applications, we discuss the
framework used in the analysis of approximate applications.

A. Evaluation Framework.

We used the framework proposed in [28] for the analysis of applications.
The framework for evaluation is shown in Figure.

1) Energy Calculation:

The netlists of the single-bit approximate adders are generated using the Cadence
Virtuoso tool. OCEAN scripts are used to simulate each adder with the Spectre tool
to obtain the energy consumption for every possible transition that can occur at the
input. We used the Moby benchmarks [37], a suite of mobile benchmarks, to obtain the
inputs seen by the adder when these applications run on an ARM core. We used gem5
[38], a system-level simulator, to model a 32-bit ARM quad-core configuration
running Android 4.2.2. The input traces, i.e., the inputs seen by the adder in the
processor, consist of 100 000 inputs and are obtained using the gem5 simulator. For
the image processing applications, we read the pixel values directly from the images
using a C program.

These values from the applications are then passed into a binary transition counter,
which counts the occurrences of each possible input transition. As discussed in
Section 3.2, SESA adders have 64 possible input transitions and SEDA adders have
4096 possible input transitions. The count of each of these transitions is recorded
for each of the adders. The energy calculator multiplies the count of each input
transition by its corresponding energy value and sums the products to give the total
energy consumed by the application. For image processing applications, we generate
the input trace from the images, and those values are fed into the binary transition
counter.
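
The transition counter and energy calculator described above amount to a histogram followed by a weighted sum. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the energy table is a placeholder (one unit per toggled input bit), not simulation data from the Spectre runs.

```python
from collections import Counter
from itertools import product


def count_transitions(trace):
    """Binary transition counter: occurrences of each (previous, next) input pair."""
    return Counter(zip(trace, trace[1:]))


def total_energy(trace, energy_per_transition):
    """Energy calculator: sum over transitions of count times per-transition energy."""
    counts = count_transitions(trace)
    return sum(count * energy_per_transition[t] for t, count in counts.items())


# Placeholder energy table (NOT measured values): one unit per toggled input bit.
states = ["".join(bits) for bits in product("01", repeat=3)]
toy_energy = {(p, n): sum(x != y for x, y in zip(p, n)) for p in states for n in states}

if __name__ == "__main__":
    trace = ["000", "001", "001", "111"]  # hypothetical 1-bit adder input trace
    print(count_transitions(trace))
    print(total_energy(trace, toy_energy))
```

In the actual flow, `toy_energy` would be replaced by the 64-entry (SESA) or 4096-entry (SEDA) table of simulated per-transition energies.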

2) Error Calculation:

The traces obtained from either the gem5 simulator or the images are fed as inputs
to the behavioral models of the adders. The behavioral models are C programs that
mimic the approximate adders and produce the approximate outputs. These are then
compared with the golden outputs, i.e., the correct outputs, using the error
calculator to compute the overall error.
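
A minimal sketch of this error-calculation step is given below, written in Python for brevity rather than C. The behavioral model is a SESA2-style adder with exact carries, and the error metric is the logarithm of the mean square error; the base-10 logarithm and all names are assumptions made for illustration.

```python
import math


def approx_add(a, b, nab):
    """Behavioral model of a SESA2-style adder: low nab SUM bits forced to 0."""
    return (a + b) & ~((1 << nab) - 1)


def log_mse(pairs, nab):
    """log10 of the mean square error between approximate and golden outputs."""
    squared = [(approx_add(a, b, nab) - (a + b)) ** 2 for a, b in pairs]
    mse = sum(squared) / len(squared)
    return math.log10(mse) if mse > 0 else float("-inf")


if __name__ == "__main__":
    trace = [(1, 2), (3, 4), (10, 20)]  # hypothetical operand trace
    for nab in (0, 2, 4):
        print(nab, log_mse(trace, nab))
```

The golden output here is simply `a + b`; with `nab = 0` the model is exact and the mean square error is zero.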

B. Evaluation for Image Processing Applications.

We performed two image processing applications using the images from the
Razor dataset [39]. We have first performed image addition which is widely shown as
an application in the prior works. We also used image enhancement as the second
application. In image enhancement, the original image is smoothened using the blur
filter. The edges are then extracted using the Laplace filter from this smoothened
image. These extracted edges are added to the smoothened image using approximate
adders to obtain the enhanced image [34]. We carried out the addition using a 16-bit
ripple carry adder as the images are 16 bits.

The approximation was introduced by replacing the least significant bits of the
adder with the approximate adder for the SESA adders. For the SEDA adder, we perform
the addition using the three modes explained in Section 3.3. The energy consumption
of the adder is shown in Figure for image addition and image enhancement,
respectively. Approximating all 16 bits is not suitable for the image addition
application, as it leads to significant deterioration in output quality. Thus, we
have limited our analysis to 12 bits for the SESA1, SESA2, and SESA3 adders and to
half approximate mode for the SEDA adder. The output quality is measured by the SSIM
[40] values, which are shown in Figure for image addition and image enhancement,
respectively.

The average energy and SSIM benefits are shown in Table IV. We want to
mention that the NAB for SESA adders in image processing application, where we are
using a 16-bit adder, is taken to be 0, 4, 8, and 12, and for SEDA adders it is 0 and 8.
Since SEDA adder can be used only in half approximate and full approximate mode,
the half approximate mode corresponds to NAB = 8. We have not shown the results
for NAB = 16 for both the SEDA and SESA adders as the output quality deteriorates
significantly for NAB = 16. Note that NAB = 16 corresponds to full approximate in
SEDA adder.

1) Image Addition: The image sets used for image addition are 0: artificial,
big_tree; 1: cathedral, fireworks; 2: hdr, leaves_iso_200; 3: nightshot_iso_1600,
spider_web; 4: big_building, bridge; 5: deer, flower; 6: leaves_iso_1600,
zone_plate. We see that for NAB = 0, i.e., when the approximate adders are used in
exact mode, the energy consumption is higher than that of the mirror adder. For the
SESA1, SESA2, SESA3, and SEDA adders, the overheads for exact addition are 13.6%,
2.25%, 2.64%, and 41%, respectively, when compared with the mirror adder, as shown
in Table IV. When used in approximate mode, the adder designs do give energy
benefits. On average, the SESA1/SESA2/SESA3 adders give benefits ranging from
0.33%/12.16%/12.56% for NAB = 2 to 27.05%/39.88%/41.66% for NAB = 12. On average,
the SEDA adder gives a benefit of 16.92% in half approximate mode. The lowest
average SSIM value, 0.87, is obtained for SESA2 with 12-bit approximation. Images
with an SSIM value of 0.87 are of acceptable quality; one such image is shown in
Fig. 10(b) alongside the exact output, as shown in Figure.

2) Image Enhancement: For image enhancement, the images used are 0: artificial,
1: big_tree, 2: cathedral, 3: fireworks, 4: hdr, 5: leaves_iso_200,
6: nightshot_iso_1600, 7: big_building, 8: bridge, 9: deer, 10: flower,
11: leaves_iso_1600. We see that for NAB = 0, i.e., when the approximate adders are
used in exact mode, the energy consumption is higher than that of the mirror adder.
For the SESA1, SESA2, SESA3, and SEDA adders, the overheads for exact addition are
15.28%, 2.75%, 3.21%, and 39.75%, respectively, when compared with the mirror adder,
as shown in Table IV. When used in approximate mode, the adder designs do give
energy benefits. On average, the SESA1/SESA2/SESA3 adders give benefits ranging from
1.35%/13.65%/14.22% for NAB = 2 to 32.53%/44.79%/47.21% for NAB = 12. On average,
the SEDA adder gives a benefit of 23.61% in half approximate mode. The lowest
average SSIM value, 0.77, is obtained for SESA2 with 12-bit approximation. Images
with an SSIM value of 0.77 are of acceptable quality; one such image is shown in
Fig. 10(d) alongside the exact output, as shown in Figure.

C. Evaluation for Moby Benchmarks.

As shown in Section 3.3, the proposed approximate adders are suitable for use in
processors. Thus, we have also evaluated the proposed adders using the Moby
benchmarks [37]. For the Moby benchmarks, the evaluation was done using a 32-bit
adder. Since approximating all the bits leads to large errors, we have limited our
analysis to 24 bits for the SESA1, SESA2, and SESA3 adders and to half approximate
mode for the SEDA adder. The energy values are shown in Figure. The error metric
used is the logarithm of the mean square error (MSE), as it is widely used in prior
works. The log(MSE) values for the applications are shown in Figure. The average
energy and log(MSE) values are shown in Table V. The NAB for the SESA adders in the
Moby benchmark applications, where we use a 32-bit adder, is taken to be 0, 4, 8,
12, 16, and 24, and for the SEDA adder it is 0 and 16. We have not shown the results
for NAB = 32 for either the SEDA or the SESA adders, as the output quality
deteriorates significantly for NAB = 32, as also mentioned in the previous section.

For the SESA1, SESA2, SESA3, and SEDA adders, the energy overheads for exact
addition are 16.21%, 2.99%, 3.49%, and 39.05%, respectively, when compared with the
mirror adder, as shown in Table IV. When used in approximate mode, the adder designs
do give energy benefits. We start to see energy benefits for SESA2 and SESA3 from
NAB = 4, but for SESA1 only from NAB = 8 onward. On average, the SESA1/SESA2/SESA3
adders give benefits ranging from 1.44%/14.58%/15.21% for NAB = 8 to
30.32%/42.7%/45.19% for NAB = 24. On average, the SEDA adder gives a benefit of
22.52% in half approximate mode.

Overall, we see that even though the proposed adders have overheads due to
configurability, they do give benefits when used in approximate mode. Also, while we
have not approximated all the bits for any application, there may be applications
that can tolerate errors in all the bits, and full approximation in the SEDA adder
can be used for those applications. Finally, while the energy benefits of the SEDA
adder are smaller than those of the SESA adders, it is important to note that it
takes one cycle fewer than the SESA adders, as explained earlier.

APPROXIMATE MULTIPLIER

Generally, the multiplication process involves three important steps, viz., partial
product (PP) generation, PP compression, and final product generation. In an n × n
multiplication, $n^2$ AND gates are used for PP generation; they operate in
parallel, so the maximum delay in PP generation is that of one AND gate. However, PP
compression incurs an (n-1)-column carry propagation delay, which grows as O(n).
Various algorithms have been proposed in the literature to reduce the delay in PP
compression (Wang et al. 2011; Devesh Dwivedi 2019; Sureka 2013; Suganthi
Venkatachalam 2018). The Wallace tree multiplier (Sureka 2013) and the Dadda
multiplier (Devesh Dwivedi 2019) have a more regular PP structure than the
conventional multiplier, but give no improvement in the (n-1)-column carry
propagation. On the other hand, multipliers using the Booth and Modified Booth
algorithms reduce the number of PP rows to n/2 (Wang et al. 2011) and n/2 + 1
(Suganthi Venkatachalam 2018). However, the column-wise carry propagation delay in
PP compression remains the same.

A number of approaches have been proposed in the literature for the design of
approximate compressor units with logic-level modifications that reduce hardware
complexity and critical delay. These compressor units are employed for PP
compression in multipliers targeting error-resilient image and signal processing
applications. The design of approximate compressor units proposed in the literature
is briefly discussed in the following sub-sections.

4.1 APPROXIMATE MULTIPLIER

This section outlines the design of the n × n approximate multiplier that uses the
proposed approximate compressors for PP compression. The multiplier performs PP
compression in stages using carry-save addition with a Dadda-structure-based PP
arrangement. In the final stage, the PPs are reduced to two rows of sum and carry
signals, which are added using a Ripple Carry Adder (RCA). The maximum error in the
2n-bit multiplier output due to the approximate compressors is limited to 1 unit in
Bit Significant Position (BSP) 2n-1 (i.e., $2^{2n-1}$). This is tolerable for
error-resilient applications in signal and image processing. The proposed
methodology for n = 8 bit input operands is illustrated in Figure 4.1. The inputs to
the multiplier are a[7:0] and b[7:0]. The generated PP bits are arranged in a Dadda
structure to reduce complexity, and PP compression is performed in stages.

Figure 4.1: proposed approximate multiplier

In the proposed multiplier designs, approximate Half Adder (AHA) and approximate
Full Adder (AFA) cells are used for the addition of 2 and 3 bits, respectively. The
logic expressions for the carry and sum outputs of the AHA and AFA are given by
Equation (4.1) and Equation (4.2), respectively.

AHAcarry = a & b
AHAsum = ~AHAcarry (4.1)

AFAcarry = a & b & c
AFAsum = ~AFAcarry (4.2)
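
To see where these cells deviate from exact addition, the sketch below evaluates Equations (4.1) and (4.2) for every input combination and records the signed error of the encoded value sum + 2·carry. The function names are illustrative; the single-bit complement is written as `^ 1` because Python's `~` operates on unbounded integers.

```python
from itertools import product


def aha(a, b):
    """Approximate half adder per Equation (4.1): returns (sum, carry)."""
    carry = a & b
    return carry ^ 1, carry


def afa(a, b, c):
    """Approximate full adder per Equation (4.2): returns (sum, carry)."""
    carry = a & b & c
    return carry ^ 1, carry


def error_table(cell, num_inputs):
    """Map each erroneous input tuple to (approximate value - exact value)."""
    errors = {}
    for bits in product((0, 1), repeat=num_inputs):
        s, carry = cell(*bits)
        diff = (s + 2 * carry) - sum(bits)
        if diff != 0:
            errors[bits] = diff
    return errors


if __name__ == "__main__":
    print(error_table(aha, 2))
    print(error_table(afa, 3))
```

Under these equations the AHA is wrong only for the all-zero input, while the AFA deviates by at most one unit for five of its eight input combinations.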

4.2 EXACT 4:2 COMPRESSOR
Standard compressors are being used to add n 1-bit inputs and produce ‗i’ bits
result , such that ‗i‘ is the least integer that satisfy the condition n< 2i . A standard 4-
2 compressor has four inputs X1, X2, X3, X4 and carry input Cin from preceding
block, and produces three outputs viz., Sum(S), Carry(C), Cout. The input bits X1,
X2, X3, are fed to full adder-1 to produce FA sum and Carry C. X4 and Cin inputs are
added with FA sum using full adder-2 to produce S and Cout output bits. Note that the
conventional 4–2 compressor incur two FA sum delay at the maximum to produce
outputs . Block level architecture of standard 4:2 compressor is shown in Figure 4.1
and corresponding Boolean expressions defining the logic are shown in Equation
(4.3)- Equation(4.5). Following are the functional symbols used to express logic
functions viz., |- OR; & - AND; ^ - XOR.

S = X1 ^ X2 ^ X3 ^ X4 ^ Cin (4.3)

C = [(X1 ^ X2) & X3] | [~(X1 ^ X2) & X1] (4.4)

Cout = [(X1 ^ X2 ^ X3 ^ X4) & Cin] | [~(X1 ^ X2 ^ X3 ^ X4) & X4] (4.5)
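
The expressions for the exact compressor can be checked exhaustively. The sketch below, with an illustrative function name, evaluates Equations (4.3) to (4.5) for all 32 input patterns and confirms that X1 + X2 + X3 + X4 + Cin = S + 2(C + Cout). Complement is again written as XOR with 1 on single bits.

```python
from itertools import product


def exact_compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Exact 4:2 compressor built from two cascaded full adders: returns (S, C, Cout)."""
    parity = x1 ^ x2 ^ x3 ^ x4
    s = parity ^ cin                                  # Equation (4.3)
    c = ((x1 ^ x2) & x3) | (((x1 ^ x2) ^ 1) & x1)     # Equation (4.4), carry of full adder-1
    cout = (parity & cin) | ((parity ^ 1) & x4)       # Equation (4.5), carry of full adder-2
    return s, c, cout


def is_exact() -> bool:
    """True if the outputs encode the input count for every input pattern."""
    for bits in product((0, 1), repeat=5):
        s, c, cout = exact_compressor_4_2(*bits)
        if sum(bits) != s + 2 * (c + cout):
            return False
    return True


print(is_exact())
```

Both carries have weight 2, so the three outputs together can represent every count from 0 to 5.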

A modification to the logic expressions of the standard 4:2 compressor is
proposed, targeting PP compression in the multiplier with Cin held at logic low and
the generation of Cout (the carry output) neglected. Eliminating Cout in the 4:2
compressor generates an error for X1X2X3X4 = 1111, with probability 0.06 and
Maximal Error Space (MES) = -2. Conversely, approximation logic that produces logic
high on the Sum signal for X1X2X3X4 = 1111 reduces the MES to -1. Hence, for a fair
assessment of 4:2 approximate compressors, the error in the Sum output for
X1 = X2 = X3 = X4 = logic high was ignored. The block diagram of the modified 4:2
exact compressor is shown in Figure.
S = X1 ^ X2 ^ X3 ^ X4 (4.6)

C = (~(X1 ^ X2 ^ X3 ^ X4) & X4) (4.7)

The proposed multiplier has one stage fewer than the former. At the last
stage of the proposed multiplier, the three remaining rows of partial products are
summed using a special CPA constructed from half adders and the proposed
approximate compressors. A half adder is used in each of columns 2 and 15 of this
CPA, and the first proposed compressor is used in each of columns 3 to 14. Columns
3 to 14 of this CPA are therefore the proposed CPA introduced in the previous
sub-section, which does not have the carry-propagate delay problem.

Figure 4.2: Approximate Multiplier

In the proposed multiplier shown in Figure 4.2, each rectangle represents a half
adder or a full adder. The sums of the last stage's columns are computed using a CPA
constructed from half adders, full adders, and the proposed 4:2 compressors.
