Keywords-Approximate Multiplier, Power Consumption, Propagation Delay
An adder is the basic computational circuit in digital Very Large Scale Integration
(VLSI) design, and approximate adders have been proposed to improve its design
metrics. The digital multiplier is a fundamental component of many digital signal
processing (DSP) systems and accounts for most of their computational
resources. As many DSP applications have an inherent tolerance for inexact
computations, approximate multiplication is considered an appropriate substitute
for obtaining energy-performance-accuracy trade-offs, especially in applications that
require high energy efficiency. Reducing the supply voltage has also proven to be an
effective way to further lower the total energy consumption. In this paper, a
multiplexer-based approximate full adder is proposed for an 8-bit approximate
multiplier, together with its circuit implementation, for error-resilient multiplication
at a low supply voltage. Simulation results indicate that the proposed 8-bit
approximate multiplier consumes the least energy per operation at the same
computational accuracy when compared with other multipliers for an operand length
of 8 bits. It achieves a 26.7% reduction in energy-delay product (EDP) compared with
exact multiplication.
Keywords- Single Exact Single Approximate Adder, Single Exact Dual Approximate Adder,
Approximate Multiplier, Power consumption, propagation delay.
INTRODUCTION
1.1 Introduction
technology developments aim at ease of use with improved computational capability.
Almost all applications involve basic arithmetic operations among which, addition and
multiplication are the most important and widely used functional blocks. Hence,
adders and multipliers play a major role in these applications. To bring about
compactness and faster operation, the normal arithmetic mechanism is replaced by
many techniques that promise minimum hardware with early and correct results. Many
adders like carry save adder, carry select adder, carry look ahead adder, carry skip
adder and many others were developed with an aim to result in faster outputs and less
hardware. Improving adders alone did not meet the requirements of the modern
world. Attention therefore turned to the more complex and time-consuming multiplier
blocks to improve efficiency further. This chapter gives an overview of
multiplier basics, their types, and effective approaches to optimization.
To maintain the accuracy of the system, the output bit length increases
proportionately with the input bit width. However, it is not easy to implement
such architectures with ever-increasing bit width. Certain applications, such as
personal blood pressure monitors, intelligent toys, and IP cameras, do not require the
full output for further processing. This chapter's approach focuses on multiplier
design for such applications. To meet the requirements of systems where an
exact result is not required, certain techniques are implemented that provide
acceptable results with a notable reduction in area and power.
A multiplier with an n-bit multiplicand and an n-bit multiplier produces an output of
2n bits. The partial products generated by the multiplication process can be
divided into two regions, named according to their position: the Least
Significant Region (LSR) and the Most Significant Region (MSR). The least significant
region comprises the n columns from P0 to Pn-1, the less significant
portion of the partial product matrix, whereas the columns from Pn to
P2n-1 form the most significant region.
The two techniques by which the output of the multiplier can be shortened are
word length reduction (de la Guia Solaz et al. 2010 & Han et al. 2005)
and truncation (Schulte & Swartzlander 1993). Word length reduction aims
to decrease power consumption by input shifting. Truncation is generally
based on array structures such as Baugh-Wooley or Booth, in which the least
significant bits, i.e., the lower regions of the partial product matrix, are omitted. In this
chapter, a truncated multiplier is proposed for efficient implementation.
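As a rough illustration of truncation, the following Python sketch builds the partial product matrix of an unsigned n-bit array multiplier and simply omits every partial-product bit that falls in the k least significant columns. The function name `truncated_multiply` and the parameter `k` are assumptions introduced here for illustration. No correction constant is added, and the sketch is not the multiplier proposed in this chapter.

```python
def truncated_multiply(a: int, b: int, n: int = 8, k: int = 0) -> int:
    """Unsigned n-bit multiply that ignores partial-product bits in columns 0..k-1.

    Illustrative model only: k = 0 gives the exact product, k = n drops the
    whole Least Significant Region (columns P0..Pn-1).
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(n):                  # bit i of the multiplier b
        if not (b >> i) & 1:
            continue
        for j in range(n):              # bit j of the multiplicand a
            column = i + j              # weight of this partial-product bit
            if (a >> j) & 1 and column >= k:
                result += 1 << column
    return result
```

With k = 0 the function returns the exact product. Increasing k removes more of the LSR, and the result can only become smaller than or equal to the exact product because bits are dropped and never added.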
Addition is one of the most widely used operations in many applications [1],
[2], [3], [4], [5], [6]. Because adders are so widespread, several
works have focused on reducing their energy consumption [1], [2], [3], [4], [5], [6].
In recent years, it has been shown that some applications are resilient
to errors in computations [7], [8]. Most applications that exhibit
error resilience fall into the categories of recognition, mining, and synthesis [9], [10], [11].
Since these applications are inherently robust to erroneous computations, they do not
require the exact output to give the desired results [12], [13]. Thus, circuits
that produce an exact output for each computation are not required for these applications;
circuits that provide approximately correct outputs are sufficient [14],
[15], [16].
transistor level have introduced approximation using the following strategy. The exact
adder design is converted into approximate adder design by removing some transistors
or modifying a group of transistors to generate an approximate adder design [25], [26],
[27], [28]. Some of the exact full adders in an n-bit adder were then replaced by these
approximate adders to build the overall approximate adder. Thus, they follow a static
design, i.e., once the number of approximate bits has been decided,
the overall approximate adder design is fixed. Some works allow for dynamic runtime
configurability at the gate level [29], [30], but in this work we focus on introducing
approximation at the circuit level. The existing designs at the transistor level do not allow
configurability between exact and approximate modes.
In this article, we propose single exact single approximate (SESA) and single
exact dual approximate (SEDA) adders, which not only allow for configurability
between exact and approximate modes but also allow the amount of
approximation to be varied in approximate mode. To the best of our knowledge,
SESA and SEDA adders are the first approximate adders to use the same hardware,
with some additional circuitry at the circuit level, to perform either one n-bit
exact addition or one (SESA) or two (SEDA) n-bit approximate additions. Another
important requirement for using approximate adders in general-purpose processors is
a bounded maximum error. It was shown in [28] that approximate adder
designs without a bounded maximum error cannot be used in approximate
processors as they lead to overflow. A full adder has three inputs and two outputs:
SUM and CARRY. SESA and SEDA adders have a bounded maximum error because they
do not introduce any error in the CARRY output. Thus, for an n-bit adder in which
m bits are approximated, the maximum error is bounded by 2^m − 1. This makes them
suitable for use in processors.
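The bound can be checked numerically with a small behavioural model. The sketch below is an assumption-laden illustration, not the transistor-level design: it models an n-bit ripple-carry adder in which the m least significant full adders output SUM as the complement of the exact CARRY, while every CARRY is computed exactly. The names `approx_add` and `max_error` are hypothetical.

```python
def approx_add(a: int, b: int, n: int = 8, m: int = 0) -> int:
    """n-bit ripple-carry addition with the m LSB stages approximated.

    In an approximated stage, SUM = NOT CARRY_OUT; CARRY_OUT itself is exact.
    """
    carry = 0
    result = 0
    for i in range(n):
        x = (a >> i) & 1
        y = (b >> i) & 1
        carry_out = (x & y) | (x & carry) | (y & carry)   # exact majority function
        if i < m:
            s = carry_out ^ 1          # approximate SUM
        else:
            s = x ^ y ^ carry          # exact SUM
        result |= s << i
        carry = carry_out
    return result | (carry << n)       # append the final carry-out


def max_error(n: int, m: int) -> int:
    """Largest absolute error over all pairs of n-bit operands."""
    return max(
        abs(approx_add(a, b, n, m) - (a + b))
        for a in range(1 << n)
        for b in range(1 << n)
    )
```

Because the carries are exact, only the m low SUM bits can differ from the exact result, so the error cannot exceed 2^m − 1. In this model the bound is reached for a = b = 0, where every approximated SUM bit is 1.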
both the outputs. These approximate adders were derived from mirror adders. The
other works used pass transistor logic (PTL) for the approximate adder
implementations [25], [26]. Because both works use PTL, they suffer from
logic swing degradation. Thus, PTL-based designs
are not suitable for lower technology nodes, as shown in [32]. In one of the recent works
on approximate adders, hybrid CMOS logic-based approximate adders were designed
to improve the logic swing, and extensive evaluation was done for mobile processing
applications [28].
However, hybrid CMOS logic-based adders suffer from glitches in the output, and
buffers need to be inserted to improve the delay in multibit adders [28], [33]. None of
these prior works allows runtime configurability between exact and
approximate modes. In this work, we present the design of transistor-level runtime
configurable approximate adders. We first propose SESA, an adder that can perform
either one exact addition or one approximate addition. There are three approximate
SESA adder designs. We then propose the design of the SEDA adder which can
perform either one exact addition or two approximate additions. SEDA also has a
maximum bounded error.
Most image, audio and video processing applications are highly error-tolerant
since humans do not perceive minor variations in them. These applications often
run on hand-held devices that demand low energy consumption. Also, when multiple
images and videos are to be processed, speed is of great importance. In such situations,
error can be introduced deliberately in the application in such a way that we obtain
reduction in power, delay and area, at the cost of reduced accuracy. This deliberate
introduction of error results in a trade-off between accuracy metrics such as PSNR of
the application and performance metrics such as savings in power, area and speed of
implementation. This method of achieving trade-off between accuracy and
performance metrics is referred to as approximate computing.
chapter, a variety of approximate adders available in the literature is reviewed and the
basic metrics used in characterizing the error and power savings are described.
Many approximate adders have been proposed in the literature. They fall into
two major categories: low latency approximate adders (LLAA) and low power
approximate adders (LPAA). In low latency approximate adders such as the
Variable Latency Speculative Adder (VLSA), Error Tolerant Adder (ETA-II), Equal
Segmentation Adder (ESA), Accuracy Configurable Adder (ACA-II), Gracefully
Degrading Adder (GDA), Generic Accuracy Configurable Adder (GeAr), adder with
low relative error, Reconfigurable Approximate Carry Look-Ahead Adder (RAP-CLA),
Reverse Carry Propagate Adder (RCPFA), Simple Accuracy-Reconfigurable
Adder (SARA), and Block-based Carry Speculative Adder (BCSA), proposed by Verma et
al. (2008), Zhu et al. (2010a), Dutt et al. (2019), Kahng and Kang (2012), Ye et al.
(2013), Shafique et al. (2015), Hu and Qian (2015), Akbari et al. (2018), Pashaeifar et
al. (2018), Xu et al. (2018), and Ebrahimi-Azandaryani et al. (2019), respectively, the
length of the carry chain is reduced in order to decrease the critical path delay. This is
done by dividing the adder into many sub-adders, each with simplified carry
prediction logic. These adders can be used to reduce power consumption by lowering
the supply voltage.
All the low power approximate adders belong to the class of approximate
adders called the two-part segmented approximate adders, where the adder is divided
into two sub-adders, with one sub-adder for the most significant bits (MSBs) and
another for the least significant bits (LSBs). The sub-adder that computes the MSB
sum is implemented using an accurate adder. The sub-adder that computes the LSB
sum is implemented using approximate logic to obtain savings in power. In these
adders, the power savings obtained and the accuracy lost are controlled by
the number of bits in the LSB sub-adder.
Yang et al. (2013), Yang et al. (2015), Almurib et al. (2016) and Dutt et al. (2017),
respectively. In these adders, the sum and carry are approximated using a lower
number of transistors, achieving low dynamic power consumption and low area. By
simplifying the logic used to compute the sum and by ignoring the carry ripple in the
approximate part of the adder, dynamic power savings are obtained in the Lower-part OR
Adder (LOA) and Error Tolerant Adder (ETA-I) proposed by Mahdiani et al. (2010)
and Zhu et al. (2010b), respectively. The approximate adder proposed by Zhu et al.
(2011) is a modification of the ETA-I adder that includes additional logic to check the
range of the input patterns so that accuracy can be improved.
As the first step, an approximate adder is to be designed. Over the past ten years,
many approximate adders have been proposed in the literature to achieve high
speed operation and low power implementation. Two-part segmented approximate
adders are typically used in low power implementations. In these adders, the inputs
are divided into two parts. The upper part of the sum is computed using accurate
adders and the lower part of the sum is approximated using some simplified logic.
The number of bits in the lower part of sum is termed as the level of approximation
for these adders. As the level of approximation is varied, a trade-off between
power and accuracy is obtained. An approximate adder that gives a good trade-off
is desirable.
Once an approximate adder with good power-accuracy trade-off has been designed
for use in a system, each accurate adder of the system is replaced by the
approximate adder. The level of approximation in each adder can be varied to get
power-accuracy trade-offs. An optimal configuration of approximation levels of
the adders in the system has to be found so that maximum power savings are
obtained for a given accuracy at the output of the system.
Besides the error, the system typically has other objectives, for example, the
achievable compression of images. It is important to study such trade-offs as well
in systems.
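The two-part segmented structure described above can be sketched behaviourally using the Lower-part OR Adder (LOA) as an example. The sketch assumes the commonly described LOA behaviour: the lower k sum bits are the bitwise OR of the operands, and the carry into the accurate upper part is the AND of the most significant bits of the two lower parts. The function name `loa_add` and its parameters are illustrative only.

```python
def loa_add(a: int, b: int, n: int = 8, k: int = 4) -> int:
    """Two-part segmented addition with an OR-based lower part of k bits.

    k is the level of approximation: k = 0 gives an exact n-bit addition.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if k == 0:
        return a + b
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                          # approximate lower sum bits
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1    # carry estimate for the upper part
    high = (a >> k) + (b >> k) + carry_in             # accurate upper sub-adder
    return (high << k) | low
```

Sweeping k from 0 to n in such a model gives the kind of accuracy curve against which power savings are traded off, although power itself has to come from circuit simulation.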
1.2 Motivation
Since accurate adders are replaced by approximate adders to get power
savings, the approximate adder used should be able to provide maximum power
savings for a given value of error metric at the output. This involves computation of
the error metrics and optimization to maximize the number of approximate bits.
Typical error metrics used are mean error distance (MED), mean square error (MSE)
or peak signal to noise ratio (PSNR).
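For reference, these three metrics can be computed from paired exact and approximate outputs as in the sketch below. It uses the standard definitions (MED as the mean absolute difference, MSE as the mean squared difference, and PSNR as 10·log10(peak² / MSE)); the function name `error_metrics` and the default peak value of 255 for 8-bit data are assumptions made for illustration.

```python
import math


def error_metrics(exact, approx, peak=255):
    """Return (MED, MSE, PSNR in dB) for two equal-length sequences of outputs."""
    if len(exact) != len(approx) or not exact:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = [a - e for e, a in zip(exact, approx)]
    med = sum(abs(d) for d in diffs) / len(diffs)      # mean error distance
    mse = sum(d * d for d in diffs) / len(diffs)       # mean square error
    # PSNR is unbounded when the approximate output is error-free.
    psnr = math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    return med, mse, psnr
```

For example, `error_metrics([10, 20, 30, 40], [11, 20, 28, 40])` gives a MED of 0.75 and an MSE of 1.25.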
(MC) simulations are used to compute the error metric, say signal to noise ratio
(SNR), for different values of k. Finally, the value of k that gives an acceptable
value of the error metric is chosen, as in Almurib et al. (2018) and Soares et al. (2015). In
this approach, the possible configurations of approximation levels in all the adders
are not explored sufficiently. This results in a limited range of values for power
savings and error metrics.
2. In the second category, an optimization routine is employed that uses a
precomputed table of error metrics of the approximate adder for various values of k,
as in Kadiyala et al. (2019). The values of the error metrics are computed using MC
simulations, assuming that the inputs are independent and uniformly distributed.
All adder inputs are assumed to be uniformly distributed, and the same
precomputed table of error metrics is used for all the adders in the circuit.
3. In the third category, an optimization routine is employed that uses analytical
expressions to compute the error metrics of an approximate adder for a given k as
in Gupta et al. (2013), Snigdha et al. (2016), Sengupta et al. (2017), Pashaeifar et
al. (2019). These error metrics are computed assuming that either the adder inputs
are uniformly distributed or the error introduced by the adder is uniformly
distributed.
The optimization framework in each case is different, but all of them follow the
spirit of the classical word length optimization problem in that the error introduced at
each node depends only on the degree of approximation (number of approximate bits)
and is independent of the nature of its inputs. Also, the expressions used for the mean
and variance of the error are derived assuming that the inputs or the error are uniformly
distributed. While the assumption of uniformly distributed lower-order bits may be
justified for the primary inputs, in many approximate adders neither the output nor
the error is uniformly distributed. As a result, if the error statistics of an adder are
computed assuming that its inputs are uniformly distributed, the
prediction of the overall accuracy of the system is inaccurate.
A more accurate method of obtaining the probability mass function (PMF) of error
is proposed by Sengupta et al. (2019). However, including this method within an
optimization routine would require extensive computations. Moreover, in most
applications, an accurate estimate of the mean error and mean square error is sufficient
and we do not need the PMF of the error. Hence, an important requirement is to derive
an accurate and simple error model for approximate adders, taking into account the
distribution of the inputs and the level of approximation.
SESA: SINGLE EXACT SINGLE
APPROXIMATE ADDERS
The design characteristics of SESA adders are addressed in this section. Adders
in the SESA family can perform either a single exact addition or a single
approximate addition. SESA adders have a bounded maximum error because no
approximation is introduced in the CARRY bits. Three
different SESA designs are introduced, each of which is discussed in greater detail in the
following sections.
This property allows the Exact Mirror Adder to compute the sum of two binary
numbers by simultaneously generating the mirror inputs and using an
appropriate selection circuit to determine the correct output. By taking advantage
of this symmetry, the number of transistors and logic gates required can be reduced
compared to traditional binary adders, leading to potential improvements in speed
and efficiency. Exact Mirror Adders can be implemented using different logic
families such as CMOS, TTL, or other digital logic technologies. They are
particularly useful in applications where efficiency and speed are critical, such as
in high-performance computing and digital signal processing.
2.1 SESA1 Adder
In the SESA1 adder, approximation is introduced by making the SUM output equal to
the complement of CARRY. As a result, the SESA1 adder produces errors in the
SUM bit for the input combinations 000 and 111. The circuit for the SESA1 adder is
shown in Figure 2.1. The CARRY output remains the same irrespective of whether the
SESA1 adder operates in exact or approximate mode.
In exact mode, the SUM output is computed using a power-gated (PG) controlled
inverter whose input is the complement of SUM. In approximate mode, the SUM
module is power gated, and the approximate output, which equals the complement of
CARRY, is generated using another PG-controlled inverter whose input is connected
to the complement of CARRY, as shown in Figure 2.1.
Figure 2.1: Circuit diagram of SESA1
SUM bits for the input combinations including 001, 010, 100, and 111. The circuit for
SESA2 adder is shown in Figure 2.2.
we have the same number of errors as in SESA2, the error in the output is
data-dependent, and, as will be seen later, this results in different output
errors. The circuit for the SESA3 adder is shown in Figure 2.3. The CARRY output remains
the same irrespective of whether the SESA3 adder operates in exact or approximate mode. In
exact mode, the SUM output is computed using the normal SUM module. In
approximate mode, the SUM module is power gated and the approximate output, which is
equal to 1, is generated using a PG-controlled NMOS that pulls the complement of
SUM down to a permanent 0. Since this node is connected to an inverter, the SUM output
in approximate mode is 1, as shown in Figure 2.3.
Thus, we have discussed the design of all three SESA adders. In all three
designs, there is no error in the CARRY output, and hence all the designs have
bounded maximum errors. Configurability is introduced by power gating the SUM
portion of the adder design, which also helps reduce energy, as will be discussed
later. The SESA1 adder has the fewest errors in the SUM output, namely two.
The SESA2 and SESA3 adders each have four errors in the SUM output, but
at different input combinations. Since the error in approximate circuits
depends on the data, the SESA2 adder is
better for some applications while the SESA3 adder is beneficial for others.
SESA also allows for fine-grained configurability.
1) We propose the SESA adder, a single exact single approximate adder. SESA can
perform either a single n-bit exact addition or a single n-bit approximate addition.
2) We propose the SEDA adder, a single exact dual approximate adder. SEDA can
perform either a single n-bit exact addition or two n-bit approximate additions.
3) Both the SESA and SEDA adders allow for dynamic configurability at runtime to
switch between exact and approximate modes.
4) Both the SESA and SEDA adders have a bounded maximum error, i.e., for an n-bit
adder in which m bits are approximated, the maximum error is bounded by 2^m − 1.
5) We evaluate both the SESA and SEDA adders in approximate processors
on image processing applications and the Moby benchmarks.
Table 2.1: Truth Table for SESA and SEDA Approximate Adders
The table above lists the full adder variants based on the Single
Exact Single Approximate and Single Exact Dual Approximate designs. In the SESA1
adder, the approximate SUM is the complement of CARRY, i.e., when SUM is '0' CARRY is
'1' and vice versa. In the SESA2 adder, SUM is '0' for all input conditions and CARRY is
the same as in a normal full adder. In the SESA3 adder, SUM is '1' for all input
conditions and CARRY is the same as in a normal full adder. SEDA behaves like SESA1:
its SUM is the complement of CARRY.
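The behaviour summarised above can be reproduced with a short behavioural model that compares each approximate SUM with the exact full adder SUM for all eight input combinations. This is a sketch of the logic functions only, under the SUM rules stated in the text; the names `exact_full_adder`, `APPROX_SUM`, and `sum_error_inputs` are introduced here for illustration.

```python
from itertools import product


def exact_full_adder(a: int, b: int, c: int):
    """Return (SUM, CARRY) of an exact 1-bit full adder."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


# Approximate SUM as a function of the exact CARRY, per the description in the text.
APPROX_SUM = {
    "SESA1": lambda carry: carry ^ 1,   # SUM = complement of CARRY
    "SESA2": lambda carry: 0,           # SUM fixed at 0
    "SESA3": lambda carry: 1,           # SUM fixed at 1
    "SEDA": lambda carry: carry ^ 1,    # same truth table as SESA1
}


def sum_error_inputs(design: str):
    """List the input combinations 'abc' for which the approximate SUM is wrong."""
    errors = []
    for a, b, c in product((0, 1), repeat=3):
        exact_sum, carry = exact_full_adder(a, b, c)
        if APPROX_SUM[design](carry) != exact_sum:
            errors.append(f"{a}{b}{c}")
    return errors
```

Calling `sum_error_inputs("SESA1")` returns `['000', '111']`, while SESA2 and SESA3 each give four erroneous combinations at different inputs, in line with the error counts discussed for these designs.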
SEDA: SINGLE EXACT DUAL
APPROXIMATE ADDERS
3.1 SEDA
SEDA adder goes one step further than SESA adder. It also has maximum
bounded error and allows for configurability. Unlike SESA adders which can only
perform one approximate addition, SEDA adder can perform two approximate
additions when used in approximate mode. In SESA, we power gate the SUM module
of the adder when it is used in approximate mode. In SEDA adder, we use the SUM
module of the adder to generate another CARRY. The SUM module of a mirror adder
is shown in Fig. 3. We observed that the SUM module of the adder can be converted
into the CARRY module by switching off transistors 1 and 2 and switching on
transistors 3 and 4. The converted circuit can be used for an additional CARRY
computation. The conversion of SUM circuit to generate CARRY allows us to perform
two approximate computations. For each of the two approximate computations, the
SUM is obtained by inverting the CARRY output. Thus, the truth table of the SEDA adder
is the same as that of the SESA1 adder, as shown in Table 2.1.
The error in SEDA is for input values 000 and 111. The circuit for SEDA adder
is shown in Figure 3.1. The CARRY output remains exact irrespective of whether
SEDA adder operates in exact or approximate mode. In exact mode, the SUM output
is computed using the SUM module as shown in Figure 3.1. In approximate mode, the
SUM module is converted to compute the exact CARRY of other input using a set of
MUXes as shown in Figure 3.1. The transistors 1 and 2 are switched off and transistors
3 and 4 are switched on using MUXes. Thus, the CARRY module generates the exact
CARRY output for one set of inputs A0, B0, C0, and the converted SUM module
generates the exact CARRY output for the set A1, B1, C1. For both input sets, the
SUM is generated by complementing the respective CARRY output. The overall
circuit of the SEDA adder is shown in Figure 3.1. Thus, the SEDA adder is capable of
performing one single-bit exact addition or two single-bit approximate additions with
minimal changes to the hardware. We will show that the SEDA adder is suitable for
coarse-grained configuration, unlike SESA adders.
The SEDA adder produces exact CARRY outputs for two 1-bit
additions, so its error remains bounded. In addition, the design
is made configurable by adding multiplexers that are implemented
using transmission gates controlled by an input at the source/drain, as illustrated
in Figure 2. Using n SEDA adders, we can perform either one exact n-bit
addition or two approximate n-bit additions. Although five MUXes
are added, power is consumed mainly when the gate of a transistor is switched, because
of its substantial gate capacitance. Since the gate input of the transmission gate is
typically switched only when changing between
approximate and exact modes of operation, the resulting power
consumption is minimal.
Figure 3.1: Circuit diagram of SEDA adder
Table 3.1: Evaluation Results Per Bit Computation of Various Adder Designs
We evaluated a single-bit adder with the next stage as the load. We simulated
the designs for all possible input combinations. In SESA, since there are three
inputs in both exact and approximate modes, the total number of possible input
transitions (000 → 000, 000 → 001, . . . , 111 → 111) in all the SESA adders is
64 (2^6). For the SEDA adder in exact mode, the number of inputs is three, and thus
the number of possible transitions at the input is 64. When SEDA is used in
approximate mode, the total number of inputs is 6, and the total number of possible
input transitions (000000 → 000000, 000000 → 000001, . . ., 111111 → 111111) is
4096 (2^12). Even though none of the prior works on the design of approximate
adders deals with runtime configurability at the transistor level, we have still
compared our designs with prior work. Among the four prior works [25], [26], [27], [28], we
compare our designs with [27]. The approximate adders proposed in [25] and [26]
are based on PTL and suffer from logic swing degradation. They are not suitable at
lower technology nodes, as studied extensively in [32]. The adder designs proposed in
[28] are based on hybrid CMOS logic. The dynamic energy, leakage power, delay, and
energy-delay product (EDP) for all the designs are shown in Table 3.1.
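The transition counts quoted in this paragraph follow from pairing every input vector with every possible successor vector. The sketch below enumerates these pairs and also counts the transitions that occur in a given input trace; the helper names `all_transitions` and `count_transitions` are hypothetical and are not part of the simulation flow used for Table 3.1.

```python
from collections import Counter
from itertools import product


def all_transitions(num_inputs: int):
    """Every (previous, next) pair of input vectors for an adder with num_inputs inputs."""
    vectors = ["".join(bits) for bits in product("01", repeat=num_inputs)]
    return [(prev, nxt) for prev in vectors for nxt in vectors]


def count_transitions(trace):
    """Count how often each (previous, next) transition occurs in a list of input vectors."""
    return Counter(zip(trace, trace[1:]))
```

`len(all_transitions(3))` is 64 and `len(all_transitions(6))` is 4096, matching the three-input and six-input cases described above.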
A. Static Designs
We observe that the dynamic energy, delay, EDP, and area of the adders that
do not allow configurability [27] are on average 39%, 19%, 37%, and 29% lower,
respectively, than those of the mirror adder, as shown in Table 3.1. While these
designs give large benefits in energy, delay, and area, they suffer from the limitation
of being static.
As discussed in the previous subsection, the existing works do not allow for
configurability at the circuit level. The most naive way to introduce configurable
approximation would be to have both an exact adder and an approximate adder and to
switch between them as needed using MUXes. When exact computations are needed, the
exact section is used while the approximate section is power gated, and vice versa. We
implemented such circuits using the existing approximate adders, and their energy, delay,
and EDP are higher than those of SESA and SEDA, as shown in Table 3.1.
The area overheads, even without counting the area of the MUXes, are also higher
than those of the SESA and SEDA adders. This highlights the need for designs such as
SESA and SEDA. Also, as SESA and SEDA are better than the naive configurable
circuits in energy, delay, and area, we have not used the naive circuits for further
comparisons.
C. SESA
D. SEDA
We observe that the configurability in SEDA also comes with overheads, which
appear when the SEDA adder is operated in exact mode. Compared with the mirror
adder, the dynamic energy of the SEDA adder is 43% higher, the delay is 22% higher,
the EDP is 76% higher, and the area is 70% higher. When operated in approximate mode,
we see benefits in dynamic energy, delay, and EDP: compared with the mirror adder,
the dynamic energy of the SEDA adder is 52% lower, the delay is 34% lower, and the
EDP is 45% lower. Since an application consists of both exact and
approximate operations, we want to identify which applications will benefit from the
approximation. SESA and SEDA adders lead to overheads when performing exact
additions and benefits when performing approximate computations. In Figure 3.2 (a) and
(b), we show what fraction of operations in an application needs to be approximate
to obtain energy benefits and EDP benefits, respectively. For the SESA1, SESA2,
SESA3, and SEDA adders, the fraction of approximate computations should be 30%,
10%, 10%, and 50%, respectively, to obtain benefits in dynamic energy, and
40%, 20%, 20%, and 60%, respectively, to obtain benefits in EDP.
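A back-of-the-envelope estimate of such a break-even point can be obtained with a simple linear mixing model. This model is an assumption made here for illustration and is not necessarily how Figure 3.2 was produced: if a fraction f of the additions is approximate, the average energy relative to the mirror adder is taken as f·r_approx + (1 − f)·r_exact, where r_exact and r_approx are the energy ratios in exact and approximate mode. The function name `break_even_fraction` is hypothetical.

```python
def break_even_fraction(exact_ratio: float, approx_ratio: float) -> float:
    """Smallest fraction f of approximate operations for which

        f * approx_ratio + (1 - f) * exact_ratio <= 1,

    where both ratios are relative to the exact mirror adder (assumed linear mix).
    """
    if exact_ratio <= 1.0:
        return 0.0                      # no overhead in exact mode: always beneficial
    if approx_ratio >= 1.0:
        raise ValueError("approximate mode gives no benefit; no break-even point exists")
    return (exact_ratio - 1.0) / (exact_ratio - approx_ratio)


# SEDA dynamic energy figures quoted in the text: 43% more in exact mode, 52% less in approximate mode.
seda_energy_break_even = break_even_fraction(1.43, 0.48)
```

For the SEDA dynamic energy figures this gives a break-even fraction of roughly 0.45, which is of the same order as the 50% quoted for SEDA. The model is applied to energy only, since EDP does not combine linearly across a mix of operations.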
Since the SESA1 and SEDA adders have the same output error, as in both
cases the SUM is approximated as the complement of CARRY, we can see that if
an application has more than 60% approximate additions, the SEDA adder outperforms
the SESA1 adder, and vice versa. Overall, we can conclude that while prior works have
shown that static approximate adders give large benefits, in actual
applications only some sections of a program can be approximated. Supporting this
introduces overheads but is more practical than static designs. We also see that even
though the benefits given by SESA and SEDA adders are smaller than those of static
adders, they are still significant when compared with exact computations. We
can also conclude that if an application requires configurability and a
bounded maximum error, as required in a general system, these overheads must be paid.
In this section, we discuss how SESA and SEDA adders can be used in
approximate processors. In Section IV, we showed that introducing configurability in
approximation comes at a price, but applications that have a mix of both exact and
approximate computations can still benefit from the same irrespective of the
overheads.
A. SESA Configuration
SESA adders can work in exact and approximate modes. Hence, SESA adders
can be used in a general-purpose processor setting. An additional instruction can be
added to the instruction set architecture (ISA) of the processor to support approximate
addition. This instruction can also carry bits that select the number of approximated
bits (NABs). For a 32-bit processor, 5 bits can be used to select the NABs. Thus,
5 bits of the instruction can be reserved for this purpose. Depending
on whether exact or approximate addition is required for an operation, SESA can be
configured to perform exact or approximate addition. SESA adders can vary
the amount of approximation by selecting a different number of bits for
approximation, and can be configured for even finer granularities.
B. SEDA Configuration
SEDA adder can also be used in an approximate processor. Since the SEDA adder
operates on multiple additions together, it is suitable for superscalar processors,
where multiple instructions are waiting to be executed in the reservation station
[36]. We explain the integration of the SEDA adder in the processor with the help of
an example. Suppose we have four 8-bit addition operations to perform: A1 + B1,
A2 + B2, A3 + B3, and A4 + B4. There are three modes of operation in the SEDA adder,
which are explained as follows.
1) Exact Mode:
Since there are four additions, SEDA adder can then be used in exact mode to perform
exact addition on the MSBs in the other two cycles as shown in Figure.
A. Evaluation Framework
We used the framework proposed in [28] for the analysis of applications.
The framework for evaluation is shown in Figure.
1) Energy Calculation:
The netlists of the single-bit approximate adders are generated using the Cadence
Virtuoso tool. OCEAN scripts are used to simulate each adder with the Spectre tool to
obtain the energy consumption for every possible transition that can occur at the input.
We used the Moby benchmarks [37], a suite of mobile benchmarks, to obtain the inputs
seen by the adder when these applications run on an ARM core. We used gem5 [38],
a system-level simulator, to model a 32-bit quad-core ARM configuration running
Android 4.2.2. The input traces, i.e., the inputs seen by the adder in the processor,
consist of 100 000 inputs and are obtained using the gem5 simulator. For the image
processing applications, we read the pixel values directly from the images using
a C program.
These values from the applications are then passed into a binary transition
counter, which counts the occurrences of each possible input transition. As discussed
in Section IV, SESA adders have 64 different possible input transitions and SEDA
adders have 4096 different possible input combinations. The count of each of these
transitions is recorded for each of the adders. The energy calculator multiplies each
transition count by its corresponding energy value and sums the products to give the
total energy consumed by the application. For image processing applications, we
generate the input trace from the images and feed those values into the binary
transition counter.
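The transition counter and energy calculator can be sketched as follows. This is a minimal illustration, assuming each adder input is a 3-bit vector (a, b, carry-in) encoded as an integer 0..7, so that 8 x 8 = 64 (previous, current) transitions exist. The energy table is a placeholder that charges one arbitrary unit per toggled input bit; in the flow described above, the per-transition energies come from Spectre simulation.

```python
from collections import Counter


def count_transitions(vectors):
    """Count each (previous, current) pair of consecutive input vectors."""
    return Counter(zip(vectors, vectors[1:]))


def total_energy(counts, energy_table):
    """Sum count * energy over all observed transitions."""
    return sum(n * energy_table[t] for t, n in counts.items())


# Placeholder table: energy proportional to the number of toggled input bits.
# Hypothetical values for illustration only, not simulated data.
PLACEHOLDER_ENERGY = {
    (prev, cur): float(bin(prev ^ cur).count("1"))
    for prev in range(8)
    for cur in range(8)
}

trace = [0b000, 0b011, 0b011, 0b111, 0b000]   # toy input trace
counts = count_transitions(trace)
energy = total_energy(counts, PLACEHOLDER_ENERGY)
```

With a measured table in place of `PLACEHOLDER_ENERGY`, the same two functions give the per-application energy for any trace.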
2) Error Calculation:
The traces obtained from either the gem5 simulator or the images are fed as
inputs to the behavioral models of the adders. The behavioral models are C programs
that mimic the approximate adders and produce the approximate outputs. These are then
compared with the golden outputs, i.e., the correct outputs, using the error
calculator to compute the overall error.
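A possible shape for the error calculator is sketched below. The text does not state which error metrics are computed, so error rate and mean error distance are used here as assumptions; the truncating adder in the usage lines is a toy stand-in and is not one of the SESA or SEDA designs.

```python
def error_stats(trace, approx_add, width=16):
    """Compare an approximate adder against the golden (exact) sum.

    trace      -- list of (a, b) operand pairs
    approx_add -- behavioral model: callable taking (a, b), returning a sum
    """
    if not trace:
        raise ValueError("trace must contain at least one operand pair")
    mask = (1 << width) - 1
    wrong = 0
    total_distance = 0
    for a, b in trace:
        golden = (a + b) & mask
        approx = approx_add(a, b) & mask
        distance = abs(golden - approx)
        wrong += distance != 0
        total_distance += distance
    n = len(trace)
    return {"error_rate": wrong / n, "mean_error_distance": total_distance / n}


# Toy stand-in model: exact sum with the four least significant bits cleared.
stats = error_stats([(1, 1), (16, 16), (3, 4)], lambda a, b: (a + b) & ~0xF)
```

Passing the behavioral model as a callable keeps the calculator independent of any particular adder design.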
We ran two image processing applications using images from the
Razor dataset [39]. We first performed image addition, which is widely used as
an application in prior works. We also used image enhancement as the second
application. In image enhancement, the original image is smoothed using a blur
filter. The edges are then extracted from this smoothed image using the Laplace
filter. These extracted edges are added to the smoothed image using approximate
adders to obtain the enhanced image [34]. We carried out the addition using a 16-bit
ripple carry adder as the images are 16-bit.
The approximation was introduced by replacing the least significant bits of the
adder with the approximate adder for the SESA adders. For the SEDA adder, we perform
the addition using the three modes as explained in Section V. The energy consumption
of the adders for image addition and image enhancement is shown in Figure.
Approximating all 16 bits is not suitable for the image addition application, as it
leads to significant deterioration in output quality. Thus, we have limited our
analysis to 12 bits for SESA1, SESA2, and SESA3 adders and to the half approximate
mode for the SEDA adder. The output quality is measured by the SSIM [40] values,
which are shown in Figure for image addition and image enhancement.
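Replacing the least significant cells of a ripple carry adder can be sketched at the bit level as follows. The exact cell is a standard full adder. The approximate cell is a deliberately simple placeholder (sum = a OR b, no carry out) and is not the SESA cell, whose logic is defined elsewhere in this work.

```python
def exact_cell(a, b, cin):
    """Standard full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def placeholder_approx_cell(a, b, cin):
    """Stand-in approximate cell (NOT the SESA design): ignores carry-in."""
    return a | b, 0


def ripple_carry_add(a, b, nab, width=16, approx_cell=placeholder_approx_cell):
    """Add a and b with the nab least significant cells approximated."""
    carry = 0
    result = 0
    for i in range(width):
        cell = approx_cell if i < nab else exact_cell
        s, carry = cell((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result            # final carry out is dropped (width-bit result)
```

With `nab=0` every cell is exact and the function reduces to ordinary 16-bit addition; raising `nab` trades accuracy in the low-order bits for whatever savings the chosen approximate cell provides.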
The average energy and SSIM benefits are shown in Table IV. For the image
processing applications, where we use a 16-bit adder, the NAB for SESA adders is
taken to be 0, 4, 8, and 12, and for SEDA adders it is 0 and 8. Since the SEDA adder
can be used only in half approximate and full approximate mode, the half approximate
mode corresponds to NAB = 8 and the full approximate mode to NAB = 16. We have not
shown the results for NAB = 16 for either the SEDA or SESA adders, as the output
quality deteriorates significantly at that setting.
1) Image Addition: The image sets used for image addition are 0—artificial, big_tree;
1—cathedral, fireworks; 2—hdr, leaves_iso_200; 3—nightshot_iso_1600,
spider_web; 4—big_building, bridge; 5—deer, flower; 6—leaves_iso_1600,
zone_plate. We see that for NAB = 0, i.e., when the approximate adders are used in
exact mode, the energy consumption is higher when compared with the mirror adder. For
SESA1, SESA2, SESA3, and SEDA adders, the overheads for exact addition are
13.6%, 2.25%, 2.64%, and 41%, respectively, when compared with the mirror adder, as
shown in Table IV. When used in approximate mode, the adder designs do give energy
benefits. On average, SESA1/SESA2/SESA3 adders give benefits ranging from
0.33%/12.16%/12.56% for NAB = 2 to 27.05%/39.88%/41.66% for NAB = 12. On average,
the SEDA adder gives a benefit of 16.92% in half approximate mode. The lowest
average SSIM value, 0.87, is obtained for SESA2 with 12-bit approximation. Images
with an SSIM value of 0.87 are of acceptable quality; one of the images is shown in
Fig. 10(b) alongside the exact output as shown in Figure.
2) Image Enhancement: For image enhancement, the images used are 0—artificial,
1—big_tree, 2—cathedral, 3—fireworks, 4—hdr, 5—leaves_iso_200,
6—nightshot_iso_1600, 7—big_building, 8—bridge, 9—deer, 10—flower,
11—leaves_iso_1600. We see that for NAB = 0, i.e., when the approximate adders are
used in exact mode, the energy consumption is higher when compared with the mirror
adder. For SESA1, SESA2, SESA3, and SEDA adders, the overheads for exact addition are
15.28%, 2.75%, 3.21%, and 39.75%, respectively, when compared with the mirror adder,
as shown in Table IV. When used in approximate mode, the adder designs do give
energy benefits. On average, SESA1/SESA2/SESA3 adders give benefits ranging from
1.35%/13.65%/14.22% for NAB = 2 to 32.53%/44.79%/47.21% for NAB = 12. On average,
the SEDA adder gives a benefit of 23.61% in half approximate mode. The lowest
average SSIM value, 0.77, is obtained for SESA2 with 12-bit approximation. Images
with an SSIM value of 0.77 are of acceptable quality; one of the images is shown in
Fig. 10(d) alongside the exact output as shown in Figure.
For SESA1, SESA2, SESA3, and SEDA adders, the energy overheads for
exact addition are 16.21%, 2.99%, 3.49%, and 39.05%, respectively, when compared
with the mirror adder, as shown in Table IV. When used in approximate mode, the adder
designs do give energy benefits. We start to see energy benefits for SESA2 and SESA3
from NAB = 4, but for SESA1 only from NAB = 8 onward. On average,
SESA1/SESA2/SESA3 adders give benefits ranging from 1.44%/14.58%/15.21% for NAB = 8
to 30.32%/42.7%/45.19% for NAB = 24. On average, the SEDA adder gives a benefit
of 22.52% in half approximate mode.
Overall, we see that even though the proposed adders have overheads due to
configurability, they do give benefits when used in approximate mode. Also, while we
have not approximated all the bits for any application, there may be applications that
can tolerate errors in all the bits, and full approximation in the SEDA adder can be
used for those applications. Finally, while the energy benefits of the SEDA adder are
smaller than those of SESA adders, it is important to note that it takes one fewer
cycle than SESA adders, as explained.
APPROXIMATE MULTIPLIER
Generally, the multiplication process involves three important steps, viz., partial
product (PP) generation, PP compression, and final product generation. In an n × n
multiplication, n² AND gates are used for PP generation; they operate in parallel, so
the maximum delay in PP generation is that of one AND gate. However, PP compression
incurs an (n-1) column carry propagation delay, which grows as O(n). Various
algorithms have been proposed in the literature to reduce the delay in PP compression
(Wang et al. 2011; Devesh Dwivedi 2019; Sureka 2013; Suganthi Venkatachalam 2018).
The Wallace tree multiplier (Sureka 2013) and the Dadda multiplier (Devesh Dwivedi
2019) have a regular PP structure compared to the conventional multiplier, but give
no improvement in the (n-1) column carry propagation. On the other hand, multipliers
using the Booth and Modified Booth algorithms reduce the number of PP rows to n/2
(Wang et al. 2011) and n/2 + 1 (Suganthi Venkatachalam 2018). However, the
column-wise carry propagation delay in PP compression remains the same.
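The PP generation step can be illustrated with a short behavioral sketch, assuming unsigned n-bit operands. Each PP bit is the AND of one multiplicand bit and one multiplier bit, which gives n² bits in total. The function names are illustrative only.

```python
def partial_products(a, b, n=8):
    """Return an n x n matrix where pp[i][j] = a[j] AND b[i]."""
    return [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)]
            for i in range(n)]


def column_heights(n=8):
    """Number of PP bits in each of the 2n-1 columns (weight 2**(i+j))."""
    heights = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            heights[i + j] += 1
    return heights


def sum_partial_products(pp):
    """Reference compression: add every PP bit at its column weight."""
    return sum(bit << (i + j)
               for i, row in enumerate(pp)
               for j, bit in enumerate(row))
```

For n = 8 the column heights rise from 1 to 8 and fall back to 1. The tallest middle columns are the ones that the compression stages must reduce, which is where the column-wise carry propagation arises.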
Inputs to the multiplier are a[7:0] and b[7:0]. The generated PP bits are arranged in
a Dadda structure to reduce complexity, and PP compression is performed in stages.
4.2 EXACT 4:2 COMPRESSOR
Standard compressors are used to add n 1-bit inputs and produce an i-bit
result, such that i is the least integer that satisfies the condition n < 2^i, where
2^i denotes 2 raised to the power i. A standard 4-2 compressor has four inputs X1,
X2, X3, X4 and a carry input Cin from the preceding block, and produces three
outputs, viz., Sum (S), Carry (C), and Cout. The input bits X1, X2, X3 are fed to
full adder-1 to produce the FA sum and Carry C. X4 and Cin are added to the FA sum
using full adder-2 to produce the S and Cout output bits. Note that the conventional
4-2 compressor incurs at most two FA sum delays to produce its outputs. The
block-level architecture of the standard 4:2 compressor is shown in Figure 4.1, and
the corresponding Boolean expressions defining the logic are shown in Equation
(4.3)-Equation (4.5). The following functional symbols are used to express logic
functions: | - OR; & - AND; ^ - XOR; ' - NOT.
Cout = [(X1 ^ X2 ^ X3 ^ X4) & Cin] | [(X1 ^ X2 ^ X3 ^ X4)' & X4] (4.5)
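The two-full-adder structure described above can be checked with a small behavioral model. This is a sketch that follows the output naming used in this section (C from full adder-1, S and Cout from full adder-2); the function names are illustrative. It also evaluates the Cout expression of Equation (4.5) directly so the two can be compared over all 32 input combinations.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three 1-bit inputs."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor built from two full adders."""
    fa_sum, c = full_adder(x1, x2, x3)        # full adder-1
    s, cout = full_adder(fa_sum, x4, cin)     # full adder-2
    return s, c, cout


def cout_eq_4_5(x1, x2, x3, x4, cin):
    """Cout as written in Equation (4.5)."""
    x = x1 ^ x2 ^ x3 ^ x4
    return (x & cin) | ((x ^ 1) & x4)


def verify_compressor():
    """Exhaustively check the arithmetic identity and Equation (4.5)."""
    for bits in product((0, 1), repeat=5):
        s, c, cout = compressor_4_2(*bits)
        if s + 2 * (c + cout) != sum(bits):
            return False
        if cout != cout_eq_4_5(*bits):
            return False
    return True
```

The identity checked is X1 + X2 + X3 + X4 + Cin = S + 2(C + Cout): the five input bits of one column are compressed into one sum bit of the same weight and two carry bits of the next higher weight.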
The number of stages of the proposed multiplier is one less than that of
the former. At the last stage of the proposed multiplier, in order to obtain the
summation of the three remaining rows of partial products, a special CPA is used,
which is constructed from half adders and the proposed approximate
compressors. A half adder is used in each of columns 2 and 15 of this CPA, and
the first proposed compressor is used in each of columns 3 to 14. Indeed, columns
3-14 of this CPA form our proposed CPA introduced in the previous sub-section, which
does not have the carry propagation delay problem.