Optimizing Power-Accuracy Trade-Off in Approximate Adders
Celia D, Vinita Vasudevan and Nitin Chandrachoodan
Department of Electrical Engineering, Indian Institute of Technology Madras
Chennai, India 600036
Email: {ee13d003,vinita,nitin}@ee.iitm.ac.in
978-3-9819263-0-9/DATE18/© 2018 EDAA

Abstract: Approximate circuit design has gained significance in recent years, targeting applications like media processing where full accuracy is not required. In this paper, we propose an approximate adder in which the approximate part of the sum is obtained by finding a single optimal level that minimises the mean error distance. Therefore, the hardware needed for computing the approximate part can be removed, which effectively results in very low power consumption. We compare the proposed adder with various approximate adders in the literature in terms of power and accuracy metrics. The power savings of our adder are shown to be 17% to 55% higher than those of the existing approximate adders over a significant range of accuracy values. Further, in an image addition application, this adder is shown to provide the best trade-off between PSNR and power.

Index Terms: approximate adder, low power, accuracy, architectural approximation

I. INTRODUCTION

Many signal processing blocks, especially those meant for video and speech, are error tolerant, which makes it possible to use inaccurate arithmetic units. This is exploited in systems to save power and area as well as to reduce the delay. Approximation is mainly done using voltage over-scaling and architectural approximation [1], [2]. In voltage over-scaling, the supply voltage is scaled down, leading to power savings but causing increased delays. This results in possible timing violations and hence inaccurate results. In architectural approximation, the functionality of the circuit is approximated and simplified so that even at nominal supply voltages, the results are not accurate. The simplification in functionality results in reduced logic density and achieves savings in power, delay and area.

Several approximate adders have been proposed and studied in the literature, each representing a trade-off between power and accuracy. One possibility is to segment the adder into two parts. The upper part of the sum, containing its most significant bits (MSBs), is obtained using accurate adders. Approximate logic is used to compute the lower part of the sum, containing the remaining least significant bits (LSBs). In such approximate adders, for a given number of lower-order bits being approximated, the power consumed by the accurate upper part is almost the same. Power savings in the lower portion are typically due to the reduction in switching activity that results from the use of simpler gates.

The simplest of these adders is the truncation adder, where the lower part is set to zero. Since there is no hardware requirement for the lower part, it has the largest savings in both power and area. However, it also results in significant errors [3]. The other two-part segmented approximate adders, namely the approximate mirror adder [3], lower-part OR adder (LOA) [4], error tolerant adder (ETA) [5] and inexact adder [6], obtain better accuracy by introducing limited computations for approximating the less significant part of the output. However, this results in additional costs in area and power.

There are other approximate adders, such as [7], [8], which do not split the output into an approximate lower part and an accurate upper part. Instead, the adder is divided into many subadders and the carry is predicted. Here we can trade off power and accuracy by varying the size of the subadders. However, these adders perform worse than the two-part segmented adders in terms of power-accuracy trade-off [9]. In this paper, therefore, we focus on segmented adders with two parts: an accurate upper part and an inaccurate lower part.

It is possible to match the power savings of the truncation adder if the approximate bits are set to a fixed value. In this paper, we propose to set them to a fixed value $L$ that minimizes the mean error distance (MED). The value is chosen so that it is optimal for all inputs that have a symmetric probability mass function (PMF). If the input PMF is not symmetric, we show that it is close to optimal as long as the number of approximate bits is not too large. We quantify the power savings of various two-part adders in terms of the power savings per bit and the MED. We have used the proposed adder in an image addition application to demonstrate its effectiveness.

Section II contains a discussion of the power consumed by various approximate adders proposed in the literature. This is followed by a theoretical analysis leading to the choice of the fixed value $L$ (Section III). Simulation results and the power-accuracy trade-off for various adders are discussed in detail in Section IV. Section V has the conclusions.

II. COMPARISON OF PREVIOUSLY PROPOSED ADDERS

A. Notation and Previously proposed adders

Consider an approximate adder with $N$-bit inputs $A$, $B$ and $(N+1)$-bit sum $S$. Let $k$ be the number of bits in the lower part of the sum which are approximated. The input $A$, with binary representation $a_{N-1} a_{N-2} \ldots a_k a_{k-1} \ldots a_0$, is denoted as the concatenation $A_H A_L$, where $A_H = a_{N-1} a_{N-2} \ldots a_k$ is the upper part and $A_L = a_{k-1} \ldots a_0$ is the lower part. The input $B$ is denoted as $B_H B_L$ in a similar way. The adder output $S$ has
binary representation $s_N s_{N-1} \ldots s_k s_{k-1} \ldots s_0$ and is denoted as $S_H S_L$, where $S_H = s_N s_{N-1} \ldots s_k$ is the upper part and $S_L = s_{k-1} \ldots s_0$ is the lower part. $c_{k-1} s_{k-1} s_{k-2} \ldots s_0$ denotes the approximate sum of $A_L$ and $B_L$. Here, $c_{k-1}$ denotes the carry bit to the upper part.

In a simple truncation adder, $S_L = 0$ and $c_{k-1} = 0$. In this adder there is no circuit to calculate the lower part of the output sum, and the power consumption is only due to the upper part. Consequently, it has the lowest power consumption amongst all the approximate adders in the literature. In the approximate mirror adder 5 (AMA5) proposed in [3], the lower part of the result is set as $S_L = A_L$ and the carry is set as $c_{k-1} = b_{k-1}$. The AMA5 adder has higher power consumption than the truncation adder, due to the toggles in the lower part that come from one of the inputs and also because of the carry propagating to the upper part of the sum. In the LOA approximate adder [4], $S_L = A_L$ OR $B_L$ and $c_{k-1} = a_{k-1}$ AND $b_{k-1}$. The power consumed by the LOA adder is more than that of the truncation and AMA5 adders due to the OR gates used in the computation of $S_L$ and the AND gate for $c_{k-1}$. In the ETA approximate adder proposed in [5], the lower parts of the inputs are added from left to right until the point at which both input bits are logic 1. Beyond this point all the sum bits are set to logic 1. Here $c_{k-1} = 0$. The hardware required, and hence the power consumed, is higher than for the LOA adder because it needs extra gates for the detection logic and for setting the sum bits to the appropriate value. In the InXA2 adder proposed in [6], the $i$-th bit of the lower part sum is set as $s_i = (a_i$ XOR $b_i)$ OR $c_{i-1}$, where $c_{i-1}$ is the carry obtained on adding the $i$ LSBs of the two inputs. This adder has a relatively large power consumption owing to the hardware needed for computation.

B. Accuracy Metrics

To measure the accuracy and quality of approximate arithmetic circuits, the metrics proposed in the literature are Mean Error Distance (MED), Normalized Mean Error Distance (NMED and NED) and Mean Relative Error Distance (MRED) [8], [10], [11].

MED is the average absolute error between the accurate and approximate outputs. NMED is the MED normalized by $2^{N+1} - 2$, which is the maximum sum possible in an $N$-bit adder. It is an indication of the significance of the MED in adders of various lengths. NED is obtained by normalizing the MED by $2^k$ and is an indication of how rapidly the MED grows with every additional approximate bit. MRED is a relative error metric and is an indicator of the percentage error across all values of the sum. The rankings of adders in terms of MRED and NMED show the same trend [9]. As NMED is just the MED divided by a constant value, optimization with respect to either MED or NMED will give the same result. Since the primary error metric is the MED, we use it for further analysis and optimization.

III. PROPOSED MEDIAN APPROXIMATE ADDER (MA)

Our focus in this paper is to try and minimise the MED, while aiming to achieve the low power consumption of the truncation adder. To do this, we need to have fixed values of $S_L$ and $c_{k-1}$ so that, like the truncation adder, there is no hardware to find the lower part of the output. The values of $S_L$ and $c_{k-1}$ are to be chosen such that the MED is minimized.

For purposes of analysis, in this section, $A_H$, $A_L$, $B_H$ and $B_L$ refer to the corresponding decimal representations. In all cases, we assume that both inputs have $N$ bits and $k$ bits of the sum are approximated. The accurate sum is therefore given by $(A_H + B_H) 2^k + (A_L + B_L)$.

Assume that $Z = A_L + B_L$ is approximated by a fixed value $L$. Therefore, the approximate sum is $(A_H + B_H) 2^k + L$. The error distance (ED), which is the absolute value of the error, is therefore given by $|Z - L|$. The goal is to find $L$ so that $E\{|Z - L|\}$ is minimised. The solution to this minimization problem is well known, namely, the value of $L$ that minimises $E\{|Z - L|\}$ is the median of the distribution of $Z$ [12]. For a discrete random variable, the median is defined as follows.

Definition: The median of the random variable $Z$ is defined to be any number $M_Z$ that satisfies the relationship $P(Z \le M_Z) \ge 1/2$ and $P(Z \ge M_Z) \ge 1/2$.

Hence $L = M_Z$. However, the value of $M_Z$ depends on the probability distribution (PMF) of $Z$. We now consider various cases of input distributions.

A. Uniformly distributed inputs

This is the most common assumption for the PMF of the inputs. Assume that $A$ and $B$ have a uniform PMF with values in the range $[0, 2^N - 1]$. It follows that $A_L$ and $B_L$ also have a uniform PMF with values in the range $[0, 2^k - 1]$. Since the PMF of $Z = A_L + B_L$ is, for independent inputs, the convolution of the PMFs of $A_L$ and $B_L$, it is a triangular distribution with values between $0$ and $2^{k+1} - 2$. Since it is a symmetric PMF, the median value is the midway point, namely, $2^k - 1$. If the binary representation of $L$ is $c_{k-1} s_{k-1} s_{k-2} \cdots s_0$, then $c_{k-1} = 0$ and $s_{k-1} s_{k-2} \cdots s_0 = 11 \cdots 1$.

B. Inputs have a symmetric distribution

Here, it is assumed that the PMFs of both inputs $A$ and $B$ are symmetric, i.e. $P(A = Q) = P(A = 2^N - 1 - Q)$ and $P(B = Q) = P(B = 2^N - 1 - Q)$, where $0 \le Q \le 2^N - 1$. In such a setting, we claim that the distributions of $A_L$ and $B_L$ are also symmetric, i.e. $P(A_L = Q) = P(A_L = 2^k - 1 - Q)$ and $P(B_L = Q) = P(B_L = 2^k - 1 - Q)$, where $0 \le Q \le 2^k - 1$. A proof for this claim is as follows.

Divide the range $[0, 2^N - 1]$ into $2^{N-k}$ bins, each of size $2^k$. The $i$-th bin is given by $[i 2^k, (i + 1) 2^k - 1]$, where $0 \le i \le 2^{N-k} - 1$. In each bin, the most significant $N - k$ bits have a fixed value. For $0 \le Q \le 2^k - 1$, we have

$$P(A_L = Q) = \sum_{i=0}^{2^{N-k}-1} P(A = 2^k i + Q).$$

Hence,

$$P(A_L = 2^k - 1 - Q) = \sum_{i=0}^{2^{N-k}-1} P(A = 2^k i + 2^k - 1 - Q).$$
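The uniform-input case of Section III-A can be checked numerically. The following is a minimal sketch, not the authors' simulation setup: it assumes independent, uniformly distributed lower parts, enumerates all pairs $(A_L, B_L)$ exhaustively, and uses hypothetical helper names (`lower_part_*`, `med`, `median_of_z`) introduced only for illustration. Each helper returns the value $c_{k-1} 2^k + S_L$ that the lower part contributes to the sum, following the definitions of the truncation, AMA5, LOA and MA adders given in Section II-A and Section III.

```python
from collections import Counter


def lower_part_trunc(a_l, b_l, k):
    """Truncation adder: S_L = 0 and c_{k-1} = 0."""
    return 0


def lower_part_ama5(a_l, b_l, k):
    """AMA5: S_L = A_L and c_{k-1} = b_{k-1}."""
    carry = (b_l >> (k - 1)) & 1
    return (carry << k) | a_l


def lower_part_loa(a_l, b_l, k):
    """LOA: S_L = A_L OR B_L and c_{k-1} = a_{k-1} AND b_{k-1}."""
    carry = ((a_l & b_l) >> (k - 1)) & 1
    return (carry << k) | (a_l | b_l)


def lower_part_ma(a_l, b_l, k):
    """Median adder: fixed value L = 2^k - 1 (c_{k-1} = 0, S_L = 11...1)."""
    return (1 << k) - 1


def med(approx, k):
    """Mean error distance E{|Z - approx|} over all uniform (A_L, B_L) pairs."""
    n = 1 << k
    total = sum(abs((a + b) - approx(a, b, k))
                for a in range(n) for b in range(n))
    return total / (n * n)


def median_of_z(k):
    """Smallest m with P(Z <= m) >= 1/2, where Z = A_L + B_L (uniform inputs)."""
    n = 1 << k
    counts = Counter(a + b for a in range(n) for b in range(n))
    cumulative = 0
    for z in sorted(counts):
        cumulative += counts[z]
        if 2 * cumulative >= n * n:  # integer form of P(Z <= z) >= 1/2
            return z


if __name__ == "__main__":
    k = 4
    print("median of Z:", median_of_z(k), "expected 2^k - 1 =", (1 << k) - 1)
    for name, fn in [("Trunc", lower_part_trunc), ("AMA5", lower_part_ama5),
                     ("LOA", lower_part_loa), ("MA", lower_part_ma)]:
        print(name, "MED =", med(fn, k))
```

Under these assumptions the MED of truncation is $2^k - 1$, while the MED of the MA works out to $(2^{2k} - 1)/(3 \cdot 2^k)$, so the ratio between the two approaches 3 as $k$ grows. ETA and InXA2 are left out of the sketch because their behaviour is described only in outline here.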
[Figure 1: four plots comparing the Trunc, AMA5, LOA, ETA, InXA2 and MA adders. Horizontal axes: (a), (b) number of approximate bits (k); (c) log2(MED); (d) peak signal to noise ratio (dB).]
Fig. 1: (a)-(b) Variation in normalized power savings and mean error distance with the number of approximate bits in various 16-bit approximate adders; (c) normalized power savings vs. mean error distance for 16-bit approximate adders; (d) normalized power vs. PSNR for image addition using approximate adders, with the number of approximate bits varying from 1 to 7.
can be written as $P_{s,A} = P_{sb,A} (k_{trunc} + c_A)$. For the MA, $P_{sb,MA} = 1$ and $c_{MA} \approx \log_2 3$ as seen from equation (1). Therefore, MA always performs better than truncation. For other approximate adders,

$$P_{s,A} = P_{sb,A} \cdot (k_{trunc} + c_A) = P_{s,trunc} + P_{sb,A} \cdot c_A - (1 - P_{sb,A}) \log_2 \mathrm{MED}. \tag{2}$$

As seen in Table I, $P_{sb,A} < 1$ for all adders other than truncation and MA. As MED increases, the third term in (2) becomes greater than the second, so that $P_{s,A}$ becomes less than $P_{s,trunc}$. Fig. 1c shows that the truncation adder performs better than (a) ETA beyond $\log_2 \mathrm{MED} \approx 3$, (b) LOA beyond $\log_2 \mathrm{MED} \approx 4$ and (c) AMA5 beyond $\log_2 \mathrm{MED} \approx 10$. Although ETA and LOA have a larger $c_A$ for a given MED, the difference is not large enough to result in a reduction of power. This is because both of them have a significant amount of hardware to compute the lower part sum, resulting in a lower $P_{sb,A}$. AMA5 effects a power saving compared to truncation for most cases because its $P_{sb,A}$ is high, as seen in Table I. Since the MA adder has no additional hardware, the power savings due to the increase in $c_A$ are fully realised.

Both the power savings and the MED depend only upon the number of approximate bits $k$. So, for any adder size $N$, the power savings versus MED trade-off remains the same.

C. Application: Image addition

The approximate adders are used in a simple image processing application, namely image addition. The objective is to compare the power consumed by various adders for a given peak signal to noise ratio (PSNR). Two images (Cameraman and Rice), each of size 256 × 256 with each pixel represented using 8 bits, are chosen as input images. Fig. 1d shows the dynamic power consumption as a function of PSNR for various approximate adders, with the number of approximate bits varying from 1 to 7. From the figure, we see that MA once again gives the best trade-off between dynamic power and PSNR as compared to all the other approximate adders.

V. CONCLUSION

We have proposed the median adder, which approximates the lower $k$ bits of the output to the fixed value $2^k - 1$. The median adder results in power consumption as low as that of a truncation adder, but provides significantly better accuracy. When compared to other similar adders, simulation results show that this adder gives the best trade-off between power and accuracy. We also used this adder in a simple image processing application and showed that the median adder gives better PSNR and power savings when compared to other similar adders in the literature.

REFERENCES

[1] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in Proceedings of the 18th IEEE European Test Symposium (ETS), 2013.
[2] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan, “MACACO: Modeling and analysis of circuits for approximate computing,” in IEEE/ACM ICCAD, pp. 667–673, 2011.
[3] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, “Low-power digital signal processing using approximate adders,” IEEE Trans. on Comp.-Aided Design of Integrated Circuits and Systems, 2013.
[4] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired computational blocks for efficient VLSI implementation of soft-computing applications,” IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 57, pp. 850–862, Apr. 2010.
[5] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, “Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, pp. 1225–1229, Aug. 2010.
[6] H. A. F. Almurib, T. N. Kumar, and F. Lombardi, “Inexact designs for approximate low power addition by cell replacement,” in Design, Automation and Test in Europe (DATE), 2016.
[7] D. Mohapatra, V. K. Chippa, A. Raghunathan, and K. Roy, “Design of voltage-scalable meta-functions for approximate computing,” in Design, Automation and Test in Europe (DATE), 2011.
[8] A. B. Kahng and S. Kang, “Accuracy-configurable adder for approximate arithmetic designs,” in Design Automation Conference (DAC), 2012.
[9] H. Jiang, J. Han, and F. Lombardi, “A comparative review and evaluation of approximate adders,” in Proc. of the Great Lakes Symposium on VLSI (GLSVLSI), pp. 343–348, 2015.
[10] C. Liu, J. Han, and F. Lombardi, “A low-power, high-performance approximate multiplier with configurable partial error recovery,” in Design, Automation and Test in Europe (DATE), 2014.
[11] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of approximate and probabilistic adders,” IEEE Transactions on Computers, vol. 62, pp. 1760–1771, Sep. 2013.
[12] S. K. Bar-Lev, B. Boukai, and P. Enis, “On the mean squared error, the mean absolute error and the like,” Communications in Statistics - Theory and Methods, vol. 28, no. 8, pp. 1813–1822, 1999.