Accuracy-Configurable Adder for Approximate Arithmetic Designs

Andrew B. Kahng†‡ and Seokhyeong Kang†


†ECE and ‡CSE Departments, University of California at San Diego

ABSTRACT
Approximation can increase performance or reduce power consumption with a simplified or inaccurate circuit in application contexts where strict requirements are relaxed. For applications related to human senses, approximate arithmetic can be used to generate sufficient results rather than absolutely accurate results. Approximate design exploits a tradeoff of accuracy in computation versus performance and power. However, required accuracy varies according to applications, and 100% accurate results are still required in some situations. In this paper, we propose an accuracy-configurable approximate (ACA) adder for which the accuracy of results is configurable during runtime. Because of its configurability, the ACA adder can adaptively operate in both approximate (inaccurate) mode and accurate mode. The proposed adder can achieve significant throughput improvement and total power reduction over conventional adder designs. It can be used in accuracy-configurable applications, and improves the achievable tradeoff between performance/power and quality. The ACA adder achieves approximately 30% power reduction versus the conventional pipelined adder at the relaxed accuracy requirement.

Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS – Design Aids; J.6 [Computer Applications]: COMPUTER-AIDED ENGINEERING

General Terms
Algorithms, Design, Performance

Keywords
Approximate Arithmetic, Error-Tolerance, Power Minimization, Accuracy-Configurable Adder

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DAC 2012, June 3-7, 2012, San Francisco, California, USA.
Copyright 2012 ACM 978-1-4503-1199-1/12/06 ...$10.00.

1. INTRODUCTION
Guardbands for dynamic variations severely limit performance and energy efficiency of conventional IC designs. To overcome consequences of overdesign, several recent mechanisms for variation-resilient design [4] allow timing errors and manage design reliability dynamically. Relaxing the requirement of correctness for designs may dramatically reduce costs of manufacturing, verification and test [16]. In resilient designs, errors can be corrected with redundancy techniques (error-tolerance), or accepted in some applications relating to human senses such as hearing and sight (error-acceptance). In the error-acceptance regime, approximation via a simplified or inaccurate circuit can increase performance and/or reduce power consumption.

Various approximate arithmetic designs have been previously proposed. Lu [7] introduces a faster adder which has shorter carry chains and considers only the previous k bits of input in computing a carry bit. Verma et al. [12] provide a variable latency speculative adder (VLSA), which is a reliable version of the Lu adder [7] with error detection and correction. Shin et al. [10] also propose a data path redesign technique for various adders which cuts the critical path in the carry chain. Zhu et al. [14] [13] propose three approximate adders – ETAI, ETAII and ETAIIM. ETAI is divided into an accurate part and an inaccurate part to achieve approximate results. ETAII cuts carry propagation to speed up the adder, and ETAIIM modifies ETAII by connecting carry chains in accurate MSB parts. Kulkarni et al. [5] present a 2x2 underdesigned multiplier, and use it to build large power-efficient approximate multipliers. George et al. [3] define the concept of probabilistic CMOS (PCMOS), and implement efficient arithmetic using PCMOS. Shin et al. [11] propose a logic synthesis approach to design an approximate circuit.

The approximate designs produce almost-correct results at the given required accuracy, and obtain power reductions or performance improvements in return. In some applications, however, more accurate or totally accurate results are required under certain conditions – e.g., image processing in security cameras would require cleaner images after detecting a motion. In contexts where the required accuracy changes during runtime, the accuracy of results should be configurable to maximize the benefit of approximate operations. Figure 1 illustrates how power benefits can be achieved with an accuracy-configurable design. The accuracy-configurable design can adapt to changing accuracy constraints by using different modes in each situation. To our knowledge, no previous work can configure the output accuracy during runtime, and each is thus restricted (or, best-suited) to particular application contexts. In contexts where the accuracy requirement can change dynamically, the previous methods' benefits from the accuracy tradeoff are reduced since the implementation must be targeted to the maximum accuracy requirement.

[Figure 1 diagram: normalized power over time as the required accuracy steps through 80%, 100% (after an event occurs), 90% and 80%; the accurate design stays at 1.0, while the accuracy-configurable design switches between accurate mode and approximate mode.]
Figure 1: Power benefits from accuracy-configurable design.

In this paper, we propose an accuracy-configurable approximate (ACA) adder, which can configure the accuracy of results during runtime. The main contributions of our work are the following.

• The proposed ACA adder has runtime-configurable accuracy to better enable tradeoff of accuracy in computation versus performance and power.
• We provide quantitative metrics for an approximate arithmetic design. We compare the ACA adder to previous approximate adders based on these metrics.
• We demonstrate the power benefits of the ACA adder over previous approximate and conventional adder designs for accuracy-configurable applications.

The rest of the paper is organized as follows. Section 2 presents the proposed ACA adder design. Section 3 provides experimental results and analysis. Section 4 summarizes and concludes the paper.

2. ACCURACY-CONFIGURABLE ADDER

2.1 Approximate Adder Implementation

[Figure 2 diagram: three sub-adders computing AH + BH, AM + BM and AL + BL on overlapping slices of A and B, with outputs carry, SUMH, SUMM and SUML.]
Figure 2: Proposed approximate adder – 16-bit adder case.

Previous approximate adders [7] [10] [14] have difficulty detecting and correcting errors since they are designed for error-acceptable applications with a target accuracy. However, accurate computations are still required at certain times, according to the application. VLSA [12] can provide accurate results, but has large delay and area overhead for the error detection and correction. The central contribution of our present work is to propose an approximate adder which supports both accurate and inaccurate computation with error-correction and accuracy-configuration capability. Figure 2 shows our proposed approximate circuit for the case of a 16-bit adder. In the adder, the carry chain is cut to reduce critical-path delay, and three sub-adders generate results of partial summations. With the reduced critical-path delay, high performance (by increasing the clock frequency) or low power consumption (by decreasing the operating voltage) is obtained. A middle sub-adder (AM + BM) is introduced to increase accuracy. Without the middle sub-adder (as in ETAII [13]), error occurs when the eighth carry bit is high, and for random input patterns the error rate is 50.1%. On the other hand, with the introduction of the middle sub-adder, error rate for random input patterns is reduced to 5.5%. (In the real implementation, all redundant parts (four-LSB output of AH + BH and AM + BM sub-adders) are optimized only for carry-generation.)

[Figure 3 diagram: input slices A[N-1:N-k], A[N-k-1:N-2k], A[N-2k-1:N-3k], ... and the matching slices of B feed overlapping sub-adders that output carry, SUM[N-1:N-k], SUM[N-k-1:N-2k], SUM[N-2k-1:N-3k], ... N: bit width; k: half of the carry-chain depth.]
Figure 3: General implementation for the proposed adder.

We can generalize the implementation of the proposed approximate adder. Figure 3 shows the general implementation of an N-bit adder with a parameter k, which is the bit-width of the sub-adder result. In the adder, each divided sub-module produces a k-bit result except for the last sub-module, which produces a 2k-bit result. The approximate adder thus consists of the (N/k − 1) sub-modules as described in Equation (1).

$$\mathrm{SUM}[N - ik - 1 : N - (i+1)k] = A[N - ik - 1 : N - (i+2)k] + B[N - ik - 1 : N - (i+2)k], \quad i = 0, \ldots, N/k - 2 \qquad (1)$$

In modern adder designs, such as carry-lookahead (CLA), carry-select and Kogge-Stone adders, the path depth and area are asymptotically proportional to $\log_2 N$ and $N \log_2 N$ respectively, where N is the bit-width of the adder [15]. Based on this, we can express delay, area and power consumption of the proposed adder in terms of the parameters N and k. The proposed ACA adder has (N/k − 1) sub-adders, each of which is a 2k-bit adder. Therefore, delay of the critical path can be expressed with Equation (2) and area can be estimated with Equation (3), where $C_{delay}$ and $C_{area}$ are constants for delay and area, respectively.

$$\mathrm{delay} = C_{delay} (\log_2 k + 1) \qquad (2)$$

$$\mathrm{area} = C_{area} (N - 2k)(\log_2 k + 1) \qquad (3)$$

$$\mathrm{Power}_{dyn} = C_{power} (N - 2k)(\log_2 k + 1)^2 \qquad (4)$$

Power consumption of the ACA adder can be roughly estimated as follows. Dynamic power consumption with voltage scaling at a fixed frequency is proportional to capacitance · $V_{dd}^2$, where the capacitance is proportional to the area. Cell delay is proportional to $1/(V_{dd} - V_t)^{\beta}$, and $V_{dd}^2$ is roughly proportional to 1/(cell delay) if we assume that β is 2. Since (cell delay) × (path depth) is constant at a fixed frequency, $V_{dd}^2$ is proportional to the path depth, which is $\log_2 k + 1$. Consequently, dynamic power with voltage scaling can be expressed using Equation (4), where $C_{power}$ is a constant fixed for given $V_{dd}$ for dynamic power consumption. Static power consumption of the adder can be roughly estimated as proportional to the area in Equation (3).

In our proposed adder design, the output of each sub-adder (except the last sub-adder) is incorrect when a carry input should be propagated to the results. In Figure 2, when the carry[4] (carry bit from AL + BL) is '1' and SUM_M[3:0] is 1111(2), the output result has an error in SUM[11:8]. In the general implementation, the output result will be correct when there are no errors in all (N/k − 1) sub-adders. In the ith sub-adder, errors occur when (1) the LSB part of the result (SUM_i[k − 1 : 0]) has all '1' values (probability $P = 1/2^k$) and (2) the LSB part ([k − 1 : 0]) of the (i + 1)th sub-adder produces a carry bit (probability $P = \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{4} + \ldots$). Therefore, with a random input vector, the probability of having a correct result in the proposed adder is

$$P(N, k) = \left(1 - \frac{1}{2^k} \cdot \frac{2^k - 1}{2^{k+1}}\right)^{N/k - 2} \qquad (5)$$

Table 1 shows the estimated results of 16-bit ACA adders with different parameter values k. With smaller k value, the minimum clock period and dynamic power can be reduced, but the pass rate (probability of having a correct result) will be decreased. The estimations come from Equations (2), (3), (4) and (5). In Section 3.3 below, we validate the above estimation with real implementations.

Table 1: Estimated minimum clock cycle, area, dynamic power and pass rate for each k value when N = 16 (normalized to the conventional CLA 16-bit adder).

                     k=2     k=3     k=4     k=5     k=6
min. clock period    0.5     0.65    0.75    0.83    0.89
area                 0.87    1.05    1.12    1.15    1.12
dynamic power        0.44    0.68    0.84    0.95    1.00
pass rate            0.554   0.829   0.942   0.982   0.995
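The windowed addition of Equation (1) can be illustrated with a short behavioural model. The sketch below is not the authors' Verilog; the function names `aca_add`, `exact_add` and `pass_rate` are hypothetical, the carry-out is ignored, and N is assumed to be a multiple of k. It reproduces the SUM[11:8] error case described for Figure 2 and measures the pass rate on random operands.

```python
import random

def aca_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Behavioural model of the N-bit approximate adder of Equation (1).

    Each sub-adder sums a 2k-bit window of A and B with no carry-in.
    Only the upper k bits of each window sum are kept, except for the
    lowest window, which contributes all 2k bits. Carry-out is ignored.
    """
    assert n % k == 0 and n >= 2 * k
    win_mask = (1 << (2 * k)) - 1
    field_mask = (1 << k) - 1
    # Lowest sub-adder: full 2k-bit result SUM[2k-1:0].
    result = ((a & win_mask) + (b & win_mask)) & win_mask
    # Other sub-adders: window [lo+2k-1 : lo] yields SUM[lo+2k-1 : lo+k].
    for lo in range(k, n - 2 * k + 1, k):
        window_sum = ((a >> lo) & win_mask) + ((b >> lo) & win_mask)
        result |= ((window_sum >> k) & field_mask) << (lo + k)
    return result

def exact_add(a: int, b: int, n: int = 16) -> int:
    """Reference N-bit addition (carry-out dropped)."""
    return (a + b) & ((1 << n) - 1)

def pass_rate(n: int = 16, k: int = 4, trials: int = 20000, seed: int = 0) -> float:
    """Fraction of random operand pairs for which the approximate sum is exact."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        ok += aca_add(a, b, n, k) == exact_add(a, b, n)
    return ok / trials

# carry[4] = 1 and the low four bits of AM + BM are 1111: SUM[11:8] is wrong.
print(hex(aca_add(0x00F8, 0x0008)), hex(exact_add(0x00F8, 0x0008)))
print(pass_rate())
```

With N = 16 and k = 4 the measured pass rate should land near the 0.94 figure quoted for this configuration, although the exact value depends on the random seed.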

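As a quick numerical check, Equations (2) and (5) can be evaluated directly for N = 16. This is a sketch only: the helper names are hypothetical, and the delay is normalized by assuming that the conventional 16-bit CLA has path depth log2 N with the same constant C_delay, which the text implies but does not state explicitly.

```python
import math

def pass_rate_estimate(n: int, k: int) -> float:
    """Equation (5): probability that every sub-adder is error-free."""
    p_all_ones = 1 / 2**k               # LSB part of a sub-adder result is all ones
    p_carry = (2**k - 1) / 2**(k + 1)   # k-bit LSB part below it produces a carry
    return (1 - p_all_ones * p_carry) ** (n / k - 2)

def normalized_delay(n: int, k: int) -> float:
    """Equation (2) divided by log2(N), the assumed depth of the N-bit CLA."""
    return (math.log2(k) + 1) / math.log2(n)

for k in (2, 3, 4, 5, 6):
    print(k, round(normalized_delay(16, k), 2), round(pass_rate_estimate(16, k), 3))
```

The printed values can be compared against the "min. clock period" and "pass rate" rows of Table 1; small differences in the last digit may come from rounding.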
2.2 Error Detection and Correction for Accurate tiple stages. Figure 6 shows the pipelined adder implementation
Computation (k = N/8 case), in which four pipeline stages are required to
As described in Section 2.1, our proposed adder is incorrect when achieve a 100% accurate result. In the pipelined adder, each stage
a carry bit is propagated between sub-adders. However, the error generates a result with different accuracy; the output accuracy in-
can be detected and corrected with a small overhead. We detect an creases as the number of pipeline stages increases. According to
error for each sub-adder by checking the output of the sub-adder the accuracy requirement, we can turn off the later stages with a
and the carry-in signal that comes from the previous sub-adder. Er- power gating technique, and we can reduce the power consumption
ror detection can be implemented with several ‘and’ gates. To cor- further with the accuracy tradeoff.
rect the error, ‘1’ should be added to the approximate (inaccurate) Since the proposed adder supports both approximate and accu-
output, and the error correction can be implemented with an incre- rate results, it can be used in applications that require accurate re-
mentor circuit. sults only under certain conditions. Conventional accurate designs
are energy-inefficient in the error-acceptable application context,
approximate adder EDC circuit because they always compute the exact function. Previous approx-
SUMapprox imate designs cannot handle a varying accuracy requirement, and
IN sumi OUT this limits the benefit of the accuracy tradeoff: as noted above, the
sub-adderi SUMcorrect approximate function must meet the maximum accuracy threshold
incrementor across all applications. Moreover, if the application requests an ex-
sub-adderi+1 errori
act computation, additional accurate circuits must be added to the
previous approximate designs. By contrast, the ACA design effi-
error ciently exploits a tradeoff between accuracy and power/performance
with its runtime accuracy configurability.
data stall carryi+1
Stage 1 Stage 2
Figure 4: Error detection and correction with the approximate adder. AL
BL N/2-bit adder SUML
With these simple error detection and correction circuits, our carry
proposed adder can be implemented to have variable latency like AH
N/2-bit adder SUMH
the previous VLSA adder [12], with a small overhead for an er- BH
ror detection and correction (EDC) system. Figure 4 shows an
EDC system with our proposed adder. The error detection cir- error
cuit (‘and’ gates) checks the carry propagation and generates an A
approximate adder SUMcorrect
error signal. The error correction (incrementor) circuit produces B error correction
an error-free output by adding compensation data, and requires an SUMapprox
accurate power gating
additional clock cycle. When errors are detected from input pat- mode switches
terns, the error signal is activated. The error signal holds the input Figure 5: Pipelined adder implementation – conventional adder (above) and ap-
pattern during the error correction and chooses the error-corrected proximate adder (below). In approximate operation, the error correction stage is
value (SU Mcorrect ) as an output. With this approach, our approxi- power-gated.
mate adder can provide accurate results at a higher clock frequency
than that of conventional adders (e.g., CLA). According to the esti- 3. EXPERIMENTAL SETUP AND RESULTS
mated results in Table 1, clock period can be reduced by 25% with
6% (= error rate) recovery-cycle overhead (16-bit ACA, k = 4). 3.1 Experimental Setup
To test approximate designs, we have written each design in Ver-
2.3 Accuracy Configuration with Pipelined Archi- ilog and synthesized it to a TSMC 65GP cell library with Synopsys
tecture DesignCompiler [17]. We then perform gate-level simulations us-
When our proposed adder is combined with a pipelined architec- ing Cadence NC-Sim [18]. In the simulation, gate delay is taken
ture, we can obtain accurate results with the same throughput as a from an SDF (standard delay format) file. For voltage scaling ex-
conventional adder. In the pipelined architecture, approximate ad- periments, we prepare Synopsys Liberty (.lib) files for each voltage
ditions are computed at the first pipeline stage, and error correction from 1.00V to 0.60V in 0.01V increments, using Cadence Library
can be completed at the second stage. Figure 5 shows the conven- Characterizer v9.1 [19]. The prepared libraries are used for SDF
tional pipelined adder (above) and the approximate adder (below). file generation and power estimation at each voltage. Each simula-
The pipelined implementation of approximate adder has a struc- tion is performed with input patterns for one million cycles. During
tural analogy with the pipelined adder of the 2006 U.S. patent [8] in the simulation, each output value is compared with a reference (cor-
which partial summations are performed at the first stage and carry rect) value to produce the accuracy metrics. For the input patterns,
bits are added at the later stages. However, the patent is clearly we use random data, as well as actual data from SPEC 2006 [20]
directed to accurate operations, not approximate computations. In benchmarks. We extract operand data from ADD instructions in
addition, we use our approximate adder (Figure 3) in the first stage. the SPEC benchmarks.
In the pipelined approach, there is no improvement of the clock fre-
quency since the achievable clock period is the same as that of the 3.2 Metric for Approximate Design
conventional adder. However, power benefits are obtained through To quantify errors in approximate designs, two metrics have been
configuration of accuracy: in the approximate mode, the error cor- previously proposed [1]. Error rate (ER) is the percentage of cy-
rection stage is power-gated with foot (or, head) switches in Figure cles in which output value is different from the correct value. Error
5, and power reduction over the conventional adder design can be significance (ES) is the numerical difference between correct and
achieved. We compare the conventional and approximate pipelined output results; this quantifies the amount of error. In image/video
adders in Section 3. applications, [2] uses the product of ES and ER as a metric of
In the proposed adder implementation, to achieve higher perfor- error tolerance. [10] introduces a criterion for acceptability: ES
mance or lower power consumption, we can reduce the carry chain × ER ≤ acceptance threshold, where the acceptance threshold is
depth (k) of sub-adders (see Table 1). However, when k is less than specified according to the application. For the error significance
N/4, it is impossible to correct all errors and achieve 100% cor- (ES) metric, [14] considers only amplitude of error. This is use-
rect results within one clock cycle since the error-correction paths ful for many digital signal processing (DSP) systems that process,
become critical. To achieve correct results in the pipelined imple- e.g., sound and image data. However, in communication systems
mentation, the error-correction stage should be extended to mul- that mainly handle information data, the number of incorrect bits

822
6WDJH 6WDJH 6WDJH 6WDJH

$ FRUUHFWLRQRQ6 FRUUHFWLRQRQ6 FRUUHFWLRQRQ6


DSSUR[LPDWH
680FRUUHFW
DGGHU
%
HUURUV HUURUV HUURUV
RQ6 RQ6 RQ6

680 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

DSSUR[LPDWH FRUUHFW DSSUR[LPDWH FRUUHFW DSSUR[ FRUUHFW FRUUHFW

Figure 6: Accuracy-configurable implementation for pipelined adder.

(Hamming distance) is a more meaningful metric for accuracy – adders: CLA, Lu’s adder [7], ETAI, ETAIIM [14] and the pro-
e.g. a (32,28) Reed-Solomon code can correct up to 2-byte errors. posed ACA adder (without error correction). In the experiment,
This consideration for the ES metric is required when approximate the same carry-chain width (8-bit) is selected for the four approxi-
arithmetic is applied to error-tolerant systems with a redundancy mate adders. In the implementation, a register (flip-flop) is inserted
technique. in each output port to detect timing errors.
Table 2 shows two accuracy metrics for amplitude data and in- Table 4 shows area, pass rate, accuracy, minimum clock period
formation data. ACCamp used in [14] quantifies the amplitude of and EDC overhead for each adder design. According to the re-
errors, where Rc and Re are the correct and obtained results, re- sults, the ETAI adder has the smallest design area, but has a low
spectively. We propose another accuracy metric, ACCinf , which pass rate and limited accuracy with respect to the ACCinf metric.
measures error significance as Hamming distance, where Be is the Therefore, the ETAI adder is preferred for applications which allow
number of error bits and Bw is the bit-width of the data. For ex- low accuracy in results. The ETAIIM adder shows fairly high ac-
ample, when the correct (reference) data is 1000_0000(2) and the curacy, but does not have speed (clock period) benefit. Lu’s adder
result data is 1100_0000(2) , accuracy with ACCamp and ACCinf shows a smaller error rate and high accuracy with respect to both
will be 12 and 78 , respectively. To evaluate the approximate cir- ACCamp and ACCinf metrics. However, it requires larger area
cuits, we obtain average values of accuracy metrics ACCamp and than the other designs. The proposed adder shows similar results
ACCinf over the entire simulation to consider both ER and ES. for both metrics as Lu’s adder. However, the area of the ACA adder
is smaller than that of Lu’s adder, and EDC is possible with small
Table 2: Accuracy metrics for error significance (ES).
area overhead (28%). With the ACA adder, the minimum clock
metric definition data type period can be reduced by 26% compared to the accurate CLA.
ACCamp 1 − |Rc − Re |/Rc amplitude data
ACCinf 1 − Be /Bw information data 1.000

ACCamp
Voltage scaling
(1.0V~0.6V)
Table 3: ACA adder results with different k values. 0.900
k 2 3 4 5 1.000

min. clock period (ps) 180 190 220 230 0.800


0.995
area (um2 ) 550 990 920 840
0.700
pass rate (%) 55.3 82.8 94.0 98.1 0.990
throughput improvement (%) 11.3 24.6 22.3 21.4 3.00E-04 8.00E-04
0.600

ACA adder CLA


Table 4: Design comparison for each adder design. 0.500 Lu's adder ETAI
CLA LU ACA ETAI ETAIIM ETAIIM total power (W)
area (um2 ) 910 1356 923 576 678 0.400
2.00E-04 4.00E-04 6.00E-04 8.00E-04 1.00E-03 1.20E-03
min. clock period (ps) 280 210 200 200 260
pass rate (%) 100 99.2 94.1 10.0 97.0 1.000
ACCinf

Voltage scaling
ACCamp (maximum) 1.000 0.998 0.997 0.999 0.999 (1.0V~0.6V)
ACCinf (maximum) 1.000 0.999 0.993 0.694 0.996 0.900
1.000
area overhead for EDC N/A 75% 28% N/A 15%
0.800
0.990
0.700

3.3 Approximate Adder with Different Parameters 0.600


0.980
4.00E-04 8.00E-04
We explore the proposed adder with different parameters (k: half ACA adder CLA
of carry-chain depth). Table 3 summarizes results – minimum clock 0.500 Lu's adder ETAI
period, area, error rate and throughput improvements – for each im- ETAIIM
total power (W)
0.400
plementation of the 16-bit adder with different k values. According 2.00E-04 4.00E-04 6.00E-04 8.00E-04 1.00E-03 1.20E-03
to the results, with smaller k, the maximum operating frequency in-
Figure 7: Accuracy (y-axis) vs. power consumption (x-axis) under fixed clock
creases, but the error rate increases as well. With higher k, the er- period (0.25ns) and scaled voltage (from 1.0V to 0.6V ).
ror rate is reduced significantly, but the benefit of the approximate
circuit, i.e., clock period reduction, is small. In the table, through- Figure 7 shows a power vs. accuracy tradeoff in a voltage scaling
put improvement over conventional design is calculated including scenario: the x-axis shows total power consumption, and the y-axis
error recovery overhead. From the implementations, a maximum shows the accuracy (ACCamp , ACCinf ). The power consumption
throughput improvement is achieved when k = 3. If we correct and the accuracy are measured with different voltage libraries char-
erroneous results with EDC as in Figure 4, then 17.2% additional acterized using Cadence Library Characterizer [19]. The clock
clock cycles are required for error correction. With this overhead, period is fixed at 0.30ns during the simulations. In the results,
ACA adder can improve data throughput by 24.6% over the con- Lu’s adder does not show power benefits due to its design size.
ventional CLA adder. ETAI shows low power consumption and high ACCamp accuracy,
but has low ACCinf accuracy, and cannot detect and correct er-
3.4 Approximate Adder Comparison rors. ETAIIM shows similar characteristics to ACA in the voltage
We evaluate each approximate adder with respect to the pass scaling case, but the adder cannot be used for a high-performance
rate and the accuracy metrics which we have proposed. We use (high-frequency) design, as shown in Table 4. The results in Figure
gate-level simulation at each possible clock period to compare five 7 imply that our proposed adder can provide a significant power

823
reduction with small accuracy penalty. When the required accu- lected as N/4 for a two-stage pipelined implementation. In the
racy is 0.970 (ACCamp ), the ACA adder shows 37.0%, 36.4% and table, minimum clock period is measured at a fixed voltage (1.0V ),
15.9% total power reduction over CLA, Lu’s adder and ETAIIM, and total power is measured at a fixed frequency (2.5GHz) with
respectively. voltage scaling. In the ACA adder case, timing and power over-
We have tested our approximate adder on a real application – a heads from power gating cells, output MUXes, and IR drop are
Gaussian smoothing filter used in [6]. Gaussian smoothing is per- included. We can see that area, timing and power of both designs
formed on the input image by convolving with a matrix in the spa- are similar when the ACA adder operates in the accurate mode.
tial domain. In the convolution, the addition operation is done with Total power of the approximate adder is comparable to that of the
approximate 16-bit adders. Other operations, such as multiplication conventional adder, even though ACA has additional EDC circuits.
and division, are accurate computations. Figure 8 shows results for This is because ACA has fewer registers between stage-1 and stage-
various approximate adders when they consume 50% of the power 2 than the conventional pipelined adder. (In Figure 5, the conven-
of accurate CLA. From the results, the ACA adder has PSNR of tional adder requires registers for AH , BH , SU ML and carry at
24.5dB, and this suggests that image processing/filtering applica- the first stage. For a 16-bit adder, 25 registers (8 + 8 + 8 + 1) are
tions could employ our proposed adder with significant power sav- required. On the other hand, ACA requires 18 registers (16 for
ings and only small loss in image quality. SU Mapprox and 2 for error indication).)

7.00E-03
voltage scaling accurate result
6.00E-03

total power consumption (W)


5.00E-03

4.00E-03

3.00E-03 mode change


2.00E-03
Conventional pipelined adder ACA adder (mode 1)
ACA adder (mode 2) ACA adder (mode 3)
(a) (b) (c) 1.00E-03
ACA adder (mode 4)
0.00E+00
0.95 0.96 0.97 0.98 0.99 1.00

7.00E-03
ACCamp
accurate result
6.00E-03 voltage scaling
total power consumption (W)

5.00E-03

4.00E-03
mode change
3.00E-03

(d) (e) (f) 2.00E-03


Conventional pipelined adder ACA adder (mode 1)
1.00E-03 ACA adder (mode 2) ACA adder (mode 3)
Figure 8: Image smoothing: (a) original image with noise; (b) accurate adder; (c) ACA adder (mode 4)
0.00E+00
ACA, PSNR: 24.5 dB; (d) ETAI, PSNR: 25.3 dB; (e) ETAIIM, PSNR: 16.2 dB; (f) 0.80 0.85 0.90 0.95 1.00
Lu’s adder, PSNR: 11.1dB.
ACCinf
Table 5: Comparison between conventional and approximate (2-stage) pipelined Figure 9: Accuracy metric ACCamp (above) and ACCinf (below) vs. power
adders at the accurate mode. consumption for conventional pipelined adder, ACA adder in accurate mode, and
conventional pipelined approximate pipelined ACA adder in approximate mode (4-stage, 32-bit adder).
adder area clock total area clock total
width (um2 ) period power k (um2 ) period power In the pipelined architecture, the ACA adder can provide various
(N)             (ns)     (mW)                  (ns)     (mW)
8       459     0.313    0.557    2    576     0.312    0.564
16      1082    0.357    1.558    4    1171    0.358    1.669
32      2252    0.404    2.860    8    2420    0.414    2.914

Table 6: Implementation results of 32-bit ACA adder with 4-stage pipeline (power consumption of each mode and power reduction over conventional pipelined adder).

config.   power-gating    ACCamp (max.)   ACCinf (max.)   total power (mW)   reduction (%)
mode-1    none            1.000           1.000           5.962              -11.5%
mode-2    stage-4         0.998           0.960           4.683              12.4%
mode-3    stage-3, 4      0.991           0.925           3.691              31.0%
mode-4    stage-2, 3, 4   0.983           0.900           2.588              51.6%

3.5 Accuracy Configuration and Power Savings
When the architecture allows pipelining for addition, our proposed adder can be implemented as shown in Figure 5. We implement both the conventional pipelined adder and the approximate pipelined adder to compare the designs in terms of area, timing and power. In the implementation, registers (flip-flops) are included at each pipeline stage (before stage-1, between stage-1 and stage-2, and after stage-2).
Table 5 shows the implementation results for the conventional and approximate pipelined adders. The parameter k has been se- [...] configurable modes according to the pipeline depth. To improve the design performance, we increase the pipeline depth; the deeper pipeline reduces the path depth of the design. In the conventional pipelined adder, the bit-width of the adder in each stage can be reduced to N/#stage, where N is the entire bit-width and #stage is the depth (number) of the pipeline stages. In the ACA adder, we can reduce the value of parameter k with deeper pipeline depth as shown in Figure 6. To show the benefit of accuracy configuration, we have implemented a 32-bit ACA adder (N = 32, k = 4) with 4-stage pipeline, and compared it with a conventional pipelined adder with an 8-bit CLA in each stage. Table 6 shows the implementation results for the 32-bit ACA adder. For the accuracy estimation, one million cycles of random patterns are used. The ACA adder can operate in four different modes, based on the power gating of each stage. We can see that the modes show different power consumptions and different achievable accuracies. The ACA adder consumes 11.5% more power than the conventional adder in accurate mode (mode-1) due to the presence of recovery circuits. At the same time, it shows a significant power reduction in the approximate modes: 12.4%, 31.0% and 51.6% in mode-2, mode-3 and mode-4, respectively. Figure 9 shows detailed results for power consumption versus accuracy metrics in each configuration. From the results, we can see that accuracy configuration with the mode change is much more effective than with voltage scaling, in terms of the tradeoff between accuracy and power.
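The percentages in Table 6 can be checked with a short calculation, and the same data can drive a simple mode choice. The sketch below is illustrative only: the conventional adder's power is not listed in this section, so it is inferred here from the stated 11.5% overhead of mode-1, and the names `reduction_pct` and `select_mode` are hypothetical helpers, not part of the authors' design.

```python
# Per-mode figures copied from Table 6 (random input patterns).
# Tuple layout: (mode, ACCamp, ACCinf, total power in mW).
TABLE6 = [
    ("mode-1", 1.000, 1.000, 5.962),
    ("mode-2", 0.998, 0.960, 4.683),
    ("mode-3", 0.991, 0.925, 3.691),
    ("mode-4", 0.983, 0.900, 2.588),
]

# Assumption: baseline power inferred from "mode-1 consumes 11.5% more power
# than the conventional adder"; the value itself is not given in the text.
CONVENTIONAL_MW = 5.962 / 1.115


def reduction_pct(power_mw, baseline_mw=CONVENTIONAL_MW):
    """Power reduction relative to the baseline, in percent."""
    return 100.0 * (1.0 - power_mw / baseline_mw)


def select_mode(min_acc_amp, min_acc_inf, table=TABLE6):
    """Return the lowest-power mode that meets both accuracy requirements."""
    feasible = [row for row in table
                if row[1] >= min_acc_amp and row[2] >= min_acc_inf]
    if not feasible:
        raise ValueError("no mode satisfies the requested accuracy")
    return min(feasible, key=lambda row: row[3])[0]


if __name__ == "__main__":
    for mode, _, _, power in TABLE6:
        print(f"{mode}: {reduction_pct(power):+.1f}%")
    print(select_mode(0.99, 0.92))
```

With the inferred baseline, the recomputed reductions agree with the table's last column to one decimal place, which suggests the column is measured against a single conventional-adder power figure.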

Table 7: Accuracy (ACCamp, ACCinf) results of 32-bit ACA adder for real benchmarks (SPEC 2006).

accuracy metric            benchmark
                           astar    bzip2    calculix   gcc      h264ref   mcf      sjeng    soplex
ACCamp            mode-1   1.0000   1.0000   1.0000     1.0000   1.0000    1.0000   1.0000   1.0000
                  mode-2   0.9999   1.0000   0.9999     0.9992   0.9999    0.9997   0.9998   0.9999
                  mode-3   0.9993   0.9998   0.9972     0.9990   0.9990    0.9997   0.9995   0.9998
                  mode-4   0.9979   0.9970   0.9958     0.9951   0.9978    0.9991   0.9981   0.9953
ACCinf            mode-1   1.0000   1.0000   1.0000     1.0000   1.0000    1.0000   1.0000   1.0000
                  mode-2   0.9979   1.0000   0.9978     0.9881   0.9953    0.9819   0.9897   0.9985
                  mode-3   0.9949   0.9984   0.9967     0.9849   0.9897    0.9809   0.9876   0.9965
                  mode-4   0.9940   0.9931   0.9910     0.9617   0.9851    0.9596   0.9787   0.9925
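One way to combine the per-mode accuracies in Table 7 with the per-mode power in Table 6 is to average the power of the cheapest admissible mode over a uniformly distributed accuracy requirement. The sketch below does this for the ACCamp metric over the interval 0.99 to 1.00. It rests on several assumptions that the text does not confirm: a mode is treated as admissible whenever its measured accuracy is at least the requirement, the Table 6 power figures are reused unchanged for every benchmark, and the conventional baseline is inferred from the 11.5% overhead of mode-1. It is not the authors' evaluation procedure, so its output should not be expected to reproduce the percentages reported for Figure 10.

```python
# Per-mode total power in mW from Table 6, ordered mode-1 .. mode-4.
MODE_POWER_MW = [5.962, 4.683, 3.691, 2.588]

# ACCamp per benchmark from Table 7, ordered mode-1 .. mode-4.
ACC_AMP = {
    "astar":    [1.0000, 0.9999, 0.9993, 0.9979],
    "bzip2":    [1.0000, 1.0000, 0.9998, 0.9970],
    "calculix": [1.0000, 0.9999, 0.9972, 0.9958],
    "gcc":      [1.0000, 0.9992, 0.9990, 0.9951],
    "h264ref":  [1.0000, 0.9999, 0.9990, 0.9978],
    "mcf":      [1.0000, 0.9997, 0.9997, 0.9991],
    "sjeng":    [1.0000, 0.9998, 0.9995, 0.9981],
    "soplex":   [1.0000, 0.9999, 0.9998, 0.9953],
}

# Assumption: baseline inferred from the stated 11.5% overhead of mode-1.
CONVENTIONAL_MW = 5.962 / 1.115


def expected_power(mode_acc, mode_power, lo, hi):
    """Average power when the required accuracy is uniform on [lo, hi].

    For each requirement r, the cheapest mode with accuracy >= r is used.
    Requirements that no mode can meet contribute nothing, so the most
    accurate mode should reach `hi` (mode-1 does, with accuracy 1.0).
    """
    total = 0.0
    covered = lo  # requirements up to `covered` already have a mode
    # Visit modes from cheapest to most expensive.
    for power, acc in sorted(zip(mode_power, mode_acc)):
        top = min(max(acc, lo), hi)
        if top > covered:
            total += power * (top - covered)
            covered = top
    return total / (hi - lo)


if __name__ == "__main__":
    for name, acc in ACC_AMP.items():
        avg = expected_power(acc, MODE_POWER_MW, 0.99, 1.00)
        print(f"{name}: normalized power {avg / CONVENTIONAL_MW:.3f}")
```

Sorting by power first keeps the selection rule explicit: a more expensive mode is charged only for the slice of requirements that no cheaper mode can satisfy.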

Figure 10: Normalized power consumption versus conventional pipelined design when the accuracy requirement is varied uniformly over the interval 0.99 ≤ ACCamp ≤ 1.00 and 0.95 ≤ ACCinf ≤ 1.00.
[Two panels, one for 0.99 ≤ ACCamp ≤ 1.00 and one for 0.95 ≤ ACCinf ≤ 1.00; y-axis: normalized power consumption, from 0 to 1; x-axis: benchmarks astar, bzip2, calculix, gcc, h264ref, mcf, sjeng, soplex; legend: mode-1 to mode-4.]

We also obtain the accuracy results in each accuracy mode with real input patterns extracted from SPEC 2006 benchmarks. Table 7 shows accuracy results of a 32-bit ACA adder with such real input patterns. The accuracy results differ across benchmarks; e.g., the measured accuracy for bzip2 is higher than for gcc. Furthermore, the accuracy with real patterns is greater than with random input patterns (Table 6), most likely because addition inputs for MPU have infrequently and/or systematically changing patterns in the applications. We evaluate power reductions across accuracy requirements with the patterns from SPEC 2006 benchmarks. Figure 10 shows power reduction achieved by the ACA adder versus the conventional pipelined adder under the accuracy requirements. We assume that the required accuracy ranges from 0.99 (0.95) to 1.0 for ACCamp (ACCinf), and that it varies uniformly over this range during the entire runtime. From the results, dynamic accuracy configuration achieves up to 44.5% (30.0% on average) and 47.1% (35.8% on average) power reduction over the conventional pipelined design for ACCamp and ACCinf metrics, respectively.

4. CONCLUSIONS
In this paper, we propose an accuracy-configurable approximate (ACA) adder for which the accuracy of results is configurable during runtime. Due to its configurability, the ACA adder can operate adaptively in both approximate (inaccurate) mode and accurate mode. To quantify the accuracy in approximate computation, we provide two metrics for amplitude data and information data. We compare the ACA adder against previous approximate adders based on the proposed metrics. The ACA adder shows high accuracy with respect to the metrics, and can provide up to 24.6% throughput improvement and 37.0% power reduction over the conventional CLA adder. The ACA adder can also be used in accuracy-configurable applications with pipelining. We demonstrate that the ACA adder can provide approximately 30% power reduction under a relaxed accuracy requirement versus the conventional pipelined adder. Finally, we show that our ACA adder can improve the achievable tradeoff between performance, power and quality for given accuracy requirements.
Our ongoing work seeks to implement accuracy-configurable designs for other arithmetic components such as multipliers, multi-input adders, etc. More broadly, our research addresses additional aspects of (runtime) accuracy-configurable systems and applications.
