A Theoretical Framework For Quality Estimation and Optimization of DSP Applications Using Low-Power Approximate Adders
Abstract— In this paper, we present a framework for analytically estimating the output quality of common digital signal processing (DSP) blocks that utilize approximate adders. The framework is based on considering the error of approximate adders as an additive noise (approximation noise) that disturbs the output of the DSP block in question. A signal processing theoretical modeling approach is developed for describing the power of the approximation noise, which is the integral of the error spectral density over the bandwidth. The output qualities of DSP blocks that utilize approximate adders, such as finite impulse response filter, discrete cosine transform, and fast Fourier transform, are thus estimated. The accuracy of the proposed framework is evaluated by comparing mathematical model predictions to simulation results using the signal-to-noise ratio (SNR) metric. The inaccuracy of the SNRs predicted by the framework was, on average, less than 2.5dB compared with that obtained from simulations. Building on this framework, a mathematical optimization approach based on Lagrange multipliers for optimizing design parameters is also presented. The optimization is realized by choosing a proper configuration of the target block, such as determining the data width of the inexact computation part for each approximate adder in the design.

Index Terms— Approximation noise, analytical quality estimation, approximate computing, optimization, low power approximate adders, digital signal processing.

I. INTRODUCTION

MODERN digital systems may require a high volume of computations while having some critical energy and speed constraints. In mobile systems, where the energy stored in the battery is normally the source of power for their operation, energy consumption reduction is a critical design goal [1], [2]. The criticality also applies to systems that harvest energy from the environment. In many applications, such as communications, biomedical, multi-media, and Internet-of-Things (IoT), the computations consist of digital processing of signals [3]–[5]. For a large class of applications where meeting a minimum output quality is sufficient, digital signal processing (DSP) blocks may perform the required processing approximately. These blocks, which consist of arithmetic units, may therefore operate using the approximate computing paradigm. In these applications, the minimum output quality may be subject to a compromise between the quality and energy efficiency/speed. In fact, approximate computing trades accuracy for energy or speed (performance) [3]. This paradigm may be invoked in both the software and hardware domains of processing systems.

In the hardware domain, several approximate components like adders [6]–[17] and multipliers [18], [19] have been introduced. Some prominent examples of approximate adders are: ETA-I [6], AMAs [7], TGAs [8], LOA [10], ETA-II [12], LREA [13], GeAr [14], RAP-CLA [15], and QuAd [16]. The approximate components have been evaluated by using them in DSP blocks like Finite Impulse Response (FIR) filters and Discrete Cosine Transform (DCT) [7], and in multimedia applications like image processing [15]. Approximate computing may also be applied at the algorithmic level in DSP applications, while exact components are employed for implementing datapath operations after algorithmic approximations have been applied [3].

The use of approximate units in computation systems, including DSP blocks, degrades the output quality, which should be determined for an optimum use of these units. More specifically, one should quantify the impact of approximation error on output quality as a key step for using approximate units. An efficient quantification may be achieved using an analytical model for units such as approximate adders. In conventional approaches, statistical characterizations of approximate component error have been obtained by conducting exhaustive simulations [8], [9], [15]. In an attempt to manage the complexity, some designers relied on Monte-Carlo (MC) simulations to determine the quality [12]. The limitations of exhaustive and MC simulation include the need for simulating each type of adder (multiplier) along with its configuration, requiring large runtimes for estimating the output quality of DSP applications, not providing any insight into the mechanisms and causes of the error, and not giving any view of the effect of the approximation error on the signal characteristics. To overcome these issues, recently, a fully analytical modeling approach for some error

Manuscript received February 25, 2018; revised June 1, 2018 and July 5, 2018; accepted July 10, 2018. Date of publication July 27, 2018; date of current version December 6, 2018. This paper was recommended by Associate Editor M. Mozaffari Kermani. (Corresponding author: Ali Afzali-Kusha.)

M. Pashaeifar and M. Kamal are with the School of Electrical and Computer Engineering, University of Tehran, Tehran 14399-57131, Iran.

A. Afzali-Kusha is with the School of Electrical and Computer Engineering, University of Tehran, Tehran 14399-57131, Iran, and also with the School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran 19538-33511, Iran.

M. Pedram is with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2018.2856757
1549-8328 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: COLLEGE OF ENGINEERING - Pune. Downloaded on December 10,2023 at 17:47:50 UTC from IEEE Xplore. Restrictions apply.
328 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS, VOL. 66, NO. 1, JANUARY 2019
PASHAEIFAR et al.: THEORETICAL FRAMEWORK FOR QUALITY ESTIMATION AND OPTIMIZATION OF DSP APPLICATIONS 329
First, the dependency between the error and the inputs was determined by a one-time error-free simulation. Then, a simple representation of this dependency was extracted and employed for obtaining the error PMF. Finally, the quality estimation and energy models were combined to optimize the approximate DFGs. In this case, the optimization problem was a non-linear, non-convex integer problem, which was solved by meta-heuristic methods. Employing this optimization, FFT and inverse DCT (iDCT) were designed with a 28× (2×) faster runtime compared to that of a full simulation-based optimization (of the approach of [23]).

In [26], first, the error variances of some LPA adders are determined through MC simulations. Next, by using variance as the error metric, the output quality of the considered application is estimated by a depth-first search on the directed acyclic graphs (DAGs) of the application. Finally, integer linear programming (ILP) is exploited for determining the optimum energy of the application under the predefined quality constraint. The same flow has been employed for optimizing JPEG compression in [28]. Also, in [27], which was an extension of [26], a heuristic technique based on a mathematical solution was proposed to solve the area/power optimization problem. The error estimation approach in [26] and [27] did not properly support multi-output components.

3) Summary and Conclusion: The recent related works have focused either on analytical models to assess statistical error metrics or on reducing the high computational effort of output quality estimation in DSP applications. Optimization has been of secondary importance and has been addressed by solving linear or non-linear optimization problems. For an efficient approximate design framework (for DSP applications), a fully analytical yet accurate error model, a simple quality estimator, and a mathematical optimizer may be considered critical requirements. Toward this end, in this work, we concentrate on presenting an analytical model, based on the noise signal, for describing the error of the approximate adders in the DSP blocks and its effect on the output. The knowledge obtained from characterizing the error as a signal sheds light on the impact of the approximation error on the outputs when processing digital signals. In addition, we present a compact error model which may be employed to generate the error of the approximate adders in simulations without employing the approximate adder model. The mathematical quality estimation framework enables the use of analytical optimization methods for minimizing the energy and maximizing the performance of hardware implementations of the DSP blocks for a given application, subject to the desired output quality. Hence, in this work, a mathematical optimization based on the Lagrange multipliers method for determining the design parameters is suggested. TABLE II summarizes the differences between the features of the proposed framework and those of the prior works.

TABLE II
THE SUMMARY OF THE DIFFERENCES BETWEEN FEATURES OF THE PROPOSED FRAMEWORK AND THE PRIOR WORKS

III. ANALYSIS AND MODELING OF APPROXIMATION NOISE

As discussed in Section II, the approximate adders were categorized into two groups. Generally, the segmented adders and LPA adders have different characteristics in terms of two important metrics: error rate (ER) and error distance (ED). The ER of the LPA adders is significantly high, while their ED is limited by their inexact part width. On the other hand, the ER of the segmented adders is low, but since the error can occur at any bit position, including the most significant bits, their ED is high. Therefore, modeling of the error as an additive signal should be performed differently for these two types of adders. Thus, this work focuses on error modeling in the case of the LPA adders.

A. Theory of Approximation Noise

The magnitude of the error of an approximate add operation is defined as the difference between the exact and approximate summation results. This leads us to consider the error as an additional signal added to the exact summation result. Therefore, one may express the approximate add operation as

$$x[n] + y[n] = s[n] + e[n], \qquad (1)$$

where the addition on the left-hand side is performed by the approximate adder, n is an integer, x[n] and y[n] are the nth samples of the input signals, s[n] is the exact summation result, and e[n] is the approximation error magnitude.

Based on the quantization theory and an experimental study, we modeled the approximation error as white noise, which is called approximation noise. To model the approximation error as white noise, two criteria should be satisfied: 1) the error is independent of the input signals, and 2) the error is a random signal with constant power spectral density.

1) The Quantization Theory: As mentioned in Section II.A, an n-bit LPA adder consists of a k-bit inexact adder for the LSBs and an (n−k)-bit exact adder for the MSBs. This means that the error occurs in the summation result and output carry of the inexact part, where the error depends on the bit patterns of the k least significant bits of the input operands. Based on the quantization theory, if $2^n$ is significantly larger than $2^k$ and the probability density distribution of the input operands is smooth (which is valid in the case of DSP applications), the inexact part inputs become independent of the input operands and have uniform probability density distributions [32]. Therefore, even though the error and the inexact part inputs are dependent, the independence of the inexact part inputs from the input signals (based on the quantization theory) makes the error independent of the input signals.
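To make (1) and the quantization-theory argument concrete, the sketch below simulates a lower-part OR adder (LOA [10]), used here only as a representative LPA adder. The bit-level behavior coded in `loa_add` (bitwise OR on the k LSBs, carry-in formed by ANDing bit k−1 of both operands), the input statistics, and all parameter values are assumptions for illustration. The error is taken as the approximate sum minus the exact sum, matching the form of (1). The script checks that the error magnitude stays within the inexact part, that the error is essentially uncorrelated with a smooth, wide input signal, and that its power matches the value obtained by exhaustively enumerating uniformly distributed k-bit operand parts, which is what the quantization-theory argument predicts.

```python
import random

def loa_add(x, y, k):
    """Hypothetical lower-part OR adder on non-negative ints (k >= 1).

    The k LSBs are approximated by a bitwise OR; the upper part is added exactly,
    with a carry-in formed by ANDing bit k-1 of both operands.
    """
    mask = (1 << k) - 1
    low = (x & mask) | (y & mask)                 # inexact k-bit part
    carry = (x >> (k - 1)) & (y >> (k - 1)) & 1   # carry into the exact part
    high = (x >> k) + (y >> k) + carry            # exact (n-k)-bit part
    return (high << k) | low

def pearson(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

rng = random.Random(0)
k, N = 8, 50_000
# Smooth, wide input distribution (std 2**16 >> 2**k), offset to stay non-negative.
xs = [round(rng.gauss(2**20, 2**16)) for _ in range(N)]
ys = [round(rng.gauss(2**20, 2**16)) for _ in range(N)]
errs = [loa_add(x, y, k) - (x + y) for x, y in zip(xs, ys)]   # e[n] in (1)

# Reference: error power when the k-bit operand parts are exactly uniform.
uniform_errs = [loa_add(a, b, k) - (a + b) for a in range(1 << k) for b in range(1 << k)]
mse_uniform = sum(e * e for e in uniform_errs) / len(uniform_errs)
mse_emp = sum(e * e for e in errs) / N

print("max |e|:", max(abs(e) for e in errs), "bound 2^(k-1):", 2 ** (k - 1))
print("corr(e, x):", pearson(errs, xs))
print("empirical vs uniform-input MSE:", mse_emp, mse_uniform)
```

For this particular adder the error depends only on the k LSBs of the operands, so once those bits are close to uniform the error statistics no longer depend on the input signal, which is the independence criterion stated above.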
$$\mathrm{MSE} = E\left[e^2[n]\right] = \frac{1}{N}\sum_{n=0}^{N-1} e^2[n], \qquad (2)$$

To study the impact of approximation noise, one should consider the spectrum of the noise, which is determined by the autocorrelation function, defined as the correlation of a signal with its delayed version. For white noise, which is an uncorrelated signal, the autocorrelation function (denoted by R) is defined as [34]

$$R[m] = E\left[e^2[n]\right]\delta[m], \quad \text{where} \quad \delta[m] = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0. \end{cases} \qquad (3)$$

Evidently, the MSE and the autocorrelation function are equal when m = 0. The power spectral density (PSD) of e[n] (denoted by S) is the discrete-time Fourier transform (DTFT) of the autocorrelation function, obtained from

$$S(\omega) = \sum_{m=-\infty}^{+\infty} R[m]\, e^{-jm\omega}. \qquad (4)$$

$$\mathrm{MSE} = \sum_{i=0}^{k-1} ED_i^2 \times P(ED_i), \qquad (7)$$

where, in the case of LPA adders, $ED_i$ and $P(ED_i)$ are $2^i$ and $P_e$, respectively. Using $\sum_{i=0}^{n-1} x^i = \frac{1-x^n}{1-x}$, one can easily show the validity of (6).

As mentioned before, the approximation in LPA adders is applied in the internal structure of the FAs. Hence, in LPA adders, the error magnitude depends on the weights of the bit positions of the employed approximate FAs. Moreover, the ER of LPA adders is high [21], and we take it to be 1 here. Therefore, in addition to the analytical model, we suggest a compact model for the error of LPA adders based on the following definition:

Definition 1: The approximate adder may be modeled by an exact adder with injected noise, which we call the compact model. The injected noise may be modeled as white noise, which exists at all times (ER is one) and whose amplitude and frequency are random. Its probability distribution may be modeled as an exponential distribution, whose MSE is determined from Theorem 1.

Fig. 4 shows the estimated (using Theorem 1) and simulated RMSE for LPA adders. Note that instead of MSE, we report
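The definitions (2)–(4) and the geometric sum behind (7) can be checked numerically. The sketch below draws a hypothetical noise sequence in the spirit of the compact model of Definition 1: every sample is erroneous (ER = 1), the magnitude is $2^i$ with probability $P_e = 1/k$ for $i = 0, \ldots, k-1$, and the sign is random. These distributional choices are assumptions for illustration, not the paper's Theorem 1. The code estimates the MSE (2), the autocorrelation (3), and the PSD (4) with lags truncated to 8, and compares the MSE with $P_e(4^k - 1)/3$, the value of (7) under these assumptions. For a white sequence, R[m] should be near zero for m ≠ 0 and S(ω) should stay near R[0] at every frequency.

```python
import math
import random

def compact_noise(n, k, rng):
    """Hypothetical compact-model sampler in the spirit of Definition 1.

    Every sample is erroneous (ER = 1); the magnitude is 2**i with probability
    P_e = 1/k for i = 0..k-1, and the sign is drawn uniformly (an assumption).
    """
    return [rng.choice((-1, 1)) * 2 ** rng.randrange(k) for _ in range(n)]

def mse(e):
    """Sample estimate of (2)."""
    return sum(v * v for v in e) / len(e)

def autocorr(e, max_lag):
    """Biased sample estimate of R[m] for m = 0..max_lag."""
    n = len(e)
    return [sum(e[t] * e[t + m] for t in range(n - m)) / n for m in range(max_lag + 1)]

def psd(r, omega):
    """Evaluate (4) with lags truncated to |m| <= max_lag, using R[-m] = R[m]."""
    return r[0] + 2 * sum(r[m] * math.cos(m * omega) for m in range(1, len(r)))

def mse_closed_form(k, p_e):
    """(7) with ED_i = 2**i and P(ED_i) = P_e, summed via the geometric series."""
    return p_e * (4 ** k - 1) / 3

rng = random.Random(42)
k, N = 6, 100_000
e = compact_noise(N, k, rng)
r = autocorr(e, 8)
omegas = (0.0, math.pi / 4, math.pi / 2, math.pi)
print("MSE (2):", mse(e), " closed form (7):", mse_closed_form(k, 1 / k))
print("R[1..8]/R[0]:", [round(x / r[0], 4) for x in r[1:]])
print("S(w)/R[0]:", [round(psd(r, w) / r[0], 3) for w in omegas])
```

For large k, $P_e(4^k - 1)/3$ approaches the $P_e 2^{2k}/3$ form used for the per-stage approximation noise in (31).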
in frequency (DIF) butterfly structures of FFT are expressed by Theorems 6 and 7.

Proof: When two signals with independent random nature (uncorrelated) are added with each other, the power of the
function is applied on the ones. It means that the part of the identical spectrum of the white noises ($n_i^2$ and $n_{A1}^2$) that lies in the rejection band of the filter will be rejected. Thus, the spectrum of the noise is shaped by the filter, and hence, it cannot be considered white noise.

B. Discrete Cosine Transform

Discrete Cosine Transform (DCT) is a DSP block commonly used in multi-media systems. DCT intrinsically compacts the signal power, which makes it suitable for signal compression. A generic structure of a DCT is shown in Fig. 6(a). The butterfly, which is commonly used in DCT implementations, can be implemented in two ways: the simple (four multipliers) and the improved (three multipliers) structures (see Fig. 6). To calculate the output SNR, the noise of each output should be determined. Unlike FIR filters, since DCT is a transform from the time to the frequency domain and is not a complete transform such as the FFT, the PSD may not be employed for output quality estimation. Therefore, the inverse DCT (iDCT) should be employed to transform the calculated noise power at the output of the DCT to the time domain for calculating the SNR. The power gain of the iDCT is one (the output-to-input power ratio is one), meaning that the calculated noise power at the output of the DCT is usable for calculating the SNR. Thus, we determine the output noise of DCT in the time domain

Proof: To calculate the output noise of the butterfly, in the first step, the input and added noises should be propagated to the output. Thus, the output noise power may be written as

$$n_{o1}^2 = n_{o2}^2 = c_i^2 n_i^2 + c_j^2 n_i^2 + 2n_q^2 + n_A^2. \qquad (13)$$

By employing the trigonometric identity $\cos^2(\theta) + \cos^2\left(\frac{\pi}{2} - \theta\right) = 1$ and the fact that $c_i = \frac{\cos(i\pi/16)}{2}$, the proof of the theorem is reached by simplifying (13).

Theorem 5: If $n_i^2$, $n_q^2$, and $n_A^2$ are the powers of the input, quantization, and approximation noises, respectively, and the tuples $(c_i, c_j) \in \{(c_1, c_7), (c_7, c_1), \ldots, (c_5, c_3)\}$ contain the coefficients of the butterfly of the DCT shown in Fig. 6(c), the output noise of the butterfly can be determined from

$$n_{o1}^2 = n_{o2}^2 = \frac{1}{4} n_i^2 + 2n_q^2 + \left(1 + c_j^2\right) n_A^2. \qquad (14)$$

Proof: Unlike the previous structure, here, the input noise passes through two paths to reach the output. In this case, where two correlated signals are added, the power is obtained from the noise amplitude, which is determined by adding the amplitudes of the signals [26], whereas in the case of uncorrelated signals, we only need to add the powers of the signals (see proof of Theorem 2). Thus, the noise power of the output may be expressed as

$$n_{o1}^2 = n_{o2}^2 = c_i^2 n_i^2 + c_j^2 n_i^2 + 2n_q^2 + \left(1 + c_j^2\right) n_A^2. \qquad (15)$$

Simplifying (15), one can prove Theorem 5.
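The simplification steps in the proofs above rely on $c_i^2 + c_j^2 = 1/4$ for the listed coefficient pairs, and on the rule that correlated noise adds in amplitude while uncorrelated noise adds in power. The sketch below checks both numerically, using only the three tuples written out in Theorem 5 (the elided tuples are not assumed). The noise powers (1.0, 0.5, 0.2) and the path gains (0.6, 0.8) are arbitrary illustrative numbers.

```python
import math
import random

def c(i):
    """DCT butterfly coefficient c_i = cos(i*pi/16)/2."""
    return math.cos(i * math.pi / 16) / 2

def simple_butterfly_noise(i, j, n_i2, n_q2, n_a2):
    """Output noise power of (13)."""
    return c(i) ** 2 * n_i2 + c(j) ** 2 * n_i2 + 2 * n_q2 + n_a2

def improved_butterfly_noise(i, j, n_i2, n_q2, n_a2):
    """Output noise power of (15)."""
    return c(i) ** 2 * n_i2 + c(j) ** 2 * n_i2 + 2 * n_q2 + (1 + c(j) ** 2) * n_a2

# Only the tuples written out in Theorem 5; the elided ones are not assumed.
PAIRS = [(1, 7), (7, 1), (5, 3)]
for i, j in PAIRS:
    # i + j = 8, so j*pi/16 = pi/2 - i*pi/16 and c_i^2 + c_j^2 = 1/4.
    print((i, j), c(i) ** 2 + c(j) ** 2)
    print("  (13):", simple_butterfly_noise(i, j, 1.0, 0.5, 0.2),
          " (15):", improved_butterfly_noise(i, j, 1.0, 0.5, 0.2))

# Correlated vs. uncorrelated noise: illustrative path gains 0.6 and 0.8.
rng = random.Random(7)
N = 200_000
a = [rng.gauss(0, 1) for _ in range(N)]
b = [rng.gauss(0, 1) for _ in range(N)]
p_a = sum(x * x for x in a) / N
p_uncorrelated = sum((x + y) ** 2 for x, y in zip(a, b)) / N   # powers add: about 1 + 1
p_correlated = sum((0.6 * x + 0.8 * x) ** 2 for x in a) / N    # amplitudes add: (0.6 + 0.8)^2 * p_a
print("uncorrelated:", p_uncorrelated, " correlated:", p_correlated, " 1.96*p_a:", 1.96 * p_a)
```

Had the two correlated paths been added in power instead, the result would have been $0.6^2 + 0.8^2 = 1$ times the input power rather than 1.96, which is why the distinction matters when a noise source reaches an output through more than one path.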
output of the block is homogeneous. Thus, employing the DIT butterfly, the noise PSD at the output of the FFT ($\hat{S}_{DIT,R}$, $\hat{S}_{DIT,I}$) is obtained from

$$\hat{S}_{DIT,R} = \hat{S}_{DIT,I} = 2^s S_i + \sum_{i=1}^{s} 2^i S_q + 2^s S_{A,1} + 2^{s-1} S_{A,2} + \cdots + 2 S_{A,s}, \qquad (24)$$

where s is the number of stages in the FFT block, and $S_{A,1}$ to $S_{A,s}$ are the PSDs of the approximation noise of the adders utilized in stages 1 to s. If all adders in all stages have the same structure, the added noise in all the stages will be the same, and the PSD of the output noise of the FFT block may be simplified to

$$\hat{S}_{DIT,R} = \hat{S}_{DIT,I} = 2^s S_i + \left(2^{s+1} - 2\right)\left(S_q + S_A\right). \qquad (25)$$

The output noises of the DIF butterfly are not the same at its two outputs, which, in general, makes the noise different in different stages. The PSD of the output noise for the DIF structure ($\hat{S}_{DIF,R}$, $\hat{S}_{DIF,I}$) is determined by

$$\hat{S}_{DIF,R} = \hat{S}_{DIF,I} = 2^s S_i + \sum_{i=0}^{s-1} 2^i S_q + \frac{3}{2} 2^{s-1} S_{A,1} + \frac{3}{2} 2^{s-2} S_{A,2} + \cdots + \frac{3}{2} S_{A,s}, \qquad (26)$$

where s is the number of stages in the FFT block, and $S_{A,1}$ to $S_{A,s}$ are the approximation noise PSDs of the adders employed in stages 1 to s. If all adders in all stages have the same structure, the added noise in all the stages will be the same, and the output noise PSD of the FFT block may be simplified to

$$\hat{S}_{DIF,R} = \hat{S}_{DIF,I} = 2^s S_i + \left(2^s - 1\right)\left(S_q + \frac{3}{2} S_A\right). \qquad (27)$$

To obtain the power of the noise at the output from the PSD determined in (24) to (27), the autocorrelation function (see (4)) should be determined by [34]

$$R[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} S\left(e^{j\omega}\right) e^{jm\omega}\, d\omega. \qquad (28)$$

As mentioned in Section III, the power of the noise is the same as the autocorrelation function when m = 0 ($\hat{n}_o^2 = R[0]$). Furthermore, the right side of (28) is the integral of a continuous signal spectrum. In the case of the FFT, the spectrum of the signal is discrete, and the output power can be expressed by

$$R[0] = \hat{n}_o^2 = \sum_{j=0}^{N-1}\left(\hat{S}_R[j] + \hat{S}_I[j]\right) = N\left(\hat{S}_R[j] + \hat{S}_I[j]\right), \qquad (29)$$

where N is the number of outputs and $\hat{S}_R[j]$ ($\hat{S}_I[j]$) is the real (imaginary) part of the PSD determined at the jth output by employing (24) to (27).

V. RESULTS AND DISCUSSION

In this section, the efficacy of the proposed error estimation for the considered DSP blocks is assessed. For this purpose, output qualities were studied for three cases: estimating the noise by employing the analytical approach proposed in Section III for each DSP block (denoted by ANL), simulating the DSP blocks by employing exact adders and injecting noises based on the compact representation model proposed in Section III by Definition 1 (denoted by MOD), and pure simulation of the DSP blocks using the approximate adders without the approximation noise model (denoted by SIM). These cases were implemented in MATLAB, with the SNR (PSD) as the quality metric. The estimation errors and accuracies of the ANL and MOD cases were obtained by comparing them to the SIM case. In all cases and for all implementations, 24-bit approximate adders were employed for the addition operation. Because the studies were based on the SNR, the width of the operations did not impact the final results.

A. Low-Pass FIR Filter

In this work, to evaluate the proposed estimation approach, 10-tap (10 add operations) and 40-tap (40 add operations) low-pass FIR filters were designed with the conventional and linear-phase structures by utilizing 24-bit adders. The SNRs at the outputs of the FIR filters in the three studied cases ($SNR_{ANL}$, $SNR_{MOD}$, and $SNR_{SIM}$) implemented by LPA adders were studied for several k values, and the inaccuracies of the analysis and models are depicted in Fig. 8. The conventional and linear-phase structures are denoted by conv. and L.P., respectively. On average, the inaccuracy of $SNR_{ANL}$ in the conventional (linear-phase) structure of the 10-tap and 40-tap FIRs was 2.4dB (2.4dB) and 2.7dB (2.6dB), respectively. The maximum inaccuracy of $SNR_{ANL}$ in the conventional (linear-phase) structure belongs to AMA-III (AMA-V), which was 4.9dB (4.7dB).

B. DCT

The DCT is the second DSP block studied for evaluating the accuracy of the proposed approach. The DCT was designed using the simple (28 add operations) and improved (33 add operations) structures. To perform this study, 16384 random numbers (2048 input sets) were injected into the DCT block as the input signal, and the SNRs of the outputs were calculated. The inaccuracies of the estimation and modeling of the SNR for six selected approximate adders are reported in Fig. 8.

C. Fast Fourier Transform

To assess the effectiveness of the proposed estimation and modeling approaches, the FFT block was designed with DIT and DIF structures. The number of input samples was the same as the number of FFT points (NFFT). The FFT block was implemented for different FFT points, inexact part widths, and structures. Adders in all stages had the same structure. For evaluating the accuracy of the proposed approach, the PSD, whose inaccuracy is the same as the error of the SNR (see (29)), was considered.

The accuracies of the analysis and models compared to the simulations for 256-point (6144 add operations) and 1024-point (30720 add operations) FFT implementations in different structures are demonstrated in Fig. 8. It should be mentioned that, in the case of the FFT (FIR), the results for AMA-II and AMA-IV for 256-point (10-tap) and AMA-III and AMA-V for 1024-point (40-tap) are representative of those for all eight adder types. On average, the inaccuracy of
Fig. 8. The inaccuracies of the quality analysis and modeling for a) 10-tap and 40-tap FIR filters, b) DCT, and c) 256- and 1024-point FFT compared to the case of the simulation approach (SIM).
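As a sanity check on the stage-by-stage accumulation in (24) and its closed form (25), the sketch below evaluates the per-output noise PSD of an s-stage radix-2 DIT FFT both ways and then converts it to an output noise power with (29). Only the DIT expressions are coded. The PSD values are arbitrary placeholders, and the function names are illustrative.

```python
import math

def dit_noise_psd(s_i, s_q, s_a):
    """Per-output noise PSD of an s-stage DIT FFT, following (24).

    s_a = [S_A1, ..., S_As]: approximation-noise PSD of the adders in each stage.
    """
    s = len(s_a)
    quant = sum(2 ** i for i in range(1, s + 1)) * s_q        # sum_{i=1}^{s} 2^i S_q
    approx = sum(2 ** (s - j) * s_a[j] for j in range(s))    # 2^s S_A1 + ... + 2 S_As
    return 2 ** s * s_i + quant + approx

def dit_noise_psd_uniform(s, s_i, s_q, s_a):
    """Closed form (25) when all stages use the same adder (S_Aj = S_A)."""
    return 2 ** s * s_i + (2 ** (s + 1) - 2) * (s_q + s_a)

def fft_noise_power(psd_real, psd_imag, n_points):
    """Output noise power from (29) for a PSD that is the same at every output."""
    return n_points * (psd_real + psd_imag)

# Hypothetical 256-point example (s = 8 stages) with placeholder PSD values.
s, s_i, s_q, s_a_each = 8, 1e-4, 2e-4, 3e-3
psd_staged = dit_noise_psd(s_i, s_q, [s_a_each] * s)
psd_closed = dit_noise_psd_uniform(s, s_i, s_q, s_a_each)
print("(24) vs (25):", psd_staged, psd_closed, math.isclose(psd_staged, psd_closed))
print("output noise power (29):", fft_noise_power(psd_closed, psd_closed, 2 ** s))
```

The closed form holds because both geometric sums in (24) equal $2^{s+1} - 2$ when every stage has the same $S_A$.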
benchmark, in most cases, the inaccuracy of the proposed method was almost equal to or less than 3dB (see Fig. 8).

VI. ANALYSIS-BASED OPTIMIZATION

In the previous sections, we introduced a general yet simple analytical model for describing the power of the approximation noise, which estimated the output quality of DSP blocks with good accuracy. In this section, we discuss the use of the analysis-based approach for optimizing hardware implementations that use the approximate computing paradigm to reduce delay and energy consumption. Here, we propose an optimization method based on the proposed analytical quality estimation approach. As was shown in Section IV, without loss of generality, the adders of a DSP block could be categorized into S stages (e.g., see Figs. 4-6), where the adders in a stage have a similar configuration. For the optimization, the SNR is used as the metric for determining the output quality. The relation between the approximation noise and SNR was shown in (9), which can be rewritten as

$$\hat{n}_A^2 = A_s^2\, 10^{-\frac{SNR}{10}} - \hat{n}_q^2 - \hat{n}_i^2, \qquad (30)$$

where $\hat{n}_q^2$ ($\hat{n}_i^2$) is the power of the quantization (input) noise appearing at the output of the DSP block and $A_s^2$ is the power of the signal. For example, in the case of the linear-phase FIR (see Theorem 3), $\hat{n}_i^2$ and $\hat{n}_q^2$ are $2n_i^2 \times \sum_{i=0}^{m/2} h[i]^2$ and $n_q^2 \times \frac{m}{2}$, respectively. The approximation noise ($\hat{n}_A^2$) obtained by (30) is a function of the widths of the inexact parts ($k_j$, where j is the stage number) of the LPA adders, given by

$$\hat{n}_A^2 = \sum_{j=1}^{S} a_j n_{A_j}^2 = \sum_{j=1}^{S} a_j \frac{P_e}{3} 2^{2k_j}, \qquad (31)$$

where $a_j$ is the coefficient of the approximation noise of the jth-stage adders propagated to the output. In the case of the linear-phase FIR (see Theorem 3), $a_1$ and $a_2$ are $\sum_{i=0}^{m/2} h[i]^2$ and $\frac{m}{2}$, respectively.

By using (30), the noise budget (the maximum $\hat{n}_A^2$ based on the minimum SNR constraint) that can be spent on approximating the design for low energy/high performance is determined. Using this noise budget guarantees achieving the desired output quality. Also, (31) determines $\hat{n}_A^2$ based on the inexact widths of the approximate adders utilized in the block. Therefore, the constraint can be expressed as

$$\sum_{j=1}^{S} a_j \frac{P_e}{3} 2^{2k_j} \le \hat{n}_A^2 = A_s^2\, 10^{-\frac{SNR}{10}} - \hat{n}_q^2 - \hat{n}_i^2. \qquad (32)$$

This leads us to define the constraint function given in Definition 2 for our optimization problem.

Definition 2: The constraint function (G()) is a function that relates the SNR metric to the inexact parts of the LPA adders. Thus, we define the constraint function for the LPA adders as

$$G\left(\hat{n}_A^2, k_1, \ldots, k_S\right) = \hat{n}_A^2 - \sum_{j=1}^{S} a_j \frac{P_e}{3} 2^{2k_j}. \qquad (33)$$

To complete our optimization formulation, we should also model the energy and delay as functions of the inexact part width of the approximate adders. If the carry propagation delay of the inexact full-adder is smaller than that of the exact full-adder in the LPA adders, the delay function (D()) of the DSP block may be obtained from

$$D(k_1, \ldots, k_S) = nD_{EXT} - \min(k_1, \ldots, k_S)\left(D_{EXT} - D_{APX}\right), \qquad (34)$$

where $D_{EXT}$ ($D_{APX}$) is the carry propagation delay of the exact (approximate) full-adder and n is the adder bit length. The first term on the RHS represents the delay of the exact adder, the term ($D_{EXT} - D_{APX}$) denotes the delay difference between the exact and approximate FAs, and the clock of the system is set by $\min(k_1, \ldots, k_S)$. To be more specific, as the minimum of the $k_j$ (the minimum width of the inexact parts) increases, a higher clock rate can be applied, minimizing the delay of the block.

Now, let us assume that the energy saving obtained by approximating 1 bit in LPA adders is $E_S$, and the total energy of the exact implementation of the DSP block is $E_{EXT}$. Thus, we can express the energy function (E()) for the DSP block implemented by LPA adders as

$$E(k_1, \ldots, k_S) = E_{EXT} - \sum_{j=1}^{S} m_j E_S k_j, \qquad (35)$$

where $m_j$ is the number of adders employed in the jth stage. In addition to the energy and delay, the energy-delay product (EDP) is another parameter that may be considered as the optimization objective function. Based on (34) and (35), the EDP() function may be written as

$$\mathrm{EDP}(k_1, \ldots, k_S) = E(k_1, \ldots, k_S) \times D(k_1, \ldots, k_S). \qquad (36)$$

Now, the optimization problem is defined by

$$\min\; F(k_1, \ldots, k_S) \quad \text{subject to} \quad G\left(\hat{n}_A^2, k_1, \ldots, k_S\right) \ge 0, \qquad (37)$$

where $F(k_1, \ldots, k_S)$ is the objective function (which can be one of the functions in (34) to (36)).

If minimizing the delay is the objective, the min operation of (34) (i.e., $\min(k_1, \ldots, k_S)$) should be maximized, which minimizes (34). According to the constraint function given in (31), increasing each $k_j$ may lead to some quality reduction and vice versa. Therefore, the minimum delay may be achieved when the design parameters ($k_j$) of all the stages have the same value.

In the case of optimizing the energy and EDP, the Lagrange multipliers method may be employed [37]. The Lagrange multipliers method can find the local maxima and minima of a function subject to equality constraints. By defining the parameter λ as the Lagrange multiplier, the Lagrange function (L()) for the optimization problem may be expressed as

$$L(k_1, \ldots, k_S, \lambda) = F(k_1, \ldots, k_S) - \lambda G(k_1, \ldots, k_S). \qquad (38)$$

To obtain the optimum point, the stationary point, which is the point where the partial derivatives of the Lagrange function are zero, should be found. The partial derivatives ($\nabla L()$) of the Lagrange function are determined by

$$\nabla_{k_1, \ldots, k_S, \lambda} L(k_1, \ldots, k_S, \lambda) = \left(\frac{\partial L}{\partial k_1}, \ldots, \frac{\partial L}{\partial k_S}, \frac{\partial L}{\partial \lambda}\right). \qquad (39)$$
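Under a continuous relaxation of the $k_j$, the energy objective (35) with constraint (33) has a short closed-form stationary point of (38). The derivation here is our own sketch and is not taken from the paper's tables: setting $\partial L/\partial k_j = 0$ gives $a_j 2^{2k_j} \propto m_j$, and enforcing $G = 0$ fixes the constant, so $2^{2k_j} = 3\hat{n}_A^2 m_j / (P_e a_j m_t)$ with $m_t = \sum_j m_j$. The code computes the noise budget from (30), evaluates this stationary point, and floors it to integers, which keeps (32) satisfied because the noise in (31) grows with each $k_j$. All numeric inputs (SNR, signal and noise powers, $a_j$, $m_j$, $P_e$) are hypothetical.

```python
import math

def noise_budget(snr_db, signal_power, quant_noise, input_noise):
    """Maximum approximation-noise power for a target SNR, following (30)."""
    return signal_power * 10 ** (-snr_db / 10) - quant_noise - input_noise

def approx_noise(k, a, p_e):
    """Output approximation-noise power of (31): sum_j a_j (P_e/3) 2^(2 k_j)."""
    return sum(aj * p_e / 3 * 2 ** (2 * kj) for aj, kj in zip(a, k))

def energy_optimal_k(budget, a, m, p_e):
    """Continuous stationary point of (38) with (35) as the objective (our derivation).

    dL/dk_j = 0 gives a_j 2^(2 k_j) proportional to m_j; G = 0 in (33) then
    fixes 2^(2 k_j) = 3 * budget * m_j / (P_e * a_j * m_t), where m_t = sum(m).
    """
    m_t = sum(m)
    return [0.5 * math.log2(3 * budget * mj / (p_e * aj * m_t)) for aj, mj in zip(a, m)]

# Hypothetical two-stage block (e.g., a linear-phase FIR with stage coefficients a_1, a_2).
budget = noise_budget(snr_db=60, signal_power=1e12, quant_noise=1e3, input_noise=1e3)
a, m, p_e = [0.5, 20.0], [20, 20], 0.5
k_cont = energy_optimal_k(budget, a, m, p_e)
k_int = [math.floor(kj) for kj in k_cont]      # flooring keeps (32) satisfied
print("budget:", budget)
print("continuous k:", [round(kj, 3) for kj in k_cont], "noise:", approx_noise(k_cont, a, p_e))
print("integer k:", k_int, "feasible:", approx_noise(k_int, a, p_e) <= budget)
```

Flooring is the conservative choice; rounding can recover slightly more energy saving but may exceed the budget, so a rounded result should be re-checked against (32).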
TABLE III
THE OPTIMUM PARAMETERS AND CONDITIONS FOR DELAY AND ENERGY OPTIMIZATION
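A brute-force search gives a reference point for the analytical delay, energy, and EDP optima. The sketch below enumerates integer inexact widths, discards points that violate (32), and keeps the one with the smallest EDP from (34)–(36). It also finds the largest common k that meets the budget, which is the minimum-delay choice discussed after (37). The bit length, delays, energies, and stage parameters are hypothetical values chosen only for illustration.

```python
from itertools import product

def delay(k, n, d_ext, d_apx):
    """(34): D = n*D_EXT - min(k)*(D_EXT - D_APX)."""
    return n * d_ext - min(k) * (d_ext - d_apx)

def energy(k, m, e_ext, e_s):
    """(35): E = E_EXT - sum_j m_j E_S k_j."""
    return e_ext - e_s * sum(mj * kj for mj, kj in zip(m, k))

def approx_noise(k, a, p_e):
    """(31): sum_j a_j (P_e/3) 2^(2 k_j)."""
    return sum(aj * p_e / 3 * 4 ** kj for aj, kj in zip(a, k))

def exhaustive_edp(budget, a, m, p_e, n, d_ext, d_apx, e_ext, e_s, k_max):
    """Minimize EDP (36) over integer k_j in [0, k_max] subject to (32)."""
    best = None
    for k in product(range(k_max + 1), repeat=len(a)):
        if approx_noise(k, a, p_e) > budget:          # violates (32)
            continue
        edp = energy(k, m, e_ext, e_s) * delay(k, n, d_ext, d_apx)
        if best is None or edp < best[0]:
            best = (edp, k)
    return best

# Hypothetical parameters (24-bit adders, normalized delays and energies).
budget, a, m, p_e = 998000.0, [0.5, 20.0], [20, 20], 0.5
n, d_ext, d_apx, e_ext, e_s, k_max = 24, 1.0, 0.4, 1000.0, 1.0, 16

best_edp, best_k = exhaustive_edp(budget, a, m, p_e, n, d_ext, d_apx, e_ext, e_s, k_max)
# Minimum delay: the largest common k that still meets the budget.
k_eq = max(k for k in range(k_max + 1) if approx_noise((k,) * len(a), a, p_e) <= budget)
print("min-EDP k:", best_k, "EDP:", best_edp)
print("min-delay common k:", k_eq, "delay:", delay((k_eq,) * len(a), n, d_ext, d_apx))
```

The search grows as $(k_{max}+1)^S$, which is tolerable for a few stages but quickly becomes expensive, which is the motivation for the analytical route.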
approach, when the floor and round functions were applied, were less than 0.9% and 0.4%, respectively. Table VII shows the parameters and optimum points achieved by the proposed optimization method for the improved structure of the DCT when utilizing AMA-II.

VII. CONCLUSION
In this paper, a framework for accurate yet analytical output quality estimation of DSP designs realized using approximate adders was presented. The error of low-power approximate adders was treated as an additive noise (approximation noise) that disturbs the signals in the digital signal processors. A mathematical modeling approach for describing the power of the approximation noise was developed, providing accurate yet simple expressions for calculating the noise power and the SNR of the output signal in DSP blocks. To evaluate the proposed framework, the output quality of several DSP blocks, including an FIR filter, a DCT, and an FFT, was studied. These blocks represent different types (in both the time and frequency domains) and complexities (the largest involving more than 30,720 addition operations). On average, the SNRs estimated by the proposed analytical model differed from those obtained by pure simulation by less than 2.5 dB, indicating that the model estimates the output quality accurately. In addition, an analytical optimization approach based on Lagrange multipliers was presented, in which the energy, delay, and EDP were minimized subject to a desired quality specified as an SNR. The energy saving obtained by the proposed optimization approach differed by less than 1% from that obtained by the exhaustive search method.

APPENDIX A
Using (33) and (35), the Lagrange function for the energy optimization may be expressed as

$$L = E_{EXT} - \sum_{j=1}^{S} m_j E_S k_j - \lambda\left(\hat{n}_A^2 - \frac{P_e}{3}\sum_{j=1}^{S} a_j 2^{2k_j}\right). \tag{41}$$

Taking the partial derivatives of the Lagrange function, one obtains

$$\frac{\partial L}{\partial k_j} = -m_j E_S - 2\lambda \frac{P_e}{3} a_j 2^{2k_j}, \tag{42}$$

$$\frac{\partial L}{\partial \lambda} = -\hat{n}_A^2 + \frac{P_e}{3}\sum_{j=1}^{S} a_j 2^{2k_j}. \tag{43}$$

To find the stationary point, the partial derivatives should be equated to zero. Using (42), we may find

$$\frac{P_e}{3} a_j 2^{2k_j} = -\frac{m_j E_S}{2\lambda}. \tag{44}$$

By substituting (44) into (43), $\lambda$ can be determined as

$$\lambda = -\frac{\sum_{j=1}^{S} m_j E_S}{2\hat{n}_A^2}. \tag{45}$$

Now, utilizing (42) and (45), the energy-optimized $k_j$ is expressed as

$$k_{j,\mathrm{opt}} = \frac{\ln\left(\dfrac{m_j \hat{n}_A^2}{a_j \frac{P_e}{3}\left(m_1 + \cdots + m_S\right)}\right)}{2\ln 2}. \tag{46}$$

APPENDIX B
For optimizing the EDP, by substituting (34) and (35) into (36), the objective function may be expressed as

$$\mathrm{EDP} = nE_{EXT}D_{EXT} - E_{EXT}D_S k_m - nD_{EXT}E_S\sum_{j=1}^{S} m_j k_j + D_S E_S k_m \sum_{j=1}^{S} m_j k_j, \tag{47}$$

where $k_m$ and $D_S$ are $\min(k_1, \ldots, k_S)$ and $(D_{EXT} - D_{APX})$, respectively. Taking the partial derivatives of the Lagrange function, one obtains

$$\frac{\partial L}{\partial k_j} = -nD_{EXT}E_S m_j + E_S D_S m_j k_m - 2\lambda\frac{P_e}{3} a_j 2^{2k_j}, \tag{48}$$

$$\frac{\partial L}{\partial k_m} = -nD_{EXT}E_S m_m + 2E_S D_S m_m k_m - 2\lambda\frac{P_e}{3} a_m 2^{2k_m}. \tag{49}$$

According to (43) and using (48) and (49), $\lambda$ can be determined as

$$\lambda = -\frac{E_S\left(D_S k_m - nD_{EXT}\right)\sum_{j=1}^{S} m_j + E_S D_S m_m k_m}{2\hat{n}_A^2}. \tag{50}$$

By substituting (50) into (49), the solution of $\partial L/\partial k_m = 0$ can be determined as

$$2^{2k_m} = \frac{2D_S m_m k_m - nD_{EXT} m_m}{-D_S\left(m_t + m_m\right)k_m + nD_{EXT} m_t}\cdot\frac{3\hat{n}_A^2}{a_m P_e}, \tag{51}$$

where $m_t$ is $\sum_{j=1}^{S} m_j$. Once $\hat{n}_A^2$ is determined, (51), which follows the form $e^{x} = -C\,\frac{Ax - B}{x - D}$, has to be solved numerically. This optimization problem may either be infeasible or have more than one solution.

REFERENCES
[1] J. Kung, D. Kim, and S. Mukhopadhyay, "On the impact of energy-accuracy tradeoff in a digital cellular neural network for image processing," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 7, pp. 1070–1081, Jul. 2015.
[2] T. Moreau, A. Sampson, and L. Ceze, "Approximate computing: Making mobile systems more efficient," IEEE Pervasive Comput., vol. 14, no. 2, pp. 9–13, Apr. 2015.
[3] A. Madanayake et al., "Low-power VLSI architectures for DCT/DWT: Precision vs approximation for HD video, biomedical, and smart antenna applications," IEEE Circuits Syst. Mag., vol. 15, no. 1, pp. 25–47, 1st Quart., 2015.
[4] S. P. Kadiyala et al., "Perceptually guided inexact DSP design for power, area efficient hearing aid," in Proc. IEEE Biomed. Circuits Syst. Conf. (BioCAS), Oct. 2015, pp. 1–4.
[5] F. Samie, L. Bauer, and J. Henkel, "An approximate compressor for wearable biomedical healthcare monitoring systems," in Proc. Int. Conf. Hardw./Softw. Codesign Syst. Synthesis (CODES+ISSS), Oct. 2015, pp. 133–142.
[6] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010.
[7] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.
[8] Z. Yang, J. Han, and F. Lombardi, "Transmission gate-based approximate adders for inexact computing," in Proc. IEEE/ACM Int. Symp. Nanoscale Archit. (NANOARCH), Jul. 2015, pp. 145–150.
[9] H. A. F. Almurib, T. N. Kumar, and F. Lombardi, "Inexact designs for approximate low power addition by cell replacement," in Proc. Design, Automat. Test Eur. Conf. Exhib. (DATE), Mar. 2016, pp. 660–665.
[10] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.
[11] S. Geetha and P. Amritvalli, "High speed error tolerant adder for multimedia applications," J. Electron. Test., vol. 33, no. 5, pp. 675–688, Oct. 2017.
[12] N. Zhu, W. L. Goh, and K. S. Yeo, "An enhanced low-power high-speed adder for error-tolerant application," in Proc. 12th Int. Symp. Integr. Circuits, Dec. 2009, pp. 69–72.
[13] J. Hu and W. Qian, "A new approximate adder with low relative error and correct sign calculation," in Proc. Design, Automat. Test Eur. Conf. Exhib. (DATE), Mar. 2015, pp. 1449–1454.
[14] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, "A low latency generic accuracy configurable adder," in Proc. ACM/EDAC/IEEE Des. Automat. Conf. (DAC), Jun. 2015, pp. 1–6.
[15] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, "RAP-CLA: A reconfigurable approximate carry look-ahead adder," IEEE Trans. Circuits Syst., II, Exp. Briefs, to be published.
[16] M. A. Hanif, R. Hafiz, O. Hasan, and M. Shafique, "QuAd: Design and analysis of quality-area optimal low-latency approximate adders," in Proc. ACM 54th Des. Automat. Conf., Jun. 2017, pp. 1–6.
[17] W. Liu, L. Chen, C. Wang, M. O'Neill, and F. Lombardi, "Design and analysis of inexact floating-point adders," IEEE Trans. Comput., vol. 65, no. 1, pp. 308–314, Jan. 2016.
[18] B. Shao and P. Li, "Array-based approximate arithmetic computing: A general model and applications to multiplier and squarer design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 4, pp. 1081–1090, Apr. 2015.
[19] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, "Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 4, pp. 1352–1361, Apr. 2017.
[20] S. Mazahir, O. Hasan, R. Hafiz, M. Shafique, and J. Henkel, "Probabilistic error modeling for approximate adders," IEEE Trans. Comput., vol. 66, no. 3, pp. 515–530, Mar. 2017.
[21] M. K. Ayub, O. Hasan, and M. Shafique, "Statistical error analysis for low power approximate adders," in Proc. ACM/EDAC/IEEE 54th Des. Automat. Conf. (DAC), Jun. 2017, pp. 1–6.
[22] J. Huang, J. Lach, and G. Robins, "Analytic error modeling for imprecise arithmetic circuits," in Proc. SELSE, 2011, pp. 1–4.
[23] J. Huang, J. Lach, and G. Robins, "A methodology for energy-quality tradeoff using imprecise hardware," in Proc. ACM/EDAC/IEEE Design Automat. Conf. (DAC), 2012, pp. 504–509.
[24] W.-T. J. Chan, A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, "Statistical analysis and modeling for error composition in approximate computation circuits," in Proc. Int. Conf. Comput. Des. (ICCD), Oct. 2013, pp. 47–53.
[25] S. Lee, D. Lee, K. Han, E. Shriver, L. K. John, and A. Gerstlauer, "Statistical quality modeling of approximate hardware," in Proc. Int. Symp. Qual. Electron. Design (ISQED), Mar. 2016, pp. 163–168.
[26] C. Li, W. Luo, S. S. Sapatnekar, and J. Hu, "Joint precision optimization and high level synthesis for approximate computing," in Proc. ACM/EDAC/IEEE Des. Automat. Conf. (DAC), Jun. 2015, pp. 1–6.
[27] D. Sengupta, F. S. Snigdha, J. Hu, and S. S. Sapatnekar, "SABER: Selection of approximate bits for the design of error tolerant circuits," in Proc. ACM/EDAC/IEEE Des. Automat. Conf. (DAC), Jun. 2017, pp. 1–6.
[28] F. S. Snigdha, D. Sengupta, J. Hu, and S. S. Sapatnekar, "Optimal design of JPEG hardware under the approximate computing paradigm," in Proc. ACM/EDAC/IEEE Des. Automat. Conf. (DAC), Jun. 2016, pp. 1–6.
[29] C. Liu, J. Han, and F. Lombardi, "An analytical framework for evaluating the error characteristics of approximate adders," IEEE Trans. Comput., vol. 64, no. 5, pp. 1268–1281, May 2015.
[30] J. Miao, K. He, A. Gerstlauer, and M. Orshansky, "Modeling and synthesis of quality-energy optimal approximate adders," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2012, pp. 728–735.
[31] D. Sengupta and S. S. Sapatnekar, "FEMTO: Fast error analysis in multipliers through topological traversal," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2015, pp. 294–299.
[32] B. Widrow, "A study of rough amplitude quantization by means of Nyquist sampling theory," IRE Trans. Circuit Theory, vol. 3, no. 4, pp. 266–276, Dec. 1956.
[33] K. P. Parker and E. J. McCluskey, "Probabilistic treatment of general combinational networks," IEEE Trans. Comput., vol. C-24, no. 6, pp. 668–670, Jun. 1975.
[34] A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes. New York, NY, USA: McGraw-Hill, 2002.
[35] L. Wanhammar, DSP Integrated Circuits. New York, NY, USA: Academic, 1999.
[36] M. S. Khairy, A. Khajeh, A. M. Eltawil, and F. J. Kurdahi, "Equi-noise: A statistical model that combines embedded memory failures and channel noise," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 2, pp. 407–419, Feb. 2014.
[37] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.

Masoud Pashaeifar received the B.Sc. degree from the Shahid Bahonar University of Kerman, Kerman, Iran, in 2011, and the M.Sc. degree in electrical engineering, circuits and systems from the University of Tehran, Tehran, Iran, in 2013, where he is currently pursuing the Ph.D. degree in circuits and systems. His current research interests include approximate computing, robust and energy efficient signal-processing, and Internet of Things.

Mehdi Kamal received the B.Sc. degree from the Iran University of Science and Technology, Tehran, Iran, in 2005, the M.Sc. degree from the Sharif University of Technology, Tehran, in 2007, and the Ph.D. degree from the University of Tehran, Tehran, Iran, in 2013, all in computer engineering. He is currently an Assistant Professor with the School of Electrical and Computer Engineering, University of Tehran. His current research interests include reliability in nanoscale design, approximate computing, neuromorphic computing, design for manufacturability, embedded systems design, and low-power design.

Ali Afzali-Kusha received the B.Sc. degree from the Sharif University of Technology, Tehran, Iran, in 1988, the M.Sc. degree from the University of Pittsburgh, Pittsburgh, PA, USA, in 1991, and the Ph.D. degree from the University of Michigan, Ann Arbor, MI, USA, in 1994, all in electrical engineering.
He was a Post-Doctoral Fellow with the University of Michigan from 1994 to 1995. He has been with the University of Tehran since 1995, where he is currently a Professor with the School of Electrical and Computer Engineering and the Director of the Low-Power High-Performance Nanosystems Laboratory. He was a Research Fellow with the University of Toronto, Toronto, ON, Canada, and the University of Waterloo, Waterloo, ON, Canada, in 1998 and 1999, respectively. His current research interests include low-power high-performance design methodologies from the physical design level to the system level for the nanoelectronics era.

Massoud Pedram received the B.S. degree in electrical engineering from the California Institute of Technology, Pasadena, CA, USA, in 1986, and the M.S. and Ph.D. degrees in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA, in 1989 and 1991, respectively. In 1991, he joined the Ming Hsieh Department of Electrical Engineering, University of Southern California (USC), Los Angeles, CA, USA, where he is currently the Stephen and Etta Varra Professor with the USC Viterbi School of Engineering. He was a recipient of the National Science Foundation's Young Investigator Award in 1994, the Presidential Early Career Award for Scientists and Engineers in 1996, two Design Automation Conference Best Paper Awards, the Distinguished Paper Citation from the International Conference on Computer Aided Design, three Best Paper Awards from the International Conference on Computer Design, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS Best Paper Award, and the IEEE Circuits and Systems Society Guillemin-Cauer Award.