
Multi-Level Approximate Logic Synthesis under General Error Constraints

Abstract

Approximate computing has recently emerged as an alternative computing paradigm that trades small accuracy losses for large complexity and energy savings. In this paper, we address the problem of multi-level approximate logic synthesis (MALS). The goal is to synthesize circuits with minimum gate cost whose functionality deviates in a well-controlled manner, where we allow error frequency and magnitude to be constrained.

We make two major contributions. First, we adopt the formulation of magnitude-constrained problems as Boolean relation minimizations for MALS. This formulation is more general and grants better flexibility on error constraints than existing approaches relying on incompletely specified functions. Furthermore, by expanding the search space, it leads to better solutions. Second, we propose two algorithms for solving the MALS problem. Both algorithms are based on novel relax-and-recover heuristics: an over-optimistic solution under relaxed constraints is first explored, and this solution is then iteratively recovered until all constraints are met. Our first algorithm addresses the MALS problem under general magnitude and frequency constraints. We further introduce a second algorithm that is more effective in solving a simplified MALS problem in which frequency is the only constraint.

When applied to typical arithmetic blocks, such as adders or multipliers, our proposed algorithms can synthesize circuits under tight error constraints with gate counts reduced by up to 50%. This is up to 20% fewer gates than existing approaches. Furthermore, our simplified algorithm results in up to 26% lower gate counts for frequency-only constrained problems on which the general, magnitude-based solutions are less effective.

1. Introduction

Energy minimization has become the major concern in the design of VLSI systems. One way to reduce energy consumption is to exploit the trade-off between reduced computation accuracy and improved energy efficiency for digital systems that naturally tolerate errors, such as signal processing circuits. Many recent approaches have studied the possibility of approximate computation at different levels, ranging from algorithms [5, 15] and architectures [8, 9, 11, 18] to the logic [6, 10, 20] and transistor levels [7].

One class of techniques seeks to realize approximate computation by deriving approximate, or inexact, versions of specified combinational circuits. The derived implementations typically result in logic of reduced complexity, smaller area, delay, and energy. Such logic-level optimizations have been applied in an ad-hoc manner to several arithmetic building blocks, such as adders and multipliers [1, 6, 10, 14, 20, 21].

There is a need to develop effective techniques for systematic, automated approximate logic synthesis with full capability of constraining arbitrary error types. Designing approximate circuits in an ad-hoc manner is not a viable option, as there exists a large design space with trade-offs between acceptable accuracy and energy, where acceptable errors may vary from application to application. Importantly, depending on the application, the error tolerance is primarily a function of either the frequency of errors, the magnitude of errors, or both. For example, in applications that cannot directly accept erroneous results, the frequency of triggering correction mechanisms determines the ultimate overhead. By contrast, in image and video processing applications, if the produced pixel values have small error magnitudes, human perception will not be able to distinguish such subtle changes, especially under typically employed quadratic error metrics such as Peak Signal-to-Noise Ratio (PSNR).

Existing approximate logic synthesis approaches have thus far primarily focused on dealing with a single type of error constraint. A two-level approximate logic synthesis algorithm was introduced in [17]. In that work, the objective is to synthesize a minimized circuit under constrained error frequency. The algorithm does not consider constraints on error magnitude. Moreover, it suffers from high runtime complexity, especially at large error frequencies. In [13], the authors rigorously synthesize approximate logic circuits under both magnitude and frequency types of error constraints, but aim at solving the problem in two-level form, i.e. the simplest approximate Boolean function. However, the optimal two-level solution may not lead to an optimal multi-level circuit. In [4, 12], the authors propose approaches to pruning multi-level circuits according to the input vector statistics in an application context. Due to the nature of the proposed pattern-driven approach, the optimization space is restricted to only a small subset of inputs. In [19], the authors map the approximate logic synthesis problem into a conventional logic synthesis with external don't care (EXDC) sets, where the EXDC set of the original circuit is identified by extracting the internal observability don't care (ODC) set of an auxiliary wrapper circuit that needs to be provided to model the constraints. Although the authors claim the ability to flexibly model various error constraints, only error magnitude-type constraints are actually permitted. Error frequency constraints cannot be handled in their framework. Furthermore, the EXDC set extraction is a special case of incompletely specified Boolean functions (ISFs), which are by definition for single outputs only and thus cannot capture correlated multi-output bit errors. By contrast, we aim to support the definition of flexible error constraints directly as part of the problem specification itself.
In this paper, we address the problem of multilevel approximate
logic synthesis (MALS) under arbitrary error magnitude and error
frequency constraints. We develop two heuristic algorithms that can
effectively synthesize approximate circuits with reduced gate counts
whose errors deviate from an exact circuit in a well-constrained way. We make two major contributions:

(1) We adopt Boolean relation (BR) minimization to formulate the error magnitude constraint. This formulation is more general and flexible in representing constraints on errors than the existing ISF-based approach relying on incompletely specified functions (ISFs), such as the approaches based on the EXDC set. By expanding the search space, a BR-based approach leads to better solutions compared to the existing ISF-based techniques [19].

(2) We propose two algorithms for solving the MALS problem. Both algorithms are based on novel relax-and-recover (RAR) heuristics: first, an over-optimistic solution under relaxed constraints is found, which is then iteratively refined until all constraints are met. The first algorithm solves the MALS problem under both error magnitude and frequency constraints. We also introduce a second algorithm that is more effective in solving a simplified MALS problem in which the frequency of errors is the only constraint.

In summary, we present two efficient heuristic RAR algorithms to solve MALS problems under arbitrary error constraints. Experiments demonstrate that our proposed algorithms can synthesize approximate circuits with up to 50% fewer gates for very tight error constraints, which is up to 20% better than ISF-based methods. Furthermore, our simplified algorithm results in up to 26% lower gate counts for frequency-only constrained problems on which the general, magnitude-based solutions are less effective.

2. MALS Formulation

Consider an n-input, k-output combinational logic network G that is an implementation of a Boolean function F : B^n -> {0, 1, -}^k, where "-" refers to a don't care. A multi-level approximate logic synthesis problem is concerned with formally synthesizing a minimum-cost (gate count or delay) multi-level approximate logic network whose behavior deviates in a well-controlled manner from the specified exact Boolean function F. The deviations can be specified by error magnitude and error frequency. The error magnitude constraint means that for a certain input vector x_i in B^n, the approximate circuit should produce an output value within the specified absolute error range M. On the other hand, we want to constrain the total number of input vectors that produce approximate outputs. We call this the error frequency (rate) constraint. Error magnitude and error frequency describe the two important aspects of possible deviations. We denote by G_{m,r} an approximate version of G with exactly r inputs in error and with the largest magnitude of error being m. Let R be the error frequency constraint, indicating that no more than R input patterns are allowed to be in error. Furthermore, let M represent the error magnitude constraint, where |.| is the absolute value operator. With this, the full multi-level approximate logic synthesis problem is:

    min  C(G_{m,r})
    s.t. |G_{m,r}(x_i) - G(x_i)| <= M,  for all x_i in B^n        (1)
         r <= R

where, in this paper, C(G_{m,r}) is the gate count of G_{m,r}, and G_{m,r} has no more logic levels than the exact network G.

The problem defined in (1) captures the general constraints of both error types. The primary goal in problem (1) is to utilize the alternative output values to simplify the circuit complexity when the output values are constrained to take on certain patterns. By contrast, if we remove the error magnitude constraint, we can define a simplified variant of the problem in which any difference in the output is treated as an error:

    min  C(G_r)
    s.t. r <= R                                                   (2)

Problem (1) is different from problem (2) in that the latter problem is not concerned with specific output values. As such, it can be solved more efficiently. In this paper, we present two algorithms for solving both types of problems.

A key observation is that in problem definition (1), the error magnitude constraint essentially describes a Boolean relation, where each input maps to more than one output value [3].

Definition 2.1 (Boolean relation). A Boolean relation is a one-to-many, multi-output Boolean mapping BR : B^n -> B^k. A set of multi-output Boolean functions f_i, each compatible with BR, is associated with a relation. A Boolean relation is specified by defining for each input x_i in B^n a set of equivalent outputs I_{x_i} contained in B^k.

The error magnitude constraint in problem (1) can be represented by the set I_{x_i}, where a magnitude constraint means that the approximate circuit should produce outputs within the given set I_{x_i}. We propose to use the framework of Boolean relations to solve the MALS problem. Given a Boolean relation BR, the problem is to find a multi-output, multi-level Boolean network G_m, which is an implementation of a Boolean function F_m, such that F_m is compatible with BR. If a cost function C(.) is given, we say G_m is minimum if C(G_m) <= C(G_mi) for all implementations G_mi compatible with BR.

A Boolean relation is a generalization of the concept of ISFs. It allows capturing more complex constraints and thus allows for better solutions. The work of [19] maps the multi-level approximate logic synthesis problem into a conventional netlist optimization problem with external don't care (EXDC) sets, which are computed through single-output-bit don't care extraction. This single-output, ISF-based approach does not expose the full flexibility of error magnitude specifications. For example, for a 2-bit input pattern {01}, if it relates to three alternative output values {11, 00, 01}, the don't care form can only capture 2 out of the 3 alternatives, i.e. {0-} or {-1}, where {00, 11} can never be captured together by don't cares due to the limitations of their single-output nature.

Solving a Boolean relation problem is more complex than solving an ISF problem. Our proposed algorithm is an approximate heuristic for multi-level Boolean relation minimization. In this paper, we map the problem of solving multi-output Boolean relation problems into a series of conventional multi-output don't care optimizations. The key insight is that through the proposed iterative algorithm, we can identify a set of EXDCs that captures deviations close or equal to the Boolean relation specification while allowing for a broader definition of deviations than the ISF-based EXDC set. In the following sections, we will focus on how to identify the optimal don't care sets.

As mentioned previously, problem (1) is different from problem (2). We will discuss two different algorithms for identifying the EXDC sets, respectively. Section 3 deals with the general error constraint MALS problem, while Section 4 addresses the simplified, error-frequency-only problem. Section 5 presents experimental results for both algorithms, and Section 6 concludes the paper with a summary.

3. MALS under Error Magnitude and Frequency Constraints

In this section, we present an algorithm for solving the MALS problem under general error magnitude type constraints, either jointly with or without error frequency constraints, i.e. the problem from (1). We denote this algorithm as RAR-MALS. We first map the error magnitude constraint of the general MALS problem into a Boolean relation minimization. Error frequency constraints are then considered afterwards.
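As a concrete illustration of this mapping, the following sketch (illustrative Python written for this text, not the authors' C++ implementation; all function and variable names are ours) constructs the allowed-output sets I_x of the Boolean relation from an exact function and a magnitude bound M, exactly as prescribed by the constraint in (1):

def boolean_relation_from_magnitude(exact, n_inputs, n_outputs, M):
    """For every input vector x, collect all output words y whose integer
    value differs from the exact output by at most M; the resulting map
    x -> I_x is the Boolean relation induced by the magnitude bound."""
    relation = {}
    for x in range(2 ** n_inputs):
        golden = exact(x)                     # exact integer output for x
        relation[x] = {y for y in range(2 ** n_outputs)
                       if abs(y - golden) <= M}
    return relation

# Example: a 2-bit adder (3 output bits) under a magnitude bound M = 1.
adder = lambda x: (x & 0x3) + ((x >> 2) & 0x3)
BR = boolean_relation_from_magnitude(adder, n_inputs=4, n_outputs=3, M=1)

Each BR[x] plays the role of I_x in Definition 2.1; an approximate netlist satisfies the magnitude constraint of (1) exactly when its output for every x lies in BR[x].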
As mentioned above, in our work the multi-level Boolean relation minimization problem is solved by classical don't care based multi-level logic synthesis. The central challenge is to identify the EXDC set that is closest or equal to, while also compatible with, the Boolean relation under a given synthesis strategy. Closeness here means finding the largest set of allowed deviations that can be commonly shared between the target EXDC set and the original BR. We call the closest compatible EXDC set a Relation-Aware EXDC (RA-EXDC) set. Once the RA-EXDC set is identified, the original circuit is simplified with the RA-EXDC set by applying standard don't care based netlist optimization techniques, such as SIS [16]. Note that the RA-EXDC concept is highly dependent on the backend netlist optimization tool being used. As such, the RA-EXDC set can only be defined in connection with a specific synthesis tool used for applying the EXDC-based optimizations.

The proposed algorithm is based on the idea of a relax-and-recover (RAR) strategy: we first relax the error constraints of the original MALS problem such that a lower-bound solution can be found, which covers all and more than the deviations specified by BR. The corresponding solution may violate the original constraints, which requires recovery steps to gradually refine the solution until it satisfies all constraints.

We propose a heuristic to iteratively identify the RA-EXDC set. This heuristic starts from overapproximated external don't care (O-EXDC) sets. The idea is to extract the O-EXDC set from a decoupled Boolean relation table. For example, assume that a certain input vector relates to two different 2-bit output values {00, 11}. We decouple the two output bits into independent outputs. Each output bit then has allowed output values of {0, 1}, equivalent to a don't care for each output. As such, the decoupled Boolean relation results in a more relaxed don't care set, which allows for more deviations than the original Boolean relation.

Hence, using O-EXDC sets for circuit simplification leads to the largest possible gate count reduction, but may cause the outputs for some input vectors to exceed the original relation constraints. That is what constitutes the relaxation step. In the later recovery step, the O-EXDC sets are iteratively refined until all the conflicts are eliminated and an RA-EXDC set is arrived at.

Existing work has demonstrated the possibility of using conventional ISF-based approaches to solve magnitude-only constrained MALS problems [19]. Such ISF-based approaches rely on conventional underapproximated EXDC (U-EXDC) sets, which, unless later expanded on, are inherently over-constrained and thus suboptimal. In our work, we will use U-EXDC sets as the baseline to guide the recovery procedure, which will be presented in Section 3.2. We omit the details of acquiring the U-EXDC set, which can either be done using a similar approach as proposed in [19], or by directly extracting the don't cares from the Boolean relation table.

An intuition for using a RAR strategy is that the simplified circuit obtained from the O-EXDC sets is expected to have input vectors producing approximate outputs beyond the input combinations that can be covered by the U-EXDC sets alone. In the following, we discuss the relaxation and recovery steps in detail.

3.1 MALS under Error Magnitude Constraint Relaxation

In the following, we illustrate how we define and compute the O-EXDC sets from the Boolean relation table using a simple example. As discussed before, once the O-EXDC set is computed, we synthesize a simplified multi-level circuit from it using standard black-box tools, such as SIS.

Consider a 2-input, 3-output multi-level circuit with error magnitudes as specified in Table 1. Note that the first output value in each row refers to the original correct netlist implementation.

Table 1: Boolean relation table
  Inputs (x1, x0)   Outputs (y2, y1, y0)
  00                {000, 001}
  01                {010, 011, 100}
  10                {100, 101, 010}
  11                {110, 100, 101}

The algorithm for identifying the O-EXDC sets then proceeds in two main steps:

1) We first decouple the multi-output bits of the Boolean relation into individual outputs. In doing so, we collect all possible output values for each output bit under each input vector. This provides a specification of relaxed deviations with flexibilities beyond those provided by the original Boolean relation (see Table 2, where "-" denotes a don't care).

Table 2: Decoupled Boolean relation
  Inputs (x1, x0)   y2   y1   y0
  00                0    0    -
  01                -    -    -
  10                -    -    -
  11                1    -    -

2) From the decoupled relation table, we then obtain the don't care set for every single output bit independently, assuming no correlations among outputs. For output bit j, if it corresponds to a don't care, we denote this output bit j as not being sensitive to the corresponding input vector. By collecting all input vectors that are not sensitive to output bit j, we can thus form the O-EXDC set with respect to output bit j. We encode a don't care condition in the ON-set and a care condition in the OFF-set for the O-EXDC sets, as shown in Table 3. As such, for the given example, the O-EXDC set for y2 is {x1'x0, x1x0'}, the O-EXDC set for y1 is {x1'x0, x1x0', x1x0}, and the O-EXDC set for y0 is the full set of input vectors.

Table 3: O-EXDC set for each output bit
  Inputs (x1, x0)   Decoupled (y2 y1 y0)   O-EXDC (y2 y1 y0)
  00                0  0  -                0  0  1
  01                -  -  -                1  1  1
  10                -  -  -                1  1  1
  11                1  -  -                0  1  1

Note that in the example of Table 3, none of the output bits is sensitive to the input vector 01. As such, input vector 01 also allows for an output value of, for instance, 111, which is not a valid output vector according to the original relation in Table 1. In general, the O-EXDC set covers all and more than the alternative output values that the original Boolean relation can specify. At the expense of requiring subsequent recovery of conflicts, this grants the full flexibility for simplifications in O-EXDC based synthesis.

We can summarize the flexibilities of specifying alternative output values among the U-EXDC, RA-EXDC, and O-EXDC sets as:

    U-EXDC  is contained in  RA-EXDC  is contained in  O-EXDC        (3)

For each output, the U-EXDC set is a subset of the RA-EXDC set, which in turn is a subset of the O-EXDC set. In other words, any input vector that is in the U-EXDC/RA-EXDC set of an output will also be in its RA-EXDC and/or O-EXDC sets. Likewise, any output that is (in-)sensitive to a particular input vector in the O-EXDC/RA-EXDC set will also be (in-)sensitive to the same vector in the RA-EXDC and/or U-EXDC sets. Table 4 shows the comparison for the example presented in Table 1. We encode a don't care by the ON-set and a care by the OFF-set in all cases.
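As an illustration of the decoupling in steps 1) and 2) above, the following sketch (again illustrative Python of ours, not the authors' tooling) derives the per-output O-EXDC sets directly from a Boolean relation table given as a dictionary of allowed-output sets:

def o_exdc_sets(relation, n_outputs):
    """O-EXDC of output bit j = all inputs whose allowed-output set contains
    both a 0 and a 1 in bit position j (bit j decoupled to a don't care)."""
    exdc = [set() for _ in range(n_outputs)]
    for x, allowed in relation.items():
        for j in range(n_outputs):
            if {(y >> j) & 1 for y in allowed} == {0, 1}:
                exdc[j].add(x)
    return exdc                     # exdc[j] is the O-EXDC set of output y_j

table1 = {0b00: {0b000, 0b001},
          0b01: {0b010, 0b011, 0b100},
          0b10: {0b100, 0b101, 0b010},
          0b11: {0b110, 0b100, 0b101}}
y2_exdc, y1_exdc, y0_exdc = reversed(o_exdc_sets(table1, 3))
# y2_exdc == {0b01, 0b10}, y1_exdc == {0b01, 0b10, 0b11}, and y0_exdc contains
# all four input vectors, matching the O-EXDC columns of Table 3.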
Table 4: Comparison of Boolean relation, U-EXDC, and O-EXDC
  Inputs (x1, x0)   Boolean Relation (y2, y1, y0)   U-EXDC (y2, y1, y0)   O-EXDC (y2, y1, y0)
  00                {000, 001}                      001                   001
  01                {010, 011, 100}                 001                   111
  10                {100, 101, 010}                 001                   111
  11                {110, 100, 101}                 010                   011

Algorithm 1: MALS under general error constraints
  Input:  NL: original netlist, BR: error magnitude constraint, R: error frequency constraint
  Output: minimized netlist with constrained error magnitude and error frequency
   1  EXDC_c = EXDC_o = Decouple(BR); EXDC_u = ISF(BR);
   2  NL_c = Optimize(NL, EXDC_o);
   3  Conf = {x_i | NL_c(x_i) not in BR(x_i), x_i in B^n};
   4  r = |{x_i | NL_c(x_i) != NL(x_i), x_i in B^n}| / 2^n;
   5  while (Conf is not empty) or (r > R) do
   6      y_apprx = NL_c(x_i);
   7      if Conf is not empty then
   8          foreach x_i : NL_c(x_i) not in BR(x_i) do
   9              a = BR(x_i);
  10              do
  11                  y_allow = argmin over a of Hamming(y_apprx, a);
  12                  y_remov = (y_allow XOR y_apprx) AND Candidate(EXDC_o, EXDC_u);
  13                  a = a - y_allow;
  14              while y_remov = 0;
  15              foreach non-zero bit b of y_remov do
  16                  EXDC_c <- remove input x_i from output bit b;
  17              end
  18          end
  19      end
  20      NL_c = Optimize(NL, EXDC_c);
  21      r = |{x_i | NL_c(x_i) != NL(x_i)}| / 2^n;
  22      if r > R then
  23          foreach x_i : NL_c(x_i) != NL(x_i) do
  24              y_remov = NL(x_i) XOR NL_c(x_i);
  25              EXDC_c <- remove input x_i from the least significant ON-bit in y_remov;
  26          end
  27          NL_c = Optimize(NL, EXDC_c);
  28          r = |{x_i | NL_c(x_i) != NL(x_i)}| / 2^n;
  29      end
  30      Conf = {x_i | NL_c(x_i) not in BR(x_i)};
  31  end
  32  return NL_c;

3.2 Recovering Magnitude Conflicts

After optimizing the netlist with the acquired O-EXDC sets, the resulting gate complexity represents a lower bound for the magnitude-only constrained MALS solution, but the resulting circuit will generally produce conflicting output values that deviate from the originally specified Boolean relation. We now address the recovery procedure to gradually remove the conflicts introduced by the O-EXDC sets. We use the U-EXDC set to guide the recovery, where it is guaranteed that the final solution is at least as good as any solution obtained by directly synthesizing with the U-EXDC set. In other words, all the conflicts fall into the difference between the U-EXDC and O-EXDC sets.

We follow a greedy approach, which first uses the O-EXDC sets to simplify the original circuit (e.g. via SIS) and then compares the output values of each input vector of the synthesized circuit with the original Boolean relation table. We denote those input vectors that violate the error constraints as conflict input vectors. We eliminate these conflicts by modifying the O-EXDC sets and re-synthesizing the original circuit in an iterative fashion. This process is repeated using updated EXDC sets in each iteration until the synthesized circuit is conflict free.

In the following, we describe how we obtain the updated EXDC sets in each iteration. We introduce the following terminology: for every output bit j and its corresponding O-EXDC/U-EXDC sets, if an input vector x_i is in the O-EXDC set but not in the U-EXDC set, we denote x_i as a candidate input vector with respect to output bit j. Each candidate input vector x_i can correspond to more than one output bit, i.e. the input vector is a candidate for all output bits that are not sensitive to this x_i according to the O-EXDC sets but are sensitive according to the U-EXDC sets. These output bits together form the candidate output bits for input vector x_i. In other words, by comparing the encoded O-EXDC and U-EXDC sets for every input vector x_i, output bits for which the encoding in O-EXDC differs from that in U-EXDC are denoted as candidate output bits for input vector x_i (note that if a U-EXDC value is 1, its O-EXDC encoding will also be 1, i.e. the U-/O-EXDC sets can only differ in a 0/1 pattern). For example, in Table 4, y1 and y2 are candidate output bits for input vector 10. Note that after optimizing the netlist with the O-EXDC sets, not all candidate input vectors will eventually violate the error constraints. In other words, the conflict input vectors and output bits are a subset of the candidate input vectors and output bits.

Consider the example in Table 5 for a 2-input, 3-output multi-level circuit. Suppose that after simplifying the original circuit based on the O-EXDC sets, two of the input vectors, 01 and 11, produce output values beyond the error magnitude constraint. Thus, in order to remove the conflicts caused by using the O-EXDC sets, we need to remove these input vectors from their candidate output bits. In this table, we again encode the candidate output bits by the ON-set and non-candidate output bits by the OFF-set of a corresponding candidate function. Input vector 01 produces an approximate output value of 101, which conflicts with the error magnitude specification, and the O-EXDC and U-EXDC sets differ for outputs y1 and y2. As such, outputs y1 and y2 (but not y0) are included in the candidate set for this input, and the 01 input is in turn considered for removal from the EXDC sets of each of these outputs (y1 and y2).

Table 5: EXDC magnitude recovery example
  Inputs (x1, x0)   Outputs (y2, y1, y0)   U-EXDC   O-EXDC   Approx. Outputs   Candidate Outputs   Updated EXDC
  00                {000, 001}             001      001      001               000                 001
  01                {010, 011, 100}        001      111      101               110                 001
  10                {110, 101, 010}        100      111      101               011                 111
  11                {011, 111, 100}        100      111      101               011                 101

We adopt the following strategy for selecting among the candidates for removal from the EXDC sets. For each input vector with conflicting outputs:

1) Perform a bitwise comparison of the approximate output value with all allowed output values and find the allowed output with the minimum Hamming distance whose bitwise difference (XOR) from the approximate output also intersects with the candidate output set.

2) Change all the output bits that are in the intersection from 1) to be sensitive to the input vector, i.e. remove the current input vector from the EXDC set of each such output. Repeat this process for all input vectors that produce conflicts.

The idea of the above strategy is to successively reduce the O-EXDC set while keeping as much flexibility as possible for netlist optimization. We therefore choose the allowed output value that
has the minimum Hamming distance from the approximate output, giving the minimal circuit so far. Note that there can be multiple allowed output values satisfying the strategy described in 1) and 2). We select the one that is found first among the allowed outputs, which are listed in arithmetically ascending order. In our given example (Table 5), by comparing the approximate output 101 for input 01 with all allowed output values {010, 011, 100}, the output value 100 produces the minimum Hamming distance, with a difference only in the LSB. However, the LSB is not a candidate output for this input vector. As such, we select 011 as the allowed output with the next minimal Hamming distance whose bitwise XOR difference also intersects with the candidate vector for at least one output bit. In this case, the allowed output 011 differs from the current approximate output 101 in bits y1 and y2, both of which are also in the candidate list. As such, we modify the y1 and y2 bits to be sensitive to the input vector 01, i.e. we remove input 01 from the O-EXDC set of output bits y1 and y2. Similarly, we remove the input vector 11 from the O-EXDC set of the y1 bit. The EXDC sets of the conflict-free input vectors remain the same, as there are no conflicts for them in this iteration.
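The selection just illustrated can be condensed into a small sketch (illustrative Python of ours; tie-breaking follows the arithmetically ascending order described above):

def select_removal_bits(y_approx, allowed, candidate_mask):
    """Steps 1) and 2) of the magnitude recovery: scan the allowed outputs in
    increasing Hamming distance from the approximate output and return the
    first XOR difference that overlaps the candidate output bits."""
    def hamming(a, b):
        return bin(a ^ b).count("1")
    for y_allow in sorted(allowed, key=lambda y: (hamming(y, y_approx), y)):
        remove = (y_allow ^ y_approx) & candidate_mask
        if remove:
            return remove      # bit mask of output bits to make sensitive
    return 0                   # no overlap: the U-EXDC encoding is kept

# Row 01 of Table 5: select_removal_bits(0b101, {0b010, 0b011, 0b100}, 0b110)
# returns 0b110, i.e. input 01 is removed from the EXDC sets of y2 and y1.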
Applying the updated EXDC sets to the original multi-level circuit will produce another version of the approximate circuit, in which the conflict outputs of previous iterations are mitigated but new conflict outputs may be produced. The algorithm thus continues with the iterative output correction until no more conflicting outputs exist. In the worst case, the algorithm stops when there are no candidate outputs left in the current iteration. In that case, the updated EXDC sets are reduced to be identical to the U-EXDC sets. As such, the gate count is guaranteed to be no larger than that obtained by directly applying the U-EXDC sets to the original network. The above algorithm is summarized in Algorithm 1 up to Line 19.

Up to this point, we have acquired an approximate circuit with reduced gate count that is compatible with the original Boolean relation, i.e. the error magnitude constraint. In the next subsection, we present how to extend the above algorithm to solve the general MALS problem (1) that jointly considers error magnitude and frequency constraints.

3.3 Resolving Frequency Violations

In this subsection, we consider how to jointly constrain the error magnitude and error frequency. This joint exploration provides new opportunities for exploring design decisions when error frequency and error magnitude together affect the final quality level.

After removing all conflict outputs using the algorithm presented in the previous section, we have solved the MALS problem without error frequency constraints. As such, the resulting approximate circuit has an arbitrary error rate. Typically, the error frequency will be high and will not satisfy the given constraint.

The solution obtained from Algorithm 1 (up to Line 19) can be viewed as a relaxed solution of problem (1) from an error frequency perspective. In the following, we develop recovery techniques similar to those previously shown for magnitude violations in order to ultimately also satisfy a given frequency constraint.

Similar to the recovery procedure discussed in the previous section, the correction of error frequency violations is based on a difference criterion. Consider an input vector x_i for which the simplified circuit produces an approximate value not equal to the original output vector. Similar to the conflict outputs earlier, we call such an output vector a difference output. The error frequency reduction is then based on eliminating the difference outputs in the EXDC sets, such that the netlist simplified with the updated EXDC sets enforces the correct output values on these inputs while at the same time aiming to minimize the resulting gate count increase. Again, to minimize the gate penalty, we aim to find a minimal set of EXDC modifications, which we base on a notion of the smallest distance to the best EXDC set, and hence netlist, obtained so far. Note that during this process, new conflicts with the originally specified Boolean relation may be produced. In that case, the algorithm rolls back to the procedures discussed in the previous section for elimination of the error magnitude violations.

We adopt the following strategy for removing input vectors with difference outputs from the EXDC sets:

1) For an approximate output that produces a value other than the correct one, perform a bitwise XOR comparison between the difference output and the correct (exact) output.

2) Compute the bitwise intersection between the encoded EXDC output and the XOR result previously computed. Change the least significant bit that is in this intersection to be sensitive to the input vector, i.e. remove the input vector from the EXDC set of the least significant output bit in the intersection.

Note that we choose to correct bits in the difference outputs from LSB to MSB based on the fact that, for typical arithmetic circuits targeted in this work, the LSBs usually have less complexity than the MSBs. Correcting LSBs first should therefore lead to the smallest gate increase. For general non-arithmetic circuits, we can correct output bits based on similar estimates of the amount of logic shared with other outputs.

See the example in Table 6. By comparing the approximate output 001 for input 00 with the exact output value 000, the only difference lies in the LSB. As such, we modify the y0 bit to be sensitive to the input vector 00, i.e. we remove input 00 from the EXDC set of the y0 output bit. Similarly, we remove the input vector 10 from the EXDC set of y2. The EXDC sets of the remaining input vectors stay the same, as there is no difference for them in this iteration.

Table 6: EXDC frequency recovery example
  Inputs (x1, x0)   Exact Outputs (y2, y1, y0)   Old EXDC   Approx. Outputs   Updated EXDC
  00                {000, 001}                   001        001               000
  01                {010, 011, 100}              100        010               100
  10                {110, 100, 010}              110        010               010
  11                {011, 111, 100}              101        011               101

We apply the above criteria to all input vectors that produce different output values in this iteration. The updated EXDC sets are then used to simplify the original netlist, and this procedure is repeated until the current error frequency satisfies the constraint. We summarize the complete approach in Algorithm 1.
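The per-input correction in steps 1) and 2) above reduces to a few bit operations; a minimal sketch (ours, with integer-encoded output words) is:

def frequency_recovery_bit(y_exact, y_approx, exdc_mask):
    """XOR the exact and approximate outputs, intersect with the encoded EXDC
    bits of this input vector, and return the least significant bit of the
    intersection: the output bit whose EXDC set the input is removed from."""
    diff = (y_exact ^ y_approx) & exdc_mask
    return diff & -diff            # lowest set bit, or 0 if there is no overlap

# Row 00 of Table 6: frequency_recovery_bit(0b000, 0b001, 0b001) -> 0b001 (y0).
# Row 10 of Table 6: frequency_recovery_bit(0b110, 0b010, 0b110) -> 0b100 (y2).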
4. Frequency-only Constrained MALS

In this section, we address the MALS problem when error frequency is the only constraint, i.e. solving problem (2). As such, the error magnitude is no longer a concern and we only need to consider how frequently errors arise. We assume input vectors to be uniformly distributed, i.e. the error frequency is equivalent to the number of input vectors that cause erroneous output values. Problem (2) can be solved with the previously introduced general RAR-MALS algorithm by allowing every input vector to take on any possible output value. However, such a fully unconstrained problem with all outputs in the don't care set results in a trivially simplified circuit during the relaxation phase, where a subsequent recovery then has to re-create the desired functionality from scratch. In the following, we propose an alternative, simplified RAR-fMALS algorithm that specifically and effectively targets the frequency-constrained MALS problem only.

Similar to the general MALS problem, we map the frequency-only MALS (fMALS) problem to the problem of finding the best set of EXDCs that causes the least number of erroneous outputs while leading to the largest reduction in gate count. Here, we do not need to consider the correlations of multi-output bits. Hence, the problem can be viewed as finding the best subset of input vectors that, when added to the external don't care set of each output, leads to the minimum gate count in the simplified circuit, where the cardinality of this common EXDC set should be no larger than the error frequency constraint. The biggest challenge is obviously that, for a given error frequency constraint (i.e. cardinality of the EXDC set), there is a combinatorial number of possible EXDC sets. Furthermore, the gate count reduction of a particular EXDC set can only be determined after time-consuming logic synthesis has been applied. As such, the goal is to identify the best EXDC set in the least number of iterations.

Our proposed algorithm is again based on a relax-and-recover (RAR) strategy. In the same manner as in the previous general algorithm, we relax the error frequency constraint to the most optimistic EXDC set, in which all input vectors are added to the common don't care set of every output. This essentially adds all input vectors to the EXDC set, which obviously simplifies the circuit to an empty network and introduces the largest error frequency. In contrast to the general RAR-MALS algorithm, however, we adopt a different recovery strategy for the frequency-only problem. We iteratively recover frequency conflicts by following a quasi-binary search strategy. In each iteration of this strategy, the error frequency is reduced by half, such that the recovered input vectors cause the minimum gate count increase.

Since we do not need to consider the correlation of multi-output bits, an input vector x_i is either a don't care or a care to every output bit. In the beginning, every input vector is in the don't care set. We then gradually remove input vectors by restricting specific input bits of the vectors in the EXDC set. Denote by S the set of currently unrestricted input bits, with the cardinality of S being D (initially, D = n, where n is the original number of input bits). Furthermore, d is the index of a specific input bit in S, where 0 <= d < D. In every iteration of the algorithm, we restrict an additional input bit to remove half of all input vectors from the EXDC set while minimizing the gate count increase, until the frequency constraint is satisfied. In each iteration, there are (2^D)! / (2! x (2^(D-1))! x (2^(D-1))!) different combinations to choose from. A quasi-binary search algorithm is proposed to identify the best half of the input vectors to be removed. Instead of exhaustively searching all combinations, our heuristic algorithm explores only a subset of all possibilities. Specifically, the don't care set of input vectors is split into two parts based on the d-th input bit value. For every bit position d, we compare the impact of removing from the EXDC set all input vectors in which the d-th bit is either set to 0 or to 1. Among all these cases, we select the best bit position and its best value (1 or 0) as the best reduced EXDC set. For example, for a 3-input circuit, all 8 input vectors are initially set as don't cares. Input vectors in the EXDC set can be denoted in a prime implicant form {- - -}, where "-" indicates a don't care condition on a single input bit position. In the first iteration, we choose the reduced EXDC set resulting in the minimal gate count increase when applied to the original circuit from the following 6 sets: {- - 1}, {- - 0}, {- 1 -}, {- 0 -}, {1 - -}, and {0 - -}. Again, candidate EXDC sets are expressed in prime implicant form. Note that these 6 EXDC sets each cover exactly half of all input vectors. Furthermore, they are the 6 largest prime implicants having that property.

After the first iteration, the EXDC space is reduced by half, such that the current error frequency is guaranteed to be no larger than 50%. The second iteration starts from the selected EXDC set of the first iteration, and again selects the best half of the remaining input vectors. Suppose, for example, that {- 1 -} is selected as the best EXDC set after the first iteration. The second iteration then selects from the candidate sets {- 1 1}, {- 1 0}, {1 1 -}, and {0 1 -}. The best EXDC set in this iteration produces at most a 25% error rate. The above steps continue until the circuit as simplified by the best EXDC set satisfies the error frequency constraint. The gate count reduction achieved by this EXDC set is the final optimization result.

The recovery search in our algorithm is performed in a greedy manner. In order to explore a wider search space, we extend our algorithm to not only choose the best, but to select the best two EXDC sets in each iteration. This doubles the number of candidate EXDC sets being explored, and consequently, the runtime increases by a factor of 2 as well. Despite the simplicity of the algorithm, our experiments have demonstrated its effectiveness on benchmark circuits with arithmetic functionality, as will be shown in Section 5. Compared to an exhaustive search, this heuristic algorithm reduces the complexity from O(2^n) to O(n^2). The algorithm is summarized in Algorithm 2.

Algorithm 2: MALS under error frequency constraint only
  Input:  NL: original netlist, R: error frequency constraint
  Output: minimized netlist with constrained error frequency
   1  EXDC_1 = EXDC_2 = all input vectors;
   2  NL_1 = NL_2 = Optimize(NL, EXDC_1);
   3  r = 1.0;
   4  while r > R do
   5      E_1 = EXDC_1; N_1 = NL_1;
   6      E_2 = EXDC_2; N_2 = NL_2;
   7      foreach E in {EXDC_1, EXDC_2} do
   8          foreach unrestricted input bit b in E do
   9              foreach v in {0, 1} do
  10                  E_3 = restrict b <- v in E;
  11                  N_3 = Optimize(NL, E_3);
  12                  if C(N_3) < C(N_1) then
  13                      E_1 = E_3; N_1 = N_3;
  14                  end
  15                  if C(N_3) < C(N_2) then
  16                      E_2 = E_3; N_2 = N_3;
  17                  end
  18              end
  19          end
  20      end
  21      EXDC_1 = E_1; NL_1 = N_1;
  22      EXDC_2 = E_2; NL_2 = N_2;
  23      r = r/2;
  24  end
  25  return argmin over NL_i of C(NL_i);
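In each quasi-binary iteration only 2D candidate restrictions are examined (the 6 cubes listed above for D = 3), rather than the (2^D)! / (2! x ((2^(D-1))!)^2) possible equal halves (35 for D = 3). A minimal sketch of one such iteration (ours; the optimize callback is an assumed wrapper around a synthesis tool such as SIS that returns the resynthesized netlist and its gate count) is:

def refine_exdc_once(netlist, exdc_cube, optimize):
    """One recovery iteration of RAR-fMALS: try restricting each still-free
    input bit of the EXDC cube to 0 and to 1, resynthesize with every
    candidate cube, and keep the one giving the smallest gate count."""
    best = None
    for d, lit in enumerate(exdc_cube):
        if lit != '-':
            continue                          # bit d is already restricted
        for v in '01':                        # drop the half with bit d != v
            cand = exdc_cube[:d] + [v] + exdc_cube[d + 1:]
            new_nl, gates = optimize(netlist, cand)
            if best is None or gates < best[2]:
                best = (cand, new_nl, gates)
    return best                               # (EXDC cube, netlist, gate count)

Starting from the all-don't-care cube ['-'] * n and calling this routine repeatedly halves the EXDC cube each time, so after k iterations the error frequency is guaranteed to be at most 2^(-k), mirroring Algorithm 2 without its best-two extension.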
5. Experimental Results

We have implemented the two MALS algorithms in a C++ environment using SIS [16], ABC [2], and Design Compiler as the synthesis tools. To evaluate the capability of the proposed algorithms for significant gate count reduction under general error magnitude and frequency constraints, we used the algorithms to generate a range of approximate solutions of different types of adders and multipliers, as well as arithmetic benchmark circuits from the ISCAS-85 suite. All experiments were performed on an Intel 3.4 GHz Core i7 workstation. Table 7 shows the circuit-specific information for the adders and multipliers we used.

Table 7: Circuits synthesized with the RAR-MALS algorithm
  Name       Function                       I/O     Gates
  RCA8       8-bit Ripple Carry Adder       16/9    323
  RCA16      16-bit Ripple Carry Adder      32/17   411
  CLA16      16-bit Carry Lookahead Adder   32/17   412
  KS16       16-bit Kogge-Stone Adder       32/17   465
  RCA32      32-bit Ripple Carry Adder      64/33   834
  Wallace8   8-bit Wallace Multiplier       16/16   1259
  Dadda8     8-bit Dadda Multiplier         16/16   1128

We first applied the general RAR-MALS algorithm to several types of adders and multipliers with different bitwidths (Table 7). For the 32-bit RCA circuit, we only applied our algorithm to the lower 24 bits. The runtime varies from a few seconds for a simple adder to 6 hours for the multipliers. We first compare the effectiveness of the proposed RAR-MALS algorithm and the existing ISF-based approach when only the magnitude of error is constrained. Figure 1 shows two types of 16-bit adders and an 8-bit Wallace multiplier synthesized by our proposed algorithm as well as by the existing ISF-based approach for several magnitudes of allowed error. The error magnitude is shown as a percentage of the maximum output value (since the circuits have different bitwidths, errors of the same absolute magnitude indicate varying relative significance). We find that the proposed RAR-MALS algorithm outperforms the ISF-based approach by up to 20% in terms of achieved gate count reduction.

Figure 1: Comparison between RAR-MALS and ISF-based approaches (normalized gate counts versus error magnitude in % for Wallace8, CLA16, and RCA16, each synthesized with the ISF-based approach and with RAR-MALS).

In Figure 2, we show the results of synthesizing approximate circuits jointly under both types of constraints. We synthesized each circuit with error magnitude constraints equal to 100 and 1000 (corresponding to different relative error magnitudes), all while sweeping error frequency constraints from 0% to 100%. Results show that, depending on the error magnitude and circuit, gate count reductions ranging from 5% to 45% can be achieved if the frequency is unconstrained. Achievable gate count reductions thereby decrease linearly with stricter error frequency constraints, where a sharper drop can in some cases be observed when frequency recovery is initially applied, especially under tight magnitude constraints.

Figure 2: Error magnitude and frequency constrained RAR-MALS for different circuits (normalized gate counts versus error frequency in %): (a) 16-bit adders (CLA16 and KS16, M = 100 (0.07%) and M = 1000 (0.7%)); (b) 32-bit Ripple Carry adder (RCA32, M = 100 and M = 1000); (c) 8-bit multipliers (Dadda8 and Wallace8, M = 100 (0.15%) and M = 1000 (1.5%)).

Next, we present the results for the RAR-fMALS algorithm under frequency constraints only. In addition to the circuits in Table 7, we applied RAR-fMALS to arithmetic circuits from the ISCAS-85 and other benchmarks. Results are summarized in Table 8. We find that by allowing 1% error frequency, the simplified circuit can have up to 37.8% fewer gates, with an average gate count reduction of 21.1%. If we increase the error frequency constraint to 10%, an up to 44% gate count reduction is observed, and the average gate reduction is 30.4%.

Table 8: Error frequency constraint only RAR-fMALS
                         |     Error Frequency < 1%          |     Error Frequency < 10%
  Name    I/O    Orig.   | Gates by    Time (s)   Reduced    | Gates by    Time (s)   Reduced
                 Gates   | RAR-fMALS              Gates %    | RAR-fMALS              Gates %
  c432    36/7   209     | 130         1248       37.8       | 130         662        37.8
  c880    60/26  327     | 292         1199       10.7       | 285         657        12.8
  c1908   33/25  414     | 359         2871       13.3       | 327         2101       21.0
  c3540   50/22  1038    | 897         1406       13.6       | 673         872        35.2
  alu2    10/6   401     | 288         505        28.2       | 225         345        43.9
  alu4    14/8   735     | 566         2327       23.0       | 501         1987       31.8
  Avg.                   |                        21.1       |                        30.4

Finally, we compare the performance of the RAR-MALS and RAR-fMALS algorithms when only the error frequency is constrained. Although both algorithms can handle the case of frequency-only constraints, RAR-fMALS is more effective and efficient than RAR-MALS. When running RAR-MALS in this experiment, we set the error magnitude to the maximum possible value. Table 9 summarizes the comparison. The gate count reduction achieved by RAR-fMALS is on average 13.3% larger than that of RAR-MALS. Furthermore, RAR-fMALS also runs up to 16X faster than RAR-MALS.

Table 9: RAR-MALS and RAR-fMALS under an error frequency constraint of 10%
                           |        RAR-MALS               |        RAR-fMALS              |       Gain
  Name      I/O    Orig.   | Gate    Reduced    Time (s)   | Gate    Reduced    Time (s)   | Reduced    Time (x)
                   Gate    | Counts  Gates %               | Counts  Gates %               | Gates %
                   Counts  |                               |                               |
  RCA8      16/9   323     | 317     1.8        25         | 234     27.5       29         | 25.7       0.86
  RCA16     32/17  411     | 408     0.7        2141       | 377     8.2        146        | 7.5        14.7
  KS16      32/17  465     | 465     0          3025       | 424     8.8        183        | 8.8        16.5
  RCA32     64/33  834     | 834     0          3415       | 732     12.2       2176       | 12.2       1.6
  Wallace8  16/16  1259    | 1212    3.7        18310      | 1057    16.0       1438       | 12.3       12.7
  Avg.                     |                               |                               | 13.3       9.3

6. Summary and Conclusions

We presented two heuristic algorithms for solving a general multi-level approximate logic synthesis problem. We adopt the formulation of magnitude-constrained problems as a Boolean relation minimization, which is more general and grants better flexibility on error constraints, and hence better solutions, than the existing ISF-based approaches. We further proposed two algorithms to solve the MALS problem. Both algorithms are based on novel relax-and-recover heuristics: an over-optimistic solution under relaxed constraints is first explored, and this solution is then iteratively recovered until all constraints are met. Our first algorithm addresses the MALS problem under general magnitude and frequency constraints. We further introduced a second algorithm that is more effective in solving a simplified MALS problem in which frequency is the only constraint. Experiments on arithmetic circuit blocks demonstrated their effectiveness in achieving large gate count reductions across flexible error magnitude and frequency constraints.

References
[1] P. Albicocco, G. C. Cardarilli, A. Nannarelli, M. Petricca, and M. Re. Imprecise arithmetic for low power image processing. In Signals, Systems and Computers (ASILOMAR), 2012.
[2] R. Brayton and A. Mishchenko. ABC: An academic industrial-strength verification tool. In Computer Aided Verification (CAV), 2010.
[3] R. Brayton and F. Somenzi. An exact minimizer for Boolean relations. In ICCAD, 1989.
[4] L. Chakrapani and K. Palem. A probabilistic Boolean logic for energy efficient circuit and system design. In ASP-DAC, 2010.
[5] V. Chippa, A. Raghunathan, K. Roy, and S. Chakradhar. Dynamic effort scaling: Managing the quality-efficiency tradeoff. In DAC, 2011.
[6] A. A. Del Barrio, R. Hermida, and S. O. Memik. Exploring the energy efficiency of multispeculative adders. In ICCD, 2013.
[7] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy. IMPACT: Imprecise adders for low-power approximate computing. In ISLPED, 2011.
[8] K. He, A. Gerstlauer, and M. Orshansky. Controlled timing-error acceptance for low energy IDCT design. In DATE, 2011.
[9] R. Hegde and N. Shanbhag. Soft digital signal processing. TVLSI, 2001.
[10] Y. Kim, Y. Zhang, and P. Li. An energy efficient approximate adder with carry skip for error resilient neuromorphic VLSI systems. In ICCAD, 2013.
[11] F. Kurdahi, A. Eltawil, K. Yi, S. Cheng, and A. Khajeh. Low-power multimedia system design by aggressive voltage scaling. TVLSI, 2010.
[12] A. Lingamneni, C. Enz, J. L. Nagel, K. Palem, and C. Piguet. Energy parsimonious circuit design through probabilistic pruning. In DATE, 2011.
[13] J. Miao, A. Gerstlauer, and M. Orshansky. Approximate logic synthesis under general error magnitude and frequency constraints. In ICCAD, 2013.
[14] J. Miao, K. He, A. Gerstlauer, and M. Orshansky. Modeling and synthesis of quality-energy optimal approximate adders. In ICCAD, 2012.
[15] S. H. Nawab, A. V. Oppenheim, A. P. Chandrakasan, J. M. Winograd, and J. T. Ludwig. Approximate signal processing. VLSI Signal Processing, 15, 1997.
[16] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli. SIS: A system for sequential circuit synthesis. Technical Report UCB/ERL M92, 1992.
[17] D. Shin and S. K. Gupta. Approximate logic synthesis for error tolerant applications. In DATE, 2010.
[18] S. Venkataramani, V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan. Quality programmable vector processors for approximate computing. In MICRO, 2013.
[19] S. Venkataramani, A. Sabne, V. Kozhikkottu, K. Roy, and A. Raghunathan. SALSA: Systematic logic synthesis of approximate circuits. In DAC, 2012.
[20] R. Ye, T. Wang, F. Yuan, R. Kumar, and Q. Xu. On reconfiguration-oriented approximate adder design and its application. In ICCAD, 2013.
[21] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong. Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing. TVLSI, 2010.
