Power- and Area-Efficient Approximate Wallace Tree Multiplier for Error-Resilient Systems

Kartikeya Bhardwaj, Electrical & Electronics Engg., BITS Pilani-Goa Campus, Goa – 403 726, India. Email: [email protected]
Pravin S. Mane, Electrical & Electronics Engg., BITS Pilani-Goa Campus, Goa – 403 726, India. Email: [email protected]
Jörg Henkel, Department of Computer Science, Karlsruhe Institute of Technology, Karlsruhe – 76131, Germany. Email: [email protected]

Abstract: Today, in the sub-nanometer regime, chip and system designers add accuracy as a new constraint when optimizing Latency-Power-Area (LPA) metrics. In this paper, we present a new power- and area-efficient Approximate Wallace Tree Multiplier (AWTM) for error-tolerant applications. We propose a bit-width aware approximate multiplication algorithm for the optimal design of our multiplier. We employ a carry-in prediction method to reduce the critical path, further augmented with hardware-efficient precomputation of the carry-in. We also optimize our multiplier design for latency, power and area using Wallace trees. Accuracy as well as LPA design metrics are used to evaluate our approximate multiplier designs of different bit-widths, i.e. $4 \times 4$, $8 \times 8$ and $16 \times 16$. The simulation results show that we obtain a mean accuracy of 99.85% to 99.965%. A single-cycle implementation of AWTM gives almost 24% reduction in latency. We achieve a significant reduction in power and area, up to 41.96% and 34.49% respectively, which demonstrates the merits of the proposed AWTM design. Finally, AWTM is used in a real-time application on a benchmark image, where we obtain up to 39% reduction in power and 30% reduction in area with negligible loss in image quality.

Index Terms: Approximate multiplier; Bit-width aware multiplication algorithm; Wallace tree; Error-resilient systems

I. INTRODUCTION

The International Technology Roadmap for Semiconductors (ITRS) [1] has anticipated imprecise/approximate designs, which have become a state-of-the-art demand for the emerging class of killer applications that manifest inherent error resilience, such as multimedia, graphics, and wireless communications. In error-resilient systems, adders and multipliers are used as basic building blocks, and their approximate designs have attracted significant research interest recently. Conventional approaches investigated several mechanisms such as truncation [2], over-clocking, and voltage over-scaling (VOS) [3], which could not configure accuracy as well as Latency-Power-Area (LPA) design metrics effectively. Most of the other design techniques rely on functional approximations, and a wide spectrum of approximate adders such as [4], [5], [6] and [7] has been proposed in the past. However, very few research papers on approximate multipliers are reported in the literature.

Most of the reported approximate multiplier designs shorten the carry chains so that the error is configurable, and the algorithms employed in these designs target smaller numbers and give a large magnitude of error as the bit-width of the operands increases. In this paper, we present a new Approximate Wallace Tree Multiplier (AWTM) based on a bit-width aware algorithm. We design it specifically to give good results for large operands. Besides accuracy, the AWTM is also optimized for power and area. For a single-cycle implementation, AWTM gives a significant reduction in latency as well. Our contributions are:

∙ We propose a new power- and area-efficient AWTM based on a bit-width aware multiplication algorithm.
∙ We employ a novel Carry-in Prediction technique which significantly reduces the critical path of our multiplier. We further derive an efficient carry-in precomputation logic to accelerate the carry propagation.
∙ We obtain a very high mean accuracy of 99.965% (mean error of only 0.035%) when the operands are 10 bits or more in size. If there is no lower bound on the size of the operands, the mean accuracy varies from 99.85% to 99.9% (a very small mean error of 0.1% to 0.15%).
∙ We achieve a significant reduction in power and area, up to 41.96% and 34.49% respectively, for the 16-bit accuracy-configurable AWTM design. For a single-cycle implementation of the $16 \times 16$ AWTM, we also reduce the latency by around 24%.
∙ Our proposed AWTM, when used for a real-time application on an image, achieves up to 39% reduction in power and up to 30% reduction in area with negligible loss in image quality.

The rest of the paper is organized as follows. In Section II we discuss background and related work reported in the literature. We describe some preliminaries in Section III. An approximate multiplier architecture is explained in Section IV. We propose a bit-width aware approximate multiplication algorithm in Section V. We present the AWTM design based on the proposed methodology and its optimization w.r.t. LPA design metrics in Section VI. The experimental results are given in Section VII. Finally, we conclude the paper in Section VIII.

II. BACKGROUND AND RELATED WORK

Research on approximate arithmetic circuits reported in the literature is mainly on approximate adders. It is worthwhile to study these approximate adders in order to make research contributions on approximate multipliers. Lu [8] proposed a

978-1-4799-3946-6/14/$31.00 ©2014 IEEE 263 15th Int'l Symposium on Quality Electronic Design
$k$-bit carry look-ahead adder in which only the previous $k$ bits are considered to estimate the current carry signal. Lu's adder exhibits a low probability of getting the correct sum and increases area overhead. Shin et al. [9] reduce data-path delay by re-designing the data-path modules; their technique cuts the critical path in the carry chain to exploit a given error rate and improve parametric yield. Zhu et al. [4] present an error-tolerant adder, ETA-I, which divides the inputs into 1) an accurate part and 2) an inaccurate part. In the latter, no carry signal is considered at any bit position. Gupta et al. [10] target low power and propose five different versions of the mirror adder by reducing the number of transistors and the internal node capacitance. Verma et al. [6] presented a Variable Latency Speculative Adder (VLSA) which provides approximate or accurate results but incurs considerable delay and a large area overhead. Kahng et al. [7] proposed an accuracy-configurable adder with reduced critical path and error rate.

In contrast with the above work, very few researchers have reported work on approximate multipliers. Sullivan et al. [11] used Truncated Error Correction (TEC) to investigate an iterative approximate multiplier in which some amount of error-correcting circuitry is added for each iteration. This circuitry replicates the effects of multiple pipeline iterations for the most problematic inputs quite inexpensively. Kulkarni et al. [12] proposed a $2 \times 2$ underdesigned multiplier block and used it to build power-aware inaccurate multipliers of arbitrary size. Kyaw et al. [13] presented an Error Tolerant Multiplication (ETM) algorithm in which the input operands are split into two parts: a multiplication part consisting of the higher-order bits and a non-multiplication part with the remaining lower-order bits. The multiplication begins at the point where the bits split and moves simultaneously in the two opposite directions until all bits are taken care of. The ETM exhibited a significant reduction in delay, power and hardware cost for specific input combinations. Next, we explain the preliminary concepts needed to understand the proposed approximate multiplier.

III. PRELIMINARIES

We make use of a simple recursive multiplication for our approximate multiplier design and use various accuracy design metrics [7], [13] for its evaluation. The recursive multiplication and the accuracy design metrics are described in the following subsections.

A. Recursive Multiplication

A given multiplication can be recursively broken down into several smaller-size multiplications, each of which can be performed in the same clock cycle. Let $A$ be the multiplicand and $X$ be the multiplier, both of $2b$ bits. $A$ and $X$ can also be written as $A = A_H A_L$ and $X = X_H X_L$, where $A_H$, $A_L$, $X_H$, and $X_L$ are of $b$ bits each.

The multiplication $A \times X$, which is $2b \times 2b$, can be recursively carried out as shown in Fig. 1(a). In this multiplication, $A_H X_L$, $A_H X_H$, $A_L X_L$, and $A_L X_H$ are partial products, each of which is a $b \times b$ multiplication. Hence, a $2b \times 2b$ multiplication is divided into four $b \times b$ multiplications followed by additions. Fig. 1(b) is derived from Fig. 1(a) for approximate multiplication.

[Fig. 1. (a) Recursive Multiplication (b) Approximate Multiplication. In both panels the four $2b$-bit partial products $A_H X_H$, $A_H X_L$, $A_L X_H$ and $A_L X_L$ are aligned at offsets of $b$ bits and added to form the $4b$-bit final product. In (b), $A_H X_H$ is the accurate partial product and the other three are accurate to a large extent.]

B. Accuracy Design Metrics

The accuracy design metrics are defined as follows:

1) Relative Error: The relative error is calculated as $(|R_c - R_e| / R_c) \times 100\%$ for $R_c \neq 0$. Here, $R_c$ is the correct result and $R_e$ is the approximate result. We denote accuracy as $ACC_{amp}$, where $ACC_{amp} = 1 - \text{Relative Error}$ (with the relative error taken as a fraction).
2) Mean Error: The mean error is the average of the relative errors of all the combinations tested in an algorithm.
3) Minimum Acceptable Accuracy (MAA): The minimum acceptable accuracy is the minimum level of accuracy that an application can tolerate.
4) Acceptance Probability ($AP$): The probability that the accuracy of the approximate arithmetic circuit is higher than the minimum acceptable accuracy. Its value is given by $AP = P(ACC_{amp} > MAA)$.

In the next section, we discuss an approximate multiplier architecture to explain the proposed bit-width aware algorithm.

IV. APPROXIMATE MULTIPLIER ARCHITECTURE

In order for the multiplier to exhibit high accuracy, the most significant bits (MSBs) of the final $4b$-bit product ($A \times X$) should be accurate to a high extent. Therefore, we make the multiplier $A_H X_H$ a $b \times b$ accurate multiplier and $A_H X_L$, $A_L X_H$, $A_L X_L$ $b \times b$ approximate multipliers. As we shall see, the $b \times b$ approximate multipliers generate upper $b$ bits that are accurate to a high extent, which in turn makes the upper $2b$ bits of the final $4b$-bit product achieve high accuracy. This is illustrated in Fig. 1(b).

We have explained the design methodology of these approximate $b \times b$ multipliers, the Carry-in Prediction Logic, in [14]. We briefly explain this technique in the following subsection with the help of an example.

A. The Carry-in Prediction: An Example

Consider the unsigned multiplication of two 16-bit numbers (i.e. $b = 8$):

$A = (AEDB)_{16} = (44763)_{10}$

$X = (B6E7)_{16} = (46823)_{10}$
Now, let us evaluate one approximate product out of $A_H X_L$, $A_L X_H$ and $A_L X_L$ using our algorithm. Say we want to evaluate $A_L X_L$, i.e. $(DB)_{16} \times (E7)_{16}$. As shown in Fig. 2(a), we divide this multiplication into three independent parts. The first part is the accurate computation of the $b/2$ least significant bits (LSBs). In the second part, the next $b/2$ bits are simply set to 1's. The third part is again an accurate computation of the remaining elements in the multiplication tree, with an additional carry 'C' arising from the inaccurate part at its least significant position. The idea is to precompute 'C' through some mechanism and begin multiplication simultaneously from both the first and the third part. At the same time, we reduce the number of addition operations involved by directly setting the bits in the second part, thus significantly reducing the hardware cost.

[Fig. 2. Carry-in Prediction Example for $b = 8$ (i.e. $16 \times 16$ multiplication): (a) Approximate $A_L X_L$ (b) Accurate $A_L X_L$. Both panels multiply 11011011 by 11100111. Panel (a) marks the critical column and the predicted carry C = 1 and yields 1100001111111101: the upper $b$ bits are accurate to a certain extent, the middle $b/2$ bits are inaccurate (set to 1) and the lower $b/2$ bits are accurate. Panel (b) yields the completely accurate $2b$-bit product 1100010110011101.]

Fig. 2(a) further shows the critical column, defined as the column containing the maximum number of elements in the multiplication tree. Carry-in Prediction logic exploits the fact that if there are two or more 1's in the critical column, then a carry of at least 1 is definitely propagated to the next column. The next subsection discusses this prediction in more detail. In the second part, we set the $b/2$ bits of the inaccurate part to 1's because for such a large $b$ ($\geq 5$), it is very probable that the carry propagated from the critical column is more than 1. Setting those bits therefore reduces the error involved, as it is analogous to the difference between 16 (5'b10000) and 15 (5'b01111), i.e. 16 just passes an extra carry.

Fig. 2(b) shows the accurate $A_L X_L$ evaluation. As is evident, out of the 8 most significant bits (MSBs), 6 are correct in our approximate $A_L X_L$. Evaluating $A_H X_L$ and $A_L X_H$ in a similar fashion and adding all of these as indicated in Fig. 1(b) gives the approximate result $(7CEBA7FB)_{16}$. The correct answer is $(7CED799D)_{16}$. The relative error in this case is merely 0.0056%. Precomputation of the carry-in 'C' is described next in detail.

B. Efficient Carry-in Precomputation

Carry-in Prediction necessitates the precomputation of the carry-in. Since we are dealing with error-resilient systems, we can further simplify and approximate the evaluation of the carry-in so that the reduction in latency is not achieved at the cost of power. We consider the cases of $b = 4$ and $b = 6$ in order to better explain the carry-in precomputation procedure.

[Fig. 3. Carry-in Precomputation for (a) $b = 4$, i.e. $8 \times 8$ Multiplier (b) $b = 6$, i.e. $12 \times 12$ Multiplier. K-Maps over AB and CD, with one map per value of EF in (b).]

The precomputation is made hardware efficient by making minor changes in the K-Maps of the carry-in expressions, as shown in Fig. 3. Here $A, B, \ldots, F$ are the elements in the critical column (not to be confused with the operand $A$). The original K-Maps are obtained from the statement of the Carry-in Prediction Logic, i.e. $C_{in} = 1$ if 2 or more elements of the critical column are 1. Fig. 3 further derives:

∙ for $b = 4$: By making changes in 2 cases out of 16, we can simplify the carry-in expression to
$C_{in} = A \cdot B + C + D$
Similar results can also be derived for $b < 4$.
∙ for $b = 6$: We make changes in 6 cases out of 64 and get
$C_{in} = A + B + C + D + E + F$
This is the same as the OR of all the elements present in the critical column. Therefore, in general, for large $b$ (greater than 4), one should take the OR of all the elements present in the critical column to get $C_{in}$.

Next, we propose various accuracy configurations and a bit-width aware approximate multiplication algorithm.

V. BIT-WIDTH AWARE APPROXIMATE MULTIPLICATION

In the approximate multiplication, we divide the $b \times b$ accurate multiplier $A_H X_H$ into 4 smaller components, each being a $b/2 \times b/2$ multiplier. This is because, when the accurate $A_H X_H$ is performed in parallel with the approximate $A_H X_L$, $A_L X_H$ and $A_L X_L$, the critical path is still determined by the accurate multiplier. Recursively reducing it to smaller multipliers therefore makes the approximate $b \times b$ multipliers the deciding factor of the critical path, as they are more critical than the accurate $b/2 \times b/2$ multipliers.

In other words, stage 1 of the pipelined approximate multiplier effectively consists of 7 multipliers. The designation
of each of these multipliers and their respective arrangement for addition in the second stage are depicted in Fig. 4. Note that this kind of arrangement does not change the latency of the second pipeline stage, as we perform the addition of $A_H X_L$ and $A_L X_H$ in parallel with that of the smaller multipliers. The latter additions generate the net sum $A_H X_H$ in almost the same time as the former takes to complete its addition. Now let us see how we can exploit this property in the proposed multiplication to configure its accuracy.

[Fig. 4. Latency-Driven Pipelined Approximate Multiplier. The $2b$-bit partial products $A_H X_L$, $A_L X_H$ and $A_L X_L$ are combined with the four $b/2 \times b/2$ products $A_{HH} X_{HH}$, $A_{HH} X_{HL}$, $A_{HL} X_{HH}$ and $A_{HL} X_{HL}$ to form the $4b$-bit final product.]

A. Accuracy Configuration Modes

Since we now have 7 multipliers in stage 1, we can vary the accuracy level of the proposed multiplier by varying the number of multipliers that are accurate. In any case, we keep $A_{HH} X_{HH}$ always accurate, so that the accuracy does not fall below a certain level. We thus obtain an accuracy-configurable multiplier whose accuracy can be adjusted according to the error tolerance of the application. The number of inaccurate multipliers used directly determines the amount of power saved by the multiplier.

We propose 4 different modes of operation of our approximate multiplier based on accuracy levels. The proposed modes are given in Table I. Here 'A' stands for an accurate multiplier and 'I' stands for an inaccurate multiplier. We explain the bit-width aware algorithm next.

TABLE I
MODES OF OPERATION OF ACCURACY CONFIGURABLE MULTIPLIER

Mode   $A_{HH} X_{HH}$   $A_{HH} X_{HL}$   $A_{HL} X_{HH}$   $A_{HL} X_{HL}$
1      A                 I                 I                 I
2      A                 A                 I                 I
3      A                 A                 A                 I
4      A                 A                 A                 A

B. Proposed Bit-Width Aware Algorithm

We propose a bit-width aware algorithm for generalized $2b \times 2b$ multiplication which is configurable at run-time according to the bit-width of the operands. That is, if a $b \times b$ multiplication has smaller operands, say $(1011)_2 \times (1101)_2$, it configures $b$ at run-time as 4, not as 8 (used previously for $b \times b$). The size of the inaccurate part ($k$) in the approximate partial products is always equal to $b/2$ bits. Therefore, $k$ is automatically set to 2 and not to 4. Further, the positions at which the middle $b/2$ bits are set to 1 also change with the operand bit-width. This has already been indicated in Fig. 2. This is plausible because at any given time we use a multiplier of fixed size depending on the application, and hence can program it accordingly.

We present our algorithm (see Algorithm 1) for approximate $b \times b$ partial product computation (e.g. $A_H X_L$) in its most general form. This algorithm is coded later as a C program for simulation purposes. Note that index 0 is the most significant bit in Algorithm 1.

Algorithm 1 Approximate Partial Product Evaluation

procedure APPROXIMATEPRODUCT(p, a, x)
    Product ← p[0, 1, ..., 2b − 1]        /* Say A_H X_L */
    Multiplicand ← a[0, 1, ..., b − 1]
    Multiplier ← x[0, 1, ..., b − 1]
    c ← 0                                 /* Temporary Carry */
    q, r ← 2b − 1; d ← b − (k/2)
    for i ← b − 1, b − (k/2) do           /* Inaccurate Part */
        for j ← b − 1, d do
            [p(q), c] ← add-bits(p(q), a(j) & x(i), c);
            q ← q − 1
        end for
        q, r ← r − 1; d ← d + 1; c ← 0
    end for
    for r ← 2b − (k/2) − 1, 2b − k do
        p(r) ← 1;
    end for
    Cin ← Carry-in-Pre                    /* Switch Case for various b */
    p(r) ← Cin; q, r ← 2b − k − 1; d ← b − k − 1; Cin ← 0
    for i ← b − 1, b − k do               /* Accurate Part */
        for j ← d, 0 do
            [p(q), Cin] ← add-bits(p(q), a(j) & x(i), Cin);
            if j == 0 and Cin == 1 and q ≠ 0 then
                p(q − 1) ← 1
            end if
            q ← q − 1
        end for
        q ← r; d ← d + 1; Cin ← 0
    end for
    for i ← b − k − 1, 0 do
        for j ← b − 1, 0 do
            [p(q), Cin] ← add-bits(p(q), a(j) & x(i), Cin);
            if j == 0 and Cin == 1 and q ≠ 0 then
                p(q − 1) ← 1
            end if
            q ← q − 1
        end for
        q, r ← r − 1; Cin ← 0
    end for
end procedure

VI. AWTM DESIGN AND ITS LPA OPTIMIZATION

In our accuracy-configurable design, we have reduced the horizontal critical path to just half. In order to reduce the
vertical critical path as well (of $A_H X_L$, $A_L X_H$ and $A_L X_L$), we use Wallace Tree Reduction [15]. Wallace trees are fast and hardware efficient for multiplication of more than 16 bits. The Wallace tree height grows as $\log_{3/2}(N/2)$. We call this Wallace tree based design the Approximate Wallace Tree Multiplier (AWTM). An accurate $8 \times 8$ Wallace multiplication takes a total of 4 stages of reduction (each of which has a delay of 1 full adder) and then uses an 11-bit full adder to compute the final product. We use these $8 \times 8$ partial products to evaluate a $16 \times 16$ multiplication.

[Fig. 5. Wallace Tree for approximate $8 \times 8$ partial product evaluation. The tree has three reduction stages (Stage 1 to Stage 3) operating on the partial product bits $A_i X_j$ and the predicted carry-in, followed by a 6-bit full adder whose carry-in at the LSB is the carry-out of the last stage, not 0.]

Fig. 5 shows that the approximate partial product multipliers ($A_H X_L$, $A_L X_H$ and $A_L X_L$), which are $8 \times 8$, take a total of 3 stages of reduction and then use a 6-bit full adder for the final product evaluation. Compared in terms of critical paths (of stage 1 of the pipelined $16 \times 16$ multiplier), an accurate $8 \times 8$ multiplier has a delay of 15 full adders (4 stages and an 11-bit full adder) and its approximate counterpart has a delay of 9 full adders (3 stages and a 6-bit full adder). Theoretically, this leads to an improvement of $(15 - 9)/15 = 40\%$ in the latency of stage 1.

Furthermore, for each of $A_H X_L$, $A_L X_H$ and $A_L X_L$, the number of adders is reduced by around 51.24%: 48 full adders and 25 half adders are required for accurate evaluation, in contrast with 23 full adders and 13 half adders for approximate evaluation. (The 51.24% figure is consistent with counting each half adder as half a full adder: $1 - 29.5/60.5 \approx 51.24\%$.) Hence, for the complete pipelined multiplier, the total power reduction is expected to be around 38.42%, because $A_H X_H$ is accurate and the other 3 are inaccurate, giving three-fourths of 51.24%. We validate these theoretical estimates by running actual simulations. The experimental results and their analysis are discussed next.

VII. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we present power and area results obtained experimentally. All results have been produced using Cadence RTL Compiler with the 45 nm Nangate Open Cell Library. We also generate results on a real-time application by computing the Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (iDCT) of a benchmark image.

A. Accuracy and Acceptance Probability

We simulate our $16 \times 16$ bit-width aware multiplier using a C program by generating 5000 random numbers to compute accurate and approximate products for all possible combinations for the different accuracy modes. For each case, $MAA$ is set to 99%. The program generates results such as $ACC_{amp}$, mean error and Acceptance Probability. The mean error and Acceptance Probability (AP) results are tabulated in Table II for the various modes. Note that for unbounded operand size (operands > 1 in Table II), the bit-width aware algorithm is employed. When the operand size is constrained to be 10 bits or more (operands > 1000), we find that a simple $16 \times 16$ approximate multiplier without bit-width awareness produces almost the same results.

TABLE II
RESULTS: MEAN ERROR AND ACCEPTANCE PROBABILITY (AP)

Mode   Parameter    For Operands > 1 (in %)   For Operands > 1000 (in %)
1      Mean Error   5.26                      4.59
       AP           27.34                     28.18
2      Mean Error   3.42                      3.16
       AP           46.18                     46.04
3      Mean Error   0.46                      0.29
       AP           91.58                     94.28
4      Mean Error   0.13                      0.035
       AP           98.44                     99.72

Table II shows high accuracy levels for modes 3 and 4. An Acceptance Probability of more than 98% for a minimum acceptable accuracy of 99% signifies that, over all combinations of the random numbers generated, more than 98% of cases give an accuracy greater than 99%. Also, as explained earlier, for larger numbers (operands > 1000), the mean accuracy rises to as high as 99.965%. A $16 \times 16$ multiplier proposed in [12] generates a mean error of 3.32%. Clearly, our multiplier performs better than this for modes 3 and 4. The error is comparable for mode 2.

We further investigate the relationship between $MAA$ and acceptance probability. Fig. 6 shows a plot of $AP$ vs. $MAA$ for the various modes and for ETM (proposed in [13]). It is evident that our multiplier (when used in modes 3 and 4) outperforms ETM as far as accuracy is concerned. The accuracy results for mode 4 of the $16 \times 16$ AWTM were confirmed by applying 10,000 random test vectors to the RTL netlist. Note that, for the sake of simplicity, we do not make the hardware implementation (HDL code) of AWTM bit-width aware. For such a design, we obtained a mean error of 0.16% and an $AP$ of 98.56% for an $MAA$ of 99%, quite in agreement with Table II. Hence, mode 4 of the $16 \times 16$ multiplier gives almost the same results (for all operands) with or without bit-width awareness. Next, we present power and area results.
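The accuracy evaluation of Section VII-A can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' C program: `approx_partial` is our reading of Fig. 2(a) (lower $b/2$ bits exact, middle $b/2$ bits forced to 1, upper $b$ bits summed from the remaining columns plus a carry-in predicted as the OR of the critical column), `approx_multiply16` keeps $A_H X_H$ fully accurate as in mode 4 without bit-width awareness, and the sample set is 5000 random operand pairs above 1000 drawn with a fixed seed. All function names are hypothetical.

```python
import random


def approx_partial(a, x, b):
    """Approximate b x b product (assumed reading of Fig. 2(a))."""
    k = b // 2
    low = (a * x) & ((1 << k) - 1)      # lower b/2 bits: exact
    middle = ((1 << k) - 1) << k        # next b/2 bits: forced to 1
    upper, carry_in = 0, 0
    for i in range(b):
        for j in range(b):
            bit = ((a >> j) & 1) & ((x >> i) & 1)
            if i + j >= b:              # columns feeding the upper b bits
                upper += bit << (i + j - b)
            elif i + j == b - 1:        # critical column: OR of its elements
                carry_in |= bit
    return ((upper + carry_in) << b) | middle | low


def approx_multiply16(a, x):
    """16 x 16 product: accurate A_H*X_H, three approximate partial products."""
    b = 8
    a_h, a_l = a >> b, a & 0xFF
    x_h, x_l = x >> b, x & 0xFF
    cross = approx_partial(a_h, x_l, b) + approx_partial(a_l, x_h, b)
    return ((a_h * x_h) << 16) + (cross << 8) + approx_partial(a_l, x_l, b)


def evaluate(multiplier, samples, maa_percent=99.0):
    """Return (mean error in %, acceptance probability in %)."""
    errors = []
    for a, x in samples:
        exact = a * x
        if exact == 0:
            continue                    # relative error undefined for Rc = 0
        errors.append(abs(exact - multiplier(a, x)) / exact * 100.0)
    mean_error = sum(errors) / len(errors)
    accepted = sum(1 for e in errors if 100.0 - e > maa_percent)
    return mean_error, 100.0 * accepted / len(errors)


rng = random.Random(0)                  # fixed seed, chosen arbitrarily
samples = [(rng.randrange(1001, 1 << 16), rng.randrange(1001, 1 << 16))
           for _ in range(5000)]
mean_error, ap = evaluate(approx_multiply16, samples)
print(f"mean error = {mean_error:.3f}%, AP = {ap:.2f}%")
```

For the operands of Section IV-A, `approx_partial(0xDB, 0xE7, 8)` gives `0xC3FD`, matching the bit pattern of Fig. 2(a), while `approx_multiply16(0xAEDB, 0xB6E7)` gives `0x7CEBA7FD`, which agrees with the quoted $(7CEBA7FB)_{16}$ in all but the last hex digit. The sketch should therefore not be taken as bit-exact with the authors' implementation, and its mean error and AP are not expected to reproduce Table II exactly.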
[Fig. 6. Acceptance Probability (in %) vs. Minimum Acceptable Accuracy (in %, from 90 to 98) for AWTM Modes 1 to 4 and ETM.]

B. Power and Area Analysis

We obtain power and area results for our $4 \times 4$, $8 \times 8$ and $16 \times 16$ AWTM designs. Table III shows these results along with power and area results for the comparable designs reported in the literature. Here, the power and area reductions of the various approximate multipliers are computed with respect to their accurate counterparts. Fig. 7 and Fig. 8 display the scalability results, from which it is evident that AWTM performs better than all other corresponding multipliers reported in the literature w.r.t. power and area.

TABLE III
REDUCTION IN AREA AND POWER (HIGHER IS BETTER)

Size      Approximate Multiplier   Area (%)   Power (%)
4 × 4     AWTM (Proposed)          55.76      53.16
          ETM [13]                 53.85      49.60
          Kulkarni [12]            35.75      36.3
          Truncation [11]          43.44      43.08
8 × 8     AWTM (Proposed)          51.93      57.19
          ETM [13]                 50.02      39.25
          Kulkarni [12]            22.03      41.5
          Truncation [11]          47.90      15.30
16 × 16   AWTM (Proposed)          34.49      41.96
          ETM [13]                 30.27      31.49
          Kulkarni [12]            17.89      31.8
          Truncation [11]          15.17      9.69

[Fig. 7. Power Reduction plot for Scalability of approximate multipliers (Higher is better). Power reduction (in %) for 4-bit, 8-bit and 16-bit versions of the AWTM, ETM, Kulkarni and Truncation multipliers.]

[Fig. 8. Area Reduction for various scalable approximate multipliers (Higher is better). Area reduction (in %) for 4-bit, 8-bit and 16-bit versions of the AWTM, ETM, Kulkarni and Truncation multipliers.]

Since we have not optimized the second stage of the pipeline (i.e. the addition stage), the latency results were the same for both the accurate and the approximate $16 \times 16$ pipelined multipliers. As Wallace trees are very fast, the minimum clock period in RTL synthesis was decided by the addition stage itself. Nevertheless, when the latency of a single-cycle implementation of AWTM (the complete multiplier, not just stage 1) was compared with that of a single-cycle accurate multiplier, AWTM was found to reduce the latency by 23.91%. Similarly, for the $8 \times 8$ multiplier, we achieved a 32% reduction in latency, which was theoretically predicted to be around 40%.

A net reduction in total power and area of 57.19% and 51.93% respectively is also obtained for the $8 \times 8$ AWTM. Both of these values were expected to be around 52% theoretically, as mentioned in the previous section. Therefore, the experimental results conform to the theoretical results to a large extent. The area and power reduction results of the $16 \times 16$ AWTM are given in Table IV for the various modes of operation.

TABLE IV
RTL COMPILER RESULTS OF 16 × 16 AWTM

Mode   Area (in %)   Leakage Power (in %)   Dynamic Power (in %)   Total Power (in %)
1      34.49         34.06                  43.4                   41.96
2      31.91         32.52                  41.46                  40.63
3      29.89         30.97                  39.68                  38.86
4      27.90         29.43                  37.90                  37.10

Key Observations: First, as expected, leakage power and area follow almost the same trends (even in the values of percentage reduction). Second, the percentage power reduction goes as high as 41.96%. Therefore, we have larger power savings for applications that can tolerate relatively more error. Finally, for mode 4, the net power saving obtained from RTL Compiler is 37.10%, against a theoretical estimate of around 38%, which supports the validity of these results.

Furthermore, Fig. 9 compares the percentage reduction in power and area against the mean error involved in the $16 \times 16$ product. The figure clearly shows that with an increase in the error tolerance of the application, power (dynamic as well as leakage) and area savings also increase. In the next subsection, we evaluate our multiplier on a real-time application.

C. Real Time Application: DCT and iDCT

We make use of the AWTM to demonstrate its effectiveness in computing the DCT and iDCT of the benchmark image 'Lena'. We
have used this application because it involves multiplication of floating point numbers. Floating point multiplication uses large unsigned multipliers, making it an ideal application area for AWTM. Fig. 10 shows our results on this image. The same image is generated back when the DCT and iDCT operations are performed on it.

[Fig. 9. Mean Error vs. Area and Power Reduction. Percentage reduction in Area, Leakage Power, Dynamic Power and Total Power plotted against Mean Error (in %, from 0.0 to 4.0).]

[Fig. 10. DCT and iDCT of image using accurate multiplier and AWTM: (a) Original Image (b) Accurate Multiplier (c) Approximate Mode 4 (d) Approximate Mode 3 (e) Approximate Mode 2 (f) Approximate Mode 1.]

As expected, the results for mode 2 (Fig. 10(e)) and mode 1 (Fig. 10(f)) are not good, as they give a mean error of 5−6%. This means that this application cannot tolerate such a magnitude of error. There are applications which can, and modes 1 and 2 can readily be employed there. On the other hand, it is hardly possible to distinguish between the results of the accurate multiplier (Fig. 10(b)) and those of AWTM mode 4 (Fig. 10(c)) and mode 3 (Fig. 10(d)). The multiplier used to produce the image in Fig. 10(c) saves 37% power and 28% area. The one used for Fig. 10(d) reduces power by 38.86% and area by 29.89%. Therefore, using AWTM, we can save up to 39% power and 30% area with negligible loss in image quality.

VIII. CONCLUSION

... new bit-width aware approximate multiplication algorithm is also presented. The AWTM design is further empowered with a Carry-in Prediction logic and its efficient precomputation to increase overall throughput. Our power- and area-efficient AWTM is fast, particularly for operand sizes of 16 bits or more, and is optimized w.r.t. LPA design metrics. A single-cycle implementation of AWTM showed a 23.91% reduction in latency. We obtained a mean accuracy of 99.85% to 99.965% for 16-bit multiplication of differently sized operands. We also achieved a significant reduction in the power and area of our multiplier design, up to 41.96% and 34.49% respectively for 16-bit multiplication, which demonstrates the efficiency and effectiveness of AWTM. Finally, we demonstrated that AWTM produced images of almost the same quality as those obtained using accurate multipliers, but with power and area savings of around 39% and 30% respectively.

REFERENCES

[1] "International Technology Roadmap for Semiconductors," http://www.itrs.net.
[2] E. J. Swartzlander, "Truncated multiplication with approximate rounding," in Signals, Systems, and Computers, 1999. Conference Record of the Thirty-Third Asilomar Conference on, vol. 2, Oct. 1999, pp. 1480–1483.
[3] L. N. Chakrapani, K. K. Muntimadugu, L. Avinash, J. George, and K. V. Palem, "Highly energy and performance efficient embedded computing through approximately correct arithmetic: a mathematical foundation and preliminary experimental validation," CASES, pp. 187–196, 2008.
[4] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 18, no. 8, pp. 1225–1229, Aug. 2010.
[5] N. Zhu, W. L. Goh, G. Wang, and K. S. Yeo, "Enhanced low-power high-speed adder for error-tolerant application," in SoC Design Conference (ISOCC), 2010 International, Nov. 2010, pp. 323–327.
[6] A. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in Design, Automation and Test in Europe, 2008. DATE '08, Mar. 2008, pp. 1250–1255.
[7] A. Kahng and S. Kang, "Accuracy-configurable adder for approximate arithmetic designs," in Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, June 2012, pp. 820–825.
[8] S. L. Lu, "Speeding up processing with approximation circuits," Computer, vol. 37, no. 3, pp. 67–73, Mar. 2004.
[9] D. Shin and S. Gupta, "A re-design technique for datapath modules in error tolerant applications," in Asian Test Symposium, 2008. ATS '08. 17th, Nov. 2008, pp. 431–437.
[10] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 32, no. 1, pp. 124–137, Jan. 2013.
[11] M. B. Sullivan and E. E. Swartzlander, "Truncated error correction for flexible approximate multiplication," in Signals, Systems and Computers (ASILOMAR), 2012 Conference Record of the Forty Sixth Asilomar Conference on, 2012, pp. 355–359.
[12] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in VLSI Design (VLSI Design), 2011 24th International Conference on, 2011, pp. 346–351.
[13] K. Y. Kyaw, W.-L. Goh, and K.-S. Yeo, "Low-power high-speed multiplier for error-tolerant application," in Electron Devices and Solid-State Circuits (EDSSC), 2010 IEEE International Conference of, 2010, pp. 1–4.
[14] K. Bhardwaj and P. S. Mane, "ACMA: Accuracy-configurable multiplier architecture for error-resilient system-on-chip," in Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), 2013 8th International Workshop on, July 2013, pp. 1–6.
We proposed a power and area-efficient Approximate Wal- [15] C. S. Wallace, “A suggestion for a fast multiplier,” in Electronic
Computers, 1964 IEEE Transactions on, 1964, pp. 14–17.
lace Tree Multiplier (AWTM) for error-resilient systems. A

