Example of Multiplier

This paper presents an FPGA implementation of an approximate multiplier that utilizes inexact adder circuits to achieve energy-efficient data processing. The proposed multiplier demonstrates a power consumption reduction of up to 17.39% and a delay improvement of 13.49%, with an acceptable accuracy loss of less than 5%. The study highlights the potential of approximate computing in reducing power consumption while maintaining sufficient accuracy for applications in embedded systems.


2017 First New Generation of CAS

Approximate Multipliers based on Inexact Adders for Energy Efficient Data Processing

Mario Osta (a, b), Ali Ibrahim (a), Maurizio Valle (a), Hussein Chible (b)

(a) DITEN, COSMIC Lab, University of Genova, Genova, Italy
(b) EDST, MECRL Lab, Lebanese University, Beirut, Lebanon

Abstract— Approximate computing circuits are considered a promising solution for reducing power consumption in embedded data processing. This paper proposes an FPGA implementation of an approximate multiplier based on inexact adder circuits. The performance of the proposed multiplier is evaluated by comparing its power consumption, computational accuracy, and time delay with those of an approximate multiplier based on an exact adder presented in the literature. Results report a power saving of up to 17.39% and a 13.49% improvement in time delay, at the cost of less than 5% accuracy loss.

Keywords—Digital Multipliers; Approximate Computing; Accuracy; Low Power Consumption; Error Tolerant.

978-1-5090-6447-2/17 $31.00 © 2017 IEEE
DOI 10.1109/NGCAS.2017.41

I. INTRODUCTION

Low power consumption has become the most important design goal in a wide range of electronic systems, especially smart self-powered sensing systems for application domains such as the Internet of Things (IoT), wearable devices, and robotics. The ever-increasing demand for computing power is a driving force toward ultra-low-power design strategies. Seeking to improve energy efficiency, designers have applied optimization methods at several levels, from the system level down to the transistor level.

In recent years, approximate computing has emerged as an effective approach to improving energy efficiency [1], [2]. Approximate results are sufficient for many applications, such as tactile data processing [3], image processing, and data mining. It is therefore attractive to exploit the resulting energy reduction at the cost of a minimal variation in performance [4]. Recently, approximations have been adopted in the computing units of embedded systems, especially graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) [5]. Computing units such as embedded digital signal processing (DSP) systems are key components of modern embedded electronic devices [6]. Among the arithmetic DSP operations, multiplication has always been considered a complex block that increases the complexity of DSP systems. Therefore, decreasing the complexity of multipliers may reduce the power consumption of the overall system. From this perspective, the proposed work applies approximate computing techniques to arithmetic units, i.e., adders and multipliers, to reduce power consumption. The main goal is to implement an efficient hardware architecture of an approximate multiplier with low power consumption. This goal stems from the need to reduce the complexity of the singular value decomposition implementation proposed in [7] for tactile data processing.

The rest of this paper is organized as follows: Section II reviews existing approximate multipliers in the literature. Section III describes the architecture of the proposed approximate multiplier. Section IV analyzes and evaluates the simulation results of the proposed multiplier in terms of accuracy and power consumption. Finally, Section V concludes the paper.

II. PRIOR WORKS

Efficient implementations of approximate multipliers based on different approaches have recently been reported in the literature. Kulkarni et al. [8] predicted the least significant columns of the partial product as a constant by using a truncated multiplication method. They presented a simplified inaccurate 2 × 2 multiplier cell to be used as the basic block for constructing larger multiplier architectures. The power consumption was reduced by an average of 31.78% to 45.4% compared with previous accurate multiplier designs, with an average error of 1.39% to 3.32%. Two approximate 4:2 compressors were proposed in [9], providing efficient reductions in power consumption, hardware resources, and delay with respect to exact designs. The authors of [10] proposed an approximate multiplier design with an unbiased error distribution that reduces the propagation delay and improves the energy efficiency. More recently, a high-speed, energy-efficient rounding-based multiplier (RoBA), which rounds the inputs to the form $2^n$, was proposed in [11]. This approach dramatically improved the speed and the energy consumption (by up to 65%) because the computationally intensive part of the multiplication is omitted. In this paper, we present the FPGA implementation of two approximate multipliers: 1) the first is adopted from [11], since it provides high power reduction compared with exact multipliers, and 2) the second is a new architecture that modifies the first by employing an inexact adder circuit in place of the accurate one. Based on the FPGA implementation results, the performance of the proposed architecture is evaluated, showing a good improvement in power consumption and computation delay.

III. PROPOSED APPROXIMATE MULTIPLIER

The proposed architecture is adopted from [11]: it is based on rounding signed and unsigned numbers to the form $2^n$. The main idea is to use an approximate adder in place of the exact one in order to reduce power consumption. The multiplication of the two input values M and N is written as follows:
$$M \times N = (M_r - M) \times (N_r - N) + M_r \times N + N_r \times M - M_r \times N_r \tag{1}$$
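Expanding the first product shows that (1) is an exact identity, so dropping that term introduces an error equal to the term itself:

```latex
\begin{aligned}
(M_r - M)(N_r - N) &= M_r N_r - M_r N - N_r M + M N \\
\Longrightarrow\quad M N &= (M_r - M)(N_r - N) + M_r N + N_r M - M_r N_r \\
M N - \left(M_r N + N_r M - M_r N_r\right) &= (M_r - M)(N_r - N)
\end{aligned}
```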

This equation is simplified by eliminating the first term, $(M_r - M) \times (N_r - N)$, so that the operation can be performed using only add and shift operations. The block diagram of the proposed multiplier is presented in Fig. 1, and its blocks are described below.
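The Python sketch below is a behavioral model of Eq. (1) with the first term dropped, operating on unsigned magnitudes (the sign handling of the architecture is omitted). It is an illustration, not the authors' VHDL design: the names `round_pow2` and `approx_mul` are hypothetical, and ties such as $3 \times 2^k$ are assumed to round up, which is what the two bit-level rules of the Round/shift block yield. Because $M_r$ and $N_r$ are powers of two, all three remaining products reduce to shifts.

```python
def round_pow2(x: int) -> int:
    """Round x >= 0 to the nearest power of two; ties (3 * 2**k) round up (assumed)."""
    if x == 0:
        return 0
    p = x.bit_length() - 1  # position of the leading one
    if p >= 1 and (x >> (p - 1)) & 1:
        return 1 << (p + 1)  # bit below the leading one is 1: round up
    return 1 << p  # bit below the leading one is 0: round down


def approx_mul(m: int, n: int) -> int:
    """Eq. (1) without the (Mr - M)(Nr - N) term, for magnitudes m, n >= 0."""
    if m == 0 or n == 0:
        return 0
    em = round_pow2(m).bit_length() - 1  # Mr = 2**em
    en = round_pow2(n).bit_length() - 1  # Nr = 2**en
    # Mr*N, Nr*M and Mr*Nr are all shifts; only one add and one subtract remain.
    return (n << em) + (m << en) - (1 << (em + en))


if __name__ == "__main__":
    for m, n in [(3, 3), (8, 13), (0x1F, 0x7C)]:
        print(f"{m} x {n}: exact {m * n}, approximate {approx_mul(m, n)}")
```

For `3 x 3` the sketch returns 8 instead of 9, a relative error of 1/9, which is the largest this rounding step can produce; an operand that is already a power of two, as in `8 x 13`, gives the exact product.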
1) Sign extractor:
The sign extractor block extracts the sign of each input value and outputs its absolute value. It detects the sign bit (most significant bit) of the input, which is represented in two's complement format. It then negates the input if the value is negative and leaves it unchanged if it is positive.

2) Round/shift:
This block rounds the absolute values to the nearest values of the form $2^n$. During this process, bit i of the rounded value is set to one in the following cases:
• M[i] and all the bits to its left are zero, while the two bits to its right, M[i−1] and M[i−2], are both one.
• M[i] is one, while the bit to its right, M[i−1], and all the bits to its left are zero.

Since the rounded values have the form $2^n$, the products $M_r \times N$, $N_r \times M$, and $M_r \times N_r$ are obtained simply with a barrel shifter: the n-bit operand N is shifted according to $M_r$, and the operand M according to $N_r$. The outputs of the shifter block are 2n bits wide.

3) ETA adder:
Fig. 2 shows the functional block diagram of the ETA adder, which consists of three components: a control block, carry-free adders, and exact RCA adders. The ETA provides inaccurate values in the lower-order bits while maintaining accuracy in the higher-order bits, which are added with the RCA adder [12]. In the inaccurate part, both input bits are checked from left to right: if they are 0 or different, a normal carry-free addition is computed; otherwise, if both input bits are 1, all the remaining bits on the right are set to 1.

The proposed architecture uses an exact subtractor for the $-M_r \times N_r$ term of (1); it computes the difference bit by bit, using the borrow bit of the less significant stage.

4) Sign set:
The sign set block sets the sign of the final multiplication result. It negates the output of the subtractor when the signs extracted by the sign extractor for the two input values differ.

Fig. 1. Block diagram of the proposed approximate multiplier.

Fig. 2. ETA adder block diagram.

IV. HARDWARE IMPLEMENTATION

To evaluate the performance of the proposed multiplier, two different architectures have been implemented and compared. The first architecture (MRCA) uses the exact RCA adder as its addition unit, while the second (META) uses the inexact ETA adder described in Section III. The circuits have been implemented in Vivado Design Suite 2017.1 using the VHDL hardware description language. The designs have been synthesized with the Xilinx Vivado synthesizer, targeting the Virtex-7 xc7vx485tffg1157-1 device. Based on the implementation results, this section analyzes the performance parameters of the two architectures, highlighting computational accuracy and power consumption.

A. Accuracy
Several tests have been carried out to assess the accuracy of the multipliers. Following [12], the variation of the probability of acceptance as a function of the minimum acceptable accuracy is analyzed. The inaccuracy of the approximate multipliers is
generated by eliminating the term $(M_r - M) \times (N_r - N)$ from the exact multiplication. This rounding error vanishes when either input is already a power of two ($M = M_r$ or $N = N_r$), and it is maximal when the inputs are $3 \times 2^n$ and $3 \times 2^m$, respectively. The terms used are defined as follows:
• Error (E): E = |Re − Ri|, where Re is the exact multiplication result and Ri is the inexact result obtained from the approximate multiplier simulation.
• Accuracy (ACC): ACC = (1 − E/Re) × 100 measures how close the multiplier output is to the exact product. Its value ranges between 0% and 100%.
• Minimum Acceptable Accuracy (MAA): the threshold value; to respect the constraints of the system, the obtained accuracy must be higher than this threshold.
• Probability of acceptance (PA): the probability that a result's accuracy is higher than the MAA, i.e., PA = P(ACC > MAA). Its value ranges from 0 to 1.

Fig. 4. Comparison of META for different bit sizes.
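To make E, ACC, MAA, and PA concrete, the Python sketch below evaluates them for a behavioral model of both multipliers. It is an illustration under stated assumptions, not the authors' implementation: operands are Python integers (the sign extractor is modelled with `abs`), the ETA follows the rule of Section III with the bit where both inputs are 1 also set to 1 (an assumed reading), and the number of inexact low-order bits `k` is a hypothetical parameter, with `k = 0` reducing the ETA to an exact adder, i.e., MRCA. All function names are hypothetical.

```python
from typing import Sequence


def round_pow2(x: int) -> int:
    """Nearest power of two for x >= 0; ties (3 * 2**j) round up (assumed)."""
    if x == 0:
        return 0
    p = x.bit_length() - 1
    return 1 << (p + 1) if p >= 1 and (x >> (p - 1)) & 1 else 1 << p


def eta_add(a: int, b: int, k: int) -> int:
    """Error-tolerant addition: exact from bit k upward, carry-free below (assumed rule)."""
    high = ((a >> k) + (b >> k)) << k  # accurate part, no carry-in from the low part
    low = 0
    for i in range(k - 1, -1, -1):  # scan the inaccurate part from left to right
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:
            low |= (1 << (i + 1)) - 1  # this bit and every bit to its right become 1
            break
        low |= (ai ^ bi) << i  # carry-free sum bit
    return high | low


def approx_multiply(m: int, n: int, k: int = 0) -> int:
    """Rounding-based product; k = 0 models MRCA, k > 0 models META."""
    negative = (m < 0) != (n < 0)  # sign extractor: remember the result sign
    m, n = abs(m), abs(n)
    if m == 0 or n == 0:
        return 0
    em = round_pow2(m).bit_length() - 1  # Mr = 2**em
    en = round_pow2(n).bit_length() - 1  # Nr = 2**en
    s = eta_add(n << em, m << en, k)  # Mr*N + Nr*M via shifts and the adder
    p = s - (1 << (em + en))  # exact subtractor removes Mr*Nr
    return -p if negative else p  # sign set


def accuracy(exact: int, approx: int) -> float:
    """ACC = (1 - E/Re) * 100 with E = |Re - Ri|; requires Re != 0."""
    return (1 - abs(exact - approx) / abs(exact)) * 100


def probability_of_acceptance(accs: Sequence[float], maa: float) -> float:
    """PA = P(ACC > MAA), estimated as the fraction of tested inputs."""
    return sum(acc > maa for acc in accs) / len(accs)


if __name__ == "__main__":
    # Exhaustive sweep of non-zero 8-bit signed inputs.
    pairs = [(m, n) for m in range(-128, 128) for n in range(-128, 128) if m and n]
    for name, k in (("MRCA", 0), ("META, k=8", 8)):
        accs = [accuracy(m * n, approx_multiply(m, n, k)) for m, n in pairs]
        print(name, [round(probability_of_acceptance(accs, t), 3) for t in (80, 90, 95)])
```

The sweep prints PA at three MAA thresholds for each configuration. With `k = 0` no product falls below about 88.9% accuracy (the 1/9 worst case of the rounding step), so any gap between the two rows comes from the ETA alone; the numbers depend on the assumed `k` and should not be read as the values shown in Figs. 3 and 4.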
The variation of PA with respect to MAA has been analyzed: Fig. 3 compares the 8-bit META and MRCA multipliers, and Fig. 4 compares META multipliers with 8-, 16-, 24-, and 32-bit inputs. We randomly selected 100 signed and unsigned inputs. The simulations show that 89% of the inputs reach an accuracy above 90% with the META multiplier, compared with 94.5% with the MRCA multiplier [11]. In addition, the accuracy and the probability of acceptance increase with the bit length. Despite the small difference in accuracy between the two multipliers (4.5%), the proposed multiplier still has an acceptable accuracy.

Fig. 5 shows the error percentage distribution for META and MRCA. 43% of the inputs have an error below 2%, and about 31% have an error between 2% and 5%. Fewer than 14% of the inputs have an error above 8%, which indicates that large errors are uncommon with the proposed architecture.

Fig. 5. Error percentage distribution for the two approximate multipliers.

B. Power consumption
Table I reports the power consumption obtained by simulating the implemented designs. A 3 μs test has been run to determine the average dynamic power and the time delay. The results show that the proposed 8-bit META reduces power consumption by 17.39% and delay by 13.49% compared with the MRCA multiplier [11]. Moreover, another test has been carried out over 1 ms to analyze the power consumption and time delay of META as the multiplier size increases. Fig. 6 shows the variation of power consumption and time delay as a function of the META multiplier size: both increase as META becomes larger.
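The reported 8-bit reductions follow directly from the Table I entries, taking MRCA as the baseline. The short Python check below reproduces them; `reduction_pct` is a hypothetical helper name.

```python
def reduction_pct(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100


# 8-bit entries of Table I: MRCA (baseline) vs META.
power_saving = reduction_pct(69, 57)     # mW
delay_saving = reduction_pct(9.71, 8.4)  # ns
print(f"power: {power_saving:.2f}%, delay: {delay_saving:.2f}%")
# prints "power: 17.39%, delay: 13.49%"
```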
TABLE I. SIMULATION RESULTS

Approximate multiplier | Power (mW) | Delay (ns) | LUT utilization
MRCA 8-bit             | 69         | 9.71       | 0.04%
META 8-bit             | 57         | 8.4        | 0.05%

Fig. 3. Probability of acceptance versus minimum acceptable accuracy.

On the other hand, since the standard deviation of the power is high over a wide range of inputs, we randomly selected five inputs as examples to assess the instantaneous power consumption of the 8-bit META and MRCA multipliers. Each input has been simulated for 20 ns to determine its instantaneous dynamic power. The comparison of the results in Fig. 7 shows savings in dynamic power ranging from 9.21% up to 50%. For instance, for the product 1F × 7C (hexadecimal), the power drops from 82 mW to 41 mW while the accuracy of the result remains approximately unchanged.

Fig. 6. Variation of power consumption and delay with the size of the META multiplier.

Fig. 7. Instantaneous dynamic power comparison of different selected inputs.

V. CONCLUSION

In this paper, an FPGA implementation of a new approximate multiplier circuit, called META, has been proposed. The new architecture provides a noticeable improvement in latency and power consumption at the price of a small error, which is acceptable for our application [13], [14]. Two hardware implementations of the approximate multiplier were compared: the first employs an exact adder, while the second is based on an inexact adder. The results revealed that the accuracy of the META multiplier decreased slightly, by around 4.5%, which is considered an acceptable variation, while the power consumption and the delay were reduced by 17.39% and 13.49%, respectively. Therefore, the proposed architecture provides a trade-off between accuracy and power saving for improving performance and energy efficiency. In conclusion, the implementation results show that this new design can be integrated into FPGA applications, especially for digital signal processing (DSP). Future work will apply the proposed architecture to the singular value decomposition to reduce the power consumption of the overall system for embedded tactile data decoding [3].

REFERENCES
[1] S. Mittal, "A survey of techniques for approximate computing," ACM Computing Surveys (CSUR), vol. 48, no. 4, p. 62, 2016.
[2] J. Han and M. Orshansky, "Approximate computing: An emerging paradigm for energy-efficient design," in 2013 18th IEEE European Test Symposium (ETS), Avignon, 2013, pp. 1-6. doi: 10.1109/ETS.2013.6569370
[3] A. Ibrahim, P. Gastaldo, H. Chible, and M. Valle, "Real-Time Digital Signal Processing Based on FPGAs for Electronic Skin Implementation," Sensors, vol. 17, no. 3, p. 558, 2017.
[4] D. Mohapatra, G. Karakonstantis, and K. Roy, "Significance driven computation: a voltage-scalable, variation-aware, quality-tuning motion estimator," in Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, ACM, Aug. 2009, pp. 195-200.
[5] L. Sekanina, "Introduction to approximate computing: Embedded tutorial," in 2016 IEEE 19th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), Kosice, 2016, pp. 1-6. doi: 10.1109/DDECS.2016.7482460
[6] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 4, pp. 850-862, Apr. 2010.
[7] A. Ibrahim, M. Valle, L. Noli, and H. Chible, "Assessment of FPGA Implementations of One Sided Jacobi Algorithm for Singular Value Decomposition," in 2015 IEEE Computer Society Annual Symposium on VLSI, Montpellier, 2015, pp. 56-61.
[8] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power in a multiplier architecture," Journal of Low Power Electronics, vol. 7, no. 4, pp. 490-501, 2011.
[9] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and Analysis of Approximate Compressors for Multiplication," IEEE Transactions on Computers, vol. 64, no. 4, pp. 984-994, Apr. 2015. doi: 10.1109/TC.2014.2308214
[10] S. Hashemi, R. I. Bahar, and S. Reda, "DRUM: A Dynamic Range Unbiased Multiplier for approximate applications," in 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Austin, TX, 2015, pp. 418-425. doi: 10.1109/ICCAD.2015.7372600
[11] R. Zendegani, M. Kamal, M. Bahadori, A. Afzali-Kusha, and M. Pedram, "RoBA Multiplier: A Rounding-Based Approximate Multiplier for High-Speed yet Energy-Efficient Digital Signal Processing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 2, pp. 393-401, Feb. 2017.
[12] N. Zhu, L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 8, pp. 1225-1229, Aug. 2010. doi: 10.1109/TVLSI.2009.2020591
[13] M. Franceschi, L. Seminara, S. Dosen, M. Strbac, M. Valle, and D. Farina, "A system for electrotactile feedback using electronic skin and flexible matrix electrodes: Experimental evaluation," IEEE Transactions on Haptics, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TOH.2016.2618377
[14] M. Franceschi, L. Seminara, L. Pinna, M. Valle, A. Ibrahim, and S. Dosen, "Towards the integration of e-skin into prosthetic devices," in 2016 12th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), Lisbon, 2016, pp. 1-4.
