IEEE EMBEDDED SYSTEMS LETTERS, VOL. 14, NO. 2, JUNE 2022

Low-Power Compressor-Based Approximate Multipliers With Error Correcting Module
U. Anil Kumar, Sumit K. Chatterjee, and Syed Ershad Ahmed

Abstract—This letter proposes an unsigned approximate multiplier architecture segmented into three portions. The least significant portion, which contributes least to the partial product (PP), is replaced with a new constant compensation term to improve hardware savings without sacrificing accuracy. The PPs in the middle portion are simplified using a new 4:2 approximate compressor, and the error due to approximation is compensated using a simple yet efficient error correction module. The most significant portion of the multiplier is implemented using exact logic, since approximating it would result in a large error. Experimental results for an 8-bit multiplier show that the power and power-delay product are reduced by up to 47.7% and 55.2%, respectively, in comparison with the exact design and by up to 36.9% and 39.5%, respectively, in comparison with the existing designs, without significant compromise on accuracy.

Index Terms—Approximate computing, approximate multiplier, compressor, partial product reduction (PPR).

Manuscript received March 29, 2021; revised June 23, 2021; accepted September 8, 2021. Date of publication September 16, 2021; date of current version May 19, 2022. This manuscript was recommended for publication by S. Venkatramani. (Corresponding author: U. Anil Kumar.)
The authors are with the Department of Electrical and Electronics Engineering, BITS Pilani (Hyderabad Campus), Hyderabad 500078, India (e-mail: [email protected]; [email protected]pilani.ac.in; [email protected]).
Digital Object Identifier 10.1109/LES.2021.3113005
1943-0671 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION
Over the past two decades, the semiconductor industry has seen explosive growth in integrating sophisticated multimedia applications into portable electronic gadgets. However, this comes at the cost of higher power dissipation, which lowers the devices' lifespan and reliability [1]. Most media processing applications, such as digital and image processing [2], can tolerate an acceptable range of accuracy loss due to the inherent limitations of the human sensory system.
Multiplication is the basic operation in media processing applications [2] and is carried out in three steps: 1) partial product generation (PPG); 2) partial product reduction (PPR); and 3) final accumulation. Since these applications are inherently error-tolerant, various approximate multiplier architectures have been proposed in [3]–[8] to obtain hardware savings.
The main contributions of this letter are as follows.
1) A new approximate 4:2 compressor and a constant correction term are proposed that reduce the hardware utilization at the PPR stage of a multiplier.
2) The error in the multiplier is reduced using a simple yet efficient error correction circuit.
3) We parametrized the proposed multiplier by varying the number of columns in the approximate region for 8- and 16-bit multipliers. Consequently, we obtained variants of the proposed design that are optimized for accuracy and hardware utilization.
The remainder of this letter is organized as follows. The proposed multiplier architecture with a 4:2 approximate compressor and an error correction module is presented in Section III. In Section IV, exhaustive error and hardware analyses of the existing and proposed multipliers are carried out. Finally, the proposed method is evaluated using image processing applications in Section V, while conclusions are drawn in Section VI.

II. RELATED WORK
Ansari et al. [12] proposed two approximate 4 × 4 submultipliers using AND-OR encoding of partial products (PPs) and used probability-based approximate 4:2 compressors for accumulation. These submultipliers are then utilized to design higher order multipliers. Waris et al. [13] implemented a recursive multiplier using partial-product-based building blocks. Further, high-performance approximate half and full adder cells are proposed for use in a 4 × 4 multiplier, which in turn are used for designing larger multipliers.
Momeni et al. [3] proposed two approximate 4:2 compressors to obtain low area and power; however, they suffer from low accuracy. Yang et al. [4] implemented a new high-accuracy approximate compressor in a multiplier architecture; however, it suffers from high power dissipation. Ha and Lee [5] modified the Yang compressor and used it in the approximate region of the PPR structure. To minimize the compressor's error, they used an error correction term from level 1 to level 2 in the PPR structure. Two types of multiplier designs using approximate arithmetic modules were proposed by Venkatachalam and Ko [6], resulting in improved speed and power consumption, but these designs suffer from low accuracy. Another variant of compressor-based multiplier design with high accuracy was proposed by Yi et al. [7]. To improve hardware metrics, such as area and power, Pei et al. [8] presented four different multiplier designs using three approximate compressors along with an error correcting module. Although many works in the literature have proposed multiplier architectures, most of them do not use an error correcting module, and none of them uses a constant correction term in the LSB region, as summarized in Table I.

TABLE I
COMPARISON OF VARIOUS MULTIPLIER DESIGNS

III. PROPOSED UNSIGNED APPROXIMATE MULTIPLIER
Applications such as computer vision, the discrete cosine transform (DCT) in high-efficiency video coding (HEVC), etc., require high accuracy. It is well known that multiple modules/stages are needed to implement these applications. Consequently, the output of one module/stage serves as an input to the subsequent module. Multiplication is a ubiquitous operation in these applications and forms one of the stages. If the multiplier accuracy is low, then it affects the subsequent stages. So, in this work, we propose highly accurate approximate multipliers.

Fig. 1. Proposed PP reduction structure with error correction logic.

Fig. 2. Proposed approximate 4:2 compressor.

TABLE II
TRUTH TABLE OF THE PROPOSED APPROXIMATE COMPRESSOR

A. Proposed Partial Product Reduction
Fig. 1 presents the proposed PPR structure for an 8 × 8 multiplier, in which the PPs, generated using AND gates, are denoted by solid dots. The PPR structure is divided into three regions: 1) the least significant portion (LSP); 2) the middle significant portion (MSP); and 3) the accurate region. The LSP region, comprising four PP columns, contributes least to the final product and can be truncated to improve hardware savings; however, this results in a loss of accuracy. To mitigate this, in the proposed design, we replace the LSP with constant correction logic. Based on the idea presented in [11], a constant correction term is computed as the average value over all possible input combinations in the LSP region. The compensation term thus obtained is $(6)_{10}$.
The PP computation in the next four columns, which we refer to as the MSP, is carried out using half adders and 4:2 approximate compressors. In the MSP region, exact 4:2 compressors tend to occupy more area and hence consume more hardware. To mitigate this, we propose a new 4:2 approximate compressor in levels 1 and 2 with the intention of improving hardware utilization without incurring large errors. Further improvement in accuracy is achieved using a new simple yet efficient error correction technique deployed at levels 1 and 2. Since the PPs in the accurate region contribute most to the final product, they are compressed using exact adder modules. The PP columns from the accurate and MSP regions are reduced to two rows, and these rows are accumulated into the final product using a ripple carry adder (RCA).

B. Proposed Approximate 4:2 Compressor
An exact 4:2 compressor is implemented using two full adders [3]. This compressor comprises five inputs (Q1, Q2, Q3, Q4, and Cin) and three outputs (Sum, Carry, and C1). The output C1 acts as the carry-in (Cin) for the next column's exact compressor. Though the weight of C1 is higher, its probability of occurrence is 1 out of 16 cases, i.e., when all the inputs are 1. Hence, C1 and Cin are neglected. The truth table of the proposed approximate compressor is shown in Table II. The logic diagram of the proposed approximate compressor is shown in Fig. 2, and the corresponding logical expressions are represented by the following equations:

$$\mathrm{Sum} = G_1 \oplus G_3 \qquad (1)$$
$$\mathrm{Carry} = G_2 \cdot \overline{G_1} + G_3 \cdot G_1 \qquad (2)$$

where

$$G_1 = Q_1 \oplus Q_2, \qquad G_2 = Q_1 Q_2, \qquad G_3 = Q_3 + Q_4.$$

From Table II, it is evident that the proposed compressor produces an error for four input cases, namely, "0011," "0111," "1011," and "1111." As shown in the last column of Table II, the proposed approximate compressor is designed with an error difference of −1. From the table, it is evident that the probability of error for the proposed approximate compressor is 16/256, i.e., 0.0625.

C. Error Correction Module
As mentioned earlier, the proposed compressor generates an error for four cases with an error distance of −1, and these errors are generated when the input bits Q3 and Q4 are both 1. To mitigate this problem, an AND logic gate with Q3 and Q4 as its inputs is used as an error recovery module.
As shown in level 1 of Fig. 1, two AND logic gates are used in the MSP of the approximate region to generate two error correction terms. These terms act as carry-ins to the exact 4:2 compressors in the same level 1, which tends to improve the accuracy of the multiplier. Similarly, in level 2, one AND logic gate is deployed in the MSP of the approximate region to generate one correction term. This term is used as the carry-in of an exact 4:2 compressor in the same level 2.
Extending this concept to a 16-bit approximate multiplier would require seven error correction modules in the approximate portion. Since the error correction module is only used in the approximate portion, the hardware overhead is small.
By varying the number of columns in the approximate region of the proposed multiplier, variants of the 8-bit and 16-bit multipliers are obtained. The proposed multiplier is denoted by P(N, M), where N denotes the multiplier size and M denotes the number of columns approximated.

IV. EXPERIMENTAL RESULTS
A. Error Analysis
Exhaustive error analysis on various multiplier architectures, including the proposed designs, was carried out using 65 536 and 1 million (random) cases for the 8-bit and 16-bit multipliers, respectively, and the computed results are tabulated in Tables III and IV, respectively. Metrics such as error distance (ED), normalized mean ED (NMED), mean relative ED (MRED), and worst case relative ED [2] are computed to quantify the approximate designs.

TABLE III
ERROR ANALYSIS OF VARIOUS 8-BIT APPROXIMATE MULTIPLIERS

TABLE IV
ERROR ANALYSIS OF VARIOUS 16-BIT APPROXIMATE MULTIPLIERS

TABLE V
SYNTHESIS RESULTS OF VARIOUS APPROXIMATE COMPRESSORS

TABLE VI
SYNTHESIS RESULTS OF VARIOUS 8-BIT APPROXIMATE MULTIPLIERS

TABLE VII
SYNTHESIS RESULTS OF VARIOUS 16-BIT APPROXIMATE MULTIPLIERS

It can be observed from Table III that the P(8, 2) and P(8, 4) designs are more accurate, since they have better NMED and MRED than the existing designs except for Ax8-1. The improvement in accuracy in P(8, 2) and P(8, 4) is due to the constant correction scheme, approximate compressor, and error correction module used in the PPR structure. Design P(8, 6) has moderate NMED and MRED but consumes less power, since more columns are approximated in the approximate region. Similar results can be observed in Table IV for the 16-bit multiplier. From Table IV, design P(16, 4) is more accurate than the existing designs.
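An exhaustive analysis of this kind can be sketched in Python. The block below is only an illustration under stated assumptions, not the authors' Verilog models: all function names are hypothetical; Eq. (2) is read as Carry = G2·(NOT G1) + G3·G1; NMED is taken as the mean ED divided by the largest exact product; MRED skips zero products, for which the relative error is undefined; and `toy_multiplier` is a simplified stand-in that only drops the four LSP columns and adds the constant 6, so it is not the proposed P(N, M) design and will not reproduce Tables III and IV.

```python
from fractions import Fraction
from itertools import product


def approx_compressor(q1, q2, q3, q4):
    """Approximate 4:2 compressor of Section III-B; returns (sum, carry).

    Assumption: Eq. (2) is read as Carry = G2*not(G1) + G3*G1.
    """
    g1 = q1 ^ q2
    g2 = q1 & q2
    g3 = q3 | q4
    return g1 ^ g3, (g2 & (1 - g1)) | (g3 & g1)


def compressor_errors():
    """Map each erroneous input string Q1Q2Q3Q4 to (approximate - exact)."""
    errors = {}
    for bits in product((0, 1), repeat=4):
        s, c = approx_compressor(*bits)
        diff = (2 * c + s) - sum(bits)
        if diff:
            errors["".join(map(str, bits))] = diff
    return errors


def compressor_error_probability(p_one=Fraction(1, 4)):
    """Error probability when each input bit is 1 with probability p_one.

    p_one = 1/4 models a partial product a_i AND b_j of uniform operands.
    """
    total = Fraction(0)
    for key in compressor_errors():
        ones = key.count("1")
        total += p_one**ones * (1 - p_one) ** (4 - ones)
    return total


def toy_multiplier(a, b, lsp_cols=4, comp=6):
    """Hypothetical stand-in: drop the PPs of the lowest lsp_cols columns
    and add a constant. This is NOT the proposed multiplier."""
    dropped = sum(
        (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
        for i in range(lsp_cols)
        for j in range(lsp_cols - i)
    )
    return a * b - dropped + comp


def error_metrics(mul, n_bits=8):
    """Exhaustive mean ED, NMED, MRED and worst-case relative ED of mul()."""
    top = 1 << n_bits
    ed_sum, red_sum, worst_red, nonzero = 0, 0.0, 0.0, 0
    for a in range(top):
        for b in range(top):
            exact = a * b
            ed = abs(mul(a, b) - exact)
            ed_sum += ed
            if exact:  # relative error is undefined for a zero product
                red = ed / exact
                red_sum += red
                worst_red = max(worst_red, red)
                nonzero += 1
    cases = top * top
    med = ed_sum / cases
    return {
        "cases": cases,
        "MED": med,
        "NMED": med / ((top - 1) ** 2),
        "MRED": red_sum / nonzero,
        "WCRE": worst_red,
    }
```

Under the stated reading of Eq. (2), `compressor_errors()` returns exactly the four inputs "0011", "0111", "1011", and "1111", each with a difference of −1, and `compressor_error_probability()` evaluates to 1/16 = 0.0625, which agrees with the figures quoted in Section III-B. Calling `error_metrics(toy_multiplier)` walks all 65 536 operand pairs of an 8-bit multiplier; a 16-bit design would instead be sampled at random, as described above.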
B. Synthesis Results
For the sake of fair analysis, all the existing [3]–[8] and proposed designs are modeled in the Verilog hardware description language. Hardware synthesis of the approximate multiplier designs has been carried out at the TSMC 180 nm process node (slow-normal library) using Cadence RTL Compiler v7.1 to compute area, delay, and power.
From Table V, it is evident that, owing to its simple structure, the proposed compressor consumes less power and area than the existing compressors except for MUL2 and Momeni. Though the designs in [3] and [8] have lower power consumption and area, they suffer from low accuracy.
Similarly, area, power, and delay are calculated for the proposed and existing multipliers and tabulated in Tables VI and VII. It is evident from Table VI that P(8, 4) and P(8, 6) are faster than the existing designs except for Ax8-3, M3, and M4. Design P(8, 6) achieves the lowest power consumption compared to all the existing designs, with a percentage improvement in power of up to 47.7% and 36.9% compared to the exact and existing designs, respectively. The design P(8, 4) obtains lower power consumption than the existing multipliers except for M4 and Momeni. The design P(8, 2) achieves moderate power savings with better accuracy compared to the existing designs. Similar results can be observed in Table VII for the 16-bit multiplier.

Fig. 3. (a)–(o) Images sharpened using exact, proposed, and existing multipliers.

It is evident from Table VI that the design P(8, 6) has a lower power-delay product (PDP) than all existing multipliers. The percentage improvement of design P(8, 6) in PDP is up to 55.2% and 39.5% compared to the exact and existing designs, respectively, for the 8-bit design. Similarly, from Table VII,

design P(16, 12) has less PDP compared to the existing designs.

V. BENCHMARKING USING IMAGE PROCESSING APPLICATIONS
The efficacy of the proposed multiplier is validated using JPEG image compression (JIC) [9], image multiplication [13], and image sharpening [10] applications. All the exact multiplication operations are replaced with the proposed design, while the other operations are carried out using accurate modules. The peak signal-to-noise ratio (PSNR) metric is used to estimate the performance of the multipliers. From Table VIII, it can be observed that the designs P(8, 2) and P(8, 4) achieve better PSNR than the existing multipliers due to their low ED, NMED, and MRED.

TABLE VIII
PSNR OF DIFFERENT IMAGES USING VARIOUS MULTIPLIER DESIGNS

It is evident from Fig. 3(a)–(d) that the images obtained using an exact multiplier and the proposed designs, processed using various algorithms, look almost identical. Fig. 4 shows the comparison results for 8 × 8 approximate multipliers obtained by considering the MRED, NMED, and PSNR with respect to the power-delay product (PDP). From Fig. 4(a) and (b), it is evident that design P(8, 4) has lower PDP with better MRED and NMED than the existing designs. Similarly, from Fig. 4(c), it can be concluded that P(8, 4) achieves lower PDP and better PSNR compared to the existing designs. Finally, it can be concluded that P(8, 4) performs better in terms of PSNR, MRED, and NMED, and does so with less PDP than the existing designs.

Fig. 4. Hardware and accuracy comparison on various multipliers. (a) PDP and MRED comparison. (b) PDP and NMED comparison. (c) PDP and PSNR comparison.

VI. CONCLUSION
This letter presents a multiplier architecture that reduces computational complexity by replacing the LSP with a constant compensation term. Further savings in hardware are achieved using a new 4:2 approximate compressor, and the error due to approximation is reduced using a simple error recovery circuit. Comprehensive error analysis and synthesis results show the efficacy of the proposed designs compared to the existing designs. Experimental results show that power and PDP are reduced by up to 47.7% and 55.2%, respectively, in comparison with the exact design and by up to 36.9% and 39.5%, respectively, in comparison with the existing designs. Finally, the impact of the proposed design on image processing applications is investigated.

REFERENCES
[1] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems, vol. 95. New York, NY, USA: Springer, 2006.
[2] H. Jiang, C. Liu, L. Liu, F. Lombardi, and J. Han, "A review, classification, and comparative evaluation of approximate arithmetic circuits," ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 4, p. 60, 2017.
[3] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[4] Z. Yang, J. Han, and F. Lombardi, "Approximate compressors for error-resilient multiplier design," in Proc. IEEE DFTS, Amherst, MA, USA, 2015, pp. 183–186.
[5] M. Ha and S. Lee, "Multipliers with approximate 4–2 compressors and error recovery modules," IEEE Embedded Syst. Lett., vol. 10, no. 1, pp. 6–9, Mar. 2018.
[6] S. Venkatachalam and S.-B. Ko, "Design of power and area efficient approximate multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1782–1786, May 2017.
[7] X. Yi, H. Pei, Z. Zhang, H. Zhou, and Y. He, "Design of an energy-efficient approximate compressor for error-resilient multiplications," in Proc. IEEE ISCAS, Sapporo, Japan, 2019, pp. 1–5.
[8] H. Pei, X. Yi, H. Zhou, and Y. He, "Design of ultra-low power consumption approximate 4–2 compressors based on the compensation characteristic," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 1, pp. 461–465, Jan. 2021.
[9] N. Rathore, "JPEG image compression," Int. J. Eng. Res. Appl., vol. 4, no. 3, pp. 435–440, 2014.
[10] M. S. K. Lau, K.-V. Ling, and Y.-C. Chu, "Energy-aware probabilistic multiplier: Design and analysis," in Proc. Int. Conf. Compilers Archit. Synth. Embedded Syst., 2009, pp. 281–290.
[11] C. S. R. Reddy, U. A. Kumar, and S. E. Ahmed, "Design of efficient approximate multiplier for image processing applications," in Proc. Int. Conf. Mosicom, 2020, pp. 511–518.
[12] M. S. Ansari, H. Jiang, B. F. Cockburn, and J. Han, "Low-power approximate multipliers using encoded partial products and approximate compressors," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 3, pp. 404–416, Sep. 2018.
[13] H. Waris, C. Wang, W. Liu, J. Han, and F. Lombardi, "Hybrid partial product-based high-performance approximate recursive multipliers," IEEE Trans. Emerg. Topics Comput., early access, Aug. 4, 2020, doi: 10.1109/TETC.2020.3013977.
