Low-Power Compressor-Based Approximate Multipliers With Error Correcting Module
IEEE EMBEDDED SYSTEMS LETTERS, VOL. 14, NO. 2, JUNE 2022
I. INTRODUCTION
OVER the past two decades, the semiconductor industry has seen explosive growth in integrating sophisticated multimedia applications into portable electronic gadgets. However, this comes at the cost of higher power dissipation, which lowers the devices' lifespan and reliability [1]. Most media processing applications, such as digital and image processing [2], can tolerate an acceptable range of accuracy loss due to the inherent limitations of the human sensory system.
Multiplication is the basic operation in media processing applications [2] and is carried out in three steps: 1) partial product generation (PPG); 2) partial product reduction (PPR); and 3) final accumulation. Since these applications are inherently error-tolerant, various approximate multiplier architectures have been proposed in [3]–[8] to obtain hardware savings.
The main contributions of this letter are as follows.
1) A new approximate 4:2 compressor and a constant correction term are proposed that reduce the hardware utilization at the PPR stage in a multiplier.
2) The error in the multiplier is reduced using a simple yet efficient error correction circuit.
3) We parametrize the proposed multiplier by varying the number of columns in the approximate region for 8- and 16-bit multipliers. Consequently, we obtain variants of the proposed design that are optimized for accuracy and hardware utilization.

II. RELATED WORK
Ansari et al. [12] proposed two approximate 4 × 4 submultipliers using AND-OR encoding of partial products (PPs) and used probability-based approximate 4:2 compressors for accumulation. These submultipliers are then utilized to design higher order multipliers. Waris et al. [13] implemented a recursive multiplier using partial-product-based building blocks. Further, high-performance approximate half and full adder cells are proposed for use in a 4 × 4 multiplier, which in turn is used for designing larger multipliers.
Momeni et al. [3] proposed two approximate 4:2 compressors to obtain low area and power; however, they suffer from low accuracy. Yang et al. [4] implemented a new high-accuracy approximate compressor in a multiplier architecture; however, it suffers from high power dissipation. Ha and Lee [5] modified the Yang compressor and used it in the approximate region of the PPR structure. To minimize the compressor's error, they used an error correction term from level 1 to level 2 in the PPR structure. Two types of multiplier designs using approximate arithmetic modules were proposed by Venkatachalam and Ko [6], resulting in improved speed and power consumption, but these designs suffer from low accuracy. Another variant of a high-accuracy compressor-based multiplier design was proposed by Yi et al. [7]. To improve hardware metrics, such as area and power, Pei et al. [8] presented four different multiplier designs using three approximate compressors along with an error correcting module. Although many works in the literature have proposed multiplier architectures, most of them do not use an error correcting module, and none uses a constant correction term at the LSB region, as summarized in Table I.

III. PROPOSED UNSIGNED APPROXIMATE MULTIPLIER
Applications, such as computer vision, discrete cosine transform (DCT) in high-efficiency video coding (HEVC)

Manuscript received March 29, 2021; revised June 23, 2021; accepted September 8, 2021. Date of publication September 16, 2021; date of current version May 19, 2022. This manuscript was recommended for publication by S. Venkatramani. (Corresponding author: U. Anil Kumar.)
The authors are with the Department of Electrical and Electronics Engineering, BITS Pilani (Hyderabad Campus), Hyderabad 500078, India (e-mail: [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/LES.2021.3113005
1943-0671 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
60 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 14, NO. 2, JUNE 2022
Fig. 1. Proposed PP reduction structure with error correction logic.
TABLE IV
ERROR ANALYSIS OF VARIOUS 16-BIT APPROXIMATE MULTIPLIERS
TABLE VI
SYNTHESIS RESULTS OF VARIOUS 8-BIT APPROXIMATE MULTIPLIERS
TABLE VII
SYNTHESIS RESULTS OF VARIOUS 16-BIT APPROXIMATE MULTIPLIERS
65 536 and 1 million (random) cases for 8-bit and 16-bit
multiplier, respectively, and computed results are tabulated in
Tables III and IV, respectively. Metrics, such as error dis-
tance (ED), normalized mean ED (NMED), mean relative ED
(MRED), and worst case relative ED [2], are computed to
quantify the approximate designs.
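The error metrics named above can be computed by exhaustive comparison against the exact product. The following is a minimal Python sketch, not the authors' evaluation code. It assumes the definitions commonly used in the approximate-arithmetic literature: ED is the absolute difference between the approximate and exact products, NMED is the mean ED normalized by the maximum exact product, and MRED is the mean of ED divided by the exact product. Cases whose exact product is zero are skipped for the relative metrics, which is also an assumption. The function name `error_metrics` and the dictionary keys are hypothetical.

```python
from itertools import product


def error_metrics(approx_mul, n_bits):
    """Exhaustively compare approx_mul(a, b) with a * b for n_bits operands."""
    max_operand = (1 << n_bits) - 1
    max_product = max_operand * max_operand  # normalizer assumed for NMED
    total_ed = 0
    total_red = 0.0
    worst_red = 0.0
    cases = 0
    nonzero_cases = 0
    for a, b in product(range(max_operand + 1), repeat=2):
        exact = a * b
        ed = abs(approx_mul(a, b) - exact)  # error distance (ED)
        total_ed += ed
        cases += 1
        if exact != 0:  # relative error is undefined for a zero product
            red = ed / exact
            total_red += red
            worst_red = max(worst_red, red)
            nonzero_cases += 1
    med = total_ed / cases
    return {
        "MED": med,
        "NMED": med / max_product,
        "MRED": total_red / nonzero_cases,
        "WCRED": worst_red,
    }
```

For 8-bit operands this loop visits all 65 536 input pairs; for 16-bit operands, random sampling of operand pairs would replace the exhaustive loop.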
It can be observed from Table III that the P(8, 2) and P(8, 4)
designs are more accurate since they have better NMED
and MRED than the existing designs except for Ax8-1. The
improvement in accuracy of the P(8, 2) and P(8, 4) designs is due
to the constant correction scheme, approximate compressor,
and error correction module used in the PPR structure. Design
P(8, 6) has moderate NMED and MRED but consumes less
power since more columns are approximated in the approximate
region. Similar results can be observed in Table IV
for the 16-bit multiplier. From Table IV, design
P(16, 4) is more accurate than the existing designs.
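The trade-off between the number of approximated columns and accuracy can be illustrated with a deliberately simplified stand-in. The sketch below is not the proposed compressor-based design and does not model its correction logic. It assumes a hypothetical multiplier, `truncated_multiply`, that simply discards every partial-product bit in the k least significant columns, and a hypothetical helper, `nmed_for_columns`, that measures the resulting NMED exhaustively.

```python
def truncated_multiply(a, b, n_bits, k):
    """Stand-in approximate multiplier (assumption, not the paper's design):
    sum only the partial-product bits that fall in column k or higher."""
    result = 0
    for i in range(n_bits):
        for j in range(n_bits):
            in_exact_region = (i + j) >= k
            if in_exact_region and (a >> i) & 1 and (b >> j) & 1:
                result += 1 << (i + j)
    return result


def nmed_for_columns(n_bits, k):
    """Mean error distance over all operand pairs, normalized by the
    maximum exact product, for k discarded LSB columns."""
    top = (1 << n_bits) - 1
    total_ed = 0
    for a in range(top + 1):
        for b in range(top + 1):
            total_ed += abs(truncated_multiply(a, b, n_bits, k) - a * b)
    mean_ed = total_ed / ((top + 1) ** 2)
    return mean_ed / (top * top)
```

Calling `nmed_for_columns(8, k)` for k = 2, 4, and 6 shows the error growing as more columns are approximated, which is the qualitative behavior the text attributes to P(8, 6) relative to P(8, 2) and P(8, 4). The absolute values will differ from those in Table III because this stand-in uses neither the approximate compressor nor the error correction module.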
Fig. 3. (a)–(o) Images sharpened using exact, proposed, and existing multipliers.

B. Synthesis Results
For the sake of fair analysis, all the existing [3]–[8] and the proposed designs are modeled in Verilog. Hardware synthesis of the approximate multiplier designs has been carried out at the TSMC 180 nm process node (slow-normal library) using Cadence RTL compiler v7.1 to compute area, delay, and power.
From Table V, it is evident that, owing to its simple structure, the proposed compressor consumes less power and area than the existing compressors except for MUL2 and Momeni. Though designs [3], [8] have less power consumption and area, they suffer from low accuracy.
Similarly, area, power, and delay are calculated for the proposed and existing multipliers and tabulated in Tables VI and VII. It is evident from Table VI that P(8, 4) and P(8, 6) are faster than the existing designs except for Ax8-3, M3, and M4. Design P(8, 6) achieves the lowest power consumption of all the designs, with a percentage improvement in power of up to 47.7% and 36.9% compared to the exact and existing designs, respectively. The design P(8, 4) obtains lower power consumption than the existing multipliers except for M4 and Momeni. The design P(8, 2) achieves moderate power savings with better accuracy compared to the existing designs. Similar results can be observed in Table VII for the 16-bit multiplier.
It is evident from Table VI that the design P(8, 6) has a lower power-delay product (PDP) than all existing multipliers. The percentage improvement of design P(8, 6) in PDP is up to 55.2% and 39.5% compared to the exact and existing designs, respectively, for the 8-bit design. Similarly, from Table VII,
design P(16, 12) has less PDP compared to the existing designs.

TABLE VIII
PSNR OF DIFFERENT IMAGES USING VARIOUS MULTIPLIER DESIGNS

Fig. 4. Hardware and accuracy comparison on various multipliers. (a) PDP and MRED comparison. (b) PDP and NMED comparison. (c) PDP and PSNR comparison.

V. BENCHMARKING USING IMAGE PROCESSING APPLICATIONS
The efficacy of the proposed multiplier is validated using JPEG image compression (JIC) [9], image multiplication [13], and image sharpening [10] applications. All the exact multiplication operations are replaced with the proposed design, while the other operations are carried out using accurate modules. The peak signal-to-noise ratio (PSNR) metric is used to estimate the performance of the multipliers. From Table VIII, it can be observed that the designs P(8, 2) and P(8, 4) achieve better PSNR than the existing multipliers due to their low ED, NMED, and MRED.
It is evident from Fig. 3(a)–(d) that the images obtained using an exact multiplier and the proposed designs, processed using various algorithms, look almost identical. Fig. 4 shows the comparison results for 8 × 8 approximate multipliers obtained by considering the MRED, NMED, and PSNR w.r.t

REFERENCES
[1] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems, vol. 95. New York, NY, USA: Springer, 2006.
[2] H. Jiang, C. Liu, L. Liu, F. Lombardi, and J. Han, “A review, classification, and comparative evaluation of approximate arithmetic circuits,” ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 4, p. 60, 2017.
[3] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate compressors for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[4] Z. Yang, J. Han, and F. Lombardi, “Approximate compressors for error-resilient multiplier design,” in Proc. IEEE DFTS, Amherst, MA, USA, 2015, pp. 183–186.
[5] M. Ha and S. Lee, “Multipliers with approximate 4–2 compressors and error recovery modules,” IEEE Embedded Syst. Lett., vol. 10, no. 1, pp. 6–9, Mar. 2018.
[6] S. Venkatachalam and S.-B. Ko, “Design of power and area efficient approximate multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1782–1786, May 2017.
[7] X. Yi, H. Pei, Z. Zhang, H. Zhou, and Y. He, “Design of an energy-efficient approximate compressor for error-resilient multiplications,” in Proc. IEEE ISCAS, Sapporo, Japan, 2019, pp. 1–5.
[8] H. Pei, X. Yi, H. Zhou, and Y. He, “Design of ultra-low power consumption approximate 4–2 compressors based on the compensation characteristic,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 1, pp. 461–465, Jan. 2021.
[9] N. Rathore, “JPEG image compression,” Int. J. Eng. Res. Appl., vol. 4, no. 3, pp. 435–440, 2014.
[10] M. S. K. Lau, K.-V. Ling, and Y.-C. Chu, “Energy-aware probabilistic multiplier: Design and analysis,” in Proc. Int. Conf. Compilers Archit. Synth. Embedded Syst., 2009, pp. 281–290.
[11] C. S. R. Reddy, U. A. Kumar, and S. E. Ahmed, “Design of efficient approximate multiplier for image processing applications,” in Proc. Int. Conf. Mosicom, 2020, pp. 511–518.
[12] M. S. Ansari, H. Jiang, B. F. Cockburn, and J. Han, “Low-power approximate multipliers using encoded partial products and approximate compressors,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 3, pp. 404–416, Sep. 2018.
[13] H. Waris, C. Wang, W. Liu, J. Han, and F. Lombardi, “Hybrid partial product-based high-performance approximate recursive multipliers,” IEEE Trans. Emerg. Topics Comput., early access, Aug. 4, 2020, doi: 10.1109/TETC.2020.3013977.