
electronics

Article
Design of Generalized Enhanced Static Segment Multiplier
with Minimum Mean Square Error for Uniform and
Nonuniform Input Distributions
Gennaro Di Meo * , Gerardo Saggese , Antonio G. M. Strollo and Davide De Caro

Department of Electrical Engineering and Information Technology, University of Naples Federico II,
80125 Naples, Italy
* Correspondence: [email protected]

Abstract: In this paper, we analyze the performance of an Enhanced Static Segment Multiplier (ESSM) when the inputs have both uniform and non-uniform distributions. The enhanced segmentation
divides the multiplicands into a lower, a middle, and an upper segment. While the middle segment
is placed at the center of the inputs in other implementations, we seek the optimal position able to
minimize the approximation error. To this aim, two design parameters are exploited: m, defining the
size and the accuracy of the multiplier, and q, defining the position of the middle segment for further
accuracy tuning. A hardware implementation is proposed for our generalized ESSM (gESSM), and an
analytical model is described, able to find m and q which minimize the mean square approximation
error. With uniform inputs, the error slightly improves by increasing q, whereas a large error decrease
is observed by properly choosing q when the inputs are half-normal (with a NoEB up to 18.5 bits for
a 16-bit multiplier). Implementation results in 28 nm CMOS technology are also satisfactory, with
area and power reductions up to 71% and 83%. We report image and audio processing applications,
showing that gESSM is a suitable candidate in applications with non-uniform inputs.

Keywords: approximate multiplier; static segmentation; low-power; approximate computing


Citation: Di Meo, G.; Saggese, G.; Strollo, A.G.M.; De Caro, D. Design of Generalized Enhanced Static Segment Multiplier with Minimum Mean Square Error for Uniform and Nonuniform Input Distributions. Electronics 2023, 12, 446. https://doi.org/10.3390/electronics12020446. https://www.mdpi.com/journal/electronics

Academic Editor: Spyridon Nikolaidis

Received: 20 December 2022; Revised: 10 January 2023; Accepted: 12 January 2023; Published: 15 January 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
The reduction of power consumption in DSP algorithms is a primary concern for the feasible realization of electronic systems and calls for the adoption of suitable design strategies. Convolution, dot product, and correlation are widely used operations in applications ranging from telecommunications to image and audio processing, and they make the design critical due to the extensive employment of adders and multipliers. As an example, IoT and mobile devices, which implement deep learning and machine learning algorithms, demand quantization techniques, down-sampling, and arithmetic approximations to reduce the hardware complexity [1–3]. In telecommunications, the suppression of noise in transceivers, necessary for improving the receiver sensitivity, requires cancellation methods based on adaptive filtering [4–6]. The large number of multipliers used for adaptation increases the power consumption and demands specific techniques aimed at reducing area and power while preserving the quality of results [7–11]. Low-power designs are also required for audio applications [12], in which banks of filters realize operations such as equalization and denoising.

Since multipliers are responsible for large power consumption in DSP algorithms, hardware-efficient designs are required for achieving acceptable performance. As the nature of many DSP algorithms is error tolerant (e.g., adaptive filtering or image and audio processing), the Approximate Computing paradigm constitutes a valuable means of improving the hardware performance of multipliers, providing a way to approximate the design at the cost of a tolerable accuracy loss. Approximations can be introduced in the partial product generation stage, in the partial product matrix (PPM) compression step,



or in the final carry propagate adder of the multiplier. Since the PPM compression stage
is rich in half-adders and full-adders, the approximation of the compression circuit can
lead to a significant hardware improvement. In [13], the authors involve AND and OR
gates to merge the partial product generation stage with the compression step, while [14]
deletes some rows from the PPM at design time. In [15] a recursive approach is proposed,
in which the multiplier is decomposed into small approximate units. The paper [16]
shows a compression scheme in which OR gates substitute half-adders and full-adders,
whereas [17] improves this technique by compensating the mean approximation error.
In [18], fast counters encode the partial products by following a stacking approach, whereas
the works [19–24] analyze multipliers with approximate 4–2 compressors. In these papers,
the full-adders required for the realization of the exact compressor are substituted by simple
logic at the cost of an error in the computation, and the carry chain between compressors
is broken in order to optimize the critical path and to moderate the glitch propagation.
In [20], the authors propose three compressors with different levels of accuracy, while [21]
designs an error recovery module to improve the quality of results. The paper [22] shows
a statistical approach for ordering the partial products in approximate 4–2 compressors,
and analyzes the performances when different compressors are employed in the same
multiplier. In [23], compressors with positive and negative mean error are interleaved in
order to minimize the approximation effects, whereas [24] prefers NAND and NOR gates over AND and OR gates to achieve high-speed performance.
The fixed-width technique is a further approach able to reduce the power, providing a
way to discard some columns of the PPM [25,26]. In this case, properly weighing the partial
products in the truncated PPM reduces the approximation error of the multiplier [26].
Different from the previous works, the segmentation method reduces the bit-width of
the multiplicands with the aim to downsize the multiplier. The papers [27,28] describe a
dynamic segment method (DSM) in which the segment is selected starting from the leading
one bit of the multiplicand. While [27] adds a ‘1’ bit at the least significant position of the
segment for accuracy recovery, ref. [28] revises the multiplication as a multiply-and-add
operation and applies operand truncation for further simplification. On the contrary, the
paper [29] proposes a static segment method (SSM), which reduces the complexity of the selection mechanism by choosing between two fixed m-bit segments, with n/2 ≤ m < n, where n is the number of bits of the inputs. At the same time, an Enhanced SSM multiplier
(ESSM) is also proposed in [29], which allows for selecting between three fixed portions of
the inputs: the m most significant bits (MSBs), the m least significant bits (LSBs), and the m
central bits of the inputs. The paper [30] improves the accuracy of the SSM multipliers by
reducing the maximum approximation error, whereas in [31] the authors propose a hybrid
approach in which a static stage is cascaded to a dynamic stage. In these cases, error metric
results reveal satisfactory accuracy when the inputs have uniform distribution, along with
acceptable power improvements with respect to the exact and the DSM multipliers. At
the same time, these works do not offer an analysis with non-uniformly distributed input signals; in addition, the work [29] does not show a detailed analysis of the hardware implementation of the ESSM multiplier.
In this paper, we analyze the performances of the ESSM multiplier as a function of
the input stochastic distribution and propose a novel implementation able to minimize
the mean square approximation error. Indeed, the statistical properties of a signal affect
the probability of assuming values in a range, giving high probability ranges and low
probability ranges. Starting from this observation, our idea is to properly place the central
segment (named middle segment in the following) in order to minimize the segmentation
error in the high probability ranges. To this aim, two design parameters are exploited: m,
which defines the size of the multiplier, and q, which defines the position of the middle
segment. For the error analysis, we consider inputs with uniform and non-uniform distri-
bution, taking into consideration half-normal signals for demonstration in this last case,
and also describe an analytical model able to find the optimal position qopt that minimizes
the multiplier error in a mean square sense.
Simulation results match the theoretical analysis, exhibiting accuracy performance dependent on the input stochastic distribution and on the choice of m and q. The best error metrics are achieved with the middle segment placed toward the MSBs if the inputs are uniform, and with the middle segment placed at the center of the inputs if the distribution is half-normal. Electrical analyses also show remarkable hardware improvements if compared with the exact multiplier, whereas only an acceptable degradation is registered with respect to the SSM multipliers. Assessments of image and audio processing applications confirm these trends, showing performance that depends on the position of the middle segment.

The paper is organized as follows: Section 2 shows the static segment method, also describing the correction technique of [30] and the enhanced segmentation presented in [29]. Then, Section 3 describes the hardware structure of the proposed gESSM, along with the analytical model used to minimize the mean square value of the approximation error. Section 4 shows the results in terms of error metrics, electrical performances, and applications in image and audio processing. A comparison with the state-of-the-art is also proposed. Section 5 further compares the multipliers, finding the Pareto-optimal implementations, and Section 6 concludes the paper.

2. Static Segment Method
2.1. Static Segment Multiplier and Correction Technique
The SSM technique shown in [29] provides for selecting m-bit segments from the multiplicands, with n/2 ≤ m < n, in order to employ a smaller m × m multiplier instead of an n × n multiplier. As shown in Figure 1 for the unsigned signal A, if the n − m MSBs (i.e., a15, a14, ..., a10) are low, the least significant m bits of the input are chosen, forming the segment AL. On the contrary, if any bit of the n − m MSBs is high, the most significant m bits are selected, forming the segment AH. It is worth noting that the segmentation introduces an error when AH is chosen, since the bits belonging to eA are truncated (i.e., a5, a4, ..., a0 in Figure 1). In addition, m is the only parameter able to define the accuracy and the size of the multiplier.

Figure 1. Segmentation of the signal A with n = 16 bits and m = 10 bits.

Then, defining αA as the OR between the n − m MSBs of A, the segmented input Assm is

Assm = AL if αA = 0
       AH if αA = 1    (1)
A similar expression holds also for the input B and the corresponding segment Bssm. Then, the segmented multiplication is

γssm = (Assm·2^SHa,ssm)·(Bssm·2^SHb,ssm) = (Assm·Bssm)·2^SHssm    (2)

with SHa,ssm, SHb,ssm that are

SHa,ssm = 0 if αA = 0
          n − m if αA = 1

SHb,ssm = 0 if αB = 0
          n − m if αB = 1    (3)
and SHssm = SHa,ssm + SHb,ssm, defining the left-shift used to express the result on 2·n bits:

SHssm = 0 if αA = 0, αB = 0
        n − m if αA = 0, αB = 1 or if αA = 1, αB = 0
        2·(n − m) if αA = 1, αB = 1    (4)
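The selection and shift logic of Equations (1)–(4) can be captured in a short behavioral model. The sketch below is ours (a software model with our helper names, not the paper's reference code), with n and m as in Figure 1, operating on unsigned integers:

```python
def ssm_multiply(a, b, n=16, m=10):
    """Behavioral model of the SSM multiplier: select the m MSBs when any
    of the n-m MSBs is set (alpha = 1), else the m LSBs (Eq. (1)), then
    left-shift the small product back into 2n-bit position (Eqs. (2)-(4))."""
    def segment(x):
        alpha = (x >> m) != 0                # OR of the n-m MSBs
        if alpha:
            return x >> (n - m), n - m       # segment A_H, shift n-m
        return x & ((1 << m) - 1), 0         # segment A_L, no shift

    a_seg, sh_a = segment(a)
    b_seg, sh_b = segment(b)
    # SHssm = SHa,ssm + SHb,ssm, Eq. (4)
    return (a_seg * b_seg) << (sh_a + sh_b)
```

For inputs below 2^m the product is exact (e.g., ssm_multiply(100, 7) returns 700), while a truncated operand loses its n − m LSBs before the multiplication.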
Figure 2a depicts the hardware implementation of the SSM multiplier. The multiplexers on A and B apply the segmentation, choosing between the most significant and least significant portions of the inputs, whereas two OR gates compute the selection flags αA and αB. After the m × m multiplier, a further multiplexer realizes the left-shift described in (4).

(a) (b)
Figure 2. Approximate multiplier with (a) static segment method and (b) segmented multiplier with the correction technique of [30].

The accuracy of the SSM multiplier is improved in [30] by minimizing the approximation error in the case αA = 1, αB = 1 (i.e., when both inputs are truncated). Here, the authors estimate the committed error as

CT = 2^(2n−2m) · Σ_{k=0}^{m−1} ct_k·2^k    (5)

with

ct_k = (a_{k+n−m}·b_{n−m−1}) OR (b_{k+n−m}·a_{n−m−1})    (6)
and add CT to the approximate product for compensation:

γssm,c = (Assm·Bssm + CT)·2^SH    (7)

As detailed in [30], using two or three terms of the summation (5) sufficiently improves the accuracy.

Figure 2b shows the implementation of the corrected SSM multiplier (named cSSM in the following). The correction term CT is combined with the product Assm·Bssm if αA = 1 and αB = 1 (see the AND gate highlighted in red). It is also worth noting that the correction technique has a minimal impact on the hardware performance, since a fused PPM is employed for realizing (7).
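A behavioral sketch of the cSSM correction may help fix ideas. The helper below is our reading of Eqs. (5)–(7): when both operands are truncated, the ct_k bits of Eq. (6) are assembled into CT and added in the small-product domain before the final shift. The function names, and the choice of keeping the most significant terms as the "two or three terms" simplification, are our assumptions:

```python
def cssm_multiply(a, b, n=16, m=10, terms=3):
    """Behavioral sketch of the corrected SSM (cSSM) multiplier of [30]."""
    mask = (1 << m) - 1
    alpha_a = (a >> m) != 0
    alpha_b = (b >> m) != 0
    a_seg = (a >> (n - m)) if alpha_a else (a & mask)
    b_seg = (b >> (n - m)) if alpha_b else (b & mask)
    shift = (n - m) * (alpha_a + alpha_b)

    ct = 0
    if alpha_a and alpha_b:                   # correction only when both truncated
        a_tr = (a >> (n - m - 1)) & 1         # a_{n-m-1}: MSB of the truncated part
        b_tr = (b >> (n - m - 1)) & 1         # b_{n-m-1}
        for k in range(m - terms, m):         # most significant terms of Eq. (5)
            ct_k = (((a >> (k + n - m)) & 1) & b_tr) | (((b >> (k + n - m)) & 1) & a_tr)
            ct |= ct_k << k
    return (a_seg * b_seg + ct) << shift      # Eq. (7)
```

On worst-case operands (all bits high), the added CT noticeably reduces the truncation error with respect to the plain SSM product, while small inputs remain exact.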
2.2. Enhanced SSM Multiplier
The ESSM multiplier described in [29] allows for selecting between three segments of the input, each one having m bits (see Figure 3a). In this implementation, the middle segment AM is placed at the center of the signal (i.e., (n − m)/2 bits on the left with respect to the LSB, see the figure). As the position of AM is fixed, m is again the only design parameter which defines the accuracy and the size of the multiplier.

(a)
(b)
Figure 3. Segmentation of the input A in the case n = 16 and m = 8 with (a) the ESSM method of [29] and (b) the proposed generalized ESSM method in the case q = 5.

In this case, two control flags are required for the selection, named αAH and αAM in the following. Therefore, defining αAH as the OR of the first (n − m)/2 MSBs of A (i.e., a15, a14, ..., a12, highlighted in blue in Figure 3a), and αAM as the OR of the remaining (n − m)/2 MSBs (i.e., a11, a10, ..., a8, highlighted in green in Figure 3a), the segment Aessm is computed as

Aessm = AL if (αAH, αAM) = (0, 0)
        AM if (αAH, αAM) = (0, 1)
        AH if (αAH, αAM) = (1, 0) or (1, 1)    (8)

A similar expression holds also for the segment Bessm, with the flags αBH, αBM that handle the segmentation.

Therefore, the approximate product is

γessm = (Aessm·2^SHa,essm)·(Bessm·2^SHb,essm) = (Aessm·Bessm)·2^SHessm    (9)

with SHa,essm, SHb,essm that are

SHa,essm = 0 if (αAH, αAM) = (0, 0)
           (n − m)/2 if (αAH, αAM) = (0, 1)
           n − m if (αAH, αAM) = (1, 0) or (1, 1)

SHb,essm = 0 if (αBH, αBM) = (0, 0)
           (n − m)/2 if (αBH, αBM) = (0, 1)
           n − m if (αBH, αBM) = (1, 0) or (1, 1)    (10)

and SHessm defined in Table 1.

Table 1. Left-shift for the ESSM multiplier.

αAH, αAM, αBH, αBM                          SHessm
(0000)                                      0
(0001), (0100)                              (n − m)/2
(0010), (0011), (0101), (1000), (1100)      n − m
(0110), (0111), (1001), (1101)              (3/2)·(n − m)
(1010), (1011), (1110), (1111)              2·(n − m)
As shown in the table, the left-shift SHessm takes five possible values, thus requiring a 5 × 1 multiplexer to extend the result on 2·n bits.
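Since SHessm is simply the sum of the two per-operand shifts of Eq. (10), the five values of Table 1 can be checked by enumeration. The small sketch below uses our helper names, with n = 16 and m = 8 as in Figure 3a:

```python
def per_operand_shift(alpha_h, alpha_m, n=16, m=8):
    """Per-operand left-shift of the ESSM, Eq. (10)."""
    if alpha_h:
        return n - m
    return (n - m) // 2 if alpha_m else 0

# SHessm (Table 1) is the sum of the shifts of the two operands;
# enumerating all 16 flag combinations yields the distinct values.
shessm_values = sorted({per_operand_shift(ah, am) + per_operand_shift(bh, bm)
                        for ah in (0, 1) for am in (0, 1)
                        for bh in (0, 1) for bm in (0, 1)})
```

For n = 16, m = 8 this gives the five shifts 0, 4, 8, 12, 16, i.e., 0, (n − m)/2, n − m, (3/2)·(n − m), and 2·(n − m), as in Table 1.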

3. Proposed Generalized ESSM Multiplier


3.1. Hardware Implementation
With the aim to improve the accuracy of the multiplier presented in the previous section, we generalize the ESSM method by placing AM in any possible position between the LSB and the MSB. With reference to Figure 3a, let us suppose A to be in the range [2^12, 2^13) with high probability, which means that the bit a12 is high and the bits a15, a14, a13 are low with high probability. The segmentation scheme of Figure 3a mostly chooses the segment AH, approximating the input with resolution 2^8, whereas AM, able to offer a finer accuracy, is less used. In order to improve the performance, we can choose a segmentation scheme as in Figure 3b, allocating the middle segment so as to collect the bits a12, a11, ..., a5. In this way, the selection mechanism mostly chooses AM, allowing a finer resolution (that is, 2^5 instead of 2^8) with beneficial effects on the overall accuracy. As a consequence, choosing the position of AM depending on the input statistical properties allows us to optimize the accuracy of the multiplier.
As shown in Figure 3b, the parameter q defines the position of AM with respect to the LSB of the input (in this example, q = 5). Therefore, two parameters are used for the design: m, which defines the accuracy and the size of the multiplier, and q, which improves the accuracy of the segmentation. Please note also that q defines the resolution of AM, which is 2^q (see Figure 3b).

By noting that AM and AL overlap if q = 0, and that AM and AH overlap if q = n − m, we choose q in the range [1, n − m − 1] to select three distinct segments. In addition, if q = (n − m)/2 we get the ESSM multiplier presented in [29].

The selection flag αAH is computed by OR-ing the first n − (m + q) MSBs of A (i.e., a15, a14, a13 in Figure 3b, depicted in blue), whereas αAM is computed by OR-ing the remaining q MSBs (i.e., a12, a11, ..., a8 in Figure 3b, depicted in green). Then, the segmented inputs Aessm, Bessm are computed as in (8), with the following expressions for SHa,essm, SHb,essm:

SHa,essm = 0 if (αAH, αAM) = (0, 0)
           q if (αAH, αAM) = (0, 1)
           n − m if (αAH, αAM) = (1, 0) or (1, 1)

SHb,essm = 0 if (αBH, αBM) = (0, 0)
           q if (αBH, αBM) = (0, 1)
           n − m if (αBH, αBM) = (1, 0) or (1, 1)    (11)
Likewise, the approximate product is computed as in (9), with the final left-shift SHessm defined in Table 2. Now, SHessm takes six possible values, thus calling for a 6 × 1 multiplexing scheme.

Table 2. Left-shift for the generalized ESSM multiplier.

αAH, αAM, αBH, αBM                          SHessm
(0000)                                      0
(0001), (0100)                              q
(0101)                                      2·q
(0010), (0011), (1000), (1100)              n − m
(0110), (0111), (1001), (1101)              n − m + q
(1010), (1011), (1110), (1111)              2·(n − m)

Figure 4 depicts the hardware implementation of the generalized ESSM multiplier


(named gESSM in the following). The 3 × 1 multiplexers allow for selecting between the
most significant, the middle, and the least significant part of the inputs, whereas a small
m × m multiplier computes the approximate product. The left-shift is realized by cascading
two 3 × 1 multiplexers, where the first multiplexer applies the shift SHa,essm due to the flags αAH, αAM, and the second one applies the shift SHb,essm due to αBH, αBM. It is worth noting that this approach prevents the usage of large multiplexers, with beneficial effects on the hardware performance of the multiplier.

Figure 4. Block diagram of the proposed generalized ESSM multiplier.

3.2. Minimization of the Mean Square Approximation Error
In this paragraph, we find m and q in order to minimize the mean square approximation error at the output of the multiplier under the hypothesis of both uniform and non-uniform distributed input signals. In the following, we consider inputs with half-normal distribution in the non-uniform case, whose probability density function is as follows:

f(A) = (√2/(σ·√π))·e^(−A²/(2σ²))  for A ≥ 0    (12)
where σ, being the standard deviation of the underlying normal variable, is also related to the standard deviation of A.

Before proceeding, let us assume A and B to be independent, and let us re-write Equation (8) as follows with the help of Figure 3b:

Aessm = AL if A < 2^m
        AM if 2^m ≤ A < 2^(m+q)
        AH if 2^(m+q) ≤ A ≤ 2^n − 1    (13)

where the conditions A < 2^m, 2^m ≤ A < 2^(m+q), and 2^(m+q) ≤ A ≤ 2^n − 1 recall the conditions (αAH, αAM) = 00, (αAH, αAM) = 01, and (αAH, αAM) = 10 or 11, respectively. Defining A′essm = Aessm·2^SHa,essm and B′essm = Bessm·2^SHb,essm, we can write the exact inputs as follows:

A = A′essm + eA
B = B′essm + eB    (14)

with eA, eB that are the truncation errors due to the segmentation:

eA = 0 if A < 2^m
     Σ_{k=0}^{q−1} ak·2^k if 2^m ≤ A < 2^(m+q)
     Σ_{k=0}^{n−m−1} ak·2^k if 2^(m+q) ≤ A ≤ 2^n − 1

eB = 0 if B < 2^m
     Σ_{k=0}^{q−1} bk·2^k if 2^m ≤ B < 2^(m+q)
     Σ_{k=0}^{n−m−1} bk·2^k if 2^(m+q) ≤ B ≤ 2^n − 1    (15)

In (15), we assume the bits ak, bk to be independent from A and B, respectively, and to be uniform random variables with probability 1/2 of being '1'.
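The decomposition (13)–(15) is easy to check in software. The sketch below (our helper names) rebuilds A′essm = Aessm·2^SHa,essm from the segment selection and recovers eA as the discarded LSBs:

```python
def shifted_segment(a, n=16, m=8, q=5):
    """A'_essm = A_essm * 2^SHa,essm, following the three ranges of Eq. (13)."""
    mask = (1 << m) - 1
    if a < (1 << m):
        return a & mask                       # A_L, shift 0: exact
    if a < (1 << (m + q)):
        return ((a >> q) & mask) << q         # A_M: the q LSBs are truncated
    return (a >> (n - m)) << (n - m)          # A_H: the n-m LSBs are truncated

def segment_error(a, n=16, m=8, q=5):
    """Truncation error e_A of Eqs. (14)-(15): e_A = A - A'_essm."""
    return a - shifted_segment(a, n, m, q)
```

For A in the middle range the error equals A mod 2^q, and for A in the upper range it equals A mod 2^(n−m), matching the bit sums of Eq. (15).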
Using (14), the exact product is:

γ = A·B = (A′essm + eA)·(B′essm + eB) = A′essm·B′essm + A′essm·eB + B′essm·eA + eA·eB    (16)

Since the gESSM computes only the term A′essm·B′essm, the segmentation error is:

eessm = A′essm·eB + B′essm·eA + eA·eB    (17)

Re-writing (14) as A′essm = A − eA, B′essm = B − eB and substituting in (17), we find:

eessm = A·eB + B·eA − eA·eB    (18)

Neglecting the small term eA·eB for the sake of simplicity, we compute the mean square approximation error by squaring (18) and applying the expectation operator:

E[e²essm] = E[A²]·E[e²B] + E[B²]·E[e²A] + 2·E[A·eA]·E[B·eB]    (19)

Since A and B have the same distribution, we have E[A²] = E[B²], as well as E[e²A] = E[e²B] and E[A·eA] = E[B·eB] for the previous hypothesis. Therefore, Equation (19) becomes

E[e²essm] = 2·E[A²]·E[e²A] + 2·E[A·eA]²    (20)

As the computation of E[A·eA]² is not straightforward, we can exploit the Cauchy–Schwarz inequality E[A·eA]² ≤ E[A²]·E[e²A] to find the upper limit of E[e²essm]:

E[e²essm] ≤ 4·E[A²]·E[e²A]    (21)

Here, E[A²] depends on the statistics of the input signal, whereas E[e²A], which is the mean square value of the approximation error committed on A, depends on m and q. Then, as suggested by the above inequality, minimizing the upper limit (i.e., minimizing E[e²A]) minimizes the overall mean square approximation error of the multiplier.
Starting from (15), we can write E[e²A] as follows:

E[e²A] = E[(Σ_{k=0}^{q−1} ak·2^k)²]·P(AM) + E[(Σ_{k=0}^{n−m−1} ak·2^k)²]·P(AH)    (22)

with P(AM) and P(AH) that are the probabilities of having A in the ranges [2^m, 2^(m+q)) and [2^(m+q), 2^n − 1], respectively. Table 3 collects the expressions of P(AM) and P(AH) for the uniform and the half-normal cases, where erf(·) represents the so-called error function (details on the computation are reported in Appendix A for the half-normal case).
Table 3. Probability of selecting AM and AH as a function of the input distribution.

Input Stochastic Distribution     P(AM)                                      P(AH)
Uniform                           (2^{m+q} − 2^m)/(2^n − 1)                  (2^n − 1 − 2^{m+q})/(2^n − 1)
Half-normal                       erf(2^{m+q}/(σ√2)) − erf(2^m/(σ√2))        erf((2^n − 1)/(σ√2)) − erf(2^{m+q}/(σ√2))

We underline that the presence of P(AM) and P(AH) in (22) highlights the relation between the approximation error and the stochastic distribution of the inputs. Solving the expectations in (22), we find the following expression for E[e²A] (refer to Appendix B for details):

E[e²A] = [(1/6)(4^q − 1) + (1/4)·2(2^{q−1} − 1) − (1/6)(4^{q−1} − 1)]·P(AM) + [(1/6)(4^{n−m} − 1) + (1/4)·2(2^{n−m−1} − 1) − (1/6)(4^{n−m−1} − 1)]·P(AH)    (23)

with P(AM) and P(AH) that also depend on m and q (see Table 3).
The behavior of E[e²A] with respect to m and q is shown in Figure 5, compared to the simulation results. In this study, the input A is an n = 16 bit integer signal with uniform distribution in Figure 5a, and half-normal distribution with σ = 1024, 2048, and 16,384 in Figure 5b–d. We obtain the simulation results by segmenting 10^6 input samples of A and by computing the mean square value of the approximation error.

Figure 5. Mean square error on the input signal A as a function of m and q with (a) uniform distribution and half-normal distribution in the cases of (b) σ = 1024, (c) σ = 2048, and (d) σ = 16,384. In this example, A is an integer signal expressed on n = 16 bits.

As shown, the theoretical results perfectly match the simulations. For fixed m, increasing q decreases E[e²A] in the uniform case. Therefore, the optimal point qopt, able to minimize E[e²A], is the maximum value of q (that is, qopt = n − m − 1). On the other hand, E[e²A] shows minima in Figure 5b,c, with optimal points at qopt = 4, m = 8 and qopt = 2, m = 10 for σ = 1024, and qopt = 5, m = 8 and qopt = 3, m = 10 for σ = 2048. When σ becomes large, E[e²A] again decreases with q, making q = n − m − 1 the best choice.
In conclusion, a proper selection of q is of paramount importance for optimizing the accuracy of the multiplier, allowing the middle segment to be placed in any position between the LSB and the MSB of A (in contrast with [29], which always fixes the middle segment at the center of the input). In addition, the statistical properties of the input signals strongly affect the optimal value of q, as demonstrated by the results of Figure 5.
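The Table 3 expressions are straightforward to evaluate numerically. The sketch below (helper name is ours) computes P(AM) and P(AH) for the half-normal case with math.erf and cross-checks them against a direct count over half-normal samples:

```python
import math
import random

def p_segments_halfnormal(n, m, q, sigma):
    """P(A_M) and P(A_H) for a half-normal input, as in Table 3."""
    s = sigma * math.sqrt(2)
    p_m = math.erf(2 ** (m + q) / s) - math.erf(2 ** m / s)
    p_h = math.erf((2 ** n - 1) / s) - math.erf(2 ** (m + q) / s)
    return p_m, p_h

random.seed(7)
n, m, q, sigma = 16, 8, 5, 2048
# half-normal samples |N(0, sigma)|, clipped to the n-bit range
draws = [min(abs(random.gauss(0, sigma)), 2 ** n - 1) for _ in range(100_000)]
cnt_m = sum(2 ** m <= x < 2 ** (m + q) for x in draws) / len(draws)
cnt_h = sum(x >= 2 ** (m + q) for x in draws) / len(draws)
p_m, p_h = p_segments_halfnormal(n, m, q, sigma)
```

For σ = 2048 and m = 8, q = 5, the middle segment captures roughly 90% of the samples, which is consistent with why this design point performs well on half-normal inputs.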

4. Results
4.1. Assessment of Accuracy
We study the accuracy of the gESSM by exploiting the error metrics commonly used in the literature. To this end, let us define the approximation error E = Y − Yapprx and the Error Distance ED = |E|, with Y and Yapprx that are the exact and the approximate product. Naming avg(·) the average operator and Ymax = (2^n − 1)² the maximum value of Y, we define the Normalized Mean Error Distance as NMED = avg(ED)/Ymax, the Mean Relative Error Distance as MRED = avg(ED/Y), and the Number of Effective Bits as NoEB = 2·n − log2(1 + Erms), with Erms being the root mean square value of E.
Figure 6 depicts the NoEB as a function of q for m = 8, 10. Please note that the cases q = 4, m = 8 and q = 3, m = 10 give the ESSM described in [29], and that for q = 0 we obtain the performances of the SSM multiplier. In this analysis, the error performances are computed by multiplying 10^6 input samples, expressed on n = 16 bits, considering both uniform and half-normal distribution with σ = 2048 for the sake of demonstration. As shown in Figure 6a, the NoEB slowly improves with q, achieving the best result for qopt = n − m − 1. On the other hand, the NoEB reaches its peak value with qopt = 5, m = 8 and qopt = 3, m = 10, respectively, when the inputs are half-normal. These results are in agreement with the analysis of the previous section, since the NoEB reflects the behavior of the overall mean square approximation error. In addition, this study confirms that the input statistical properties affect the quality of results, and that positioning AM in the middle (as in [29]) generally does not achieve the best accuracy.

Figure 6. NoEB with respect to q for m = 8 and m = 10 for (a) uniform distributed inputs and (b) half-normal distributed inputs (with σ = 2048). The number of bits of the inputs is n = 16.

For the sake of comparison, we also analyze the performances of SSM [29], of cSSM [30]
(with three corrective terms), and of segmented multipliers described in [27,28,31]. The
multipliers [13,16,19] are also investigated, which exploit approximate compression. The
works [27,28] employ a dynamic segmentation, whereas [31] employs a hybrid technique by
cascading a static stage and a dynamic stage. In [27] (named DRUM in the following), the
parameter k defines the bit-width of the selected segment, whereas [28] (named TOSAM in
the following) exploits a multiply-and-add operation for realizing the product. Here, h bits
of the multiplicands are truncated, and t = h + 4 bits of the addends are discarded for hardware
simplification. In [31] (named HSM), the static stage selects p-bit segments, whereas the
dynamic one chooses (p/2)-bit segments. In [16] (referred to as Qiqieh in the following), the
parameter L defines the number of rows compressed by an OR gate, whereas [19] (referred
to as AHMA in the following) compresses the PPM with approximate 4–2 compressors. We
highlight that the HDL code of [27,30] is available on [32,33], respectively.
Table 4 collects the error metrics of the investigated multipliers when the inputs are
uniform and half-normal (with σ = 2048), respectively. For the gESSM, we consider the
points q = 5 and q = 7 for m = 8, and q = 3 and q = 5 for m = 10, which achieved the best
performances in the previous analysis. Please note that only the case q = 3, m = 10 places
AM at the center of the inputs as in [29].

Table 4. Error metrics of the investigated multipliers for n = 16 bits.

                    Uniform Distribution                     Half-Normal Distribution (σ = 2048)
Multiplier          NMED         MRED         NoEB           NMED         MRED         NoEB
SSM [29] m=8 1.93 × 10−3 2.08 × 10−2 8.8 8.29 × 10−5 1.87 × 10−1 13.1
m = 10 4.73 × 10−4 3.99 × 10−3 10.8 1.46 × 10−5 1.96 × 10−2 15.3
cSSM [30] m=8 6.70 × 10−4 9.49 × 10−3 10.2 8.28 × 10−5 1.87 × 10−1 13.1
m = 10 1.63 × 10−4 1.73 × 10−3 12.2 1.46 × 10−5 1.96 × 10−2 15.3
TOSAM [28] h=3 2.69 × 10−3 1.05 × 10−2 7.8 6.24 × 10−6 9.98 × 10−3 16.4
h=4 1.34 × 10−3 5.27 × 10−3 8.8 3.13 × 10−6 5.02 × 10−3 17.3
DRUM [27] k=4 1.41 × 10−2 5.89 × 10−2 5.5 3.85 × 10−5 6.20 × 10−2 13.7
k=6 3.51 × 10−3 1.46 × 10−2 7.5 9.53 × 10−6 1.52 × 10−2 15.7
k=8 8.82 × 10−4 3.66 × 10−3 9.5 2.39 × 10−6 3.68 × 10−3 17.7
HSM [31] p=8 1.47 × 10−2 1.03 × 10−1 5.5 3.53 × 10−4 7.05 × 10−1 11.0
p = 10 7.15 × 10−3 3.72 × 10−2 6.5 1.02 × 10−4 1.70 × 10−1 12.3
p = 12 3.51 × 10−3 1.56 × 10−2 7.5 9.76 × 10−6 3.92 × 10−2 15.7
Qiqieh [16] L=2 2.43 × 10−4 2.88 × 10−3 11.0 1.90 × 10−5 3.27 × 10−2 14.0
L=4 1.12 × 10−2 5.90 × 10−2 5.7 5.78 × 10−5 8.35 × 10−2 12.8
Kulkarni [13] 1.39 × 10−2 3.32 × 10−2 4.7 1.28 × 10−5 1.74 × 10−2 14.1
AHMA [19] 2.14 × 10−2 1.18 × 10−1 4.9 1.65 × 10−4 2.42 × 10−1 11.3
gESSM m = 8, q = 5 1.73 × 10−3 9.68 × 10−3 8.8 1.06 × 10−5 2.55 × 10−2 16.1
m = 8, q = 7 1.45 × 10−3 1.19 × 10−2 9.1 4.24 × 10−5 9.93 × 10−2 14.1
m = 10, q = 3 4.26 × 10−4 2.22 × 10−3 10.9 1.64 × 10−6 2.21 × 10−3 18.5
m = 10, q = 5 3.55 × 10−4 2.30 × 10−3 11.1 7.24 × 10−6 9.73 × 10−3 16.3

In the uniform case, the performances of the gESSM are very close to the SSM multiplier, with NoEB of about 9 and 11 bits with m = 8 and m = 10, and NMED, MRED in the ranges [3 × 10−4, 2 × 10−3] and [2 × 10−3, 1.2 × 10−2], respectively. A modest improvement is registered only in the cases q = 7, m = 8 and q = 5, m = 10, as expected from the previous considerations, with a NoEB increase of 0.3 bits. Among the segmented multipliers, cSSM offers the best accuracy in the uniform case, with a NoEB improvement of 1.4 bits with respect to SSM, and NMED, MRED in the order of 10−4 and 2 × 10−3 (see the case m = 10). The
other implementations exhibit lower performances in general, with NoEB limited between
5.5 and 9.5 bits. Only Qiqieh L = 2, using an approximate compression technique, is able to
approach a NoEB of 11 bits and NMED, MRED comparable to SSM, cSSM, and gESSM.
On the other hand, the accuracy of the gESSM strongly improves with respect to the
SSM when the inputs are half-normal, exhibiting a NoEB increase up to 3 bits with q = 5,
m = 8, and up to 3.2 bits with q = 3, m = 10. The NMED also improves, achieving values
in the order of 10−5 with q = 5, m = 8, and 10−6 with q = 3, m = 10. Conversely, the cSSM
multiplier does not show improvements, with performances very close to the SSM. Among
the other implementations, DRUM is the only one able to offer an accuracy close to the
gESSM multiplier, with NoEB up to 17.7 bits in the case k = 8.

4.2. Hardware Implementation Results


We synthesize the investigated multipliers in TSMC 28 nm CMOS technology using a 0.9 V standard voltage library, in Cadence Genus, exploiting a physical flow in order to improve the accuracy of the power consumption estimation. For all the circuits, a clock period of 500 ps is considered, whereas the power consumption is computed by simulating the post-synthesis netlist with 10^5 input samples, with both uniform and half-normal distribution (σ = 2048) at a toggle rate of 1 GHz. In the simulation, Standard Delay Format and Toggle Count Format files are used for the annotation of the path delays and of the switching activity. At the same time, we also assess the minimum delay by synthesizing each multiplier at the maximum frequency able to allow a positive slack.
Results are collected in Table 5. As shown, the gESSM multipliers allow a reduction
of area up to 71.4% with q = 7, m = 8, and in the range 47%/50% with m = 10. The SSM
and cSSM multipliers offer superior reductions (up to 76% and 75%, respectively), whereas
best results are achieved with DRUM k = 4 and HSM p = 8 (reductions of 84% and 86%,
respectively). We also express the complexity of the circuits in terms of equivalent NAND
count, considering as reference a two-input NAND gate with drive strength 2x and area of
0.63 µm². Also in this case, the gESSM exhibits remarkable improvements with respect to
the exact multiplier and an acceptable worsening with respect to SSM and cSSM.
The gESSM reduces the minimum delay up to 12.8% with respect to the exact imple-
mentation. On the other hand, the SSM and cSSM produce faster results due to the simpler
segmentation algorithm. The minimum delay of DRUM and HSM increases with k and p,
respectively, up to +8.6%, whereas best performances are achieved with Qiqieh, Kulkarni,
and AHMA (with reductions up to 38%).
In the case of uniform distributed inputs, the gESSM multipliers show remarkable
power savings, ranging between 53.7% and 78.1%. On the other hand, the implementations
SSM and cSSM are able to obtain more than 83% of power reduction with m = 8. DRUM
k = 4 and HSM p = 8 achieve the best performances, with power reductions in the order of 90%.
When the input is half-normal, the power saving of gESSM is of 71% and 27% in the
optimal points q = 5, m = 8, and q = 3, m = 10, and is larger than 83% in the case q = 7, m = 8.
SSM and cSSM continue to exhibit high power reductions (around 88%/89% with m = 8),
whereas the power savings of DRUM k = 4 and HSM p = 8 reach up to 76.2% and 84%.
We underline that, despite the reduced power saving with half-normal distribution
in the optimal points, the gESSM multipliers offer the best accuracy, showing superior
error metrics if compared to the other implementations. Therefore, the loss of electrical
performances is more than compensated by the reduced approximation error.

Table 5. Hardware implementation results of the investigated multipliers for n = 16 bits.

Multiplier    Minimum Delay [ps]    Area [µm²]    Equivalent NAND Count    Power @1 GHz (Uniform Input)    Power @1 GHz (Half-Normal Input)
Exact 336 791.3 1256 1300.3 721.8
SSM [29] m=8 272 (−19.0%) 190.1 (−76.0%) 302 201.4 (−84.5%) 77.6 (−89.2%)
m = 10 313 (−6.8%) 308.6 (−61.0%) 490 346.8 (−73.3%) 281.8 (−61.0%)
Electronics 2023, 12, 446 13 of 21

Table 5. Cont.

Multiplier    Minimum Delay [ps]    Area [µm²]    Equivalent NAND Count    Power @1 GHz (Uniform Input)    Power @1 GHz (Half-Normal Input)
cSSM [30] m=8 272 (−19.0%) 197.4 (−75.0%) 313 214.7 (−83.5%) 85.6 (−88.1%)
m = 10 313 (−6.8%) 352.3 (−55.5%) 559 395.0 (−69.6%) 358.6 (−50.3%)
TOSAM [28] h=3 311 (−7.4%) 341.2 (−56.9%) 542 367.1 (−71.8%) 394.5 (−45.3%)
h=4 335 (−0.3%) 494.9 (−37.4%) 786 582.4 (−55.2%) 613.5 (−15.0%)
DRUM [27] k=4 257 (−23.5%) 126.5 (−84.0%) 201 155.9 (−88.0%) 171.8 (−76.2%)
k=6 357 (+6.3%) 241.9 (−69.4%) 384 389.1 (−70.1%) 377.4 (−47.7%)
k=8 365 (+8.6%) 414.3 (−47.6%) 658 691.1 (−46.9%) 656.2 (−9.1%)
HSM [31] p=8 251 (−25.3%) 112.9 (−85.7%) 179 137.3 (−89.4%) 115.7 (−84.0%)
p = 10 354 (+5.4%) 204.8 (−74.1%) 325 306.3 (−76.4%) 339.6 (−53.0%)
p = 12 364 (+8.3%) 347.1 (−56.1%) 551 538.2 (−58.6%) 582.8 (−19.3%)
Qiqieh [16] L=2 262 (−22.0%) 440.8 (−44.3%) 700 578.6 (−55.5%) 330.9 (−54.2%)
L = 4 218 (−35.1%) 271.9 (−65.6%) 432 385.8 (−70.3%) 241.6 (−66.5%)

Kulkarni [13] 289 (−14.0%) 508.9 (−35.7%) 808 620.4 (−52.3%) 364.7 (−49.5%)
AHMA [19] 208 (−38.1%) 282.4 (−64.3%) 448 327.5 (−74.8%) 252.1 (−65.1%)
gESSM m = 8, q = 5 312 (−7.1%) 235.6 (−70.2%) 347 289.9 (−77.7%) 210.63 (−70.8%)
m = 8, q = 7 293 (−12.8%) 226.0 (−71.4%) 359 284.4 (−78.1%) 122.20 (−83.1%)
m = 10, q = 3 327 (−2.7%) 393.4 (−50.3%) 624 510.9 (−60.7%) 524.8 (−27.3%)
m = 10, q = 5 329 (−2.1%) 420.2 (−46.9%) 667 602.0 (−53.7%) 494.6 (−31.5%)
4.3. Image Processing Application
We study the performances of the investigated multipliers in image filtering applications. Naming I(x,y) the pixel of the input image with coordinates x, y, the filtering operation realizes the relation

If(x, y) = ∑_{i=−d}^{d} ∑_{j=−d}^{d} I(x + i, y + j)·h(i + d + 1, j + d + 1)    (24)

with If(x,y) which is the pixel of the output image, and with h which is the kernel matrix. In our case, we consider a 5 × 5 gaussian kernel, hGAUSSIAN, used for smoothing operations, and a 5 × 11 motion kernel, hMOTION, able to approximate the linear motion of a camera. Figure 7a,b report the coefficients of hGAUSSIAN and hMOTION, expressed as integer numbers on n = 16 bits.

Figure 7. Kernel matrix for (a) the gaussian and (b) the motion filter. (c) Histogram of occurrences for the Mandrill image.
For our analysis, we process three test images, Lena, Cameraman, and Mandrill, whose
pixel values are represented on n = 16 bits. For the sake of demonstration, Figure 7c depicts
the histogram of occurrences for Mandrill, showing that the probability of assuming values
in [0, 2n − 1] is almost spread across the whole range. We assess the performances by
exploiting the Mean Structural Similarity Index (SSIM), able to measure the similarity
between images, and the Peak Signal-to-Noise ratio (PSNR), expressed in dB, taking as
reference the exact filtered image.
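Equation (24) is a plain two-dimensional correlation between the image and the kernel, and the multiplier is the only arithmetic operator that changes between the exact and the approximate experiments. A minimal pure-Python sketch (our own helper; borders are skipped rather than padded, and any approximate multiply can be passed as mult):

```python
def filter_image(img, kernel, mult=lambda a, b: a * b):
    """Apply the 2-D filtering of (24) with a pluggable multiplier."""
    kh, kw = len(kernel), len(kernel[0])
    dy, dx = kh // 2, kw // 2
    out = []
    for y in range(dy, len(img) - dy):
        row = []
        for x in range(dx, len(img[0]) - dx):
            acc = 0
            for j in range(-dy, dy + 1):
                for i in range(-dx, dx + 1):
                    acc += mult(img[y + j][x + i], kernel[j + dy][i + dx])
            row.append(acc)
        out.append(row)
    return out
```

Substituting a segmented approximate product for mult reproduces the approximate filtering; the rectangular indexing also covers the non-square 5 × 11 motion kernel.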
Table 6 collects the results, showing the average SSIM and PSNR obtained with the
smoothing and the motion application. In addition, the overall average SSIM and PSNR
are presented to facilitate the comparisons. All the multipliers achieve SSIM values very close to 1, with the static segmented implementations exhibiting the best results. The PSNR of cSSM strongly increases if compared with SSM (up to about +14 dB on average in the case m = 10), whereas the improvement is more modest with the gESSM (up to +4.1 dB with m = 8 and +6 dB with m = 10 on average). Again, the performances of the gESSM depend on the statistical properties of the input image and on the choice of q.
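For reference, the PSNR used here can be computed as below (our own helper; the peak value for n-bit pixels is taken as 2^n − 1, with the exact filtered image as reference):

```python
import math

def psnr(reference, test, n=16):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf          # identical images
    peak = 2 ** n - 1
    return 10 * math.log10(peak ** 2 / mse)
```

SSIM is more involved, since it combines local luminance, contrast, and structure statistics, so a library implementation is typically used for it.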

Table 6. Accuracy performances of the investigated multipliers in image processing applications.

                Gaussian Filter        Motion Filter         Average
Multiplier      SSIM    PSNR (dB)      SSIM    PSNR (dB)     SSIM    PSNR (dB)
SSM [29] m=8 1.000 42.6 1.000 42.7 1.000 42.6
m = 10 1.000 53.4 1.000 55.2 1.000 54.3
cSSM [30] m=8 1.000 58.4 1.000 54.7 1.000 56.5
m = 10 1.000 68.5 1.000 67.6 1.000 68.0
TOSAM [28] h=3 1.000 48.8 0.999 54.8 0.999 51.8
h=4 1.000 63.9 1.000 63.4 1.000 63.7
DRUM [27] k=4 0.984 35.7 0.980 37.9 0.982 36.8
k=6 0.999 51.7 0.999 48.9 0.999 50.3
k=8 1.000 64.8 1.000 62.8 1.000 63.8
HSM [31] p=8 0.982 35.7 0.978 36.0 0.980 35.8
p = 10 0.996 45.2 0.994 45.7 0.995 45.5
p = 12 0.999 51.7 0.999 48.9 0.999 50.3
Qiqieh [16] L=2 1.000 63.9 1.000 65.0 1.000 64.5
L=4 0.982 32.0 0.981 31.2 0.981 31.6
Kulkarni [13] 0.993 39.2 0.997 42.6 0.995 40.9
AHMA [19] 0.950 25.3 0.948 25.4 0.949 25.4
gESSM m = 8, q = 5 1.000 44.4 1.000 45.5 1.000 45.0
m = 8, q = 7 1.000 47.1 1.000 46.3 1.000 46.7
m = 10, q = 3 1.000 53.7 1.000 57.4 1.000 55.5
m = 10, q = 5 1.000 60.0 1.000 60.4 1.000 60.2

The dynamic segmented multipliers exhibit large PSNR with TOSAM and DRUM
(more than 60 dB), whereas performances are limited with HSM. Among multipliers with
approximate compressors, only Qiqieh L = 2 is able to overcome 60dB of PSNR, whereas
Kulkarni and AHMA show lower performances. Figure 8 offers the results obtained with
the segmented multipliers for the Lena image. As shown, the results of gESSM are very
close to the exact case (as demonstrated by the high values of SSIM and PSNR), whereas
some degradations are registered with DRUM k = 4 and HSM p = 8.

[Figure 8 panels: Original; Exact filtered; SSM m = 8, 10; cSSM m = 8, 10; TOSAM h = 3, 4; DRUM k = 4, 6, 8; HSM p = 8, 10, 12; gESSM m = 8, q = 5, 7 and m = 10, q = 3, 5]
Figure 8. Lena image filtered by means of segmented multipliers.

4.4. Audio Application
As a further example, we investigate the use of the proposed gESSM and the other multipliers for implementing an audio filter. Filtering is a widely diffused operation in audio processing, able to realize frequency equalization and noise reduction. In this example, we elaborate the signal by considering a linear phase, low-pass, generalized Equiripple, 187th order, finite impulse response (FIR) filter, with pass-band up to 0.1667 π rad/sample and stop-band from 0.1958 π rad/sample with an attenuation of 85 dB. The module of the impulse response is shown in Figure 9a, with the taps represented as integer numbers expressed on n = 16 bits.
The audio signal used for this trial is p232_016.wav, from the library [34]. We also superimpose an external gaussian noise with variance of −30 dB and quantize the resulting signal on n = 16 bits. The histogram of occurrences of the input signal, depicted in Figure 9b, highlights a close to half-normal distribution.

Figure 9. (a) Module of the impulse response of the low-pass FIR filter and (b) histogram of occurrences of the audio signal.

For the sake of comparison, we show the mean square error (MSE) between the approximate and the exact output for each multiplier. Therefore, the lower the MSE, the better the multiplier accuracy.
Figure 10 shows the performances, with multiplications revisited as sign-magnitude operations. The results for the gESSM multipliers are highlighted in violet (m = 8) and in red (m = 10). As shown, the accuracy of the gESSM again varies in dependence on q, with the best performance achieved with q = 3, m = 10. In this application, gESSM overcomes cSSM both with m = 8 and m = 10, which offer a worse MSE. In general, the gESSM performs better than the other implementations, with the exception of DRUM k = 8, featuring the best accuracy in this case.
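Since the audio samples are signed while the segmented multipliers operate on unsigned segments, each product is revisited as a sign-magnitude operation: the unsigned approximate core multiplies the magnitudes, and the sign is reapplied afterwards. A sketch of this wrapper and of the MSE figure of merit (names are ours; the truncating core is only an illustrative stand-in for the compared multipliers):

```python
def signed_approx_mult(a, b, core):
    """Sign-magnitude product: unsigned approximate core on |a| and |b|."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * core(abs(a), abs(b))

def mse(exact_out, approx_out):
    """Mean square error between exact and approximate filter outputs."""
    return sum((e - a) ** 2 for e, a in zip(exact_out, approx_out)) / len(exact_out)

def truncating_core(x, y, drop=4):
    """Illustrative unsigned core: drop LSBs of both operands, then multiply."""
    return ((x >> drop) << drop) * ((y >> drop) << drop)
```

The lower the MSE between the approximate and the exact FIR outputs, the better the multiplier, which is exactly the comparison reported in Figure 10.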

Figure 10. MSE between the approximate and the exact results. The MSE for gESSM m = 8 and m = 10 are highlighted in violet and red, respectively.
red, respectively.

5. Discussion
As shown in the previous sections, the position of the middle segment AM affects the accuracy of the multiplier, achieving different results dependent on the statistical properties of the inputs. Indeed, the accuracy mainly depends (i) on the probability of A assuming values in the ranges [2^m, 2^{m+q}) and [2^{m+q}, 2^n − 1], and (ii) on the resolution of AM.
Figure 11 shows the behavior of P(AM) and P(AH) with respect to q for m = 8, in the uniform case and in the half-normal distribution with σ = 2048 and σ = 16,384. We remember that the analytical expressions of P(AM) and P(AH) are shown in Table 3.
In the uniform case (Figure 11a), P(AH) is very close to 1 for small values of q. Therefore, A is mainly approximated with AH, with negative effects on the multiplier accuracy. Increasing q, P(AM) increases, whereas P(AH) reduces. This improves the accuracy, since the probability of approximating A with AM grows. When q = n − m − 1, P(AM) equals P(AH). As a consequence, the segmentation chooses fairly between AM and AH, minimizing the approximation error. Therefore, increasing q improves the error performances with respect to the SSM multiplier. Nevertheless, the cSSM exhibits better error results even against the optimal gESSM, since its correction technique reduces the approximation error when AH is chosen. These trends are largely confirmed in the image processing applications, where the kernels and the input images favor the selection of the most significant segments.
Electronics 2023, 12, 446 17 of 21

Figure 11. P(AM) and P(AH) for (a) the uniform distribution and the half-normal distribution in the cases (b) σ = 2048 and (c) σ = 16,384.

At the same time, the power consumption is strongly reduced both with the gESSM and with the cSSM and SSM, whereas smaller improvements are registered with the other DSM multipliers. This is mostly due to their employment of leading-one detectors and encoders, used to perform the dynamic segmentation. On the other hand, the power saving of the gESSM is slightly lower than that of the SSM and cSSM due to the different selection mechanism.
The hardware performances of Qiqieh, Kulkarni, and AHMA also show interesting results, due to the reduced complexity of the PPM compression stage, but at the cost of a significant loss in the quality of results.
When the distribution is half-normal, the overall mean square error presents a minimum for small values of σ. Indeed, with reference to the case σ = 2048 in Figure 11b, P(AM) increases up to q = 5 and is constant for q > 5. On the other hand, for large values of q, the resolution of AM worsens. This makes q = 5 the optimum point, since P(AM) is maximized while AM still offers the best possible accuracy. Furthermore, Figure 5 of Section 3.2 also shows that the position of the optimal point depends on the standard deviation of the inputs: the higher σ, the higher qopt. This is explained in Figure 11c for the case σ = 16,384, where P(AM) reaches its peak value only for q = n − m − 1, thus shifting the optimal point forward.
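The saturation of P(AM) can be reproduced directly from the erf expression of Appendix A. The sketch below assumes m = 8 and neglects the clipping of A at 2^n − 1:

```python
import math

# P(A_M) for a half-normal input with standard deviation sigma,
# using P(0 <= A <= a) = erf(a / (sigma * sqrt(2))) from (A1).
def p_mid_half_normal(m, q, sigma):
    lo = math.erf(2 ** m / (sigma * math.sqrt(2)))
    hi = math.erf(2 ** (m + q) / (sigma * math.sqrt(2)))
    return hi - lo

m, sigma = 8, 2048
for q in range(1, 8):
    print(f"q={q}: P(A_M)={p_mid_half_normal(m, q, sigma):.4f}")
```

For σ = 2048, P(AM) stops growing around q = 5 (further bits fall in the distribution tail), while for σ = 16,384 it keeps increasing up to q = n − m − 1, in line with Figure 11b,c.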
With reference to the case σ = 2048, the accuracy of the cSSM is very close to that of the SSM, since the probability of choosing AH is low and the correction term is practically unused. Conversely, the gESSM is able to improve the performances with a NoEB of 18.5 bits, also outperforming the other implementations. This scenario is confirmed by the audio processing analysis. In this application, the gESSM performs better than the cSSM, achieving an MSE of about 10^−8. From an electrical point of view, the power reduction offered by the gESSM is remarkable when m = 8, and decreases if m = 10. Conversely, the SSM and cSSM again exhibit reductions up to 89.2% and 88.1%, but at the cost of limited accuracy performances.
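For reference, accuracy figures such as the MSE and NoEB quoted here can be computed from error samples as sketched below. The NoEB formula used in this sketch, NoEB = 2n − log2(1 + RMSE), is the definition commonly adopted in the approximate-multiplier literature; the sample pairs are toy values, not the paper's measurements:

```python
import math

# Error metrics for an n x n approximate multiplier, from a list of
# (exact, approximate) product pairs. NoEB = 2n - log2(1 + RMSE).
def error_metrics(pairs, n):
    errs = [approx - exact for exact, approx in pairs]
    mse = sum(e * e for e in errs) / len(errs)
    rmse = math.sqrt(mse)
    noeb = 2 * n - math.log2(1 + rmse)
    return mse, noeb

# Toy check: an exact multiplier has MSE = 0 and NoEB = 2n.
pairs = [(a * b, a * b) for a in range(16) for b in range(16)]
print(error_metrics(pairs, 4))  # -> (0.0, 8.0)
```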
In order to assess the multipliers considering both the error features and the electrical performances, we plot the power saving with respect to the NMED and the MRED for uniform and half-normal inputs (with σ = 2048) in Figure 12. As shown, the cSSM multipliers are on the Pareto front when the inputs are uniform, offering large power saving with a high quality of results. On the contrary, when the input is half-normal, the proposed gESSM with q = 5, m = 8 and q = 3, m = 10 defines the Pareto front for NMED in the range [9 × 10^−6, 5 × 10^−5] and MRED in the range [2 × 10^−2, 10^−1], offering the best trade-off between accuracy and power consumption. Therefore, the gESSM is the best choice when the inputs have a non-uniform distribution.

Figure 12. In the case of uniform distribution: (a) NMED vs. Power saving and (b) MRED vs. Power saving. In the case of half-normal distribution (σ = 2048): (c) NMED vs. Power saving and (d) MRED vs. Power saving.
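The Pareto fronts of Figure 12 can be extracted mechanically from (error, power saving) pairs: a design belongs to the front if no other design achieves both lower error and higher power saving. A generic sketch, using made-up sample points rather than the measured data:

```python
# Pareto-optimal designs: minimize error, maximize power saving.
# A point is kept if no other point dominates it on both axes.
def pareto_front(points):
    front = []
    for name, err, saving in points:
        dominated = any(
            e <= err and s >= saving and (e < err or s > saving)
            for _, e, s in points
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (name, NMED, power saving %) points for illustration.
designs = [("A", 1e-5, 60.0), ("B", 5e-5, 80.0), ("C", 8e-5, 70.0)]
print(pareto_front(designs))  # -> ['A', 'B']
```

Here "C" is dominated by "B" (higher error and lower saving), so only "A" and "B" lie on the front.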

6. Conclusions
In this paper, we have analyzed the performances of the ESSM multiplier as a function of the position of the middle segment and of the statistical properties of the input signals. While the standard implementation of the ESSM places the middle segment at the center of the input, we have moved the middle segment from the LSB to the MSB in order to find the configuration best able to minimize the mean square approximation error. To this aim, two design parameters were exploited: m, defining the accuracy and the size of the multiplier, and q, defining the position of the middle segment for further error tuning. We have described the hardware implementation of the proposed gESSM, and we have analytically demonstrated the possibility of choosing q to minimize the overall approximation error in a mean square sense.

The error metrics reveal a strong dependence on q and on the statistical properties of the input signals. When the inputs are uniform, the best accuracy is achieved when q reaches its maximum value, whereas minimum points arise in the half-normal case (with σ = 2048). The gESSM is not able to overcome the cSSM with uniform distribution, but exhibits the best results with half-normal inputs (achieving a NoEB of 18.5 bits). These trends are also confirmed in image and audio applications, giving the best results in audio filtering. The electrical performances also exhibit satisfactory results, with power reductions up to 78% and 83% in the uniform and half-normal cases, respectively.
From the comparison of the error metrics and the power saving in Figure 12, the gESSM is the best choice when the input signal is non-uniform, offering the best trade-off between power and accuracy.

Author Contributions: Conceptualization, G.D.M., G.S. and A.G.M.S.; methodology, G.D.M. and G.S.;
software, G.D.M. and G.S.; validation, G.D.M., G.S. and A.G.M.S.; formal analysis, G.D.M., G.S. and
A.G.M.S.; investigation, G.D.M., G.S. and A.G.M.S.; data curation, G.D.M. and G.S.; writing—original
draft preparation, G.D.M., G.S., A.G.M.S. and D.D.C.; writing—review and editing, A.G.M.S. and D.D.C.;
visualization, G.D.M. and G.S.; supervision, A.G.M.S. and D.D.C.; project administration, A.G.M.S. and
D.D.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The Verilog code is available on GitHub at https://github.com/
GenDiMeo/gESSM, accessed on 16 February 2020.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
Let us consider the normal random variable A′ with zero mean and standard deviation σ. The half-normal random variable A is obtained by computing the absolute value of A′, i.e., A = |A′|.
In order to compute P(AM) and P(AH), let us consider the probability of having A in the range [0, a]:

$$P(0 \leq A \leq a) = \int_{0}^{a} f(A)\,dA = \operatorname{erf}\!\left(\frac{a}{\sigma\sqrt{2}}\right) \tag{A1}$$

where f(A) is the pdf of A (see (12)), and erf(·) is the error function.
Therefore, observing that P(AM) = P(0 ≤ A ≤ 2^(m+q)) − P(0 ≤ A ≤ 2^m) and P(AH) = P(0 ≤ A ≤ 2^n − 1) − P(0 ≤ A ≤ 2^(m+q)), we obtain the results shown in Table 3.
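As a quick numerical sanity check of (A1), with arbitrary example values of m, q, and σ, the empirical probability of A falling in [2^m, 2^(m+q)) can be compared with the erf expression:

```python
import math
import random

# P(lo <= A <= hi) from (A1), with A = |A'|, A' ~ N(0, sigma^2).
def p_range_half_normal(lo, hi, sigma):
    s = sigma * math.sqrt(2)
    return math.erf(hi / s) - math.erf(lo / s)

random.seed(0)
sigma, m, q = 2048.0, 8, 5
lo, hi = 2 ** m, 2 ** (m + q)
samples = [abs(random.gauss(0.0, sigma)) for _ in range(200_000)]
empirical = sum(lo <= a < hi for a in samples) / len(samples)
analytic = p_range_half_normal(lo, hi, sigma)
print(f"empirical={empirical:.4f}, analytic={analytic:.4f}")
```

The two values agree to within Monte Carlo noise, supporting the closed-form expressions of Table 3.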

Appendix B
In order to compute (23), let us concentrate on the first summation in (22), writing the following equality:

$$E\left[\left(\sum_{k=0}^{q-1} a_k 2^k\right)^{2}\right] = E\left[\sum_{k=0}^{q-1}\left(a_k 2^k\right)^{2} + 2\sum_{k=0}^{q-2} a_k 2^k \sum_{j=k+1}^{q-1} a_j 2^j\right] \tag{A2}$$

Exploiting the linearity of the expectation operator and the independence between the bits, we obtain

$$E\left[\sum_{k=0}^{q-1}\left(a_k 2^k\right)^{2}\right] = \frac{1}{2}\sum_{k=0}^{q-1} 2^{2k}, \qquad E\left[2\sum_{k=0}^{q-2} a_k 2^k \sum_{j=k+1}^{q-1} a_j 2^j\right] = 2\cdot\frac{1}{4}\sum_{k=0}^{q-2} 2^k \sum_{j=k+1}^{q-1} 2^j \tag{A3}$$

under the hypothesis E[ak] = 1/2 (so that, by independence, E[ak aj] = 1/4 for j ≠ k).

Therefore, observing that

$$\sum_{k=0}^{q-1} r^k = \frac{1-r^q}{1-r}, \qquad \sum_{j=k+1}^{q-1} r^j = \sum_{j=0}^{q-1} r^j - \sum_{j=0}^{k} r^j \tag{A4}$$

where r is a natural number, we have the following expressions after simple algebra:

$$E\left[\sum_{k=0}^{q-1}\left(a_k 2^k\right)^{2}\right] = \frac{1}{6}\left(4^q - 1\right), \qquad E\left[2\sum_{k=0}^{q-2} a_k 2^k \sum_{j=k+1}^{q-1} a_j 2^j\right] = \frac{1}{2}\,2^{q}\left(2^{q-1}-1\right) - \frac{1}{3}\left(4^{q-1}-1\right) \tag{A5}$$

Applying the same reasoning for the second summation, we obtain (23).
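The closed forms in (A5) can be verified exhaustively: for q independent uniform bits, E[(Σ ak 2^k)²] is simply the average of x² over all q-bit values x, and must equal the sum of the two terms in (A5). A short check:

```python
# Exhaustive verification of (A5): for q uniform independent bits,
# E[(sum a_k 2^k)^2] = mean of x^2 over x in [0, 2^q), which must equal
# the diagonal term (4^q - 1)/6 plus the cross term of (A5).
def second_moment_exact(q):
    return sum(x * x for x in range(2 ** q)) / 2 ** q

def second_moment_closed(q):
    diag = (4 ** q - 1) / 6
    cross = 0.5 * 2 ** q * (2 ** (q - 1) - 1) - (4 ** (q - 1) - 1) / 3
    return diag + cross

for q in range(1, 9):
    assert abs(second_moment_exact(q) - second_moment_closed(q)) < 1e-9
print("A5 verified for q = 1..8")
```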

References
1. Spagnolo, F.; Perri, S.; Corsonello, P. Approximate Down-Sampling Strategy for Power-Constrained Intelligent Systems. IEEE Access
2022, 10, 7073–7081. [CrossRef]
2. Vaverka, F.; Mrazek, V.; Vasicek, Z.; Sekanina, L. TFApprox: Towards a Fast Emulation of DNN Approximate Hardware
Accelerators on GPU. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE),
Grenoble, France, 9–13 March 2020; pp. 294–297. [CrossRef]
3. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural
Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision
and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713. [CrossRef]
4. Montanari, D.; Castellano, G.; Kargaran, E.; Pini, G.; Tijani, S.; De Caro, D.; Strollo, A.G.M.; Manstretta, D.; Castello, R. An FDD
Wireless Diversity Receiver With Transmitter Leakage Cancellation in Transmit and Receive Bands. IEEE J. Solid State Circuits
2018, 53, 1945–1959. [CrossRef]
5. Kiayani, A.; Waheed, M.Z.; Antilla, L.; Abdelaziz, M.; Korpi, D.; Syrjala, V.; Kosunen, M.; Stadius, K.; Ryynamen, J.; Valkama, M.
Adaptive Nonlinear RF Cancellation for Improved Isolation in Simultaneous Transmit–Receive Systems. IEEE Trans. Microw.
Theory Tech. 2018, 66, 2299–2312. [CrossRef]
6. Zhang, T.; Su, C.; Najafi, A.; Rudell, J.C. Wideband Dual-Injection Path Self-Interference Cancellation Architecture for Full-Duplex
Transceivers. IEEE J. Solid State Circuits 2018, 53, 1563–1576. [CrossRef]
7. Di Meo, G.; De Caro, D.; Saggese, G.; Napoli, E.; Petra, N.; Strollo, A.G.M. A Novel Module-Sign Low-Power Implementation for
the DLMS Adaptive Filter With Low Steady-State Error. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 297–308. [CrossRef]
8. Meher, P.K.; Park, S.Y. Critical-Path Analysis and Low-Complexity Implementation of the LMS Adaptive Algorithm. IEEE Trans.
Circuits Syst. I Regul. Pap. 2014, 61, 778–788. [CrossRef]
9. Jiang, H.; Liu, L.; Jonker, P.P.; Elliott, D.G.; Lombardi, F.; Han, J. A High-Performance and Energy-Efficient FIR Adaptive Filter
Using Approximate Distributed Arithmetic Circuits. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 313–326. [CrossRef]
10. Esposito, D.; Di Meo, G.; De Caro, D.; Strollo, A.G.M.; Napoli, E. Quality-Scalable Approximate LMS Filter. In Proceedings of the
2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Bordeaux, France, 9–12 December 2018;
pp. 849–852. [CrossRef]
11. Di Meo, G.; De Caro, D.; Petra, N.; Strollo, A.G.M. A Novel Low-Power High-Precision Implementation for Sign–Magnitude
DLMS Adaptive Filters. Electronics 2022, 11, 1007. [CrossRef]
12. Bruschi, V.; Nobili, S.; Terenzi, A.; Cecchi, S. A Low-Complexity Linear-Phase Graphic Audio Equalizer Based on IFIR Filters.
IEEE Signal Process. Lett. 2021, 28, 429–433. [CrossRef]
13. Kulkarni, P.; Gupta, P.; Ercegovac, M. Trading Accuracy for Power with an Underdesigned Multiplier Architecture. In Proceedings
of the 2011 24th International Conference on VLSI Design, Chennai, India, 2–7 January 2011; pp. 346–351. [CrossRef]
14. Zervakis, G.; Tsoumanis, K.; Xydis, S.; Soudris, D.; Pekmestzi, K. Design-Efficient Approximate Multiplication Circuits Through
Partial Product Perforation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2016, 24, 3105–3117. [CrossRef]
15. Zacharelos, E.; Nunziata, I.; Saggese, G.; Strollo, A.G.M.; Napoli, E. Approximate Recursive Multipliers Using Low Power
Building Blocks. IEEE Trans. Emerg. Top. Comput. 2022, 10, 1315–1330. [CrossRef]
16. Qiqieh, I.; Shafik, R.; Tarawneh, G.; Sokolov, D.; Yakovlev, A. Energy-efficient approximate multiplier design using bit significance-
driven logic compression. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne,
Switzerland, 27–31 March 2017; pp. 7–12. [CrossRef]
17. Esposito, D.; Strollo, A.G.M.; Alioto, M. Low-power approximate MAC unit. In Proceedings of the 2017 13th Conference on Ph.D.
Research in Microelectronics and Electronics (PRIME), Giardini Naxos-Taormina, Italy, 12–15 June 2017; pp. 81–84. [CrossRef]
18. Fritz, C.; Fam, A.T. Fast Binary Counters Based on Symmetric Stacking. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25,
2971–2975. [CrossRef]
19. Ahmadinejad, M.; Moaiyeri, M.H.; Sabetzadeh, F. Energy and area efficient imprecise compressors for approximate multiplication
at nanoscale. Int. J. Electron. Commun. 2019, 110, 152859. [CrossRef]
20. Yang, Z.; Han, J.; Lombardi, F. Approximate compressors for error-resilient multiplier design. In Proceedings of the 2015 IEEE
International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), Amherst, MA, USA,
12–14 October 2015; pp. 183–186. [CrossRef]

21. Ha, M.; Lee, S. Multipliers With Approximate 4–2 Compressors and Error Recovery Modules. IEEE Embed. Syst. Lett. 2018,
10, 6–9. [CrossRef]
22. Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Meo, G.D. Comparison and Extension of Approximate 4-2 Compressors for
Low-Power Approximate Multipliers. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 3021–3034. [CrossRef]
23. Park, G.; Kung, J.; Lee, Y. Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator.
IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 2950–2961. [CrossRef]
24. Kong, T.; Li, S. Design and Analysis of Approximate 4–2 Compressors for High-Accuracy Multipliers. IEEE Trans. Very Large Scale
Integr. (VLSI) Syst. 2021, 29, 1771–1781. [CrossRef]
25. Jou, J.M.; Kuang, S.R.; Chen, R.D. Design of low-error fixed-width multipliers for DSP applications. IEEE Trans. Circuits Syst. II
Analog. Digit. Signal Process. 1999, 46, 836–842. [CrossRef]
26. Petra, N.; De Caro, D.; Garofalo, V.; Napoli, E.; Strollo, A.G.M. Design of Fixed-Width Multipliers With Linear Compensation
Function. IEEE Trans. Circuits Syst. I Regul. Pap. 2011, 58, 947–960. [CrossRef]
27. Hashemi, S.; Bahar, R.I.; Reda, S. DRUM: A Dynamic Range Unbiased Multiplier for approximate applications. In Proceedings
of the 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Austin, TX, USA, 2–6 November 2015;
pp. 418–425. [CrossRef]
28. Vahdat, S.; Kamal, M.; Afzali-Kusha, A.; Pedram, M. TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable
Approximate Multiplier. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 1161–1173. [CrossRef]
29. Narayanamoorthy, S.; Moghaddam, H.A.; Liu, Z.; Park, T.; Kim, N.S. Energy-Efficient Approximate Multiplication for Digital
Signal Processing and Classification Applications. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2015, 23, 1180–1184. [CrossRef]
30. Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Saggese, G.; Di Meo, G. Approximate Multipliers Using Static Segmentation:
Error Analysis and Improvements. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 2449–2462. [CrossRef]
31. Li, L.; Hammad, I.; El-Sankary, K. Dual segmentation approximate multiplier. Electron. Lett. 2021, 57, 718–720. [CrossRef]
32. GitHub. Available online: https://github.com/scale-lab/DRUM (accessed on 18 April 2020).
33. GitHub. Available online: https://github.com/astrollo/SSM (accessed on 16 February 2020).
34. DataShare. Available online: https://datashare.ed.ac.uk/handle/10283/2791 (accessed on 21 August 2017).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
