Electronics 12 00446 v2
Article
Design of Generalized Enhanced Static Segment Multiplier
with Minimum Mean Square Error for Uniform and
Nonuniform Input Distributions
Gennaro Di Meo * , Gerardo Saggese , Antonio G. M. Strollo and Davide De Caro
Department of Electrical Engineering and Information Technology, University of Naples Federico II,
80125 Naples, Italy
* Correspondence: [email protected]
Abstract: In this paper, we analyze the performances of an Enhanced Static Segment Multiplier (ESSM)
when the inputs have both uniform and non-uniform distribution. The enhanced segmentation
divides the multiplicands into a lower, a middle, and an upper segment. While the middle segment
is placed at the center of the inputs in other implementations, we seek the optimal position able to
minimize the approximation error. To this aim, two design parameters are exploited: m, defining the
size and the accuracy of the multiplier, and q, defining the position of the middle segment for further
accuracy tuning. A hardware implementation is proposed for our generalized ESSM (gESSM), and an
analytical model is described, able to find m and q which minimize the mean square approximation
error. With uniform inputs, the error slightly improves by increasing q, whereas a large error decrease
is observed by properly choosing q when the inputs are half-normal (with a NoEB up to 18.5 bits for
a 16-bit multiplier). Implementation results in 28 nm CMOS technology are also satisfactory, with
area and power reductions up to 71% and 83%. We report image and audio processing applications,
showing that gESSM is a suitable candidate in applications with non-uniform inputs.
or in the final carry propagate adder of the multiplier. Since the PPM compression stage
is rich in half-adders and full-adders, the approximation of the compression circuit can
lead to a significant hardware improvement. In [13], the authors involve AND and OR
gates to merge the partial product generation stage with the compression step, while [14]
deletes some rows from the PPM at design time. In [15] a recursive approach is proposed,
in which the multiplier is decomposed into small approximate units. The paper [16]
shows a compression scheme in which OR gates substitute half-adders and full-adders,
whereas [17] improves this technique by compensating the mean approximation error.
In [18], fast counters encode the partial products by following a stacking approach, whereas
the works [19–24] analyze multipliers with approximate 4–2 compressors. In these papers,
the full-adders required for the realization of the exact compressor are substituted by simple
logic at the cost of an error in the computation, and the carry chain between compressors
is broken in order to optimize the critical path and to moderate the glitch propagation.
In [20], the authors propose three compressors with different levels of accuracy, while [21]
designs an error recovery module to improve the quality of results. The paper [22] shows
a statistical approach for ordering the partial products in approximate 4–2 compressors,
and analyzes the performances when different compressors are employed in the same
multiplier. In [23], compressors with positive and negative mean error are interleaved in
order to minimize the approximation effects, whereas [24] prefers NAND and NOR gates
to AND and OR gates for achieving high speed performances.
The fixed-width technique is a further approach able to reduce the power, providing a
way to discard some columns of the PPM [25,26]. In this case, properly weighting the partial
products in the truncated PPM reduces the approximation error of the multiplier [26].
In contrast with the previous works, the segmentation method reduces the bit-width of
the multiplicands with the aim of downsizing the multiplier. The papers [27,28] describe a
dynamic segment method (DSM) in which the segment is selected starting from the leading
one bit of the multiplicand. While [27] adds a ‘1’ bit at the least significant position of the
segment for accuracy recovery, ref. [28] revises the multiplication as a multiply-and-add
operation and applies operand truncation for further simplification. On the contrary, the
paper [29] proposes a static segment method (SSM), which reduces the complexity of the
selection mechanism by choosing between two fixed m-bit segments, with n/2 ≤ m < n,
where n is the number of bits of the inputs. At the same time, an Enhanced SSM multiplier
(ESSM) is also proposed in [29], which allows for selecting between three fixed portions of
the inputs: the m most significant bits (MSBs), the m least significant bits (LSBs), and the m
central bits of the inputs. The paper [30] improves the accuracy of the SSM multipliers by
reducing the maximum approximation error, whereas in [31] the authors propose a hybrid
approach in which a static stage is cascaded to a dynamic stage. In these cases, error metric
results reveal satisfactory accuracy when the inputs have uniform distribution, along with
acceptable power improvements with respect to the exact and the DSM multipliers. At
the same time, these works do not offer an analysis with non-uniform distributed input
signals; in addition, the work [29] does not show a detailed analysis of the hardware
implementation of the ESSM multiplier.
In this paper, we analyze the performances of the ESSM multiplier as a function of
the input stochastic distribution and propose a novel implementation able to minimize
the mean square approximation error. Indeed, the statistical properties of a signal affect
the probability of assuming values in a range, giving high probability ranges and low
probability ranges. Starting from this observation, our idea is to properly place the central
segment (named middle segment in the following) in order to minimize the segmentation
error in the high probability ranges. To this aim, two design parameters are exploited: m,
which defines the size of the multiplier, and q, which defines the position of the middle
segment. For the error analysis, we consider inputs with uniform and non-uniform distribution, taking into consideration half-normal signals for demonstration in this last case, and also describe an analytical model able to find the optimal position qopt that minimizes the multiplier error in a mean square sense.
Simulation results match with the theoretical analysis, exhibiting accuracy performances dependent on the input stochastic distribution and on the choice of m and q. Best error metrics are achieved with the middle segment placed toward the MSBs if the inputs are uniform, and with the middle segment placed at the center of the inputs if the distribution is half-normal. Electrical analyses also show remarkable hardware improvements if compared with the exact multiplier, whereas only an acceptable degradation is registered with respect to the SSM multipliers. Assessments of image and audio processing applications confirm these trends, showing performances that depend on the position of the middle segment.
The paper is organized as follows: Section 2 shows the static segment method, also describing the correction technique of [30] and the enhanced segmentation presented in [29]. Then, Section 3 describes the hardware structure of the proposed gESSM, along with the analytical model used to minimize the mean square value of the approximation error. Section 4 shows the results in terms of error metrics, electrical performances, and applications in image and audio processing. A comparison with the state of the art is also proposed. Section 5 further compares the multipliers, finding the Pareto-optimal implementations, and Section 6 concludes the paper.
2. Static Segment Method
2.1. Static Segment Multiplier and Correction Technique
The SSM technique shown in [29] provides for selecting m-bit segments from the multiplicands, with n/2 ≤ m < n, in order to employ a smaller m × m multiplier instead of an n × n multiplier. As shown in Figure 1 for the unsigned signal A, if the n − m MSBs (i.e., a15, a14, ..., a10) are low, the least significant m bits of the input are chosen, forming the segment AL. On the contrary, if any bit of the n − m MSBs is high, the most significant m bits are selected, forming the segment AH. It is worth noting that the segmentation introduces an error when AH is chosen, since the bits belonging to eA are truncated (i.e., a5, a4, ..., a0 in Figure 1). In addition, m is the only parameter able to define the accuracy and the size of the multiplier.
Figure 1. Segmentation of the signal A with n = 16 bits and m = 10 bits.
Then, defining αA as the OR between the n − m MSBs of A, the segmented input Assm is

Assm = { AL  if αA = 0
       { AH  if αA = 1        (1)
A similar expression holds also for the input B and the corresponding segment Bssm. Then, the segmented multiplication is

γssm = (Assm · 2^SHa,ssm) · (Bssm · 2^SHb,ssm) = (Assm · Bssm) · 2^SHssm        (2)
with SHa,ssm, SHb,ssm that are

SHa,ssm = { 0      if αA = 0
          { n − m  if αA = 1

SHb,ssm = { 0      if αB = 0
          { n − m  if αB = 1        (3)
and SHssm = SHa,ssm + SHb,ssm, defining the left-shift used to express the result on 2·n bits:

SHssm = { 0          if αA = 0, αB = 0
        { n − m      if αA = 0, αB = 1 or if αA = 1, αB = 0
        { 2·(n − m)  if αA = 1, αB = 1        (4)
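As an illustration, the segmentation of Equations (1)–(4) can be sketched in software as follows (a minimal behavioral model for clarity; the function names are ours, not from the paper):

```python
def ssm_segment(x: int, n: int, m: int) -> tuple[int, int]:
    """Return (segment, left-shift) per Equations (1) and (3)."""
    alpha = (x >> m) != 0               # OR of the n-m MSBs, the flag of Eq. (1)
    if alpha:
        return x >> (n - m), n - m      # A_H: the m most significant bits
    return x & ((1 << m) - 1), 0        # A_L: the m least significant bits

def ssm_multiply(a: int, b: int, n: int, m: int) -> int:
    """Approximate product of Eq. (2); SH_ssm = SH_a + SH_b as in Eq. (4)."""
    a_s, sh_a = ssm_segment(a, n, m)
    b_s, sh_b = ssm_segment(b, n, m)
    return (a_s * b_s) << (sh_a + sh_b)
```

With n = 16 and m = 10, operands below 2^10 are multiplied exactly, while larger ones are truncated to their top 10 bits before the m × m multiplication.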
Figure 2a depicts the hardware implementation of the SSM multiplier. The multiplexers on A and B apply the segmentation, choosing between the most significant and least significant portions of the inputs, whereas two OR gates compute the selection flags αA and αB. After the m × m multiplier, a further multiplexer realizes the left-shift described in (4).
Figure 2. Approximate multiplier with (a) static segment method and (b) segmented multiplier with the correction technique of [30].
The accuracy of the SSM multiplier is improved in [30] by minimizing the approximation error in the case αA = 1, αB = 1 (i.e., when both inputs are truncated). Here, the authors estimate the committed error as

CT = 2^{2n−2m} · Σ_{k=0}^{m−1} ctk·2^k        (5)
with

ctk = (a_{k+n−m} · b_{n−m−1}) OR (b_{k+n−m} · a_{n−m−1})        (6)
and add CT to the approximate product for compensation:

γssm,c = (Assm · Bssm + CT) · 2^SH        (7)
As detailed in [30], using two or three terms of the summation (5) sufficiently improves the accuracy.

Figure 2b shows the implementation of the corrected SSM multiplier (named cSSM in the following). The correction term CT is combined with the product Assm·Bssm if αA = 1 and αB = 1 (see the AND gate highlighted in red). It is also worth noting that the correction technique has a minimum impact on the hardware performances, since a fused PPM is employed for realizing (7).
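In software, the correction of Equations (5)–(7) can be sketched as follows. This is a hedged model, not the fused-PPM hardware of [30]: we keep only the most significant terms of the summation (5) and add them to the segment product before the final shift; helper names are ours.

```python
def bit(x: int, i: int) -> int:
    return (x >> i) & 1

def correction_term(a: int, b: int, n: int, m: int, terms: int = 3) -> int:
    """Truncated summation of Eq. (5) built from the ct_k of Eq. (6)."""
    ct = 0
    for k in range(m - 1, m - 1 - terms, -1):   # most significant terms only
        ct_k = (bit(a, k + n - m) & bit(b, n - m - 1)) | \
               (bit(b, k + n - m) & bit(a, n - m - 1))
        ct += ct_k << k
    return ct

def cssm_multiply(a: int, b: int, n: int, m: int) -> int:
    """Corrected product of Eq. (7); CT is added only when both inputs are truncated."""
    seg = lambda x: (x >> (n - m), n - m) if x >> m else (x & ((1 << m) - 1), 0)
    a_s, sh_a = seg(a)
    b_s, sh_b = seg(b)
    prod = a_s * b_s
    if sh_a and sh_b:                           # alpha_A = alpha_B = 1
        prod += correction_term(a, b, n, m)
    return prod << (sh_a + sh_b)
```

For instance, with n = 16, m = 10 and a = b = 50032, the corrected product lands closer to the exact one than the plain SSM result.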
2.2. Enhanced SSM Multiplier
The ESSM multiplier described in [29] allows for selecting between three segments of the input, each one having m bits (see Figure 3a). In this implementation, the middle segment AM is placed at the center of the signal (i.e., (n − m)/2 bits on the left with respect to the LSB, see the figure). As the position of AM is fixed, m is again the only design parameter which defines the accuracy and the size of the multiplier.
Figure 3. Segmentation of the input A in the case n = 16 and m = 8 with (a) the ESSM method of [29] and (b) the proposed generalized ESSM method in the case q = 5.
In this case, two control flags are required for the selection, named αAH and αAM in the following. Therefore, defining αAH as the OR of the first (n − m)/2 MSBs of A (i.e., a15, a14, ..., a12, highlighted in blue in Figure 3a), and αAM as the OR of the remaining (n − m)/2 MSBs (i.e., a11, a10, ..., a8, highlighted in green in Figure 3a), the segment Aessm is computed as

Aessm = { AL  if (αAH, αAM) = (0,0)
        { AM  if (αAH, αAM) = (0,1)
        { AH  if (αAH, αAM) = (1,0) or (1,1)        (8)

A similar expression holds also for the segment Bessm, with the flags αBH, αBM that handle the segmentation.
Therefore, the approximate product is

γessm = (Aessm · 2^SHa,essm) · (Bessm · 2^SHb,essm) = (Aessm · Bessm) · 2^SHessm        (9)
with SHa,essm, SHb,essm that are

SHa,essm = { 0           if (αAH, αAM) = (0,0)
           { (n − m)/2   if (αAH, αAM) = (0,1)
           { n − m       if (αAH, αAM) = (1,0) or (1,1)

SHb,essm = { 0           if (αBH, αBM) = (0,0)
           { (n − m)/2   if (αBH, αBM) = (0,1)
           { n − m       if (αBH, αBM) = (1,0) or (1,1)        (10)

and SHessm defined in Table 1.

Table 1. Left-shift for the ESSM multiplier.

αAH, αAM, αBH, αBM                              SHessm
(0000)                                          0
(0001), (0100)                                  (n − m)/2
(0010), (0011), (0101), (1000), (1100)          n − m
(0110), (0111), (1001), (1101)                  (3/2)·(n − m)
(1010), (1011), (1110), (1111)                  2·(n − m)
As shown in the table, the left-shift SHessm ranges between five possible values, thus requiring a 5 × 1 multiplexer to extend the result on 2·n bits.
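A behavioral sketch of the three-way selection of Equation (8) with the shifts of Equation (10) follows; here the middle segment is fixed (n − m)/2 bits above the LSB, as in the ESSM of [29], and the names are ours:

```python
def essm_segment(x: int, n: int, m: int) -> tuple[int, int]:
    """Select A_L, A_M, or A_H per Eq. (8); return (segment, shift) per Eq. (10)."""
    h = (n - m) // 2
    mask = (1 << m) - 1
    alpha_h = (x >> (n - h)) != 0                  # OR of the first (n-m)/2 MSBs
    alpha_m = ((x >> m) & ((1 << h) - 1)) != 0     # OR of the next (n-m)/2 bits
    if alpha_h:
        return x >> (n - m), n - m                 # A_H
    if alpha_m:
        return (x >> h) & mask, h                  # A_M, centered segment
    return x & mask, 0                             # A_L

def essm_multiply(a: int, b: int, n: int, m: int) -> int:
    a_s, sh_a = essm_segment(a, n, m)
    b_s, sh_b = essm_segment(b, n, m)
    return (a_s * b_s) << (sh_a + sh_b)            # SH_essm as in Table 1
```

With n = 16 and m = 8, an input such as 1000 activates the middle segment (bits 11..4), so the three-segment scheme loses only the 4 LSBs instead of the 8 LSBs truncated by the SSM.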
Likewise, the approximate product is computed as in (9) with the final left-shift
SHessm defined in Table 2. Now, SHessm ranges between six possible values, thus calling
for a 6 × 1 multiplexing scheme.
two 3 × 1 multiplexers, where the first multiplexer applies the shift SHa,essm due to the flags αAH, αAM, and the second one applies the shift SHb,essm due to αBH, αBM. It is worth noting that this approach prevents the usage of large multiplexers, with beneficial effects on the hardware performances of the multiplier.
Figure 4. Block diagram of the proposed generalized ESSM multiplier.
eA = { 0                            if A < 2^m
     { Σ_{k=0}^{q−1} ak·2^k         if 2^m ≤ A < 2^{m+q}
     { Σ_{k=0}^{n−m−1} ak·2^k       if 2^{m+q} ≤ A ≤ 2^n − 1        (15)

with eB defined in the same way on the bits bk of B.
Since the gESSM computes only the term A′essm·B′essm, the segmentation error is:

eessm = A′essm·eB + B′essm·eA + eA·eB        (17)
Neglecting the small term eA·eB for the sake of simplicity, we compute the mean square approximation error by squaring (18) and by using the expectation operator:

E[eessm²] = E[A²]·E[eB²] + E[B²]·E[eA²] + 2·E[A·eA]·E[B·eB]        (19)
Since A and B have the same distribution, we have E[A²] = E[B²], as well as E[eA²] = E[eB²] and E[A·eA] = E[B·eB] for the previous hypothesis. Therefore, Equation (19) becomes

E[eessm²] = 2·E[A²]·E[eA²] + 2·E[A·eA]²        (20)
Here, E[A²] depends on the statistics of the input signal, whereas E[eA²], which is the mean square value of the approximation error committed on A, depends on m and q. Then, as suggested by the above inequalities, minimizing the upper limit (i.e., minimizing E[eA²]) minimizes the overall mean square approximation error of the multiplier.
Starting from (15), we can write E[eA²] as follows:

E[eA²] = E[(Σ_{k=0}^{q−1} ak·2^k)²]·P(AM) + E[(Σ_{k=0}^{n−m−1} ak·2^k)²]·P(AH)        (22)
with P(AM) and P(AH) that are the probability of having A in the ranges [2^m, 2^{m+q}) and [2^{m+q}, 2^n − 1], respectively. Table 3 collects the expressions of P(AM) and P(AH) for the uniform and the half-normal cases, where erf(·) represents the so-called error function (details on the computation are reported in Appendix A for the half-normal case).
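The decomposition in (22) is just the law of total expectation applied to the truncation error; a quick numerical sanity check (our own sketch, with uniform inputs and the value ranges of Eq. (15)) is:

```python
import random

n, m, q = 16, 8, 5
random.seed(0)
samples = [random.randrange(2 ** n) for _ in range(200_000)]

def e_a(a: int) -> int:
    """Truncation error of Eq. (15)."""
    if a < 2 ** m:
        return 0                           # A_L keeps the value exactly
    if a < 2 ** (m + q):
        return a & ((1 << q) - 1)          # middle segment: q LSBs lost
    return a & ((1 << (n - m)) - 1)        # upper segment: n-m LSBs lost

mid = [a for a in samples if 2 ** m <= a < 2 ** (m + q)]
high = [a for a in samples if a >= 2 ** (m + q)]
msq = lambda xs: sum(e_a(a) ** 2 for a in xs) / len(xs)

total = sum(e_a(a) ** 2 for a in samples) / len(samples)
split = msq(mid) * len(mid) / len(samples) + msq(high) * len(high) / len(samples)
assert abs(total - split) <= 1e-9 * total  # the two sides of Eq. (22) coincide
```

The split form makes explicit how each range contributes its conditional mean square error weighted by its probability.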
Table 3. Probability of selecting AM and AH as a function of the input distribution.

Input Stochastic Distribution    P(AM)                                    P(AH)
Uniform                          (2^{m+q} − 2^m)/(2^n − 1)                (2^n − 1 − 2^{m+q})/(2^n − 1)
Half-normal                      erf(2^{m+q}/(σ√2)) − erf(2^m/(σ√2))      erf((2^n − 1)/(σ√2)) − erf(2^{m+q}/(σ√2))

We underline that the presence of P(AM) and P(AH) in (22) highlights the relation between the approximation error and the stochastic distribution of the inputs. Solving the expectations in (22), we find the following expression for E[eA²] (refer to Appendix B for details):

E[eA²] = [ (1/6)·(4^q − 1) + (1/4)·2·(2^{q−1} − 1) − (1/6)·(4^{q−1} − 1) ]·P(AM)
       + [ (1/6)·(4^{n−m} − 1) + (1/4)·2·(2^{n−m−1} − 1) − (1/6)·(4^{n−m−1} − 1) ]·P(AH)        (23)

with P(AM) and P(AH) that also depend on m and q (see Table 3).

The behavior of E[eA²] with respect to m and q is shown in Figure 5, compared to the simulation results. In this study, the input A is an n = 16 bits integer signal with uniform distribution in Figure 5a, and half-normal distribution with σ = 1024, 2048, and 16,384 in Figure 5b–d. We achieve the simulation results by segmenting 10^6 input samples of A and by computing the mean square value of the approximation error.
Figure 5. Mean square error on the input signal A as a function of m and q with (a) uniform distribution and half-normal distribution in the cases of (b) σ = 1024, (c) σ = 2048, and (d) σ = 16,384. In this example, A is an integer signal expressed on n = 16 bits.
As shown, the theoretical results perfectly match with the simulations. For fixed m, increasing q decreases E[eA²] in the uniform case. Therefore, the optimal point qopt, able to minimize E[eA²], is the maximum value of q (that is qopt = n − m − 1). On the other hand, E[eA²] shows minima in Figure 5b,c, with optimal points in qopt = 4, m = 8 and qopt = 2, m = 10 for σ = 1024, and qopt = 5, m = 8 and qopt = 3, m = 10 for σ = 2048. When σ becomes large, E[eA²] again decreases with q, making q = n − m − 1 the best choice.
In conclusion, a proper selection of q is of paramount importance for optimizing the accuracy of the multiplier, leading to placing the middle segment in any position between the LSB and the MSB of A (in contrast with [29], which always fixes the middle segment at the center of the input). In addition, the statistical properties of the input signals strongly affect the optimal value of q, as demonstrated by the results of Figure 5.
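The search for qopt in Figure 5 can be reproduced with a small Monte Carlo sketch (our own code; the half-normal generator and sample count are illustrative assumptions):

```python
import random

def mean_square_error(samples, n, m, q):
    """Estimate E[e_A^2] using the error ranges of Eq. (15)."""
    acc = 0
    for a in samples:
        if a < 2 ** m:
            e = 0
        elif a < 2 ** (m + q):
            e = a & ((1 << q) - 1)         # q LSBs lost by the middle segment
        else:
            e = a & ((1 << (n - m)) - 1)   # n-m LSBs lost by the upper segment
        acc += e * e
    return acc / len(samples)

random.seed(1)
n, m, sigma = 16, 8, 2048
# Half-normal samples: |N(0, sigma)|, clipped to the n-bit range.
samples = [min(int(abs(random.gauss(0, sigma))), 2 ** n - 1) for _ in range(100_000)]

errors = {q: mean_square_error(samples, n, m, q) for q in range(n - m)}
q_opt = min(errors, key=errors.get)
print(q_opt)
```

With these parameters the minimum falls at q = 5, in line with Figure 5c for m = 8; q = 0 reproduces the SSM behavior, whose error is far larger for this half-normal input.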
4. Results
4.1. Assessment of Accuracy
We study the accuracy of the gESSM by exploiting the error metrics commonly used in the literature. To this end, let us define the approximation error E = Y − Yapprx and the Error Distance ED = |E|, with Y and Yapprx that are the exact and the approximate product. Naming avg(·) and Ymax the average operator and the maximum value of Y, respectively, with Ymax = (2^n − 1)², we define the Normalized Mean Error Distance as NMED = avg(ED)/Ymax, the Mean Relative Error Distance as avg(ED/Y), and the Number of Effective Bits as NoEB = 2·n − log2(1 + Erms), with Erms being the root mean square value of E.
Figure 6 depicts the NoEB as a function of q for m = 8, 10. Please note that the cases q = 4, m = 8 and q = 3, m = 10 give the ESSM described in [29], and that for q = 0 we obtain the performances of the SSM multiplier. In this analysis, the error performances are computed by multiplying 10^6 input samples, expressed on n = 16 bits, considering both uniform and half-normal distribution with σ = 2048 for the sake of demonstration. As shown in Figure 6a, the NoEB slowly improves with q, achieving the best result for qopt = n − m − 1. On the other hand, the NoEB reaches the peak value with qopt = 5, m = 8 and qopt = 3, m = 10, respectively, when the inputs are half-normal. These results are in agreement with the analysis of the previous section, since the NoEB reflects the behavior of the overall mean square approximation error. In addition, this study confirms that the input statistical properties affect the quality of results, and that positioning AM in the middle (as in [29]) generally does not achieve the best accuracy.
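The metrics defined in this subsection can be computed with a short script; this is our own sketch, and the truncation-based stand-in for the approximate multiplier is only a placeholder:

```python
import math
import random

def error_metrics(pairs, approx, n):
    """NMED, MRED, and NoEB as defined above."""
    y_max = (2 ** n - 1) ** 2
    ed_sum, red_sum, red_cnt, sq_sum = 0, 0.0, 0, 0.0
    for a, b in pairs:
        y, y_apx = a * b, approx(a, b)
        ed = abs(y - y_apx)
        ed_sum += ed
        if y:                              # skip zero products in ED/Y
            red_sum += ed / y
            red_cnt += 1
        sq_sum += (y - y_apx) ** 2
    e_rms = math.sqrt(sq_sum / len(pairs))
    nmed = ed_sum / len(pairs) / y_max
    mred = red_sum / red_cnt
    noeb = 2 * n - math.log2(1 + e_rms)
    return nmed, mred, noeb

random.seed(2)
n = 16
pairs = [(random.randrange(2 ** n), random.randrange(2 ** n)) for _ in range(50_000)]
trunc4 = lambda a, b: ((a >> 4) << 4) * ((b >> 4) << 4)   # toy 4-LSB truncation
nmed, mred, noeb = error_metrics(pairs, trunc4, n)
```

An exact multiplier gives Erms = 0 and therefore NoEB = 2·n, which is a handy self-check when wiring in a real approximate design.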
Figure 6. NoEB with respect to q for m = 8 and m = 10 for (a) uniform distributed inputs and (b) half-normal distributed inputs (with σ = 2048). The number of bits of the inputs is n = 16.
For the sake of comparison, we also analyze the performances of SSM [29], of cSSM [30]
(with three corrective terms), and of segmented multipliers described in [27,28,31]. The
multipliers [13,16,19] are also investigated, which exploit approximate compression. The
works [27,28] employ a dynamic segmentation, whereas [31] employs a hybrid technique by
cascading a static stage and a dynamic stage. In [27] (named DRUM in the following), the
parameter k defines the bit-width of the selected segment, whereas [28] (named TOSAM in
the following) exploits a multiply-and-add operation for realizing the product. Here, h bits
of the multiplicands are truncated, and t = h + 4 bits of the addends are discarded for hard-
ware simplification. In [31] (named HSM) the static stage selects p-bit segments, whereas the
dynamic one chooses (p/2)-bit segments. In [16] (referred to as Qiqieh in the following), the
parameter L defines the number of rows compressed by an OR gate, whereas [19] (referred
to as AHMA in the following) compresses the PPM with approximate 4–2 compressors. We
highlight that the HDL code of [27,30] is available on [32,33], respectively.
Table 4 collects the error metrics of the investigated multipliers when the inputs are
uniform and half-normal (with σ = 2048), respectively. For the gESSM, we consider the
points q = 5 and q = 7 for m = 8, and q = 3 and q = 5 for m = 10, which achieved best
performances in the previous analysis. Please note that only the case q = 3, m = 10 places
AM at the center of the inputs as in [29].
In the uniform case, the performances of the gESSM are very close to those of the SSM multiplier, with NoEB of about 9 and 11 bits for m = 8 and m = 10, and NMED, MRED in the ranges [3 × 10^−4, 2 × 10^−3] and [2 × 10^−3, 1.2 × 10^−2], respectively. A modest improvement is registered only in the cases q = 7, m = 8 and q = 5, m = 10, as expected from the previous considerations, with a NoEB increase of 0.3 bits. Among the segmented multipliers, cSSM offers the best accuracy in the uniform case, with a NoEB improvement of 1.4 bits with respect to SSM, and NMED, MRED in the order of 10^−4 and 2 × 10^−3 (see the case m = 10). The
other implementations exhibit lower performances in general, with NoEB limited between
5.5 and 9.5 bits. Only Qiqieh L = 2, using an approximate compression technique, is able to approach a NoEB of 11 bits and NMED, MRED comparable to SSM, cSSM, and gESSM.
On the other hand, the accuracy of the gESSM strongly improves with respect to the
SSM when the inputs are half-normal, exhibiting a NoEB increase up to 3 bits with q = 5,
m = 8, and up to 3.2 bits with q = 3, m = 10. The NMED also improves, achieving values in the order of 10^−5 with q = 5, m = 8, and 10^−6 with q = 3, m = 10. Conversely, the cSSM
multiplier does not show improvements, with performances very close to the SSM. Among
the other implementations, DRUM is the only one able to offer an accuracy close to the
gESSM multiplier, with NoEB up to 17.7 bits in the case k = 8.
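The error metrics of Table 4 can be reproduced along the lines of the following Python sketch. The metric definitions (NoEB, NMED, MRED) are the standard ones assumed here, and the truncating multiplier is only a stand-in for demonstration, not the gESSM hardware:

```python
import numpy as np

def error_metrics(exact, approx, n=16):
    """Error metrics for an n x n approximate multiplier, with the usual
    definitions assumed: NoEB = 2n - log2(1 + RMSE), NMED = mean(|e|)
    normalized by the maximum product, MRED = mean(|e| / exact), exact != 0."""
    e = np.abs(exact.astype(np.float64) - approx.astype(np.float64))
    noeb = 2 * n - np.log2(1.0 + np.sqrt(np.mean(e ** 2)))
    nmed = np.mean(e) / float((2 ** n - 1) ** 2)
    nz = exact != 0
    mred = np.mean(e[nz] / exact[nz].astype(np.float64))
    return noeb, nmed, mred

# Stand-in approximation for demonstration only (NOT the gESSM circuit):
# keep the top 8 bits of each 16-bit operand.
rng = np.random.default_rng(0)
a = rng.integers(1, 2 ** 16, 50_000)
b = rng.integers(1, 2 ** 16, 50_000)
trunc = lambda x: (x >> 8) << 8
noeb, nmed, mred = error_metrics(a * b, trunc(a) * trunc(b))
```

Feeding the same routine with half-normal operands (e.g., `np.abs(rng.normal(0, 2048, N))`) reproduces the non-uniform scenario discussed above.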
Table 5. Cont.
Kulkarni [13] 289 (−14.0%) 508.9 (−35.7%) 808 620.4 (−52.3%) 364.7 (−49.5%)
AHMA [19] 208 (−38.1%) 282.4 (−64.3%) 448 327.5 (−74.8%) 252.1 (−65.1%)
gESSM m = 8, q = 5 312 (−7.1%) 235.6 (−70.2%) 347 289.9 (−77.7%) 210.63 (−70.8%)
gESSM m = 8, q = 7 293 (−12.8%) 226.0 (−71.4%) 359 284.4 (−78.1%) 122.20 (−83.1%)
gESSM m = 10, q = 3 327 (−2.7%) 393.4 (−50.3%) 624 510.9 (−60.7%) 524.8 (−27.3%)
gESSM m = 10, q = 5 329 (−2.1%) 420.2 (−46.9%) 667 602.0 (−53.7%) 494.6 (−31.5%)
We underline that, despite the reduced power saving with the half-normal distribution at the optimal points, the gESSM multipliers offer the best accuracy, showing superior error metrics compared to the other implementations. Therefore, the loss of electrical performance is more than compensated by the reduced approximation error.

4.3. Image Processing Application
We study the performances of the investigated multipliers in image filtering applications. Denoting by I(x,y) the pixel of the input image with coordinates x, y, the filtering operation realizes the relation

$$I_f(x,y) = \sum_{i=-d}^{d}\sum_{j=-d}^{d} I(x+i,\, y+j)\cdot h(i+d+1,\, j+d+1) \quad (24)$$
with If(x,y) the pixel of the output image and h the kernel matrix. In our case, we consider a 5 × 5 gaussian kernel, hGAUSSIAN, used for smoothing operations, and a 5 × 11 motion kernel, hMOTION, able to approximate the linear motion of a camera. Figure 7a,b report the coefficients of hGAUSSIAN and hMOTION, expressed as integer numbers on n = 16 bits.
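Relation (24) can be sketched directly as follows; the generalization to rectangular kernels (needed for the 5 × 11 motion kernel) and the zero padding at the borders are our own illustrative choices:

```python
import numpy as np

def filter2d(I, h):
    """Apply relation (24): correlate image I with kernel h (odd sizes),
    using zero padding at the borders."""
    dh, dw = h.shape[0] // 2, h.shape[1] // 2
    P = np.pad(I.astype(np.int64), ((dh, dh), (dw, dw)))
    out = np.zeros(I.shape, dtype=np.int64)
    for i in range(-dh, dh + 1):
        for j in range(-dw, dw + 1):
            # h is indexed with the 1-based offset i + d + 1 of (24),
            # i.e. h[i + dh, j + dw] in 0-based Python indexing
            out += h[i + dh, j + dw] * P[dh + i:dh + i + I.shape[0],
                                         dw + j:dw + j + I.shape[1]]
    return out
```

Replacing the element-wise product with an approximate multiplier model yields the approximate filtered image used for the comparisons below.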
Figure 7. (a) Coefficients of hGAUSSIAN; (b) coefficients of hMOTION; (c) histogram of occurrences for the Mandrill image.
For our analysis, we process three test images, Lena, Cameraman, and Mandrill, whose
pixel values are represented on n = 16 bits. For the sake of demonstration, Figure 7c depicts
the histogram of occurrences for Mandrill, showing that the probability of assuming values in [0, 2^n − 1] is spread over almost the whole range. We assess the performances by
exploiting the Mean Structural Similarity Index (SSIM), able to measure the similarity
between images, and the Peak Signal-to-Noise ratio (PSNR), expressed in dB, taking as
reference the exact filtered image.
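The PSNR used here follows the standard definition, with the exact filtered image as reference and peak value 2^n − 1 (SSIM is omitted for brevity in this sketch):

```python
import numpy as np

def psnr(ref, img, n=16):
    """Peak Signal-to-Noise Ratio in dB, taking the exact filtered
    image `ref` as reference; pixels are represented on n bits."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    peak = float(2 ** n - 1)
    return np.inf if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)
```

A higher PSNR means the approximately filtered image is closer to the exact one.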
Table 6 collects the results, showing the average SSIM and PSNR obtained with the
smoothing and the motion application. In addition, the overall average SSIM and PSNR
are presented to facilitate the comparisons. All the multipliers achieve SSIM values very close to 1, with the static segmented implementations exhibiting the best results. The PSNR of cSSM strongly increases compared with SSM (up to about +14 dB on average in the case m = 10), whereas the improvement is more modest with the gESSM (up to +4.1 dB with m = 8 and +6 dB with m = 10 on average). Again, the performances of the gESSM
depend on the statistical properties of the input image and on the choice of q.
The dynamic segmented multipliers exhibit large PSNR with TOSAM and DRUM (more than 60 dB), whereas the performances are limited with HSM. Among multipliers with approximate compressors, only Qiqieh L = 2 is able to exceed 60 dB of PSNR, whereas Kulkarni and AHMA show lower performances. Figure 8 reports the results obtained with the segmented multipliers for the Lena image. As shown, the results of gESSM are very close to the exact case (as demonstrated by the high values of SSIM and PSNR), whereas some degradations are registered with DRUM k = 4 and HSM p = 8.
Figure 8. Lena image filtered by means of segmented multipliers.

4.4. Audio Application
As a further example, we investigate the use of the proposed gESSM and the other multipliers for implementing an audio filter. Filtering is a widespread operation in audio processing, able to realize frequency equalization and noise reduction. In this example, we elaborate the signal by considering a linear-phase, low-pass, generalized equiripple, 187th-order, finite impulse response (FIR) filter, with pass-band up to 0.1667 π rad/sample and stop-band from 0.1958 π rad/sample with an attenuation of 85 dB. The module of the impulse response is shown in Figure 9a, with the taps represented as integer numbers expressed on n = 16 bits.

The audio signal used for this trial is p232_016.wav, from the library [34]. We also superimpose an external gaussian noise with a variance of −30 dB and quantize the resulting signal on n = 16 bits. The histogram of occurrences of the input signal, depicted in Figure 9b, highlights a close-to-half-normal distribution.

Figure 9. (a) Module of the impulse response of the low-pass FIR filter and (b) histogram of occurrences of the audio signal.

For the sake of comparison, we show the mean square error (MSE) between the approximate and the exact output for each multiplier. Therefore, the lower the MSE, the better the multiplier accuracy.

Figure 10 shows the performances, with the multiplications revisited as sign-magnitude operations. The results for the gESSM multipliers are highlighted in violet (m = 8) and in red (m = 10). As shown, the accuracy of the gESSM again varies in dependence on q, with the best performance achieved with q = 3, m = 10. In this application, gESSM overcomes cSSM both with m = 8 and m = 10, which offer a worse MSE. In general, the gESSM performs better than the other implementations, with the exception of DRUM k = 8, which features the best accuracy in this case.

Figure 10. MSE between the approximate and the exact results. The MSE for gESSM m = 8 and m = 10 are highlighted in violet and red, respectively.
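The audio experiment can be reproduced in spirit with the sketch below. The 32-tap random filter, the truncating sign-magnitude multiplier, and the signal length are our own stand-ins for the 187th-order filter and the gESSM hardware:

```python
import numpy as np

rng = np.random.default_rng(1)
taps = rng.integers(-2 ** 12, 2 ** 12, 32)              # stand-in integer taps
x = np.minimum(np.abs(rng.normal(0.0, 2048.0, 4096)),   # half-normal input,
               2 ** 16 - 1).astype(np.int64)            # quantized on 16 bits

def fir(x, taps, mul):
    """Direct-form FIR: y[i] = sum_k mul(x[i - k], taps[k])."""
    y = np.zeros(len(x), dtype=np.int64)
    for k, t in enumerate(taps):
        y[k:] += mul(x[:len(x) - k], int(t))
    return y

exact = fir(x, taps, lambda s, t: s * t)
# Sign-magnitude use of a truncating multiplier (stand-in approximation):
approx = fir(x, taps,
             lambda s, t: (1 if t >= 0 else -1) * (((s >> 4) << 4) * abs(t)))
mse = np.mean((exact - approx) ** 2.0)
```

Since the operand magnitudes are processed separately from the signs, any unsigned approximate multiplier can be dropped into `mul` for comparison.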
5. Discussion
As shown in the previous sections, the position of the middle segment AM affects the accuracy of the multiplier, achieving different results depending on the statistical properties of the inputs. Indeed, the accuracy mainly depends (i) on the probability that A assumes values in the ranges [2^m, 2^(m+q)) and [2^(m+q), 2^n − 1], and (ii) on the resolution of AM.

Figure 11 shows the behavior of P(AM) and P(AH) with respect to q for m = 8, in the uniform case and in the half-normal case with σ = 2048 and σ = 16,384. We recall that the analytical expressions of P(AM) and P(AH) are shown in Table 3.

In the uniform case (Figure 11a), P(AH) is very close to 1 for small values of q. Therefore, A is mainly approximated with AH, with negative effects on the multiplier accuracy. Increasing q, P(AM) increases, whereas P(AH) reduces. This improves the accuracy, since the probability of approximating A with AM grows. When q = n − m − 1, P(AM) equals P(AH). As a consequence, the segmentation chooses fairly between AM and AH, minimizing the approximation error. Therefore, increasing q improves the error performances with respect to the SSM multiplier. Nevertheless, cSSM exhibits better error results even against the optimal gESSM, since its correction technique reduces the approximation error when AH is chosen. These trends are almost confirmed in the image processing applications, where the kernels and the input images favor the selection of the most significant segments.
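The selection mechanism discussed above can be sketched behaviorally as follows; the exact bit slices and boundary handling are our own illustrative assumptions, not the paper's circuit:

```python
def select_segment(A, n=16, m=8, q=5):
    """Return (segment value, shift) for operand A: the lower segment AL for
    A < 2**m, the middle segment AM for A in [2**m, 2**(m+q)), and the upper
    segment AH otherwise (illustrative bit slices)."""
    if A < 2 ** m:                        # AL: bits m-1..0 (exact)
        return A, 0
    if A < 2 ** (m + q):                  # AM: bits m+q-1..q
        return (A >> q) & (2 ** m - 1), q
    return A >> (n - m), n - m            # AH: bits n-1..n-m

def gessm_like(A, B, n=16, m=8, q=5):
    """Approximate product: multiply the selected m-bit segments and shift."""
    sa, ka = select_segment(A, n, m, q)
    sb, kb = select_segment(B, n, m, q)
    return (sa * sb) << (ka + kb)
```

For operands below 2^m the product is exact; otherwise the segments are truncations, so the approximate product never overestimates the exact one in this model.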
At the same time, the power consumption strongly reduces with gESSM as well as with cSSM and SSM, whereas lower improvements are registered with the other DSM multipliers. This is mostly due to the employment of leading-one detectors and encoders, used to perform the dynamic segmentation. On the other hand, the power saving of gESSM is slightly weaker than that of SSM and cSSM due to the different selection mechanism.
The hardware performances of Qiqieh, Kulkarni, and AHMA are also interesting, due to the reduced complexity of the PPM compression stage, but at the cost of a significant loss in the quality of results.
When the distribution is half-normal, the overall mean square error presents a minimum for small values of σ. Indeed, with reference to the case σ = 2048 in Figure 11b, P(AM) increases up to q = 5 and is constant for q > 5. On the other hand, for large values of q, the resolution of AM worsens. This leads to q = 5 as the optimum point, since P(AM) is maximized while AM still offers the best possible accuracy. Furthermore, Figure 5 of Section 3.2 also shows that the position of the optimal point depends on the standard deviation of the inputs: the higher σ, the higher qopt. This is explained in Figure 11c for the case σ = 16,384, where P(AM) reaches its peak value only for q = n − m − 1, thus moving the optimal point forward.
With reference to the case σ = 2048, the accuracy of cSSM is very close to that of SSM, since the probability of choosing AH is low and the correction term is practically unused. Conversely, the gESSM is able to improve the performances, with a NoEB of 18.5 bits, also overcoming the other implementations. This scenario is confirmed by the audio processing analysis. In this application, the gESSM performs better than the cSSM, achieving an MSE of about 10^−8. From an electrical point of view, the power reduction offered by gESSM is remarkable when m = 8, and decreases for m = 10. Conversely, SSM and cSSM again exhibit reductions up to 89.2% and 88.1%, but at the cost of limited accuracy performances.
In order to assess the multipliers considering both the error features and the electrical performances, Figure 12 plots the power saving with respect to the NMED and the MRED for uniform and half-normal inputs (with σ = 2048). As shown, the cSSM multipliers are on the Pareto front when the inputs are uniform, offering large power saving with a high quality of results. On the contrary, when the input is half-normal, the proposed gESSM with q = 5, m = 8 and q = 3, m = 10 defines the Pareto front for NMED in the range [9 × 10^−6, 5 × 10^−5] and MRED in the range [2 × 10^−2, 10^−1], offering the best trade-off between power saving and accuracy.
Figure 12. In the case of uniform distribution: (a) NMED vs. Power saving and (b) MRED vs. Power saving. In the case of half-normal distribution (σ = 2048): (c) NMED vs. Power saving and (d) MRED vs. Power saving.
6. Conclusions
In this paper, we have analyzed the performances of the ESSM multiplier as a function of the position of the middle segment and of the statistical properties of the input signals. While the standard implementation of the ESSM places the middle segment at the center of the input, we have moved the middle segment from the LSB to the MSB in order to find the configuration best able to minimize the mean square approximation error. To this aim, two design parameters were exploited: m, defining the accuracy and the size of the multiplier, and q, defining the position of the middle segment for further error tuning. We have described the hardware implementation of the proposed gESSM, and we have analytically demonstrated the possibility of choosing q to minimize the overall approximation error in a mean square sense.

The error metrics reveal a strong dependence on q and on the statistical properties of the input signals. When the inputs are uniform, the best accuracy is achieved when q reaches its maximum value, whereas minimum points arise in the half-normal case (with σ = 2048). The gESSM is not able to overcome cSSM with uniform distribution, but exhibits the best results with half-normal inputs (achieving a NoEB of 18.5 bits). These trends are also confirmed in image and audio applications, giving the best results in audio filtering. The electrical performances also exhibit satisfactory results, with power reductions up to 78% and 83% in the uniform and half-normal cases, respectively.
From the comparison of the error metrics and the power saving in Figure 12, the gESSM emerges as the best choice when the input signal is non-uniform, offering the best trade-off between power and accuracy.
Author Contributions: Conceptualization, G.D.M., G.S. and A.G.M.S.; methodology, G.D.M. and G.S.;
software, G.D.M. and G.S.; validation, G.D.M., G.S. and A.G.M.S.; formal analysis, G.D.M., G.S. and
A.G.M.S.; investigation, G.D.M., G.S. and A.G.M.S.; data curation, G.D.M. and G.S.; writing—original
draft preparation, G.D.M., G.S., A.G.M.S. and D.D.C.; writing—review and editing, A.G.M.S. and D.D.C.;
visualization, G.D.M. and G.S.; supervision, A.G.M.S. and D.D.C.; project administration, A.G.M.S. and
D.D.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The Verilog code is available on GitHub at https://github.com/
GenDiMeo/gESSM, accessed on 16 February 2020.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Let us consider the normal random variable A0 with zero mean and standard deviation
σ. The half-normal random variable A is obtained by computing the absolute value of A0 ,
i.e., A = |A0 |.
In order to compute P(AM) and P(AH), let us consider the probability of having A in the range [0, a]:

$$P(0 \le A \le a) = \int_{0}^{a} f(A)\,dA = \mathrm{erf}\!\left(\frac{a}{\sigma\sqrt{2}}\right) \quad (A1)$$
where f (A) is the pdf of A (see (12)), and erf (·) is the error function.
Therefore, observing that P(AM) = P(0 ≤ A ≤ 2^(m+q)) − P(0 ≤ A ≤ 2^m) and P(AH) = P(0 ≤ A ≤ 2^n − 1) − P(0 ≤ A ≤ 2^(m+q)), we obtain the results shown in Table 3.
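These probabilities can be evaluated numerically straight from (A1); the parameter values below match the σ = 2048 case discussed in Section 5:

```python
from math import erf, sqrt

def prob_range(lo, hi, sigma):
    """P(lo <= A <= hi) for half-normal A = |A0|, A0 ~ N(0, sigma**2),
    obtained from (A1) as a difference of error functions."""
    F = lambda a: erf(a / (sigma * sqrt(2.0)))
    return F(hi) - F(lo)

n, m, q, sigma = 16, 8, 5, 2048.0
P_AM = prob_range(2 ** m, 2 ** (m + q), sigma)      # A in [2^m, 2^(m+q))
P_AH = prob_range(2 ** (m + q), 2 ** n - 1, sigma)  # A in [2^(m+q), 2^n - 1]
```

With σ = 2048, P_AH comes out very small, consistent with the observation that the upper segment AH is rarely selected in this scenario.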
Appendix B
In order to compute (23), let us concentrate on the first summation in (22), writing the following equality:

$$E\left[\left(\sum_{k=0}^{q-1} a_k 2^k\right)^{2}\right] = E\left[\sum_{k=0}^{q-1}\left(a_k 2^k\right)^{2} + 2\sum_{k=0}^{q-2} a_k 2^k \sum_{j=k+1}^{q-1} a_j 2^j\right] \quad (A2)$$

Exploiting the linearity of the expectation operator and the independence between the bits (for uniform bits, $E[a_k^2] = 1/2$ and $E[a_k a_j] = 1/4$ for $j \neq k$), we obtain

$$E\left[\sum_{k=0}^{q-1}\left(a_k 2^k\right)^{2}\right] = \frac{1}{2}\sum_{k=0}^{q-1} 2^{2k}, \qquad E\left[2\sum_{k=0}^{q-2} a_k 2^k \sum_{j=k+1}^{q-1} a_j 2^j\right] = \frac{1}{2}\sum_{k=0}^{q-2} 2^k \sum_{j=k+1}^{q-1} 2^j \quad (A3)$$

Recalling that

$$\sum_{k=0}^{q-1} r^k = \frac{1-r^q}{1-r}, \qquad \sum_{j=k+1}^{q-1} r^j = \sum_{j=0}^{q-1} r^j - \sum_{j=0}^{k} r^j \quad (A4)$$

with r a natural number, we have the following expressions after simple algebra:

$$E\left[\sum_{k=0}^{q-1}\left(a_k 2^k\right)^{2}\right] = \frac{1}{6}\left(4^q - 1\right), \qquad E\left[2\sum_{k=0}^{q-2} a_k 2^k \sum_{j=k+1}^{q-1} a_j 2^j\right] = 2^{q-1}\left(2^{q-1}-1\right) - \frac{1}{3}\left(4^{q-1}-1\right) \quad (A5)$$

Applying the same reasoning to the second summation, we obtain (23).
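The closed forms in (A5) can be cross-checked by enumerating all 2^q patterns of uniform, independent bits (a sanity check of the algebra above):

```python
from itertools import product

def moments(q):
    """Exact averages of sum_k (a_k 2^k)^2 and of the doubled cross terms
    over all 2**q equally likely bit patterns (a_0, ..., a_{q-1})."""
    sq = cross = 0
    for bits in product((0, 1), repeat=q):
        s = [a << k for k, a in enumerate(bits)]
        sq += sum(v * v for v in s)
        cross += 2 * sum(s[k] * s[j]
                         for k in range(q) for j in range(k + 1, q))
    return sq / 2 ** q, cross / 2 ** q
```

The enumeration matches the closed forms for every q small enough to enumerate.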
References
1. Spagnolo, F.; Perri, S.; Corsonello, P. Approximate Down-Sampling Strategy for Power-Constrained Intelligent Systems. IEEE Access
2022, 10, 7073–7081. [CrossRef]
2. Vaverka, F.; Mrazek, V.; Vasicek, Z.; Sekanina, L. TFApprox: Towards a Fast Emulation of DNN Approximate Hardware
Accelerators on GPU. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE),
Grenoble, France, 9–13 March 2020; pp. 294–297. [CrossRef]
3. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural
Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision
and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713. [CrossRef]
4. Montanari, D.; Castellano, G.; Kargaran, E.; Pini, G.; Tijani, S.; De Caro, D.; Strollo, A.G.M.; Manstretta, D.; Castello, R. An FDD
Wireless Diversity Receiver With Transmitter Leakage Cancellation in Transmit and Receive Bands. IEEE J. Solid State Circuits
2018, 53, 1945–1959. [CrossRef]
5. Kiayani, A.; Waheed, M.Z.; Antilla, L.; Abdelaziz, M.; Korpi, D.; Syrjala, V.; Kosunen, M.; Stadius, K.; Ryynamen, J.; Valkama, M.
Adaptive Nonlinear RF Cancellation for Improved Isolation in Simultaneous Transmit–Receive Systems. IEEE Trans. Microw.
Theory Tech. 2018, 66, 2299–2312. [CrossRef]
6. Zhang, T.; Su, C.; Najafi, A.; Rudell, J.C. Wideband Dual-Injection Path Self-Interference Cancellation Architecture for Full-Duplex
Transceivers. IEEE J. Solid State Circuits 2018, 53, 1563–1576. [CrossRef]
7. Di Meo, G.; De Caro, D.; Saggese, G.; Napoli, E.; Petra, N.; Strollo, A.G.M. A Novel Module-Sign Low-Power Implementation for
the DLMS Adaptive Filter With Low Steady-State Error. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 297–308. [CrossRef]
8. Meher, P.K.; Park, S.Y. Critical-Path Analysis and Low-Complexity Implementation of the LMS Adaptive Algorithm. IEEE Trans.
Circuits Syst. I Regul. Pap. 2014, 61, 778–788. [CrossRef]
9. Jiang, H.; Liu, L.; Jonker, P.P.; Elliott, D.G.; Lombardi, F.; Han, J. A High-Performance and Energy-Efficient FIR Adaptive Filter
Using Approximate Distributed Arithmetic Circuits. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 313–326. [CrossRef]
10. Esposito, D.; Di Meo, G.; De Caro, D.; Strollo, A.G.M.; Napoli, E. Quality-Scalable Approximate LMS Filter. In Proceedings of the
2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Bordeaux, France, 9–12 December 2018;
pp. 849–852. [CrossRef]
11. Di Meo, G.; De Caro, D.; Petra, N.; Strollo, A.G.M. A Novel Low-Power High-Precision Implementation for Sign–Magnitude
DLMS Adaptive Filters. Electronics 2022, 11, 1007. [CrossRef]
12. Bruschi, V.; Nobili, S.; Terenzi, A.; Cecchi, S. A Low-Complexity Linear-Phase Graphic Audio Equalizer Based on IFIR Filters.
IEEE Signal Process. Lett. 2021, 28, 429–433. [CrossRef]
13. Kulkarni, P.; Gupta, P.; Ercegovac, M. Trading Accuracy for Power with an Underdesigned Multiplier Architecture. In Proceedings
of the 2011 24th International Conference on VLSI Design, Chennai, India, 2–7 January 2011; pp. 346–351. [CrossRef]
14. Zervakis, G.; Tsoumanis, K.; Xydis, S.; Soudris, D.; Pekmestzi, K. Design-Efficient Approximate Multiplication Circuits Through
Partial Product Perforation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2016, 24, 3105–3117. [CrossRef]
15. Zacharelos, E.; Nunziata, I.; Saggese, G.; Strollo, A.G.M.; Napoli, E. Approximate Recursive Multipliers Using Low Power
Building Blocks. IEEE Trans. Emerg. Top. Comput. 2022, 10, 1315–1330. [CrossRef]
16. Qiqieh, I.; Shafik, R.; Tarawneh, G.; Sokolov, D.; Yakovlev, A. Energy-efficient approximate multiplier design using bit significance-
driven logic compression. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne,
Switzerland, 27–31 March 2017; pp. 7–12. [CrossRef]
17. Esposito, D.; Strollo, A.G.M.; Alioto, M. Low-power approximate MAC unit. In Proceedings of the 2017 13th Conference on Ph.D.
Research in Microelectronics and Electronics (PRIME), Giardini Naxos-Taormina, Italy, 12–15 June 2017; pp. 81–84. [CrossRef]
18. Fritz, C.; Fam, A.T. Fast Binary Counters Based on Symmetric Stacking. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25,
2971–2975. [CrossRef]
19. Ahmadinejad, M.; Moaiyeri, M.H.; Sabetzadeh, F. Energy and area efficient imprecise compressors for approximate multiplication
at nanoscale. Int. J. Electron. Commun. 2019, 110, 152859. [CrossRef]
20. Yang, Z.; Han, J.; Lombardi, F. Approximate compressors for error-resilient multiplier design. In Proceedings of the 2015 IEEE
International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), Amherst, MA, USA,
12–14 October 2015; pp. 183–186. [CrossRef]
21. Ha, M.; Lee, S. Multipliers With Approximate 4–2 Compressors and Error Recovery Modules. IEEE Embed. Syst. Lett. 2018,
10, 6–9. [CrossRef]
22. Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Meo, G.D. Comparison and Extension of Approximate 4-2 Compressors for
Low-Power Approximate Multipliers. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 3021–3034. [CrossRef]
23. Park, G.; Kung, J.; Lee, Y. Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator.
IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 2950–2961. [CrossRef]
24. Kong, T.; Li, S. Design and Analysis of Approximate 4–2 Compressors for High-Accuracy Multipliers. IEEE Trans. Very Large Scale
Integr. (VLSI) Syst. 2021, 29, 1771–1781. [CrossRef]
25. Jou, J.M.; Kuang, S.R.; Chen, R.D. Design of low-error fixed-width multipliers for DSP applications. IEEE Trans. Circuits Syst. II
Analog. Digit. Signal Process. 1999, 46, 836–842. [CrossRef]
26. Petra, N.; De Caro, D.; Garofalo, V.; Napoli, E.; Strollo, A.G.M. Design of Fixed-Width Multipliers With Linear Compensation
Function. IEEE Trans. Circuits Syst. I Regul. Pap. 2011, 58, 947–960. [CrossRef]
27. Hashemi, S.; Bahar, R.I.; Reda, S. DRUM: A Dynamic Range Unbiased Multiplier for approximate applications. In Proceedings
of the 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Austin, TX, USA, 2–6 November 2015;
pp. 418–425. [CrossRef]
28. Vahdat, S.; Kamal, M.; Afzali-Kusha, A.; Pedram, M. TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable
Approximate Multiplier. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 1161–1173. [CrossRef]
29. Narayanamoorthy, S.; Moghaddam, H.A.; Liu, Z.; Park, T.; Kim, N.S. Energy-Efficient Approximate Multiplication for Digital
Signal Processing and Classification Applications. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2015, 23, 1180–1184. [CrossRef]
30. Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Saggese, G.; Di Meo, G. Approximate Multipliers Using Static Segmentation:
Error Analysis and Improvements. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 2449–2462. [CrossRef]
31. Li, L.; Hammad, I.; El-Sankary, K. Dual segmentation approximate multiplier. Electron. Lett. 2021, 57, 718–720. [CrossRef]
32. GitHub. Available online: https://github.com/scale-lab/DRUM (accessed on 18 April 2020).
33. GitHub. Available online: https://github.com/astrollo/SSM (accessed on 16 February 2020).
34. DataShare. Available online: https://datashare.ed.ac.uk/handle/10283/2791 (accessed on 21 August 2017).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.