
Logarithm-Approximate Floating-Point Multiplier

for Hardware-efficient Inference in Probabilistic Circuits

Lingyun Yao1 Martin Trapp2 Karthekeyan Periasamy1 Jelin Leslin1 Gaurav Singh1 Martin Andraud1

1 Electrical Engineering Dept., Aalto University, Espoo, Finland
2 Computer Science Dept., Aalto University, Espoo, Finland

Abstract

Machine learning models are increasingly being deployed onto edge devices, for example, for smart sensing, reinforcing the need for reliable and efficient modeling families that can perform a variety of tasks in an uncertain world (e.g., classification, outlier detection) without re-deploying the model. Probabilistic circuits (PCs) offer a promising avenue for such scenarios as they support efficient and exact computation of various probabilistic inference tasks by design, in addition to having a sparse structure. A critical challenge towards hardware acceleration of PCs on edge devices is the high computational cost associated with multiplications in the model. In this work, we propose the first approximate computing framework for energy-efficient PC computation. For this, we leverage addition-as-int approximate multipliers, which are significantly more energy-efficient than regular floating-point multipliers, while preserving computation accuracy. We analyze the expected approximation error and show through hardware simulation results that our approach leads to a significant reduction in energy consumption with low approximation error and provides a remedy for hardware acceleration of general-purpose probabilistic models.

1 INTRODUCTION

The development of smart sensing and Internet-of-Things applications based on embedded artificial intelligence (AI), such as smartphones, wearables, or other sensor networks, is pushing the computation of machine learning methods directly onto edge devices. Recent innovations (e.g., [12, 26, 17]) have pushed up the computational efficiency of deep feedforward neural networks (NNs) and improved the energy efficiency of dedicated AI processors by 10x to 100x compared to Graphical Processing Units [17]. However, NNs that have been adopted into real-world use often raise concerns related to their reliability, fairness, and interpretability [9, 7], alongside their high inference costs [27, 23].

Consequently, to be suitable for the challenges associated with edge AI, there is an urgent need to develop effective hardware acceleration of machine learning models that are probabilistic, i.e., they enable reasoning in an uncertain world [6], and tractable, i.e., they can reliably answer many probabilistic queries without re-deployment. Recent work on tractable probabilistic models, specifically on probabilistic circuits (PCs) [2], poses a promising avenue as these models (i) exhibit high expressive efficiency (representational power), (ii) enable reliable [25, 13] and fair [1] reasoning, and (iii) allow many probabilistic queries to be computed tractably by design. Yet, while pioneering works have explored acceleration of PCs on Field-Programmable Gate Arrays (FPGAs) [3, 21, 22] and Application-Specific Integrated Circuits (ASICs) [18, 19], the hardware acceleration of PCs poses many open challenges. In particular, their irregularity (i.e., PCs are sparsely connected, making parallelism more challenging [20]) and their high computation resolution (i.e., probabilistic inference with PCs typically requires 30 to 40 floating-point bits [22, 20], as arithmetic is performed on probabilities) hinder their deployment on edge devices, where efficiency and reduced resolution are key due to the limited energy resources.

In this work, we propose to approximate floating-point multipliers through Addition-as-Int [10], suggesting high potential gains in computational efficiency (Addition-as-Int can reduce the hardware cost of multiplication by a factor of up to 112x) with little impact on the accuracy of the computations. In addition, we carry out a theoretical analysis of the expected error and show that our approach can yield accurate computations for maximum-a-posteriori (MAP) and marginal queries while offering a concise way to trade off accuracy and computational efficiency.

Accepted for the 6th Workshop on Tractable Probabilistic Modeling at UAI (TPM 2023).
Figure 1: Illustration of a PC (a) over discrete RVs (X1, X2, X3) and the corresponding hardware representation of MAP inference (b). For this, sum nodes are replaced by max operators, and an additional propagation path for information bits is added to back-track the most probable path (MAP result).
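To make the caption's max-and-backtrack scheme concrete ahead of the formal definitions in Section 2, the following minimal Python sketch evaluates a toy circuit bottom-up and swaps sums for max operators; the structure, weights, and helper names are illustrative placeholders of ours, not the exact PC of Fig. 1(a) or code from this paper.

```python
# Toy sketch of PC evaluation: sums for marginal (MAR) queries, max for MAP.
# Structure and weights are made up for illustration.

def leaf(var, value):
    # Indicator leaf over a single discrete variable.
    return ("leaf", var, value)

def product(*children):
    return ("prod", children)

def weighted_sum(weights, children):
    return ("sum", weights, children)

def evaluate(node, evidence, use_max=False):
    """Bottom-up pass; use_max=True replaces every sum node by a max operator."""
    kind = node[0]
    if kind == "leaf":
        _, var, value = node
        # Unobserved variables are marginalized out (indicator evaluates to 1).
        return 1.0 if evidence.get(var, value) == value else 0.0
    if kind == "prod":
        out = 1.0
        for child in node[1]:
            out *= evaluate(child, evidence, use_max)
        return out
    _, weights, children = node
    scores = [w * evaluate(c, evidence, use_max) for w, c in zip(weights, children)]
    return max(scores) if use_max else sum(scores)

# A tiny smooth and decomposable circuit over X2 and X3.
pc = weighted_sum(
    [0.6, 0.4],
    [product(leaf("X2", 0), leaf("X3", 0)),
     product(leaf("X2", 0), leaf("X3", 1))],
)

print(evaluate(pc, {"X2": 0}))                # MAR: X3 summed out -> 1.0
print(evaluate(pc, {"X2": 0}, use_max=True))  # MAP: best X3 completion -> 0.6
```

The additional information-bit path in Fig. 1(b) corresponds to recording, at every max operator, which child attained the maximum, so that the most probable assignment can be read back along the winning path.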

2 BACKGROUND: PROBABILISTIC CIRCUITS

Probabilistic circuits (PCs) have recently been introduced as an umbrella term to unify a variety of existing tractable probabilistic models (e.g., [4, 14, 15, 8]). They represent the (possibly unnormalized) distribution function (density or mass function) of a multivariate probability distribution over random variables (RVs) X = {X_i}_{i=1}^d through a directed acyclic graph G. The computational graph G constitutes weighted sums S(x) = Σ_{C∈ch(S)} w_{S,C} C(x) with Σ_{C∈ch(S)} w_{S,C} = 1, products P(x) = Π_{C∈ch(P)} C(x), and leaf nodes associated with parametric functions, typically assumed to be density/mass functions of univariate probability distributions L(x) = p(x | θ_L). We use ch(N) to denote the set of children of a node N, and θ denotes the parameters of the parametric leaves. In addition, each node N ∈ G is associated with a scope ψ(N) ⊆ X provided by a scope function ψ: N → P(X) [24], where P(X) denotes the power set of X, specifying the set of RVs the node represents a joint distribution over. Fig. 1(a) illustrates a PC over three discrete RVs using indicator functions at the leaves, where we use ⊕ to illustrate sum nodes and ⊗ for product nodes. Fig. 1(b) illustrates our proposed hardware realization of MAP inference for a PC. A particularly relevant class of PCs are those that are smooth and decomposable, as both properties are requirements for many probabilistic queries to be computable exactly and in time linear in the number of nodes of G. Henceforth, we briefly review smoothness and decomposability.

Definition 2.1 (Smoothness & Decomposability). A sum node S is smooth if all children have the same scope, i.e., ψ(C) = ψ(C′), ∀C, C′ ∈ ch(S). Further, a product node P is decomposable if all children have pairwise disjoint scopes, i.e., ψ(C) ∩ ψ(C′) = ∅, ∀C, C′ ∈ ch(P). A PC is smooth if all sum nodes are smooth and decomposable if all product nodes are decomposable.

Definition 2.2 (Determinism). A sum node S is deterministic if for every complete evidence x at most one child has a positive value. Consequently, a PC is deterministic if all sum nodes are deterministic.

We refer the reader to [2] for further details on the structural properties of PCs.

3 APPROXIMATE COMPUTING FOR PROBABILISTIC CIRCUITS

Assuming positive numbers in floating-point representation, two operands x and y can be written as x = 2^{E_x}(1 + M_x) and y = 2^{E_y}(1 + M_y). Note that we can omit the sign bit and only have to consider their exponent (E) and mantissa (M) values. Therefore, the exact product x × y is given as:

x × y = 2^{E_x + E_y} (1 + M_x)(1 + M_y)    (1)

This product can be conveniently expressed in log-space, i.e.,

log2(x × y) = E_x + E_y + log2(1 + M_x) + log2(1 + M_y).    (2)

A popular approximate solution is based on Mitchell's method [10]. To approximate the logarithm, Mitchell's method uses log2(1 + F) ≈ F, which is the first-order Taylor series expansion of log2(1 + F). Using this approximation, Eq. (2) becomes:

log2(x × y) ≈ E_x + E_y + M_x + M_y.    (3)

Previous work pointed out that adding two IEEE 754 floating-point numbers with an integer addition instruction results in Mitchell's approximate multiplication, and called this operation Addition-As-Int (AAI) [11]. By doing so, we can directly obtain an approximation of Eq. (2) in the form of Eq. (3). Denoting the approximate multiplication by ×̃, we obtain:

x ×̃ y = FLOAT(INT(x) + INT(y)),    (4)

where INT(·) interprets the binary string of the IEEE 754 floating-point representation as an integer string and FLOAT(·) interprets the resulting integer string back as an IEEE 754 floating-point representation. Therefore, performing AAI in hardware only requires integer addition operators.
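As a software illustration of Eq. (4), the sketch below (helper names are ours, not from the paper) reinterprets IEEE 754 float32 bit patterns as integers and adds them. With the standard biased exponent encoding, the integer addition adds the exponent bias twice, so the sketch subtracts the bit pattern of 1.0f once to compensate; the compact form of Eq. (4) does not show this constant explicitly, and Mogami [11] folds a comparable constant into the addition.

```python
import struct

def float_to_bits(x: float) -> int:
    """IEEE 754 single-precision bit pattern of x, as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

ONE_BITS = 0x3F800000  # bit pattern of 1.0f, i.e. the exponent bias in place

def aai_multiply(x: float, y: float) -> float:
    """Addition-as-Int (Mitchell) approximate product of two positive floats."""
    return bits_to_float(float_to_bits(x) + float_to_bits(y) - ONE_BITS)

if __name__ == "__main__":
    for x, y in [(0.3, 0.7), (0.125, 0.5), (0.9, 0.9)]:
        exact, approx = x * y, aai_multiply(x, y)
        print(f"{x} * {y}: exact={exact:.6f}  AAI={approx:.6f}  "
              f"rel. err={abs(approx - exact) / exact:.2%}")
```

In hardware, only the integer adder remains, which is why the cost of AAI grows only linearly with the number of mantissa bits (cf. Fig. 2).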
Figure 2: Power cost of exact and AAI multipliers on 65nm CMOS using 8 exponent bits and a varying number of mantissa bits.

4 EXPERIMENTS

We evaluated our approach on four benchmark data sets: NLTCS, Jester, DNA, and Book, which are a subset of frequently used data sets in the community (e.g., [16, 5, 24]). We generated PC structures and parameters using LearnSPN [5], a popular method for structure learning, resulting in smooth and decomposable PCs. All evaluations are performed on the test set.

4.1 POWER CONSUMPTION COMPARISON BETWEEN EXACT MULTIPLICATIONS AND AAI

Floating-point and AAI multipliers have been designed and simulated for various resolutions in a 65nm CMOS technology, and models have been fitted to the simulation results. Fig. 2 shows the resulting model for 8 exponent bits and a varying number of mantissa bits. We see that the hardware cost of exact multiplication is dominated by mantissa processing, and the hardware complexity grows significantly with the number of mantissa bits. As AAI uses much simpler addition hardware, its complexity and power grow only linearly with the number of bits.

4.2 ENERGY SAVING WITH DIFFERENT NUMBERS OF BITS

We replaced all multipliers with AAI to assess the error and the power savings for MAR and MAP queries under varying resolutions. For MAR queries, we computed the squared error with respect to a software baseline (64 bits), i.e., Σ_x (p(x) − q(x))², where q(·) denotes the model with lower-resolution multipliers and p(·) the PC in software. In addition, we calculated the maximum and minimum obtainable errors. For MAP queries, we calculated the MAP inference accuracy over the latent variables (assuming complete evidence) with respect to the baseline. We collected the optimized bit settings in Table 1, where Nb represents 32 bits, and Nbe and Nba are the numbers of bits yielding the smallest error for the exact and the approximate multiplier, respectively.

MAR queries. With AAI, the error varies across benchmarks but generally requires more exponent bits E, cf. Fig. 3. In practice, exact multipliers produce a small error at the tested resolutions, as seen in Table 1. Indeed, E determines the minimum representable value, and M determines the quantization within every exponent range, which only depends on the representation error. Going from a 32-bit resolution to Nbe enables saving around 2x power. We find that using AAI can allow for 24x to 40x extra savings if the tolerated error is a few percent. The total power savings from 32-bit to the optimal AAI configuration are between 56x and 88x, cf. Table 1.

MAP queries. We find that the resolution of MAP computation can be drastically reduced while introducing no error, since MAP stays correct as long as the argmax at sum nodes stays the same. Further, AAI multipliers can achieve higher accuracy with fewer bits, cf. Fig. 4. In contrast to exact floating-point multiplication, where mantissa values are normalized (see Appendix A) and successive multiplications result in smaller mantissa values, AAI handles normalization by using a carry, hence requiring fewer bits. Most power savings are obtained from Nb to Nbe, i.e., 18.6x. Switching to AAI increases savings by up to another 11x. Total power savings can reach 206x, cf. Table 1.

5 CONCLUSION AND DISCUSSION

We introduced approximate computing in PCs to increase their energy efficiency for deployment on edge devices and provided a theoretical and empirical analysis of the introduced error. Specifically, we investigated the energy efficiency and approximation error of Addition-as-Int multipliers in PCs for different benchmarks and query types (marginals and MAP). Our results show that maximum power savings of 88x and 206x can be achieved for MAR and MAP queries, respectively.

Table 1: Overview of optimal configurations and performance over several data sets. Nbe and Nba correspond to the settings with the smallest error, and the loss is the error relative to the maximum error.

Data set | Query | Power (µW) @Nb=32 | Exact ⊗ Nbe (E,M) | Power (µW) @Nbe | AAI ⊗ Nba (E,M) | Power (µW) @Nba | Loss Exact (%) | Loss AAI (%)
NLTCS    | MAP   | 85482   | 5,3   | 4594   | 5,1   | 414   | 0    | 0
NLTCS    | MAR   | 85482   | 8,15  | 36699  | 8,7   | 1035  | 3e-7 | 0.8
Jester   | MAP   | 660408  | 5,3   | 35492  | 5,1   | 3199  | 0    | 0
Jester   | MAR   | 660408  | 8,15  | 283530 | 11,11 | 11731 | 4e-7 | 5.9
DNA      | MAP   | 674902  | 5,3   | 36271  | 5,1   | 3269  | 0    | 0
DNA      | MAR   | 674902  | 11,15 | 306942 | 11,3  | 7629  | 3e-6 | 3.3
Book     | MAP   | 1272053 | 5,3   | 68364  | 5,1   | 6162  | 0    | 0
Book     | MAR   | 1272053 | 8,15  | 546124 | 11,7  | 18488 | 7e-6 | 0.4
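For reference, the MAR error reported in Table 1 and Fig. 3 is the squared deviation from the 64-bit software baseline defined in Section 4.2. A minimal sketch of that metric is given below; the two evaluator callables are placeholders of ours, not the paper's implementation.

```python
def mar_squared_error(test_set, p_baseline, q_lowres):
    """Sum of (p(x) - q(x))^2 over the test set, following Section 4.2."""
    return sum((p_baseline(x) - q_lowres(x)) ** 2 for x in test_set)

# Dummy stand-ins: p is the 64-bit software PC, q the same PC evaluated with
# AAI / reduced-resolution multipliers (both would be real evaluators in practice).
p = lambda x: 0.25
q = lambda x: 0.25 * 1.01
print(mar_squared_error([(0, 0), (0, 1), (1, 0), (1, 1)], p, q))
```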

Figure 3: Approximation error for AAI (first row) and exact (second row) multipliers on NLTCS, Jester, DNA, and Book, using a varying number of exponent bits (E = 8, E = 11) and mantissa bits. The maximum possible error is shown for reference.

Figure 4: MAP accuracy (ACC) results for AAI (first row) and exact (second row) multipliers on NLTCS, Jester, DNA, and Book, using a varying number of exponent and mantissa bits (m = 1, m = 3, m = 5).

Acknowledgements

MT acknowledges funding from the Academy of Finland (grant number 347279). MA acknowledges partial funding from the Academy of Finland through the project WHISTLE (grant number 332218). This work has also been partially funded by the European Union through the SUSTAIN project. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or EISMEA. Neither the European Union nor the granting authority can be held responsible for them.

References

[1] YooJung Choi. Probabilistic Reasoning for Fair and Robust Decision Making. PhD thesis, 2022.

[2] YooJung Choi, Antonio Vergari, and Guy Van den Broeck. Probabilistic circuits: A unifying framework for tractable probabilistic models. October 2020.

[3] Young-kyu Choi, Carlos Santillana, Yujia Shen, Adnan Darwiche, and Jason Cong. FPGA acceleration of probabilistic sentential decision diagrams with high-level synthesis. ACM Trans. Reconfigurable Technol. Syst., September 2022. ISSN 1936-7406. doi: 10.1145/3561514.

[4] Adnan Darwiche. A differential approach to inference in Bayesian networks. J. ACM, 50(3):280–305, 2003. doi: 10.1145/765568.765570.

[5] Robert Gens and Pedro Domingos. Learning the structure of sum-product networks. In International Conference on Machine Learning, pages 873–880. PMLR, 2013.

[6] Zoubin Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, May 2015. ISSN 1476-4687. doi: 10.1038/nature14541.

[7] Ari Heljakka, Martin Trapp, Juho Kannala, and Arno Solin. Disentangling model multiplicity in deep learning. arXiv preprint arXiv:2206.08890, 2023.

[8] Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In Chitta Baral, Giuseppe De Giacomo, and Thomas Eiter, editors, 14th International Conference on Principles of Knowledge Representation and Reasoning (KR). AAAI Press, 2014.

[9] Gary Marcus. The next decade in AI: Four steps towards robust artificial intelligence. arXiv preprint arXiv:2002.06177, 2020.

[10] John N. Mitchell. Computer multiplication and division using binary logarithms. IRE Transactions on Electronic Computers, (4):512–517, 1962.

[11] Tsuguo Mogami. Deep neural network training without multiplications. arXiv preprint arXiv:2012.03458, 2020.

[12] B. Moons and M. Verhelst. Energy-efficiency and accuracy of stochastic computing circuits in emerging technologies. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 4(4):475–486, 2014. ISSN 2156-3357. doi: 10.1109/JETCAS.2014.2361070.

[13] Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Xiaoting Shao, Kristian Kersting, and Zoubin Ghahramani. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In Amir Globerson and Ricardo Silva, editors, 35th Conference on Uncertainty in Artificial Intelligence (UAI), volume 115 of Proceedings of Machine Learning Research, pages 334–344. AUAI Press, 2019.

[14] Hoifung Poon and Pedro M. Domingos. Sum-product networks: A new deep architecture. In Fábio Gagliardi Cozman and Avi Pfeffer, editors, 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 337–346. AUAI Press, 2011.

[15] Tahrima Rahman, Prasanna V. Kothalkar, and Vibhav Gogate. Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In Toon Calders, Floriana Esposito, Eyke Hüllermeier, and Rosa Meo, editors, European Conference on Machine Learning and Knowledge Discovery in Databases (ECML), volume 8725 of Lecture Notes in Computer Science, pages 630–645. Springer, 2014.

[16] Amirmohammad Rooshenas and Daniel Lowd. Learning sum-product networks with direct and indirect variable interactions. In International Conference on Machine Learning, pages 710–718. PMLR, 2014.

[17] Jae-sun Seo, Jyotishman Saikia, Jian Meng, Wangxin He, Han-sok Suh, Anupreetham, Yuan Liao, Ahmed Hasssan, and Injune Yeo. Digital versus analog artificial intelligence accelerators: Advances, trends, and emerging designs. IEEE Solid-State Circuits Magazine, 14(3):65–79, 2022. doi: 10.1109/MSSC.2022.3182935.

[18] N. Shah, L. I. G. Olascoaga, S. Zhao, W. Meert, and M. Verhelst. 9.4 PIU: A 248GOPS/W stream-based processor for irregular probabilistic inference networks using precision-scalable posit arithmetic in 28nm. In 2021 IEEE International Solid-State Circuits Conference (ISSCC), volume 64, pages 150–152, 2021. doi: 10.1109/ISSCC42613.2021.9366061.

[19] N. Shah, W. Meert, and M. Verhelst. DPU-v2: Energy-efficient execution of irregular directed acyclic graphs. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1288–1307, Los Alamitos, CA, USA, October 2022. IEEE Computer Society.

[20] Nimish Shah, Laura I. Galindez Olascoaga, Wannes Meert, and Marian Verhelst. ProbLP: A framework for low-precision probabilistic inference. In Proceedings of the 56th Annual Design Automation Conference 2019, pages 1–6, 2019.

[21] L. Sommer, J. Oppermann, A. Molina, C. Binnig, K. Kersting, and A. Koch. Automatic mapping of the sum-product network inference problem to FPGA-based accelerators. In 2018 IEEE 36th International Conference on Computer Design (ICCD), pages 350–357, 2018. doi: 10.1109/ICCD.2018.00060.

[22] Lukas Sommer, Lukas Weber, Martin Kumm, and Andreas Koch. Comparison of arithmetic number formats for inference in sum-product networks on FPGAs. In 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 75–83. IEEE, 2020.

[23] Emma Strubell, Ananya Ganesh, and Andrew McCallum. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Conference of the Association for Computational Linguistics (ACL), pages 3645–3650. Association for Computational Linguistics, 2019.

[24] Martin Trapp, Robert Peharz, Hong Ge, Franz Pernkopf, and Zoubin Ghahramani. Bayesian learning of sum-product networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, 32nd Conference on Neural Information Processing Systems (NeurIPS), pages 6344–6355, 2019.

[25] Fabrizio Ventola, Steven Braun, Zhongjie Yu, Martin Mundt, and Kristian Kersting. Probabilistic circuits that know what they don't know. arXiv preprint arXiv:2302.06544, 2023.

[26] N. Verma, H. Jia, H. Valavi, Y. Tang, M. Ozatay, L. Chen, B. Zhang, and P. Deaville. In-memory computing: Advances and prospects. IEEE Solid-State Circuits Magazine, 11(3):43–55, Summer 2019. ISSN 1943-0590. doi: 10.1109/MSSC.2019.2922889.

[27] Xiaowei Xu, Yukun Ding, Sharon Xiaobo Hu, Michael Niemier, Jason Cong, Yu Hu, and Yiyu Shi. Scaling for edge inference of deep neural networks. Nature Electronics, 1(4):216–222, 2018.
