
2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS)
DOI: 10.1109/DDECS57882.2023.10139484 | 979-8-3503-3277-3/23/$31.00 ©2023 IEEE

Approximation of Hardware Accelerators driven by Machine-Learning Models
(Embedded Tutorial)
Vojtech Mrazek
Faculty of Information Technology, Brno University of Technology, Brno, Czechia
[email protected]

Abstract—The goal of this tutorial is to introduce functional hardware approximation techniques employing machine-learning methods. Functional approximation slightly changes the function of a circuit in order to reduce its power consumption. Machine-learning models can help to estimate the error and the resulting circuit power consumption. The use of these techniques will be presented at multiple levels: at the level of individual components and at the higher level of HW accelerator synthesis.

Index Terms—Approximate computing, Machine Learning, Estimation, Prediction

I. INTRODUCTION

Approximate computing has been appearing in hardware designs for several years. A significant part of the research has focused on functional approximation, which involves introducing small errors into the computation for the benefit of more energy-efficient or faster processing of the input data. Functional approximation at the circuit level can be divided into two basic tasks [1]: (i) the design of approximate components, in particular adders and multipliers, and (ii) the high-level approximate synthesis of complex hardware accelerators. Manual approaches have often been used for approximate component design. In the manual methodologies, for example, elementary units such as full adders or 2x2 multipliers were replaced by approximate implementations [2], [3], the structure of circuits was modified [4], the longest computational paths of the circuit were cut [5], or other mathematical properties of circuits were exploited [6], [7]. Similarly, the application of these components in accelerators was guided by the expert knowledge of designers.

Automated design systems help designers with approximation at multiple levels. In automated design systems, machine-learning (ML) methods often find applications in both component and accelerator approximation. They can also be useful for supporting synthesis methods in parameter estimation.

Overall, the application of ML algorithms in functional approximation can be divided into the following groups:

1) Automated design of approximate arithmetic circuits is, in fact, a search for a circuit representing a new logic function with respect to accuracy and hardware parameters. In contrast to data-driven regression, the advantage is that the algorithms can use the exact solution as a starting point. Examples of the design methodologies [1] include systematic pruning [8] and netlist rewriting [9] methods using greedy algorithms. More advanced transformations are offered by algorithms based on genetic programming [10], [11]. In addition, other ML methods, such as matrix factorization [12], can be used in logic rewriting.

2) Cross-technology library adoption can be helpful when users want to use a highly optimized library of approximate components (e.g., for a 45 nm ASIC technology, such as [13]) in a different technology such as FPGA. The ASIC parameters are not correlated with the FPGA parameters. Moreover, there are thousands of components in the libraries, which makes an exhaustive search infeasible. Therefore, selecting the Pareto-optimal components w.r.t. error and the target-technology hardware parameters is complicated. However, ML models can help us estimate the hardware parameters in the new technology. The models can be based on traditional ML models [14], [15], convolutional neural networks [16], [17], or graph neural networks [18]. ML-based estimation has been successfully used to transform ASIC libraries to Virtex-7 FPGAs [14].

3) Automated approximation of entire accelerators can be done globally by modifying the syntax tree describing the accelerator design [19]. These methods have scalability issues that need to be addressed. Other approaches assign approximate components from a library in place of exact components [20]–[23]. They try to find a close-to-optimal assignment guided by a multi-objective heuristic algorithm (e.g., NSGA-II [24]). The key part of these algorithms is the evaluation of the overall quality of results (QoR) and the HW parameters of the approximate accelerator. Some approaches use fast simulation of a subset of the test dataset to obtain the QoR [21], [22]. Other methods use a fast ML model to estimate the overall parameters of a candidate approximate accelerator based on the parameters of the particular components [20], [23]. In another study, ML methods were employed to predict how well an approximate multiplier is likely to work in an approximate neural network [25].

This work was supported by the Czech Science Foundation project 21-13001S. The author thanks all his collaborators on the presented works, namely L. Sekanina, Z. Vasicek, M. Shafique, M. A. Hanif, and B. S. Prabakaran.

II. EXAMPLES OF APPLICATIONS OF ML TECHNIQUES

To demonstrate the application of ML in approximate hardware design, three approaches were selected and are described below.
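As a concrete illustration of the component-level approximation and error metrics discussed above, the following minimal sketch (illustrative only; the truncating adder and bit-widths are hypothetical choices, not taken from the tutorial) implements a common functional-approximation pattern and computes its mean error distance (MED). Weighting each input vector by its observed probability, as measured in the target application, turns MED into the WMED metric of [11].

```python
# Illustrative sketch: an approximate adder that truncates the k
# least-significant bits of both operands, and the MED error metric
# that typically steers such approximation searches.

def exact_add(a, b):
    return a + b

def trunc_add(a, b, k=2):
    # Approximate adder: ignore the k low bits of both operands.
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)

def mean_error_distance(k=2, n_bits=4):
    # MED: average absolute error over all input vectors.
    total, count = 0, 0
    for a in range(1 << n_bits):
        for b in range(1 << n_bits):
            total += abs(exact_add(a, b) - trunc_add(a, b, k))
            count += 1
    return total / count

print(mean_error_distance())  # 3.0 for k=2, n_bits=4
```

Exhaustive evaluation like this is feasible only for small bit-widths, which is one reason the fast ML-based estimators discussed above become attractive for larger operators.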

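The cross-technology estimation idea from group 2 can be sketched as a regression from characterized ASIC parameters to an FPGA cost. The sketch below is a minimal illustration with synthetic numbers and a hand-rolled single-feature least-squares fit; real frameworks such as ApproxFPGAs [14] use richer statistical and ML models over many features.

```python
# Illustrative sketch of cross-technology estimation: fit a linear
# model predicting an FPGA cost (LUTs) from an ASIC parameter (area).
# All numbers are synthetic.

def fit_linear(xs, ys):
    # Ordinary least squares for y = w*x + b (single feature).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# Synthetic library: (ASIC area in um^2, measured FPGA LUTs).
asic_area = [120.0, 250.0, 400.0, 610.0]
fpga_luts = [14.0, 29.0, 47.0, 71.0]

w, b = fit_linear(asic_area, fpga_luts)
# Estimate LUTs for an unsynthesized component from ASIC area alone.
print(round(w * 500.0 + b))  # 58
```

Such a model replaces a full FPGA synthesis run per library component with a single prediction, which is what makes Pareto-filtering thousands of candidates tractable.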
A. Automated circuit approximation driven by data

Libraries of approximate circuits are composed of fully characterized digital circuits that can be used as building blocks of energy-efficient implementations of hardware accelerators. They can be employed not only to speed up accelerator development but also to analyze how an accelerator responds to the introduction of various approximate operations. In paper [11], an application-tailored, data-driven, fully automated method for the functional approximation of combinational circuits is proposed. It is demonstrated how an application-level error metric, such as classification accuracy, can be translated to the component-level error metric needed for an efficient and fast search in the space of approximate low-level components used in the application. This is possible by employing a weighted mean error distance (WMED) metric to steer the circuit approximation process, which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) determining the importance of each input vector for the approximation process.

B. ApproxFPGAs: adoption of an ASIC library for FPGAs

Existing approximation techniques have predominantly focused on ASICs and do not achieve similar gains when deployed in FPGA-based accelerator systems, due to the inherent architectural differences between the two. In the ApproxFPGAs work [14], a framework was proposed that leverages statistical or machine-learning models to effectively explore the architecture space of state-of-the-art ASIC-based approximate circuits and tailor them for FPGA-based systems, given a simple RTL description of the target application.

C. AutoAx: automated approximation of accelerators

Because libraries of approximate components contain from tens to thousands of approximate implementations of a single arithmetic operation, it is intractable to find an optimal combination of approximate circuits from a library, even for an application consisting of a few operations. An open problem is how to effectively combine circuits from these libraries to construct complex approximate accelerators. The AutoAx algorithm [23] represents a methodology for searching, selecting, and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design-space generation and exploration, the methodology utilizes machine-learning techniques to create computational models that estimate the overall quality of processing and the hardware cost without performing full synthesis at the accelerator level. Using this methodology, hundreds of approximate accelerators (for a Sobel edge detector) were constructed. The accelerators show different but relevant trade-offs between the quality of processing and the hardware cost, and a corresponding Pareto frontier was identified. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the AutoAx approach identified approximately 10³ highly important implementations out of 10²³ possible solutions in a few hours, whereas an exhaustive search would take four months on a high-end processor.

III. CONCLUSION

As shown, ML models can significantly help in the design of libraries of approximate components and of approximate accelerators. Other approaches in the literature combine different techniques, e.g., creating specialized libraries and then building entire accelerators from them [20], or using partial evaluation to determine the QoR [1]. However, by using a suitable ML model, we can reach good-quality results with lower effort.

REFERENCES

[1] I. Scarabottolo et al., “Approximate logic synthesis: A survey,” Proc. IEEE, vol. 108, no. 12, pp. 2195–2213, 2020.
[2] P. Kulkarni et al., “Trading accuracy for power with an underdesigned multiplier architecture,” in Int. Conf. VLSI Design, 2011, pp. 346–351.
[3] M. Shafique et al., “A low latency generic accuracy configurable adder,” in DAC ’15, 2015, pp. 1–6.
[4] H. R. Mahdiani et al., “Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, 2010.
[5] M. A. Hanif et al., “QuAd: Design and analysis of quality-area optimal low-latency approximate adders,” in DAC ’17, 2017, pp. 1–6.
[6] J. N. Mitchell, “Computer multiplication and division using binary logarithms,” IRE Trans. Electronic Computers, vol. EC-11, no. 4, 1962.
[7] M. S. Ansari et al., “A hardware-efficient logarithmic multiplier with improved accuracy,” in DATE ’19, 2019, pp. 928–931.
[8] D. Shin et al., “A new circuit simplification method for error tolerant applications,” in DATE ’11, 2011, pp. 1–6.
[9] S. Venkataramani et al., “Substitute-and-simplify: A unified design paradigm for approximate and quality configurable circuits,” in DATE ’13, 2013, pp. 1367–1372.
[10] Z. Vasicek et al., “Evolutionary approach to approximate digital circuits design,” IEEE Trans. Evol. Comput., vol. 19, no. 3, 2015.
[11] Z. Vasicek et al., “Automated circuit approximation method driven by data distribution,” in DATE ’19, 2019, pp. 96–101.
[12] S. Hashemi et al., “BLASYS: Approximate logic synthesis using Boolean matrix factorization,” in DAC ’18, 2018, pp. 1–6.
[13] V. Mrazek et al., “EvoApprox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods,” in DATE ’17, 2017, pp. 258–261.
[14] B. S. Prabakaran et al., “ApproxFPGAs: Embracing ASIC-based approximate arithmetic components for FPGA-based systems,” in DAC ’20, 2020.
[15] C. Xu et al., “SNS’s not a synthesizer: A deep-learning-based synthesis predictor,” in ISCA ’22, 2022.
[16] Y. Zhou et al., “PRIMAL: Power inference using machine learning,” in DAC ’19, 2019.
[17] Z. Xie et al., “Fast IR drop estimation with machine learning,” in ICCAD ’20, 2020.
[18] Y. Zhang et al., “GRANNITE: Graph neural network inference for transferable power estimation,” in DAC ’20, 2020.
[19] K. Nepal et al., “Automated high-level generation of low-power approximate computing circuits,” IEEE Trans. Emerging Topics Comput., 2017.
[20] S. Ullah et al., “AppAxO: Designing application-specific approximate operators for FPGA-based embedded systems,” ACM Trans. Embed. Comput. Syst., vol. 21, no. 3, 2022.
[21] V. Mrazek et al., “ALWANN: Automatic layer-wise approximation of deep neural network accelerators without retraining,” in ICCAD ’19, 2019.
[22] S. Barone et al., “Multi-objective application-driven approximate design method,” IEEE Access, vol. 9, pp. 86975–86993, 2021.
[23] V. Mrazek et al., “AutoAx: An automatic design space exploration and circuit building methodology utilizing libraries of approximate components,” in DAC ’19, 2019.
[24] K. Deb et al., “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Trans. Evol. Comput., vol. 6, no. 2, pp. 182–197, Apr. 2002.
[25] M. S. Ansari et al., “Improving the accuracy and hardware efficiency of neural networks using approximate multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2020.

