
Optimization of Stochastic Computing Based Deep Learning Systems with Parallel Finite State Machine Implementation

Jinjie Liu
Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor
[email protected]

ABSTRACT
Deep learning has become an increasingly heated topic as artificial intelligence is on the rise. At the same time, hardware restrictions in real applications have driven the investigation of combining another rising technique, stochastic computing (SC), with deep learning systems to achieve low power costs. So far, successfully implemented operations include addition, multiplication, the inner product, and more complicated nonlinear functions such as the hyperbolic tangent (tanh) realized with linear finite state machines (FSM). The inner product implementation realizes convolution, a core operation of neural networks, therefore encouraging SC-based deep learning neural network implementations. Meanwhile, extremely long bitstreams are needed to achieve satisfying accuracy, especially for large-scale deep learning systems, causing latency issues. The integration of parallelism is thus considered in an attempt to alleviate the latency issue. In this paper, an optimization of stochastic computing based deep learning systems is proposed by introducing parallel FSM implementations to replace the serial ones generally used in previous works. By substituting a serial linear FSM with several parallel linear FSMs of the same size yet operating on shorter bitstreams, the parallel FSM trades hardware for processing latency. The accuracy of a sample parallel FSM unit is evaluated against its serial counterpart, before a case study verifies that the replacement sacrifices little accuracy while substantially reducing computation time in actual deep learning system realizations.

CCS Concepts
• Computing Methodologies ➝ Parallel computing methodologies ➝ Parallel algorithms • Computing Methodologies ➝ Machine learning.

Keywords
Parallel FSM; deep learning; stochastic computing; neural network

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICACS'20, January 6–8, 2020, Rabat, Morocco
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7732-4/20/01…$15.00
DOI: https://doi.org/10.1145/3423390.3426727

1. INTRODUCTION
As artificial intelligence gains more and more ground in the field of science and technology, the growth of the field casts much light on deep learning systems. Deep learning has applications in a broad range, including autonomous vehicles, smart appliances, wearable devices, and other mobile devices. However, a decent fraction of these applications poses strict hardware constraints on the implementation of such deep learning systems. The limitations include, but are not limited to, low power consumption and low area costs, contradicting the intrinsic properties of deep learning neural networks, since deep layered structures profoundly contribute to both better learning performance and higher classification accuracy [1].

To significantly save hardware complexity, especially for large deep learning systems, research interest has shifted towards the introduction of stochastic computing (SC), a well-known computing technique that saves both energy and circuit size [2]. More importantly, it happens to be capable of implementing the core computation units incorporated in neural network systems, and its innate error tolerance further justifies its use.

Extensive, and in many cases successful, experiments and research have been conducted to integrate stochastic computing into deep learning systems [1][3][4]. Nonetheless, a simple integration of stochastic computing and deep learning neural networks does not present a flawless solution. It is already recognized that the long bitstreams used for stochastic circuit processing profoundly undermine the efficiency of computation [1].

Meanwhile, Ma and Lilja have proposed an overhaul of the linear FSM, replacing a single serial linear FSM with a set of parallel linear FSMs of the same size, i.e. each copy has the same number of states as the serial FSM, that operate on shorter bitstreams [5]. This substitution is an inspiration drawn from parallelism, and it serves as a compromise between circuit delay and hardware cost.

In this paper, the influence of applying the parallel FSM to stochastic computing based deep learning systems, and its performance with different bitstream lengths, is examined. This paper poses and explores the following questions: first, how accurate are the parallel FSM units compared to the serially implemented ones; second, what impact does parallel FSM integration exert on actual deep learning system applications.

2. STOCHASTIC COMPUTING AND SC-BASED DEEP LEARNING SYSTEM
2.1 Stochastic Computing
In contrast to conventional computation with binary values, stochastic computing deals with binary bitstreams that represent a value in the range [0, 1] by the probability of bits being 1 in the stream. For instance, a bitstream of 01101010 with probability P(X = 1) = 4/8 = 0.5 represents the value 0.5. This is typically known as unipolar encoding, and the range [0, 1] can be further extended to [-1, 1] to satisfy the need of representing negative values with bipolar encoding [6]. The equivalent bipolar value Y represented by the same bitstream can be found from the represented unipolar value X by Y = 2X − 1. Thus, for the example above, the bitstream 01101010 represents 0.5 in unipolar encoding and 2 × 0.5 − 1 = 0 in bipolar encoding.
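To make the two encodings concrete, here is a minimal Python sketch (not taken from the paper or from the code of [11]; the function names are illustrative) that encodes a value as a stochastic bitstream and decodes it under both interpretations:

```python
import random

def to_unipolar_stream(value, length, seed=0):
    """Encode a value in [0, 1] as a bitstream whose probability of 1 equals the value."""
    rng = random.Random(seed)
    return [1 if rng.random() < value else 0 for _ in range(length)]

def from_unipolar(stream):
    """Decode a unipolar bitstream: the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

def to_bipolar_stream(value, length, seed=0):
    """Encode a value in [-1, 1]; the underlying unipolar value is (value + 1) / 2."""
    return to_unipolar_stream((value + 1) / 2, length, seed)

def from_bipolar(stream):
    """Decode a bipolar bitstream via Y = 2X - 1."""
    return 2 * from_unipolar(stream) - 1

if __name__ == "__main__":
    s = to_unipolar_stream(0.5, 1024)
    print(from_unipolar(s))   # close to 0.5 when read as unipolar
    print(from_bipolar(s))    # close to 0.0 when the same stream is read as bipolar
```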
Multiplication in unipolar and bipolar encoding can be realized by a single AND gate and a single XNOR gate, respectively (Fig. 1) [7].

Figure 1. Multiplication in (a) unipolar and (b) bipolar encoding.

Similarly, addition in both encodings can be realized with a 2-to-1 multiplexer (MUX) whose select line carries a bitstream with probability P(S = 1) = 0.5, as shown in Fig. 2 [2]. However, due to the confined range of both encoding methods, the output is scaled; the scaling can be cancelled out by wire shifting at no additional cost.

Figure 2. Scaled addition.
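A bit-level software sketch of these two elements is shown below, assuming independent input streams of equal length; it is illustrative only, and the helper names are not from the paper:

```python
import random

def rand_stream(p, length, rng):
    """Bitstream with probability p of each bit being 1."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_mul_unipolar(a, b):
    """Unipolar multiplication: bitwise AND, so P(out=1) = P(a=1) * P(b=1)."""
    return [x & y for x, y in zip(a, b)]

def sc_mul_bipolar(a, b):
    """Bipolar multiplication: bitwise XNOR of the two streams."""
    return [1 - (x ^ y) for x, y in zip(a, b)]

def sc_scaled_add(a, b, sel):
    """Scaled addition: 2-to-1 MUX with a select stream of probability 0.5.
    The output encodes (a + b) / 2 in either encoding."""
    return [x if s else y for x, y, s in zip(a, b, sel)]

if __name__ == "__main__":
    rng = random.Random(1)
    L = 4096
    a = rand_stream(0.6, L, rng)               # unipolar 0.6
    b = rand_stream(0.5, L, rng)               # unipolar 0.5
    sel = rand_stream(0.5, L, rng)
    print(sum(sc_mul_unipolar(a, b)) / L)      # roughly 0.30
    print(sum(sc_scaled_add(a, b, sel)) / L)   # roughly 0.55 = (0.6 + 0.5) / 2
```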

More complicated non-linear functions can be realized with linear FSMs of the general form shown in Fig. 3 [7]. For instance, the linear FSM corresponding to the hyperbolic tangent function (tanh) is shown in Fig. 4 [7].

Figure 3. General linear FSM state diagram structure.

Figure 4. Linear FSM for the tanh function state diagram.
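A minimal software model of such a linear tanh FSM (the Stanh element of [7]) is sketched below. It assumes the standard saturating up/down counter with N states whose output is 1 while the state is in the upper half, following the general structure of Fig. 3 and Fig. 4; names are illustrative:

```python
import math
import random

def stanh(bits, n_states=8, init_state=0):
    """Linear FSM approximating tanh: an input bit of 1 moves the state up,
    a 0 moves it down (saturating at both ends); the output bit is 1 while
    the state is in the upper half.  For a bipolar input x this approximates
    tanh((n_states / 2) * x)."""
    state, out = init_state, []
    for b in bits:
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
        out.append(1 if state >= n_states // 2 else 0)
    return out

def bipolar_stream(x, length, rng):
    """Bipolar encoding of x in [-1, 1]: P(bit = 1) = (x + 1) / 2."""
    return [1 if rng.random() < (x + 1) / 2 else 0 for _ in range(length)]

if __name__ == "__main__":
    rng = random.Random(0)
    x = 0.4
    bits = bipolar_stream(x, 1024, rng)
    y = 2 * sum(stanh(bits, 8)) / len(bits) - 1
    print(y, math.tanh(4 * x))   # the two values should be close
```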


Multiple reasons support the use of stochastic computing. First, its unique properties enable energy-efficient computation frameworks: stochastic computing has already proved its strength in hardware complexity with successful applications in image processing [8]. Second, stochastic computing exhibits extraordinary error tolerance [2], a property of great use when it comes to neural network training and classification [1]. Moreover, the majority of the core calculation units in neural networks have counterparts in stochastic computing. For instance, the convolution operation in convolutional neural networks (ConvNN) can be realized with a combination of the stochastic multiplication and addition elements. The activation functions used in ConvNN, for example rectified linear units (ReLU) and the sigmoid function [1], can be replaced by the stochastic computing implementation of the hyperbolic tangent (tanh) discussed above [7].

2.2 SC-Based Deep Learning System
It is only recently that research efforts have turned to stochastic computing based deep learning systems, since both fields started to gain attention within the last decade or so. Yet Ren et al. have already proposed a reconfigurable large-scale SC-based deep learning framework [1]. A reconfigurable neuron diagram is proposed, in an attempt to replace as many arithmetic operation units with stochastic computing elements as possible, as shown in Fig. 5 [1].

Figure 5. Reconfigurable neuron diagram.
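To make the composition concrete, the sketch below chains these stochastic elements into a toy neuron: XNOR multipliers for the products, a multiplexer for the scaled sum, and the tanh FSM as activation. It is a simplified stand-in under these assumptions, not the actual reconfigurable neuron of [1]:

```python
import math
import random

def bipolar_stream(x, length, rng):
    """Bipolar encoding of x in [-1, 1]: P(bit = 1) = (x + 1) / 2."""
    return [1 if rng.random() < (x + 1) / 2 else 0 for _ in range(length)]

def stanh(bits, n_states):
    """Linear tanh FSM (see Section 2.1): saturating up/down counter."""
    state, out = 0, []
    for b in bits:
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
        out.append(1 if state >= n_states // 2 else 0)
    return out

def sc_neuron(xs, ws, length=4096, n_states=8, seed=0):
    """Toy SC neuron: XNOR products, MUX-based scaled sum, tanh FSM activation.
    Returns the decoded bipolar output, roughly tanh((n_states/2) * mean(x_i * w_i))."""
    rng = random.Random(seed)
    x_streams = [bipolar_stream(x, length, rng) for x in xs]
    w_streams = [bipolar_stream(w, length, rng) for w in ws]
    # bipolar multiplication: XNOR of each input stream with its weight stream
    products = [[1 - (a ^ b) for a, b in zip(xs_, ws_)]
                for xs_, ws_ in zip(x_streams, w_streams)]
    # scaled addition: at each clock the MUX picks one product stream at random
    summed = [products[rng.randrange(len(products))][t] for t in range(length)]
    activated = stanh(summed, n_states)
    return 2 * sum(activated) / length - 1

if __name__ == "__main__":
    xs, ws = [0.3, -0.5, 0.8], [0.6, 0.2, -0.4]
    inner = sum(x * w for x, w in zip(xs, ws)) / len(xs)   # scaled inner product
    print(sc_neuron(xs, ws), math.tanh(4 * inner))         # should be comparable
```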

3. PARALLEL VS. SERIAL LINEAR FSM AND INTEGRATION INTO DEEP LEARNING SYSTEM
3.1 Parallel vs. Serial Linear FSM
The parallel implementation of the linear FSM was first proposed by Ma and Lilja [5]. The idea originates from the concern that a single serial linear FSM operating on long bitstreams spends part of the stream just reaching its operating state: for instance, a 16-state FSM accepting bitstreams of length 1024 and initialized to state 0 needs 16 clock cycles to step from state 0 to state 15 for an input value close to 1, which incurs a 16/1024 = 1.56% error. In contrast, in the parallel FSM implementation proposed by Ma and Lilja, 32 parallel copies of the 16-state linear FSM are created for the operation [5]. The 1024-bit stream is equally divided into 32-bit streams, one for each copy of the FSM [5]. As for the initial states, a look-up table (LUT) is created beforehand, storing a series of vectors that indicate how many copies of the FSM should start in each state, and a majority gate acting as the estimator draws the correct vector out of the LUT [5]. Such vectors are called steady-state distribution vectors and are determined by the input value [9]. For instance, with an input value of 0.5, the initial states are evenly distributed, resulting in 2 FSMs starting in each state for the 32 parallel FSMs. With an input of 0, on the other hand, all 32 FSMs start in state 0. In this way, the implementation eschews the initialization issue and outperforms the serial FSM implementation, especially for inputs closer to 1.
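A simplified software model of this scheme is sketched below, under two stated assumptions that differ from [5]: the steady-state distribution vector is computed analytically from the estimated input probability (for this saturating counter it is proportional to (p/(1-p))^i), standing in for the precomputed LUT and majority-gate estimator, and the per-copy outputs are simply concatenated; names are illustrative:

```python
import math
import random

def stanh_from(bits, n_states, init_state):
    """Linear tanh FSM run from a given initial state (see Section 2.1)."""
    state, out = init_state, []
    for b in bits:
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
        out.append(1 if state >= n_states // 2 else 0)
    return out

def steady_state_counts(p, n_states, n_copies):
    """Steady-state distribution vector of the saturating counter:
    pi_i is proportional to (p / (1 - p))**i.  Returns how many of the
    n_copies FSMs should start in each state."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    r = p / (1 - p)
    weights = [r ** i for i in range(n_states)]
    total = sum(weights)
    counts = [round(n_copies * w / total) for w in weights]
    # fix rounding so the counts add up to n_copies
    while sum(counts) < n_copies:
        counts[counts.index(max(counts))] += 1
    while sum(counts) > n_copies:
        counts[counts.index(max(counts))] -= 1
    return counts

def parallel_stanh(bits, n_states=16, n_copies=32):
    """Split the stream among n_copies FSMs initialised from the
    steady-state distribution, then concatenate their outputs."""
    chunk = len(bits) // n_copies
    p_est = sum(bits) / len(bits)            # stands in for the LUT estimator of [5]
    counts = steady_state_counts(p_est, n_states, n_copies)
    init_states = [s for s, c in enumerate(counts) for _ in range(c)]
    out = []
    for k, s0 in enumerate(init_states):
        out += stanh_from(bits[k * chunk:(k + 1) * chunk], n_states, s0)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    x = 0.6                                  # bipolar input value
    bits = [1 if rng.random() < (x + 1) / 2 else 0 for _ in range(1024)]
    y = 2 * sum(parallel_stanh(bits)) / 1024 - 1
    print(y, math.tanh(8 * x))               # a 16-state FSM approximates tanh(8x)
```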
To illustrate the effect, a simulation of both the serial and the parallel implementation is conducted on the stochastic tanh function in Fig. 6 and Fig. 7, and the mean squared errors (MSE) are calculated for an 8-state and a 16-state FSM example in Table 1. For statistical significance, each implementation is run 10 times for each input value.

Figure 6. Tanh(4, x) of parallel vs. serial linear FSM (8-state).

Figure 7. Tanh(8, x) of parallel vs. serial linear FSM (16-state).

Table 1. MSE of parallel vs. serial linear FSM simulation

MSE        Serial    Parallel
8-state    0.003     0.004
16-state   0.002     0.003

It can be concluded that the serial FSM implementation is slightly more accurate overall (Table 1), while the parallel implementation does demonstrate its strength when the input values are closer to one. Besides, for the 16-state FSM example, the parallel FSM requires a total of only 32 clock cycles to execute, while the serial one needs 1024 cycles. The reduction in delay becomes more significant as the bitstream length increases.

3.2 Integration of Parallel FSM into the Deep Learning Framework
The overall structure of the neuron remains unchanged, as shown in Fig. 5, while all the tanh blocks are replaced with the aforementioned parallel FSM. The actual configuration of the parallel implementation, i.e. the number of copies of parallel FSMs, the total number of steady-state distribution vectors, etc., varies from case to case. It is worth noting that under the same bitstream length, different parameters in the parallel FSM implementation may lead to slightly different results. In particular, the number of states in each copy of the parallel FSM may differ from that of the serial FSM, in which case the actual outputs may exhibit higher accuracy.

Figure 8. 16-state parallel tanh FSM vs. tanh(16, x).

For example, for the 16-state parallel vs. serial FSM simulation above, the parallel implementation actually resembles the function tanh(32/2, x) = tanh(16, x), which would require a serial FSM with a total of 32 states; against tanh(16, x) the parallel implementation achieves a better MSE of 0.001 (Fig. 8).
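For reference, this relies on the gain relation of the stochastic tanh FSM from [7], restated here in the paper's tanh(k, x) notation, where an N-state FSM corresponds to k = N/2:

```latex
% Gain relation of the stochastic tanh FSM (Brown & Card [7]):
% an N-state linear FSM driven by a bipolar stream encoding x satisfies
\[
  \mathrm{Stanh}(N, x) \;\approx\; \tanh\!\left(\frac{N}{2}\,x\right),
  \qquad\text{e.g.}\qquad
  \mathrm{Stanh}(32, x) \;\approx\; \tanh(16\,x).
\]
```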
4. CASE STUDY: MNIST DIGIT CLASSIFICATION WITH SC-BASED DEEP LEARNING NEURAL NETWORK AND PARALLEL FSM IMPLEMENTATION
In this section, a handwritten digit classification problem is solved with the MNIST database [10]. The database comprises 60,000 sample images in the training set and 10,000 samples in the testing group. Since the purpose of the simulation is not to verify the robustness of the SC based neural network, but to study the influence of integrating the parallel FSM, the system was trained only to an acceptable accuracy. The code for the software simulation is developed based on [11]; all stochastic linear FSM modules are rewritten to simulate parallel FSM implementations.

A side-by-side comparison of parallel vs. serial FSM based network training, in terms of output layer weights and classification accuracy against training epochs under identical training parameters, is shown in Fig. 9.

Figure 9. Network training results comparison: (a) serial FSM, (b) parallel FSM.

Figure 10. Case study: parallel vs. serial FSM classification accuracy.

The comparison clearly shows the successful derivation of the output layer weights and the ability to reach comparable classification accuracy with parallel FSMs. Even though the two simulations show slight differences, these are within the normal range considering the stochastic nature of neural network training (Fig. 10).

Due to software limitations, only ten tests are done for each bitstream length, so the accuracy percentages can be somewhat circumstantial. However, the general trend still holds: SC based deep learning systems with serial and parallel FSM implementations are on par in terms of classification performance, yet the reduction in time delay brought by parallelism is significant. The actual improvement in delay is not measurable in software simulations, and an implementation of the system on a field programmable gate array (FPGA) board is necessary to quantify the improvement.

5. CONCLUSION
In this paper, the influence of replacing the serial linear FSMs used in stochastic computing based deep learning systems with parallel implementations is examined. The introduction of parallelism mainly aims at alleviating the striking latency issues induced by the long bitstreams used by stochastic computing. Both the individual module comparison in Section 3 and the case study in Section 4 show that the introduction of parallel FSMs yields systems comparable to the original ones, especially for longer bitstreams. This paper is still lacking in that the case study has not been implemented on an actual FPGA board to time the difference in latency. Future work may include actual hardware simulations and implementations to quantify the latency improvements brought by such a substitution.

6. ACKNOWLEDGEMENT
This paper would not have been possible without the support of Professor Yiyu Shi from the University of Notre Dame. His well-organized and clearly explained introduction to machine learning and neural networks provided a head start for my implementations of the parallel FSM and the experiments on the deep learning systems. I would also like to express my gratitude towards two teaching assistants, Yunlong Jia and Qi Wang, for their prompt and valuable advice whenever questions arose during the research.

7. REFERENCES
[1] A. Ren, Z. Li et al., "Designing reconfigurable large-scale deep learning systems using stochastic computing," 2016 IEEE International Conference on Rebooting Computing (ICRC), San Diego, CA, pp. 1-7, 2016.
[2] A. Alaghi and J. P. Hayes, "Survey of stochastic computing," ACM Trans. Embed. Comput. Syst., vol. 12, no. 2s, Article 92, May 2013.
[3] C. Lammie and M. R. Azghadi, "Stochastic Computing for Low-Power and High-Speed Deep Learning on FPGA," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, pp. 1-5, 2019.
[4] A. Ardakani, F. Leduc-Primeau, N. Onizawa, T. Hanyu and W. J. Gross, "VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2688-2699, Oct. 2017.
[5] C. Ma and D. J. Lilja, "Parallel implementation of finite state machines for reducing the latency of stochastic computing," 2018 19th International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, pp. 335-340, 2018.
[6] B. Gaines, "Stochastic computing systems," Advances in Information Systems Science, vol. 2, no. 2, pp. 37-172, 1969.
[7] B. D. Brown and H. C. Card, "Stochastic neural computation I: computational elements," IEEE Trans. Comput., vol. 50, pp. 891-905, Sept. 2001.
[8] P. Li, D. J. Lilja et al., "Computation on stochastic bit streams: digital image processing case studies," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 3, pp. 449-462, 2014.
[9] A. A. Markov, "Extension of the limit theorems of probability theory to a sum of variables connected in a chain," reprinted in Appendix B of R. Howard, Dynamic Probabilistic Systems, Volume 1: Markov Chains, John Wiley and Sons, 1971.
[10] Y. LeCun, C. Cortes, and C. J. C. Burges, The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist.
[11] A. E. Solomou, "adamsolomou/SC-DNN," GitHub, 2020. [Online]. Available: https://github.com/adamsolomou/SC-DNN.

