Accelerated Deep Learning Inference From Constrained Embedded Devices
Abstract— Hardware looping is a feature of some processor instruction sets whose hardware can repeat the body of a loop automatically, rather than requiring software instructions which take up cycles (and therefore time) to do so. Loop unrolling is a loop transformation technique that attempts to improve a program's execution speed at the expense of its binary size, an approach known as a space–time tradeoff. A convolutional neural network is created with simple loops, with hardware looping, with loop unrolling, and with both hardware looping and loop unrolling, and a comparison is made to evaluate the effectiveness of hardware looping and loop unrolling. Hardware loops alone contribute to a decline in cycle count, while the combination of hardware loops and dot product instructions decreases the clock cycle count further. The CNN is simulated on Xilinx Vivado 2021.1 targeting the Zynq-7000 FPGA.

Index Terms— Convolutional Neural Network, Deep Learning, FPGA, Hardware Looping, Loop Unrolling, Vivado.
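As a minimal illustration of the two techniques (an assumed example in C, not code from the paper), the fragment below shows a simple accumulation loop and the same loop unrolled by a factor of four; the unrolled version trades a larger binary for fewer loop-control instructions, while a hardware loop would instead repeat the body in hardware with no per-iteration branch and no code growth:

    #include <stdint.h>

    #define N 128  /* illustrative vector length, divisible by 4 */

    /* Simple loop: one add plus loop-control overhead
       (increment, compare, branch) per element. */
    int32_t sum_simple(const int8_t *x) {
        int32_t acc = 0;
        for (int i = 0; i < N; i++) {
            acc += x[i];
        }
        return acc;
    }

    /* Unrolled by 4: the same work with a quarter of the
       loop-control instructions, at the cost of a larger
       binary -- the space-time tradeoff described above. */
    int32_t sum_unrolled(const int8_t *x) {
        int32_t acc = 0;
        for (int i = 0; i < N; i += 4) {
            acc += x[i];
            acc += x[i + 1];
            acc += x[i + 2];
            acc += x[i + 3];
        }
        return acc;
    }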
I. INTRODUCTION

Deep learning algorithms have seen success in a wide variety of applications, for example, machine translation, image and speech recognition, and self-driving vehicles. Recently, these algorithms have also gained traction in the embedded systems space. Most embedded systems rely on inexpensive microcontrollers with restricted memory capacity and, consequently, are commonly seen as not capable of running deep learning algorithms. Nevertheless, we consider that advances in the compression of neural networks and in neural network architecture, combined with an improved instruction set architecture, could make microcontroller-grade processors suitable for specific low-power deep learning applications. Such complexity, however, is far too great for memory-constrained microcontrollers whose memory sizes are specified in kilobytes. Some embedded system designers work around the problem of restricted resources by processing neural networks in the cloud. However, this arrangement is limited to regions with Internet access. Cloud processing also has other drawbacks, for example, privacy concerns, security, high latency, communication power consumption, and reliability. These algorithms perform massive arithmetic computations. To accelerate these computations at a reasonable hardware cost, we can use an instruction set extension comprising two instruction types: hardware loops and dot product instructions. The primary contributions of this paper are as follows:

• We propose an approach for computing neural network functions that is optimized for the use of hardware loops and dot product instructions (a sketch of the targeted inner loop follows this list).

• The effectiveness of hardware loops and dot product instructions for performing deep learning functions is evaluated, and

• The Lenet-5 neural network is implemented on the Zynq-7000.
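As an illustration of the computation these two instruction types target (a minimal plain-C sketch under assumed 8-bit fixed-point operands, not the paper's implementation), a fully connected layer reduces to rows of dot products: the inner loop body maps naturally to a dot product (multiply-accumulate) instruction, and the loop control maps to a hardware loop, removing the per-iteration branch entirely:

    #include <stdint.h>

    /* One fully connected layer as n_out dot products of
       length n_in. On a processor with the proposed
       extensions, the inner loop becomes a hardware loop
       and each multiply-accumulate a dot product step. */
    void fc_layer(const int8_t *weights, const int8_t *in,
                  int32_t *out, int n_out, int n_in) {
        for (int o = 0; o < n_out; o++) {
            int32_t acc = 0;
            /* candidate for a hardware loop: known trip count */
            for (int i = 0; i < n_in; i++) {
                /* candidate for a dot product instruction */
                acc += (int32_t)weights[o * n_in + i] * (int32_t)in[i];
            }
            out[o] = acc;
        }
    }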
There have been various approaches to accelerating deep learning functions. The approaches can be sorted into two sets. The first set contains approaches which attempt to reduce the size of the neural networks or, in other words, optimize the software. Approaches in the second set try to optimize the hardware on which neural networks run. As the approach used here deals with hardware optimization, we will focus on the related approaches for hardware optimization and only briefly note the progress in software optimizations. A simple instruction set is proposed to evaluate the effectiveness of hardware loops and dot product instructions with fully optimized assembly functions for the fully connected convolutional neural network [1]. A custom instruction set architecture is used for the efficient realization of artificial neural networks and can be parameterized to an arbitrary fixed-point format [2]. A CNN-specific instruction set architecture is used which deploys the instruction parameters with high flexibility and embeds parallel computation and data reuse parameters in the instructions [3]. Instruction set extensions and microarchitectural advancements are used to increase computational density and to limit the pressure on the shared memory hierarchy in RISC processors [4]. A recurrent neural network is combined with a convolutional neural network, and the deep features of the image are learnt in parallel using the convolutional neural network and the recurrent neural network [5]. The framework of the Complex Network Classifier (CNC) is built by integrating network embedding and a convolutional neural network to tackle the problem of network classification. By training the classifier on synthetic complex network data, they showed that CNC can not only classify networks with high accuracy and robustness but can also extract the features of the networks automatically [6]. A zero-valued prediction method is used to exploit the spatial correlation of zero-valued activations within the CNN output feature maps, thereby saving convolution operations [7]. The impact of packet loss on data integrity is reduced by taking advantage of the deep network's ability to understand neural data and by using a data repair method based on convolutional neural networks [8]. An instruction set simulation process is used with a soft-core Reduced Instruction Set Computer (RISC) processor, providing a reliable simulation platform for creating a customizable instruction set for an Application Specific Instruction Set Processor (ASIP) [9]. A RISC-V ISA compatible processor is presented, and the effects of the instruction set on the pipeline/micro-architecture design are analyzed in terms of instruction encoding, functionality of instructions, instruction types, decoder logic complexity, data hazard detection, register file organization and access, functioning of the pipeline, effect of branch instructions, control flow, data memory access, operating modes, and execution unit hardware resources [10].

B. Lenet

Lenet (also called Lenet-5) is a classic convolutional neural network which uses convolutions, pooling, and fully connected layers. Lenet is used for handwritten digit recognition with the MNIST dataset.

C. MNIST Dataset

MNIST is the acronym for the Modified National Institute of Standards and Technology database. The MNIST database contains 60,000 training images and 10,000 testing images. MNIST dataset images have dimensions of 28 x 28. To make the MNIST image dimensions meet the requirements of the input layer, the 28 x 28 images are padded. Some of the test images from the MNIST test dataset are shown in Fig 1.
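Assuming the standard Lenet-5 input size of 32 x 32 (the exact input layer is not restated above), the padding step amounts to adding a 2-pixel zero border on each side of the 28 x 28 MNIST image. A minimal C sketch of this step:

    #include <stdint.h>
    #include <string.h>

    /* Pad a 28 x 28 MNIST image to the 32 x 32 Lenet-5
       input by centering it inside a zeroed 32 x 32 frame
       (2-pixel border on every side). */
    void pad_mnist(const uint8_t src[28][28], uint8_t dst[32][32]) {
        memset(dst, 0, 32 * 32);  /* zero border */
        for (int r = 0; r < 28; r++) {
            for (int c = 0; c < 28; c++) {
                dst[r + 2][c + 2] = src[r][c];
            }
        }
    }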
IV. RESULTS

Table I gives the hardware cost occupied by our design on the Zynq-7000 board. It shows that the design occupies 47% of the LUTs, 19% of the LUTRAMs, 28% of the FFs, 59% of the BRAMs, 54% of the DSPs, and 3% of the BUFGs. The implemented design can also be displayed to give an idea of how the design has been distributed, placed, and routed on the selected Zynq-7000 board.

Fig 4: Behavioral simulation on Xilinx Vivado
Table I: Hardware resources occupied by the design on the Zynq-7000

Resource   Utilization   Available   Utilization %
LUT        25456         53200       47.85
LUTRAM     3478          17400       19.99
FF         30456         106400      28.62
BRAM       83.5          140         59.64
DSP        120           220         54.55
BUFG       1             32          3.13

C. Implementation

Once implementation completes, an implementation summary that collects all of the implementation reports is provided. Fig 5 depicts the implemented design.
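Each utilization percentage in Table I is the used count divided by the available count: for example, the LUT row gives 25456 / 53200 × 100 ≈ 47.85%, matching the rounded 47% quoted above.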