
2019 3rd International Conference on Circuits, System and Simulation

Accelerator Implementation of Lenet-5 Convolution Neural Network Based on FPGA with HLS

Dai Rongshi and Tang Yongming
School of Electronic Science and Engineering, Southeast University, Nanjing, China
e-mail: [email protected]

Abstract—Convolution neural network is widely used in image recognition because it can imitate the behavioral characteristics of the biological visual nerve and has high recognition accuracy. It is a kind of feedforward neural network which contains convolution computation and has a deep structure, and it is one of the representative algorithms of deep learning. Because the convolution neural network has a special calculation mode, general processors implement it inefficiently and cannot meet the performance requirements. To solve this problem, we implement the convolution neural network on FPGA and optimize the convolution operation to improve computing parallelism, data throughput and energy efficiency compared with traditional processors. Finally, we implemented the convolution neural network of the Lenet-5 model on the ZYBO Z7 FPGA board and compared it with a traditional processor. We realized fast recognition of a picture at a frequency of 100 MHz with DMA control. The data throughput of the FPGA is more than four times higher than that of the general processor, and the power consumption is 1.8 W, which is much lower than that of the general processor.

Keywords-FPGA; Lenet-5; HLS; convolution neural network; optimization

I. INTRODUCTION

In recent years, the convolution neural network has been widely used as a kind of neural network and has shown great effect in different fields. As a well-known deep learning framework, convolution neural networks have many applications in various fields, including machine vision, image search, image classification and so on [10]. Inspired by the biological visual nerve, the convolution neural network uses convolution kernels in a sliding operation, extracts features from the image, and transforms the features into final results by mapping between layers, which gives relatively high accuracy in image recognition.

In most front-end platform implementations, the convolution neural network depends on a CPU or GPU to complete the calculation. But for some tasks the front-end platform must be small and low-power, and the general processor is not efficient enough, so a new front-end platform is needed to complete the task [8].

Given these requirements, FPGA and ASIC chips are the new development direction [3][9]. FPGA and ASIC chips are used as front-end accelerators to improve the performance of the convolution neural network, and they can balance data throughput and power consumption properly. However, due to the high cost and long R&D cycle of an ASIC chip, it is difficult to use it on a large scale. Therefore, FPGA is a suitable platform in the short term. The convolution neural network accelerator based on FPGA has attracted more and more researchers' attention, because it offers good performance, high efficiency, a fast development cycle and strong reconfigurability [1][2][3].

Different model algorithms call for different optimization schemes. While making reasonable use of the resources of the FPGA, the efficiency should be optimized as far as possible. Different design schemes may lead to huge differences in performance; in particular, reading the same data many times and storing it in multiple places wastes resources and time. This paper completes the acceleration of a convolution neural network on FPGA. Firstly, we propose an operation mode to realize data flow on the FPGA. Secondly, we reduce the repeated reading of data through a register buffer in the convolution process, and put forward an operation mode to realize the data flow during convolution. Finally, in order to avoid the time delay caused by the processor and the low bandwidth of the interface, direct memory access is adopted for data stream acquisition.

II. BACKGROUND

A. CNN Basics

Research on convolution neural networks began in the 1980s and 1990s. The time-delay network and Lenet-5 are the earliest convolution neural networks. After the turn of the century, the convolution neural network developed rapidly with the development of deep learning theory and the improvement of numerical computing equipment, and it has been widely used in computer vision, natural language processing and other fields [9]. The convolution neural network is built to mimic the visual perception mechanism of biological organisms. The convolution kernel parameters in the hidden layers and the sparse connections between the network layers enable the convolution neural network to lattice the layer data with less computation in order to facilitate feature extraction. When convolution neural networks are used as supervised learning algorithms, the feedforward part is often used for image recognition and classification, and the feedback part for network training.

978-1-7281-3657-8/19/$31.00 ©2019 IEEE


Most users use trained weight data to run the convolution neural network and complete some real-time tasks, so the speed of the feedforward calculation is more important. In this paper, we focus on accelerating the feedforward part of convolution neural networks on FPGA.

The traditional neural network architecture consists of two parts: a feature extractor and a classifier. The function of the feature extractor is to extract the features of the input image. By mapping one feature map to another feature map, the features of these images are not unique; they can contain most of the shapes, mainly through the sliding of the convolution kernel. The weights of the convolution kernel should include a one-to-one mapping between the input layer and the output layer. The classifier is the process of transforming the feature layer into the output structure.

As shown in Figure 1, in the case of Lenet-5, the feature extractor contains convolution and downsampling layers. The full connection layer is the classifier.

Figure 1. Lenet-5 structure

The Lenet-5 network model has three convolution layers, two downsampling layers and two full connection layers.

TABLE I. LENET-5 NETWORK LAYER PARAMETERS

Layer    Conv1      Conv2       Conv3         Fullconnect1  Fullconnect2
Weights  <1,5,5,6>  <6,5,5,16>  <16,5,5,120>  <120,84>      <84,10>
Bias     6          16          120           84            10

B. FPGA Speedup

The FPGA clock is usually only a few hundred MHz, while a general purpose processor's main frequency is as high as a few GHz, yet the FPGA often runs specific workloads faster than the general purpose processor. A general purpose processor may need many clock cycles to perform specific operations (such as signal processing or image processing), but an FPGA can directly generate a dedicated circuit by programming its reconfigurable fabric, optimizing parallel operation, pipelining and memory access, which improves the speed of reading and writing and greatly improves the speed of these specific operations.

In this paper, several optimization methods are used to improve the speed on the FPGA [12].

Array Partitioning can change the order of the array in memory, as well as the number of ports, to increase the data exchange rate.

Array Reshaping can change the bit width of the memory. By changing the bit width, more data can be transferred in one input/output.

Loop Unrolling expands the loop, increases the parallelism of reads and writes and reduces the delay of operations.

Loop Pipelining pipelines the loop so that the next operation starts before the previous operation is finished.

The data operations of the convolution neural network are carried out in the FPGA, but the RAM resources in the FPGA are often very limited. The intermediate results are cached by interacting with the external DDR. With data transmission through a high-speed data interface, the operation process is sped up.

III. ACCELERATOR DESIGN EXPLORATION

This section starts with an overview of our accelerator architecture and introduces several design challenges on the FPGA platform. In order to overcome these challenges, we propose corresponding optimization techniques.

A. Design Overview

Figure 2. Overview of accelerator design

As shown in Figure 2, the CNN accelerator design includes the ARM, the DDR, the AXI DMA and the CNN IP kernel. The ARM interacts with the outside data and writes the input stream data into the DDR, and then the DMA imports the data stream into the CNN IP core through the AXI stream protocol in MM2S (memory-mapped to stream) mode.
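As a rough sketch of the data path just described, the following C fragment shows how such a CNN IP top level could expose an AXI-Stream input for the DMA and an AXI-Lite interface for ARM control and result readout in Vivado HLS. The function name, array sizes and port mapping below are illustrative assumptions and are not taken from the paper.

    /* Hypothetical top-level interface sketch (not the authors' actual code).
       The DMA streams the image in over AXI-Stream (MM2S); the ARM starts the
       core and reads the class scores back over the AXI-Lite slave interface. */
    #define IMG_SIZE (32 * 32)   /* assumed 32x32 single-channel Lenet-5 input */
    #define OUT_SIZE 10          /* ten class scores */

    void cnn_top(float image_in[IMG_SIZE], float result_out[OUT_SIZE]) {
    #pragma HLS INTERFACE axis      port=image_in    /* fed by AXI DMA (MM2S)   */
    #pragma HLS INTERFACE s_axilite port=result_out  /* read by ARM over AXI-Lite */
    #pragma HLS INTERFACE s_axilite port=return      /* start/done control regs  */
        /* ... convolution, downsampling and full-connection layers ... */
    }

With this kind of interface the block design in Vivado only needs the DMA on the data side and an AXI-Lite connection from the processing system for control, matching the structure of Figure 2.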

The CNN IP kernel is optimized by Vivado HLS (high-level synthesis). The whole system is generated from the block diagram in Vivado, and the ARM-side operation is completed in the Vivado SDK.

B. HLS Optimization

The Xilinx Vivado High-Level Synthesis tool converts C language into a register transfer level (RTL) implementation that can be integrated into a Xilinx field programmable gate array (FPGA). The C specification can be written using the C, SystemC or Open Computing Language (OpenCL) API C kernel, and the FPGA provides a massively parallel architecture that outperforms traditional processors in terms of performance, cost and power consumption.

Its main advantages are:
1) Improve the development efficiency of hardware design.
2) Improve the system performance of software design.
3) Develop and verify the algorithm at the C language level.
4) Use optimization directives to generate multiple HDL implementations from the C language.
5) Create readable and portable C language code.

For-loop optimization: The most time-consuming part of the convolution neural network is the convolution operation, so we mainly optimize the convolution operation [7]. For three-dimensional input images, convolution operations usually have six nested loops, namely, the input layers, output layers, image rows, image columns, convolution kernel rows and convolution kernel columns. Therefore, traditional processors spend a lot of time on convolution operations [5].

The convolution operation code is as follows:

    for (i = 0; i < Depth; i++) {            /* input layers (channels)     */
      for (m = 0; m < N4; m++) {             /* output image row            */
        for (n = 0; n < N4; n++) {           /* output image column         */
          for (p = 0; p < M; p++) {          /* convolution kernel row      */
            for (q = 0; q < M; q++) {        /* convolution kernel column   */
              tp = imagein[i][m + p][n + q]; /* buffer the pixel in a register */
              for (j = 0; j < Depth1; j++) { /* output layers (channels)    */
    #pragma HLS UNROLL
                out[j][m][n] += tp * weight[i][j][p][q];
              }
            }
          }
        }
      }
    }

Here i is the index of the input layers, j is the index of the output layers, m is the output image row, n is the output image column, p is the convolution kernel row, and q is the convolution kernel column.

The inner operation is formula (1):

out[j][m][n] += imagein[i][m + p][n + q] * weight[i][j][p][q]    (1)

The input imagein is independent of j (the output layer), so the input pixel is assigned to the tp register outside the output-layer loop. The convolution operation can then run up to Depth1 times faster, as resources allow.

The convolution operation has up to six loop levels. Considering the resource capacity, the loop is unrolled (UNROLL) and the weight and out variables are reorganized appropriately [6], as shown in Figure 3.

Figure 3. Unroll loop circuit

We obtain formula (2), which shortens the run time to a quarter of the original:

out = tp * (weight1 + weight2 + weight3 + weight4)    (2)

Although the loop has been optimized, the reading of parameters after the loop is unrolled is first limited by the memory ports: only one data item can be read from a memory per clock cycle, so memory optimization is required. The main methods are Array Partitioning and Array Reshaping. After optimization, the number of memory ports can match the loop unroll factor.

C. System Setup

The high-level synthesized IP core is added to the Vivado project. After configuring the system through the ARM, the data flow is controlled by the DMA [11], and the data in the DDR is directed to the convolution neural network IP.

A DMA transfer copies data from one address space to another. When the ARM initiates this transfer, the transfer itself is carried out and completed by the DMA controller. DMA transfer is very important for high-performance embedded system algorithms and networks.

As shown in Figure 4, the data is first placed in the DDR through the ARM, then the DMA is configured to read the data stream through AXI stream; the DMA inputs the data stream into the CNN IP, and the ARM reads the recognition result of the convolution neural network through the AXI-Lite bus [4].
Figure 4. Convolution neural network control structure
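To make the control flow of Figure 4 concrete, the following bare-metal C sketch outlines one possible ARM-side sequence using the Xilinx standalone AXI DMA driver. The device ID macro, buffer size and the final result-read step are assumptions for illustration; this is not the authors' actual code, and a real project could also use interrupts instead of polling.

    #include "xaxidma.h"
    #include "xil_cache.h"
    #include "xparameters.h"
    #include "xstatus.h"

    #define IMG_BYTES (32 * 32 * sizeof(float))   /* assumed input buffer size */

    static XAxiDma AxiDma;
    static float ImageBuf[32 * 32];               /* image data placed in DDR */

    int run_cnn_once(void) {
        /* 1) Initialize the DMA engine (its registers sit on AXI-Lite).
              The device ID macro depends on the block design. */
        XAxiDma_Config *cfg = XAxiDma_LookupConfig(XPAR_AXIDMA_0_DEVICE_ID);
        if (!cfg || XAxiDma_CfgInitialize(&AxiDma, cfg) != XST_SUCCESS)
            return XST_FAILURE;

        /* 2) Write the input data stream into DDR and flush the cache
              so the DMA sees the up-to-date picture. */
        /* ... fill ImageBuf with the picture to be recognized ... */
        Xil_DCacheFlushRange((UINTPTR)ImageBuf, IMG_BYTES);

        /* 3) Start the MM2S transfer: the DMA streams the image into the CNN IP. */
        if (XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)ImageBuf, IMG_BYTES,
                                   XAXIDMA_DMA_TO_DEVICE) != XST_SUCCESS)
            return XST_FAILURE;
        while (XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE))
            ;  /* wait until the stream has been consumed */

        /* 4) Read the recognition result back over the AXI-Lite bus
              (through the driver generated by Vivado HLS for the CNN IP). */
        return XST_SUCCESS;
    }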

The main functions in the control structure are as follows:
1) Write the data stream to the DDR.
2) Configure the DMA module through AXI-Lite.
3) Read the data stream through the DMA and write it to the CNN IP.
4) Finally, read the recognition result.

IV. EVALUATION

This section first introduces our experimental environment settings and then provides the full range of experimental results.

A. Experimental Setup

The accelerator is designed with Vivado HLS (v2018.2). The tool allows the accelerator to be implemented in C language and exports RTL as a Vivado IP kernel. The CNN C code is parallelized by adding compilation directives defined by HLS, and the parallel version is verified by the tool's timing analysis. Rapid pre-synthesis simulation is accomplished by combining the tool's C simulation and C/RTL co-simulation. The pre-synthesis resource report is used for design space exploration and performance estimation. The exported RTL is synthesized and implemented using Vivado v2018.2.

Our implementation is based on a ZYBO Z7 board with a Xilinx FPGA chip zynq7020. It operates at 100 MHz, and the software implementation runs on an Intel(R) Core(TM) i5-4590 CPU at 3.3 GHz with 4 cores, on a Windows 10 system with a Dell 02YYK5 board.

B. Result

In this section, we first report resource occupancy and then compare the software (CPU) implementation with our accelerator (FPGA) implementation. Finally, the comparison between our implementation and the existing FPGA implementation is given.

Routing is provided by the Vivado toolset. The tool then reports the resource occupancy shown in Table II. We can see that our CNN accelerator has almost fully utilized the hardware resources of the FPGA. In Table III, we compare the CPU to the FPGA in detail. For the CPU, we built the Lenet-5 network in MATLAB and realized the function of recognizing handwritten numbers.

TABLE II. FPGA RESOURCE UTILIZATION

Resource     DSP      BRAM     LUT      FF
Used         125      119.5    14659    14172
Available    220      140      53200    106400
Utilization  56.82%   85.36%   27.55%   13.32%

TABLE III. PERFORMANCE COMPARISON TO CPU

Float 32bit       CPU 3.3GHz (ms)   FPGA (ms)
Layer1            9.230             2.72026
Layer2            19.513            1.97655
Layer3            13.275            3.11795
Layer4            0.201             1.21707
Layer5            0.163             0.10270
Total             42.382            9.13516
Overall GFLOPS    0.073             0.343
Speedup           1X                4.7X

We choose the 3.3 GHz CPU and the FPGA accelerator for comparison. The FPGA implementation achieves a 4.7x speedup compared with the general purpose processor implementation. With lower power consumption, the total performance of our accelerator reaches 0.343 GFLOPS.

V. CONCLUSION

In this paper, we proposed a CNN FPGA acceleration method based on the Lenet-5 model on the FPGA platform ZYBO Z7. First, the calculation and memory access of the CNN are optimized, and then all possible problems are modeled under the Lenet-5 model to find the optimal solution for each layer. We find the best optimization design by comparing the results of many experiments. Finally, the ZYBO Z7 board achieved a low power of 1.8 W with a data throughput of 0.343 GFLOPS, which is much better than on a traditional processor. Due to its unique structure and computing ability, the FPGA has great potential for low power consumption, and the development of the FPGA has a wide prospect for cases that satisfy the load requirement.

REFERENCES

[1] D. Aysegul, J. Jonghoon, G. Vinayak, K. Bharadwaj, C. Alfredo, M. Berin, and C. Eugenio. Accelerating deep neural networks on mobile processor with embedded programmable logic. In NIPS 2013. IEEE, 2013.
[2] S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, and H. P. Graf. A programmable parallel accelerator for learning and classification. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques. ACM, 2010, pp. 273-284.
[3] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi. A dynamically configurable coprocessor for convolutional neural networks. In ACM SIGARCH Computer Architecture News, vol. 38. ACM, 2010, pp. 247-257.
[4] C. Zhang, P. Li, G. Sun, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015, pp. 161-170.
[5] J. Cong and B. Xiao. Minimizing computation in convolutional neural networks. In Artificial Neural Networks and Machine Learning, ICANN 2014. Springer, 2014, pp. 281-290.
[6] Y. Ma, Y. Cao, S. Vrudhula, et al. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2017, pp. 45-54.
[7] J. Qiu, J. Wang, S. Yao, et al. Going deeper with embedded FPGA platform for convolutional neural network. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2016, pp. 26-35.
[8] C. Alippi, S. Disabato, et al. Moving convolutional neural networks to embedded systems: the AlexNet and VGG-16 case. In ACM/IEEE International Conference on Information Processing in Sensor Networks, 2018, pp. 212-223.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1097-1105.
[10] N. Ketkar. Convolutional Neural Networks. 2017.
[11] D. Kim, R. Managuli, and Y. Kim. Data cache and direct memory access in programming mediaprocessors. IEEE Micro, vol. 21, no. 4, 2001, pp. 33-42.
[12] Xilinx. Vivado Design Suite User Guide: High-Level Synthesis, UG902, 2017.

