
A Convolutional Neural Network Fully Implemented on FPGA for Embedded Platforms
Marco Bettoni∗, Gianvito Urgese∗, Yuki Kobayashi†, Enrico Macii∗, and Andrea Acquaviva∗
∗Politecnico di Torino, Torino, Italy, 0039 011 090 7042. Email: [email protected]
†NEC Corporation, Kawasaki, Japan. Email: [email protected]

Abstract—Convolutional Neural Networks (CNNs) allow fast and precise image recognition. Nowadays this capability is highly requested in the embedded system domain for video processing applications such as video surveillance and homeland security. Moreover, with the increasing demand for portable and ubiquitous processing, power consumption is a key issue to be accounted for.

In this paper, we present an FPGA implementation of a CNN designed to address portability and power efficiency. Performance characterization results show that the proposed implementation is as efficient as a general-purpose 16-core CPU, and almost 15 times faster than a SoC GPU for mobile applications. Moreover, the external memory footprint is reduced by 84% with respect to a standard CNN software application.
I. INTRODUCTION
In this paper, we propose the design of a hardware architecture implementing a customizable Convolutional Neural Network (CNN) framework, where several CNN schemas can be configured and executed. We analyzed the CNN computational flow to identify the most critical points to be parallelized in the FPGA implementation. We described the CNN framework architecture using a High Level Synthesis (HLS) language and tested the new HW-CNN module on an Altera Stratix V FPGA embedded in a Terasic DE-5-Net board.

The CNN algorithm performs fast and precise image recognition, which is a highly requested feature in the context of embedded systems. The widest adoption of this type of algorithm can be found in the Artificial Intelligence field, contributing to numerous applications, such as fire detection in forests [1], robotics [2], autonomous driving [3] and mobile applications [4]. In this latter domain, battery lifetime and memory resources are a serious concern for CNN implementations.

Several CNN models are available in the literature for general-purpose applications. In 2012, the Alex-Net model [5] was the first efficient application of the CNN, and more accurate models have been proposed since, such as GoogLeNet [6], featuring the Inception concept, and Faster R-CNN [7], with advanced capabilities for detecting the position of the subject in the picture.

To teach the CNN to recognize defined objects, the network needs to be trained. During the Training Phase, a set of labeled images is used to generate the set of parameters to be applied in the neural network. By means of the Test Phase, the capability of the network to identify and classify the pictures is evaluated.

The chosen model defines how to perform the training and the computation involved during the test, specifying the parameters and functions to be used for CNN recognition. The CNN flow is summarized as follows (a reference software sketch of these steps is given after this list):

• Convolution: The matricial convolution operator is applied over the feature maps of the Input image, such as the RGB color channels. The computation is shown in Equation 1, where O is the number of Output feature maps of size H × W, I is the number of Input feature maps, and K × K is the size of the Kernel, which is the convolution operand obtained from the training.

Out[O][H][W] = \sum_{i=0}^{I} \sum_{kh=0}^{K} \sum_{kw=0}^{K} In[i][H+kh][W+kw] \times Kernel[O][i][kh][kw]   (1)

• Activation: A threshold function applied to the convolution output. The ReLU(x) = max(0, x) function is widely adopted, but others are common as well, such as Tanh and Sigmoid.

• Pooling: The average or maximum value over an input region is evaluated, generating a resized image representative of the pooled values. Equation 2 shows the Pooling by Average operation, where Ph × Pw is the pooling window size.

Out[O][H][W] = \left( \sum_{ph=0}^{Ph} \sum_{pw=0}^{Pw} In[O][H+ph][W+pw] \right) / (Ph \times Pw)   (2)

• Fully-Connected (FC): Implemented at the end of a CNN, the FC layers provide the classification of the features extracted by convolution. The FC layers are implemented as in Equation 3:

Out[O] = \sum_{i=0}^{I} In[i] \times Kernel[O][i]   (3)
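As anticipated above, the following is a minimal, single-threaded C sketch of the steps in Equations 1-3, written only to make the data flow concrete: it is not the HW-CNN HLS source, and the array bounds and function names (convolution, relu, avg_pool, fully_connected) are our own illustrative choices.

#include <stddef.h>

/* Equation 1: matricial convolution. O output maps of size H x W are
 * produced from I input maps using K x K kernels; the input is assumed
 * padded to (H + K) x (W + K) so all accesses stay in bounds. */
static void convolution(size_t I, size_t O, size_t H, size_t W, size_t K,
                        const float in[I][H + K][W + K],
                        const float kernel[O][I][K][K],
                        float out[O][H][W])
{
    for (size_t o = 0; o < O; o++)
        for (size_t h = 0; h < H; h++)
            for (size_t w = 0; w < W; w++) {
                float acc = 0.0f;
                for (size_t i = 0; i < I; i++)        /* over input maps  */
                    for (size_t kh = 0; kh < K; kh++) /* over kernel rows */
                        for (size_t kw = 0; kw < K; kw++)
                            acc += in[i][h + kh][w + kw]
                                 * kernel[o][i][kh][kw];
                out[o][h][w] = acc;
            }
}

/* Activation: ReLU(x) = max(0, x), applied element-wise. */
static inline float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Equation 2: pooling by average over one Ph x Pw window whose top-left
 * corner is (h, w); the caller steps (h, w) over the input grid. */
static float avg_pool(size_t H, size_t W, const float in[H][W],
                      size_t Ph, size_t Pw, size_t h, size_t w)
{
    float acc = 0.0f;
    for (size_t ph = 0; ph < Ph; ph++)
        for (size_t pw = 0; pw < Pw; pw++)
            acc += in[h + ph][w + pw];
    return acc / (float)(Ph * Pw);
}

/* Equation 3: fully-connected layer, one dot product per output. */
static void fully_connected(size_t I, size_t O, const float in[I],
                            const float kernel[O][I], float out[O])
{
    for (size_t o = 0; o < O; o++) {
        float acc = 0.0f;
        for (size_t i = 0; i < I; i++)
            acc += in[i] * kernel[o][i];
        out[o] = acc;
    }
}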

The Alex-Net model is used as a reference in this work, since it performs accurate recognition (84.7% Top-5 accuracy) and is generally used as a benchmark for CNN implementations. This model includes 5 Convolutional Layers followed by 3 FC Layers, makes use of both Activation and Pooling, and requires in total nearly 1.5 billion operations.

Fig. 1: Convolution process representation. The magnifications are representative of the CNN edge-detection.

Figure 1 shows an example of the Alex-Net model used to set up our HW-CNN architecture running on the FPGA. The RGB channels of an image are provided as inputs and processed by the following 8 layers. The intermediate images are shown, highlighting the edge-detection capability of the CNN.
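For reference, this layer stack can be captured compactly in a model-descriptor table. The sketch below is our own illustration of how such a reconfigurable descriptor could look in C; the layer shapes are those of the original Alex-Net publication [5], while the struct and field names are hypothetical and not taken from the HW-CNN source.

/* Hypothetical model descriptor: one row per Alex-Net layer.
 * Shapes follow Krizhevsky et al. [5]; names are ours. */
typedef enum { CONV, FC } layer_kind;

typedef struct {
    layer_kind kind;
    int out_maps;   /* O: output feature maps (or FC neurons)    */
    int kernel;     /* K: kernel size, 0 for FC layers           */
    int stride;
    int pool;       /* max-pool window after the layer, 0 = none */
} layer_cfg;

static const layer_cfg alexnet[8] = {
    { CONV,   96, 11, 4, 3 },  /* conv1 + pooling   */
    { CONV,  256,  5, 1, 3 },  /* conv2 + pooling   */
    { CONV,  384,  3, 1, 0 },  /* conv3             */
    { CONV,  384,  3, 1, 0 },  /* conv4             */
    { CONV,  256,  3, 1, 3 },  /* conv5 + pooling   */
    { FC,   4096,  0, 0, 0 },  /* fc6               */
    { FC,   4096,  0, 0, 0 },  /* fc7               */
    { FC,   1000,  0, 0, 0 },  /* fc8: 1000 classes */
};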
A common approach for CNN acceleration exploits GPU cards, able to perform recognition over several hundreds of images per second [8]. An alternative state-of-the-art approach leverages the FPGA for matricial convolution acceleration, while the other computation steps are performed on a general-purpose CPU [9].

In this paper we propose a CNN fully implemented in FPGA, which executes the Convolution, Activation, Pooling and FC layers. More specifically, the proposed solution, named HW-CNN, has the following characteristics:
• Standalone implementation (no GPU, no CPU);
• Power-efficient CNN computation;
• Low FPGA resource usage and external-memory dependency;
• Software reprogramming for CNN model compliance.

To characterize the proposed implementation, we performed comparative performance and power evaluations against a software version running on a general-purpose CPU and a mobile SoC GPU. Memory utilization results are also reported. Overall, the results show that the proposed implementation is power efficient and lends itself to adoption in mobile applications with stringent power and resource requirements.

The rest of the paper is organized as follows: a description of the internal implementation (Section II), the performance and obtained results (Section III) and finally the conclusion (Section IV).
II. IMPLEMENTATION

The developed architecture can be configured to execute all the steps of a CNN model defined by the user. Thus, it can be entrusted with the recognition of a picture or a video stream. The HW-CNN offers broad compatibility with any CNN model, because the CNN parameters can be reconfigured in software. To compute a classification, the input image is passed through an external communication interface (LAN or serial connection). Then, the HW-CNN computes all the CNN steps, generating a list of recognition decisions that is sent back to the host. The recognition is performed entirely on the FPGA, which requires a DDR RAM to store the raw input image and the intermediate results.

A. FPGA Implementation

We developed the HW-CNN implementation using the NEC CyberWorkBench HLS compiler (CWB) [10], exporting the component described in pseudo-C to an RTL format.

We implemented a parallel version of a general and customizable CNN architecture. For this purpose, we used two parallelization techniques: the Tiling Technique designed by Zhang et al. [9] and the Pipelining Technique commonly implemented for data-stream elaborations.

The Tiling Technique has been exploited to overcome the data dependency of the CNN calculation, which, due to the massive recurrence of the data, does not fit in the FPGA internal memory. The input image is therefore fragmented into tiles smaller than the original picture, and the CNN can be performed by computing each tile individually. The biggest advantage of performing the tiling technique on an FPGA is the significant degree of parallelization achievable by computing several tiles concurrently on the hardware logic. In our HW-CNN implementation, up to 8 tiles of size 32 × 32 pixels are computed in parallel.

The Pipeline is composed of 5 units: Download (DL), Convolution (CV), Activation (AT), Pooling (PL) and Upload (UL), which perform the homonymous functions. The scheduling of the functions is managed by the Control Unit (CU), which generates the parameters for each unit depending on the CNN-model size and structure.

Fig. 2: CNN Pipeline. The data flow passes through the DL, CV, AT, PL, UL Units, all controlled by the CU. A double buffering system is implemented as PP buffers, allowing concurrent operation on the picture tiles.

This configuration is shown in Figure 2. The computation flow passes through all the units from DL to UL, where the two extremes are dedicated to the data transfer with the on-board RAM. In order to avoid the data hazard existing between two consecutive units, a double buffering [11] system has been adopted, implementing the Ping-Pong Technique already exploited in [9]. The buffer duplication prevents the units from reading incomplete data, or from updating data which has not been elaborated yet. At each successive pipeline stage, the CU swaps the data by means of the control logic depicted in the Ping-Pong (PP) Buffer of Figure 2.
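To make the Ping-Pong mechanism concrete, here is a minimal software model in plain C (not CWB pseudo-C) of two adjacent pipeline units sharing one double buffer. The 32 × 32 tile size follows Section II-A, while the stage functions and all names (pp_buf, conv_stage, act_stage) are illustrative assumptions of ours.

#define TILE 32                     /* tile edge, as in HW-CNN (32 x 32) */

/* One ping-pong buffer pair shared by two adjacent pipeline units. */
typedef struct {
    float buf[2][TILE][TILE];       /* halves [0] and [1] alternate roles */
} pp_buf;

/* Hypothetical stage functions: producer fills a tile, consumer drains it. */
extern void conv_stage(float out[TILE][TILE], int tile_id);
extern void act_stage(float in[TILE][TILE], int tile_id);

/* Software model of the CU behaviour: while the consumer works on the
 * tile produced in the previous iteration, the producer fills the other
 * half of the pair, so neither unit ever reads an incomplete buffer. */
void run_pipeline(pp_buf *pp, int n_tiles)
{
    for (int t = 0; t <= n_tiles; t++) {
        int ping = t & 1;           /* half written this iteration        */
        int pong = ping ^ 1;        /* half written the iteration before  */

        if (t < n_tiles)
            conv_stage(pp->buf[ping], t);       /* produce tile t     */
        if (t > 0)
            act_stage(pp->buf[pong], t - 1);    /* consume tile t - 1 */
        /* In hardware both operations run concurrently; the CU swaps
         * the roles of the two halves at every stage boundary. */
    }
}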

B. Convolution Unit

The core of the CNN computation is the CV Unit, where both the Convolution and Fully-Connected layers are computed.

Fig. 3: Convolution Unit (CV) schema. The i_I matrices on the left side are the Input tiles, the o_O matrices on the right side are the Output tiles, and the K_{o,i} are the Kernel matrices.

The convolution operator requires a considerable amount of MAC operations, proportional to the neural network size and the image resolution. The HW-CNN implementation optimizes this computation by parallelizing the MAC operator 24 times. Figure 3 represents the CV schema of the HW-CNN implementation, where it is possible to notice the MAC parallelization over the I Input tiles and O Outputs.

The CV Unit has been further optimized by an internal scheduling which avoids idleness among the CV components. In fact, for the MAC computation, data must be acquired from the input buffer and, after the operation, stored in the output buffer. The process has been internally pipelined to guarantee that the address calculation, MAC execution, and buffer reading and writing are performed in a single clock cycle. This internal pipelining has been efficiently coded by means of the CWB automatic pipelining feature, and proved on the real hardware with the scheduling shown in Figure 4.

Fig. 4: Convolution Unit internal pipeline.

The CV operation is repeated for each pixel that composes the Output tile. The loop logic has been hard-coded for the T_{width} × T_{height} dimension of the Output tile. Eventually, Equation 4 computes the number of clock cycles necessary for the CV unit to complete a CNN stage, which depends on the kernel size K × K, the Output tile size and the CV internal pipeline latency T_{pl}.

CV_{time} = T_{pl} + K^2 \times T_{width} \times T_{height}   (4)
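As a sanity check of Equation 4, the sketch below evaluates the cycle count for one plausible configuration: the 32 × 32 output tile comes from Section II-A, while the kernel size K = 3 and pipeline latency T_pl = 10 are illustrative assumptions of ours, not values reported in the paper.

#include <stdio.h>

/* Equation 4: clock cycles for the CV unit to complete one CNN stage. */
static unsigned cv_time(unsigned K, unsigned t_width, unsigned t_height,
                        unsigned t_pl)
{
    return t_pl + K * K * t_width * t_height;
}

int main(void)
{
    /* Assumed example: K = 3, 32 x 32 tile, T_pl = 10
     * -> 10 + 9 * 1024 = 9226 cycles. */
    printf("CV stage: %u cycles\n", cv_time(3, 32, 32, 10));
    return 0;
}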

III. RESULTS AND DISCUSSIONS

We configured the HW-CNN with the Alex-Net model structure and the parameters used in the work of Zhang et al. [9]. The following evaluation is compared against software implementations, since the FPGA-based implementation [9] found in the literature reports only the performance of the CNN steps computed on the FPGA, without considering all the other steps executed on the host, making a direct comparison between the two architectures impractical. We tested the performance by comparing different CNN software implementations: the optimized C code designed by Zhang et al. [9], the Caffe Python library working in CPU-only mode (without GPU parallelization) [12] and, for the mobile GPU comparison, the clBLAS OpenCL library evaluated by Lokhmotov et al. [13].

A. Timing and Power Results

For the timing performance, we considered the time required to compute an image recognition. The time required by our HW-CNN is reported in Figure 5a, where it is compared with the Caffe execution on an Intel Xeon E5-2630 v3 @ 2.40GHz (32 CPUs) and with the clBLAS library tested on an ARM Mali-T628 GPU.

Fig. 5: (a) Time Performances; (b) Memory Reduction. In 5a the recognition times on a CPU, on the FPGA implementation and on a SoC GPU are compared, highlighting the performance cost paid for the sake of portability. 5b compares the RAM memory requirements, where the FPGA makes efficient use of on-chip data compared with the other software implementations.

For the power comparison, the Performance per Watt metric has been used, a value obtained from Equation 5, where the number of operations executed by the device is considered together with the Thermal Design Power in Watts.

\text{Performance per Watt} = \frac{\text{Operations}}{\text{Time} \times \text{Power}}   (5)

The aim of this comparison is to give an idea of the different timing performances, considering the discrepancies in terms of hardware and level of portability. We designed the HW-CNN module with the clear intent of reducing the hardware requirements to a minimum, while relaxing the timing constraint to a reasonable value, still bearable for the mobile user. On the other side, the FPGA adoption brings a speed-up of almost 15 times over the SoC GPU.

Table I reports the CNN power efficiencies of the considered implementations. The general-purpose CPU values have been extracted from [9], where the results were obtained by executing an optimized CNN code on an Intel Xeon E5-2430 (@2.20GHz).

This comparison shows that running the Alex-Net model on the FPGA using our HW-CNN architecture is as power efficient as the CPU implementation running on 16 threads, and almost 3 times more efficient than the software execution without parallelization.

TABLE I: Power performances.

Device                   | OP/s [GOP/s] | Power [W] | Perf. per Watt [GOP/s/W]
GPU Mobile OpenCL [13]   | 0.02         | 3         | 0.007
CPU single thread [9]    | 3.54         | 95        | 0.037
CPU 16-threads [9]       | 12.87        | 95        | 0.135
FPGA HW-CNN [This work]  | 0.75         | 5.54      | 0.135

The comparison with mobile hardware shows that the current mobile implementations of CNN are outperformed by the FPGA by more than a 19× factor. This demonstrates the effectiveness of the low-power FPGA computation, and the possibility of adopting similar CNN implementations in mobile applications where battery life is the major constraint.
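As a quick cross-check of Equation 5 against Table I, the sketch below recomputes the Perf. per Watt column and the speed-up factors quoted in the text; the OP/s and power figures are copied from Table I, while the code itself is purely our illustration.

#include <stdio.h>

/* Equation 5 applied to the rows of Table I: GOP/s divided by Watts. */
int main(void)
{
    const struct { const char *device; double gops, watts; } rows[] = {
        { "GPU Mobile OpenCL [13]",   0.02,  3.00 },
        { "CPU single thread [9]",    3.54, 95.00 },
        { "CPU 16-threads [9]",      12.87, 95.00 },
        { "FPGA HW-CNN [This work]",  0.75,  5.54 },
    };
    for (int i = 0; i < 4; i++)
        printf("%-26s %.3f GOP/s/W\n",
               rows[i].device, rows[i].gops / rows[i].watts);
    /* FPGA vs mobile GPU: 0.135 / 0.007 -> the >19x factor quoted in
     * the text; FPGA vs single-threaded CPU: 0.135 / 0.037 -> ~3.6x. */
    return 0;
}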
B. Resource usage

The percentage of required FPGA resources is reported in Table II. The synthesis report shows that few resources are required, allowing the architecture to fit more compact FPGAs, such as the Xilinx Zynq. The most demanded resource is the on-chip memory, which has been exploited for the main purpose of caching the intermediate CNN results on the FPGA.

TABLE II: FPGA Resources.

Resource | Stratix V - FPGA Chip Usage
Logic    | 65,463 ALMs (28%)
Register | 3.5kB (3%)
DSP      | 104 blocks (41%)
Memory   | 4,752kB (73%)
The memory necessary on the on-board DDR RAM is greatly reduced, as shown in Figure 5b. The chart reports the comparison of the RAM memory requirements for performing the Alex-Net model on both the FPGA and in software. The values are motivated by the fact that during the CNN execution, the intermediate results are stored in the PP buffers along the pipeline, rather than transferred to the RAM memory.

The reduced amount of external memory and FPGA resources are significant figures that encourage the adoption of HW-CNN-like architectures in mobile applications. While this already allows the implementation to be placed on smaller FPGAs or silicon components, the architecture can be easily adapted to fit the smallest FPGA devices, exploiting the versatility of the HLS synthesis coupled with the modular programming technique.

IV. CONCLUSION

In this paper we propose a Convolutional Neural Network (CNN) fully implemented in FPGA, which enables image recognition in low-power embedded systems with limited resources. These features have been made feasible by extending state-of-the-art implementations, where only the convolution step is accelerated on FPGA, with the modular addition of extra functionalities. This modularity guarantees compliance with existing CNN models, but also the possibility to easily introduce new functionalities.

The experiments show that our HW-CNN can quickly perform image recognition, outperforming the reference SoC GPU in both timing (15×) and power (16×) terms. The proposed implementation is even 3 times more power efficient with respect to the reference CPU, and equivalently efficient to the same CPU parallelized over 16 threads. Moreover, the requirement of an external memory has been reduced by 83% when compared to the software version of the CNN.

Finally, this architecture has been designed to allow software reconfiguration, which allows the user to apply various CNN models and to efficiently test the same picture against different trained data and recognized classes.

ACKNOWLEDGMENTS

The HLS compiler and the technical support were provided by NEC Corporation, Japan.

REFERENCES

[1] Qingjie Zhang et al. "Deep Convolutional Neural Networks for Forest Fire Detection". In: 2016 International Forum on Management, Education and Information Technology Application. Atlantis Press, 2016.
[2] Lei Tai and Ming Liu. "Deep-learning in Mobile Robotics - from Perception to Control Systems: A Survey on Why and Why not". In: arXiv preprint arXiv:1612.07139 (2016).
[3] Mariusz Bojarski et al. "End to end learning for self-driving cars". In: arXiv preprint arXiv:1604.07316 (2016).
[4] Ryosuke Tanno and Keiji Yanai. "Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications". In: Adjunct Proceedings of the 13th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. ACM, 2016, pp. 159-164.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks". In: Advances in Neural Information Processing Systems. 2012, pp. 1097-1105.
[6] Christian Szegedy et al. "Going deeper with convolutions". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 1-9.
[7] Shaoqing Ren et al. "Faster R-CNN: Towards real-time object detection with region proposal networks". In: Advances in Neural Information Processing Systems. 2015, pp. 91-99.
[8] NVIDIA. GPU-Based Deep Learning Inference. URL: https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf (visited on 01/18/2017).
[9] Chen Zhang et al. "Optimizing FPGA-based accelerator design for deep convolutional neural networks". In: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015, pp. 161-170.
[10] Kazutoshi Wakabayashi. "CyberWorkBench: Integrated design environment based on C-based behavior synthesis and verification". In: 2005 IEEE VLSI-TSA International Symposium on VLSI Design, Automation and Test (VLSI-TSA-DAT). IEEE, 2005, pp. 173-176.
[11] Wikipedia. Multiple buffering. URL: https://en.wikipedia.org/wiki/Multiple_buffering (visited on 02/28/2017).
[12] Yangqing Jia and Evan Shelhamer. Caffe - Deep learning framework by the BVLC. URL: http://caffe.berkeleyvision.org (visited on 01/20/2017).
[13] Anton Lokhmotov and Grigori Fursin. "Optimizing convolutional neural networks on embedded platforms with OpenCL". In: Proceedings of the 4th International Workshop on OpenCL. ACM, 2016, p. 10.
