
An Efficient Reconfigurable Hardware Accelerator

for Convolutional Neural Networks


Anaam Ansari†∗, Kiran Gunnam‡∗, Tokunbo Ogunfunmi§∗
†[email protected], ‡[email protected], §[email protected]
∗Department of Electrical Engineering
Santa Clara University, Santa Clara, California 95053

Abstract—Convolutional Neural Networks (CNN) have proven to be very effective in image and speech recognition. The increasing usage of such applications in mobile devices and data centers has led researchers to explore application-specific hardware accelerators for CNN. However, most of these approaches are limited to a specific network such as AlexNet. We propose a reconfigurable technique that can be extended to a network-agnostic architecture that supports various networks. The technique described in this paper aims at developing a reconfigurable accelerator that uses basic processing elements (PE) as building blocks of its computational engine. In our design, we control the configuration of each layer using a switching control logic and a Benes network. In addition to potentially supporting all the various CNN architectures, our computation engine design achieves a 94% improvement in convolutional layer execution time for AlexNet compared to the state-of-the-art architecture that only supports AlexNet.

Index Terms—Convolutional Neural Network, reconfigurable accelerator, computational engine, efficient accelerator, Benes network, AlexNet, processing element, switching control logic, network-agnostic

I. INTRODUCTION

Convolutional Neural Networks, modeled after the optic nerve, emulate the behavior of an animal's visual cortex. They are a type of multilayer neural network trained with a back-propagation algorithm. Images have the property of spatial correlation, which convolutional neural networks exploit for image recognition and classification. Owing to this correlation, CNNs are able to shed the full connectivity of a regular neural network and be locally connected instead [1]. Thus, convolutional neural networks are best suited for applications involving image and video processing.

A. CNN complexity

A convolutional layer is the core building block of a CNN. It performs dot products using its processing elements (commonly termed neurons), which have adaptable weights and biases [2]. The convolution operation essentially performs dot products between the filters and local regions of the input. The forward pass of a CNN involves filtering the input, which produces a 2-dimensional activation map. This activation map is the response of the filters as they are locally focused and moved across the input payload. As the forward pass through the layers executes, the filters adaptively update their weights in order to better detect geometric patterns and visual features.

B. CNN Accelerator

Every convolutional layer within the network requires a large amount of computation to be carried out. There are several challenges in budgeting the resources available on an FPGA and in working within the power limitations that come with it. Thus, CNN accelerators are designed to iteratively perform the function of each layer.

II. RELATED WORK

There are many implementations that adopt some degree of reconfigurability in their design. The architecture proposed in [3] features a reconfigurable 2-dimensional grid of processing elements using on-chip memory to perform secondary operations; each processing element in the computation engine has independent off-chip memory. We see a different approach in [4], where the authors limit external memory accesses by introducing a memory hierarchy. They use local scratchpads and global buffers to do so in an energy-efficient manner, achieving energy efficiency as a result of reducing external data accesses. The design in [5] uses a variable-sized convolutional layer processor to distribute computational resources to compute each layer. In [6], the authors determine the optimal implementation parameters for each convolutional layer of AlexNet [7]. FPGA optimization techniques such as loop unrolling, loop tiling, and transformation have been employed to push the accelerator to achieve efficiency. The analysis done by the authors in [6] delineates that variable loop dimensions warrant different implementation variants. The work in [6] explores the communication-to-computation ratio and the computation roofline space for each layer to determine the computational performance with optimal unroll factors < Tm, Tn >, where Tm and Tn are the tile sizes of the outputs and inputs of the computation engine, respectively, as seen in Table I. These unroll factors are variable in nature, and variable implementation parameters are very difficult to implement in hardware. Their solution circumvents this challenge by choosing uniform unroll factors of < 64, 7 >. Using a uniform implementation parameter to design the computation engine, however, causes a degradation in performance. In our design, we address this degradation by choosing variable implementation parameters that can be customized for each layer of the CNN, and we simulate the HDL design. Since our design is reconfigurable, it can be extended to any CNN such as GoogleNet or Microsoft's ResidualNet. However, in this paper we focus on AlexNet.

978-1-5386-1823-3/17/$31.00 ©2017 IEEE 1337 Asilomar 2017

Table I: Optimal unroll factors for AlexNet [6]

Layer                   Tm   Tn
1                       48   3
2                       20   24
3                       96   5
4                       95   5
5                       32   15
uniform unroll factor   64   7

III. EXTENDED SUMMARY

Our design makes use of the variable implementation parameters needed to implement each layer of the CNN. We intend to transform the fixed computational engine into an array of reconfigurable processing elements (P E). The processing elements in the array that are required to execute a layer will be activated with the help of a Benes network switch and some switching control logic. This reduces the cost of operation in terms of execution clock cycles and makes for a faster computational engine than one implemented with uniform parameters. Thus, convolutional layers will execute in a lower number of clock cycles than they would with a fixed unroll factor for all layers.

A. Architecture Overview

Figure 1 describes our proposed architecture. The novel features of our architecture are as follows:
• The computational engine will be an array of P E units. This will act as a bank of processing elements, ready to be used for the execution of a convolutional layer. These processing elements are held together by a switching layer, which facilitates the interconnections in the P E engine.
• The input to the computational engine will be managed by a Benes network.
• The requisite network of processing elements will be managed by a switching control logic.

Figure 1: Architecture Overview

1) Processing Element: A processing element is the denominational unit of our computational engine. The design of our processing element is hierarchical in nature.

The atomic unit of our design is P E 2, described in Figure 2. It has 4 inputs, one output, and an enable input. The function of the P E 2 block is to perform dot products of the inputs in[1] and in[2] with the adaptable filter weights w[1] and w[2]. The unit is activated only if the enable pin is set high. We use P E 2 blocks to construct the P E 4 and P E 8 blocks.

Figure 2: Building block processing element P E 2

The P E 4 block is made up of three P E 2 blocks, as described in Figure 3. It has 6 inputs and 3 outputs, which can also be used as the individual outputs of the constituent P E 2 blocks. The P E 4 block has 4 enable inputs that are distributed among the composite P E 2 blocks and the complete unit, so we have the choice of using the block as a whole or as individual P E 2 blocks.

We use 2 P E 4 blocks and one P E 2 block to construct the P E 8 block; in other words, 7 P E 2 blocks make up the P E 8 unit. We treat P E 8 as the universal processing element module of our design. It has 14 inputs and 7 outputs, as described in Figure 4. The seven outputs are the outputs of the 7 composite P E 2 blocks that make up the P E 8 block. There are 10 enable inputs to this block, distributed among its 7 P E 2 blocks. These enable lines let us invoke the individual sub-blocks of P E 8 in their individual capacity as P E 4, P E 2, or as the complete unit, P E 8. The design of the new processing element facilitates the selective use of its sub-blocks.
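The hierarchical composition described above can be sketched in software. This is an illustrative model of ours, not the authors' HDL: the function names are hypothetical, weights are passed separately from data, we adopt the "independent P E 2 lanes" reading (which matches the stated data-input/output counts of 6/3 for P E 4 and 14/7 for P E 8), and we model only the per-P E 2 enables, omitting the group-level enable lines.

```python
def pe2(data, weights, enable=True):
    # P E 2 atomic unit: dot product of two data inputs with two
    # adaptable filter weights; contributes nothing unless enabled.
    if not enable:
        return 0
    return data[0] * weights[0] + data[1] * weights[1]

def pe4(data, weights, enables=(True, True, True)):
    # P E 4: three P E 2 blocks; 6 data inputs, 3 individually usable outputs.
    return [pe2(data[2*k:2*k+2], weights[2*k:2*k+2], enables[k])
            for k in range(3)]

def pe8(data, weights, enables=(True,) * 7):
    # P E 8: seven P E 2 blocks (two P E 4 plus one P E 2);
    # 14 data inputs, 7 outputs, one per constituent P E 2.
    return [pe2(data[2*k:2*k+2], weights[2*k:2*k+2], enables[k])
            for k in range(7)]

# A P E 8 with every lane enabled produces 7 dot-product outputs:
print(pe8(list(range(14)), [1] * 14))  # [1, 5, 9, 13, 17, 21, 25]
```

Disabling a subset of the enable lines leaves the corresponding lanes idle, which is how the switching control logic carves a P E 8 into P E 4 or P E 2 capacity.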
2) Benes Switch: We use a Benes network [8, 9] to manage the input to the reconfigurable computational engine, as shown in Figure 5. A Benes network that receives N inputs has 2 log2(N) − 1 stages of interconnected switches, and there are exactly N/2 2 × 2 crossbar switches per stage, as shown in Figure 5b. Every switch in the Benes network performs one of two functions based on the control bit b: (i) the input is translated to the output as it was received if the bit is b, and (ii) the two inputs are swapped and translated to the output if the bit received is b′, as shown in Figure 5a [8]. In our design we use a 32 × 32 Benes network to parse a certain number of pixels of data from the 256 × 256 sized image. It provides a fast solution with a minimum critical path delay.
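The stage and switch counts above are easy to check numerically. The following is a minimal sketch of ours (not the paper's implementation) of the 2 × 2 crossbar behavior and the Benes network dimensions; here the control bit is taken as 0 for pass-through and 1 for swap.

```python
import math

def crossbar_2x2(a, b, control_bit):
    # Pass-through for one value of the control bit, swap for the other.
    return (a, b) if control_bit == 0 else (b, a)

def benes_stages(n):
    # An N-input Benes network has 2*log2(N) - 1 stages of switches,
    # each stage holding exactly N/2 crossbar switches.
    assert n >= 2 and n & (n - 1) == 0, "N must be a power of two"
    return 2 * int(math.log2(n)) - 1

# The 32x32 network used in this design:
print(benes_stages(32))  # 9 stages
print(32 // 2)           # 16 switches per stage
```

For the 32 × 32 network this gives 9 stages of 16 switches, i.e., 144 crossbars in total.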
(a) 2 × 2 crossbar

(b) N × N Benes network

Figure 5: Benes Network

Figure 3: Building block processing element P E 4

Figure 4: Building block processing element P E 8

3) Switching Control Logic: The primary role of the switching control logic is to select the processing elements that may be needed to carry out the computation of a convolutional layer. It can invoke the P E 8, P E 4, or P E 2 processing elements in the best combination required to execute a convolutional layer on our computation engine. It accepts the inputs Tm and Tn and outputs control bits for the switching layer that controls the P E engine and for the input to the Benes network.

Figure 6: Control Signals for Switching Layer and Processing Engine

IV. FPGA AND HARDWARE IMPLEMENTATION

A. Layer 1

Layer 1 operates with unroll parameters < 48, 3 >, where Tn = 3 is the input unroll factor and Tm = 48 is the output unroll factor. For this operation, we let 3 inputs and their weights at a time into the Benes network. For this layer, consider one P E 8, from which we use the two composite P E 4 units to process the inputs and weights. Thus, each P E 8 produces 2 outputs, as shown in Figure 7. As a result, we require 24 P E 8 units in stage 1 and none in stage 2. The PE engine simulation with this configuration completes in 21682 clock cycles.

B. Layer 2

The unroll factors for this layer are < 20, 24 >. For this layer, we need the Benes network to send 24 pixels' worth of information. We need 3 complete P E 8 blocks along with a P E 4 to compute a result, as seen in Figure 8. The PE simulation completes in 2724 clock cycles. We require 60 P E 8 units in stage 1 and 10 P E 8 units in stage 2.

C. Layer 3

Layer 3 has unroll factors < 96, 5 >. Since Tn is 5, we need to send 5 pixels' worth of information at a time. The arrangement requires a P E 4 from the P E 8 in stage 1 and then a P E 2 from the next-stage P E 8 unit to complete the operation. Thus, we require 48 P E 8 units in stage 1 and 24 P E 8 units in stage 2, as seen in Figure 9. The PE simulation completes in 13008 clock cycles.
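The stage-1 counts in the layer walkthrough follow from the unroll factors. As a back-of-envelope check (our arithmetic; the outputs-per-P E 8 figures are taken from the text, where Layers 1 and 3 each draw a P E 4 per output, i.e., 2 outputs per P E 8):

```python
def stage1_pe8(tm, outputs_per_pe8):
    # Number of stage-1 P E 8 units needed to cover Tm unrolled outputs,
    # given how many outputs each P E 8 contributes in that configuration.
    return tm // outputs_per_pe8

print(stage1_pe8(48, 2))  # Layer 1: 24 units, as reported
print(stage1_pe8(96, 2))  # Layer 3: 48 units, as reported
```

The stage-2 counts depend on how partial sums are combined and do not follow from this simple ratio alone.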
Figure 7: Configuration Setup for Layer 1

Figure 8: Configuration Setup for Layer 2

Figure 9: Configuration Setup for Layer 3 and Layer 4

D. Layer 4

Layer 4 is similar to Layer 3, with unroll factors < 95, 5 >. The configuration used for Layer 3 in Figure 9 can be reused for Layer 4. Thus, the number of clock cycles to execute this layer is the same as for Layer 3, which is 13008 clock cycles.

E. Layer 5

Layer 5 has unroll factors < 32, 15 >. We need to send 15 pixels' worth of information at a time. The arrangement requires that we use two P E 8 units in stage 1 and a P E 2 in stage 2, as seen in Figure 10. Thus, we require 64 P E 8 units in stage 1 and 16 P E 8 units in stage 2.

V. SIMULATION RESULTS

According to our configuration setup, we determined the number of P E 8 units required for each stage, as given in Table II.

Table II: Processing elements (P E 8) required for each stage

Layer          Stage 1   Stage 2
1              24        0
2              60        10
3              48        24
4              48        24
5              64        16
design choice  64        24

The total number of clock cycles our design takes to parse and compute the entire input payload of a 256 × 256 image is given in Table III. The total time taken for this design is 54490 clock cycles. As we can see, our design does better than the design with the uniform unroll factor < 64, 7 >, which takes 1008246 clock cycles [6]. Thus the P E engine provides us with a 94% improvement.
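The totals quoted above can be cross-checked against the per-layer cycle counts reported in Table III:

```python
# Per-layer PE-engine cycle counts from Table III.
cycles = {1: 21682, 2: 2724, 3: 13008, 4: 13008, 5: 4068}
total = sum(cycles.values())
print(total)  # 54490 -- matches the reported total

# Comparison against the uniform <64, 7> design of [6].
uniform = 1008246
reduction = 1 - total / uniform
print(f"{reduction:.1%}")  # ~94.6%, consistent with the claimed 94% improvement
```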
Figure 10: Configuration Setup for Layer 5

Table III: PE simulation clock cycles

Layer   Cycles for PE engine
1       21682
2       2724
3       13008
4       13008
5       4068
total   54490

The Fmax of our design, i.e., the maximum frequency at which it can operate, is 100 MHz.

VI. CONCLUSION

Our work proposes a hardware accelerator architecture with a reconfigurable computational engine. The reconfigurable computational engine aims at achieving fast performance in terms of the minimum number of execution clock cycles needed to execute the CNN. The design of our architecture is truly run-time reconfigurable and can potentially support multiple advanced CNNs such as AlexNet, GoogleNet, and Microsoft's ResidualNet.

VII. FUTURE WORK

We need to look at additional techniques, such as pipelining, that may be compatible with our design. In the future, we need to implement this design on an FPGA platform and benchmark it against a GPU implementation. We also need to extend this design into a fully network-agnostic architecture, which would require designing a P E engine that accommodates layers of all known dimensions of all existing networks.

REFERENCES

[1] Y. LeCun et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998. DOI: 10.1109/5.726791.
[2] A. Karpathy, "Convolutional Neural Networks (CNNs/ConvNets)," 2016. URL: http://cs231n.github.io/.
[3] S. Cadambi et al., "A Programmable Parallel Accelerator for Learning and Classification," in Proc. 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), Vienna, Austria: ACM, 2010, pp. 273–284. DOI: 10.1145/1854273.1854309.
[4] Y. H. Chen, J. Emer, and V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), June 2016, pp. 367–379. DOI: 10.1109/ISCA.2016.40.
[5] Y. Shen, M. Ferdman, and P. Milder, "Maximizing CNN Accelerator Efficiency Through Resource Partitioning," CoRR abs/1607.00064, 2016. URL: http://arxiv.org/abs/1607.00064.
[6] C. Zhang et al., "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks," in Proc. 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '15), Monterey, California, USA: ACM, 2015, pp. 161–170. DOI: 10.1145/2684746.2689060.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 1097–1105.
[8] D. Nassimi and S. Sahni, "A Self-Routing Benes Network and Parallel Permutation Algorithms," IEEE Transactions on Computers, vol. C-30, no. 5, pp. 332–340, May 1981. DOI: 10.1109/TC.1981.1675791.
[9] K. K. Gunnam et al., "VLSI Architectures for Layered Decoding for Irregular LDPC Codes of WiMax," in 2007 IEEE International Conference on Communications, June 2007, pp. 4542–4547. DOI: 10.1109/ICC.2007.750.
