
Volume 6, Issue 3, March – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

High Performance FPGA Based CNN Accelerator


Pratiksha B. Dange
Department of Electronics & Telecommunication Engineering
J D College of Engineering & Management
Nagpur, India

Dr. S.L. Haridas
Department of Electronics & Telecommunication Engineering
J D College of Engineering & Management
Nagpur, India

Abstract:- Over the years, convolutional neural networks have been used in many different applications, thanks to their ability to perform tasks using a reduced number of parameters compared to other deep learning methods. However, the power and memory constraints of embedded and portable platforms often conflict with the requirements of accuracy and latency. For these reasons, commercial hardware accelerators have become popular, and their designs are built around the trends of the most common convolutional network models. However, a field-programmable gate array represents an attractive alternative, because it offers the opportunity to use a hardware design tailored to a particular convolutional network model, with promising results in terms of latency and power consumption. In this article, we propose a complete field-programmable gate array hardware accelerator for convolutional neural network inference, designed for a keyword recognition system.

Keywords:- CNN, Accelerator, FPGA.

I. INTRODUCTION

As communication systems evolve and power levels increase, the spectrum is pushed up to higher frequencies to deal with the bulk of the information. With the introduction of 5G mobile technology, these bands are expected to reach as high as 3 to 27 GHz [10], which goes far beyond the standard radar spectra, especially with the K-band radar. With this comes the need for improved spectrum-sensing and signal-identification algorithms that allow sensors and radios to detect and identify spectrum users and participants. These algorithms have traditionally been the result of hand-crafted engineering. With the recent practice of using machine learning to process signals, neural networks have been shown to do well on the problem of radio signal recognition. Neural networks, and especially deep neural networks, have typically been run on graphics processing units (GPUs). Today's GPUs are very powerful and have many parallel compute features that are well suited for deep learning applications. Unfortunately, GPUs are power-hungry, which makes them unsuitable for power-limited applications. Radar works with a large amount of data and requires high throughput and low latency; if the radar is installed in a platform without external power resources, it should also use as little power as possible. For this reason, it may be necessary to run these algorithms on customized hardware to meet these requirements. Field-programmable gate arrays (FPGAs) have a good balance between cost, energy efficiency and computational resources, which makes them a good fit for this application.

In recent years we have had a huge data boom in many fields. To address the expansion of this "big data", the answer is found in Artificial Intelligence (AI). We can define it as a software or hardware application that thinks and solves problems as a human being can, with problems ranging from language translation to image classification to recognizing different faces and people. What we have achieved so far is narrow AI, which uses specific algorithms and techniques to solve specific problems.

Since neural networks are naturally parallel, they map well onto FPGAs (Field Programmable Gate Arrays). FPGA implementations have been shown to have significantly lower power consumption per operation than equivalent GPU (Graphics Processing Unit) implementations, which is a requirement for embedded systems. However, implementation is no small feat, because FPGA development is usually done in a hardware description language, e.g. VHDL.

Figure 1: Humans and AI

Within the AI field, we find Machine Learning (ML), which consists of using a large set of data and a number of classification algorithms to change the standard way we are accustomed to writing a program. With our standard programming method, we create algorithms that may be complex, but that we specify completely ourselves. The basic idea of machine learning is instead to take a large amount of data and let the system work out which parts of the data matter, improving its results with experience, so that we obtain a system that, without the whole algorithm being written by hand, is able to make decisions based on the available data. Some of these models follow mathematical methods we know, such as linear and polynomial functions; some of them are very good at predicting a particular type of behavior that is very difficult to capture in a hand-written algorithm, such as guessing the price of a house based on a historical series. Further, as an ML branch, we find a specific learning approach known as Deep Learning (DL).

Figure 1.2: AI world

The development of DL occurred in a manner similar to the study of neural networks in general. It is characterized by efforts to create a learning model with multiple levels of abstraction, where deeper levels take the outputs of previous levels into account, transforming and refining them further. This intuition of stacked levels of learning gives the whole field its name, and it is inspired by how the brain of a mammal processes information and learns, responding to external stimuli.

Figure 1.3: Mammal brain and Convolutional process

Over the years, convolutional neural networks (CNNs) have found application in many different fields, such as object detection [1, 2] and object recognition [3, 4]. Their deployment, however, depends on memory and energy consumption, which often conflicts with the requirements of latency and accuracy. In particular, with standard general-purpose solutions based on the use of a microcontroller, the limited available memory constrains network complexity, with a potential impact on accuracy [7].

In the same way, microcontroller-based systems feature the worst trade-off between power consumption and timing performance [8]. For this reason, commercial hardware accelerators for CNNs such as the Neural Compute Stick (NCS) [9], Neural Compute Stick 2 (NCS2) [9], and Google Coral [10] were produced. Such products feature optimized hardware architectures that make it possible to realize inference of CNN models with low latency and reduced power consumption. Standard communication protocols, such as Universal Serial Bus (USB) 3.0, are generally exploited for communication purposes. Nevertheless, since they were designed for the implementation of generic CNNs, their architectures are extremely flexible at the expense of the optimization of the single model.

For such a reason, hardware accelerators customized for a specific application might offer an interesting alternative for accelerating CNNs. In particular, field-programmable gate arrays (FPGAs) represent an interesting trade-off between cost, flexibility, and performance, especially for applications whose architectures have been changing too rapidly to rely on application-specific integrated circuits (ASICs) and whose production volumes might not be sufficient. At the same time, FPGAs offer high flexibility, which permits the implementation of different models with a high degree of parallelism and the possibility of customizing the architecture for a specific application.

II. LITERATURE SURVEY

An optimized block-floating-point (BFP) arithmetic is adopted in the accelerator of [1] for efficient inference of deep neural networks. Feature maps and model parameters are represented in 16-bit and 8-bit formats, respectively, in off-chip memory, which can reduce the memory and off-chip bandwidth requirements by 50% and 75% compared to a 32-bit floating-point counterpart. The proposed 8-bit BFP arithmetic, combined with optimized rounding and shifting-based quantization schemes, improves energy and hardware efficiency threefold. The FPGA-based CNN accelerator is implemented on the Xilinx VC709 evaluation board [1].
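
To make the BFP representation concrete, the following minimal sketch (ours, not code from [1]; the block contents, mantissa width, and rounding scheme are illustrative assumptions) quantizes a block of values to integer mantissas that share a single exponent:

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to block-floating-point:
    one shared exponent per block, integer mantissas."""
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros_like(block, dtype=np.int32), 0
    # Shared exponent chosen so the largest magnitude fits the mantissa range.
    shared_exp = int(np.ceil(np.log2(max_abs))) - (mantissa_bits - 1)
    # Round each value to an integer mantissa at the shared exponent
    # (clipped in case the largest value saturates by one step).
    mantissas = np.clip(np.round(block / 2.0 ** shared_exp),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp):
    return mantissas.astype(np.float32) * 2.0 ** shared_exp

# Example: an 8-value block of weights quantized to 8-bit mantissas.
weights = np.array([0.31, -0.07, 0.55, 0.02, -0.48, 0.11, 0.26, -0.33])
m, e = bfp_quantize(weights, mantissa_bits=8)
print(m, e, bfp_dequantize(m, e))
```

Because all values in a block share one exponent, multiply-and-accumulate over the block reduces to plain integer arithmetic, which is what makes BFP attractive for FPGA DSP slices.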

In [2], the functional design of the hardware is introduced for an FFT processor built on the radix-2 decimation-in-frequency algorithm (R2DIF) and a distributed method that allows data to be shared efficiently through register exchanges. The design uses a modified coordinate rotation digital computer (m-CORDIC) algorithm together with Radix-2r recoding to replace the complex multiplications of the FFT. The m-CORDIC algorithm improves the computational structure, while Radix-2r allows a logarithmic reduction of the adder stages. The proposed design does not require the large memory blocks normally used to store the twiddle factors [2].
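
For reference, the sketch below is a plain software model of a radix-2 decimation-in-frequency FFT, the algorithm named above (our illustration only; the design of [2] additionally replaces the complex twiddle multiplications with m-CORDIC rotations and Radix-2r recoding, which this sketch does not model):

```python
import cmath

def fft_r2dif(x):
    """Radix-2 decimation-in-frequency FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return x
    half = n // 2
    # Butterfly stage: sums feed the even-frequency sub-FFT,
    # twiddled differences feed the odd-frequency sub-FFT.
    top = [x[i] + x[i + half] for i in range(half)]
    bottom = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
              for i in range(half)]
    even = fft_r2dif(top)      # X[0], X[2], X[4], ...
    odd = fft_r2dif(bottom)    # X[1], X[3], X[5], ...
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out

# Example: the 8-point FFT of an impulse is all ones.
print(fft_r2dif([1, 0, 0, 0, 0, 0, 0, 0]))
```
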
In [3], a CNN accelerator based on an FPGA is presented. The accelerator is designed to implement a flexible neural network efficiently, with memory optimization and the use of low-cost resources. The results show gains in performance and power compared to a Core i5 CPU and a GTX 960 GPU [3].

In [4], a hardware model for CNN inference is designed step by step in a hardware description language, including the CNN computing architecture, the multi-layer implementation, the weight-loading scheme, and the data interface [4].

In [5], it is shown that existing low-power register-transfer-level (RTL) strategies can serve as a low-power design scheme for accelerating a CNN-based object recognition system, in contrast to conventional strategies. Many of the most effective design strategies for CNN acceleration focus on High-Level Synthesis (HLS) features, such as memory bandwidth usage, network architecture, data reuse, and batch processing [5].

In [6], a CNN accelerator on the Xilinx ZYNQ 7100 hardware platform accelerates both standard convolution and depthwise separable convolution. Taking the MobileNet + SSD network design as an example, the accelerator maps the computation of the entire network onto the ZYNQ 7100 system-on-chip, using a data-streaming interface and a ping-pong buffer scheme [6].
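
To clarify the operation such an accelerator speeds up, here is a minimal software model of a depthwise separable convolution, the MobileNet building block mentioned above (our sketch; all shapes are illustrative, and padding and strides are omitted):

```python
import numpy as np

def depthwise_separable_conv(x, dw, pw):
    """Depthwise + pointwise convolution as used in MobileNet-style layers."""
    C, H, W = x.shape
    _, K, _ = dw.shape            # one KxK filter per input channel
    M, _ = pw.shape               # pointwise stage mixes C channels into M
    OH, OW = H - K + 1, W - K + 1
    mid = np.zeros((C, OH, OW), dtype=np.float32)
    # Depthwise: each channel is filtered independently.
    for c in range(C):
        for i in range(OH):
            for j in range(OW):
                mid[c, i, j] = np.sum(x[c, i:i+K, j:j+K] * dw[c])
    # Pointwise: a 1x1 convolution, i.e. a matrix multiply across channels.
    return np.tensordot(pw, mid, axes=([1], [0]))

x = np.random.rand(3, 8, 8).astype(np.float32)
dw = np.random.rand(3, 3, 3).astype(np.float32)   # depthwise 3x3 per channel
pw = np.random.rand(8, 3).astype(np.float32)      # pointwise 1x1, 3 -> 8
print(depthwise_separable_conv(x, dw, pw).shape)  # (8, 6, 6)
```

The point of the decomposition is that a KxK depthwise pass plus a 1x1 pointwise pass needs far fewer multiply-and-accumulate operations than a full KxK convolution over all channel pairs.
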
III. MOTIVATION

A major problem with the implementation of CNN-based models on FPGAs concerns the limitations in terms of hardware resources (combinational and sequential logic, Digital Signal Processors (DSPs), RAM blocks, etc.) of such devices. CNN algorithms are based on Multiply-and-Accumulate (MAC) operations, which require a large number of logic elements or DSPs. In addition, CNNs are characterized by a large number of parameters that determine figures of merit such as resource requirements, number of Giga-operations per second (Gops), Density Efficiency (DE), the time required to compute the CONV, FC and Softmax layers, and Power Efficiency (PEff). For these reasons, the hardware accelerator must be designed carefully, taking into account the trade-off between computation time and the available resources.
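
To see where the MAC operations come from, the following naive loop nest (a sketch of ours with arbitrary shapes, not the accelerator's datapath) computes one convolutional layer; every iteration of the innermost loops is exactly one multiply-and-accumulate:

```python
import numpy as np

def conv2d(x, w):
    """Naive 2-D convolution: every output pixel is a chain of MACs."""
    C, H, W = x.shape          # input channels, height, width
    M, _, K, _ = w.shape       # output channels, kernel size K x K
    out = np.zeros((M, H - K + 1, W - K + 1), dtype=np.float32)
    for m in range(M):                     # output channel
        for i in range(out.shape[1]):      # output row
            for j in range(out.shape[2]):  # output column
                acc = 0.0
                for c in range(C):         # input channel
                    for ki in range(K):
                        for kj in range(K):
                            # One multiply-and-accumulate (MAC) operation.
                            acc += x[c, i + ki, j + kj] * w[m, c, ki, kj]
                out[m, i, j] = acc
    return out

x = np.random.rand(3, 8, 8).astype(np.float32)    # 3-channel 8x8 input
w = np.random.rand(4, 3, 3, 3).astype(np.float32) # 4 filters, 3x3 kernels
print(conv2d(x, w).shape)  # (4, 6, 6)
```
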
IV. PROBLEM STATEMENT

The hardware accelerator performs the convolution function, the most computationally critical function of a ConvNet. To give an idea of the computational load of a ConvNet: in the AlexNet model, for example, 90% of the processing time is spent on convolution tasks. Moreover, the complexity of these networks is strongly linked to their depth, and one of the major problems is that this type of workload requires a lot of memory. We will try to implement the structures that perform this function in a systematic way, and to use strategies to reduce the number of accesses to the external DRAM attached to the FPGA during the computation of the convolution. The purpose of both is to exploit on-chip data reuse as much as possible.
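
As one illustration of such a strategy (our sketch, not the design proposed here; the tile size is an arbitrary parameter), output tiling keeps a small window of the feature map in on-chip buffers so that each external-DRAM word is fetched once per tile instead of once per MAC:

```python
# Sketch of loop tiling for convolution: process the output in tile x tile
# blocks so the corresponding input window and weights stay in on-chip
# buffers (BRAM), reducing the number of accesses to external DRAM.
import numpy as np

def conv2d_tiled(x, w, tile=4):
    C, H, W = x.shape
    M, _, K, _ = w.shape
    OH, OW = H - K + 1, W - K + 1
    out = np.zeros((M, OH, OW), dtype=np.float32)
    for ti in range(0, OH, tile):
        for tj in range(0, OW, tile):
            th, tw = min(tile, OH - ti), min(tile, OW - tj)
            # One burst read from "DRAM": the input window for this tile.
            x_tile = x[:, ti:ti + th + K - 1, tj:tj + tw + K - 1]
            # Compute the whole tile from the on-chip copy.
            for m in range(M):
                for i in range(th):
                    for j in range(tw):
                        out[m, ti + i, tj + j] = np.sum(
                            x_tile[:, i:i + K, j:j + K] * w[m])
    return out
```
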
V. HARDWARE ACCELERATORS

A hardware accelerator is a specialized hardware unit that performs a set of tasks with higher performance or better energy efficiency than a general-purpose CPU. Examples of common accelerators are GPUs, digital signal processors (DSPs) and fixed-function application-specific integrated circuits (ASICs) like video decoders [11]. To understand why accelerators have become so important, the history of the semiconductor industry has to be taken into account. The semiconductor industry has historically been driven by two scaling laws: Moore's Law and Dennard scaling. It is these two scaling trends that have made CMOS technology so popular in the computing industry. Moore's law states that the number of transistors that can economically be fit onto an integrated circuit doubles every two years. It is more than just shrinking transistors to yield better integration capabilities; it is fundamentally a cost-scaling law, and Moore was interested in shrinking transistor costs. Moore observed that the cost of transistors depends on two factors: one is the density of transistors that can be crammed onto a single chip, and the second is the yield of fabrication. To maintain Moore's law, two factors are critical:
1) transistor size - the smaller the better.
2) wafer size - the larger the better, since more chips can be produced from a fixed number of processing steps.

VI. DESIGN ARCHITECTURE

There exist two major architectures of hardware accelerators for neural networks: single computation unit accelerators and streaming accelerators. Single computation unit (SCU) accelerators have a similar construction to a RISC CPU that executes instructions with a fixed datapath. Instead of an ALU or FPU, the SCU accelerator has a dedicated matrix multiplier tailored for big matrices, or a systolic array of computation elements. When a network is to be accelerated on an SCU accelerator, instructions are generated for that specific network. The accelerator can then execute these instructions from memory. These types of accelerators are very flexible, since the only network-specific data that has to be stored are the instructions and the parameters. This enables networks with different architectures to be executed on the same accelerator. These accelerators suit systems that execute several different networks, since the accelerator can be shared between the tasks and the overhead to execute a new network is not that big. Even though SCU accelerators are often fixed for all types of networks, they can be tailored to specific networks with regard to the width of the datapath and the size of the matrix multiplier/systolic array, to better match the layer sizes in the network and yield higher resource utilization. This semi-tailoring of SCU accelerators tends to reach higher performance on CNNs with a uniform structure. This is because the utilization of the shared compute unit increases if the kernel sizes between layers are similar. It is also true if the size ratio between different layers is a power of two. When the deployment of neural networks onto these types of accelerators is automated, the automation framework generates the instructions and quantizes the weights as needed.

Figure 3: Basic structure of a single compute unit accelerator
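
The single-computation-unit idea described above can be caricatured in a few lines (a toy model of ours; the instruction names and layer set are invented for illustration): the datapath is fixed, and everything network-specific is reduced to an instruction stream plus parameters:

```python
import numpy as np

# Toy SCU model: the "hardware" is one fixed matrix-multiply unit plus an
# instruction interpreter; everything network-specific lives in the
# instruction stream and the parameter memory.
def matrix_unit(a, b):
    return a @ b  # stands in for the dedicated matrix multiplier

def run(instructions, params, x):
    for op, arg in instructions:
        if op == "MATMUL":        # fully connected layer
            x = matrix_unit(x, params[arg])
        elif op == "RELU":
            x = np.maximum(x, 0.0)
    return x

# "Compiling" a 2-layer MLP for the SCU: only instructions and weights
# change between networks; the datapath stays fixed.
params = {"w1": np.random.rand(8, 16), "w2": np.random.rand(16, 4)}
program = [("MATMUL", "w1"), ("RELU", None), ("MATMUL", "w2")]
print(run(program, params, np.random.rand(1, 8)).shape)  # (1, 4)
```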

Accelerators with the streaming architecture always tailor the hardware with respect to the target network. The layers are often directly implemented in hardware, and it is possible to get a very high level of parallelism and utilization. The intermediate results between layers can be stored in registers or memory, or directly pipelined into the next layer. This architecture is better suited for smaller networks, since a direct mapping can consume a lot of resources. One way to circumvent this resource constraint is to use a method called folding. With folding, one layer at a time is executed on the FPGA, and the FPGA is reconfigured between each layer. Since the FPGA needs to be reconfigured between each layer, batches of data have to be executed to yield a sensible throughput. Folding can generally yield a very high throughput, since the level of parallelism in each layer tends to be high, but the latency is often large, since large batches of data have to be executed. Figure 4 shows a block diagram of a simple streaming accelerator.

Figure 4: Basic structure of a streaming accelerator
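
The batch-versus-latency trade-off of folding can be sketched with a back-of-the-envelope throughput model (ours; the reconfiguration and per-sample times are made-up parameters, not measurements):

```python
# Toy throughput model for a folded streaming accelerator: the FPGA is
# reconfigured between layers, so large batches are needed to amortize it.
def folded_throughput(batch, layers=5, t_reconf=0.2, t_sample=0.001):
    """Samples per second for one pass over all layers (illustrative numbers)."""
    total = layers * (t_reconf + batch * t_sample)
    return batch / total

for batch in (1, 10, 100, 1000, 10000):
    print(f"batch={batch:6d}  throughput={folded_throughput(batch):8.1f} samples/s")
```

With these assumed numbers, throughput approaches its ceiling of 1/(layers x t_sample) only for large batches, while the latency of a batch grows proportionally, which is the behavior described above.
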
VII. CONCLUSION

The design of a CNN accelerator is very important for improving the performance of a system. The accelerator design reduces the load on the CPU/GPU and improves the efficiency of the system. A hardware accelerator is able to meet the demands in terms of speed and power through a careful analysis of the possible parallelization inside the CNN algorithm.

REFERENCES

[1]. Xiaocong Lian, Zhenyu Liu, Zhourui Song, Jiwu Dai, Wei Zhou, and Xiangyang Ji, "High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 8, August 2019.
[2]. M. S. Kavitha and P. Rangarajan, "An Efficient FPGA Architecture for Reconfigurable FFT Processor Incorporating an Integration of an Improved CORDIC and Radix-2r Algorithm", Circuits, Systems, and Signal Processing, https://doi.org/10.1007/s00034-020-01436-4.
[3]. Sheping Zhai, Cheng Qiu, Yuanyuan Yang, Jing Li and Yiming Cui, "Design of Convolutional Neural Network Based on FPGA", CISAT 2018.
[4]. Xiaofeng Chen, Jingyu Ji, Shaohui Mei, Yifan Zhang, Manli Han and Qian Du, "FPGA Based Implementation of Convolutional Neural Network for Hyperspectral Classification", IGARSS, ©2018 IEEE.
[5]. Heekyung Kim and Ken Choi, "Low Power FPGA-SoC Design Techniques for CNN-based Object Detection Accelerator", ©2019 IEEE.
[6]. Bing Liu, Danyin Zou, Lei Feng, Shou Feng, Ping Fu and Junbao Li, "An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution", Electronics 2019, 8, 281; doi:10.3390/electronics8030281.
[7]. Yuchi Tian et al., "DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars", in Proceedings of the 40th International Conference on Software Engineering, ICSE '18, 2018, pp. 303–314.
[8]. C. Zhang et al., "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2015, pp. 161–170.
[9]. J. Qiu et al., "Going deeper with embedded FPGA platform for convolutional neural network," in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, 2016, pp. 26–35.
[10]. K. Guo et al., "Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 1, pp. 35–47, Jan. 2018.
[11]. H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou, and L. Wang, "A high performance FPGA-based accelerator for large-scale convolutional neural networks," in Proc. 26th Int. Conf. Field Program. Logic Appl., Aug./Sep. 2016, pp. 1–9.
