High Performance FPGA Based CNN Accelerator
ISSN No:-2456-2165
Abstract:- Over the years, convolutional neural networks (CNNs) have been used in many applications owing to their ability to perform tasks with fewer parameters than other deep-learning methods. However, the power and memory budgets of embedded and portable platforms, which are often marginal, conflict with accuracy and latency requirements. For these reasons, commercial hardware accelerators have become popular, and their designs follow the trends of mainstream convolutional network models. Nevertheless, a field-programmable gate array (FPGA) implementation remains attractive because it offers the opportunity to tailor the hardware design to a particular convolutional network model, with promising results in terms of latency and power consumption. In this article, we propose a complete FPGA hardware accelerator for convolutional neural network inference, designed for a keyword-recognition system.

I. INTRODUCTION

To address the expansion of this "big data", the answer is found in Artificial Intelligence (AI). We can define it as a software or hardware application that thinks and solves problems as a human being can, on problems ranging from language translation to image segmentation to recognizing different faces and people. What we have achieved so far is narrow AI, which uses specific algorithms and techniques to solve specific problems.

Since neural networks are inherently parallel, they can benefit significantly from FPGAs (Field Programmable Gate Arrays). FPGA implementations have been shown to consume significantly less power per operation than equivalent GPU (Graphics Processing Unit) implementations, which is a key requirement for embedded systems. However, implementation is no small feat, because FPGA development is typically done in a hardware description language such as VHDL.
Similarly, microcontroller-based systems offer the worst trade-off between power consumption and timing performance [8]. For this reason, commercial hardware accelerators for CNNs such as the Neural Compute Stick (NCS) [9], the Neural Compute Stick 2 (NCS2) [9], and Google Coral [10] were produced. Such products feature optimized hardware architectures that realize inference of CNN models with low latency and reduced power consumption. Standard communication protocols, such as Universal Serial Bus (USB) 3.0, are generally exploited for communication

The hardware model implements the CNN step by step in a hardware description language, including the CNN's compute architecture, multi-layer execution, weight-loading system, and data inference [4].

Existing low-power register-transfer-level (RTL) strategies could serve as a low-power design scheme to accelerate a CNN-based object-recognition system, in contrast to conventional approaches. Many of the most effective design strategies for CNN acceleration focus on
REFERENCES