
A General Neural Network Hardware Architecture on FPGA

Yufeng Hao
Dept. of Electronic, Electrical and Systems Engineering
University of Birmingham, Edgbaston, Birmingham, B152TE, UK
Email: [email protected]
Abstract—Field Programmable Gate Arrays (FPGAs) play an increasingly important role in data sampling and processing industries due to their highly parallel architecture, low power consumption, and flexibility in implementing custom algorithms. In the artificial intelligence field in particular, training and deploying neural networks and machine learning algorithms demand energy-efficient hardware and massively parallel computing capacity. Therefore, many global companies have applied FPGAs to AI and machine learning fields such as autonomous driving and automatic spoken language recognition (Baidu) [1] [2] and Bing search (Microsoft) [3]. Considering the great potential of FPGAs in these fields, we implement a general neural network hardware architecture on the XILINX ZU9CG System On Chip (SOC) platform [4], which contains abundant hardware resources and powerful processing capacity. The general neural network architecture on the FPGA SOC platform can perform the forward and backward algorithms of deep neural networks (DNN) with high performance and can easily be adjusted to the type and scale of the neural network.

Index Terms—General Neural Network (GNN), Field Programmable Gate Arrays (FPGAs), System On Chip (SOC).

1 Introduction
The basic elements of an FPGA are Configurable Logic Blocks (CLBs) and interconnect resources [5]. Since FPGAs were first introduced about 30 years ago, these elements have made it flexible and convenient to implement interfaces such as I2C and SPI, as well as control circuits for specific requirements. With the development of integrated circuit (IC) technology, FPGAs have shown outstanding performance in the low-power, large-scale parallel computing domain. Compared to CPUs and GPUs, which are based on the Von Neumann or Harvard architecture, an FPGA offers a more flexible framework for implementing algorithms. Instructions and data in an FPGA can be organized more efficiently, without the constraints of a fixed architecture, which allows designers to explore high-performance implementation approaches in power- or compute-sensitive fields. Besides, although its clock frequency is lower than that of a CPU or GPU, an FPGA usually executes an operation within a few clock periods, which gives it a competitive advantage in real-time data processing and low-power design, considering that CPUs and GPUs need dozens of instructions to execute one operation and that higher frequency leads to higher power consumption. Meanwhile, thanks to its inherently parallel architecture, the FPGA shows powerful processing capacity in massive convolution, multiply-accumulate, and other matrix operations, which are essential in current neural network and machine learning algorithms. Therefore, applying FPGAs in these fields is indispensable for gaining cost and real-time computing advantages.

2 Objective
We will build a general neural network hardware architecture on an FPGA that excels in energy efficiency and real-time computation. Based on this architecture, different types and scales of neural networks can be implemented, and neural network training and deployment can be performed directly on the FPGA.

3 Hardware Architecture Implementation


3.1 Neural Networks Perspective
An artificial neural network consists of neuron cells, which are connected to each other and arranged in layers [6]. Basically, a neural network has only one input layer and one output layer but can hold many hidden layers. Each layer is composed of neuron cells. Each neuron cell can be regarded as a nonlinear transformation unit that holds a different weight as the multiplier for each input from the previous layer. The basic unit of a neuron cell is illustrated in Figure 3-1.
[Figure 3-1 omitted: a neuron cell with inputs X1, X2, X3, weights W1, W2, W3, and bias b, computing $f(\sum_i W_i X_i + b)$.]

Figure 3-1 Neuron cell unit
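As a concrete illustration of the computation in Figure 3-1, the following minimal Python sketch (an illustration only, not the hardware itself; the FPGA design uses the fixed-point mult-add and LUT-based activation units described in Section 3.2) evaluates $f(\sum_i W_i X_i + b)$ for one neuron cell with a tanh activation.

import math

def neuron_cell(x, w, b, activation=math.tanh):
    """Compute f(sum_i w_i * x_i + b) for a single neuron cell.

    x: inputs from the previous layer
    w: one weight per input
    b: bias term
    activation: the nonlinear transformation f (tanh here, matching the Tanh bank)
    """
    s = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum (multiply-accumulate)
    return activation(s)                          # nonlinear transformation

# Example with the three inputs shown in Figure 3-1
print(neuron_cell([0.5, -1.0, 0.25], [0.8, 0.1, -0.3], b=0.05))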


The system diagram of a general neural network is shown below as Figure 3-2. (The picture is from http://neuralnetworksanddeeplearning.com/chap5.html.)

Figure 3-2 General neural network system diagram


According to the number of hidden layers, a neural network can be classified as a Deep Neural Network (DNN), which usually has more than two layers, or as a shallow neural network. In terms of the direction of data flow, there are Recurrent Neural Networks (RNN), in which data flows between adjacent cells in the same layer, and general neural networks, in which there is no such connection. Judged by the operations the network executes, there is the Convolutional Neural Network (CNN), which mainly performs its function through convolution operations. The nonlinear transformation layers mentioned above are the core elements of all types of neural network, such as CNN, DNN, RNN, and so on. Essentially, it is the nonlinear transformation that gives a neural network the capacity to extract high-dimensional features from the original data set.
In fact, a neural network is a kind of data modeling algorithm with multiple layers and nonlinear transformations [7]. There are two main procedures in designing a neural network: the forward process and the backward process [8]. In the forward process, we define the neural network architecture, such as how many layers there are, how many cells each layer contains, and which kinds of cells we choose, and then the input data flows through to the output. In the backward process, we define a loss function to calculate the gap between the prediction values and the label values. By calculating the gradient along the backward path, we update the weights of the neural network. One forward process plus one backward process is called an epoch, and repeating these epochs constitutes the training of the neural network. Usually, a trained neural network model needs hundreds of epochs to become usable.
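To make the two procedures concrete, the following minimal NumPy sketch (a software illustration under our own assumptions, not the paper's fixed-point FPGA implementation) performs one forward and one backward pass for a network with a single tanh hidden layer and a softmax output, the configuration shown later in Figure 3-3.

import numpy as np

def train_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One forward and one backward pass for a one-hidden-layer network."""
    # Forward process: input -> hidden -> output
    s1 = W1 @ x + b1                 # hidden pre-activation (mult-add bank)
    m1 = np.tanh(s1)                 # hidden activation (Tanh bank)
    s2 = W2 @ m1 + b2                # output pre-activation
    e = np.exp(s2 - s2.max())
    y_hat = e / e.sum()              # softmax prediction

    # Backward process: cross-entropy loss gradient and weight update
    delta2 = y_hat - y                       # output-layer error (y_hat - y)
    dW2 = np.outer(delta2, m1)               # exterior product with hidden output
    delta1 = (W2.T @ delta2) * (1 - m1**2)   # backpropagate through tanh
    dW1 = np.outer(delta1, x)
    W2 -= lr * dW2; b2 -= lr * delta2
    W1 -= lr * dW1; b1 -= lr * delta1
    return y_hat

# Example: 4 inputs, 5 hidden cells, 3 output classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
print(train_step(np.ones(4), np.array([1.0, 0.0, 0.0]), W1, b1, W2, b2))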

3.2 General Neural Network Architecture Implementation


According to the general neural network structure shown in Figure 3-2, we implement the general framework of a neural network on the FPGA as shown in Figure 3-3, which contains forward propagation, backward propagation, and control modules to complete the training of the neural network. In the framework, we use highly reusable modules to perform the neural network matrix operations, which makes it easy to adjust the type or the number of hidden layers of the neural network.

[Figure 3-3 omitted: block diagram of the architecture. On the processing system (PS) side, external memory is accessed over AXI and feeds the weight buffers buffer0 and buffer1. On the programmable logic (PL) side, a mult-add bank writes the hidden pre-activations to the S1 RAM, a Tanh bank (Tanh LUT) produces Tanh(S1), a second mult-add bank with an exponential LUT and adders forms the softmax output stage, a label buffer supplies the labels for the (Ŷ-Y) residual, the accu and mult modules carry out the back propagation path over the S2 RAM, and a control unit sequences the whole design.]

Figure 3-3 General neural network hardware architecture on FPGA SOC


The general neural network hardware architecture framework is shown in Figure 3-3. The framework includes two processes: the forward process, which generates the prediction values, and the backward process, which updates the weights according to the loss values. The mult-add bank, RAM, and Tanh bank modules perform the forward propagation operations of one layer; they can be reused to adjust the neural network or to increase the number of hidden layers. The accu and mult modules perform the back propagation operations. We give a more detailed explanation of the hardware implementation framework in the following paragraphs.
For the forward process, there are three matrix operation stages: input layer to hidden layer, hidden layer to hidden layer, and hidden layer to output layer. In the first stage, the input vectors $X_i$ ($i = 1, 2, \ldots, T$) are multiplied by the first hidden layer weight matrix $W_{H_1 \times T}$ ($H_1$ is the dimension of the first hidden layer and $T$ is the length of the input vectors), which is loaded from buffer0 and buffer1. We use the mult-add bank to accomplish this matrix operation. The mult-add bank consists of many parallel multiplication and accumulation units, whose number can be adjusted according to the dimension of the hidden layer and the logic resources on the FPGA. The result matrix is then stored in the S1 RAM as the input of the activation function. Many activation functions can be chosen in practice, such as sigmoid, Rectified Linear Unit (ReLU), and Tanh. We implement the activation functions with a Lookup Table (LUT), into which different initialization parameters can be loaded in order to realize different activation functions. The output matrix of the first hidden layer ($M_1$) is stored in the Tanh(S1) RAM. In the second stage, $M_1$ is multiplied by the second hidden layer weight matrix $W_{H_1 \times H_2}$ ($H_2$ is the dimension of the second hidden layer), which is again implemented by a mult-add bank. In Figure 3-3 we only demonstrate a general neural network with one hidden layer; a multi-hidden-layer neural network can be implemented in hardware by reusing the mult-add bank, weight RAM, and Tanh bank units. The final stage of the forward process performs the matrix mapping from the last hidden layer to the output layer. According to the expression of the softmax function, $\sigma(z_j) = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$, we can similarly implement the exponential function with a LUT and then obtain the sum with addition units. Based on empirical values, we set a maximum limit on the sum (e.g. 1024) and then adjust the output according to the ratio of the sum to this maximum limit.
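The sketch below is a software model of one possible reading of this scheme; the LUT depth, input range, and exact normalization are assumptions, with only the limit value 1024 taken from the text. It shows a LUT-based exponential and a softmax output whose denominator is rescaled to the fixed maximum limit, so the final normalization divides by a known constant instead of a data-dependent sum.

import numpy as np

# Assumed LUT parameters; the paper does not specify depth or input range
LUT_DEPTH = 1024
X_MIN, X_MAX = -8.0, 0.0
EXP_LUT = np.exp(np.linspace(X_MIN, X_MAX, LUT_DEPTH))  # exp of non-positive inputs

def lut_exp(x):
    """Approximate exp(x) for x <= 0 by nearest-entry lookup in EXP_LUT."""
    idx = np.clip(((x - X_MIN) / (X_MAX - X_MIN) * (LUT_DEPTH - 1)).astype(int),
                  0, LUT_DEPTH - 1)
    return EXP_LUT[idx]

def softmax_with_limit(z, sum_limit=1024):
    """Softmax via an exponential LUT, with the sum rescaled to a fixed limit.

    Subtracting max(z) keeps the LUT inputs non-positive; rescaling the
    exponentials so they add up to sum_limit makes the final division a
    division by a fixed power of two rather than by a varying sum.
    """
    e = lut_exp(z - z.max())
    e_scaled = e * (sum_limit / e.sum())
    return e_scaled / sum_limit  # equivalent to e / e.sum()

print(softmax_with_limit(np.array([2.0, 1.0, 0.1])))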
For the backward process, we use the cross-entropy loss function [9] [10] to calculate the error derivatives in the back propagation algorithm. The details of the back propagation algorithm are described in [11]. The core step is to calculate the product of the error and the derivative of the activation function. At the implementation level, the back propagation update can be described as $\gamma \sum_{i=1}^{m} (\hat{y}_i - y_i) \otimes S_2$, where $(\hat{y}_i - y_i)$ is the disparity between the prediction value and the label, and $S_2$ is the input vector to the layer where the residual signals are generated in the back propagation path. The symbol $\otimes$ denotes the exterior (outer) product operation, which is performed by the mult module in the framework.
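As a small numeric illustration (a software sketch, not the hardware accu and mult modules themselves), the update above amounts to accumulating the outer product of each error vector (ŷ - y) with the corresponding layer input S2 over a batch and scaling by the learning rate γ.

import numpy as np

def output_layer_update(errors, inputs, gamma=0.01):
    """Accumulate gamma * sum_i (y_hat_i - y_i) (outer product) s2_i over a batch.

    errors: list of (y_hat - y) vectors, one per sample (accu path)
    inputs: list of corresponding layer-input vectors S2 (mult path)
    """
    dW = np.zeros((len(errors[0]), len(inputs[0])))
    for e, s2 in zip(errors, inputs):
        dW += np.outer(e, s2)  # exterior product of error and layer input
    return gamma * dW

# Example: two samples, three output classes, four hidden cells
err = [np.array([0.2, -0.1, -0.1]), np.array([-0.3, 0.2, 0.1])]
s2 = [np.ones(4), 0.5 * np.ones(4)]
print(output_layer_update(err, s2))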
The control unit at the bottom of the diagram controls the neural network training process; it provides the timing for reading and writing the weight buffers, performing the matrix operations, and storing the calculation results.

4 Conclusion
The whole neural network algorithm is implemented on the XILINX ZU9CG FPGA SOC platform, which contains abundant computing and memory resources (2520 DSP slices and 32 Mb of on-chip memory) and remarkable programmability (a dual-core ARM Cortex-A53). In the hardware framework, different deep neural networks can be implemented by reusing the forward propagation modules and slightly modifying the backward propagation modules. For a larger-scale deep neural network, a cluster of FPGAs can be integrated into a single platform to perform the algorithm. We can also deploy deep learning frameworks such as TensorFlow on this 64-bit FPGA SOC platform, calling the FPGA hardware resources directly. This will provide a real-time, energy-efficient, and rapidly deployable embedded solution for deep learning and machine learning applications.

REFERENCES
[1] S. Liu et al. Implementing a Cloud Platform for Autonomous Driving. arXiv:1704.02696, 2017.
[2] J. Ouyang et al. SDA: Software-Defined Accelerator for Large-Scale DNN Systems. Hot Chips 26, 2014.
[3] L. Wirbel. "Xilinx SDAccel: A Unified Development Environment for Tomorrow's Data Center". [Online] Available: https://www.xilinx.com/publications/prod_mktg/sdx/sdaccel-wp.pdf
[4] "Zynq UltraScale+ MPSoC Data Sheet: Overview". [Online] Available: https://www.xilinx.com/support/documentation/data_sheets/ds891-zynq-ultrascale-plus-overview.pdf
[5] U. Farooq et al. Tree Based Heterogeneous FPGA Architectures: Application Specific Exploration and Optimization. Springer, pp. 7-48, 2012.
[6] K. Gurney. An Introduction to Neural Networks. 1st edition, CRC Press, pp. 29-30, 1997.
[7] D. Svozil et al. Introduction to Multi-layer Feed-forward Neural Networks. Chemometrics and Intelligent Laboratory Systems, 39(1): 43-62, 1997.
[8] M. Mazur. "A Step by Step Backpropagation Example". [Online] Available: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
[9] G.E. Nasr et al. Cross Entropy Error Function in Neural Network: Forecasting Gasoline Demand. FLAIRS-02 Proceedings, American Association for Artificial Intelligence, pp. 381-384, 2002.
[10] S. Renals. "Multi-layer Neural Network". [Online] Available: https://www.inf.ed.ac.uk/teaching/courses/asr/2015-16/asrX1-nn.pdf
[11] M. Nielsen. "Neural Networks and Deep Learning". [Online] Available: http://neuralnetworksanddeeplearning.com/chap3.html
