A General Neural Network Hardware Architecture on FPGA
Yufeng Hao
Dept. of Electronic, Electrical and
Systems Engineering
University of Birmingham, Edgbaston,
Birmingham, B15 2TE, UK
Email: [email protected]
Abstract—Field Programmable Gate Arrays (FPGAs) play an increasingly important role in data sampling and processing industries due to their highly parallel architecture, low power consumption, and flexibility in implementing custom algorithms. In the artificial intelligence field in particular, training and deploying neural networks and machine learning algorithms demand energy-efficient hardware and massively parallel computing capacity. Accordingly, many global companies have applied FPGAs to AI and machine learning applications such as autonomous driving and Automatic Spoken Language Recognition (Baidu) [1][2] and Bing search (Microsoft) [3]. Considering the great potential of FPGAs in these fields, we implement a general neural network hardware architecture on the Xilinx ZU9CG System on Chip (SoC) platform [4], which provides abundant hardware resources and powerful processing capacity. The general neural network architecture on the FPGA SoC platform can perform the forward and backward algorithms of deep neural networks (DNNs) with high performance, and can easily be adjusted to the type and scale of the neural network.
1 Introduction
The basic elements of an FPGA are Configurable Logic Blocks (CLBs) and interconnect resources [5]. Since the FPGA was first introduced about 30 years ago, these elements have made it flexible and convenient to implement interfaces such as I2C and SPI, as well as control circuits for specific requirements. With the development of integrated circuit (IC) technology, FPGAs have achieved outstanding performance in the low-power, large-scale parallel computing domain. Compared to CPUs and GPUs, which are based on the Von Neumann or Harvard architecture, an FPGA offers a more flexible framework for implementing algorithms. Instructions and data in an FPGA can be organized in a more efficient way, free of the constraints of a fixed architecture, which lets designers explore high-performance implementation approaches in power- or computation-sensitive fields. Moreover, although its clock frequency is lower than that of a CPU or GPU, an FPGA usually completes an operation within a few clock cycles, whereas a CPU or GPU needs dozens of instructions to execute one operation and a higher clock frequency leads to higher power consumption; this gives the FPGA a competitive advantage in real-time data processing and low-power design. Meanwhile, owing to its inherently parallel architecture, the FPGA shows powerful processing capacity for massive convolution, multiply-accumulation, and other matrix operations, which are essential in current neural network and machine learning algorithms. Therefore, applying FPGAs in these fields yields clear cost and real-time computing advantages.
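To illustrate why multiply-accumulation maps so naturally onto FPGA fabric, the following HLS-style C++ sketch shows a dot-product kernel whose loop is pipelined so that one multiply-accumulate can complete per clock cycle. The pragma, vector length, and function name are illustrative assumptions, not part of the architecture described in this paper.

```cpp
#include <cstddef>

// Illustrative HLS-style dot-product kernel: with the pipeline pragma,
// synthesis tools can schedule one multiply-accumulate per clock cycle,
// and many such kernels can run in parallel across the fabric.
constexpr std::size_t N = 256;  // assumed vector length

float dot_product(const float x[N], const float w[N]) {
    float acc = 0.0f;
dot:
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        acc += x[i] * w[i];  // one MAC, mapped onto a DSP slice in hardware
    }
    return acc;
}
```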
2 Objective
We will build a general neural network hardware architecture on an FPGA that delivers superior energy efficiency and real-time computation. Based on this architecture, neural networks of different types and scales can be implemented, and neural network training and deployment can be performed directly on the FPGA.
3 Hardware Architecture

[Figure 1: General neural network hardware architecture on the FPGA SoC. The processing system (PS) exchanges data with external memory and, over AXI interfaces, with the programmable logic (PL). The PL contains weight buffers (buffer0/buffer1), input and output RAMs, a mult-add bank that computes the weighted sums S1 and S2, a LUT-based Tanh bank producing Tanh(S1) for the hidden layer, an Exp LUT and softmax stage, a label buffer, an (Ŷ−Y) residual block with mult and accumulate-RAM modules on the back-propagation path, and a control unit.]
The (Ŷ−Y) block computes the difference between the prediction value and the label. S2 is the input vector to the layer where the residual signals are generated on the back-propagation path. The symbol × denotes the outer (exterior) product operation, which can be performed by the mult module in the framework.
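As a minimal sketch of the arithmetic these blocks realize, assuming the softmax output layer is trained with a cross-entropy loss as in [9][11] (the learning rate η and the update rule below are standard textbook forms, not taken from this paper):

```latex
% Output residual produced by the (Yhat - Y) block:
\begin{align}
  \delta &= \hat{Y} - Y,\\
% Weight gradient as an outer product with the layer input S_2
% (computed by the mult module):
  \frac{\partial E}{\partial W} &= \delta \otimes S_2 = \delta\, S_2^{\top},\\
% Gradient-descent update with learning rate \eta:
  W &\leftarrow W - \eta\, \delta\, S_2^{\top}.
\end{align}
```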
The control unit at the bottom of the diagram controls the neural network training process, providing the timing to read and write the weight buffers, perform the matrix operations, and store the calculation results.
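For concreteness, the following C++ sketch models the forward dataflow the diagram describes. The function shape, layer sizes, and the use of std::tanh and std::exp in place of the hardware Tanh and Exp LUTs are illustrative assumptions, not the actual hardware implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative model of the forward dataflow in Figure 1:
// S1 = W1 * x (mult-add bank), hidden = Tanh(S1) (Tanh bank),
// S2 = W2 * hidden (mult-add bank), Yhat = softmax(S2) (Exp LUT).
std::vector<float> forward(const std::vector<std::vector<float>>& W1,
                           const std::vector<std::vector<float>>& W2,
                           const std::vector<float>& x) {
    std::vector<float> hidden(W1.size());
    for (std::size_t i = 0; i < W1.size(); ++i) {
        float s1 = 0.0f;                         // mult-add bank
        for (std::size_t j = 0; j < x.size(); ++j) s1 += W1[i][j] * x[j];
        hidden[i] = std::tanh(s1);               // Tanh bank (LUT in hardware)
    }
    std::vector<float> yhat(W2.size());
    float norm = 0.0f;
    for (std::size_t i = 0; i < W2.size(); ++i) {
        float s2 = 0.0f;                         // mult-add bank
        for (std::size_t j = 0; j < hidden.size(); ++j) s2 += W2[i][j] * hidden[j];
        yhat[i] = std::exp(s2);                  // Exp LUT
        norm += yhat[i];
    }
    for (float& y : yhat) y /= norm;             // softmax normalization
    return yhat;
}
```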
4 Conclusion
The whole neural network algorithms is implemented on the XILINX ZU9CG FPGA
SOC platform, which contains abundant computing memory resources (2520 DSPs and
32Mb on-chip memory) and remarkable programming ability (Dual-core ARM Cortex
A53). In the hardware framework, different deep neural networks can be implemented
by reusing the forward propagation modules and slightly modifying the backward
propagation modules. For a larger scale deep neural network, a cluster of FPGA can be
integrated into a whole platform to perform the algorithm. We can also deploy deep
learning framework tools such as TensorFLow into this 64-bit FPGA SOC platform,
calling the FPGA hardware resources directly. This will provide a real-time, high energy
efficiency, and fast deployment embedded solution for deep learning and machine
learning applications.
REFERENCES
[1] S. Liu et al. Implementing a Cloud Platform for Autonomous Driving. arXiv:1704.02696, 2017.
[2] J. Ouyang et al. SDA: Software-Defined Accelerator for Large-Scale DNN Systems. Hot Chips 26, 2014.
[3] L. Wirbel. "Xilinx SDAccel: A Unified Development Environment for Tomorrow's Data Center". [Online] Available: https://www.xilinx.com/publications/prod_mktg/sdx/sdaccel-wp.pdf
[4] "Zynq UltraScale+ MPSoC Data Sheet: Overview". [Online] Available: https://www.xilinx.com/support/documentation/data_sheets/ds891-zynq-ultrascale-plus-overview.pdf
[5] U. Farooq et al. Tree-Based Heterogeneous FPGA Architectures: Application Specific Exploration and Optimization. Springer, pp. 7-48, 2012.
[6] K. Gurney. An Introduction to Neural Networks. 1st edition, CRC Press, pp. 29-30, 1997.
[7] D. Svozil et al. Introduction to Multi-layer Feed-forward Neural Networks. Chemometrics and Intelligent Laboratory Systems, 39(1): 43-62, 1997.
[8] M. Mazur. "A Step by Step Backpropagation Example". [Online] Available: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
[9] G. E. Nasr et al. Cross Entropy Error Function in Neural Network: Forecasting Gasoline Demand. FLAIRS-02 Proceedings, American Association for Artificial Intelligence, pp. 381-384, 2002.
[10] S. Renals. "Multi-layer Neural Network". [Online] Available: https://www.inf.ed.ac.uk/teaching/courses/asr/2015-16/asrX1-nn.pdf
[11] M. Nielsen. "Neural Networks and Deep Learning". [Online] Available: http://neuralnetworksanddeeplearning.com/chap3.html