E3S Web of Conferences 229, 01024 (2021) https://doi.org/10.1051/e3sconf/202122901024
ICCSRE’2020
FPGA-based Hardware Acceleration for SVM Machine Learning
Algorithm
Jakjoud Fatimazahra1*, Hatim Anas2, and Abella Bouaaddi1
1Laboratory of Energy Engineering, Materials and Systems, National School of Applied Sciences, Ibn Zohr University, Agadir,
Morocco
2 National School of Applied Sciences, Cadi Ayyad University, Marrakech, Morocco
Abstract. Object recognition algorithms consume large amounts of computing power and memory, which degrades quality and performance, especially on large image datasets. In this paper, we propose a fruit/plant recognition algorithm and accelerate it on the PYNQ board in order to evaluate the execution time and the accuracy of the classifier.
1 Introduction
Object recognition algorithms require a high-performance hardware architecture to function properly. With the rapid development of image processing techniques, growing algorithm complexity and image resolution call for high-end hardware to perform image processing tasks. For this reason, image processing tasks are commonly executed on specialized processors such as DSPs. As the complexity of the functions increases, the number of parallel units must increase further, which becomes very expensive. Hardware acceleration is a suitable solution: it improves performance by using hardware architectures dedicated to parallel processing. Field Programmable Gate Arrays (FPGAs) offer such an architecture at a lower implementation cost. Recent FPGAs combine general-purpose logic resources (LUTs) and registers with specialized modules such as slices, memories and DSP multipliers.
In this paper, we present two object recognition approaches dedicated to plant and crop supervision, which recognize fruits among their leaves. Such a system is in great demand for fruit picking and for robots that supervise the growth of fruits and vegetables. Both approaches are based on the SVM classifier. In the first one, we execute the entire model on the Processing System (PS) part of the Zynq; in the second, we use the Programmable Logic (PL) of the Zynq to accelerate the model.

2 Related work
In this section, we provide an overview of related work on machine learning implemented on embedded systems and on hardware acceleration methods. Rong Xie et al. [1] present a quick edge detection method based on Zynq that uses the xfOpenCV library. Edge detection is accelerated by exploiting the parallel computing capabilities of the FPGA, which increases processing speed and reduces execution time; the algorithm accelerated with xfOpenCV shows an advantage in computation time over other environments. More recently, Tzanos et al. [2] presented a hardware accelerator for Naïve Bayes that outperforms embedded ARM processors: the proposed system reduces execution time by up to 16.8x in the training phase and by up to 14x in the prediction phase. The training implementation rests on three points: first, storing as much data as possible in local BRAM, which the FPGA accesses directly, to reduce data transfer latency; second, avoiding memory access bottlenecks by partitioning large arrays into multiple smaller arrays; and third, pipelining loops to maximize parallelism. In [3], a VLSI architecture of a Naïve Bayes classifier on FPGA is proposed for real-time classification of facial expressions; it operates at a frequency of 241.55 MHz. A framework that facilitates the implementation of deep learning algorithms on the PYNQ platform is proposed in [4]; it helps data scientists and hardware developers combine deep learning models with FPGA-based architectures.
Other works aim at reducing complexity. The PYNQ platform is used in [5] to improve the accuracy of binary-weight networks while keeping power consumption low. Zhang et al. [6] developed a binarized neural network framework to reduce FPGA hardware resource complexity.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(http://creativecommons.org/licenses/by/4.0/).
3 Proposed method
Our method applies a set of pre-processing operations before the classification phase. The database contains 5000 images split into two classes (2500 images of fruits and 2500 images of tree leaves). To normalize the image size before feature extraction, we resize the images to a standard size of 150x150x3. Then, using the Histogram of Oriented Gradients (HOG), we extract the image features to be classified. The last step consists of training the SVM until the classification model converges. To accelerate the algorithm, we develop an overlay dedicated to the image resizing stage and evaluate the execution time and the performance of the classifier with and without acceleration.
Fig.3. Classification SVM model (flow: Test image, Grayscale conversion, Resizer, HOG feature extractor, Load model, SVM classifier, Classes)
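As a software reference for the grayscale conversion and resizing stages described above, the following minimal sketch uses plain Python lists. The BT.601 luma weights and the nearest-neighbour sampling are assumptions, since the paper does not state which conversion or interpolation is used, and the function names are hypothetical.

```python
def to_grayscale(rgb):
    """Convert an H x W image of (r, g, b) tuples to an H x W list of luma values.

    Assumption: ITU-R BT.601 weights; the paper does not specify the conversion.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img, out_h, out_w):
    """Resize a 2-D image (list of rows) with nearest-neighbour sampling.

    Assumption: nearest-neighbour is used only for illustration.
    """
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Usage: bring a synthetic 300x300 RGB image to the 150x150 grid used in the paper.
image = [[(x % 256, y % 256, 0) for x in range(300)] for y in range(300)]
small = resize_nearest(to_grayscale(image), 150, 150)
```

A loop of this kind runs once per image on the ARM core, which is why the resizing stage is a natural candidate for offloading to the programmable logic.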
Fig.1. Dataset image examples
3.1 Feature Description
Feature description [7] represents an image by extracting its important information. HOG features are computed by describing the local distribution of edge orientations and the corresponding gradient magnitudes. This is done by defining two local units: the cell (8x8 pixels) and the block, which contains 2x2 cells, giving 16x16 pixels for each HOG block [8]. At each pixel located at (x,y), the orientation of the local image gradient is computed from the magnitude $m(x,y)$ and the direction $\theta(x,y)$:
$$m(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2} \qquad (1)$$
$$\theta(x,y) = \arctan\left(\frac{G_y(x,y)}{G_x(x,y)}\right) \qquad (2)$$
where $G_x$ and $G_y$ are the first-order Gaussian derivatives of the image.
For each pixel in the orientation image, a histogram of orientations is constructed over a cell. To generate the feature descriptor vector, all adjacent cells are grouped into normalized block histograms, and these blocks are concatenated to form the descriptor.
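The gradient and cell-histogram steps of Section 3.1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: central differences stand in for the Gaussian derivatives, orientations are unsigned and quantized into 9 bins without interpolation, and blocks slide by one cell. All function names are hypothetical.

```python
import math


def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation in degrees [0, 180).

    Assumption: central differences replace the Gaussian derivatives G_x, G_y.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp indices at the border so every pixel has two neighbours.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)                         # magnitude
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0   # direction
    return mag, ang


def cell_histogram(mag, ang, y0, x0, cell=8, bins=9):
    """Magnitude-weighted orientation histogram of one cell (no bin interpolation)."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            hist[int(ang[y][x] // width) % bins] += mag[y][x]
    return hist


def descriptor_length(size=150, cell=8, block=2, bins=9):
    """Descriptor length for a square image, assuming blocks slide by one cell."""
    cells = size // cell
    blocks = cells - block + 1
    return blocks * blocks * block * block * bins
```

Under these assumptions, a 150x150 image yields 18x18 whole cells, 17x17 overlapping blocks and a descriptor of 17 * 17 * 4 * 9 = 10404 values.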
Fig.2. Training SVM model (flow: Data Base, Grayscale conversion, Resizer, Feature Descriptor HOG, DB References, SVM Training, Save model)
3.2 Support vector machine
SVM is a successful classifier in supervised learning that shows high performance in object recognition applications. The main objective of the SVM [9] is to find the optimal hyperplane that maximizes the separation between the two classes in binary classification. On each side of the main hyperplane, a parallel hyperplane is constructed, and the algorithm searches for the separating hyperplane that maximizes the distance between these two secondary hyperplanes.
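To make the margin idea concrete, here is a small sketch of a linear SVM trained by full-batch subgradient descent on the regularised hinge loss. The paper does not describe its solver, kernel or hyperparameters, so the training procedure, the values of `lam`, `lr` and `epochs`, and the toy data are all assumptions chosen for illustration.

```python
import math


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * |w|^2 + mean(hinge loss). Labels are +1 / -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin or misclassified contribute.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return the class (+1 or -1) on whose side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def margin_width(w):
    """Distance between the secondary hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))


# Toy, linearly separable data standing in for HOG descriptors of two classes.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

Maximizing the distance `2 / |w|` between the two secondary hyperplanes is equivalent to minimizing the norm of `w`, which is what the regularisation term does here.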
Fig.4. SVM optimal hyperplane
3.3 Image resizer
We chose to normalize the size of the data because the database images are large. To minimize the execution time, we reduce the size of the images. Given the size of the database (5000 images), resizing all of them to 150x150x3 takes a long time. To solve this problem, we dedicated the execution of this stage to the Programmable Logic of the PYNQ.
Python Productivity for Xilinx Zynq (PYNQ) is an open-source project for designing embedded systems on the ZYNQ 7020 System on Chip (SoC) [5]. The PYNQ-Z1 is an FPGA board equipped with the PYNQ framework and a Zynq 7020, in which an FPGA and a processor are mounted on the same chip. Specifically, the Processing System (PS) consists of a dual-core ARM Cortex-A9 CPU with a built-in Ubuntu Linux operating system, and the Programmable Logic (PL) is an Artix-7 family FPGA that contains 13300 logic slices, 630 KB of fast block RAM and 220 DSP slices. The board also has 512 MB of DDR3 memory with a 16-bit bus.
PYNQ is well suited to hardware acceleration [4]: it includes Python drivers that expose an application programming interface (API) for FPGA bitstream download and data transmission. With Vivado, the programmable logic circuits are presented as hardware libraries called overlays.
Fig.5. ZYNQ7020 Architecture
The hardware architecture is designed to send data to the resizing IP via the DMA and to read the data back. All the IPs are developed with xfOpenCV, which contains a set of kernels and image processing primitives that replicate the functionality of the OpenCV library.
Fig. Block Diagram in Vivado

4 Result and Future Directions
After bitstream generation and overlay creation, we prepared the PYNQ software environment by configuring all the modules used while running the program. The SVM was trained on a host machine, which took 25 hours to reach convergence; the model was then tested on the ZYNQ. Table 1 shows the execution results of the two experiments. First, we ran the whole algorithm on the ARM, which gave an accuracy of 97.5% in 5.4 seconds. Then we accelerated the resizing stage: the accuracy remained 97.5% and the execution time dropped to 4.8 seconds.

Table 1. Classification result
Classifier                        Accuracy   Time (s)
HOG SVM classifier                97.5%      5.4
Accelerated HOG SVM classifier    97.5%      4.8

Our purpose was to find a solution that accelerates the execution of object recognition algorithms without losing classifier efficiency. This solution is only a first step towards accelerating the whole processing loop, and the PYNQ board is a promising platform for it. Thanks to its ease of use, we do not need to reprogram all the interaction interfaces; it is enough to design a hardware architecture that fits the general functionality of the solution.

5 Conclusion
In this paper, we propose an approach for accelerating the pre-processing stage of an object recognition algorithm on the PYNQ board. This solution helps us generate the interfaces required to interact with FPGA-based implementations of object recognition algorithms. The tool has been abstracted to a level at which someone with minimal knowledge of embedded electronics can develop a working model. The hardware IP cores are developed with Vivado HLS 2019 and the SVM classifier is programmed in Python. Both approaches gave an accuracy of
97.5%, but the execution time of the accelerated classifier was reduced by 11.1% (0.6 seconds). In future work, we intend to accelerate the feature descriptor and the classification model.
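The reported reduction can be checked from the two timings in Table 1 (5.4 s on the ARM alone, 4.8 s with the accelerated resizer). The short sketch below only reproduces that arithmetic; the function name is hypothetical, and the speedup factor is derived from the same two numbers rather than stated in the paper.

```python
def time_reduction(baseline_s, accelerated_s):
    """Return (seconds saved, percent reduction, speedup factor)."""
    saved = baseline_s - accelerated_s
    return saved, 100.0 * saved / baseline_s, baseline_s / accelerated_s


saved, percent, speedup = time_reduction(5.4, 4.8)
print(f"saved {saved:.1f} s, reduction {percent:.1f}%, speedup {speedup:.3f}x")
# prints: saved 0.6 s, reduction 11.1%, speedup 1.125x
```

Since only the resizing stage is offloaded, the overall gain is bounded by the share of the run time that resizing represents, which is consistent with the stated plan to accelerate the feature descriptor and the classifier next.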
References
1. Rong Xie, Xiaoqin Feng, A method of quick edge detection based on Zynq, 3rd International Conference on Cloud Computing and Internet of Things, IEEE (2018)
2. Georgios Tzanos, Christoforos Kachris, Dimitrios Soudris, Hardware Acceleration on Gaussian Naive Bayes machine learning algorithm, International Conference on Modern Circuits and Systems Technologies (2019)
3. P. Chaudhary, M.K. Sharma, VLSI Hardware Architecture of Real Time Pattern Classification using Naive Bayes Classifier, Proceedings of the 2017 2nd International Conference on Multimedia Systems and Signal Processing, ICMSSP 2017
4. Luca Stornaiuolo, Marco D, Donatella Sciuto, On how to efficiently implement Deep Learning algorithms on PYNQ platform, Computer Society Annual Symposium on VLSI, IEEE (2018)
5. BNN-PYNQ, https://github.com/Xilinx/BNN-PYNQ/
6. C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, J. Cong, Optimizing FPGA-based accelerator design for deep convolutional neural networks, Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ACM, pp. 161-170 (2015)
7. Wei Zhou, Ling Zhang, Xin Lou, Histogram of Oriented Gradients Feature Extraction From Raw Bayer Pattern Images, IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 5 (May 2020)
8. Harihara Santosh Dadi, Gopala Krishna Mohan Pillutla, Improved Face Recognition Rate Using HOG Features and SVM Classifier, Journal of Electronics and Communication Engineering, ISSN: 2278-8735, vol. 11, issue 4, ver. I, pp. 34-44 (Jul-Aug 2016)
9. Samy Bakheet, An SVM Framework for Malignant Melanoma Detection Based on Optimized HOG Features, Computation 5, 4 (2017)