FPGA Implementation of A Face Recognition System
Pavan K R
Dept. of E & C
Ramaiah Institute of Technology
Bangalore, India
[email protected]
Abstract: Machine and deep learning algorithms are now widely used for solving real-life problems. These algorithms are generally implemented on CPUs or GPUs, which were not designed to execute machine learning algorithms. While these algorithms are required for artificial intelligence-based systems, they are computationally intensive, with large power consumption and long execution times. Some applications also require functional safety, and GPUs are expected to meet the functional safety requirements, which is a time-consuming challenge for GPU designers. As an alternative, Field Programmable Gate Arrays (FPGAs) are preferred in avionics and defense applications where functional safety is a key factor. FPGA-based systems offer increased performance, lower power consumption, low latency and low implementation cost compared to CPUs and GPUs. They can also be configured with hardware meant for realizing machine learning algorithms, incorporating parallel execution and customized data types. The primary goal of this paper is to implement a face recognition algorithm on a Xilinx PYNQ Z2 FPGA and compare various performance parameters. The FPGA was able to perform face recognition in real time with an average precision of 91.6% and an overall accuracy of 92%.
Keywords: Machine learning, Face detection and recognition, FPGA implementation
I. INTRODUCTION
Machine and deep learning algorithms are now preferred for finding solutions to real-life problems, as they provide enhanced system intelligence. However, these algorithms have increased computational complexity together with large power consumption and execution time. GPUs have recently been employed for implementing machine learning algorithms, but neither CPUs nor GPUs are designed for this purpose. As an alternative, FPGAs have shown promise in power consumption and performance, and so are suitable for realizing machine learning algorithms. An FPGA offers a hardware-specific design that provides increased performance, lower power consumption, and decreased cost compared to a CPU or GPU implementation. In addition, FPGAs provide the ability to configure hardware specific to machine learning algorithms, including parallel execution.
II. EXISTING IMPLEMENTATIONS
In [1], the authors implement a multilayer perceptron (MLP) learning network for the MNIST database on a Cyclone IVE FPGA. The FPGA design with 8-bit precision provides accuracy and execution time similar to a 32-bit software solution, but at a clock frequency that is lower by a factor of 144. As power consumption is proportional to frequency, the FPGA implementation provides power savings without compromising on accuracy or performance. Accuracy reduces from 89% to 78% and area decreases by 41% when the precision is reduced to 4 bits. This provides a high-performance and low-power alternative to a traditional software implementation. In [2], the authors implement a CNN model on a Xilinx ZYNQ FPGA and provide a detailed architecture of the CNN model. The paper also discusses CNN compression and DNN accelerators, which help to compress the model parameters and reduce the number of operations. It presents techniques such as parallel execution, loop tiling, data reuse using internal memory, pipelining, and data sizing to minimize memory footprint for optimizing FPGA resource utilization.
A comparison of FPGAs and GPUs based on their performance is provided in [3]. The authors investigate whether FPGAs can beat GPUs in implementing next-generation DNN algorithms. The ResNet-50 CNN algorithm was implemented on Intel’s Stratix 10 FPGA and a Titan X GPU, and the Stratix 10 was found to be ~60% better than the Titan X in performance. The FPGA also provided a moderate speedup of 2.1x and an aggressive speedup of 3.5x over the Titan X. The Stratix 10 delivers much better improvements in performance/Watt over the Titan X, from 2.3x to 4.3x. In [4], the authors describe the implementation of a parallel convolution binarized neural network (BNN) algorithm on a PYNQ Z1 board. The main aim is to reduce the parameter size so that the parameters can be accommodated within the 4.9MB RAM of the PYNQ board. Using this BNN, the parameter size was reduced to 2.3MB, which is much smaller than the parameter size of a similar CNN implemented on a GPU. This model provided an accuracy of 86% on the CIFAR-10 database. One of the important aspects that this paper describes is the implementation of the BNN on the PYNQ board using the Vivado tool to synthesize and create the executable bit file, which comprises the network architecture used to configure the FPGA board. The model was uploaded onto the FPGA with the saved weights of the network.
In this paper, a face recognition system using Local Binary Pattern Histogram (LBPH) is implemented on a CPU and an FPGA. The algorithm used for face recognition is elaborated, and the performance of the two implementations is compared in terms of accuracy and execution time. The comparison shows the superiority of the FPGA over the CPU for implementing machine learning algorithms.
The paper is organized as follows: Section III describes the proposed implementation on a Xilinx PYNQ Z2 FPGA board. Section IV elaborates on the face recognition algorithm considered for implementation on an FPGA. Section V discusses the experimental results, and the paper is concluded with future work in Section VI.
978-1-6654-2849-1/21/$31.00 ©2021
Authorized licensed use limited to: National University of Singapore. Downloaded on August 26,2022 at 04:58:01 UTC from IEEE Xplore. Restrictions apply.
III. FPGA IMPLEMENTATION
Figure 1 represents the generalized block diagram of the proposed implementation. It is divided into two sections: software and hardware. The software section comprises data acquisition and training of the model. Data acquisition involves capturing images and videos from a visible-light camera. It also requires cleaning the images and choosing suitable images for the dataset. Once the dataset is prepared, a suitable machine learning algorithm is selected for training and testing on the host computer. The hardware consists of an FPGA board with an ARM processor core, and the features obtained from the trained model are ported onto the FPGA. The model is then tested again on the FPGA. The predictions made during this testing phase are used to evaluate the accuracy of the machine learning model. If the performance is not satisfactory, the model is trained again with suitable changes. This procedure is repeated until the desired accuracy is achieved, and the resulting model is the final model ported onto the FPGA.
... detected by a Haar feature with a set of two adjacent rectangles that lie above the eye and the cheek region. The position of these rectangles is defined relative to the location of a detection window that acts like a bounding box to the target face, as shown in Figure 2.
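The two-rectangle Haar feature over the eye and cheek regions described above is typically evaluated from an integral image, so that each rectangle sum costs four table lookups. The following is a minimal sketch in plain Python, not the paper's implementation: the function names, the vertical stacking of the two rectangles, and the sign convention (lower rectangle minus upper rectangle) are assumptions made for illustration.

```python
def integral_image(gray):
    """Return a (rows+1) x (cols+1) summed-area table for a 2-D list of pixels."""
    rows, cols = len(gray), len(gray[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += gray[r][c]
            # Sum of everything above this row plus the running sum of this row.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical two-rectangle feature: the upper rectangle stands for the
    (darker) eye region and the lower one for the (brighter) cheek region.
    Both rectangles are height x width; the sign convention is an assumption."""
    upper = rect_sum(ii, top, left, height, width)
    lower = rect_sum(ii, top + height, left, height, width)
    return lower - upper


if __name__ == "__main__":
    # Toy 4x4 window: dark upper half, bright lower half.
    window = [[0, 0, 0, 0],
              [0, 0, 0, 0],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
    ii = integral_image(window)
    print(two_rect_feature(ii, top=0, left=0, height=2, width=4))
```

A large positive value indicates that the lower rectangle is brighter than the upper one, which is the contrast pattern this kind of feature responds to.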
... face is detected. If the label is negative, the classification of this region is complete, and the detector slides the window to the next location. If the label is positive, the classifier passes the region to the next stage. The face is detected at the current window location when the final stage classifies the region as positive.
D. Local Binary Pattern Histogram (LBPH)
Local Binary Pattern (LBP) [7] is a simple and efficient texture operator that labels image pixels by thresholding the neighborhood of each pixel against the pixel itself; the result is a binary number. The Local Binary Pattern Histogram (LBPH) describes an image by histograms of these LBP codes. Combining LBP with the Histograms of Oriented Gradients (HOG) descriptor improves detection performance considerably on many datasets.
LBPH uses four parameters:
(i) Radius – the radius of the circular LBP.
(ii) Neighbors – the number of sample points in a circular LBP. This is generally set to 8.
(iii) Grid X – the number of cells in the horizontal direction. A larger Grid X implies a finer grid and a higher dimensionality of the resulting feature vector. This is also generally set to 8.
(iv) Grid Y – the number of cells in the vertical direction. This is also generally set to 8.
E. Detection and Recognition Algorithm
To train the algorithm, a dataset with facial images is considered. An id or label is set for each image so that the algorithm can use this information to recognize an input or test image. The first computational step of LBPH is to create an intermediate image by segmenting only the face. The algorithm uses a sliding window based on the radius and neighbors parameters. The value of the central pixel is used as the threshold: each neighbor is assigned a binary value of 1 if its pixel value is equal to or higher than the threshold and 0 if it is lower. These binary values are concatenated into a vector and converted to a decimal value. The Grid X and Grid Y parameters are used to divide the image into multiple cells. For a grid of size 8 x 8 and a histogram with 256 gray scale values, a histogram feature vector (HFV) of length 8 x 8 x 256 = 16,384 is obtained.
Each image in the training dataset is represented by an HFV. For a given input image, the same steps are performed and an HFV representing the input image is created. The input HFV is compared with the training HFVs, and the input image is recognized with the label of the closest HFV in terms of Euclidean distance. The calculated Euclidean distance is also used as a measure of the confidence of the recognition. The threshold and confidence can be used for estimating the performance of the algorithm automatically.
V. EXPERIMENTAL RESULTS
... system. LBPH considers facial data and returns the appropriate ids of the faces. The trained recognition information with the ids is loaded onto the FPGA board.
In the final phase, the recognition phase, a test face is captured through the camera. If the person's face is in the dataset, the recognizer returns its id and an index indicating the confidence of the recognizer in this match; otherwise it returns an unknown face label. The face is detected using the Haar cascade frontal face algorithm. LBPH is used for creating the HFV for recognition. The algorithm finds the match for the gray scale image of the captured portion of the face and returns the label or id of its probable owner and the confidence of the recognizer in this match.
A. Recognition Algorithm Workflow
The workflow of the face recognition system is as follows:
• Face Detection – Pre-processing images and creating a training dataset with Haar cascades
• Feature Extraction – Haar Features, Integral Images, Cascade Classifier and Training model
• Face Recognition – Testing in real-time with HFV based face recognizer
B. Results and Discussion
The camera is interfaced either with the FPGA board through Jupyter Notebook or with the host computer through the PyCharm IDE. The required dataset is collected and stored with the respective class ids by capturing the data through the interfaced camera, detecting the face and saving the image in the corresponding folder. Figure 3 shows an example folder of the dataset stored with class id = 1. By default, class id = 0 is considered as ‘unknown’ and faces to be recognized are saved as class id = 1, 2, 3... Approximately 1000 images per class are collected.
Fig. 3: Sample Dataset of class id = 1
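The LBPH steps of Section IV-E (thresholding each neighborhood against its central pixel, building per-cell histograms into an HFV, and matching by Euclidean distance) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the paper: it samples the 8 neighbors on a square ring of radius 1 instead of an interpolated circle, the bit ordering is arbitrary, and `max_distance` is a hypothetical threshold for returning the 'unknown' class id 0.

```python
import math

# Assumption: 8 neighbours at radius 1, visited clockwise from the top-left.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]
UNKNOWN_ID = 0  # the paper reserves class id 0 for 'unknown'


def lbp_code(img, r, c):
    """Threshold the 8 neighbours of (r, c) against the central pixel."""
    center = img[r][c]
    code = 0
    for dr, dc in NEIGHBOUR_OFFSETS:
        bit = 1 if img[r + dr][c + dc] >= center else 0
        code = (code << 1) | bit  # concatenate bits into one 8-bit value
    return code


def lbp_image(img):
    """LBP codes for every pixel that has a full neighbourhood."""
    rows, cols = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


def histogram_feature_vector(lbp, grid_x=8, grid_y=8):
    """Concatenate one 256-bin histogram per grid cell into the HFV."""
    rows, cols = len(lbp), len(lbp[0])
    hfv = []
    for gy in range(grid_y):
        r0, r1 = gy * rows // grid_y, (gy + 1) * rows // grid_y
        for gx in range(grid_x):
            c0, c1 = gx * cols // grid_x, (gx + 1) * cols // grid_x
            hist = [0] * 256
            for r in range(r0, r1):
                for c in range(c0, c1):
                    hist[lbp[r][c]] += 1
            hfv.extend(hist)
    return hfv


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(query, train_hfvs, train_ids, max_distance):
    """Return (id, distance) of the closest training HFV, or the unknown id
    when even the closest match is farther than the hypothetical threshold."""
    distances = [euclidean(query, t) for t in train_hfvs]
    best = min(range(len(distances)), key=distances.__getitem__)
    if distances[best] > max_distance:
        return UNKNOWN_ID, distances[best]
    return train_ids[best], distances[best]
```

With the default 8 x 8 grid, `histogram_feature_vector` returns a list of 8 x 8 x 256 = 16,384 counts, matching the HFV length stated in the text. A smaller distance means a closer match, so the distance acts as an inverse confidence.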
C. Experiment Results
a) Confusion Matrix: The confusion matrix for the face recognition model implemented on the FPGA, considering 36 test images of 3 classes, is shown in Table 1.
                 Predicted Class
True Class       1      2      3
    1           11      0      2
    2            0     13      1
    3            0      0      9
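The accuracy and precision figures can be recomputed directly from the confusion matrix in Table 1. The sketch below assumes that rows are true classes and columns are predicted classes (as in the table), and that the "average precision" quoted in the abstract is the unweighted mean of the per-class precisions; the function names are illustrative.

```python
# Confusion matrix from Table 1: rows = true class, columns = predicted class.
CONFUSION = [
    [11, 0, 2],
    [0, 13, 1],
    [0, 0, 9],
]


def accuracy(cm):
    """Fraction of test images on the diagonal (correctly recognized)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_precision(cm):
    """For each class: correct predictions / all predictions of that class."""
    n = len(cm)
    result = []
    for j in range(n):
        predicted = sum(cm[i][j] for i in range(n))
        result.append(cm[j][j] / predicted if predicted else 0.0)
    return result


def per_class_recall(cm):
    """For each class: correct predictions / all true samples of that class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def macro_average(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    print("accuracy:", round(accuracy(CONFUSION), 4))
    print("precision per class:", per_class_precision(CONFUSION))
    print("macro precision:", round(macro_average(per_class_precision(CONFUSION)), 4))
    print("recall per class:", [round(v, 4) for v in per_class_recall(CONFUSION)])
```

Under these assumptions, 33 of the 36 test images lie on the diagonal, so the accuracy is 33/36 (about 0.917), and the per-class precisions are 1.0, 1.0 and 0.75, whose mean is also about 0.917. Both values are consistent with the figures reported in the abstract.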
VI. CONCLUSION
This paper aims at harnessing the power of FPGAs in the field of artificial intelligence and machine learning. FPGAs have several advantages over CPUs and GPUs: they offer increased performance, lower power consumption, low latency and low implementation cost. In addition, they provide the ability to configure hardware specific to machine learning algorithms, including parallel execution and customized data types.
Some of these advantages are evident from the face recognition algorithm implemented in this paper. The implementation of LBPH features for image classification uses a binary data type (a custom data type). This reduces the use of resources in the FPGA and adds to the already impressive speed of the FPGA. The results show that the face recognition implementation on the FPGA is 1000 times faster than on the CPU. Implementing a face recognition model on an FPGA not only helps in quick detection of images but also helps in parallel processing of images obtained from multiple camera sources. This comes in handy in a traffic surveillance system, where images are fed from different cameras installed in each lane.
Even though FPGAs have the advantages mentioned above, they lack the support needed for implementing machine learning models. Due to their complex nature, these models require a huge amount of computational resources, and programming FPGAs for these complex algorithms from scratch is a challenging task. To scale this algorithm to large datasets, a binary model may be loaded into the RAM, thereby increasing the speed with a slight degradation in accuracy.
In future, other machine learning algorithms similar to face recognition can be considered for implementation on an FPGA. The speed of the face recognition model can be enhanced by employing hardware accelerators. A power consumption comparison can be made between the CPU and the FPGA using an external power meter. Finally, images can be processed in parallel by capturing data from multiple cameras.
REFERENCES
[1] Jiong Si, Sarah L. Harris, “Handwritten Digit Recognition System on an FPGA,” IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), pp. 402-407, January 2018.
[2] Ahmad Shawahna, Sadiq M. Sait, Aiman El-Maleh, “FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review,” in IEEE Access, vol. 7, pp. 7823-7859, 2019.
[3] Eriko Nurvitadhi, Ganesh Venkatesh, Jaewoong Sim, Debbie Marr, Randy Huang, Jason Gee Hock Ong, Yeong Tat Liew, Krishnan Srivatsan, Duncan Moss, Suchit Subhaschandra, Guy Boudoukh, “Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?” FPGA 17: The 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2017.
[4] Li Yang, Zhezhi He, Deliang Fan, “A Fully On-chip Binarized Convolutional Neural Network FPGA Implementation with Accurate Inference,” ISLPED 18: Proceedings of the International Symposium on Low Power Electronics and Design, July 2018.
[5] Paul Viola, Michael Jones, “Robust Real-Time Object Detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[6] Yoav Freund, Robert E. Schapire, “A Decision Theoretic Generalization of On-line Learning and an Application to Boosting,” AT&T Bell Laboratories, September 1995.
[7] T. Ojala, M. Pietikäinen, D. Harwood, “Performance Evaluation of Texture Measures with Classification based on Kullback Discrimination of Distributions,” Proceedings of the 12th IAPR International Conference on Pattern Recognition (ICPR 1994), vol. 1, pp. 582-585, October 1994.