FPGA Implementation of a Face Recognition System

S. Sethu Selvi, Professor, Dept. of E & C, Ramaiah Institute of Technology, Bangalore, India, [email protected]
Bharanidharan D, Dept. of E & C, Ramaiah Institute of Technology, Bangalore, India, [email protected]
Abdul Qadir, Dept. of E & C, Ramaiah Institute of Technology, Bangalore, India, [email protected]
Pavan K R, Dept. of E & C, Ramaiah Institute of Technology, Bangalore, India, [email protected]

2021 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) | 978-1-6654-2849-1/21/$31.00 ©2021 IEEE | DOI: 10.1109/CONECCT52877.2021.9622348

Abstract: Machine and deep learning algorithms are now widely used to solve real-life problems. They are generally implemented on CPUs or GPUs, which are not designed specifically to execute machine learning algorithms. Although these algorithms are required for artificial intelligence-based systems, they are computationally intensive, with large power consumption and execution time. Some applications also require functional safety, and meeting functional safety requirements is a time-consuming challenge for GPU designers. As an alternative, Field Programmable Gate Arrays (FPGAs) are preferred in avionics and defense applications where functional safety is a key factor. FPGA-based systems offer increased performance, lower power consumption, low latency and low implementation cost compared to CPUs and GPUs. They also allow hardware to be configured specifically for machine learning algorithms, incorporating parallel execution and customized data types. The primary goal of this paper is to implement a face recognition algorithm on a Xilinx PYNQ Z2 FPGA and compare various performance parameters. The FPGA was able to perform face recognition in real time with an average precision of 91.6% and an overall accuracy of 92%.

Keywords: Machine learning, Face detection and recognition, FPGA implementation

I. INTRODUCTION

Machine and deep learning algorithms are now preferred for solving real-life problems because they provide enhanced system intelligence. However, these algorithms have increased computational complexity together with large power consumption and execution time. GPUs have recently been employed for implementing machine learning algorithms, but neither CPUs nor GPUs are designed for this purpose. As an alternative, FPGAs have shown promise in power consumption and performance, and are therefore suitable for realizing machine learning algorithms. An FPGA offers a hardware-specific design that provides increased performance, lower power consumption and decreased cost compared to a CPU or GPU implementation. In addition, FPGAs provide the ability to configure hardware specifically for machine learning algorithms, including parallel execution.

II. EXISTING IMPLEMENTATIONS

In [1], the authors implement a multilayer perceptron (MLP) learning network for the MNIST database on a Cyclone IVE FPGA. The FPGA design with 8-bit precision provides accuracy and execution time similar to a 32-bit software solution, but at a clock frequency 144 times lower. As power consumption is proportional to frequency, the FPGA implementation provides power savings without compromising on accuracy or performance. Accuracy reduces from 89% to 78% and area decreases by 41% when the precision is reduced to 4 bits. This provides a high-performance and low-power alternative to a traditional software implementation. In [2], the authors implement a CNN model on a Xilinx ZYNQ FPGA and provide a detailed architecture of the CNN model. The paper also discusses CNN compression and DNN accelerators, which help to compress the model parameters and reduce the number of operations. It presents techniques such as parallel execution, loop tiling, data reuse using internal memory, pipelining, and data sizing to minimize the memory footprint, all aimed at optimizing FPGA resource utilization.

A comparison of FPGAs and GPUs based on their performance is provided in [3]. The authors examine whether FPGAs can beat GPUs in implementing next-generation DNN algorithms. The ResNet-50 CNN was implemented on Intel's Stratix 10 FPGA and on a Titan X GPU, and the Stratix 10 was reported to perform about 60% better than the Titan X. The FPGA also provided a moderate speedup of 2.1x and an aggressive speedup of 3.5x over the Titan X. The Stratix 10 delivers improvements in performance/Watt over the Titan X ranging from 2.3x to 4.3x. In [4], the authors describe the implementation of a parallel-convolution binarized neural network (BNN) on a PYNQ Z1 board. The main aim is to reduce the parameter size so that the parameters can be accommodated within the 4.9MB RAM of the PYNQ board. Using the BNN, the parameter size was reduced to 2.3MB, which is much smaller than the parameter size of a similar CNN implemented on a GPU. This model provided an accuracy of 86% on the CIFAR-10 database. An important aspect described in that paper is the implementation of the BNN on the PYNQ board using the Vivado tool to synthesize and create the executable bit file, which comprises the network architecture used to configure the FPGA board. The model was uploaded onto the FPGA with the saved weights of the network.

In this paper, a face recognition system using Local Binary Pattern Histogram (LBPH) is implemented on a CPU and an FPGA. The algorithm used for face recognition is elaborated, and the performance of the two implementations is compared in terms of accuracy and execution time. The comparison demonstrates the advantage of the FPGA over the CPU for implementing machine learning algorithms.

The paper is organized as follows: Section III describes the proposed implementation on a Xilinx PYNQ Z2 FPGA board. Section IV elaborates on the face recognition algorithm considered for implementation on an FPGA. Section V discusses the experimental results, and Section VI concludes the paper and outlines future work.

Authorized licensed use limited to: National University of Singapore. Downloaded on August 26,2022 at 04:58:01 UTC from IEEE Xplore. Restrictions apply.
III. FPGA IMPLEMENTATION

Figure 1 shows the generalized block diagram of the proposed implementation. It is divided into two sections: Software and Hardware. The software section comprises data acquisition and training of the model. Data acquisition involves capturing images and videos from a visible-light camera. It also requires cleaning the images and choosing suitable images for the dataset. Once the dataset is prepared, a suitable machine learning algorithm is selected for training and testing on the host computer. The hardware consists of an FPGA board with an ARM processor core, and the features obtained from the trained model are ported onto the FPGA. The proposed model is tested again on the FPGA. The predictions made during the testing phase on the FPGA are used to evaluate the accuracy of the machine learning model. If the performance is not satisfactory, the model is trained again with suitable changes. This procedure is repeated until the desired accuracy is achieved, and the resulting model is the final model ported onto the FPGA.

Fig. 1: Generalized Block Diagram

The facial features are detected in the images of the dataset using the Haar cascade algorithm, and the training is implemented using LBPH features on the host computer. This results in a .yml weights file containing the histograms, which is stored in the FPGA external storage. The HDMI and webcam drivers of the FPGA are initialized from the Jupyter Notebook through Python code. The input test image is captured, the facial features are detected by the Haar cascade, and the image is recognized using LBPH, which provides the class id and confidence (histogram distance) as output. The class id specifies the closest match, and the confidence is used to calculate the accuracy.

IV. FACE DETECTION AND RECOGNITION

A. Haar Cascade Algorithm

A Haar cascade, which can identify various objects in an image or video based on the features proposed in [5], is used for face detection. In this algorithm a cascade function is trained using many positive and negative images. It is then used to detect faces, the object of interest, in other images. The algorithm has four stages: (a) Haar feature selection (b) Creating integral images (c) AdaBoost training (d) Cascading classifiers

A Haar feature is calculated by summing the pixel intensities in adjacent rectangular regions at a specific location of a detection window and calculating the difference between these sums. This difference is then used to categorize subsections of an image. For example, in a human face the eyes are darker than the cheeks, which can be detected by a Haar feature with a set of two adjacent rectangles that lie over the eye and the cheek region. The position of these rectangles is defined relative to the location of a detection window that acts like a bounding box around the target face, as shown in Figure 2.

Fig. 2: Haar Features for a Face

An integral image is obtained by cumulatively adding consecutive pixel intensities in the horizontal and vertical directions. It enables fast calculation of Haar-like features by referring only to the corner points of the rectangles and the coordinates of the points separating the white and black regions. Six location references are sufficient to calculate the pixel sums over two adjacent rectangles.

B. AdaBoost Algorithm

In the AdaBoost algorithm [6], the outputs of different weak learning algorithms are combined through a weighted sum to form the output of a boosted classifier. Although the individual learners are weak, as long as each one performs slightly better than random guessing, the final model converges to a strong learner. The algorithm selects the best features for training the weak classifiers and constructs a strong classifier as a weighted linear combination of simple weak classifiers.

For face detection, a target window is moved over the input image and Haar features are calculated. The difference between the rectangle sums is then compared to a threshold that separates faces from non-faces. As a single Haar feature is a weak classifier for face detection, many Haar features are needed to describe a face with reasonable accuracy, and they are therefore organized as cascade classifiers.

C. Cascading Classifiers

Cascading is a multistage ensemble learning algorithm based on the concatenation of several weak classifiers, in which the output of a given classifier is used as additional information for the subsequent classifier in the cascade. Each classifier is designed for higher accuracy using a boosting algorithm that considers a weighted average of the decisions of the weak learners. Each stage of the classifier labels the region defined by the current location of the sliding window as either positive or negative. Positive indicates that a face is detected and negative indicates no
face is detected. If the label is negative, the classification of this region is complete, and the detector slides the window to the next location. If the label is positive, the classifier passes the region to the next stage. The face is detected at the current window location when the final stage classifies the region as positive.

D. Local Binary Pattern Histogram (LBPH)

Local Binary Pattern (LBP) [7] is a simple and efficient texture operator that labels the pixels of an image by thresholding the neighborhood of each pixel and treating the result as a binary number. The Local Binary Pattern Histogram (LBPH) method describes a face by histograms of these LBP codes. LBP can also be combined with the Histograms of Oriented Gradients (HOG) descriptor, which improves detection performance considerably on many datasets.

LBPH uses four parameters: (i) Radius: the radius of the circular LBP. (ii) Neighbors: the number of sample points in a circular LBP. This is generally set to 8. (iii) Grid X: the number of cells in the horizontal direction. A larger Grid X implies a finer grid and a higher dimensionality of the resulting feature vector. This is also generally set to 8. (iv) Grid Y: the number of cells in the vertical direction. This is also generally set to 8.

E. Detection and Recognition Algorithm

To train the algorithm, a dataset of facial images is considered. An id or label is set for each image so that the algorithm can use this information to recognize an input or test image. The first computational step of LBPH is to create an intermediate image by segmenting only the face. The algorithm uses a sliding window based on the radius and neighbors parameters. The central pixel value of the window is taken as the threshold: each neighbor is assigned a binary value of 1 if its pixel value is equal to or higher than the threshold and 0 if it is lower. These binary values are concatenated as a vector and converted to a decimal value. The Grid X and Grid Y parameters are used to divide the image into multiple cells. For a grid of size 8 x 8 and a histogram with 256 gray scale values, a histogram feature vector (HFV) of length 8 x 8 x 256 = 16,384 is obtained.

Each image in the training dataset is represented by an HFV. For a given input image, the same steps are performed and an HFV representing the input image is created. The input HFV is compared with the stored HFVs, and the input image is recognized with the label of the closest HFV in terms of Euclidean distance. The calculated Euclidean distance is also used as a measure of the confidence of the recognition. The threshold and confidence can be used to estimate the performance of the algorithm automatically.

V. EXPERIMENTAL RESULTS

To analyze the performance of the proposed face detection and recognition algorithm implemented on the Xilinx PYNQ Z2 FPGA board, a dataset is created in which each person to be recognized is given a unique id, and a set of their facial images is captured and stored in a database under that id. The Haar cascade frontal face detection algorithm is employed, and data is collected through the camera input. Only the facial part of the image is detected and stored under the specified id. This is the initial phase, or data gathering phase, of the algorithm.

In the second phase, the recognizer training phase, the created dataset is used to train the recognizer on the host system. LBPH considers the facial data and returns the appropriate ids of the faces. The trained recognition information with the ids is loaded onto the FPGA board.

In the final phase, the recognition phase, a test face is captured through the camera. If the person's face is in the dataset, the recognizer returns its id and an index indicating the confidence of the recognizer in this match; otherwise it returns an unknown face label. The face is detected using the Haar cascade frontal face algorithm, and LBPH is used to create the HFV for recognition. The algorithm finds the match for the gray scale image of the captured portion of the face and returns the label or id of its probable owner together with the confidence of the recognizer in this match.

A. Recognition Algorithm Workflow

The workflow of the face recognition system is as follows:

• Face Detection: Pre-processing images and creating a training dataset with Haar cascades
• Feature Extraction: Haar Features, Integral Images, Cascade Classifier and Training model
• Face Recognition: Testing in real time with the HFV-based face recognizer

B. Results and Discussion

The camera is interfaced with the FPGA board using Jupyter Notebook, or with the host computer using the PyCharm IDE. The required dataset is collected and stored with the respective class ids by capturing the data through the interfaced camera, detecting the face and saving the image in the corresponding folder. Figure 3 shows an example folder of the dataset stored with class id = 1. By default, class id = 0 is considered as 'unknown', and faces to be recognized are saved as class id = 1, 2, 3... Approximately 1000 images per class are collected.

Fig. 3: Sample Dataset of class id = 1

Fig. 4: PYNQ Board Setup
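The LBPH steps of Section IV.E, together with the 'unknown' label (class id = 0), can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the implementation used in this work: it uses a fixed 3 x 3 neighborhood (radius 1, 8 neighbors) instead of a general circular LBP, skips border pixels, and the function names `lbp_code`, `lbph_vector` and `recognize` as well as the distance threshold are hypothetical.

```python
import math

# Neighbour offsets, clockwise from the top-left pixel (assumed bit order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """LBP code of pixel (r, c): a bit is 1 when the neighbour >= centre."""
    center = img[r][c]
    code = 0
    for dr, dc in OFFSETS:
        code = (code << 1) | (1 if img[r + dr][c + dc] >= center else 0)
    return code


def lbph_vector(img, grid_x=8, grid_y=8):
    """Concatenate one 256-bin histogram of LBP codes per grid cell."""
    rows, cols = len(img), len(img[0])
    inner_rows, inner_cols = rows - 2, cols - 2  # border pixels are skipped
    hfv = [0] * (grid_x * grid_y * 256)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            cell_y = (r - 1) * grid_y // inner_rows
            cell_x = (c - 1) * grid_x // inner_cols
            cell = cell_y * grid_x + cell_x
            hfv[cell * 256 + lbp_code(img, r, c)] += 1
    return hfv


def recognize(test_hfv, trained, threshold):
    """Return (class_id, distance) of the closest trained HFV.

    trained is a list of (class_id, hfv) pairs. If even the closest HFV is
    farther than the assumed threshold, class id 0 ('unknown') is returned.
    """
    best_id, best_dist = 0, math.inf
    for class_id, hfv in trained:
        dist = math.dist(test_hfv, hfv)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = class_id, dist
    if best_dist > threshold:
        return 0, best_dist
    return best_id, best_dist
```

With the default 8 x 8 grid, `lbph_vector` returns a vector of length 8 x 8 x 256 = 16,384, matching the HFV length given in Section IV.E. A smaller distance returned by `recognize` means a closer match, so the confidence value is better when it is lower.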

The training dataset is created by taking images from the dataset folder, detecting the faces, training the LBPH face recognizer on the classes with their respective ids or labels, and storing the weights of the model. The trained weights are copied onto the FPGA by connecting the hardware to the host computer. The FPGA is programmed through Jupyter Notebook, and the open-source tool Balena Etcher is used to flash the PYNQ OS (Pynq_z2_v2.5) onto the SD card.

Figures 4 and 5 show the setup of the PYNQ board with USB power supply, ethernet port connection, camera (Microsoft NX-6000) and the HDMI interface that is connected to the LCD monitor. The camera has a CMOS image sensor with a resolution of 2.0MP, a field of view of 71° and 1600 x 1190 effective pixels. The output of the Haar cascade algorithm is a gray scale image of size 200 x 200. Figure 6 shows the LCD monitor with the recognition result for a real-time image captured through the camera interfaced with the FPGA board. Figure 7 shows the result obtained when more than one face is present in the test image.

Fig. 5: Hardware Setup

Fig. 6: LCD Monitor with class label and confidence measure

Fig. 7: Multi-class face recognition

C. Experiment Results

a) Confusion Matrix: The confusion matrix for the face recognition model implemented on the FPGA, considering 36 test images of 3 classes, is shown in Table I.

TABLE I. CONFUSION MATRIX

                 Predicted Class
True Class       1      2      3
1               11      0      2
2                0     13      1
3                0      0      9

b) Performance measures: Precision is the fraction of positive class predictions that were actually positive. Recall is the fraction of positive samples correctly predicted as positive by the classifier. The F1-score is the harmonic mean of precision and recall and provides a single measure. The precision, recall and F1-score values for the 36 input images from 3 classes are tabulated in Table II.

TABLE II. PERFORMANCE MEASURES

CLASS ID    PRECISION    RECALL    F1-SCORE
1           1.00         0.85      0.92
2           1.00         0.93      0.96
3           0.75         1.00      0.86

The average precision of the model across classes is 91.6% and the overall accuracy obtained is 92%. The macro-average and weighted average calculate the metrics for each class individually and then take the unweighted mean and the weighted mean of the measures, respectively. The weights are the number of samples in each class. These values are compared in Table III.

TABLE III. AVERAGE MEASURES

CLASS ID    MACRO-AVERAGE    WEIGHTED AVERAGE
1           0.92             0.94
2           0.92             0.92
3           0.91             0.92

c) Execution Time Comparison: The face recognition model is implemented on the PYNQ board, and the execution time for recognizing the test images is compared. The model is executed on both the host computer (software) and the PYNQ board (hardware). The host computer has an Intel Core i5 running at 1.80GHz with 8GB RAM. The PYNQ board has a 650MHz ARM Cortex A9 dual-core processor with 512MB RAM. The results are presented in Table IV. The FPGA implementation is 1000x faster than the software implementation on a CPU (1.58 s versus 1.58 ms). However, the speed also depends on the type of operation that is performed.

TABLE IV. EXECUTION TIME COMPARISON

Implementation Platform    Execution Time
FPGA (Hardware)            1.58 ms
CPU (Software)             1.58 s
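The measures reported in Tables I to III can be cross-checked with a few lines of Python. The sketch below is illustrative only and is not part of the original experiments; the function name `classification_metrics` is hypothetical, and it assumes the confusion matrix is given with rows as true classes and columns as predicted classes, as in Table I.

```python
def classification_metrics(cm):
    """Per-class and averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    support = [sum(row) for row in cm]  # samples per true class
    predicted = [sum(cm[i][k] for i in range(n)) for k in range(n)]
    total = sum(support)

    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        p = tp / predicted[k] if predicted[k] else 0.0
        r = tp / support[k] if support[k] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if (p + r) else 0.0)

    def macro(values):
        # Unweighted mean over classes.
        return sum(values) / n

    def weighted(values):
        # Mean weighted by the number of samples in each true class.
        return sum(v * s for v, s in zip(values, support)) / total

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "macro": tuple(macro(v) for v in (precision, recall, f1)),
        "weighted": tuple(weighted(v) for v in (precision, recall, f1)),
    }


# Confusion matrix of Table I (rows: true class, columns: predicted class).
table_1 = [[11, 0, 2], [0, 13, 1], [0, 0, 9]]
metrics = classification_metrics(table_1)
```

For Table I this gives per-class precision of 1.00, 1.00 and 0.75 and recall of 0.85, 0.93 and 1.00, in agreement with Table II, and an accuracy of 33/36, about 91.7%. The macro and weighted averages of precision, recall and F1-score come out as 0.92/0.94, 0.92/0.92 and 0.91/0.92, which coincide with the three rows of Table III; this suggests that those rows may correspond to the three measures rather than to class ids.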

VI. CONCLUSION

This paper aims at harnessing the power of FPGAs in the field of artificial intelligence and machine learning. FPGAs have several advantages over CPUs and GPUs: they offer increased performance, lower power consumption, low latency and low implementation cost. In addition, they provide the ability to configure hardware specifically for machine learning algorithms, including parallel execution and customized data types.

Some of these advantages are evident from the face recognition algorithm implemented in this paper. The implementation of LBPH features for image classification uses a binary data type (a custom data type). This reduces the use of resources in the FPGA and adds to its speed. The results show that the face recognition implementation on the FPGA is 1000 times faster than on the CPU. Implementing a face recognition model on an FPGA not only helps in quick detection of faces in images but also enables parallel processing of images obtained from multiple camera sources. This is useful, for example, in a traffic surveillance system where images are fed from different cameras installed in each lane.

Even though FPGAs have the advantages mentioned above, they lack the support needed for implementing machine learning models. Because of their complex nature, these models require a large amount of computational resources, and programming FPGAs for such algorithms from scratch is a challenging task. To scale this algorithm to large datasets, a binary model may be loaded into the RAM, thereby increasing the speed with a slight degradation in accuracy.

In future, other machine learning algorithms similar to face recognition can be considered for implementation on an FPGA. The speed of the face recognition model can be enhanced by employing hardware accelerators. The power consumption of the CPU and the FPGA can be compared using an external power meter. Finally, images can be processed in parallel by capturing data from multiple cameras.

REFERENCES

[1] Jiong Si, Sarah L. Harris, “Handwritten Digit Recognition System on an FPGA,” IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), pp. 402-407, January 2018.
[2] Ahmad Shawahna, Sadiq M. Sait, Aiman El-Maleh, “FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review,” IEEE Access, vol. 7, pp. 7823-7859, 2019.
[3] Eriko Nurvitadhi, Ganesh Venkatesh, Jaewoong Sim, Debbie Marr, Randy Huang, Jason Gee Hock Ong, Yeong Tat Liew, Krishnan Srivatsan, Duncan Moss, Suchit Subhaschandra, Guy Boudoukh, “Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?” FPGA 17: The 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2017.
[4] Li Yang, Zhezhi He, Deliang Fan, “A Fully On-chip Binarized Convolutional Neural Network FPGA Implementation with Accurate Inference,” ISLPED 18: Proceedings of the International Symposium on Low Power Electronics and Design, July 2018.
[5] Paul Viola, Michael Jones, “Robust Real-Time Object Detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[6] Yoav Freund, Robert E. Schapire, “A Decision Theoretic Generalization of On-line Learning and an Application to Boosting,” AT&T Bell Laboratories, September 1995.
[7] T. Ojala, M. Pietikäinen, D. Harwood, “Performance Evaluation of Texture Measures with Classification based on Kullback Discrimination of Distributions,” Proceedings of the 12th IAPR International Conference on Pattern Recognition (ICPR 1994), vol. 1, pp. 582-585, October 1994.
