Implementation of FaceNet and Support Vector Machine in A Real-Time Web-Based Timekeeping Application
Corresponding Author:
Hoang-Sy Nguyen
Institute of Engineering and Technology, Thu Dau Mot University
Binh Duong Province, Vietnam
Email: [email protected]
1. INTRODUCTION
Face recognition technology has been replacing the human role in recognizing faces. Face-recognition equipment receives images or videos containing human faces as input, from which biometric facial data are extracted and processed to conduct the recognition task [1]. Because the facial features of an individual are unique [2], they have been acknowledged as an effective means for security purposes, e.g., replacing passwords and identity cards, and granting authorized access. Popular face recognition models developed by universities and companies include VGGFace [3], DeepFace [4], [5], OpenFace [6], and FaceNet [7]. In [8], a face recognition model based on the histogram of oriented gradients (HOG) and a support vector machine (SVM) classifier was investigated. Besides, in [9], the AdaBoost algorithm was used to train cascade classifiers with feature types such as HOG and Haar-like features. Although better performance was achieved, the approach is computationally demanding because it includes a large number of weak classifiers.
On the other hand, training directly on faces can be challenging owing to face occlusion, which is common in practice. To overcome this issue, Zhang et al. [10] proposed an algorithm within the Bayesian framework that locates the head using the Omega-like shape formed by the head-shoulder region of a person. This technique has been applied widely in automatic teller machines (ATMs). Additionally, in [11], the face recognition task was carried out with deformable part models (DPM), yielding remarkable results, though it requires heavy computational resources. A DPM-based system was also deployed in [12], offering a reduction in error rate and in false-negative face detections. Nonetheless, this technique is limited to front-view facial images and thus is not universal.
Recent years have witnessed the rise of convolutional neural network (CNN) applications in face detection. Deep CNNs (DCNN) [13], region-based convolutional neural networks (R-CNN) [14], and other one- or two-stage deep CNN-based systems such as VGGNet [15] and ResNet [16] have showcased outstanding performance in comparison with their conventional counterparts. However, as more convolutional layers are added to a CNN, the detection speed decreases considerably. To overcome this issue, a number of multi-stage face detection algorithms have been investigated, for example, the funnel-structured cascade (FuSt) [17] and the pyramid-based cascade model that distills knowledge online and mines hard samples offline [18], which deliver outstanding true-positive rates and real-time performance.
CNNs are data-driven, as they are trained on extracted features for face classification. Additionally, CNNs trained on 2D facial data can be further tuned with 3D data for potentially better recognition accuracy. Tornincasa et al. [19] showcased how pertinent discriminating features of query faces can be extracted using differential geometry. Dagnes et al. [20] investigated an algorithm that computes an optimized marker layout to capture facial movement. To deal with varying facial expressions and illumination, Radon and wavelet transforms were combined in [21] for nonlinear feature extraction. Notably, the so-called DeepID model, which is constructed from a large number of CNNs, and its extensions were proposed in [22]–[24] with better feature extraction capability, realized thanks to their ability to process a variety of face positions and facial patches.
In this paper, we design a face recognition system based on the FaceNet model with an SVM classifier. We then compare the accuracy of the proposed method with that of two other face recognition methods on three public datasets to increase the generality of the study. Finally, the paper showcases how to integrate the system into a web-based timekeeping application. The obtained results regarding system performance and implementation are applicable to both research and practical use.
In triplet loss training, an image is considered positive if it has the same identity as the anchor, and negative otherwise. Thanks to this mechanism, triplet loss is considered one of the most effective ways to learn 128-D encodings of face images.
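As a minimal numerical sketch of the triplet loss described above (not the authors' implementation; the margin value 0.2 follows the original FaceNet paper, and the toy vectors below merely stand in for real embeddings):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge-style triplet loss on embedding vectors.

    The positive shares the anchor's identity, the negative does not;
    alpha is the margin enforced between the two squared L2 distances
    (0.2 is the margin used in the original FaceNet paper).
    """
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + alpha, 0.0)

# Toy 128-D embeddings: the loss is zero when the negative is already
# further from the anchor than the positive by more than the margin.
anchor = np.zeros(128)
positive = np.full(128, 0.01)
negative = np.full(128, 0.5)
```

Minimizing this quantity over many (anchor, positive, negative) triplets is what pulls same-identity embeddings together and pushes different-identity embeddings apart.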
Figure 1. The architectures of (a) P-Net, (b) R-Net, and (c) O-Net, where "MP" means max pooling and "Conv" means convolution. The step sizes in convolution and pooling are 1 and 2, respectively
Remarkably, our system works with Euclidean image embeddings: the network is trained so that squared L2 distances in the embedding space directly correspond to face similarity. As a result, the distance between images of the same subject is small and that between images of different subjects is large. Once the embedding space has been learned, face verification can be performed simply by thresholding the distance between two points in the space. Subsequently, the SVM algorithm is applied for the classification step.
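The distance-threshold verification step can be sketched as follows; the threshold value 1.1 and the random stand-in embeddings are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def same_person(emb1, emb2, threshold=1.1):
    """Verify two faces by comparing the squared L2 distance between
    their L2-normalized embeddings against a tuned threshold
    (1.1 here is purely illustrative)."""
    return float(np.sum((emb1 - emb2) ** 2)) < threshold

def normalize(v):
    return v / np.linalg.norm(v)

# Stand-in embeddings: two noisy views of the same subject and one
# unrelated subject (random vectors replace real FaceNet output).
rng = np.random.default_rng(0)
base = normalize(rng.normal(size=128))
view_a = normalize(base + 0.05 * rng.normal(size=128))
view_b = normalize(base + 0.05 * rng.normal(size=128))
other = normalize(rng.normal(size=128))
```

In practice the threshold would be tuned on a validation set to trade off false accepts against false rejects.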
Implementation of FaceNet and support vector machine in a real-time web-based … (Ly Quang Vu)
392 ISSN: 2252-8938
The results of FaceNet face recognition are subsequently compared with two other methods, namely principal component analysis (PCA) with an SVM classifier [25] and the k-nearest neighbor (K-NN) classifier [26]. As can be seen in Table 3, FaceNet with SVM delivers a minimum accuracy of 97.5%, the highest among the compared methods. It should be noted that this model performs well even under challenges such as a variety of face poses, expressions, illumination conditions, and the use of accessories.
Table 3. Accuracy comparison using FaceNet with SVM, PCA with SVM, and K-NN
Method             LFW [%]   ORL [%]    Yale Face Database [%]
FaceNet with SVM   99.83     97.5 [2]   98.9 [2]
PCA with SVM       62.14     95.12      82.35
K-NN               30.24     85.36      52.94
Face recognition can effectively detect human presence in a particular area of interest (AOI) such as an office or an educational institution. In this paper, the authors established a web-based timekeeping application. Figure 5 illustrates how the system works. The system consists of a remote server and a database that can be accessed through a web application for monitoring and administration purposes. An IP camera set at the entrance of a company streams video frames in real-time to the face recognition API. If a face is detected, the image in that frame is preprocessed and passed on to the deep CNN to generate a 128-dimensional embedding.
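The per-frame flow just described can be sketched as below; `detect_faces`, `preprocess`, and `embed` are hypothetical stand-ins for the MTCNN detector, the alignment/resizing step, and the FaceNet encoder, injected as callables so the orchestration can be shown without the real models:

```python
import numpy as np

def process_frame(frame, detect_faces, preprocess, embed):
    """Run one video frame through the recognition pipeline: detect face
    boxes, preprocess each crop, and map it to a 128-D embedding.

    detect_faces, preprocess, and embed are injected stand-ins for the
    MTCNN detector, the preprocessing step, and the FaceNet encoder.
    """
    embeddings = []
    for x, y, w, h in detect_faces(frame):
        face = preprocess(frame[y:y + h, x:x + w])
        embeddings.append(embed(face))
    return embeddings

# Demo with trivial stubs in place of the real models.
frame = np.zeros((480, 640, 3))
stub_detector = lambda f: [(10, 20, 160, 160)]   # one face box (x, y, w, h)
stub_prep = lambda crop: crop                    # no-op preprocessing
stub_encoder = lambda face: np.zeros(128)        # constant 128-D embedding
embeddings = process_frame(frame, stub_detector, stub_prep, stub_encoder)
```

In the deployed system, the frame would come from the IP camera stream and each embedding would be forwarded to the SVM classifier.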
Subsequently, staff identities are determined with the SVM classifier, and data related to each staff member's presence, such as the identity, the accuracy percentage, and the time and date of entry, are recorded in the database. Figure 6 illustrates what a user sees on the web application when a staff member is recognized by the system. Specifically, a frame identifies the detected face, with the recognized name and the accuracy shown at its bottom. The right side shows a list of recognized staff members along with their ID numbers, full names, ID cards, and entry times. In case the system cannot recognize a person due to missing data, for example, on the entry of a new staff member or a visitor, the person's face image is displayed as shown in Figure 7.
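One plausible way to realize the "Unknown" decision is to threshold the SVM's class probabilities; the scikit-learn `SVC`, the 0.6 cut-off, and the synthetic embeddings below are illustrative assumptions, not details from the paper:

```python
import numpy as np
from sklearn.svm import SVC

def identify(clf, embedding, min_prob=0.6):
    """Return the SVM's predicted staff label, or "Unknown" when the top
    class probability falls below min_prob (an illustrative cut-off)."""
    probs = clf.predict_proba(embedding.reshape(1, -1))[0]
    best = int(np.argmax(probs))
    return clf.classes_[best] if probs[best] >= min_prob else "Unknown"

# Synthetic 128-D "embeddings" for two enrolled staff members, clustered
# around two well-separated centers (random noise replaces real data).
rng = np.random.default_rng(1)
center_a, center_b = np.zeros(128), np.zeros(128)
center_a[0], center_b[0] = 2.0, -2.0
X = np.vstack([center_a + 0.3 * rng.normal(size=(20, 128)),
               center_b + 0.3 * rng.normal(size=(20, 128))])
y = ["alice"] * 20 + ["bob"] * 20

clf = SVC(kernel="linear", probability=True).fit(X, y)
```

An embedding near neither cluster then yields a low top probability and is labeled "Unknown", which is the behavior the system needs for unenrolled visitors.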
The detected face is framed in red and labeled "Unknown". It should be noted that the time and date of the unrecognized entry are recorded to assist the administrator in taking corresponding actions, such as adding the new employee's information and re-training the model. All the information about entries, as in Figure 6, can be exported to a *.xls file as shown in Figure 8.
Besides, the web application has an interface for adding new facial data. Users can open the IP camera from the application to capture new face images in real-time. These images can then be saved in the database and assigned a unique user ID. The face in each image is then extracted and labeled with whatever the administrator finds suitable, for example, the person's name.
The system was tested with a group of 32 staff members, showing a face recognition accuracy of 96%. Nevertheless, the system is sensitive to lighting changes and to the angle between the face and the IP camera, which can considerably degrade accuracy. Thus, in case the system fails to recognize a staff member, that person needs to inform the person in charge of timekeeping for manual marking.
4. CONCLUSION
To conclude, our system deploys the MTCNN algorithm to detect faces, generates embeddings using the pre-trained FaceNet model, and recognizes images taken through the system with an SVM classifier. In practice, the system delivers a recognition accuracy of 96%, given that the images are collected under consistent lighting and face-camera angle conditions. The comparison study can serve as a foundation for researchers seeking optimized face recognition algorithms. Additionally, the paper presents an established web-based application with key concepts that can potentially be upgraded into a commercial timekeeping product. Applying such products in practice has proven able to save companies and organizations a considerable amount of time and effort in timekeeping tasks. As more powerful algorithms are introduced and implemented in face recognition systems, end users stand to benefit further. In future studies, the system can be fine-tuned further, and more training data with noise can be collected to improve the capability of our proposal.
REFERENCES
[1] Y. Zhang, S. Wang, H. Xia, and J. Ge, “A novel SVPWM modulation scheme,” in 2009 Twenty-Fourth Annual IEEE Applied
Power Electronics Conference and Exposition, Feb. 2009, pp. 128–131, doi: 10.1109/APEC.2009.4802644.
[2] L. Li, X. Mu, S. Li, and H. Peng, “A review of face recognition technology,” IEEE Access, vol. 8, pp. 139110–139120, 2020, doi:
10.1109/ACCESS.2020.3011028.
[3] I. William, D. R. Ignatius Moses Setiadi, E. H. Rachmawanto, H. A. Santoso, and C. A. Sari, “Face recognition using facenet
(survey, performance test, and comparison),” in 2019 Fourth International Conference on Informatics and Computing (ICIC),
Oct. 2019, pp. 1–6, doi: 10.1109/ICIC47613.2019.8985786.
[4] O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep face recognition,” in Proceedings of the British Machine Vision Conference
2015, 2015, pp. 41.1–41.12, doi: 10.5244/C.29.41.
[5] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: closing the gap to human-level performance in face verification,” in
2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2014, pp. 1701–1708, doi: 10.1109/CVPR.2014.220.
[6] T. Baltrusaitis, P. Robinson, and L.-P. Morency, “OpenFace: An open source facial behavior analysis toolkit,” in 2016 IEEE
Winter Conference on Applications of Computer Vision (WACV), Mar. 2016, pp. 1–10, doi: 10.1109/WACV.2016.7477553.
[7] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in 2015 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 815–823, doi: 10.1109/CVPR.2015.7298682.
[8] M. Drożdż and T. Kryjak, “FPGA implementation of multi-scale face detection using HOG features and SVM classifier,” Image
Processing & Communications, vol. 21, no. 3, pp. 27–44, Sep. 2016, doi: 10.1515/ipc-2016-0014.
[9] C. Ma, N. Trung, H. Uchiyama, H. Nagahara, A. Shimada, and R. Taniguchi, “Adapting local features for face detection in
thermal image,” Sensors, vol. 17, no. 12, Art. no. 2741, Nov. 2017, doi: 10.3390/s17122741.
[10] T. Zhang, J. Li, W. Jia, J. Sun, and H. Yang, “Fast and robust occluded face detection in ATM surveillance,” Pattern Recognition
Letters, vol. 107, pp. 33–40, May 2018, doi: 10.1016/j.patrec.2017.09.011.
[11] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, “Face detection without bells and whistles,” in Computer Vision –
ECCV 2014, Springer International Publishing, 2014, pp. 720–735.
[12] D. Marcetic and S. Ribaric, “Deformable part-based robust face detection under occlusion by using face decomposition into face
components,” in 2016 39th International Convention on Information and Communication Technology, Electronics and
Microelectronics (MIPRO), May 2016, pp. 1365–1370, doi: 10.1109/MIPRO.2016.7522352.
[13] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,”
IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, Oct. 2016, doi: 10.1109/LSP.2016.2603342.
[14] S. Wan, Z. Chen, T. Zhang, B. Zhang, and K. Wong, “Bootstrapping face detection with hard negative examples,” Aug. 2016.
[15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Sep. 2014, Available:
http://arxiv.org/abs/1409.1556.
[16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[17] S. Wu, M. Kan, Z. He, S. Shan, and X. Chen, “Funnel-structured cascade for multi-view face detection with alignment-
awareness,” Neurocomputing, vol. 221, pp. 138–145, Jan. 2017, doi: 10.1016/j.neucom.2016.09.072.
[18] S. S. Farfade, M. J. Saberian, and L.-J. Li, “Multi-view face detection using deep convolutional neural networks,” in Proceedings
of the 5th ACM on International Conference on Multimedia Retrieval, Jun. 2015, pp. 643–650, doi: 10.1145/2671188.2749408.
[19] S. Tornincasa et al., “3D facial action units and expression recognition using a crisp logic,” Computer-Aided Design and
Applications, vol. 16, no. 2, pp. 256–268, Aug. 2018, doi: 10.14733/cadaps.2019.256-268.
[20] N. Dagnes et al., “Optimal marker set assessment for motion capture of 3D mimic facial movements,” Journal of Biomechanics,
vol. 93, pp. 86–93, Aug. 2019, doi: 10.1016/j.jbiomech.2019.06.012.
[21] H. D. Vankayalapati and K. Kyamakya, “Nonlinear feature extraction approaches with application to face recognition over large
databases,” in 2009 2nd International Workshop on Nonlinear Dynamics and Synchronization, Jul. 2009, pp. 44–48, doi:
10.1109/INDS.2009.5227967.
[22] Y. Sun, D. Liang, X. Wang, and X. Tang, “DeepID3: face recognition with very deep neural networks,” Feb. 2015, [Online].
Available: http://arxiv.org/abs/1502.00873.
[23] Y. Sun, X. Wang, and X. Tang, “Deep learning face representation by joint identification-verification,” Jun. 2014, [Online].
Available: http://arxiv.org/abs/1406.4773.
[24] Y. Sun, X. Wang, and X. Tang, “Deep learning face representation from predicting 10,000 classes,” in 2014 IEEE Conference on
Computer Vision and Pattern Recognition, Jun. 2014, pp. 1891–1898, doi: 10.1109/CVPR.2014.244.
[25] X. Chen, L. Song, and C. Qiu, “Face recognition by feature extraction and classification,” in 2018 12th IEEE International
Conference on Anti-counterfeiting, Security, and Identification (ASID), Nov. 2018, pp. 43–46, doi:
10.1109/ICASID.2018.8693198.
[26] H. Zhang and G. Chen, “The research of face recognition based on PCA and k-nearest neighbor,” in 2012 Symposium on
Photonics and Optoelectronics, May 2012, pp. 1–4, doi: 10.1109/SOPO.2012.6270975.
[27] B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in
unconstrained environments,” 2008.
[28] “ORL (our database of faces).” https://paperswithcode.com/dataset/orl.
[29] “Yale face database.” http://vision.ucsd.edu/content/yale-face-database.
BIOGRAPHIES OF AUTHORS
Hoang-Sy Nguyen was born in Binh Duong province, Vietnam. He received the B.S. and M.Sc. degrees in computer science from the Ho Chi Minh City University of Information Technology (UIT-HCMC), Vietnam, in 2007 and 2013, respectively. He received his Ph.D. degree in communication technology from the VSB-Technical University of Ostrava, Czech Republic, in 2019, with the dissertation “Energy harvesting enabled relaying networks: Design and performance analysis”. His research interests include energy-efficient wireless communications, 5G wireless communication networks, network security, artificial intelligence, cloud networks, and big data. Email: [email protected].