Machine Learning Research Paper 1
Abstract—Due to the positive relationship between the presence of students in classes and their performance, student attendance assessment is considered essential within the classroom environment, even though it is a tiring and time-consuming task. We proposed a solution for student attendance control using face recognition with deep one-shot learning and evaluated our approach under different conditions and with different image capturing devices to confirm that such a pipeline may work in a real-world setting. To reduce the high number of false negatives that often occur in uncontrolled environments, we also proposed a face detection stage using HOG and a CNN with Max-Margin Object Detection based features. We achieved accuracy and F1 scores of 97% and 98.4% with an iPhone 7 camera, 91.9% and 94.8% with a Moto G camera, and 51.2% and 61.1% with a webcam, respectively. These experiments reinforce the effectiveness and availability of this approach to the student attendance assessment problem, since the recognition pipeline can either be made available for embedded processing with limited computational resources (smartphones) or offered as a "Software as a Service" tool.

Keywords—Face Recognition, Deep Learning Applications, One-shot Learning, Image Processing, Attendance System

I. INTRODUCTION

There is an intrinsic positive relationship between class attendance and the performance of students in the academic environment [1]. For learning to occur more naturally, it is necessary to encourage presence and participation in classes in a progressive way, so that the student can relate to topics discussed in previous courses.

Despite the importance of performing student attendance assessment, the current traditional ways of doing it still take up a great deal of class time and can be easily fooled. The real presence of students during a class deserves particular attention, since the assessment can even be used as an alibi in some legal cases.

In recent years, several facial recognition algorithms have been developed to perform recognition regardless of environment, angle, and facial expression. This makes face recognition a promising approach for student attendance assessment, since it has several benefits compared to other biometric methods, which are intrusive and require human interaction with different devices [2] [3].

In this paper, we present the experiment for a system that automates class attendance assessment through facial recognition using machine learning algorithms. Our method is based on using a single image of a student for training a system that is meant to detect, segment, and verify student identities in an uncontrolled environment (class pictures). For this, we applied different concepts of computer vision (face detection, alignment, and verification) and one-shot learning through the use of a pre-trained deep neural network trained on over 200 million images (FaceNet) [4]. We evaluated the system in three different camera settings in order to check its robustness against image resolution.

The rest of the paper is divided as follows: Section II provides a quick technical background on the subjects discussed here; Section III discusses recent related work regarding face recognition and attendance automation; Section IV presents the methodology and the system overview; Sections V and VI show, respectively, the results of the experiments and the final considerations and future work.

II. TECHNICAL BACKGROUND

A. Deep Learning

Deep Learning (DL) is a branch of machine learning (ML) that is capable of learning the data representation through the use of a structure of hierarchical layers, similar to the way the brain handles new data. DL can be described as a concept that can be applied to several sub-fields of ML, since it represents a way of approaching problems [5].

The state of the art in image classification has been dominated by DL algorithms since the launch of the ImageNet challenge [6]. Due to the increase in computational power provided by cheaper GPUs, researchers and practitioners have applied DL models to a range of different tasks related to image classification. This high interest in DL has paid off, since this class of algorithms has become the state of the art in significant branches of computer vision such as object detection, semantic segmentation, and face recognition [7].

Regarding face recognition, the application of deep learning models has helped the automation process within biometrics, where the accuracy of machines has surpassed that of humans. The importance of such claims can be noticed even in real-time situations where computational resources are limited (e.g., unlocking a mobile device with face ID) [8].

978-1-7281-7539-3/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Auckland University of Technology. Downloaded on July 27,2020 at 04:08:46 UTC from IEEE Xplore. Restrictions apply.
138 Proceedings of the IWSSIP 2020
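Although the face detection section itself falls outside this excerpt, the hybrid HOG + MMOD-CNN detection stage named in the abstract (and referred to later as "the indicated hybrid setup") can be sketched as follows. This is a minimal illustration, not the authors' code: the detector objects and the fallback policy are assumptions. In dlib [27], `hog_detector` could be `dlib.get_frontal_face_detector()` and `cnn_detector` an instance of `dlib.cnn_face_detection_model_v1` loaded with MMOD weights.

```python
def detect_faces(image, hog_detector, cnn_detector, min_expected=1):
    """Hybrid detection sketch: run the fast HOG-based detector
    first; if it finds fewer faces than expected (a common source
    of false negatives in uncontrolled class pictures), fall back
    to the slower but more robust CNN detector trained with
    Max-Margin Object Detection."""
    faces = list(hog_detector(image))
    if len(faces) < min_expected:
        faces = list(cnn_detector(image))
    return faces
```

The design choice here is purely a speed/recall trade-off: the HOG detector is cheap enough for embedded use, while the CNN stage is only invoked when the cheap pass appears to have missed faces.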
D. Face Recognition

1) Face Feature Extraction: The FaceNet architecture [4] proposed a new approach for face recognition and face clustering with its deep convolutional architecture based on GoogLeNet [28], a 22-layer deep network. It takes as input an image of a segmented, aligned face and outputs a 128-dimensional embedding that compactly represents the features present in the face. The crucial point of using such a model is that it was trained to minimize the Euclidean distance between embeddings of the same person while maximizing the distance between embeddings of different people, through the use of a triplet loss following the structure presented in Equation 1:

\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2, \quad \forall (x_i^a, x_i^p, x_i^n) \in \tau   (1)

The loss to be minimized is then described in Equation 2:

Loss = \sum_{i}^{N} \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]   (2)

where α is a margin/threshold that is enforced between positive and negative pairs; x_i^a, known as the anchor embedding, is the reference face; x_i^p is the positive embedding, with the same identity as the anchor; x_i^n is the negative sample; and τ is the list of all possible triplets within the dataset. A visual description of the learning process is given in Figure 3.

Fig. 3. Triplet Loss anchor example [4].

We used a pre-trained model with this architecture provided by the OpenFace community [29], which was trained on over 500k images combining two large labeled face recognition datasets (CASIA-WebFace [30] and FaceScrub [31]). Since for our experiments we were able to provide only one image of each student for training, this type of architecture was particularly useful for our problem. This characterizes the approach as one-shot learning, since the system needs to learn the best threshold that separates all classes (students) using only one example per class in the verification step. An example of the difference in Euclidean distances for two pairs of different students is shown in Figure 4.

Fig. 4. FaceNet example of embedding distances.

2) Face Verification: In order to verify whether two faces are from the same person, we use a threshold on the Euclidean distance between the face embeddings. If the distance is below the threshold, the embeddings are from the same person; otherwise, they belong to different people. This threshold was set initially based on experimentation on the Labeled Faces in the Wild dataset [32], along with the evaluation of other works that use the FaceNet architecture. This approach of comparing distances with only one example per class works similarly to a K-Nearest Neighbor classifier with K set to 1.

One trick that can be applied in the context of face verification using this architecture is that, if more than one student has a Euclidean distance lower than the threshold for a specific identity, we assume that the student's real identity is the one for which the distance is minimal (closest to zero).

E. Experiments

1) Training setup: To obtain one training image for each student, the image capturing scheme was set up with the camera placed at a distance of 1.5 m from the student, following the well-known standard for 3x4 pictures. The height of the camera was adjusted to take a centered picture of the face.

As the first experiments showed the robustness of our pre-trained model, and there was only one image per student for training, we decided not to proceed with any fine-tuning technique for our final architecture. However, in a real-world setting, such a step may be needed, and an online training step may be applied along with each successful attendance recognition case in order to update the model.

To update the threshold and find its best value for discriminating the faces at testing time, we first calculated the Euclidean distance of every student's face against all the other students' faces within the class, in order to determine the minimum threshold that would imply that two students are different people.

2) Testing setup: For the testing setup, two experiments were designed to validate our approach: evaluation of different distances for face detection, and assessment of the recognition pipeline using image capturing devices with different resolutions.

In order to grant a fair comparison of results among devices, seven student classrooms were analyzed to check the face detection rates with each image capturing unit. Table I shows the number of students in each classroom.
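The triplet loss of Equations 1–2, the threshold-calibration step of the training setup, and the distance-based verification rule can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function names are ours, and the embeddings are assumed to be 128-dimensional FaceNet-style vectors.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss of Equation 2: same-identity embeddings are
    pulled together, different identities pushed apart by a margin
    alpha (hinged at zero, as in FaceNet's formulation)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

def calibrate_threshold(references):
    """Threshold calibration of Section E.1: the smallest Euclidean
    distance between two different students' reference embeddings,
    i.e. the largest value the verification threshold could take
    while still telling every pair of enrolled students apart."""
    names = list(references)
    return min(
        np.linalg.norm(references[a] - references[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )

def identify(query, references, threshold=0.6):
    """Verification rule of Section D.2: pick the closest reference
    embedding (1-nearest neighbor); accept the identity only if the
    distance falls below the threshold, otherwise report unknown."""
    name, dist = min(
        ((n, np.linalg.norm(query - ref)) for n, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return name if dist < threshold else None
```

For instance, with two enrolled students whose reference embeddings happen to be orthogonal unit vectors, `calibrate_threshold` returns √2 ≈ 1.41, so a verification threshold of 0.6 (the best discriminative value found in the paper) would separate them comfortably.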
TABLE I
NUMBER OF STUDENTS IN EACH CLASSROOM

Class            1    2    3    4    5    6    7
N# of students   16   18   12   17   10   33   19

The testing image of the class was taken according to the setup in Figure 5. For the first experiment, students sat in different positions inside the classroom at the following distances from the camera: 2, 4, 6, 7, 8, and 10 meters. The testing image was taken with the camera at a height where there were no occlusions, or only partial ones, in order to evaluate the robustness of the system for detecting and recognizing the faces in an uncontrolled environment.

the application of the indicated hybrid setup, since it solved the occurrence of false negatives and resulted in no failing cases.

B. Face Recognition Results

The results regarding the accuracy and F1 score metrics can be seen in Tables II and III. The described results were obtained using a threshold of 0.6 for face verification, which was the best discriminative value. The results are reported as absolute values, since there was no variation in the recognition step when presenting the same testing images to the pipeline.

TABLE II
ACCURACY FOR EVERY TESTED CLASSROOM

Class     iPhone 7 (12 MP)   Moto G (8 MP)   Webcam (1.2 MP)
1         100%               100%            62.5%
2         94.4%              88.9%           44.4%
3         100%               100%            25%
4         100%               82.4%           76.5%
5         90%                80%             50%
6         100%               97%             57.6%
7         94.7%              94.7%           42.7%
Average   97.1%              91.9%           51.2%

TABLE III
F1 SCORE FOR EVERY TESTED CLASSROOM
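For reference, per-classroom accuracy and F1 scores such as those in Tables II and III follow directly from true-positive, false-positive, and false-negative counts. The sketch below uses hypothetical counts for illustration only; the function name and the numbers are ours, not the paper's data.

```python
def attendance_metrics(n_students, true_pos, false_pos, false_neg):
    """Per-classroom accuracy and F1 from recognition counts:
    true_pos  = present students correctly identified,
    false_pos = faces assigned a wrong identity,
    false_neg = present students missed or rejected."""
    accuracy = true_pos / n_students
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Illustrative classroom of 16 students: 15 identified correctly,
# 1 face never detected (hypothetical counts, not the paper's data).
acc, f1 = attendance_metrics(16, true_pos=15, false_pos=0, false_neg=1)
```

Note that when errors are missed detections rather than misidentifications, precision stays high while recall drops, which is why F1 can sit above plain accuracy.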
VI. CONCLUSION

As could be seen from the provided results, the student attendance assessment task may be solved by the use of face recognition even in the presence of limited resources. Although the performance of the pipeline was strongly tied to the quality of the image capturing device, as technology advances, the prices of such devices tend to become even more affordable, which contributes to the implementation of automation systems such as the one idealized in this work for student attendance assessment.

The difference in results obtained from the iPhone 7 and Moto G devices was consistent with the difference in their prices, which needs to be taken into consideration when designing such a system. One point to highlight is the importance of the relation between image resolution and the face detection step: if the device cannot provide enough detail in the image to detect the faces, the pipeline will not be able to achieve good results when checking more robust metrics such as the F1 score.

Regarding system improvements, different setups for the verification stage may be implemented. For example, we may use data augmentation techniques to increase the number of images used for training within each student database. Also, we could explore other machine learning methods that can take advantage of a larger training base in order to increase the reliability and robustness of the system.

As future work, we could extend the attendance assessment system by adding metrics related to student performance in order to enable early detection of dropout in undergraduate studies. In addition, such a system may allow educators to analyze other problems correlated with student attendance and take measures to improve the present educational environment.

REFERENCES

[1] S. Devadoss and J. Foltz, "Evaluation of factors influencing student class attendance and performance," American Journal of Agricultural Economics, vol. 78, no. 3, pp. 499–507, 1996.
[2] B. N. Gatsheni, R. B. Kuriakose, and F. Aghdasi, "Automating a student class attendance register using radio frequency identification in South Africa," in 2007 IEEE International Conference on Mechatronics. IEEE, 2007, pp. 1–5.
[3] R. Kuriakose and H. Vermaak, "Developing a Java based RFID application to automate student attendance monitoring," in 2015 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference. IEEE, 2015, pp. 48–53.
[4] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 815–823.
[5] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[7] O. M. Parkhi, A. Vedaldi, A. Zisserman et al., "Deep face recognition," in BMVC, vol. 1, no. 3, 2015, p. 6.
[8] L. Zhao and R. Tsai, "Locking and unlocking a mobile device using facial recognition," US Patent 8,994,499, 2015.
[9] Y. Muttu and H. Virani, "Effective face detection, feature extraction & neural network based approaches for facial expression recognition," in 2015 International Conference on Information Processing (ICIP). IEEE, 2015, pp. 102–107.
[10] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[11] V. Vezhnevets, "Face and facial feature tracking for natural human-computer interface," 2002.
[12] S. Zafeiriou, C. Zhang, and Z. Zhang, "A survey on face detection in the wild: Past, present and future," Computer Vision and Image Understanding, vol. 138, pp. 1–24, 2015.
[13] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," 2010.
[14] D. E. King, "Max-margin object detection," arXiv preprint arXiv:1502.00046, 2015.
[15] N. Crosswhite, J. Byrne, C. Stauffer, O. Parkhi, Q. Cao, and A. Zisserman, "Template adaptation for face verification and identification," Image and Vision Computing, vol. 79, pp. 35–48, 2018.
[16] M. Wang and W. Deng, "Deep face recognition: A survey," arXiv preprint arXiv:1804.06655, 2018.
[17] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, "Deep feature extraction and classification of hyperspectral images based on convolutional neural networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 10, pp. 6232–6251, 2016.
[18] M. G. Krishnan and S. B. Balaji, "Implementation of automated attendance system using face recognition," International Journal of Scientific & Engineering Research, vol. 6, no. 3, 2015.
[19] P. Wagh, R. Thakare, J. Chaudhari, and S. Patil, "Attendance system based on face recognition using eigen face and PCA algorithms," in 2015 International Conference on Green Computing and Internet of Things (ICGCIoT). IEEE, 2015, pp. 303–308.
[20] B. Surekha, K. J. Nazare, S. V. Raju, and N. Dey, "Attendance recording system using partial face recognition algorithm," in Intelligent Techniques in Signal Processing for Multimedia Security. Springer, 2017, pp. 293–319.
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[22] P. Hu and D. Ramanan, "Finding tiny faces," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 951–959.
[23] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701–1708.
[24] P. R. Sarkar, D. Mishra, and G. R. S. Subhramanyam, "Automatic attendance system using deep learning framework," in Machine Intelligence and Signal Analysis. Springer, 2019, pp. 335–346.
[25] R. K. Chauhan, V. Pandey, and M. Lokanath, "Smart attendance system using CNN," International Journal of Pure and Applied Mathematics, vol. 119, no. 15, pp. 675–680, 2018.
[26] M. Arsenovic, S. Sladojevic, A. Anderla, and D. Stefanovic, "FaceTime—deep learning based face recognition attendance system," in 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY). IEEE, 2017, pp. 53–58.
[27] D. E. King, "Dlib-ml: A machine learning toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
[28] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[29] B. Amos, B. Ludwiczuk, and M. Satyanarayanan, "OpenFace: A general-purpose face recognition library with mobile applications," CMU-CS-16-118, CMU School of Computer Science, Tech. Rep., 2016.
[30] D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," arXiv preprint arXiv:1411.7923, 2014.
[31] H.-W. Ng and S. Winkler, "A data-driven approach to cleaning large face datasets," in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 343–347.
[32] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments," in Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
[33] P. Li, L. Prieto, D. Mery, and P. Flynn, "Low resolution face recognition in the wild," arXiv preprint arXiv:1805.11529, 2018.