A16. Deep Learning Based Facial Recognition
Abstract :- Face recognition is the task of identifying an individual from an image of their face
and a database of known faces. Despite being a relatively easy task for most humans,
“unconstrained” face recognition by machines, specifically in settings such as malls, casinos and
transport terminals, remains an open and active area of research. It has multiple use cases in
surveillance, access control and even finding missing persons in a crowd.
However, in recent years, a large number of photos and videos have been crawled by
search engines, and uploaded to social networks, which include a variety of unconstrained
material, such as objects, faces and scenes. This large volume of data and the increase in
computational resources have enabled the use of more powerful statistical models for the general
challenge of object classification in images and videos.
This research project evaluates the use of big data based machine learning approaches
such as deep convolutional neural networks for the problem of unconstrained facial recognition in
video data. It attempts to replicate, and if feasible better, the performance of state-of-the-art leading
commercial systems trained on large proprietary datasets, using public datasets and open source
frameworks from research universities.
It is assumed that the reader has a fair understanding of neural networks and
convolutional neural networks from both a theoretical and a practical standpoint.
I. INTRODUCTION

This project attempts to reproduce the performance of state-of-the-art proprietary face recognition systems on video by using and tuning open source frameworks. For this purpose we use today's state-of-the-art open datasets for face recognition in video. Primarily we use the YouTube Faces DB dataset, which consists of 3425 videos of 1595 subjects. The videos are broken down into frames and face recognition is performed on the frames (after an initial face alignment, as explained later). The system is trained with CASIA WebFace, a public dataset consisting of 10,575 subjects and 494,414 images.

While most of the related work reviewed is provided in the references section, the key work referenced for this project was the FaceNet system for facial recognition, based on the Inception network architecture from Google [5,6], and the ResNet architecture from Microsoft Research [8]. Another key architecture referenced and used in this research is the combination of the Inception and ResNet architectures described in [10], as it is shown to provide a dramatic improvement in performance. The rest of the referenced work describes the research breakthroughs that led to the widespread use of convolutional neural network architectures (LeNet [2], AlexNet [3], VGG Net [4], GoogLeNet [5]) as the state of the art for the general challenge of object and image recognition. For data pre-processing, primarily face detection and alignment to ensure pose-invariant face recognition, the work referenced is Multi-Task CNN [7]. While not directly referenced, the core research papers that influenced the choice of the loss function for face recognition by the FaceNet team, as well as the mathematical basis for the use of deeper networks, have been mentioned [11,12].

The two references for the CASIA WebFace and YouTube Faces datasets [13,14] describe how the respective datasets were created. The CASIA WebFace team also gives a benchmark on YouTube Faces DB, which this research will try to match or better. The CASIA team achieves a best performance of 90.60%.

Finally, the last two references [15,16] point to open source implementations of the OpenFace face recognition system from Carnegie Mellon University.

NETWORK ARCHITECTURE AND TRAINING PARAMETERS

It was critical that both the training and test sets were pre-processed to extract the face using the same approach.

The videos were pre-processed for face detection and alignment using an open source implementation of the multi-task CNN algorithm [7]. This approach is known to be robust to variations in pose, illumination and occlusion, and gives better results than the standard dlib library used for this purpose. The figure below, showing the three-stage multi-task CNN process, has been extracted from the associated reference paper [7].

The open source implementation was already pre-trained, so no training was needed before using it on both the training and test sets. The image dimensions after extraction were 160x160.
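The face extraction step, which turns a detected face into a fixed 160x160 input, can be illustrated with a minimal sketch. It assumes a detector has already returned a bounding box for the face. The function names, the 16-pixel margin and the nearest-neighbour resize are illustrative assumptions, not taken from the multi-task CNN implementation used in this project, and landmark-based alignment is not shown.

```python
from typing import List, Tuple

# Grayscale image as rows of pixel values (an illustrative representation).
Image = List[List[int]]


def crop_with_margin(image: Image, box: Tuple[int, int, int, int], margin: int) -> Image:
    """Crop box = (x1, y1, x2, y2), widened by `margin` pixels and clamped to the image."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(x1 - margin, 0), max(y1 - margin, 0)
    x2, y2 = min(x2 + margin, width), min(y2 + margin, height)
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(image: Image, size: int = 160) -> Image:
    """Resize to size x size with nearest-neighbour sampling (a simple stand-in)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[(r * src_h) // size][(c * src_w) // size] for c in range(size)]
        for r in range(size)
    ]


def extract_face(image: Image, box: Tuple[int, int, int, int],
                 margin: int = 16, size: int = 160) -> Image:
    """Apply the same crop-and-resize to every frame, for training and test sets alike."""
    return resize_nearest(crop_with_margin(image, box, margin), size)


# Usage: a synthetic 200x300 frame with a hypothetical detection box.
frame = [[(r + c) % 256 for c in range(300)] for r in range(200)]
face = extract_face(frame, (50, 40, 150, 140))
print(len(face), len(face[0]))
```

Routing the training and test sets through one function such as `extract_face` is one way to guarantee that both are pre-processed identically.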
One of the networks used was NN4, as described in the Google FaceNet paper [6]. The above figure depicting the NN4 layers has been extracted from this paper. This was based on the OpenFace [16] implementation of NN4, which does not include the layers 4c and 4d. Additionally, batch normalization is used during training.

For the small Inception network (NN4), pre-trained models from the OpenFace implementation were used and no additional training was performed.

3.2 Inception-ResNet

During training, both the Inception-ResNet v1 and Inception-ResNet v2 networks were used. It was found that Inception-ResNet v1 gives a lower training loss from the start.

The choice of learning rate and gradient descent algorithm did not seem to impact the training. Specifically, there was no difference between ADAM and RMSPROP during learning.

The images below show the loss for different choices of training parameters.

The training was accordingly continued on the better performing Inception-ResNet v1 network for two epochs on an Amazon cloud VM instance with an NVIDIA Tesla V100 GPU.

4.1 Triplet Loss

The 128-dimensional embedding is used to calculate the triplet loss. A description of the triplet loss, extracted from the FaceNet paper [6], is provided.
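As a minimal sketch of the triplet loss from the FaceNet paper [6]: for an anchor a, a positive p (same identity) and a negative n (different identity), the loss for one triplet is max(||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + alpha, 0), computed on L2-normalized embeddings. The margin alpha = 0.2 is the value reported in the FaceNet paper and is only assumed here; the function names are illustrative and not taken from this project's code.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale an embedding to unit length, as FaceNet does before computing distances."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge on (anchor-positive distance) - (anchor-negative distance) + margin."""
    return max(
        squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin,
        0.0,
    )


# Usage with toy 2-dimensional embeddings (real embeddings would have 128 dimensions).
a = l2_normalize([1.0, 0.0])
p = l2_normalize([1.0, 0.0])
n = l2_normalize([0.0, 1.0])
print(triplet_loss(a, p, n))  # already separated by more than the margin
print(triplet_loss(a, n, p))  # violating triplet, so the loss is positive
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only violating triplets contribute to the gradient.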
Set and a much larger Test set. Additionally, cross-validation was used while testing on the larger Test set.

The accuracy and ROC (receiver operating characteristic) of the tests are shown below.

For testing of the pre-trained model of the small NN4 network, a small DevTest subset (10%) of the YouTube Faces DB test set was used first.
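To illustrate how accuracy and ROC figures of this kind can be computed, here is a minimal sketch. It assumes a pair-based verification setup in which each test pair has an embedding distance and a same/different label, and a pair is predicted "same" when its distance is at or below a threshold. The evaluation code is not described in the text, so the function names, the toy data and the thresholds are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def accuracy_at_threshold(distances: Sequence[float], same: Sequence[bool],
                          threshold: float) -> float:
    """Fraction of pairs classified correctly when 'same' means distance <= threshold."""
    predictions = [d <= threshold for d in distances]
    correct = sum(pred == label for pred, label in zip(predictions, same))
    return correct / len(same)


def roc_points(distances: Sequence[float], same: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(same)
    negatives = len(same) - positives
    points = []
    for t in thresholds:
        true_pos = sum(1 for d, s in zip(distances, same) if s and d <= t)
        false_pos = sum(1 for d, s in zip(distances, same) if not s and d <= t)
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Usage with toy distances: three same-identity pairs, three different-identity pairs.
distances = [0.2, 0.4, 0.9, 1.3, 1.6, 0.7]
same = [True, True, True, False, False, False]
print(accuracy_at_threshold(distances, same, 0.8))
print(roc_points(distances, same, [0.0, 0.8, 2.0]))
```

Under cross-validation, one common choice is to pick the threshold on the training folds and report accuracy on the held-out fold; whether this project followed that protocol is not stated.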
