A16. Deep Learning Based Facial Recognition
Author: Hrishikesh Kulkarni (uploaded 11 May 2018)
Abstract:- Face recognition is the task of identifying an individual from an image of their face and a database of known faces. Despite being a relatively easy task for most humans, "unconstrained" face recognition by machines, specifically in settings such as malls, casinos and transport terminals, remains an open and active area of research. It has multiple use cases in surveillance, access control and even finding missing persons in a crowd.

In recent years, a large number of photos and videos have been crawled by search engines and uploaded to social networks, covering a variety of unconstrained material such as objects, faces and scenes. This large volume of data, together with the increase in computational resources, has enabled the use of more powerful statistical models for the general challenge of object classification in images and videos.

This research project evaluates the use of big-data-based machine learning approaches, such as deep convolutional neural networks, for the problem of unconstrained facial recognition in video data. It attempts to replicate, and where feasible better, the performance of leading state-of-the-art commercial systems trained on large proprietary datasets, using public datasets and open source frameworks from research universities.

It is assumed that the reader has a fair understanding of neural networks and convolutional neural networks from both a theoretical and practical standpoint.
1. INTRODUCTION

This project attempts to reproduce the performance of state-of-the-art proprietary face recognition systems on video by using and tuning open source frameworks. For this purpose we utilize today's state-of-the-art open datasets for face recognition in video, primarily the YouTube Faces DB dataset, which consists of 3,425 videos of 1,595 subjects. The videos are broken down into frames, and face recognition is performed on the frames (after an initial face alignment, as explained later). The system is trained on CASIA-WebFace, a public dataset consisting of 10,575 subjects and 494,414 images.

2. RELATED WORK

While most of the related work reviewed is provided in the references section, the key works referenced for this project are the FaceNet system based on the Inception network architecture from Google [5,6] for facial recognition, and the ResNet architecture from Microsoft Research [8]. Another key architecture referenced and used in this research is the combination of the Inception and ResNet architectures described in [10], which is shown to provide a dramatic improvement in performance. The rest of the referenced work describes the research breakthroughs that led to the widespread use of convolutional neural network architectures (LeNet [2], AlexNet [3], VGGNet [4], GoogLeNet [5]) as the state of the art for the general challenge of object and image recognition. For data pre-processing, primarily face detection and alignment to ensure pose-invariant face recognition, the work referenced is the multi-task CNN [7]. While not directly referenced in the body, the core research papers that influenced the FaceNet team's choice of loss function, as well as the mathematical basis for the use of deeper networks, are also included [11,12].

The two references for the CASIA-WebFace and YouTube Faces datasets [13,14] describe how the respective datasets were created. The CASIA-WebFace team also provides a benchmark on YouTube Faces DB, which this research will try to match or better; the CASIA team achieves a best accuracy of 90.60%. Finally, the last two references [15,16] point to open source implementations of the OpenFace face recognition system from Carnegie Mellon University.

3. DATA PREPROCESSING, NETWORK ARCHITECTURE AND TRAINING PARAMETERS

It was critical that both the training and test sets were pre-processed to extract the face using the same approach. The videos were pre-processed for face detection and alignment using an open source implementation of the multi-task CNN (MTCNN) algorithm [7]. This approach is known to be invariant to pose, illumination and occlusion, and gives better results than the standard dlib library commonly used for this purpose. The figure showing the three-stage multi-task CNN process has been extracted from the associated reference paper [7].

The open source implementation was already pre-trained, so there was no need to do any training before using it on the training and test sets. The image dimensions after face extraction were 160x160.
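The alignment step can be sketched in a few lines. This is a minimal illustration, not the MTCNN implementation itself: it assumes the detector has already returned five landmarks (two eyes, nose, two mouth corners), and simply rotates them so the eye line is horizontal. All function names here are illustrative.

```python
import math

def alignment_angle(left_eye, right_eye):
    """Roll angle (radians) of the line joining the two eye centres."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])

def rotate_point(p, angle, center):
    """Rotate point p by -angle about center, undoing the head roll."""
    x, y = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(-angle), math.sin(-angle)
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)

def align_landmarks(landmarks):
    """Rotate all five MTCNN-style landmarks (left eye, right eye, nose,
    left mouth corner, right mouth corner) so the eye line is horizontal."""
    angle = alignment_angle(landmarks[0], landmarks[1])
    return [rotate_point(p, angle, landmarks[0]) for p in landmarks]
```

In practice the same rotation (plus scaling and cropping to 160x160) is applied to the image pixels, not just the landmark coordinates.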
One of the networks used was NN4, as described in the Google FaceNet paper [6]; the figure depicting the NN4 layers has been extracted from that paper. This work used the OpenFace [16] implementation of NN4, which does not include layers 4c and 4d. Additionally, batch normalization is used during training.

3.2 Inception-ResNet Architecture
4. TRAINING

For the small Inception network (NN4), pre-trained models from the OpenFace implementation were used and no additional training was performed. During training, both the Inception-ResNet v1 and Inception-ResNet v2 networks were used; it was found that Inception-ResNet v1 gives a lower training loss from the start. Training was accordingly continued on the better-performing Inception-ResNet v1 network for two epochs, on an Amazon cloud VM instance with an NVIDIA Tesla V100 GPU.

The choice of learning rate and gradient descent algorithm did not seem to impact the training. In particular, there was no difference between ADAM and RMSProp during learning. The images below show the loss for different choices of training parameters and hyperparameters.

4.1 Triplet Loss

The 128-dimensional embedding is used to calculate the triplet loss. The description of the triplet loss, extracted from the FaceNet paper [6], is provided below.
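The FaceNet triplet loss referred to here can be sketched directly from its definition: for an anchor a, a positive p (same identity) and a negative n (different identity), the per-triplet loss is max(0, ||a-p||² - ||a-n||² + α). The margin α = 0.2 below matches the FaceNet paper's setting; the helper names are illustrative.

```python
def l2_normalize(v):
    """Project an embedding onto the unit hypersphere, as FaceNet does."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """FaceNet triplet loss: max(0, ||a-p||^2 - ||a-n||^2 + alpha)."""
    a, p, n = (l2_normalize(x) for x in (anchor, positive, negative))
    return max(0.0, sq_dist(a, p) - sq_dist(a, n) + alpha)
```

The margin forces same-identity pairs to be closer than different-identity pairs by at least α, which is what makes a simple distance threshold usable at verification time.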
5. TESTING

For testing the pre-trained model of the small NN4 network, a small DevTest subset (10%) of the YouTube Faces DB test set was used first, followed by the much larger full test set. Additionally, cross-validation was used while testing on the larger test set.

6. RESULTS

The accuracy and ROC (receiver operating characteristic) curves of the tests are shown below.
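The verification protocol behind these metrics can be sketched as follows: a pair of faces is predicted to be the same person when the distance between their embeddings falls below a threshold, and sweeping the threshold traces out the ROC curve. This is a minimal illustration, not the project's evaluation code:

```python
def accuracy(same_dists, diff_dists, threshold):
    """Fraction of face pairs classified correctly: a pair is predicted
    'same person' when its embedding distance is below the threshold."""
    correct = (sum(d < threshold for d in same_dists)
               + sum(d >= threshold for d in diff_dists))
    return correct / (len(same_dists) + len(diff_dists))

def roc_points(same_dists, diff_dists, thresholds):
    """One (false-positive rate, true-positive rate) point per threshold;
    sweeping the threshold traces out the ROC curve."""
    pts = []
    for t in thresholds:
        tpr = sum(d < t for d in same_dists) / len(same_dists)
        fpr = sum(d < t for d in diff_dists) / len(diff_dists)
        pts.append((fpr, tpr))
    return pts
```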
It is clear from the above ROC curves that the smaller NN4 network performs much more poorly than the deeper Inception-ResNet networks. Furthermore, for the Inception-ResNet networks the accuracy is high after just two epochs of training.

7. CONCLUSION

The research and tests done so far clearly indicate that the deeper Inception-ResNet networks perform better than the smaller networks. Further work on training the networks for more epochs and tuning the parameters is needed to determine the settings that give the best results.

ACKNOWLEDGMENT

We would like to thank the Data Science department at Ryerson University for the guidance and suggestions provided during this short-term research project, which was done as part of a Capstone project in Big Data.

REFERENCES

[1] F. Rosenblatt: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain (1958). https://pdfs.semanticscholar.org/865f/b2cfe6fdb7af2c663ef346ea05889f237108.pdf
[2] Yann LeCun, Yoshua Bengio, Patrick Haffner: Gradient-Based Learning Applied to Document Recognition (1998). http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
[3] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[4] Karen Simonyan, Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet). https://arxiv.org/pdf/1409.1556.pdf
[5] Christian Szegedy, Wei Liu, Yangqing Jia et al.: Going Deeper with Convolutions (Inception network, GoogLeNet). https://arxiv.org/pdf/1409.4842.pdf
[6] Florian Schroff, Dmitry Kalenichenko, James Philbin (Google Inc.): FaceNet: A Unified Embedding for Face Recognition and Clustering. https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf
[7] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li: Joint Face Detection and Alignment Using Multi-task Cascaded Convolutional Networks. https://kpzhang93.github.io/MTCNN_face_detection_alignment/paper/spl.pdf
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research): Deep Residual Learning for Image Recognition. https://arxiv.org/pdf/1512.03385.pdf
[9] Andrew Ng et al.: Deep Learning Specialization. https://www.coursera.org/specializations/deep-learning
[10] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke (Google Inc.): Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. https://arxiv.org/pdf/1602.07261.pdf
[11] Kilian Q. Weinberger, Lawrence K. Saul: Distance Metric Learning for Large Margin Nearest Neighbor Classification. http://jmlr.csail.mit.edu/papers/volume10/weinberger09a/weinberger09a.pdf
[12] Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma: Provable Bounds for Learning Some Deep Representations. https://arxiv.org/pdf/1310.6343.pdf
[13] Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li: Learning Face Representation from Scratch. https://arxiv.org/pdf/1411.7923.pdf
[14] Lior Wolf, Tal Hassner, Itay Maoz: Face Recognition in Unconstrained Videos with Matched Background Similarity. http://www.cs.tau.ac.il/~wolf/ytfaces/WolfHassnerMaoz_CVPR11.pdf
[15] David Sandberg: FaceNet implementation in Python and TensorFlow, using the Inception-ResNet v1 and v2 network architectures. https://github.com/davidsandberg/facenet
[16] Brandon Amos, Bartosz Ludwiczuk, Mahadev Satyanarayanan: OpenFace: A General-Purpose Face Recognition Library with Mobile Applications. http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf
[17] Victor Sy Wang: OpenFace implementation using Python, Keras and TensorFlow, using the small NN4 Inception network. https://github.com/iwantooxxoox/Keras-OpenFace