A16. Deep Learning Based Facial Recognition
Author: Hrishikesh Kulkarni (uploaded 11 May 2018)
Abstract:- Face recognition is the task of identifying an individual from an image of their face and a database of known faces. Despite being a relatively easy task for most humans, "unconstrained" face recognition by machines, specifically in settings such as malls, casinos and transport terminals, remains an open and active area of research. It has multiple use cases in surveillance, access control and even finding missing persons in a crowd.

In recent years, a large number of photos and videos have been crawled by search engines and uploaded to social networks, covering a variety of unconstrained material such as objects, faces and scenes. This large volume of data, together with the increase in computational resources, has enabled the use of more powerful statistical models for the general challenge of object classification in images and videos.

This research project evaluates the use of big-data-based machine learning approaches, such as deep convolutional neural networks, for the problem of unconstrained facial recognition in video data. It attempts to replicate, and where feasible better, the performance of leading state-of-the-art commercial systems trained on large proprietary datasets, using public datasets and open source frameworks from research universities.

It is assumed that the reader has a fair understanding of neural networks and convolutional neural networks from both a theoretical and practical standpoint.
1. INTRODUCTION

This project attempts to reproduce the performance of state-of-the-art proprietary face recognition systems on video by using and tuning open source frameworks. For this purpose we utilize today's state-of-the-art open datasets for face recognition in video, primarily the YouTube Faces DB dataset, which consists of 3,425 videos of 1,595 subjects. The videos are broken down into frames, and face recognition is performed on the frames (after an initial face alignment, as explained later). The system is trained on CASIA-WebFace, a public dataset consisting of 10,575 subjects and 494,414 images.

2. RELATED WORK

While most of the related work reviewed is provided in the references section, the key works referenced for this project are the FaceNet system based on the Inception network architecture from Google [5,6] for facial recognition, and the ResNet architecture from Microsoft Research [8]. Another key architecture referenced and used in this research is the combination of the Inception and ResNet architectures described in [10], which is shown to provide a dramatic improvement in performance. The rest of the referenced work describes the research breakthroughs that led to the widespread use of convolutional neural network architectures (LeNet [2], AlexNet [3], VGGNet [4], GoogLeNet [5]) as the state of the art for the general challenge of object and image recognition. For data pre-processing, primarily face detection and alignment to ensure pose-invariant face recognition, the work referenced is the multi-task CNN [7]. While not directly referenced in the body, the core research papers that influenced the FaceNet team's choice of loss function, as well as the mathematical basis for the use of deeper networks, are also included [11,12].

The two references for the CASIA-WebFace and YouTube Faces datasets [13,14] describe how the respective datasets were created. The CASIA-WebFace team also provides a benchmark on YouTube Faces DB, which this research will try to match or better; the CASIA team achieves a best accuracy of 90.60%. Finally, the last two references [15,16] point to open source implementations of the OpenFace face recognition system from Carnegie Mellon University.

3. DATA PREPROCESSING, NETWORK ARCHITECTURE AND TRAINING PARAMETERS

It was critical that both the training and test sets were pre-processed to extract the face using the same approach. The videos were pre-processed for face detection and alignment using an open source implementation of the multi-task CNN (MTCNN) algorithm [7]. This approach is known to be invariant to pose, illumination and occlusion, and gives better results than the standard dlib library commonly used for this purpose. The figure showing the three-stage multi-task CNN process has been extracted from the associated reference paper [7].

The open source implementation was already pre-trained, so there was no need to do any training before using it on the training and test sets. The image dimensions after face extraction were 160x160.
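The alignment step can be sketched in a few lines. This is a minimal illustration, not the MTCNN implementation itself: it assumes the detector has already returned five landmarks (two eyes, nose, two mouth corners), and simply rotates them so the eye line is horizontal. All function names here are illustrative.

```python
import math

def alignment_angle(left_eye, right_eye):
    """Roll angle (radians) of the line joining the two eye centres."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])

def rotate_point(p, angle, center):
    """Rotate point p by -angle about center, undoing the head roll."""
    x, y = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(-angle), math.sin(-angle)
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)

def align_landmarks(landmarks):
    """Rotate all five MTCNN-style landmarks (left eye, right eye, nose,
    left mouth corner, right mouth corner) so the eye line is horizontal."""
    angle = alignment_angle(landmarks[0], landmarks[1])
    return [rotate_point(p, angle, landmarks[0]) for p in landmarks]
```

In practice the same rotation (plus scaling and cropping to 160x160) is applied to the image pixels, not just the landmark coordinates.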
One of the networks used was NN4, as described in the Google FaceNet paper [6]; the figure depicting the NN4 layers has been extracted from that paper. This work used the OpenFace [16] implementation of NN4, which does not include layers 4c and 4d. Additionally, batch normalization is used during training.

3.2 Inception-ResNet Architecture
4. TRAINING

For the small Inception network (NN4), pre-trained models from the OpenFace implementation were used and no additional training was performed. During training, both the Inception-ResNet v1 and Inception-ResNet v2 networks were used; it was found that Inception-ResNet v1 gives a lower training loss from the start. Training was accordingly continued on the better-performing Inception-ResNet v1 network for two epochs, on an Amazon cloud VM instance with an NVIDIA Tesla V100 GPU.

The choice of learning rate and gradient descent algorithm did not seem to impact the training. In particular, there was no difference between ADAM and RMSProp during learning. The images below show the loss for different choices of training parameters and hyperparameters.

4.1 Triplet Loss

The 128-dimensional embedding is used to calculate the triplet loss. The description of the triplet loss, extracted from the FaceNet paper [6], is provided below.
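The FaceNet triplet loss referred to here can be sketched directly from its definition: for an anchor a, a positive p (same identity) and a negative n (different identity), the per-triplet loss is max(0, ||a-p||² - ||a-n||² + α). The margin α = 0.2 below matches the FaceNet paper's setting; the helper names are illustrative.

```python
def l2_normalize(v):
    """Project an embedding onto the unit hypersphere, as FaceNet does."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """FaceNet triplet loss: max(0, ||a-p||^2 - ||a-n||^2 + alpha)."""
    a, p, n = (l2_normalize(x) for x in (anchor, positive, negative))
    return max(0.0, sq_dist(a, p) - sq_dist(a, n) + alpha)
```

The margin forces same-identity pairs to be closer than different-identity pairs by at least α, which is what makes a simple distance threshold usable at verification time.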
5. TESTING

For testing the pre-trained model of the small NN4 network, a small DevTest subset (10%) of the YouTube Faces DB test set was used first, followed by the much larger full test set. Additionally, cross-validation was used while testing on the larger test set.

6. RESULTS

The accuracy and ROC (receiver operating characteristic) curves of the tests are shown below.
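The verification protocol behind these metrics can be sketched as follows: a pair of faces is predicted to be the same person when the distance between their embeddings falls below a threshold, and sweeping the threshold traces out the ROC curve. This is a minimal illustration, not the project's evaluation code:

```python
def accuracy(same_dists, diff_dists, threshold):
    """Fraction of face pairs classified correctly: a pair is predicted
    'same person' when its embedding distance is below the threshold."""
    correct = (sum(d < threshold for d in same_dists)
               + sum(d >= threshold for d in diff_dists))
    return correct / (len(same_dists) + len(diff_dists))

def roc_points(same_dists, diff_dists, thresholds):
    """One (false-positive rate, true-positive rate) point per threshold;
    sweeping the threshold traces out the ROC curve."""
    pts = []
    for t in thresholds:
        tpr = sum(d < t for d in same_dists) / len(same_dists)
        fpr = sum(d < t for d in diff_dists) / len(diff_dists)
        pts.append((fpr, tpr))
    return pts
```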
It is clear from the above ROC curves that the smaller NN4 network performs much more poorly than the deeper Inception-ResNet networks. Furthermore, for the Inception-ResNet networks the accuracy is high after just two epochs of training.

7. CONCLUSION

The research and tests done so far clearly indicate that the deeper Inception-ResNet networks perform better than the smaller networks. Further work on training the networks for more epochs and tuning the parameters is needed to determine the settings that give the best results.

ACKNOWLEDGMENT

We would like to thank the Data Science department at Ryerson University for the guidance and suggestions provided during this short-term research project, which was done as part of a Capstone project in Big Data.

REFERENCES

[1] F. Rosenblatt: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain (1958). https://pdfs.semanticscholar.org/865f/b2cfe6fdb7af2c663ef346ea05889f237108.pdf
[2] Yann LeCun, Yoshua Bengio, Patrick Haffner: Gradient-Based Learning Applied to Document Recognition (1998). http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
[3] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[4] Karen Simonyan, Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet). https://arxiv.org/pdf/1409.1556.pdf
[5] Christian Szegedy, Wei Liu, Yangqing Jia et al.: Going Deeper with Convolutions (Inception network, GoogLeNet). https://arxiv.org/pdf/1409.4842.pdf
[6] Florian Schroff, Dmitry Kalenichenko, James Philbin (Google Inc.): FaceNet: A Unified Embedding for Face Recognition and Clustering. https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf
[7] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li: Joint Face Detection and Alignment Using Multi-task Cascaded Convolutional Networks. https://kpzhang93.github.io/MTCNN_face_detection_alignment/paper/spl.pdf
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research): Deep Residual Learning for Image Recognition. https://arxiv.org/pdf/1512.03385.pdf
[9] Andrew Ng et al.: Deep Learning Specialization. https://www.coursera.org/specializations/deep-learning
[10] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke (Google Inc.): Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. https://arxiv.org/pdf/1602.07261.pdf
[11] Kilian Q. Weinberger, Lawrence K. Saul: Distance Metric Learning for Large Margin Nearest Neighbor Classification. http://jmlr.csail.mit.edu/papers/volume10/weinberger09a/weinberger09a.pdf
[12] Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma: Provable Bounds for Learning Some Deep Representations. https://arxiv.org/pdf/1310.6343.pdf
[13] Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li: Learning Face Representation from Scratch. https://arxiv.org/pdf/1411.7923.pdf
[14] Lior Wolf, Tal Hassner, Itay Maoz: Face Recognition in Unconstrained Videos with Matched Background Similarity. http://www.cs.tau.ac.il/~wolf/ytfaces/WolfHassnerMaoz_CVPR11.pdf
[15] David Sandberg: FaceNet implementation in Python and TensorFlow, using the Inception-ResNet v1 and v2 network architectures. https://github.com/davidsandberg/facenet
[16] Brandon Amos, Bartosz Ludwiczuk, Mahadev Satyanarayanan: OpenFace: A General-Purpose Face Recognition Library with Mobile Applications. http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf
[17] Victor Sy Wang: OpenFace implementation using Python, Keras and TensorFlow, using the small NN4 Inception network. https://github.com/iwantooxxoox/Keras-OpenFace