2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)

Face Recognition - A One-Shot Learning Perspective
Sukalpa Chanda∗ ˆ, Asish Chakrapani GV† ˆ, Anders Brun‡ , Anders Hast‡ ,
Umapada Pal† and David Doermann§
∗ Department of Information Technology, Østfold University College, Norway
[email protected]/[email protected]
† Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, India
[email protected], [email protected]
‡ Centre for Image Analysis, Uppsala University, Sweden
{anders.brun, anders.hast}@it.uu.se
§ Computer Science and Engineering, University at Buffalo, USA
[email protected]

ˆ Equal contribution by the authors

Abstract: The ability to learn from a single instance is something unique to the human species, and One-Shot learning algorithms try to mimic this special capability. On the other hand, despite the fantastic performance of Deep Learning-based methods on various image classification problems, performance often depends on having a huge number of annotated training samples per class. This fact is certainly a hindrance in deploying deep neural network-based systems in many real-life applications like face recognition. Furthermore, the addition of a new class to the system requires re-training the whole system from scratch. Nevertheless, the prowess of deep learned features cannot be ignored. This research aims to combine the best of deep learned features with a traditional One-Shot learning framework. Results obtained on 2 publicly available datasets are very encouraging, achieving over 90% accuracy on 5-way One-Shot tasks and 84% on 50-way One-Shot problems.

Keywords: One-Shot Learning, Face recognition, Siamese Networks, Image Classification.

I. INTRODUCTION

Face recognition has been extensively explored over the last several decades. Its value for non-contact biometric authentication and for a wide variety of other digital applications, such as security, digital entertainment systems, video analytics for marketing, and indexing of streaming video, cannot be ignored. Like any other image analysis problem, face recognition in its early days relied mainly on hand-crafted features like SIFT, SURF, Local Binary Pattern, Histogram of Gradient, and Fisher vectors, but with the advent of deep-learning methodologies there is a clear shift towards deep-learned features. During those early days, research was focused on improving the pre-processing stage, the introduction of local descriptors, and feature transformation, but such techniques failed to counter the challenges of unconstrained face recognition. Hand-crafted feature-based methods were used to address changes in lighting, pose, and expression but failed in real life due to their inability to address more general pose challenges. This has changed as the deep-learning methods have evolved. Deep learning methods learn multiple levels of representations and abstractions by using a cascade of processing units for feature extraction and transformation. This leads to a hierarchy of abstraction/representation and addresses changes in face pose, illumination, and expression. Even though deep-learning-based methods can tackle changes in lighting, pose, and expression while performing face recognition, one disadvantage is their demand for a huge amount of annotated data to train the system and the requirement of re-training when a new class is added. While transfer learning techniques can help mitigate such problems by freezing the first few layers and tuning pre-trained weights from the last few layers on the new data, they do not completely eradicate the problem.

One-shot algorithms, on the other hand, use a completely different philosophy for classification. One-shot algorithms are meant to perform classification after seeing only a handful of training samples. Thus a clever amalgamation of those two techniques could combine the best of both, providing a rich feature representation using deep learning techniques and feeding those features to a One-Shot learning framework for classification. A widely used strategy to implement One-Shot learning algorithms is to use a Siamese Neural Network with a triplet loss function. Our work takes a Siamese Neural Network-based approach to perform One-Shot learning and consequent classification. Deep Neural Network-based features from the “DLIB-ml machine learning toolkit” [1] are used for feature representation for all face images.

The primary contribution of this research is a novel hybrid method that combines a Siamese Neural Network with Res-Net encoded features for the One-Shot face recognition task. We also intend to publish our dataset with unconstrained face images procured from the “Indian Movie Faces Database” in the near future for performance evaluation and benchmarking of One-Shot recognition tasks.

II. RELATED WORK

Face detection and recognition methods have had significant importance as an image analysis research problem for almost

978-1-7281-5686-6/19/$31.00 ©2019 IEEE

DOI 10.1109/SITIS.2019.00029

Authorized licensed use limited to: Auckland University of Technology. Downloaded on May 28,2020 at 23:47:46 UTC from IEEE Xplore. Restrictions apply.
3 decades. One of the seminal articles in the early nineties is [2], where the authors represent faces using a small set of 2-D Eigenvectors. Face recognition methods can be broadly divided into hand-crafted feature-based approaches and, with later deep-learning technologies, deep-learned feature-based approaches. The hand-crafted approaches focused mainly on high-dimensional artificial feature extraction and the reduction of features. The representative dimension reduction methods are subspace learning methods like Principal Component Analysis [3] and Linear Discriminant Analysis [4], and manifold learning methods like Locality Preserving Projection [5]. With the advance of deep learning, the representative method became learning discriminative face representations directly from the original image space. For example, Hu et al. [6] evaluated convolutional neural networks applied to face recognition. Their work analyses the advantages and disadvantages of this method and shows a roadmap for future development. This line of work is further explored, and state-of-the-art results are obtained, in [7], [8], [9], [10]. Despite the exceptional performance of CNNs in some applications, such algorithms struggle with many real-world applications that require learning or drawing inferences from small amounts of data, handling class imbalance, and adjusting to a constant inflow of new class information. The problem of developing an efficient, robust face recognition system at scale is no exception in this context.

In the past few years, there have been several works that address this problem. To address the data imbalance problem, Guo et al. [11] proposed a novel underrepresented-classes promotion loss term which aligned the norms of the weight vectors of underrepresented classes and normal classes, thus giving the one-shot classes equal weight. Wang et al. [12] propose a CNN-based framework which deals with deficient training data by using a balancing regularizer and shifting center regeneration to regulate the norms of the weight vectors into the same scale and to adjust the clustering centers. Insufficient training data and data imbalance, however, cause the network to perform poorly. Ding et al. [13] proposed an approach to solve the underrepresented class problem in one-shot learning by building generative models that produce extra examples. It proposed a generative model to synthesize data for one-shot classes by adapting the data variances and augmenting features from other normal classes. Another work by Jadhav et al. [14] proposed deep attribute representations of faces for one-shot face recognition. They used specific attributes of human faces, such as the shape of the face, hair, and gender, to fine-tune a deep CNN for face recognition. Their experimental results on standard datasets showed that deep attribute representations performed better with two one-shot face recognition techniques, an exemplar SVM and a one-shot similarity kernel. Wu et al. [15] proposed a framework with hybrid classifiers using a CNN and the nearest neighbor (NN) model. The work by Hong et al. [16] proposes a domain adaptation network to solve the One-shot task; the authors generated images in various poses using a 3D face model to train the deep model. Zhao et al. [17] proposed an enforced softmax that contains optimal dropout, selective attenuation, L2 normalization and model-level optimization, which boosted the standard softmax function to produce a better representation for low-shot learning.

The concept of Siamese Networks was initially introduced by Bromley et al. [18] for the signature verification problem, and the use of deep convolutional Siamese networks for one-shot tasks with significant accuracy has been showcased in [19]. Face recognition usually consists of face detection, feature extraction, and recognition. We use the dlib-ml [1] toolkit, which leverages image-driven neural networks to detect and extract the faces in a given image, and then use a ResNet-based architecture to generate a feature vector to represent each face. In this paper, we propose a method which integrates the concept of deep convolutional Siamese networks and a transfer learning strategy to produce a robust face recognition system which leverages the deep learned feature attributes.

III. METHODOLOGY

One-shot learning can be achieved in several ways. In this research we have explored two approaches: (a) a Siamese Neural Network based approach; (b) a deep-feature encoding approach followed by nearest neighbor classification of those encoded features. We settled on a method that combines the two approaches. This improvised combined method uses the encoded features generated by a ResNet CNN architecture as input to the Siamese network, and the Siamese network is trained to discriminate between two encoded feature vectors. In this combined approach, a pre-trained deep convolutional neural network (ResNet) acts as a feature extractor for a pair of input images, and then an energy function Θ is used which ties the twin networks together to compute the similarity index. When the two encoded feature vectors for the input face images are obtained, the Siamese Network learns to score the similarity of those two encoded feature vectors in a range of 0-1, where 1 is assigned if both input images are of the same class.

A. Siamese Network

Siamese networks are a subset of deep neural network architectures that contain two identical sub-networks working in cohesion, using the same weights while taking two distinct input vectors, and joined by a comparative function. Such networks are used to determine the similarity between two distinct inputs. For the network to be called 'Siamese', it is important not only that the architectures of the sub-networks are identical, but also that the weights are shared among them. In this current study, the convolutional Siamese network is designed to learn features of the input images regardless of prior domain knowledge, with very few samples from a given distribution. This model was also adopted because the twin networks share weights, resulting in fewer parameters to train

Fig. 1: Sibling of the Twin Siamese Network Architecture used in the experiment(twin network not depicted).
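
The weight sharing that makes a network 'Siamese' can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the single ReLU layer per twin, the sigmoid over a weighted L1 distance used as the comparative function, the class name `SiameseScorer`, and the layer sizes are all assumptions made for illustration. Only the idea of one set of weights applied to both inputs, a similarity score between 0 and 1, and the Glorot-uniform limit of Eq. (2) come from the text.

```python
import numpy as np


class SiameseScorer:
    """Illustrative twin network: both inputs pass through the SAME weights."""

    def __init__(self, in_dim=128, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        # Glorot-uniform limit g = sqrt(6 / (fan_in + fan_out)), as in Eq. (2).
        g = np.sqrt(6.0 / (in_dim + hidden_dim))
        self.W = rng.uniform(-g, g, size=(in_dim, hidden_dim))  # shared by both twins
        self.b = np.zeros(hidden_dim)                           # biases start at zero
        # Comparison head (assumed form): weighted L1 distance followed by a sigmoid.
        g_head = np.sqrt(6.0 / (hidden_dim + 1))
        self.alpha = rng.uniform(-g_head, g_head, size=hidden_dim)
        self.c = 0.0

    def embed(self, x):
        # One twin: a single ReLU layer (hypothetical, for illustration only).
        return np.maximum(0.0, x @ self.W + self.b)

    def score(self, x1, x2):
        # Both branches call the same embed(), so the weights are shared by construction.
        h1, h2 = self.embed(x1), self.embed(x2)
        z = np.abs(h1 - h2) @ self.alpha + self.c
        return float(1.0 / (1.0 + np.exp(-z)))  # similarity score in (0, 1)


rng = np.random.default_rng(1)
a, b = rng.normal(size=128), rng.normal(size=128)
model = SiameseScorer()
s_ab, s_ba, s_aa = model.score(a, b), model.score(b, a), model.score(a, a)
```

Because the two branches share weights and the distance is symmetric, the untrained scorer returns the same value for (a, b) and (b, a); training would then adjust `W`, `alpha` and `c` so that same-class pairs score close to 1.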

on and a lower tendency of over-fitting. For experiments, a small labeled support set consisting of train-validation classes and test classes was used. During training, the network takes a pair of images as input and learns to discriminate between the two input images based on their class labels and features. The task is achieved by generating probability scores which indicate whether they belong to the same class or different classes. For evaluation of n-way one-shot tasks, the network is provided with pairs of images consisting of a reference image and one sample image from each of the n unseen classes at each instance. The label from the pair with the highest probability is then given to the reference image. A pictorial diagram of our Siamese network is shown in Fig. 1.

1) Learning Details: The same learning rate η_j is used for all the layers and follows a step-based decay, decreasing at a uniform rate of 1% every 500 iterations. The validation accuracy metric is calculated after every 1000 iterations and the model with the best accuracy is saved during training. The model is trained for a maximum of 100,000 iterations. An early stopping condition was included in case the validation accuracy does not show improvement over 10,000 iterations. The momentum for each layer is initialized with a value of 0.5 and evolves with a predefined linear slope until it attains a final value of 0.9. The model is trained with a batch size of 8, along with a linearly evolving layer-wise momentum μ_j for the j-th layer and L2 regularization penalization of the weights at each iteration N. So the weight update rule for iteration N is:

\[ W_{kj}^{N}(x_{i1}, x_{i2}) = W_{kj}^{N} + \Delta W_{kj}^{N}(x_{i1}, x_{i2}) + 2\lambda_j |W_{kj}| \]
\[ \Delta W_{kj}^{N}(x_{i1}, x_{i2}) = -\eta_j \nabla W_{kj}^{N} + \mu_j \Delta W_{kj}^{N-1} \tag{1} \]

where ∇W_kj is the partial derivative with respect to the weight between the j-th neuron in a given layer and the k-th neuron in the next layer.

2) Weights: The weight initialization in all the layers in the network is done using the Glorot uniform initializer. The initializer draws samples from the uniform distribution over [−g, g], where g is given by the equation

\[ g = \sqrt{\frac{6}{fan_{in} + fan_{out}}} \tag{2} \]

Here, fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor [20]. The biases were initialized using the default setting of zeros in all the layers.

3) Loss function: The model error for the Siamese network during training is computed using a regularized cross-entropy loss function. The cross-entropy function equation is as follows:

\[ L(x_{i1}, x_{i2}) = y(x_{i1}, x_{i2}) \log P(x_{i1}, x_{i2}) + (1 - y(x_{i1}, x_{i2})) \log(1 - P(x_{i1}, x_{i2})) + \lambda^{N} |W|^{2} \tag{3} \]

Here i denotes the i-th index of the current batch, and y(x_{i1}, x_{i2}) is a vector of length M consisting of labels. It is assumed that it equals 1 in the case of the same class and 0 in the case of different classes for iteration N.

B. ResNet

The ResNet architecture was developed to address some issues observed in its predecessor, the VGG-Net. One shortcoming of VGG-Net was that it tends to lose generalization capability as the network depth increases. The other problem that ResNet counters is the “vanishing gradient” issue, which is often a problem with deeper networks. This arises because gradients from the outermost layer easily shrink to zero after several applications of the chain rule, hence no weight updates are performed in the network. ResNet introduced the “skip connection” concept, by virtue of which gradients can flow directly backward from deeper layers to the initial filters, skipping intermediate layers. The ResNet used here is a pruned version of ResNet-34 [21].

In a pre-processing step, a CNN generates the bounding box information of a face along with a set of 68 face landmark points [1] from an input image. The ResNet is fed with the bounding box information of the face and that set of 68 activation points inside the face region. In order to save time

and computational resources, we have used pre-trained weights from the initial layers of this network. Those weights were obtained while this network was trained from scratch on a dataset of about 3 million faces. At that time the training dataset was composed of 7845 individual face images procured from multiple sources such as the “face scrub dataset”, the VGG dataset and a large number of images scraped from the internet. This network generates, in the 29th layer, a 128-dimensional encoded feature for an input face image, and that 128-dimensional encoded feature is later used for classification. This network learns the weights using a loss function called “Triplet Loss”. The pruned network architecture is shown in Fig. 2.

Fig. 2: Pruned ResNet Architecture used in the experiment.

1) Loss function: In this current study, the ResNet architecture uses a “Triplet Loss” function, governed by the following equation:

\[ L = \max(D(a, p) - D(a, n) + margin, 0) \tag{4} \]

The objective behind training this pruned ResNet is to generate optimal weights such that the 128-dimensional feature embeddings of an anchor image and a positive image are similar, and the feature embeddings of the anchor image and a negative image are much further apart. While using the “Triplet Loss” function to train the network, the 128-dimensional feature embedding from an anchor image is compared with the 128-dimensional feature embeddings of both a positive sample and a negative sample. The objective here is to decrease the dissimilarity between the anchor image and the positive image and increase the dissimilarity between the anchor image and the negative image. Here 'a' denotes an anchor image, 'p' denotes a positive image and 'n' denotes a negative image. Another hyperparameter called margin is added to the loss equation; it defines how far apart the dissimilarities should be. For example, if the margin = 0.4 and D(a,p) = 0.3, then D(a,n) should be at least 0.7 for the loss to be zero.

C. A Combined Hybrid Approach

The proposed combined approach is depicted in Fig. 3. The Siamese network takes as input the deep-learned encoded features that were generated by the pruned Res-Net CNN and learns its own set of weights, intending to lower its cross-entropy loss function. To optimize the weights for our datasets, the weights of the initial convolutional layers were kept constant and the weight updates were carried out on the final few layers of the network with our training samples. Note that the Res-Net CNN has its own set of weights and its corresponding loss function as well.

IV. EXPERIMENTAL PROTOCOL

We used an N-way one-shot task performed on 'N' “support classes” in a disjoint set each time for evaluating the performance on the evaluation set. For our experiments, we use 4 values of N: 5, 10, 20 and 50. The efficacy of such algorithms is measured based on their performance on N-way tasks. During testing, for a query sample image, a support class set S is provided consisting of 'n' examples each from 'N' different unseen classes. The algorithm then has to determine which of the support set classes the query sample belongs to. Two draws producing n samples each are taken, and each one of the samples produced in the first draw is taken as a test image and compared against all samples of the second draw. This process was done twice for each evaluation set of n classes. We therefore perform 2N different one-shot tasks. We also observe the individual set accuracy, and a mean global accuracy for the model has been reported.

A. Dataset

The experiments were conducted on two publicly available large-scale datasets: “Labeled Faces in the Wild” (LFW) [22] and “Indian Movie Face database” (IMFDB) [23]. Another popular dataset, the “MS-Celeb Low-Shot dataset”, has not been included in the experiment for two reasons. First, some of the image samples of that dataset have been used to train the Res-Net based face recognition system, hence it would be unfair to use that database while evaluating the proposed system. Second, the dataset is unfortunately no longer publicly available. The reason for choosing “LFW” is that it is the most common dataset used for performance benchmarking of a face recognition system, and it is a curated dataset with proper alignment and proper annotation. The “IMFDB” consists of unconstrained images with much greater variability in terms of pose, illumination, and color. This variability is the reason for using the IMFDB dataset in our experiments. The two datasets are complementary to each

Fig. 3: Combined hybrid architecture used in the experiment.
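
The N-way one-shot protocol used to evaluate this architecture can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the synthetic 128-dimensional vectors stand in for the ResNet encodings, negative Euclidean distance (i.e. nearest neighbour) stands in for the trained Siamese similarity score, and the helper names `make_toy_embeddings`, `neg_euclidean` and `n_way_one_shot_accuracy` are hypothetical. Each task draws N classes, one support sample per class and one query, and assigns the query the label of the best-scoring pair.

```python
import numpy as np


def make_toy_embeddings(n_classes=20, per_class=15, dim=128, noise=0.05, seed=0):
    """Synthetic stand-in for 128-d face encodings: one tight cluster per class."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_classes, dim))
    X = np.repeat(centers, per_class, axis=0)
    X = X + noise * rng.normal(size=X.shape)
    y = np.repeat(np.arange(n_classes), per_class)
    return X, y


def neg_euclidean(a, b):
    """Placeholder similarity: larger means more similar (nearest neighbour)."""
    return -float(np.linalg.norm(a - b))


def n_way_one_shot_accuracy(X, y, n_way, n_tasks, score_fn, rng):
    """Fraction of tasks in which the query is matched to its true class."""
    classes = np.unique(y)
    correct = 0
    for _ in range(n_tasks):
        task_classes = rng.choice(classes, size=n_way, replace=False)
        true_idx = int(rng.integers(n_way))  # position of the query's class
        support, query = [], None
        for k, c in enumerate(task_classes):
            idx = np.flatnonzero(y == c)
            if k == true_idx:
                # Two distinct samples: one for the support set, one as the query.
                s, q = rng.choice(idx, size=2, replace=False)
                query = X[q]
            else:
                s = rng.choice(idx)
            support.append(X[s])
        scores = [score_fn(query, s) for s in support]
        # The label of the pair with the highest score is given to the query.
        correct += int(np.argmax(scores) == true_idx)
    return correct / n_tasks


X, y = make_toy_embeddings()
acc = n_way_one_shot_accuracy(X, y, n_way=5, n_tasks=200,
                              score_fn=neg_euclidean,
                              rng=np.random.default_rng(1))
```

Swapping `neg_euclidean` for a learned pairwise scorer would mirror the combined approach, while keeping it as is corresponds to the plain nearest-neighbour classification of encoded features.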

other in that sense. The details of the respective datasets are given below.

1) LFW: This database consists of 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. In this dataset, 1680 people have two or more distinct photos. To maintain consistency and to ensure robustness, we have various images for different facial positions. Further, to accommodate slight variations such as facial hair and obstructions such as headgear and eyewear, we include a few samples of such images as well. Finally, for each class, we end up taking 15 images in total, and due to this constraint we remove all the classes which have 15 images or less. After this, we are left with a total of 96 classes. We use a deep funneling method to align the faces [24].

2) IMFDB: This is a large unconstrained face database consisting of 34512 images of 100 Indian actors collected from more than 100 videos [23]. All the images are manually selected and cropped from the video frames, resulting in a high degree of variability in terms of scale, pose, expression, illumination, age, resolution, occlusion, and makeup. Videos collected from the last two decades contain large diversity in age variations compared to images collected from the Internet through a search query. IMFDB is the first face database that provides detailed annotation of every image in terms of age, pose, gender, expression and type of occlusion, which may help other face-related applications. This dataset exhibits a huge degree of intra-class variability as well (Fig. 4).

Fig. 4: Example of intra class variability in IMFDB dataset.

To maintain the variability and to ensure robustness, we have various images with different facial positions. Further, to accommodate slight variations such as facial hair and obstructions such as headgear and eyewear, we include a few samples of such images as well. For each class, we considered 20 images in total. We have a total of 100 classes. To keep the train and test sets completely disjoint and to exclude any overlap in the classes, we removed 6 classes which were common to IMFDB and the dataset used to train the ResNet.

V. RESULTS & DISCUSSIONS

While conducting experiments with three different approaches, the input test and train sets for each fold were the same for all three experiments. This was done purposely to compare the efficacy of the three approaches fairly.

For our experiments, the subset of the LFW database consisting of 96 face classes with 15 samples in each class was used. Those 96 classes were selected since the rest of the classes have less than 15 samples. For the evaluation in face recognition we perform 3 different one-shot tasks, i.e. 5, 10 and 20 way tasks, so the new dataset was split into either 91-5, 81-10 or 71-20 train-validation & evaluation classes, where the train set was further split according to an 80-20% split, resulting in 72, 64 or 56 classes for training and 19, 17 or 15 classes for validation.

The set of IMFDB consisting of 94 face classes with 20 samples in each class was used. For the evaluation in face recognition we perform 5, 10 and 20 way tasks, so the new dataset was split into either 89-5, 84-10 or 74-20 train-validation & evaluation classes, where the train set was further split according to an 80-20% split, resulting in 71, 67 or 59 classes for training and 18, 17 or 15 classes for validation. The evaluation was conducted using the same n-way one-shot tests on the n classes from the evaluation set.

Both datasets contain around 95 classes, and for training and evaluation we use a fold-wise method. The total number of folds for “n-way” is obtained as the total number of classes divided by “n”, the number of classes for testing (rounded down), with minimal re-sampling. Therefore, in the case of 5-way we get 19 folds, for 10-way we get 9 folds and for 20-way we get 4 folds. Note that to frame a 50-way one-shot task, given the number of classes in each of those two datasets, we could perform only two folds of train-test evaluation runs, where a few of the classes might be re-sampled from the previous folds. By “fold” we mean a unique “train-validation-test” evaluation

set. The accuracy metric used here is the true recognition rate for each fold in a given dataset.

A. Siamese Network-based Results

Out of the three approaches, during our initial experiments, the Siamese Network-based approach performed the worst. Even while dealing with a 5-way One-Shot recognition task, it could only deliver a highest accuracy of ≈ 32.50% for both datasets. To give an idea, results obtained on n-way One-Shot tasks on both datasets on 4 different folds are shown in Tables I and II. Since the results are not encouraging, we are not providing results for all folds with respect to different n-way tasks.

TABLE I: Accuracy of One shot Tasks on LFW dataset using Siamese Network with own feature extractor

Fold Number   5-Way Task   10-Way Task   20-Way Task
Fold 1        32.50%       28.20%        23.40%
Fold 2        27.50%       26.70%        22.60%
Fold 3        30.00%       30.20%        25.60%
Fold 4        24.60%       24.80%        22.60%

TABLE II: Accuracy of One shot Tasks on IMFDB using Siamese Network with own feature extractor

Fold Number   5-Way Task   10-Way Task   20-Way Task
Fold 1        32.80%       30.80%        24.20%
Fold 2        30.50%       27.50%        20.60%
Fold 3        28.50%       28.60%        22.80%
Fold 4        27.60%       27.60%        26.02%

B. ResNet-Based Face Recognizer Results

The ResNet architecture for face recognition from “DLIB” has been used in our experiment. To save time and resources, a transfer learning strategy was adopted. Here a pre-trained model of the Res-Net, which was generated by training on 3 million face images, was initially considered in this experiment. The weights of the initial convolutional layers of this model were kept constant during training on samples from “LFW” and “IMFDB”, and the weights associated with all fully connected layers were updated. The 128-dimensional feature encoding obtained from the 29th layer for an input test image is compared with the 128-dimensional feature encoding vectors of all support set samples; the input image is then assigned to the class of its nearest neighbor amongst the support set samples. Results on 5-way, 10-way and 20-way One-Shot learning tasks on the LFW and IMFDB datasets are depicted in Table IV and Table III respectively. Note that here also we are reporting on the same 4 folds of data that we have reported for the Siamese Network. It can be noted that with the use of ResNet feature encoding there is a striking improvement in the results compared to those obtained with the Siamese Network-only approach. The accuracy is as high as 87.00% for the 20-way One-Shot tasks on the LFW dataset, whereas the highest accuracy on the same dataset with the Siamese Network is ≈ 26.00%. A similar trend can be observed in the case of the IMFDB dataset as well.

TABLE III: Accuracy of One shot Tasks on IMFDB using Dlib-ResNet-29 network

Fold Number   5-Way Task   10-Way Task   20-Way Task
Fold 1        80.80%       78.60%        80.00%
Fold 2        82.40%       80.40%        76.50%
Fold 3        81.00%       79.30%        78.20%
Fold 4        83.60%       82.00%        75.40%

TABLE IV: Accuracy of One shot Tasks on LFW dataset using Dlib-ResNet-29 network

Fold Number   5-Way Task   10-Way Task   20-Way Task
Fold 1        88.20%       86.00%        85.30%
Fold 2        90.00%       84.60%        87.00%
Fold 3        89.00%       90.00%        82.00%
Fold 4        90.20%       89.00%        81.40%

C. Results obtained from Combined Hybrid Approach

The classification technique that we used to perform One-Shot learning on the encoded features from Res-Net was a naive Nearest Neighbour classification. Despite the simple classification, such high accuracies from the ResNet-based approach confirm that the encoded features generated by the ResNet were very discriminative. This motivated us to couple the discriminative feature extractor with the sophisticated discriminator function of the Siamese network architecture. In this setup, the ResNet-generated encoded features were fed to the Siamese network, which learns its own set of weights and hence gives much higher accuracy, in the range of 80.00%-84.20% even for the 50-way one-shot task. We experimented exhaustively with this approach on all possible folds of data. The 20-way one-shot results are depicted in Table V, and Table VI depicts the typical results obtained by this method on 50-way one-shot learning for the two datasets.

TABLE V: Accuracy of 20-way One shot Tasks on LFW Dataset & IMFDB using combined approach

Fold Number   LFW      IMFDB
Fold 1        92.50%   70.00%
Fold 2        95.50%   72.50%
Fold 3        82.50%   72.50%
Fold 4        87.50%   80.50%

In our experiments, for the 5-way one-shot task we obtained an average accuracy of 92.44% across 19 folds on the entire subset of the LFW dataset. Further, we obtained accuracy as high as 97.00% on a few occasions. The mean accuracy yielded by the 10-way one-shot tasks over a 9 fold cross-validation set

was 90.55%, with the best accuracy reaching as high as 97.50% in one of the folds.

Similar to the experiments conducted on the LFW dataset, we also performed 5-way and 10-way tasks on the IMFDB dataset. The mean accuracy of the 5-way one-shot task over 19 folds was observed to be 82.63%, whereas for the 10-way one-shot task the mean accuracy across the 9 fold set was observed to be 79.05%. The best accuracy of the 5 and 10 way tasks was observed to be 92.50% and 87.50% respectively.

TABLE VI: Accuracy of 50-way One shot Tasks on LFW Dataset & IMFDB using combined approach

Fold Number   LFW      IMFDB
Fold 1        80.00%   80.50%
Fold 2        82.50%   84.20%

D. Comparison with other techniques

Although there are a large number of published results on face recognition, very few works, such as [13], [14] and [16], focus on the One-Shot face recognition task. Unfortunately, we could compare the performance of our system only with [14], as the others have used the “MS-Celeb Low Shot” dataset meant for the One-Shot recognition task, and that dataset is not available from any legitimate source. In [14], the authors performed experiments on One-Shot recognition using the “LFW” dataset, and we have compared our results with theirs in Table VII. Note that our method has outperformed the method proposed in [14] especially in the case of 10-way and 20-way

[3] W. Zhao, R. Chellappa, and A. Krishnaswamy, “Discriminant analysis of principal components for face recognition,” Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 336–341, 1998.
[4] L.-F. Chen, H.-y. Liao, M.-T. Ko, J.-C. Lin, and G.-J. Yu, “New lda-based face recognition system which can solve the small sample size problem,” Pattern Recognition, vol. 33, pp. 1713–1726, 10 2000.
[5] Y. C. Tan, Y. Zhao, and X. Ma, “Contourlet-based feature extraction with lpp for face recognition,” 2011 International Conference on Multimedia and Signal Processing, vol. 1, pp. 122–125, 2011.
[6] G. Hu, Y. Yang, D. Yi, J. Kittler, W. Christmas, S. Li, and T. Hospedales, “When face recognition meets with deep learning: An evaluation of convolutional neural networks for face recognition,” 12 2015, pp. 384–392.
[7] C. Lu and X. Tang, “Surpassing human-level face verification performance on LFW with gaussianface,” CoRR, vol. abs/1404.3840, 2014.
[8] Y. Sun, X. Wang, and X. Tang, “Deep learning face representation by joint identification-verification,” CoRR, vol. abs/1406.4773, 2014.
[9] Y. Sun, X. Wang, and X. Tang, “Deeply learned face representations are sparse, selective, and robust,” CoRR, vol. abs/1412.1265, 2014.
[10] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” 09 2014.
[11] Y. Guo and L. Zhang, “One-shot face recognition by promoting underrepresented classes,” CoRR, vol. abs/1707.05574, 2017.
[12] L. Wang, Y. Li, and S. Wang, “Feature learning for one-shot face recognition,” 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2386–2390, 2018.
[13] Z. Ding, Y. Guo, L. Zhang, and Y. Fu, “One-shot face recognition via generative learning,” 05 2018, pp. 1–7.
[14] A. Jadhav, V. P. Namboodiri, and K. S. Venkatesh, “Deep attributes for one-shot face recognition,” in ECCV Workshops, 2016.
[15] Y. Wu, H. Liu, and Y. Fu, “Low-shot face recognition with hybrid classifiers,” in The IEEE International Conference on Computer Vision (ICCV) Workshops, Oct 2017.
[16] S. Hong, W. Im, J. Ryu, and H. S. Yang, “SSPP-DAN: deep domain adaptation network for face recognition with single sample per person,” CoRR, vol. abs/1702.04069, 2017.
[17] J. Zhao, Y. Cheng, Z. Wang, Y. Xu, J. Karlekar, S. Shen, and J. Feng, “Know you at one glance: A compact vector representation for low-shot
learning,” 09 2017.
one-shot tasks. We plan to preserve and publish the train and [18] J. Bromley, I. Guyon, Y. LeCun et al., “Signature Verification using
test split of images that we have used for our experiments from a ”Siamese” Time Delay Neural Network,” International Journal of
the other dataset “IMFDB”, for benchmarking performance Pattern Recognition and Artificial Intelligence, vol. 7, no. 04, p. 669688,
1993.
evaluation of One-Shot face Recognition task. [19] G. Koch, R. Zemel, and R. Salakhudtdinov, “Siamese Neural Networks
for One-shot Image Recognition,” in Proceedings of the 32 nd Inter-
TABLE VII: Accuracy comparison of One shot Tasks on national Conference on Machine Learning, vol. 37, Lille, France, Jul.
2015.
LFW [20] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep
feedforward neural networks,” in Proceedings of the Thirteenth Inter-
Method 5 Way 10 Way 20 Way national Conference on Artificial Intelligence and Statistics, AISTATS
Deep attribute, Jadhav at al. [14] 94.00% 93.75% 88.87% 2010, Chia Laguna Resort, Sardinia, Italy, May 13-15, 2010, 2010, pp.
249–256.
Dlib-Siamese Net , Proposed Method 97.00% 97.50% 95.50%
[21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” CoRR, vol. abs/1512.03385, 2015.
[22] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled
VI. C ONCLUSIONS & F UTURE W ORK faces in the wild: A database for studying face recognition in uncon-
strained environments,” University of Massachusetts, Amherst, Tech.
This article proposes a new hybrid approach of fusing Res- Rep. 07-49, October 2007.
Net features along with a Siamese-Network classifier to handle [23] S. Setty, M. Husain, P. Beham, J. Gudavalli, M. Kandasamy, R. Vaddi,
V. Hemadri, J. C. Karure, R. Raju, B. Rajan, V. Kumar, and C. V. Jawa-
face recognition task in a One-Shot learning framework. The har, “Indian Movie Face Database: A Benchmark for Face Recognition
proposed hybrid network shows impressive performance even Under Wide Variations,” in National Conference on Computer Vision,
while dealing with 50-way One-Shot recognition tasks on two Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Dec
2013.
publicly available datasets. Future research plan is to use more [24] G. B. Huang, M. A. Mattar, H. Lee, and E. Learned-Miller, “Learning to
sophisticated discriminator function to combat 100-way One- align from scratch,” in Proceedings of the 25th International Conference
Shot recognition task. on Neural Information Processing Systems - Volume 1, ser. NIPS’12,
USA, 2012, pp. 764–772.
R EFERENCES
[1] D. E. King, “Dlib-ml: A machine learning toolkit,” Journal of Machine
Learning Research, vol. 10, pp. 1755–1758, 07 2009.
[2] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of
Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.

119

Authorized licensed use limited to: Auckland University of Technology. Downloaded on May 28,2020 at 23:47:46 UTC from IEEE Xplore. Restrictions apply.
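For readers who want to reproduce the kind of N-way one-shot figures reported in the experiments, the evaluation loop can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `n_way_one_shot_accuracy`, the synthetic 128-dimensional embeddings, and the use of plain Euclidean distance in place of the learned Siamese-network discriminator are all assumptions made here. The "folds" in the demo are simply repeated sets of random trials, which may differ from the paper's fold construction.

```python
import numpy as np


def n_way_one_shot_accuracy(embeddings, labels, n_way, n_trials, rng):
    """Fraction of N-way one-shot trials in which the query is matched correctly.

    Assumption: similarity is plain Euclidean distance between embeddings,
    used here as a stand-in for the learned Siamese-network discriminator.
    """
    classes = np.unique(labels)
    correct = 0
    for _ in range(n_trials):
        # Draw N distinct identities; one of them (at a random slot) is the target.
        chosen = rng.choice(classes, size=n_way, replace=False)
        target = rng.integers(n_way)
        support = []
        query = None
        for slot, cls in enumerate(chosen):
            members = np.flatnonzero(labels == cls)
            if slot == target:
                # Two different images of the target identity: query + support.
                q_idx, s_idx = rng.choice(members, size=2, replace=False)
                query = embeddings[q_idx]
            else:
                s_idx = rng.choice(members)
            support.append(embeddings[s_idx])
        # The prediction is the support image closest to the query.
        distances = np.linalg.norm(np.stack(support) - query, axis=1)
        correct += int(np.argmin(distances) == target)
    return correct / n_trials


# Synthetic demo data (hypothetical): 10 identities, 4 images each, 128-D vectors.
rng = np.random.default_rng(0)
centres = rng.normal(scale=10.0, size=(10, 128))
labels = np.repeat(np.arange(10), 4)
embeddings = centres[labels] + rng.normal(scale=0.01, size=(40, 128))

# One accuracy per "fold" (here simply repeated random trial sets), then the mean.
fold_accuracies = [
    n_way_one_shot_accuracy(embeddings, labels, n_way=5, n_trials=40, rng=rng)
    for _ in range(3)
]
mean_accuracy = float(np.mean(fold_accuracies))
best_accuracy = max(fold_accuracies)
```

The mean and best values computed at the end correspond to the two kinds of figures quoted in the text (mean accuracy across folds and best accuracy in a single fold). With real face descriptors in place of the synthetic vectors, the distance computation is the line one would replace with a trained discriminator.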
