Face Recognition For E-Authentication Final Project Report (B.Tech Final Year Project Report)
in
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Accredited by NATIONAL BOARD OF ACCREDITATION
ACADEMIC SESSION: 2021-2024
SEMESTER: VIII
1. CERTIFICATION
2. ACKNOWLEDGEMENT
3. ABSTRACT
4. INTRODUCTION
5. LITERATURE SURVEY
6. AIMS AND OBJECTIVES
7. SCOPE AND LIMITATIONS
8. ANALYSIS OF EXISTING SYSTEM
9. ANALYSIS OF PROPOSED SYSTEM
10. DESIGN
11. SYSTEM REQUIREMENTS
12. CODING AND METHODOLOGY
13. RESULT
14. CONCLUSION
15. SCOPE FOR FUTURE WORK
16. REFERENCES
CERTIFICATION
This is to certify that this research work was carried out by:
____________________ ____________________
Prof. Raj Kumar Paul Prof. (Dr.) Bimal Datta
Project Supervisor & Guide Prof. & HoD CSE Dept
ACKNOWLEDGMENT
We thank our supervisor Prof. Raj Kumar Paul (Assistant Professor) for the supervision and support he gave, which helped the project progress smoothly.
We extend our thanks to the entire Computer Science & Engineering Department of Budge Budge Institute of Technology, the H.O.D. Prof. (Dr.) Bimal Datta, and all lecturers who built our foundation in computer science.
We would also like to thank our friends, and we extend special thanks to our parents, who encouraged, supported and helped us financially, prayerfully and morally throughout this project.
ABSTRACT
Face recognition technology has emerged as a promising method for enhancing security in various applications,
including e-authentication systems. In this project, we propose a robust face recognition system tailored specifically
for e-authentication purposes.
The primary objective of our project is to develop an efficient and accurate face recognition model that can
authenticate users with a high level of confidence while ensuring a seamless and user-friendly experience. To achieve
this, we employ state-of-the-art deep learning techniques, specifically convolutional neural networks (CNNs), for
feature extraction and classification.
The proposed system encompasses several key components, including face detection, feature extraction, and
matching. Initially, faces are detected and localized within input images using advanced computer vision algorithms.
Subsequently, facial features are extracted from the detected faces using a deep CNN architecture, which captures
discriminative characteristics essential for accurate identification. Finally, a matching algorithm compares the
extracted features with pre-registered templates to authenticate the user's identity.
Furthermore, to enhance the system's robustness and security, we incorporate advanced techniques such as anti-
spoofing mechanisms to detect and prevent unauthorized access attempts, including presentation attacks with fake or
manipulated images.
In addition to security considerations, we prioritize user experience by optimizing the system for speed, reliability,
and ease of use. The interface is designed to be intuitive and accessible, ensuring a smooth authentication process for
users across various devices and platforms.
Through rigorous evaluation and testing, we demonstrate the effectiveness and reliability of our face recognition
system for e-authentication purposes. The results indicate high accuracy rates and resistance to spoofing attacks,
validating its suitability for deployment in real-world scenarios where secure and user-friendly authentication is
paramount.
Despite significant recent advances in the field of face recognition, implementing face verification and recognition
efficiently at scale presents serious challenges to current approaches. In this paper we present a system that directly
learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure
of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can
be easily implemented using standard techniques.
Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an
intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned
matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our
approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using
only 128 bytes per face.
INTRODUCTION
In an increasingly digital world where online transactions and interactions have become ubiquitous, ensuring secure
and reliable authentication mechanisms is paramount. Traditional methods such as passwords and PINs are
susceptible to various security threats, including phishing, brute force attacks, and password leaks. As a result, there
is a growing demand for more robust and convenient authentication solutions that can effectively mitigate these risks
while offering a seamless user experience.
Face recognition technology has emerged as a promising solution to address the shortcomings of traditional
authentication methods. Leveraging the unique biometric characteristics of an individual's face, such as facial features
and patterns, face recognition systems offer a highly secure and user-friendly authentication mechanism. By analyzing
and comparing facial characteristics captured from an individual's face with stored templates, these systems can
accurately verify the identity of users in real-time.
In the context of electronic authentication (e-authentication), which encompasses a wide range of online activities
such as accessing sensitive information, conducting financial transactions, and interacting with government services,
the need for robust authentication mechanisms is particularly critical. E-authentication systems must strike a delicate
balance between security and user experience, ensuring that sensitive data and resources are protected while providing
a frictionless authentication process for users.
The primary objective of this project is to develop and implement a face recognition system specifically tailored for
e-authentication purposes. By harnessing the power of deep learning and computer vision technologies, our system
aims to provide a highly accurate, reliable, and secure authentication solution that enhances both security and user
experience.
In this introduction, we provide an overview of the motivation behind the project, the challenges associated with e-
authentication, and the potential of face recognition technology to address these challenges. We also outline the
objectives, scope, and significance of the project, setting the stage for the subsequent sections where we delve into
the technical details, implementation, and evaluation of the proposed face recognition system for e-authentication.
LITERATURE SURVEY
Face recognition technology has garnered significant attention in recent years due to its potential applications in
various domains, including e-authentication. A thorough literature survey reveals a wealth of research and
advancements in this field, encompassing both theoretical foundations and practical implementations. Here, we
provide an overview of key studies and findings relevant to the project of Face Recognition for E-Authentication:
1. Biometric Authentication Techniques: Numerous studies have explored the effectiveness of biometric
authentication methods, including fingerprint recognition, iris scanning, and face recognition, in e-
authentication systems. Comparisons between different biometric modalities have highlighted the advantages
of face recognition in terms of user acceptance, non-intrusiveness, and ease of deployment.
2. Deep Learning for Face Recognition: The advent of deep learning techniques, particularly convolutional
neural networks (CNNs), has revolutionized face recognition research. Studies have demonstrated the superior
performance of CNN-based models in facial feature extraction and classification tasks, leading to
unprecedented accuracy rates in face recognition systems.
3. Face Detection and Feature Extraction: Face detection and feature extraction are fundamental components
of any face recognition system. Researchers have proposed various algorithms and methodologies for robust
face detection and feature extraction, ranging from traditional methods such as the Viola-Jones algorithm to
advanced deep learning architectures like the ResNet and VGG networks.
4. Spoof Detection and Anti-Spoofing Techniques: Addressing security concerns associated with face
recognition, several studies have focused on developing spoof detection and anti-spoofing techniques to
prevent unauthorized access attempts using fake or manipulated facial images. These techniques often
leverage advanced machine learning algorithms to distinguish between genuine and spoofed faces based on
subtle cues and characteristics.
5. User Experience and Usability: In addition to security considerations, the user experience is a crucial aspect
of e-authentication systems. Research has explored various strategies for optimizing the usability and
accessibility of face recognition systems, such as intuitive user interfaces, adaptive feedback mechanisms,
and multi-modal authentication approaches.
6. Real-World Deployments and Case Studies: Several real-world deployments of face recognition technology
for e-authentication purposes have been documented in the literature. Case studies and practical
implementations provide valuable insights into the challenges, benefits, and considerations involved in
integrating face recognition into existing authentication systems across different sectors, including finance,
healthcare, and government services.
By synthesizing insights from existing literature and building upon previous research findings, our project aims to
contribute to the advancement of face recognition technology for e-authentication, with a focus on enhancing both
security and user experience. Through rigorous experimentation, evaluation, and validation, we seek to develop a
robust and reliable face recognition system that meets the evolving needs of e-authentication in today's digital
landscape.
AIMS AND OBJECTIVES
The project "Face Recognition for E-Authentication" aims to develop an advanced system that utilizes face
recognition technology to enhance the security and user experience of electronic authentication (e-authentication).
With the proliferation of online transactions and interactions, the need for robust authentication mechanisms has
become increasingly crucial. Traditional methods such as passwords and PINs are prone to various security threats,
including phishing, brute force attacks, and password leaks. In this context, face recognition technology offers a
promising solution by leveraging the unique biometric characteristics of an individual's face to verify their identity
in real-time.
One of the primary aims of the project is to design and implement a face recognition system tailored specifically for
e-authentication purposes. This involves the development of sophisticated algorithms for face detection, feature
extraction, and matching, utilizing state-of-the-art deep learning techniques such as convolutional neural networks
(CNNs). By analyzing and comparing facial features captured from an individual's face with pre-registered templates,
the system aims to accurately authenticate users with a high level of confidence.
Enhancing security is another key objective of the project. In addition to traditional authentication measures, such as
username-password combinations, the face recognition system incorporates advanced anti-spoofing mechanisms to
detect and prevent unauthorized access attempts. These mechanisms are designed to distinguish between genuine
facial images and spoofed or manipulated ones, thereby mitigating the risk of presentation attacks.
Moreover, the project emphasizes the importance of optimizing the accuracy and reliability of the face recognition
system. Through rigorous testing and validation, the system aims to minimize false acceptance and rejection rates,
ensuring robust performance under various environmental conditions and scenarios. Fine-tuning the system
parameters and optimizing the algorithms are essential steps in achieving this objective.
In addition to security considerations, the project focuses on enhancing the user experience of e-authentication. A
user-friendly interface is designed to facilitate seamless interaction with the face recognition system, catering to users
of diverse backgrounds and technical proficiency levels. Usability enhancements, such as adaptive feedback
mechanisms and multi-modal authentication approaches, are incorporated to streamline the authentication process
and minimize user friction.
Furthermore, the project aims to ensure compatibility and scalability of the face recognition system. This involves
integrating the system with existing e-authentication frameworks and platforms, enabling seamless deployment in
diverse application environments. The system architecture is designed to be scalable and adaptable, allowing for
future enhancements and expansions to accommodate evolving user needs and technological advancements.
Ethical and legal considerations are paramount throughout the project. Measures are put in place to adhere to ethical
guidelines and legal regulations governing the collection, storage, and processing of biometric data. User privacy and
confidentiality are prioritized, with stringent measures such as data anonymization and secure storage practices
implemented to protect sensitive information.
Documentation and knowledge sharing are integral aspects of the project. The development process, methodologies,
and findings are comprehensively documented to facilitate knowledge sharing and replication within the academic
and professional community. Research papers, reports, and technical documentation are published to disseminate the
project outcomes and contribute to the broader advancement of face recognition technology for e-authentication.
SCOPE
• It is typically used in security systems and can be compared to other biometrics such as fingerprint or iris
recognition systems.
• The facial recognition industry is experiencing tremendous growth. It has the potential to completely transform
a wide range of businesses, including security and surveillance, AI capabilities, and even personalized
advertising.
• Taking attendance using face recognition software.
• Monitoring attendance and retrieving attendance records.
LIMITATIONS
1. Accuracy Constraints: Despite advancements in face recognition technology, there may still be limitations
in accurately identifying individuals under certain conditions such as low lighting, occlusions, or variations
in facial expressions.
2. Hardware and Resource Requirements: Implementing a robust face recognition system may require
significant computational resources, including high-performance hardware for processing large datasets and
running complex algorithms.
3. Environmental Factors: Environmental factors such as changes in lighting conditions, camera quality, and
background noise could affect the performance of the face recognition system, leading to decreased accuracy
in real-world scenarios.
4. Privacy Concerns: The collection and storage of biometric data raise privacy concerns, as it involves
capturing and storing sensitive information about individuals. Ensuring compliance with privacy laws and
regulations is crucial to mitigate these concerns.
5. Spoofing Attacks: Despite the implementation of anti-spoofing mechanisms, the face recognition system
may still be vulnerable to sophisticated spoofing attacks using high-quality facial masks or digitally
manipulated images.
6. User Acceptance: Some users may have reservations about using biometric authentication methods due to
concerns about privacy, security, or cultural factors. Ensuring user acceptance and adoption of the system
may pose challenges.
7. Regulatory Compliance: Compliance with regulatory frameworks and standards related to biometric data
management, such as GDPR in Europe or HIPAA in the United States, may impose constraints on the
development and deployment of the face recognition system.
8. Cultural and Diversity Challenges: Cultural factors and diversity in facial features across different
populations may affect the accuracy and reliability of the face recognition system, necessitating diverse
datasets and inclusive design considerations.
9. Cost Considerations: Developing and maintaining a robust face recognition system may involve significant
costs associated with research and development, hardware infrastructure, software licensing, and ongoing
support and maintenance.
10. Interoperability Issues: Integration with existing e-authentication frameworks and platforms may
encounter interoperability issues, requiring additional efforts to ensure seamless compatibility and
functionality.
ANALYSIS OF EXISTING SYSTEM
The analysis of existing systems for face recognition in e-authentication reveals a landscape characterized by both
advancements and limitations. Traditional password-based systems, while prevalent, suffer from vulnerabilities such
as phishing and brute force attacks. Biometric authentication methods, including face recognition, offer promising
alternatives due to their non-intrusiveness and user convenience. However, existing face recognition systems may
encounter challenges in accuracy, particularly under varying environmental conditions. Deep learning-based
approaches have shown significant improvements in face recognition accuracy, yet they remain susceptible to
overfitting and adversarial attacks. Anti-spoofing mechanisms aim to address security concerns but may not provide
foolproof protection. Commercial solutions offer convenience but may come with limitations such as licensing fees
and privacy concerns. Academic research contributes valuable insights and prototypes, yet scalability and real-world
deployment considerations remain challenges. In summary, while existing face recognition systems offer promising
solutions for e-authentication, addressing challenges related to accuracy, security, and usability requires ongoing
research and development efforts.
ANALYSIS OF PROPOSED SYSTEM
The proposed system for face recognition in e-authentication presents a comprehensive solution aimed at addressing
the limitations of existing systems while leveraging advancements in technology. By integrating deep learning
techniques, specifically convolutional neural networks (CNNs), the system aims to enhance accuracy and robustness
in facial recognition tasks. Additionally, the incorporation of anti-spoofing mechanisms addresses security concerns
by detecting and preventing presentation attacks. Usability is prioritized through the design of an intuitive interface
and adaptive feedback mechanisms, ensuring a seamless user experience. Compatibility and scalability are also
considered, enabling integration with existing e-authentication frameworks and accommodating future
enhancements. Ethical considerations are paramount, with measures in place to protect user privacy and comply with
regulatory requirements. Overall, the proposed system represents a significant advancement in e-authentication
technology, offering a secure, reliable, and user-friendly solution for various online applications and services.
DESIGN
SYSTEM REQUIREMENTS:
1. HARDWARE REQUIREMENTS
2. SOFTWARE REQUIREMENTS
HARDWARE REQUIREMENTS:
• System – Windows/Linux System
• GPU
• Storage: 64GB
• Memory: 8GB
SOFTWARE REQUIREMENTS:
• OS: Windows/Linux based OS
• Google Colab for Python Programming execution (Online Platform)
• Visual Studio Code
• Python 3 with the required libraries (OpenCV, dlib, NumPy, TensorFlow, etc.)
Coding and Methodology
Importing libraries
import imutils
import numpy as np
import cv2
from google.colab.patches import cv2_imshow
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode
Start webcam
def take_photo(filename='photo.jpg', quality=0.8):
    js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      // Create a video element and start the webcam stream (this part was missing
      // from the report excerpt; it follows the standard Colab camera snippet).
      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Wait until the Capture button is clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      // Draw the current frame onto a canvas and stop the webcam stream.
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
    display(js)
    data = eval_js('takePhoto({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
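In the notebook, the captured file is then read back with OpenCV before the detection steps below; the exact cell is not reproduced in this report, so the following minimal usage is an assumption:

# Capture a photo from the webcam and load it with OpenCV for the steps that follow.
filename = take_photo()
image = cv2.imread(filename)
cv2_imshow(image)  # preview the captured frame inside Colab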
OpenCV’s deep learning face detector is based on the Single Shot Detector (SSD) framework
with a ResNet base network. The network is defined and trained using the Caffe Deep Learning
framework. Download the pre-trained face detection model, which consists of two files:
• The network definition (deploy.prototxt)
• The learned weights (res10_300x300_ssd_iter_140000.caffemodel)
!wget -N https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
!wget -N https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel
--2024-05-16 14:59:13--  https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28104 (27K) [text/plain]
Saving to: ‘deploy.prototxt’

--2024-05-16 14:59:14--  https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10666211 (10M) [application/octet-stream]
Saving to: ‘res10_300x300_ssd_iter_140000.caffemodel’
[INFO] loading model...
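The log line above is produced by the cell that loads the downloaded face detector; the loading code itself is not reproduced in the report, so the following minimal sketch shows the standard OpenCV DNN call (the file names are those fetched by the wget commands above):

# Load the serialized Caffe face detection model with OpenCV's DNN module.
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000.caffemodel")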
Use the dnn.blobFromImage function to construct an input blob by resizing the image to a fixed 300x300
pixels and then normalizing it.
# resize it to have a maximum width of 400 pixels
image = imutils.resize(image, width=400)
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
Pass the blob through the neural network and obtain the detections and predictions.
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()
Loop over the detections and draw boxes around the detected faces
for i in range(0, detections.shape[2]):
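    # The loop body is not reproduced in the report; the following is a minimal sketch
    # of the standard SSD post-processing (the 0.5 confidence threshold is an assumed value).
    confidence = detections[0, 0, i, 2]

    # Filter out weak detections.
    if confidence > 0.5:
        # Scale the bounding box back to the dimensions of the resized image.
        (h, w) = image.shape[:2]
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        # Draw the bounding box and the detection confidence on the image.
        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

cv2_imshow(image)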
The Google Colab file is accessible using the following link:
https://colab.research.google.com/drive/1QBS2P48Gj3ZxABy2vVx5KzEOi-xvv-lk?usp=sharing
For implementing this project we have used CNN as the algorithm and methodology. The Architecture of CNN
is given below:
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly
used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to
understand and interpret the image or visual data.
When it comes to machine learning, artificial neural networks perform really well. Neural networks are used on various kinds of data, such as images, audio, and text. Different types of neural networks are used for different purposes: for predicting a sequence of words we use recurrent neural networks (more precisely, an LSTM), while for image classification we use convolutional neural networks. In this section, we describe the basic building blocks of a CNN.
In a regular Neural Network there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our model. The number of neurons in this layer is
equal to the total number of features in our data (number of pixels in the case of an image).
2. Hidden Layer: The input from the Input layer is then fed into the hidden layer. There can be many hidden
layers depending on our model and data size. Each hidden layer can have different numbers of neurons
which are generally greater than the number of features. The output of each hidden layer is computed by multiplying the previous layer's output with that layer's learnable weights, adding learnable biases, and then applying an activation function, which makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or
softmax which converts the output of each class into the probability score of each class.
Feeding the data through the model and obtaining the output of each layer as described above is called the feedforward pass. We then calculate the error using an error function; common choices are cross-entropy and squared loss. The error function measures how well the network is performing. After that, we propagate the error back through the model by calculating derivatives with respect to the weights. This step, called backpropagation, is used to minimize the loss.
Convolution Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like data, for example visual datasets such as images or videos, where spatial patterns play an extensive role.
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer, Pooling layer,
and fully connected layers.
Simple CNN architecture
The Convolutional layer applies filters to the input image to extract features, the Pooling layer downsamples the
image to reduce computation, and the fully connected layer makes the final prediction. The network learns the
optimal filters through backpropagation and gradient descent.
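To make the layer sequence concrete, the following is a minimal tf.keras sketch of such a network; the filter count, input size, and number of classes are illustrative only (they mirror the 32 x 32 x 3 example discussed later in this section), not the architecture used for face recognition in this project.

import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutional layer: 12 learnable 3x3 filters with ReLU activation -> 32x32x12 feature maps
    tf.keras.layers.Conv2D(12, (3, 3), padding='same', activation='relu', input_shape=(32, 32, 3)),
    # Pooling layer: 2x2 max pooling with stride 2 -> 16x16x12
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Flatten the feature maps into a one-dimensional vector
    tf.keras.layers.Flatten(),
    # Fully connected layer producing the per-class probability scores
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()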
How Convolutional Layers Work
Convolution Neural Networks or covnets are neural networks that share their parameters. Imagine you have an
image. It can be represented as a cuboid having its length, width (dimension of the image), and height (i.e the
channel as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel on
it, with say, K outputs and representing them vertically. Now slide that neural network across the whole image,
as a result, we will get another image with different widths, heights, and depths. Instead of just R, G, and B
channels now we have more channels but lesser width and height. This operation is called Convolution. If the
patch size is the same as that of the image it will be a regular neural network. Because of this small patch, we
have fewer weights.
Image source: Deep Learning Udacity
Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
• Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the
same depth as that of input volume (3 if the input layer is image input).
• For example, if we run a convolution on an image with dimensions 34x34x3, the possible filter sizes are a×a×3, where 'a' can be 3, 5, or 7, but smaller than the image dimension.
• During the forward pass, we slide each filter across the whole input volume step by step where each step
is called stride (which can have a value of 2, 3, or even 4 for high-dimensional images) and compute the
dot product between the kernel weights and patch from input volume.
• As we slide our filters we’ll get a 2-D output for each filter and we’ll stack them together as a result, we’ll
get output volume having a depth equal to the number of filters. The network will learn all the filters.
Layers used to build ConvNets
A complete Convolution Neural Networks architecture is also known as covnets. A covnets is a sequence of
layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let's take an example by running a covnets on an image of dimension 32 x 32 x 3.
• Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the input will be an
image or a sequence of images. This layer holds the raw input of the image with width 32, height 32, and
depth 3.
• Convolutional Layers: This layer is used to extract features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices, usually of shape 2×2, 3×3, or 5×5. Each kernel slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we'll get an output volume of dimension 32 x 32 x 12.
• Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. It applies an element-wise activation function to the output of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
• Pooling layer: This layer is periodically inserted in the covnets. Its main function is to reduce the size of the volume, which makes computation faster, reduces memory usage, and also helps prevent overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.
Image source: cs231n.stanford.edu
• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: This layer takes the input from the previous layer and computes the final classification or regression output.
Input image:
Steps:
• import the necessary libraries
• set the parameter
• define the kernel
• Load the image and plot it.
• Reformat the image
• Apply convolution layer operation and plot the output image.
• Apply activation layer operation and plot the output image.
• Apply pooling layer operation and plot the output image.
# import the necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt

# load the input image as a single-channel (grayscale) tensor
# (the file name below is illustrative; use the path of the input image shown above)
image = tf.io.read_file('input_image.jpg')
image = tf.io.decode_jpeg(image, channels=1)

# define the kernel (a 3x3 edge-detection filter)
kernel = tf.constant([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1],
                     ])
# Reformat
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)
# convolution layer
conv_fn = tf.nn.conv2d
image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,  # or (1, 1)
    padding='SAME',
)

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.imshow(
    tf.squeeze(image_filter)
)
plt.axis('off')
plt.title('Convolution')
# activation layer
relu_fn = tf.nn.relu
# Image detection
image_detect = relu_fn(image_filter)
plt.subplot(1, 3, 2)
plt.imshow(
# Reformat for plotting
tf.squeeze(image_detect)
)
plt.axis('off')
plt.title('Activation')
# Pooling layer
pool = tf.nn.pool
image_condense = pool(input=image_detect,
window_shape=(2, 2),
pooling_type='MAX',
strides=(2, 2),
padding='SAME',
)
plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()
Output:
Face Recognition, Storage and Authentication:
In this paper we present a unified system for face verification (is this the same person), recognition (who is this
person) and clustering (find common people among these faces). Our method is based on learning a Euclidean
embedding per image using a deep convolutional network. The network is trained such that the squared L2
distances in the embedding space directly correspond to face similarity:
faces of the same person have small distances and faces of distinct people have large distances.
Once this embedding has been produced, the aforementioned tasks become straightforward: face verification
simply involves thresholding the distance between the two embeddings; recognition becomes a k-NN
classification problem; and clustering can be achieved using off-the-shelf techniques such as k-means or
agglomerative clustering.
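As a concrete illustration of this statement, the sketch below shows verification as distance thresholding and recognition as a k-NN lookup over enrolled embeddings; the threshold value, array shapes, and function names are assumptions for illustration, not part of the original system.

import numpy as np

def verify(emb_a, emb_b, threshold=1.1):
    # Face verification: same person if the squared L2 distance is below a threshold.
    return float(np.sum((emb_a - emb_b) ** 2)) < threshold

def recognize(query_emb, gallery_embs, gallery_labels, k=1):
    # Face recognition as k-NN classification against pre-registered (enrolled) embeddings.
    dists = np.sum((gallery_embs - query_emb) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return [gallery_labels[i] for i in nearest], dists[nearest]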
Previous face recognition approaches based on deep networks use a classification layer trained over a set of known
face identities and then take an intermediate bottleneck layer as a representation used to generalize recognition
beyond the set of identities used in training. The downsides of this approach are its indirectness and its
inefficiency: one has to hope that the bottleneck representation generalizes well to new faces; and by using a
bottleneck layer the representation size per face is usually very large (1000s of dimensions). Some recent work
has reduced this dimensionality using PCA, but this is a linear transformation that can be easily learnt in one layer
of the network.
In contrast to these approaches, FaceNet directly trains its output to be a compact 128-D embedding using a triplet
based loss function based on LMNN. Our triplets consist of two matching face thumbnails and a non-matching
face thumbnail and the loss aims to separate the positive pair from the negative by a distance margin. The
thumbnails are tight crops of the face area; no 2D or 3D alignment, other than scale and translation, is performed.
Related Work:
Similarly to other recent works which employ deep networks, our approach is a purely data driven method which
learns its representation directly from the pixels of the face. Rather than using engineered features, we use a large
dataset of labelled faces to attain the appropriate invariances to pose, illumination, and other variational
conditions.
In this paper we explore two different deep network architectures that have been recently used to great success in
the computer vision community. Both are deep convolutional networks. The first architecture is based on the
Zeiler & Fergus model, which consists of multiple interleaved layers of convolutions, non-linear activations, local response normalizations, and max pooling layers. We additionally add several 1×1×d convolution layers inspired by prior work. The second architecture is based on the Inception model of Szegedy et al., which was recently used
as the winning approach for ImageNet 2014. These networks use mixed layers that run several different
convolutional and pooling layers in parallel and concatenate their responses. We have found that these models
can reduce the number of parameters by up to 20 times and have the potential to reduce the number of FLOPS
required for comparable performance.
There is a vast corpus of face verification and recognition works. Reviewing it is out of the scope of this paper so
we will only briefly discuss the most relevant recent work.
Several prior works employ a complex system of multiple stages that combines the output of a deep convolutional network with PCA for dimensionality reduction and an SVM for classification.
Zhenyao et al. employ a deep network to “warp” faces into a canonical frontal view and then learn a CNN that
classifies each face as belonging to a known identity. For face verification, PCA on the network output in
conjunction with an ensemble of SVMs is used.
Taigman et al. propose a multi-stage approach that aligns faces to a general 3D shape model. A multi-class
network is trained to perform the face recognition task on over four thousand identities. The authors also
experimented with a so-called Siamese network where they directly optimize the L1-distance between two face
features. Their best performance on LFW (97.35%) stems from an ensemble of three networks using different
alignments and color channels. The predicted distances (non-linear SVM predictions based on the χ2 kernel) of
those networks are combined using a non-linear SVM.
Sun et al. propose a compact and therefore relatively cheap to compute network. They use an ensemble of 25 of
these networks, each operating on a different face patch. For their final performance on LFW (99.47%) the authors
combine 50 responses (regular and flipped). Both PCA and a Joint Bayesian model that effectively correspond to
a linear transform in the embedding space are employed. Their method does not require explicit 2D/3D alignment.
The networks are trained by using a combination of classification and verification loss. The verification loss is
similar to the triplet loss we employ, in that it minimizes the L2-distance between faces of the same identity and
enforces a margin between the distance of faces of different identities. The main difference is that only pairs of
images are compared, whereas the triplet loss encourages a relative distance constraint.
A similar loss to the one used here was explored in Wang et al. for ranking images by semantic and visual
similarity.
Figure 2. Model structure. Our network consists of a batch input layer and a deep CNN followed by L2 normalization, which results in the face embedding. This is followed by the triplet loss during training.

Figure 3. The Triplet Loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.
Method:
This Project uses a deep convolutional network. We discuss two different core architectures: The Zeiler&Fergus
style networks and the recent Inception type networks.
Given the model details, and treating it as a black box (see Figure 2), the most important part of our approach lies
in the end-to-end learning of the whole system. To this end we employ the triplet loss that directly reflects what we want
to achieve in face verification, recognition and clustering. Namely, we strive for an embedding f(x), from an image x into a
feature space Rd, such that the squared distance between all faces, independent of imaging conditions, of the same identity
is small, whereas the squared distance between a pair of face images from different identities is large.
Although we did not directly compare to other losses, e.g. one that uses pairs of positives and negatives, we believe that the triplet loss is more suitable for face verification. The motivation is that such a pairwise loss encourages all faces
of one identity to be projected onto a single point in the embedding space. The triplet loss, however, tries to enforce a margin
between each pair of faces from one person to all other faces. This allows the faces for one identity to live on a manifold,
while still enforcing the distance and thus discriminability to other identities.
The following section describes this triplet loss and how it can be learned efficiently at scale.
Triplet Loss
The embedding is represented by f(x) ∈ R^d. It embeds an image x into a d-dimensional Euclidean space. Additionally, we constrain this embedding to live on the d-dimensional hypersphere, i.e. ||f(x)||_2 = 1. This loss is motivated in the context of nearest-neighbor classification. Here we want to ensure that an image x_i^a (anchor) of a specific person is closer to all other images x_i^p (positive) of the same person than it is to any image x_i^n (negative) of any other person. This is visualized in Figure 3.
Thus we want

||f(x_i^a) − f(x_i^p)||_2^2 + α < ||f(x_i^a) − f(x_i^n)||_2^2,   ∀ (x_i^a, x_i^p, x_i^n) ∈ T,   (1)

where α is a margin that is enforced between positive and negative pairs, and T is the set of all possible triplets in the training set, with cardinality N.

The loss that is being minimized is then

L = Σ_{i=1}^{N} [ ||f(x_i^a) − f(x_i^p)||_2^2 − ||f(x_i^a) − f(x_i^n)||_2^2 + α ]_+ ,   (2)

where [·]_+ = max(·, 0).
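Expressed in code, the loss above is only a few lines; the following NumPy sketch assumes batches of L2-normalized 128-D embeddings and uses the margin value quoted later in this section.

import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # anchor, positive, negative: arrays of shape (batch, 128).
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # squared L2 distance to the positive
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # squared L2 distance to the negative
    # Hinge on (pos_dist - neg_dist + alpha), averaged over the batch.
    return np.mean(np.maximum(pos_dist - neg_dist + alpha, 0.0))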
Generating all possible triplets would result in many triplets that are easily satisfied (i.e. fulfill the constraint in
Eq. (1)). These triplets would not contribute to the training and result in slower convergence, as they would still
be passed through the network. It is crucial to select hard triplets, that are active and can therefore contribute to
improving the model. The following section talks about the different approaches we use for the triplet selection.
Triplet Selection
In order to ensure fast convergence it is crucial to select triplets that violate the triplet constraint in Eq. (1). This
means that, given x_i^a, we want to select an x_i^p (hard positive) such that argmax_{x_i^p} ||f(x_i^a) − f(x_i^p)||_2^2, and similarly an x_i^n (hard negative) such that argmin_{x_i^n} ||f(x_i^a) − f(x_i^n)||_2^2.
It is infeasible to compute the argmin and argmax across the whole training set. Additionally, it might lead to poor
training, as mislabelled and poorly imaged faces would dominate the hard positives and negatives. There are two
obvious choices that avoid this issue:
• Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and
argmax on a subset of the data.
• Generate triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-
batch.
Here, we focus on the online generation and use large mini-batches in the order of a few thousand exemplars and
only compute the argmin and argmax within a mini-batch. To have a meaningful representation of the anchor-
positive distances, it needs to be ensured that a minimal number of exemplars of any one identity is present in
each mini-batch. In our experiments we sample the training data such that around 40 faces are selected per identity
per minibatch. Additionally, randomly sampled negative faces are added to each mini-batch.
Instead of picking the hardest positive, we use all anchor-positive pairs in a mini-batch while still selecting the
hard negatives. We don’t have a side-by-side comparison of hard anchor-positive pairs versus all anchor-positive
pairs within a mini-batch, but we found in practice that the all anchor positive method was more stable and
converged slightly faster at the beginning of training.
We also explored the offline generation of triplets in conjunction with the online generation and it may allow the
use of smaller batch sizes, but the experiments were inconclusive.
Selecting the hardest negatives can in practice lead to bad local minima early on in training; specifically, it can result in a collapsed model (i.e. f(x) = 0). In order to mitigate this, it helps to select x_i^n such that

||f(x_i^a) − f(x_i^p)||_2^2 < ||f(x_i^a) − f(x_i^n)||_2^2.   (4)
We call these negative exemplars semi-hard, as they are further away from the anchor than the positive exemplar,
but still hard because the squared distance is close to the anchor-positive distance. Those negatives lie inside the
margin α.
As mentioned before, correct triplet selection is crucial for fast convergence. On the one hand we would like to
use small mini-batches as these tend to improve convergence during Stochastic Gradient Descent (SGD) [20]. On
the other hand, implementation details make batches of tens to hundreds of exemplars more efficient. The main
constraint with regards to the batch size, however, is the way we select hard relevant triplets from within the mini-
batches. In most experiments we use a batch size of around 1,800 exemplars.
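To make the online selection concrete, the sketch below picks semi-hard negatives within a mini-batch using plain NumPy; the shapes, margin handling, and tie-breaking are illustrative assumptions rather than the exact procedure used in training.

import numpy as np

def semi_hard_triplets(embeddings, labels, alpha=0.2):
    # embeddings: (B, 128) L2-normalized vectors; labels: (B,) identity ids.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sum(diff ** 2, axis=-1)  # pairwise squared L2 distances, shape (B, B)

    triplets = []
    for a in range(len(labels)):
        for p in np.where(labels == labels[a])[0]:
            if p == a:
                continue
            d_ap = dists[a, p]
            # Semi-hard condition: farther than the positive but still inside the margin.
            mask = (labels != labels[a]) & (dists[a] > d_ap) & (dists[a] < d_ap + alpha)
            candidates = np.where(mask)[0]
            if len(candidates) > 0:
                n = candidates[np.argmin(dists[a, candidates])]
                triplets.append((a, p, n))
    return triplets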
Deep Convolutional Networks
In all our experiments we train the CNN using Stochastic Gradient Descent (SGD) with standard backprop and
AdaGrad. In most experiments we start with a learning rate of 0.05 which we lower to finalize the model. The models are initialized from random and trained on a CPU cluster for 1,000 to 2,000 hours. The decrease in the loss (and increase in accuracy) slows down drastically after 500h of training, but additional training can still significantly improve performance. The margin α is set to 0.2. We used two types of architectures and explore
their trade-offs in more detail in the experimental section. Their practical differences lie in the difference of
parameters and FLOPS. The best model may be different depending on the application. E.g. a model running in a
data center can have many parameters and require a large number of FLOPS, whereas a model running on a mobile
phone needs to have few parameters, so that it can fit into memory. All our
layer size-in size-out kernel param FLOPS
conv1 220×220×3 110×110×64 7×7×3, 2 9K 115M
pool1 110×110×64 55×55×64 3×3×64, 2 0
rnorm1 55×55×64 55×55×64 0
conv2a 55×55×64 55×55×64 1×1×64, 1 4K 13M
conv2 55×55×64 55×55×192 3×3×64, 1 111K 335M
rnorm2 55×55×192 55×55×192 0
pool2 55×55×192 28×28×192 3×3×192, 2 0
conv3a 28×28×192 28×28×192 1×1×192, 1 37K 29M
conv3 28×28×192 28×28×384 3×3×192, 1 664K 521M
pool3 28×28×384 14×14×384 3×3×384, 2 0
conv4a 14×14×384 14×14×384 1×1×384, 1 148K 29M
conv4 14×14×384 14×14×256 3×3×384, 1 885K 173M
conv5a 14×14×256 14×14×256 1×1×256, 1 66K 13M
conv5 14×14×256 14×14×256 3×3×256, 1 590K 116M
conv6a 14×14×256 14×14×256 1×1×256, 1 66K 13M
conv6 14×14×256 14×14×256 3×3×256, 1 590K 116M
pool4 14×14×256 7×7×256 3×3×256, 2 0
concat 7×7×256 7×7×256 0
fc1 7×7×256 1×32×128 maxout p=2 103M 103M
fc2 1×32×128 1×32×128 maxout p=2 34M 34M
fc7128 1×32×128 1×1×128 524K 0.5M
L2 1×1×128 1×1×128 0
total 140M 1.6B
Table 1. NN1. This table shows the structure of our Zeiler & Fergus based model with 1×1 convolutions inspired by prior work. The input and output sizes are described in rows × cols × #filters. The kernel is specified as rows × cols, stride, and the maxout pooling size as p = 2.
Experiments
If not mentioned otherwise we use between 100M-200M training face thumbnails consisting of about 8M
different identities. A face detector is run on each image and a tight bounding box around each face is generated.
These face thumbnails are resized to the input size of the respective network. Input sizes range from 96x96
pixels to 224x224 pixels in our experiments.
Computation Accuracy Trade-off
Before diving into the details of more specific experiments we will discuss the trade-off of accuracy versus
number of FLOPS that a particular model requires. Figure 4 shows the FLOPS on the x-axis and the accuracy at
0.001 false accept rate (FAR) on our user labelled test-data set from section 4.2. It is interesting to see the strong
correlation between the computation a model requires and the accuracy it achieves. The figure highlights the
five models (NN1, NN2, NN3, NNS1, NNS2) that we discuss in more detail in our experiments.
We also looked into the accuracy trade-off with regards to the number of model parameters. However, the
picture is not as clear in that case. For example, the Inception based model NN2 achieves a comparable
performance to NN1, but only has a 20th of the parameters. The number of FLOPS is comparable, though.
Obviously at some point the performance is expected to decrease, if the number of parameters is reduced
further. Other model architectures may allow further reductions without loss of accuracy, just like Inception did
in this case.
type  output size  depth  #1×1  #3×3 reduce  #3×3  #5×5 reduce  #5×5  pool proj (p)  params  FLOPS
conv1 (7×7×3, 2) 112×112×64 1 9K 119M
max pool + norm 56×56×64 0 m 3×3, 2
inception (2) 56×56×192 2 64 192 115K 360M
norm + max pool 28×28×192 0 m 3×3, 2
inception (3a) 28×28×256 2 64 96 128 16 32 m, 32p 164K 128M
inception (3b) 28×28×320 2 64 96 128 32 64 L2, 64p 228K 179M
inception (3c) 14×14×640 2 0 128 256,2 32 64,2 m 3×3,2 398K 108M
inception (4a) 14×14×640 2 256 96 192 32 64 L2, 128p 545K 107M
inception (4b) 14×14×640 2 224 112 224 32 64 L2, 128p 595K 117M
inception (4c) 14×14×640 2 192 128 256 32 64 L2, 128p 654K 128M
inception (4d) 14×14×640 2 160 144 288 32 64 L2, 128p 722K 142M
inception (4e) 7×7×1024 2 0 160 256,2 64 128,2 m 3×3,2 717K 56M
inception (5a) 7×7×1024 2 384 192 384 48 128 L2, 128p 1.6M 78M
inception (5b) 7×7×1024 2 384 192 384 48 128 m, 128p 1.6M 78M
avg pool 1×1×1024 0
fully conn 1×1×128 1 131K 0.1M
L2 normalization 1×1×128 0
total 7.5M 1.6B
Table 2. NN2. Details of the NN2 Inception incarnation. This model is almost identical to the Inception model of Szegedy et al. described earlier.
The two major differences are the use of L2 pooling instead of max pooling (m), where specified. I.e. instead of
taking the spatial max the L2 norm is computed. The pooling is always 3×3 (aside from the final average
pooling) and in parallel to the convolutional modules inside each Inception module. If there is a dimensionality
reduction after the pooling it is denoted with p. 1×1, 3×3, and 5×5 pooling are then concatenated to get the final
output.
Figure 5. Network Architectures. This plot shows the complete ROC for the four different models on our personal photos test set from section 4.2. The sharp drop at 10E-4 FAR can be explained by noise in the groundtruth labels. The models in order of performance are: NN2: 224×224 input Inception based model; NN1: Zeiler&Fergus based network with 1×1 convolutions; NNS1: small Inception style model with only 220M FLOPS; NNS2: tiny Inception model with only 20M FLOPS.
architecture                        VAL
NN1 (Zeiler&Fergus 220×220)         87.9% ± 1.9
NN2 (Inception 224×224)             89.4% ± 1.6
NN3 (Inception 160×160)             88.3% ± 1.7
NN4 (Inception 96×96)               82.0% ± 2.3
NNS1 (mini Inception 165×165)       82.4% ± 2.4
NNS2 (tiny Inception 140×116)       51.9% ± 2.9
Table 3. Network Architectures. This table compares the performance of our model architectures on the hold-out test set. Reported is the mean validation rate VAL at 10E-3 false accept rate. Also shown is the standard
error of the mean across the five test splits.
jpeg q    val-rate
10        67.3%
20        81.4%
30        83.9%
50        85.5%
70        86.1%
90        86.5%

#pixels   val-rate
1,600     37.8%
6,400     79.5%
14,400    84.5%
25,600    85.7%
65,536    86.4%
Table 4. Image Quality. The first table shows the effect of JPEG quality on the validation rate at 10E-3 precision. The second shows how the image size in pixels affects the validation rate at 10E-3 precision. This experiment was done with NN1 on the first split of our test hold-out dataset.
#dims VAL
64 86.8% ± 1.7
128 87.9% ± 1.9
256 87.7% ± 1.9
512 85.6% ± 2.0
Table 5. Embedding Dimensionality. This Table compares the effect of the embedding dimensionality of our
model NN1 on our hold-out set. In addition to the VAL at 10E-3 we also show the standard error of the mean
computed across five splits.
As shown in Figure 5, while the largest model achieves a dramatic improvement in accuracy compared to the tiny NNS2, the latter can be run at 30 ms per image on a mobile phone and is still accurate enough to be used in face
clustering. The sharp drop in the ROC for FAR < 10−4 indicates noisy labels in the test data ground truth. At
extremely low false accept rates a single mislabeled image can have a significant impact on the curve.
Embedding Dimensionality
We explored various embedding dimensionalities and selected 128 for all experiments other than the comparison
reported in Table 5. One would expect the larger embeddings to perform at least as well as the smaller ones,
however, it is possible that they require more training to achieve the same accuracy. That said, the differences in
the performance reported in Table 5 are statistically insignificant.
Table 6. Training Data Size. This table compares the performance after 700h of training for a smaller model with
96x96 pixel inputs. The model architecture is similar to NN2, but without the 5x5 convolutions in the Inception
modules.
It should be noted that during training a 128-dimensional float vector is used, but it can be quantized to 128 bytes without loss of accuracy. Thus each face is compactly represented by a 128-dimensional byte vector, which is ideal for large scale clustering and recognition. Smaller embeddings are possible at a minor loss of accuracy and could be employed on mobile devices.
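As a toy illustration of how a 128-D float embedding could be stored in 128 bytes, the following uses a simple uniform quantization; this scheme is an assumption for illustration only and is not the quantization used by the authors.

import numpy as np

embedding = np.random.randn(128).astype(np.float32)
embedding /= np.linalg.norm(embedding)                             # L2-normalized, values in [-1, 1]
quantized = np.round((embedding + 1.0) * 127.5).astype(np.uint8)   # 128 bytes per face
restored = quantized.astype(np.float32) / 127.5 - 1.0              # approximate reconstruction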
Figure 6. LFW errors. This shows all pairs of images that were incorrectly classified on LFW. Only eight of the 13 false rejects shown here are actual errors; the other five are mislabeled in LFW.
Our method achieves an accuracy of 95.18% when evaluating one hundred frames per video. Compared to prior work reporting 91.4% under the same protocol, we reduce the error rate by almost half. DeepId2+ achieved 93.2%, and our method reduces this error by 30%, comparable to our improvement on LFW.
Face Clustering
Our compact embedding lends itself to being used to cluster a user's personal photos into groups of people with the same identity. The constraints in assignment imposed by clustering faces, compared to the pure verification task, lead to truly amazing results. Figure 7 shows one cluster in a user's personal photo collection, generated using agglomerative clustering. It is a clear showcase of the incredible invariance to occlusion, lighting, pose and even age.

Figure 7. Face Clustering. Shown is an exemplar cluster for one user. All these images in the user's personal photo collection were clustered together.
Summary
We provide a method to directly learn an embedding into a Euclidean space for face verification. This sets it apart from other methods that use a CNN bottleneck layer, or require additional post-processing such as
concatenation of multiple models and PCA, as well as SVM classification. Our end-to-end training both simplifies
the setup and shows that directly optimizing a loss relevant to the task at hand improves performance.
Another strength of our model is that it requires only minimal alignment (a tight crop around the face area), whereas other approaches, for example, perform a complex 3D alignment. We also experimented with a similarity transform alignment and notice that this can actually improve performance slightly. It is not clear if it is worth the extra complexity.
Future work will focus on better understanding of the error cases, further improving the model, and also reducing
model size and reducing CPU requirements. We will also look into ways of improving the currently extremely
long training times, e.g. variations of our curriculum learning with smaller batch sizes and offline as well as online
positive and negative mining.
Appendix: Harmonic Embedding
In this section we introduce the concept of harmonic embeddings. By this we denote a set of embeddings that are
generated by different models v1 and v2 but are compatible in the sense that they can be compared to each other.
This compatibility greatly simplifies upgrade paths.
For example, in a scenario where embedding v1 was computed across a large set of images and a new embedding model v2 is being rolled out, this compatibility ensures a smooth transition without the need to worry about version incompatibilities. Figure 8 shows results on our 3G dataset. It can be seen that the improved model NN2
significantly outperforms NN1, while the comparison of NN2 embeddings to NN1 embeddings performs at an
intermediate level.
Figure 9. Learning the Harmonic Embedding. In order to learn a harmonic embedding, we generate triplets that mix the v1 embeddings with the v2 embeddings that are being trained. The semi-hard negatives are selected from the whole set of both v1 and v2 embeddings.
These are very interesting findings and it is somewhat surprising that it works so well. Future work can explore
how far this idea can be extended. Presumably there is a limit as to how much the v2 embedding can improve
over v1, while still being compatible. Additionally it would be interesting to train small networks that can run on
a mobile phone and are compatible with a larger server-side model.
• Coding Section:
get_faces_from_camera_tkinter.py
import dlib
import numpy as np
import cv2
import os
import shutil
import time
import logging
import tkinter as tk
from tkinter import font as tkFont
from PIL import Image, ImageTk
class Face_Register:
    def __init__(self):
        # Tkinter GUI
        self.win = tk.Tk()
        self.win.title("Face Register")

        self.font_step_title = tkFont.Font(family='Helvetica', size=15, weight='bold')
        self.font_warning = tkFont.Font(family='Helvetica', size=15, weight='bold')

        self.path_photos_from_camera = "data/data_faces_from_camera/"
        self.current_face_dir = ""
        self.font = cv2.FONT_ITALIC

        self.out_of_range_flag = False
        self.face_folder_created_flag = False

        # FPS
        self.frame_time = 0
        self.frame_start_time = 0
        self.fps = 0
        self.fps_show = 0
        self.start_time = time.time()

    def GUI_get_input_name(self):
        self.input_name_char = self.input_name.get()
        self.create_face_folder()
        self.label_cnt_face_in_database['text'] = str(self.existing_faces_cnt)

    def GUI_info(self):
        tk.Label(self.frame_right_info,
                 text="Face register",
                 font=self.font_title).grid(row=0, column=0, columnspan=3, sticky=tk.W, padx=2, pady=20)

        tk.Label(self.frame_right_info,
                 text="Faces in current frame: ").grid(row=3, column=0, columnspan=2, sticky=tk.W, padx=5, pady=2)
        self.label_face_cnt.grid(row=3, column=2, columnspan=3, sticky=tk.W, padx=5, pady=2)

        tk.Button(self.frame_right_info,
                  text='Input',
                  command=self.GUI_get_input_name).grid(row=8, column=2, padx=5)

        tk.Button(self.frame_right_info,
                  text='Save current face',
                  command=self.save_current_face).grid(row=10, column=0, columnspan=3, sticky=tk.W)

        self.frame_right_info.pack()
        self.label_fps_info["text"] = str(self.fps.__round__(2))

    def create_face_folder(self):
        # Create the folders for saving faces
        self.existing_faces_cnt += 1
        if self.input_name_char:
            self.current_face_dir = self.path_photos_from_camera + \
                                    "person_" + str(self.existing_faces_cnt) + "_" + \
                                    self.input_name_char
        else:
            self.current_face_dir = self.path_photos_from_camera + \
                                    "person_" + str(self.existing_faces_cnt)
        os.makedirs(self.current_face_dir)
        self.log_all["text"] = "\"" + self.current_face_dir + "/\" created!"
        logging.info("\n%-40s %s", "Create folders:", self.current_face_dir)

        self.ss_cnt = 0  # Clear the cnt of screen shots
        self.face_folder_created_flag = True  # Face folder already created

    def save_current_face(self):
        if self.face_folder_created_flag:
            if self.current_frame_faces_cnt == 1:
                if not self.out_of_range_flag:
                    self.ss_cnt += 1
                    # Create blank image according to the size of face detected
                    self.face_ROI_image = np.zeros((int(self.face_ROI_height * 2), self.face_ROI_width * 2, 3),
                                                   np.uint8)
                    for ii in range(self.face_ROI_height * 2):
                        for jj in range(self.face_ROI_width * 2):
                            self.face_ROI_image[ii][jj] = self.current_frame[self.face_ROI_height_start - self.hh + ii][
                                self.face_ROI_width_start - self.ww + jj]
                    self.log_all["text"] = "\"" + self.current_face_dir + "/img_face_" + str(
                        self.ss_cnt) + ".jpg\"" + " saved!"
                    self.face_ROI_image = cv2.cvtColor(self.face_ROI_image, cv2.COLOR_BGR2RGB)

    def get_frame(self):
        try:
            if self.cap.isOpened():
                ret, frame = self.cap.read()
                frame = cv2.resize(frame, (640, 480))
                return ret, cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        except:
            print("Error: No video input!!!")

    def process(self):
        # (Frame acquisition and face detection steps are omitted in this excerpt.)
        # Refresh frame
        self.win.after(20, self.process)

    def run(self):
        self.pre_work_mkdir()
        self.check_existing_faces_cnt()
        self.GUI_info()
        self.process()
        self.win.mainloop()


def main():
    logging.basicConfig(level=logging.INFO)
    Face_Register_con = Face_Register()
    Face_Register_con.run()


if __name__ == '__main__':
    main()
features_extraction_to_csv.py
import os
import dlib
import csv
import numpy as np
import logging
import cv2
# Dlib front end: face detector, 68-point landmark predictor and ResNet descriptor.
# (Model initialization is not shown in the original excerpt; the paths below are
#  assumed to follow the data/data_dlib/ layout used in attendance_taker.py.)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("data/data_dlib/shape_predictor_68_face_landmarks.dat")
face_reco_model = dlib.face_recognition_model_v1("data/data_dlib/dlib_face_recognition_resnet_model_v1.dat")

path_images_from_camera = "data/data_faces_from_camera/"


def return_128d_features(path_img):
    # Return the 128-D dlib face descriptor of a single image (0 if no face is found)
    img_rd = cv2.imread(path_img)
    faces = detector(img_rd, 1)
    # For the saved face photos, make sure a face can still be detected in the cropped image
    if len(faces) != 0:
        shape = predictor(img_rd, faces[0])
        face_descriptor = face_reco_model.compute_face_descriptor(img_rd, shape)
    else:
        face_descriptor = 0
        logging.warning("no face")
    return face_descriptor


def return_features_mean_personX(path_face_personX):
    # Average the 128-D features over all photos of one person
    features_list_personX = []
    photos_list = os.listdir(path_face_personX)
    if photos_list:
        for i in range(len(photos_list)):
            # Get the 128-D features for a single image of person X
            logging.info("%-40s %-20s", "Reading image:", path_face_personX + "/" + photos_list[i])
            features_128d = return_128d_features(path_face_personX + "/" + photos_list[i])
            # Skip the image if no face was detected
            if features_128d == 0:
                continue
            features_list_personX.append(features_128d)
    else:
        logging.warning("Warning: No images in %s/", path_face_personX)

    if features_list_personX:
        features_mean_personX = np.array(features_list_personX, dtype=object).mean(axis=0)
    else:
        features_mean_personX = np.zeros(128, dtype=object, order='C')
    return features_mean_personX


def main():
    logging.basicConfig(level=logging.INFO)
    # Get the list of registered persons
    person_list = os.listdir("data/data_faces_from_camera/")
    person_list.sort()

    # (The CSV writer setup and the per-person loop are abbreviated in the original excerpt.)
    with open("data/features_all.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for person in person_list:
            features_mean_personX = return_features_mean_personX(path_images_from_camera + person)
            if len(person.split('_', 2)) == 2:
                # Folder name like "person_x"
                person_name = person
            else:
                # Folder name like "person_x_tom"
                person_name = person.split('_', 2)[-1]
            features_mean_personX = np.insert(features_mean_personX, 0, person_name, axis=0)
            # features_mean_personX is now 129-D: person name + 128 features
            writer.writerow(features_mean_personX)
            logging.info('\n')
        logging.info("Save all the features of faces registered into: data/features_all.csv")


if __name__ == '__main__':
    main()
attendance_taker.py
import dlib
import numpy as np
import cv2
import os
import pandas as pd
import time
import logging
import sqlite3
import datetime

# Face detector and 68-point landmark predictor (their initialization is not shown in
# the original excerpt; the paths are assumed to follow the data/data_dlib/ layout)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("data/data_dlib/shape_predictor_68_face_landmarks.dat")

# Dlib ResNet model: maps an aligned face chip to a 128-D face descriptor
face_reco_model = dlib.face_recognition_model_v1("data/data_dlib/dlib_face_recognition_resnet_model_v1.dat")
class Face_Recognizer:
    def __init__(self):
        self.font = cv2.FONT_ITALIC

        # FPS
        self.frame_time = 0
        self.frame_start_time = 0
        self.fps = 0
        self.fps_show = 0
        self.start_time = time.time()

        # Lists to save centroid positions of the face ROIs in frame N-1 and frame N
        self.last_frame_face_centroid_list = []
        self.current_frame_face_centroid_list = []
        # (Further per-frame name/feature lists and the database of known faces loaded
        #  from data/features_all.csv are initialized in the full script but omitted
        #  from this excerpt.)

    def update_fps(self):
        now = time.time()
        # Refresh the displayed fps once per second
        if str(self.start_time).split(".")[0] != str(now).split(".")[0]:
            self.fps_show = self.fps
        self.start_time = now
        self.frame_time = now - self.frame_start_time
        self.fps = 1.0 / self.frame_time
        self.frame_start_time = now

    @staticmethod
    def return_euclidean_distance(feature_1, feature_2):
        # Compute the Euclidean distance between two 128-D features
        feature_1 = np.array(feature_1)
        feature_2 = np.array(feature_2)
        dist = np.sqrt(np.sum(np.square(feature_1 - feature_2)))
        return dist

    def centroid_tracker(self):
        # Use a centroid tracker to link face_x in the current frame with person_x in the last frame
        for i in range(len(self.current_frame_face_centroid_list)):
            e_distance_current_frame_person_x_list = []
            # For face i in the current frame, compute the distance to every face in the last frame
            for j in range(len(self.last_frame_face_centroid_list)):
                self.last_current_frame_centroid_e_distance = self.return_euclidean_distance(
                    self.current_frame_face_centroid_list[i], self.last_frame_face_centroid_list[j])
                e_distance_current_frame_person_x_list.append(
                    self.last_current_frame_centroid_e_distance)
            last_frame_num = e_distance_current_frame_person_x_list.index(
                min(e_distance_current_frame_person_x_list))
            self.current_frame_face_name_list[i] = self.last_frame_face_name_list[last_frame_num]

    def draw_note(self, img_rd):
        # Label the tracked faces ("Face_1", "Face_2", ...) at their centroid positions
        # (only the face-labelling part of draw_note is shown in this excerpt)
        for i in range(len(self.current_frame_face_name_list)):
            img_rd = cv2.putText(img_rd, "Face_" + str(i + 1), tuple(
                [int(self.current_frame_face_centroid_list[i][0]),
                 int(self.current_frame_face_centroid_list[i][1])]),
                self.font, 0.8, (255, 190, 0), 1, cv2.LINE_AA)

    def attendance(self, name):
        # Insert an attendance record into the SQLite database, one row per person per day
        # (the connection and duplicate-entry query are abbreviated in the original excerpt)
        current_date = datetime.datetime.now().strftime('%Y-%m-%d')
        conn = sqlite3.connect("attendance.db")
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM attendance WHERE name = ? AND date = ?", (name, current_date))
        existing_entry = cursor.fetchone()
        if existing_entry:
            print(f"{name} is already marked as present for {current_date}")
        else:
            current_time = datetime.datetime.now().strftime('%H:%M:%S')
            cursor.execute("INSERT INTO attendance (name, time, date) VALUES (?, ?, ?)",
                           (name, current_time, current_date))
            conn.commit()
            print(f"{name} marked as present for {current_date} at {current_time}")
        conn.close()

    def process(self, stream):
        # Main per-frame loop. (Frame capture, face detection with the dlib detector,
        # feature extraction and the periodic re-classification logic of the full script
        # are omitted in this excerpt; img_rd is the current frame and faces is the list
        # of face rectangles detected in it.)
        self.current_frame_face_position_list = []
        if "unknown" in self.current_frame_face_name_list:
            self.reclassify_interval_cnt += 1

        if self.current_frame_face_cnt != 0:
            for k, d in enumerate(faces):
                self.current_frame_face_position_list.append(tuple(
                    [faces[k].left(), int(faces[k].bottom() + (faces[k].bottom() - faces[k].top()) / 4)]))
                self.current_frame_face_centroid_list.append(
                    [int(faces[k].left() + faces[k].right()) / 2,
                     int(faces[k].top() + faces[k].bottom()) / 2])
                img_rd = cv2.rectangle(img_rd,
                                       tuple([d.left(), d.top()]),
                                       tuple([d.right(), d.bottom()]),
                                       (255, 255, 255), 2)

            for i in range(self.current_frame_face_cnt):
                # Write the recognized names under the face ROIs
                img_rd = cv2.putText(img_rd, self.current_frame_face_name_list[i],
                                     self.current_frame_face_position_list[i], self.font, 0.8, (0, 255, 255), 1,
                                     cv2.LINE_AA)
            self.draw_note(img_rd)

            # For every detected face, compare its 128-D feature against the registered faces.
            # (In the full script this comparison runs inside a loop over each detected face k.)
            self.current_frame_face_X_e_distance_list = []
            for i in range(len(self.face_features_known_list)):
                if str(self.face_features_known_list[i][0]) != '0.0':
                    e_distance_tmp = self.return_euclidean_distance(
                        self.current_frame_face_feature_list[k],
                        self.face_features_known_list[i])
                    logging.debug("  with person %d, the e-distance: %f", i + 1, e_distance_tmp)
                    self.current_frame_face_X_e_distance_list.append(e_distance_tmp)
                else:
                    # Empty registration for person_X: push a very large distance so it is never selected
                    self.current_frame_face_X_e_distance_list.append(999999999)

            # Pick the closest registered person; the 0.4 distance threshold used here is an
            # assumed value, since the exact threshold is not shown in the original excerpt.
            similar_person_num = self.current_frame_face_X_e_distance_list.index(
                min(self.current_frame_face_X_e_distance_list))
            if min(self.current_frame_face_X_e_distance_list) < 0.4:
                nam = self.face_name_known_list[similar_person_num]
                print(type(self.face_name_known_list[similar_person_num]))
                print(nam)
                self.attendance(nam)
            else:
                logging.debug("  Face recognition result: Unknown person")

        self.update_fps()
        cv2.namedWindow("camera", 1)
        cv2.imshow("camera", img_rd)
        logging.debug("Frame ends\n\n")

    def run(self):
        # cap = cv2.VideoCapture("video.mp4")  # Get the video stream from a video file
        cap = cv2.VideoCapture(0)               # Get the video stream from the camera
        self.process(cap)
        cap.release()
        cv2.destroyAllWindows()


def main():
    # Set the log level to logging.DEBUG to print debug info for every frame
    # logging.basicConfig(level=logging.DEBUG)
    logging.basicConfig(level=logging.INFO)
    Face_Recognizer_con = Face_Recognizer()
    Face_Recognizer_con.run()


if __name__ == '__main__':
    main()
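attendance_taker.py assumes an attendance.db SQLite database containing an attendance table with name, time and date columns, which follows from the INSERT statement above. The report does not show how that table is created, so the snippet below is only one plausible way to set it up, with every column stored as TEXT.
import sqlite3

# One-off setup of the attendance database assumed by attendance_taker.py and app.py.
conn = sqlite3.connect("attendance.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS attendance ("
    "    name TEXT,"    # recognized person's name
    "    time TEXT,"    # time of first detection that day, HH:MM:SS
    "    date TEXT"     # calendar date, YYYY-MM-DD
    ")"
)
conn.commit()
conn.close()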
app.py
from flask import Flask, render_template, request
from datetime import datetime
import sqlite3

app = Flask(__name__)


@app.route('/')
def index():
    return render_template('index.html', selected_date='', no_data=False)


@app.route('/attendance', methods=['POST'])
def attendance():
    selected_date = request.form.get('selected_date')
    selected_date_obj = datetime.strptime(selected_date, '%Y-%m-%d')
    formatted_date = selected_date_obj.strftime('%Y-%m-%d')

    # Query the attendance records for the selected date
    # (the query and the final rendering step are abbreviated in the original excerpt)
    conn = sqlite3.connect('attendance.db')
    cursor = conn.cursor()
    cursor.execute("SELECT name, time FROM attendance WHERE date = ?", (formatted_date,))
    attendance_data = cursor.fetchall()
    conn.close()

    if not attendance_data:
        return render_template('index.html', selected_date=selected_date, no_data=True)
    return render_template('index.html', selected_date=selected_date, attendance_data=attendance_data)


if __name__ == '__main__':
    app.run(debug=True)
templates/index.html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Attendance Tracker Sheet</title>
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
<style>
body {
font-family: Arial, sans-serif;
background-color: #f4f4f4;
}
form {
margin-top: 50px;
display: flex;
flex-direction: column;
align-items: center;
border: 1px solid #ddd;
padding: 20px;
border-radius: 5px;
box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.2);
background-color: #fff;
width: 50%;
margin-left: auto;
margin-right: auto;
}
label {
font-size: 20px;
margin-bottom: 10px;
color: #333;
}
input[type="date"] {
padding: 10px 20px;
border-radius: 5px;
border: none;
margin-bottom: 20px;
font-size: 18px;
width: 100%;
box-sizing: border-box;
margin-top: 10px;
margin-bottom: 20px;
}
button[type="submit"] {
background-color: #333;
color: #fff;
border: none;
padding: 10px;
border-radius: 5px;
cursor: pointer;
font-size: 18px;
}
button[type="submit"]:hover {
background-color: #555;
}
</style>
</head>
<body>
<div class="jumbotron text-center">
<h1 class="display-4">Attendance Tracker Sheet</h1>
</div>
<hr>
<!-- (the attendance table markup and closing </body></html> tags are omitted in this excerpt) -->
requirements.txt
dlib==19.17.0
numpy==1.22.0
scikit-image==0.18.3
pandas==1.3.4
opencv-python==4.5.4.58
flask
Usage
1. Collect the faces dataset by running python get_faces_from_camera_tkinter.py.
2. Convert the dataset into 128-D feature vectors by running python features_extraction_to_csv.py.
3. To take the attendance, run python attendance_taker.py.
4. Check the attendance database through the web interface by running python app.py.
The face recognition pipeline relies on the 68 landmark points that dlib detects on a human face, as shown in the figure below.
Fig.: The 68 facial landmarks detected by the dlib library
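For reference, the snippet below shows one way these 68 landmarks can be detected and drawn with dlib. The shape-predictor path follows the data/data_dlib/ layout used elsewhere in this report, and both it and the image file names are assumptions for illustration.
import cv2
import dlib

# Frontal face detector plus the standard 68-point shape predictor.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("data/data_dlib/shape_predictor_68_face_landmarks.dat")

img = cv2.imread("sample_face.jpg")
for face in detector(img, 1):                 # detect faces (1 = upsample the image once)
    shape = predictor(img, face)              # fit the 68 landmarks inside the face box
    for i in range(68):
        p = shape.part(i)                     # landmark i as an (x, y) point
        cv2.circle(img, (p.x, p.y), 2, (0, 255, 0), -1)
cv2.imwrite("sample_face_landmarks.jpg", img)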
RESULT
Executing get_faces_from_camera_tkinter.py
Executing features_extraction_to_csv.py
Executing attendance_taker.py
Executing app.py
CONCLUSION
The project "Face Recognition for E-Authentication" marks a pivotal step forward in the field of biometric
authentication, particularly in enhancing the security and user experience of electronic authentication (e-
authentication) systems. As the digital landscape evolves, the reliance on secure and efficient authentication methods
becomes increasingly critical. Traditional methods such as passwords and PINs are becoming less reliable due to
vulnerabilities like phishing attacks, password leaks, and brute force attacks. Face recognition technology, leveraging
the unique biometric characteristics of an individual's face, offers a promising solution to these challenges.
The project's primary aim was to develop a robust and efficient face recognition system tailored specifically for e-
authentication purposes. This involved the design and implementation of sophisticated algorithms for face detection,
feature extraction, and matching, utilizing state-of-the-art deep learning techniques such as convolutional neural
networks (CNNs). By analyzing and comparing facial features captured from an individual's face with pre-registered
templates, the system aims to authenticate users with high accuracy and confidence.
One of the key achievements of the project is the enhancement of security measures. Beyond traditional authentication
methods, the system incorporates advanced anti-spoofing mechanisms to detect and prevent unauthorized access
attempts. These mechanisms are designed to differentiate between genuine facial images and spoofed or manipulated
ones, thus mitigating the risk of presentation attacks. Additionally, encryption and secure communication protocols
were integrated to protect sensitive data during the authentication process, ensuring compliance with privacy laws
and regulations governing biometric data.
Another significant aspect of the project is the focus on optimizing accuracy and reliability. Through rigorous testing
and validation, the system minimizes false acceptance and rejection rates, ensuring robust performance under various
environmental conditions and scenarios. The fine-tuning of system parameters and optimization of algorithms were
crucial steps in achieving this objective. The project also explored adaptive learning techniques to continuously
improve the system's performance as it encounters new data and scenarios.
The project placed a strong emphasis on enhancing the user experience. An intuitive and user-friendly interface was
designed to facilitate seamless interaction with the face recognition system. Usability enhancements such as adaptive
feedback mechanisms and multi-modal authentication approaches were incorporated to streamline the authentication
process and minimize user friction. These efforts ensure that the system is accessible to users with diverse
backgrounds and varying levels of technical proficiency.
Compatibility and scalability were also key considerations in the project. The system was designed to be compatible
with existing e-authentication frameworks and platforms, enabling seamless integration in diverse application
environments. The architecture of the system was made scalable and adaptable to accommodate future enhancements
and expansions. This ensures that the system can evolve with technological advancements and changing user needs.
Ethical and legal considerations were paramount throughout the project. Measures were put in place to adhere to
ethical guidelines and legal regulations governing the collection, storage, and processing of biometric data. User
privacy and confidentiality were prioritized, with stringent measures such as data anonymization and secure storage
practices implemented to protect sensitive information. Regular ethical reviews and assessments were conducted to
ensure compliance and address any emerging concerns.
Documentation and knowledge sharing were integral components of the project. The development process,
methodologies, and findings were comprehensively documented to facilitate knowledge sharing and replication
within the academic and professional community. Research papers, reports, and technical documentation were
published to disseminate the project outcomes and contribute to the broader advancement of face recognition
technology for e-authentication.
In conclusion, the project "Face Recognition for E-Authentication" successfully demonstrates the potential of face
recognition technology to provide a secure, reliable, and user-friendly solution for electronic authentication. By
addressing key objectives such as system development, security enhancement, accuracy optimization, user experience
enhancement, compatibility and scalability, ethical and legal compliance, and comprehensive documentation, the
project makes a valuable contribution to the field. The advancements achieved in this project pave the way for the
broader adoption of face recognition technology in various online applications and services, enhancing the security
and efficiency of digital interactions in today's interconnected world.
The successful implementation of the face recognition system for e-authentication underscores the importance of
continuous innovation and development in the field of biometric authentication. As technology advances and new
challenges emerge, ongoing research and development efforts will be crucial to further improve the accuracy,
security, and usability of face recognition systems. Future work may focus on exploring advanced machine learning
techniques, incorporating additional biometric modalities, and addressing emerging security threats to ensure that
face recognition technology remains a reliable and robust solution for e-authentication.
Overall, the project "Face Recognition for E-Authentication" represents a significant milestone in the evolution of
authentication technologies. By leveraging the unique capabilities of face recognition, the project enhances both the
security and user experience of e-authentication systems, contributing to a safer and more efficient digital landscape.
Through its comprehensive approach and focus on continuous improvement, the project sets a strong foundation for
future advancements in the field, ensuring that face recognition technology continues to play a crucial role in securing
digital interactions and protecting user identities.
SCOPE FOR FUTURE WORK
Future work for the "Face Recognition for E-Authentication" project encompasses a broad spectrum of
advancements and explorations, aimed at enhancing the performance, security, and applicability of face recognition
technology in e-authentication systems. As the digital landscape continues to evolve, and as face recognition
technology matures, several key areas for future research and development become evident. These areas include
improving algorithmic accuracy and robustness, addressing ethical and privacy concerns, enhancing user experience,
exploring multi-modal authentication systems, ensuring compliance with emerging regulations, and expanding the
applicability of face recognition systems to new domains and environments.
Firstly, enhancing the algorithmic accuracy and robustness of face recognition systems remains a paramount focus.
This involves leveraging advancements in deep learning and artificial intelligence to develop more sophisticated
algorithms capable of handling a wider range of variations in facial expressions, lighting conditions, and occlusions.
Future work could explore the integration of advanced neural network architectures, such as transformers, and the
application of transfer learning to improve the generalization capabilities of face recognition models. Additionally,
ongoing research into adversarial robustness is crucial, as it addresses the susceptibility of face recognition systems
to adversarial attacks that can manipulate the input data to deceive the system.
Secondly, ethical and privacy concerns surrounding the use of biometric data, including face recognition, must be
thoroughly addressed. Future work should focus on developing techniques for secure and privacy-preserving face
recognition, such as federated learning and homomorphic encryption. These techniques enable the training and
deployment of face recognition models without directly accessing or storing sensitive biometric data, thereby
mitigating privacy risks. Furthermore, the development of transparent and explainable AI models will enhance user
trust and compliance with ethical guidelines by providing insights into how face recognition decisions are made.
Enhancing user experience is another critical area for future research. This involves designing more intuitive and
accessible user interfaces that cater to diverse user populations, including individuals with disabilities. Future work
could explore the use of haptic feedback, voice assistance, and other adaptive technologies to create more inclusive
authentication experiences. Additionally, user studies and usability testing will be essential to identify and address
pain points in the user journey, ensuring that the face recognition system is not only secure but also user-friendly.
Exploring multi-modal authentication systems represents a promising avenue for future research. Combining face
recognition with other biometric modalities, such as voice recognition, fingerprint scanning, or iris recognition, can
significantly enhance the security and reliability of e-authentication systems. Multi-modal authentication systems can
leverage the strengths of each modality to provide a more comprehensive and robust authentication solution. Future
work should focus on developing seamless integration techniques and evaluating the performance of multi-modal
systems in various real-world scenarios.
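One simple way to realize such a combination is score-level fusion, sketched below. The per-modality scores are assumed to be normalized to [0, 1], and the weights and acceptance threshold are purely illustrative values that would have to be tuned on real data.
def fused_score(face_score, voice_score, w_face=0.7, w_voice=0.3):
    # Weighted score-level fusion of two modalities (each score normalized to [0, 1]).
    return w_face * face_score + w_voice * voice_score

def authenticate(face_score, voice_score, threshold=0.8):
    # Accept the user only if the fused score clears the decision threshold.
    return fused_score(face_score, voice_score) >= threshold

# Example: a strong face match can compensate for a mediocre voice match.
print(authenticate(face_score=0.95, voice_score=0.60))   # True (fused score 0.845)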
Ensuring compliance with emerging regulations and standards is another crucial aspect of future work. As
governments and regulatory bodies continue to develop and implement regulations related to biometric data usage, it
is essential to stay abreast of these changes and ensure that face recognition systems comply with all relevant laws
and standards. This includes implementing robust data protection measures, conducting regular audits, and
maintaining transparency with users about how their biometric data is collected, stored, and used.
Expanding the applicability of face recognition systems to new domains and environments also presents significant
opportunities for future research. Beyond e-authentication, face recognition technology can be applied to areas such
as access control, surveillance, and personalized user experiences. Each of these applications presents unique
challenges and requirements that must be addressed. For example, in surveillance, face recognition systems must be
able to operate effectively in crowded and dynamic environments, while in personalized user experiences, they must
be able to recognize individuals across different contexts and devices.
Moreover, future work should focus on addressing the challenges of scalability and deployment in diverse
environments. Developing lightweight and efficient models that can run on a variety of devices, from high-
performance servers to resource-constrained mobile devices, is essential for widespread adoption. Techniques such
as model compression, quantization, and edge computing can help achieve this goal by reducing the computational
and memory requirements of face recognition models.
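As a brief illustration of the model-compression direction mentioned above, the sketch below applies post-training dynamic quantization to a toy embedding head with PyTorch (an assumed dependency here; the layer sizes are placeholders, not the project's actual network).
import torch
import torch.nn as nn

# Toy stand-in for a trained face-embedding head; in practice the real trained
# model would be loaded from a checkpoint instead.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))

# Dynamic quantization stores the Linear weights as int8, shrinking the model
# and typically speeding up CPU inference with little accuracy loss.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "face_embedding_head_int8.pt")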
Another important area for future research is the continuous improvement of anti-spoofing mechanisms. As attackers
develop more sophisticated spoofing techniques, it is crucial to stay ahead by developing advanced liveness detection
methods and other countermeasures. Future work could explore the use of multi-spectral imaging, thermal cameras,
and behavioral biometrics to enhance the ability of face recognition systems to detect and prevent spoofing attempts.
Furthermore, cross-cultural and demographic considerations must be taken into account to ensure the fairness and
inclusivity of face recognition systems. Future research should focus on developing datasets and models that are
representative of diverse populations, minimizing biases that can lead to unequal performance across different
demographic groups. Collaboration with international organizations and communities can help identify and address
cultural nuances and ethical considerations specific to different regions.
In addition to technical advancements, future work should also focus on fostering collaboration and knowledge
sharing within the research community. Establishing open-source initiatives, creating benchmark datasets, and
organizing conferences and workshops can facilitate the exchange of ideas and best practices. Collaboration with
industry partners can also help bridge the gap between research and practical applications, ensuring that advancements
in face recognition technology are translated into real-world solutions that benefit society.
Finally, future work should consider the long-term societal implications of face recognition technology. This includes
exploring the impact on privacy, civil liberties, and social dynamics. Engaging with policymakers, ethicists, and the
public in discussions about the appropriate use and regulation of face recognition technology is essential to ensure
that its deployment aligns with societal values and expectations.
In conclusion, the scope for future work on the "Face Recognition for E-Authentication" project is vast and
multifaceted, encompassing technical advancements, ethical considerations, user experience enhancements,
regulatory compliance, and societal impact. By addressing these areas, future research and development efforts can
ensure that face recognition technology continues to evolve and improve, providing secure, reliable, and user-friendly
authentication solutions for a wide range of applications. Through ongoing innovation and collaboration, face
recognition technology can play a crucial role in shaping the future of digital security and interactions.
REFERENCES
1. Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In Proc. of ICML, New York, NY, USA, 2009.
2. D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint formulation. In Proc. ECCV, 2012.
3. D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In Proc. ECCV, 2014.
4. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, Q. V. Le, and A. Y. Ng. Large scale distributed deep networks. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, NIPS, pages 1232–1240, 2012.
5. J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12:2121–2159, July 2011.
6. I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In ICML, 2013.
7. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
8. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, Dec. 1989.
9. M. Lin, Q. Chen, and S. Yan. Network in network. CoRR, abs/1312.4400, 2013.
10. C. Lu and X. Tang. Surpassing human-level face verification performance on LFW with GaussianFace. CoRR, abs/1404.3840, 2014.
11. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 1986.
12. M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In S. Thrun, L. Saul, and B. Schölkopf, editors, NIPS, pages 41–48. MIT Press, 2004.
13. T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database. In Proc. FG, 2002.
14. Y. Sun, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. CoRR, abs/1406.4773, 2014.
15. Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. CoRR, abs/1412.1265, 2014.
16. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
17. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conf. on CVPR, 2014.
18. J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. CoRR, abs/1404.4661, 2014.
19. K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS. MIT Press, 2006.
20. D. R. Wilson and T. R. Martinez. The general inefficiency of batch training for gradient descent learning. Neural Networks, 16(10):1429–1451, 2003.
21. L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In IEEE Conf. on CVPR, 2011.
22. M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013.
23. Z. Zhu, P. Luo, X. Wang, and X. Tang. Recover canonical-view faces in the wild with deep neural networks. CoRR, abs/1404.3543, 2014.
24. https://ijrar.org/papers/IJRAR1CNP010.pdf
25. https://ieeexplore.ieee.org/document/6640086
26. https://m2pfintech.com/blog/how-has-facial-recognition-revolutionized-identity-verification-and-authentication/
27. https://www.analyticsvidhya.com/blog/2023/12/the-deep-learning-revolution-in-facial-recognition-for-secure-
login-systems/
28. https://thesai.org/Downloads/Volume4No6/Paper_10-
Face_Recognition_as_an_Authentication_Technique_in_Electronic_Voting.pdf
29. https://www.hypr.com/security-encyclopedia/face-authentication
30. https://arxiv.org/pdf/2103.15144
31. Hacıoğlu, Müjgan & Karakaş Geyik, Seda. (2015). An Empirical Research on General Internet Usage Patterns of Undergraduate Students. Procedia - Social and Behavioral Sciences, 195, 895–904. 10.1016/j.sbspro.2015.06.369.
32. Meng, X. (2008). Study on the Model of E-Commerce Identity Authentication based on Multi-biometric
Features Identification. Proceedings-2008 ISECS International Colloquium on Computing, Communication,
Control, and Management, 200.
33. A. Alotaibi and A. Mahmmod, “Enhancing OAuth services security by an authentication service with face
recognition,” 2015 IEEE Long Isl. Syst. Appl. Technol. Conf. LISAT 2015, 2015,
doi:10.1109/LISAT.2015.7160208.
34. S. Khokad and V. Kala, ”A study of SLIDE Algorithm: Revolutionary AI Algorithm that Speeds Up Deep
Learning on CPUs,” 2020 International Conference on Smart Innovations in Design, Environment, Management,
Planning and Computing (ICSIDEMPC), AURANGABAD, 2020, pp. 188-191,
doi:10.1109/ICSIDEMPC49020.2020.9299644.
35. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
36. SAS. (2020). Machine Learning. https://www.sas.com/en_us/insights/analytics/machinelearning.html
37. Hargrave, M. (2020). Deep Learning. https://www.investopedia.com/terms/d/deeplearning.asp
38. Luxand. (n.d.). Face recognition widget for your website. https://luxand.cloud/widget
39. IDentification, E. (n.d.). SmileID. https://www.electronicid.eu/en/solutions/smileid
40. T. Nyein and A. N. Oo, “University Classroom Attendance System Using FaceNet and Support Vector
Machine,” 2019 Int. Conf. Adv. Inf. Technol. ICAIT 2019, pp. 171–176, 2019, doi: 10.1109/AITC.2019.8921316.
41. B. Prihasto et al., “A survey of deep face recognition in the wild,” 2016 Int. Conf. Orange Technol. ICOT 2016,
vol. 2018- Janua, pp. 76–79, 2018, doi: 10.1109/ICOT.2016.8278983.
42. S. I. Serengil and A. Ozpinar, “LightFace: A Hybrid Deep Face Recognition Framework,” pp. 2–6, 2020.
43. O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep Face Recognition,” 2015, pp. 41.1- 41.12, doi:
10.5244/c.29.41.
44. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: Closing the gap to human- level performance in
face verification,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2014, pp. 1701–1708, doi: 10.1109/CVPR.2014.220.
45. T. Baltrusaitis, P. Robinson, and L. P. Morency, “OpenFace: An open source facial be- havior analysis toolkit,”
in 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, 2016, pp. 1–10, doi:
10.1109/WACV.2016.7477553.
46. F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and
clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2015, vol. 07-12-June, pp. 815–823, doi:10.1109/CVPR.2015.7298682.
47. “Georgia Tech Face Database.” http://www.anefian.com/research/face_reco.htm.
48. Mondal, S. K., Mukhopadhyay, I., Dutta, S. (2020). Review and Comparison of Face Detection Techniques.
Advances in Intelligent Systems and Computing
49. Zhao, Q., and Zhang, S. (2011). A face detection method based on corner verifying. 2011 International
Conference on Computer Science and Service System, CSSS 2011 - Proceedings, 2854–2857.
https://doi.org/10.1109/CSSS.2011.5974784
50. D. Sandberg, “Facenet,” 2018. https://github.com/davidsandberg/facenet.
51. H. Tania, “keras-facenet,” 2018. https://github.com/nyokimtl/keras-facenet.