Face Recognition For E-Authentication Final Project Report (B.Tech Final Year Project Report)
in
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Accredited by NATIONAL BOARD OF ACCREDITATION
ACADEMIC SESSION: 2021-2024
SEMESTER: VIII
1. CERTIFICATION
2. ACKNOWLEDGEMENT
3. ABSTRACT
4. INTRODUCTION
5. LITERATURE SURVEY
6. AIMS AND OBJECTIVES
7. SCOPE AND LIMITATIONS
8. ANALYSIS OF EXISTING SYSTEM
9. ANALYSIS OF PROPOSED SYSTEM
10. DESIGN
11. SYSTEM REQUIREMENTS
12. CODING AND METHODOLOGY
13. RESULT
14. CONCLUSION
15. SCOPE FOR FUTURE WORK
16. REFERENCES
CERTIFICATION
This is to certify that this research work was carried out by:
____________________ ____________________
Prof. Raj Kumar Paul Prof. (Dr.) Bimal Datta
Project Supervisor & Guide Prof. & HoD CSE Dept
ACKNOWLEDGMENT
We thank our supervisor Prof. Raj Kumar Paul (Assistant Professor) for the supervision and support he gave, which helped the project progress smoothly.
We extend our thanks to the entire Computer Science & Engineering Department of Budge Budge Institute of Technology, the H.O.D. Prof. (Dr.) Bimal Datta, and all lecturers who built our foundation in computer science.
We would also like to thank our friends, and we extend special thanks to our parents, who encouraged, supported and helped us financially, prayerfully and morally throughout this project.
ABSTRACT
Face recognition technology has emerged as a promising method for enhancing security in various applications,
including e-authentication systems. In this project, we propose a robust face recognition system tailored specifically
for e-authentication purposes.
The primary objective of our project is to develop an efficient and accurate face recognition model that can
authenticate users with a high level of confidence while ensuring a seamless and user-friendly experience. To achieve
this, we employ state-of-the-art deep learning techniques, specifically convolutional neural networks (CNNs), for
feature extraction and classification.
The proposed system encompasses several key components, including face detection, feature extraction, and
matching. Initially, faces are detected and localized within input images using advanced computer vision algorithms.
Subsequently, facial features are extracted from the detected faces using a deep CNN architecture, which captures
discriminative characteristics essential for accurate identification. Finally, a matching algorithm compares the
extracted features with pre-registered templates to authenticate the user's identity.
Furthermore, to enhance the system's robustness and security, we incorporate advanced techniques such as anti-
spoofing mechanisms to detect and prevent unauthorized access attempts, including presentation attacks with fake or
manipulated images.
In addition to security considerations, we prioritize user experience by optimizing the system for speed, reliability,
and ease of use. The interface is designed to be intuitive and accessible, ensuring a smooth authentication process for
users across various devices and platforms.
Through rigorous evaluation and testing, we demonstrate the effectiveness and reliability of our face recognition
system for e-authentication purposes. The results indicate high accuracy rates and resistance to spoofing attacks,
validating its suitability for deployment in real-world scenarios where secure and user-friendly authentication is
paramount.
Despite significant recent advances in the field of face recognition, implementing face verification and recognition
efficiently at scale presents serious challenges to current approaches. In this paper we present a system that directly
learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure
of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can
be easily implemented using standard techniques.
Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an
intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned
matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our
approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using
only 128 bytes per face.
INTRODUCTION
In an increasingly digital world where online transactions and interactions have become ubiquitous, ensuring secure
and reliable authentication mechanisms is paramount. Traditional methods such as passwords and PINs are
susceptible to various security threats, including phishing, brute force attacks, and password leaks. As a result, there
is a growing demand for more robust and convenient authentication solutions that can effectively mitigate these risks
while offering a seamless user experience.
Face recognition technology has emerged as a promising solution to address the shortcomings of traditional
authentication methods. Leveraging the unique biometric characteristics of an individual's face, such as facial features
and patterns, face recognition systems offer a highly secure and user-friendly authentication mechanism. By analyzing
and comparing facial characteristics captured from an individual's face with stored templates, these systems can
accurately verify the identity of users in real-time.
In the context of electronic authentication (e-authentication), which encompasses a wide range of online activities
such as accessing sensitive information, conducting financial transactions, and interacting with government services,
the need for robust authentication mechanisms is particularly critical. E-authentication systems must strike a delicate
balance between security and user experience, ensuring that sensitive data and resources are protected while providing
a frictionless authentication process for users.
The primary objective of this project is to develop and implement a face recognition system specifically tailored for
e-authentication purposes. By harnessing the power of deep learning and computer vision technologies, our system
aims to provide a highly accurate, reliable, and secure authentication solution that enhances both security and user
experience.
In this introduction, we provide an overview of the motivation behind the project, the challenges associated with e-
authentication, and the potential of face recognition technology to address these challenges. We also outline the
objectives, scope, and significance of the project, setting the stage for the subsequent sections where we delve into
the technical details, implementation, and evaluation of the proposed face recognition system for e-authentication.
LITERATURE SURVEY
Face recognition technology has garnered significant attention in recent years due to its potential applications in
various domains, including e-authentication. A thorough literature survey reveals a wealth of research and
advancements in this field, encompassing both theoretical foundations and practical implementations. Here, we
provide an overview of key studies and findings relevant to the project of Face Recognition for E-Authentication:
1. Biometric Authentication Techniques: Numerous studies have explored the effectiveness of biometric
authentication methods, including fingerprint recognition, iris scanning, and face recognition, in e-
authentication systems. Comparisons between different biometric modalities have highlighted the advantages
of face recognition in terms of user acceptance, non-intrusiveness, and ease of deployment.
2. Deep Learning for Face Recognition: The advent of deep learning techniques, particularly convolutional
neural networks (CNNs), has revolutionized face recognition research. Studies have demonstrated the superior
performance of CNN-based models in facial feature extraction and classification tasks, leading to
unprecedented accuracy rates in face recognition systems.
3. Face Detection and Feature Extraction: Face detection and feature extraction are fundamental components
of any face recognition system. Researchers have proposed various algorithms and methodologies for robust
face detection and feature extraction, ranging from traditional methods such as the Viola-Jones algorithm to
advanced deep learning architectures like the ResNet and VGG networks.
4. Spoof Detection and Anti-Spoofing Techniques: Addressing security concerns associated with face
recognition, several studies have focused on developing spoof detection and anti-spoofing techniques to
prevent unauthorized access attempts using fake or manipulated facial images. These techniques often
leverage advanced machine learning algorithms to distinguish between genuine and spoofed faces based on
subtle cues and characteristics.
5. User Experience and Usability: In addition to security considerations, the user experience is a crucial aspect
of e-authentication systems. Research has explored various strategies for optimizing the usability and
accessibility of face recognition systems, such as intuitive user interfaces, adaptive feedback mechanisms,
and multi-modal authentication approaches.
6. Real-World Deployments and Case Studies: Several real-world deployments of face recognition technology
for e-authentication purposes have been documented in the literature. Case studies and practical
implementations provide valuable insights into the challenges, benefits, and considerations involved in
integrating face recognition into existing authentication systems across different sectors, including finance,
healthcare, and government services.
By synthesizing insights from existing literature and building upon previous research findings, our project aims to
contribute to the advancement of face recognition technology for e-authentication, with a focus on enhancing both
security and user experience. Through rigorous experimentation, evaluation, and validation, we seek to develop a
robust and reliable face recognition system that meets the evolving needs of e-authentication in today's digital
landscape.
AIMS AND OBJECTIVES
The project "Face Recognition for E-Authentication" aims to develop an advanced system that utilizes face
recognition technology to enhance the security and user experience of electronic authentication (e-authentication).
With the proliferation of online transactions and interactions, the need for robust authentication mechanisms has
become increasingly crucial. Traditional methods such as passwords and PINs are prone to various security threats,
including phishing, brute force attacks, and password leaks. In this context, face recognition technology offers a
promising solution by leveraging the unique biometric characteristics of an individual's face to verify their identity
in real-time.
One of the primary aims of the project is to design and implement a face recognition system tailored specifically for
e-authentication purposes. This involves the development of sophisticated algorithms for face detection, feature
extraction, and matching, utilizing state-of-the-art deep learning techniques such as convolutional neural networks
(CNNs). By analyzing and comparing facial features captured from an individual's face with pre-registered templates,
the system aims to accurately authenticate users with a high level of confidence.
Enhancing security is another key objective of the project. In addition to traditional authentication measures, such as
username-password combinations, the face recognition system incorporates advanced anti-spoofing mechanisms to
detect and prevent unauthorized access attempts. These mechanisms are designed to distinguish between genuine
facial images and spoofed or manipulated ones, thereby mitigating the risk of presentation attacks.
Moreover, the project emphasizes the importance of optimizing the accuracy and reliability of the face recognition
system. Through rigorous testing and validation, the system aims to minimize false acceptance and rejection rates,
ensuring robust performance under various environmental conditions and scenarios. Fine-tuning the system
parameters and optimizing the algorithms are essential steps in achieving this objective.
In addition to security considerations, the project focuses on enhancing the user experience of e-authentication. A
user-friendly interface is designed to facilitate seamless interaction with the face recognition system, catering to users
of diverse backgrounds and technical proficiency levels. Usability enhancements, such as adaptive feedback
mechanisms and multi-modal authentication approaches, are incorporated to streamline the authentication process
and minimize user friction.
Furthermore, the project aims to ensure compatibility and scalability of the face recognition system. This involves
integrating the system with existing e-authentication frameworks and platforms, enabling seamless deployment in
diverse application environments. The system architecture is designed to be scalable and adaptable, allowing for
future enhancements and expansions to accommodate evolving user needs and technological advancements.
Ethical and legal considerations are paramount throughout the project. Measures are put in place to adhere to ethical
guidelines and legal regulations governing the collection, storage, and processing of biometric data. User privacy and
confidentiality are prioritized, with stringent measures such as data anonymization and secure storage practices
implemented to protect sensitive information.
Documentation and knowledge sharing are integral aspects of the project. The development process, methodologies,
and findings are comprehensively documented to facilitate knowledge sharing and replication within the academic
and professional community. Research papers, reports, and technical documentation are published to disseminate the
project outcomes and contribute to the broader advancement of face recognition technology for e-authentication.
SCOPE
• It is typically used in security systems and can be compared to other biometrics such as fingerprint or iris
recognition systems.
• The facial recognition industry is experiencing tremendous growth. It has the potential to completely transform
a wide range of businesses, including security and surveillance, AI capabilities, and even personalized
advertising.
• Taking attendance using face recognition software.
• Monitoring attendance and retrieving attendance records.
LIMITATIONS
1. Accuracy Constraints: Despite advancements in face recognition technology, there may still be limitations
in accurately identifying individuals under certain conditions such as low lighting, occlusions, or variations
in facial expressions.
2. Hardware and Resource Requirements: Implementing a robust face recognition system may require
significant computational resources, including high-performance hardware for processing large datasets and
running complex algorithms.
3. Environmental Factors: Environmental factors such as changes in lighting conditions, camera quality, and
background noise could affect the performance of the face recognition system, leading to decreased accuracy
in real-world scenarios.
4. Privacy Concerns: The collection and storage of biometric data raise privacy concerns, as it involves
capturing and storing sensitive information about individuals. Ensuring compliance with privacy laws and
regulations is crucial to mitigate these concerns.
5. Spoofing Attacks: Despite the implementation of anti-spoofing mechanisms, the face recognition system
may still be vulnerable to sophisticated spoofing attacks using high-quality facial masks or digitally
manipulated images.
6. User Acceptance: Some users may have reservations about using biometric authentication methods due to
concerns about privacy, security, or cultural factors. Ensuring user acceptance and adoption of the system
may pose challenges.
7. Regulatory Compliance: Compliance with regulatory frameworks and standards related to biometric data
management, such as GDPR in Europe or HIPAA in the United States, may impose constraints on the
development and deployment of the face recognition system.
8. Cultural and Diversity Challenges: Cultural factors and diversity in facial features across different
populations may affect the accuracy and reliability of the face recognition system, necessitating diverse
datasets and inclusive design considerations.
9. Cost Considerations: Developing and maintaining a robust face recognition system may involve significant
costs associated with research and development, hardware infrastructure, software licensing, and ongoing
support and maintenance.
10. Interoperability Issues: Integration with existing e-authentication frameworks and platforms may
encounter interoperability issues, requiring additional efforts to ensure seamless compatibility and
functionality.
ANALYSIS OF EXISTING SYSTEM
The analysis of existing systems for face recognition in e-authentication reveals a landscape characterized by both
advancements and limitations. Traditional password-based systems, while prevalent, suffer from vulnerabilities such
as phishing and brute force attacks. Biometric authentication methods, including face recognition, offer promising
alternatives due to their non-intrusiveness and user convenience. However, existing face recognition systems may
encounter challenges in accuracy, particularly under varying environmental conditions. Deep learning-based
approaches have shown significant improvements in face recognition accuracy, yet they remain susceptible to
overfitting and adversarial attacks. Anti-spoofing mechanisms aim to address security concerns but may not provide
foolproof protection. Commercial solutions offer convenience but may come with limitations such as licensing fees
and privacy concerns. Academic research contributes valuable insights and prototypes, yet scalability and real-world
deployment considerations remain challenges. In summary, while existing face recognition systems offer promising
solutions for e-authentication, addressing challenges related to accuracy, security, and usability requires ongoing
research and development efforts.
ANALYSIS OF PROPOSED SYSTEM
The proposed system for face recognition in e-authentication presents a comprehensive solution aimed at addressing
the limitations of existing systems while leveraging advancements in technology. By integrating deep learning
techniques, specifically convolutional neural networks (CNNs), the system aims to enhance accuracy and robustness
in facial recognition tasks. Additionally, the incorporation of anti-spoofing mechanisms addresses security concerns
by detecting and preventing presentation attacks. Usability is prioritized through the design of an intuitive interface
and adaptive feedback mechanisms, ensuring a seamless user experience. Compatibility and scalability are also
considered, enabling integration with existing e-authentication frameworks and accommodating future
enhancements. Ethical considerations are paramount, with measures in place to protect user privacy and comply with
regulatory requirements. Overall, the proposed system represents a significant advancement in e-authentication
technology, offering a secure, reliable, and user-friendly solution for various online applications and services.
DESIGN
SYSTEM REQUIREMENTS:
1. HARDWARE REQUIREMENTS
2. SOFTWARE REQUIREMENTS
HARDWARE REQUIREMENTS:
• System – Windows/Linux System
• GPU
• Storage: 64GB
• Memory: 8GB
SOFTWARE REQUIREMENTS:
• OS: Windows/Linux based OS
• Google Colab for Python Programming execution (Online Platform)
• Visual Studio Code
• Python 3 with the required libraries (OpenCV, dlib, NumPy, TensorFlow, etc.)
Coding and Methodology
Importing libraries
import imutils
import numpy as np
import cv2
from google.colab.patches import cv2_imshow
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode
Start webcam
def take_photo(filename='photo.jpg', quality=0.8):
    js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      // Create a video element and start the webcam stream (this part was missing
      // from the report excerpt; it follows the standard Colab camera snippet).
      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Wait until the Capture button is clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      // Draw the current frame onto a canvas and stop the webcam stream.
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
    display(js)
    data = eval_js('takePhoto({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
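In the notebook, the captured file is then read back with OpenCV before the detection steps below; the exact cell is not reproduced in this report, so the following minimal usage is an assumption:

# Capture a photo from the webcam and load it with OpenCV for the steps that follow.
filename = take_photo()
image = cv2.imread(filename)
cv2_imshow(image)  # preview the captured frame inside Colab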
OpenCV’s deep learning face detector is based on the Single Shot Detector (SSD) framework
with a ResNet base network. The network is defined and trained using the Caffe Deep Learning
framework. Download the pre-trained face detection model, which consists of two files:
• The network definition (deploy.prototxt)
• The learned weights (res10_300x300_ssd_iter_140000.caffemodel)
!wget -N https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
!wget -N https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel
--2024-05-16 14:59:13--  https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28104 (27K) [text/plain]
Saving to: ‘deploy.prototxt’

--2024-05-16 14:59:14--  https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10666211 (10M) [application/octet-stream]
Saving to: ‘res10_300x300_ssd_iter_140000.caffemodel’
[INFO] loading model...
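The log line above is produced by the cell that loads the downloaded face detector; the loading code itself is not reproduced in the report, so the following minimal sketch shows the standard OpenCV DNN call (the file names are those fetched by the wget commands above):

# Load the serialized Caffe face detection model with OpenCV's DNN module.
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000.caffemodel")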
Use the dnn.blobFromImage function to construct an input blob by resizing the image to a fixed 300x300
pixels and then normalizing it.
# resize it to have a maximum width of 400 pixels
image = imutils.resize(image, width=400)
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
Pass the blob through the neural network and obtain the detections and predictions.
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()
Loop over the detections and draw boxes around the detected faces
for i in range(0, detections.shape[2]):
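    # The loop body is not reproduced in the report; the following is a minimal sketch
    # of the standard SSD post-processing (the 0.5 confidence threshold is an assumed value).
    confidence = detections[0, 0, i, 2]

    # Filter out weak detections.
    if confidence > 0.5:
        # Scale the bounding box back to the dimensions of the resized image.
        (h, w) = image.shape[:2]
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        # Draw the bounding box and the detection confidence on the image.
        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

cv2_imshow(image)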
The Google Colab file is accessible using the following link:
https://colab.research.google.com/drive/1QBS2P48Gj3ZxABy2vVx5KzEOi-xvv-lk?usp=sharing
For implementing this project we have used CNN as the algorithm and methodology. The Architecture of CNN
is given below:
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly
used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to
understand and interpret the image or visual data.
When it comes to machine learning, artificial neural networks perform really well. Neural networks are used on various kinds of data, such as images, audio, and text. Different types of neural networks are used for different purposes: for predicting a sequence of words we use recurrent neural networks (more precisely, an LSTM), while for image classification we use convolutional neural networks. In this section, we describe the basic building blocks of a CNN.
In a regular Neural Network there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our model. The number of neurons in this layer is
equal to the total number of features in our data (number of pixels in the case of an image).
2. Hidden Layer: The input from the Input layer is then fed into the hidden layer. There can be many hidden
layers depending on our model and data size. Each hidden layer can have different numbers of neurons
which are generally greater than the number of features. The output of each hidden layer is computed by multiplying the previous layer's output with that layer's learnable weights, adding learnable biases, and then applying an activation function, which makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or
softmax which converts the output of each class into the probability score of each class.
Feeding the data through the model and obtaining the output of each layer as described above is called the feedforward pass. We then calculate the error using an error function; common choices are cross-entropy and squared loss. The error function measures how well the network is performing. After that, we propagate the error back through the model by calculating derivatives with respect to the weights. This step, called backpropagation, is used to minimize the loss.
Convolution Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like data, for example visual datasets such as images or videos, where spatial patterns play an extensive role.
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer, Pooling layer,
and fully connected layers.
Simple CNN architecture
The Convolutional layer applies filters to the input image to extract features, the Pooling layer downsamples the
image to reduce computation, and the fully connected layer makes the final prediction. The network learns the
optimal filters through backpropagation and gradient descent.
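To make the layer sequence concrete, the following is a minimal tf.keras sketch of such a network; the filter count, input size, and number of classes are illustrative only (they mirror the 32 x 32 x 3 example discussed later in this section), not the architecture used for face recognition in this project.

import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutional layer: 12 learnable 3x3 filters with ReLU activation -> 32x32x12 feature maps
    tf.keras.layers.Conv2D(12, (3, 3), padding='same', activation='relu', input_shape=(32, 32, 3)),
    # Pooling layer: 2x2 max pooling with stride 2 -> 16x16x12
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Flatten the feature maps into a one-dimensional vector
    tf.keras.layers.Flatten(),
    # Fully connected layer producing the per-class probability scores
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()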
How Convolutional Layers Work
Convolution Neural Networks or covnets are neural networks that share their parameters. Imagine you have an
image. It can be represented as a cuboid having its length, width (dimension of the image), and height (i.e the
channel as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel on
it, with say, K outputs and representing them vertically. Now slide that neural network across the whole image,
as a result, we will get another image with different widths, heights, and depths. Instead of just R, G, and B
channels now we have more channels but lesser width and height. This operation is called Convolution. If the
patch size is the same as that of the image it will be a regular neural network. Because of this small patch, we
have fewer weights.
Image source: Deep Learning Udacity
Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
• Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the
same depth as that of input volume (3 if the input layer is image input).
• For example, if we run a convolution on an image with dimensions 34x34x3, the possible filter sizes are a×a×3, where 'a' can be 3, 5, or 7, but smaller than the image dimension.
• During the forward pass, we slide each filter across the whole input volume step by step where each step
is called stride (which can have a value of 2, 3, or even 4 for high-dimensional images) and compute the
dot product between the kernel weights and patch from input volume.
• As we slide our filters we’ll get a 2-D output for each filter and we’ll stack them together as a result, we’ll
get output volume having a depth equal to the number of filters. The network will learn all the filters.
Layers used to build ConvNets
A complete Convolution Neural Networks architecture is also known as covnets. A covnets is a sequence of
layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let's take an example by running a covnets on an image of dimension 32 x 32 x 3.
• Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the input will be an
image or a sequence of images. This layer holds the raw input of the image with width 32, height 32, and
depth 3.
• Convolutional Layers: This layer is used to extract features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices, usually of shape 2×2, 3×3, or 5×5. Each kernel slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we'll get an output volume of dimension 32 x 32 x 12.
• Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. It applies an element-wise activation function to the output of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
• Pooling layer: This layer is periodically inserted in the covnets. Its main function is to reduce the size of the volume, which makes computation faster, reduces memory usage, and also helps prevent overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.
Image source: cs231n.stanford.edu
• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: This layer takes the input from the previous layer and computes the final classification or regression output.
Input image:
Steps:
• import the necessary libraries
• set the parameter
• define the kernel
• Load the image and plot it.
• Reformat the image
• Apply convolution layer operation and plot the output image.
• Apply activation layer operation and plot the output image.
• Apply pooling layer operation and plot the output image.
# import the necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt

# load the input image as a single-channel (grayscale) tensor
# (the file name below is illustrative; use the path of the input image shown above)
image = tf.io.read_file('input_image.jpg')
image = tf.io.decode_jpeg(image, channels=1)

# define the kernel (a 3x3 edge-detection filter)
kernel = tf.constant([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1],
                     ])
# Reformat
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)
# convolution layer
conv_fn = tf.nn.conv2d
image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,  # or (1, 1)
    padding='SAME',
)

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.imshow(
    tf.squeeze(image_filter)
)
plt.axis('off')
plt.title('Convolution')
# activation layer
relu_fn = tf.nn.relu
# Image detection
image_detect = relu_fn(image_filter)
plt.subplot(1, 3, 2)
plt.imshow(
# Reformat for plotting
tf.squeeze(image_detect)
)
plt.axis('off')
plt.title('Activation')
# Pooling layer
pool = tf.nn.pool
image_condense = pool(input=image_detect,
window_shape=(2, 2),
pooling_type='MAX',
strides=(2, 2),
padding='SAME',
)
plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()
Output:
Face Recognition, Storage and Authentication:
In this paper we present a unified system for face verification (is this the same person), recognition (who is this
person) and clustering (find common people among these faces). Our method is based on learning a Euclidean
embedding per image using a deep convolutional network. The network is trained such that the squared L2
distances in the embedding space directly correspond to face similarity:
faces of the same person have small distances and faces of distinct people have large distances.
Once this embedding has been produced, the aforementioned tasks become straightforward: face verification
simply involves thresholding the distance between the two embeddings; recognition becomes a k-NN
classification problem; and clustering can be achieved using off-the-shelf techniques such as k-means or
agglomerative clustering.
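As a concrete illustration of this statement, the sketch below shows verification as distance thresholding and recognition as a k-NN lookup over enrolled embeddings; the threshold value, array shapes, and function names are assumptions for illustration, not part of the original system.

import numpy as np

def verify(emb_a, emb_b, threshold=1.1):
    # Face verification: same person if the squared L2 distance is below a threshold.
    return float(np.sum((emb_a - emb_b) ** 2)) < threshold

def recognize(query_emb, gallery_embs, gallery_labels, k=1):
    # Face recognition as k-NN classification against pre-registered (enrolled) embeddings.
    dists = np.sum((gallery_embs - query_emb) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return [gallery_labels[i] for i in nearest], dists[nearest]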
Previous face recognition approaches based on deep networks use a classification layer trained over a set of known
face identities and then take an intermediate bottleneck layer as a representation used to generalize recognition
beyond the set of identities used in training. The downsides of this approach are its indirectness and its
inefficiency: one has to hope that the bottleneck representation generalizes well to new faces; and by using a
bottleneck layer the representation size per face is usually very large (1000s of dimensions). Some recent work
has reduced this dimensionality using PCA, but this is a linear transformation that can be easily learnt in one layer
of the network.
In contrast to these approaches, FaceNet directly trains its output to be a compact 128-D embedding using a triplet
based loss function based on LMNN. Our triplets consist of two matching face thumbnails and a non-matching
face thumbnail and the loss aims to separate the positive pair from the negative by a distance margin. The
thumbnails are tight crops of the face area; no 2D or 3D alignment, other than scale and translation, is performed.
Related Work:
Similarly to other recent works which employ deep networks, our approach is a purely data driven method which
learns its representation directly from the pixels of the face. Rather than using engineered features, we use a large
dataset of labelled faces to attain the appropriate invariances to pose, illumination, and other variational
conditions.
In this paper we explore two different deep network architectures that have been recently used to great success in
the computer vision community. Both are deep convolutional networks. The first architecture is based on the
Zeiler & Fergus model, which consists of multiple interleaved layers of convolutions, non-linear activations, local response normalizations, and max pooling layers. We additionally add several 1×1×d convolution layers inspired by prior work. The second architecture is based on the Inception model of Szegedy et al., which was recently used
as the winning approach for ImageNet 2014. These networks use mixed layers that run several different
convolutional and pooling layers in parallel and concatenate their responses. We have found that these models
can reduce the number of parameters by up to 20 times and have the potential to reduce the number of FLOPS
required for comparable performance.
There is a vast corpus of face verification and recognition works. Reviewing it is out of the scope of this paper so
we will only briefly discuss the most relevant recent work.
Several prior works employ a complex system of multiple stages that combines the output of a deep convolutional network with PCA for dimensionality reduction and an SVM for classification.
Zhenyao et al. employ a deep network to “warp” faces into a canonical frontal view and then learn a CNN that
classifies each face as belonging to a known identity. For face verification, PCA on the network output in
conjunction with an ensemble of SVMs is used.
Taigman et al. propose a multi-stage approach that aligns faces to a general 3D shape model. A multi-class
network is trained to perform the face recognition task on over four thousand identities. The authors also
experimented with a so-called Siamese network where they directly optimize the L1-distance between two face
features. Their best performance on LFW (97.35%) stems from an ensemble of three networks using different
alignments and color channels. The predicted distances (non-linear SVM predictions based on the χ2 kernel) of
those networks are combined using a non-linear SVM.
Sun et al. propose a compact and therefore relatively cheap to compute network. They use an ensemble of 25 of
these networks, each operating on a different face patch. For their final performance on LFW (99.47%) the authors
combine 50 responses (regular and flipped). Both PCA and a Joint Bayesian model that effectively correspond to
a linear transform in the embedding space are employed. Their method does not require explicit 2D/3D alignment.
The networks are trained by using a combination of classification and verification loss. The verification loss is
similar to the triplet loss we employ, in that it minimizes the L2-distance between faces of the same identity and
enforces a margin between the distance of faces of different identities. The main difference is that only pairs of
images are compared, whereas the triplet loss encourages a relative distance constraint.
A similar loss to the one used here was explored in Wang et al. for ranking images by semantic and visual
similarity.
Figure 2. Model structure. Our network consists of a batch input layer and a deep CNN followed by L2 normalization, which results in the face embedding. This is followed by the triplet loss during training.

Figure 3. The Triplet Loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.
Method:
This Project uses a deep convolutional network. We discuss two different core architectures: The Zeiler&Fergus
style networks and the recent Inception type networks.
Given the model details, and treating it as a black box (see Figure 2), the most important part of our approach lies
in the end-to-end learning of the whole system. To this end we employ the triplet loss that directly reflects what we want
to achieve in face verification, recognition and clustering. Namely, we strive for an embedding f(x), from an image x into a
feature space Rd, such that the squared distance between all faces, independent of imaging conditions, of the same identity
is small, whereas the squared distance between a pair of face images from different identities is large.
Although we did not directly compare to other losses, e.g. one that uses pairs of positives and negatives, we believe that the triplet loss is more suitable for face verification. The motivation is that such a pairwise loss encourages all faces
of one identity to be projected onto a single point in the embedding space. The triplet loss, however, tries to enforce a margin
between each pair of faces from one person to all other faces. This allows the faces for one identity to live on a manifold,
while still enforcing the distance and thus discriminability to other identities.
The following section describes this triplet loss and how it can be learned efficiently at scale.
Triplet Loss
The embedding is represented by f(x) ∈ R^d. It embeds an image x into a d-dimensional Euclidean space. Additionally, we constrain this embedding to live on the d-dimensional hypersphere, i.e. ||f(x)||_2 = 1. This loss is motivated in the context of nearest-neighbor classification. Here we want to ensure that an image x_i^a (anchor) of a specific person is closer to all other images x_i^p (positive) of the same person than it is to any image x_i^n (negative) of any other person. This is visualized in Figure 3.
Thus we want

||f(x_i^a) − f(x_i^p)||_2^2 + α < ||f(x_i^a) − f(x_i^n)||_2^2,   ∀ (x_i^a, x_i^p, x_i^n) ∈ T,   (1)

where α is a margin that is enforced between positive and negative pairs, and T is the set of all possible triplets in the training set, with cardinality N.

The loss that is being minimized is then

L = Σ_{i=1}^{N} [ ||f(x_i^a) − f(x_i^p)||_2^2 − ||f(x_i^a) − f(x_i^n)||_2^2 + α ]_+ ,   (2)

where [·]_+ = max(·, 0).
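Expressed in code, the loss above is only a few lines; the following NumPy sketch assumes batches of L2-normalized 128-D embeddings and uses the margin value quoted later in this section.

import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # anchor, positive, negative: arrays of shape (batch, 128).
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # squared L2 distance to the positive
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # squared L2 distance to the negative
    # Hinge on (pos_dist - neg_dist + alpha), averaged over the batch.
    return np.mean(np.maximum(pos_dist - neg_dist + alpha, 0.0))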
Generating all possible triplets would result in many triplets that are easily satisfied (i.e. fulfill the constraint in
Eq. (1)). These triplets would not contribute to the training and result in slower convergence, as they would still
be passed through the network. It is crucial to select hard triplets, that are active and can therefore contribute to
improving the model. The following section talks about the different approaches we use for the triplet selection.
Triplet Selection
In order to ensure fast convergence it is crucial to select triplets that violate the triplet constraint in Eq. (1). This
means that, given x_i^a, we want to select an x_i^p (hard positive) such that argmax_{x_i^p} ||f(x_i^a) − f(x_i^p)||_2^2, and similarly an x_i^n (hard negative) such that argmin_{x_i^n} ||f(x_i^a) − f(x_i^n)||_2^2.
It is infeasible to compute the argmin and argmax across the whole training set. Additionally, it might lead to poor
training, as mislabelled and poorly imaged faces would dominate the hard positives and negatives. There are two
obvious choices that avoid this issue:
• Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and
argmax on a subset of the data.
• Generate triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-
batch.
Here, we focus on the online generation and use large mini-batches in the order of a few thousand exemplars and
only compute the argmin and argmax within a mini-batch. To have a meaningful representation of the anchor-
positive distances, it needs to be ensured that a minimal number of exemplars of any one identity is present in
each mini-batch. In our experiments we sample the training data such that around 40 faces are selected per identity
per minibatch. Additionally, randomly sampled negative faces are added to each mini-batch.
Instead of picking the hardest positive, we use all anchor-positive pairs in a mini-batch while still selecting the
hard negatives. We don’t have a side-by-side comparison of hard anchor-positive pairs versus all anchor-positive
pairs within a mini-batch, but we found in practice that the all anchor positive method was more stable and
converged slightly faster at the beginning of training.
We also explored the offline generation of triplets in conjunction with the online generation and it may allow the
use of smaller batch sizes, but the experiments were inconclusive.
Selecting the hardest negatives can in practice lead to bad local minima early on in training; specifically, it can result in a collapsed model (i.e. f(x) = 0). In order to mitigate this, it helps to select x_i^n such that

||f(x_i^a) − f(x_i^p)||_2^2 < ||f(x_i^a) − f(x_i^n)||_2^2.   (4)
We call these negative exemplars semi-hard, as they are further away from the anchor than the positive exemplar,
but still hard because the squared distance is close to the anchor-positive distance. Those negatives lie inside the
margin α.
As mentioned before, correct triplet selection is crucial for fast convergence. On the one hand we would like to
use small mini-batches as these tend to improve convergence during Stochastic Gradient Descent (SGD) [20]. On
the other hand, implementation details make batches of tens to hundreds of exemplars more efficient. The main
constraint with regards to the batch size, however, is the way we select hard relevant triplets from within the mini-
batches. In most experiments we use a batch size of around 1,800 exemplars.
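To make the online selection concrete, the sketch below picks semi-hard negatives within a mini-batch using plain NumPy; the shapes, margin handling, and tie-breaking are illustrative assumptions rather than the exact procedure used in training.

import numpy as np

def semi_hard_triplets(embeddings, labels, alpha=0.2):
    # embeddings: (B, 128) L2-normalized vectors; labels: (B,) identity ids.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sum(diff ** 2, axis=-1)  # pairwise squared L2 distances, shape (B, B)

    triplets = []
    for a in range(len(labels)):
        for p in np.where(labels == labels[a])[0]:
            if p == a:
                continue
            d_ap = dists[a, p]
            # Semi-hard condition: farther than the positive but still inside the margin.
            mask = (labels != labels[a]) & (dists[a] > d_ap) & (dists[a] < d_ap + alpha)
            candidates = np.where(mask)[0]
            if len(candidates) > 0:
                n = candidates[np.argmin(dists[a, candidates])]
                triplets.append((a, p, n))
    return triplets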
Deep Convolutional Networks
In all our experiments we train the CNN using Stochastic Gradient Descent (SGD) with standard backprop and
AdaGrad. In most experiments we start with a learning rate of 0.05 which we lower to finalize the model. The models are initialized from random and trained on a CPU cluster for 1,000 to 2,000 hours. The decrease in the loss (and increase in accuracy) slows down drastically after 500h of training, but additional training can still significantly improve performance. The margin α is set to 0.2. We used two types of architectures and explore
their trade-offs in more detail in the experimental section. Their practical differences lie in the difference of
parameters and FLOPS. The best model may be different depending on the application. E.g. a model running in a
data center can have many parameters and require a large number of FLOPS, whereas a model running on a mobile
phone needs to have few parameters, so that it can fit into memory. All our
layer size-in size-out kernel param FLOPS
conv1 220×220×3 110×110×64 7×7×3, 2 9K 115M
pool1 110×110×64 55×55×64 3×3×64, 2 0
rnorm1 55×55×64 55×55×64 0
conv2a 55×55×64 55×55×64 1×1×64, 1 4K 13M
conv2 55×55×64 55×55×192 3×3×64, 1 111K 335M
rnorm2 55×55×192 55×55×192 0
pool2 55×55×192 28×28×192 3×3×192, 2 0
conv3a 28×28×192 28×28×192 1×1×192, 1 37K 29M
conv3 28×28×192 28×28×384 3×3×192, 1 664K 521M
pool3 28×28×384 14×14×384 3×3×384, 2 0
conv4a 14×14×384 14×14×384 1×1×384, 1 148K 29M
conv4 14×14×384 14×14×256 3×3×384, 1 885K 173M
conv5a 14×14×256 14×14×256 1×1×256, 1 66K 13M
conv5 14×14×256 14×14×256 3×3×256, 1 590K 116M
conv6a 14×14×256 14×14×256 1×1×256, 1 66K 13M
conv6 14×14×256 14×14×256 3×3×256, 1 590K 116M
pool4 14×14×256 7×7×256 3×3×256, 2 0
concat 7×7×256 7×7×256 0
fc1 7×7×256 1×32×128 maxout p=2 103M 103M
fc2 1×32×128 1×32×128 maxout p=2 34M 34M
fc7128 1×32×128 1×1×128 524K 0.5M
L2 1×1×128 1×1×128 0
total 140M 1.6B
Table 1. NN1. This table shows the structure of our Zeiler & Fergus based model with 1×1 convolutions inspired by prior work. The input and output sizes are described in rows × cols × #filters. The kernel is specified as rows × cols, stride, and the maxout pooling size as p = 2.
Experiments
If not mentioned otherwise we use between 100M-200M training face thumbnails consisting of about 8M
different identities. A face detector is run on each image and a tight bounding box around each face is generated.
These face thumbnails are resized to the input size of the respective network. Input sizes range from 96x96
pixels to 224x224 pixels in our experiments.
Computation Accuracy Trade-off
Before diving into the details of more specific experiments we will discuss the trade-off of accuracy versus
number of FLOPS that a particular model requires. Figure 4 shows the FLOPS on the x-axis and the accuracy at
0.001 false accept rate (FAR) on our user labelled test-data set from section 4.2. It is interesting to see the strong
correlation between the computation a model requires and the accuracy it achieves. The figure highlights the
five models (NN1, NN2, NN3, NNS1, NNS2) that we discuss in more detail in our experiments.
We also looked into the accuracy trade-off with regards to the number of model parameters. However, the
picture is not as clear in that case. For example, the Inception based model NN2 achieves a comparable
performance to NN1, but only has a 20th of the parameters. The number of FLOPS is comparable, though.
Obviously at some point the performance is expected to decrease, if the number of parameters is reduced
further. Other model architectures may allow further reductions without loss of accuracy, just like Inception did
in this case.
type  output size  depth  #1×1  #3×3 reduce  #3×3  #5×5 reduce  #5×5  pool proj (p)  params  FLOPS
conv1 (7×7×3, 2) 112×112×64 1 9K 119M
max pool + norm 56×56×64 0 m 3×3, 2
inception (2) 56×56×192 2 64 192 115K 360M
norm + max pool 28×28×192 0 m 3×3, 2
inception (3a) 28×28×256 2 64 96 128 16 32 m, 32p 164K 128M
inception (3b) 28×28×320 2 64 96 128 32 64 L2, 64p 228K 179M
inception (3c) 14×14×640 2 0 128 256,2 32 64,2 m 3×3,2 398K 108M
inception (4a) 14×14×640 2 256 96 192 32 64 L2, 128p 545K 107M
inception (4b) 14×14×640 2 224 112 224 32 64 L2, 128p 595K 117M
inception (4c) 14×14×640 2 192 128 256 32 64 L2, 128p 654K 128M
inception (4d) 14×14×640 2 160 144 288 32 64 L2, 128p 722K 142M
inception (4e) 7×7×1024 2 0 160 256,2 64 128,2 m 3×3,2 717K 56M
inception (5a) 7×7×1024 2 384 192 384 48 128 L2, 128p 1.6M 78M
inception (5b) 7×7×1024 2 384 192 384 48 128 m, 128p 1.6M 78M
avg pool 1×1×1024 0
fully conn 1×1×128 1 131K 0.1M
L2 normalization 1×1×128 0
total 7.5M 1.6B
Table 2. NN2. Details of the NN2 Inception incarnation. This model is almost identical to the Inception model of Szegedy et al. described earlier.
The two major differences are the use of L2 pooling instead of max pooling (m), where specified. I.e. instead of
taking the spatial max the L2 norm is computed. The pooling is always 3×3 (aside from the final average
pooling) and in parallel to the convolutional modules inside each Inception module. If there is a dimensionality
reduction after the pooling it is denoted with p. 1×1, 3×3, and 5×5 pooling are then concatenated to get the final
output.
Figure 5. Network Architectures. This plot shows the complete ROC for the four different models on our personal photos test set from section 4.2. The sharp drop at 10E-4 FAR can be explained by noise in the groundtruth labels. The models in order of performance are: NN2: 224×224 input Inception based model; NN1: Zeiler&Fergus based network with 1×1 convolutions; NNS1: small Inception style model with only 220M FLOPS; NNS2: tiny Inception model with only 20M FLOPS.
architecture                        VAL
NN1 (Zeiler&Fergus 220×220)         87.9% ± 1.9
NN2 (Inception 224×224)             89.4% ± 1.6
NN3 (Inception 160×160)             88.3% ± 1.7
NN4 (Inception 96×96)               82.0% ± 2.3
NNS1 (mini Inception 165×165)       82.4% ± 2.4
NNS2 (tiny Inception 140×116)       51.9% ± 2.9
Table 3. Network Architectures. This table compares the performance of our model architectures on the hold-out test set. Reported is the mean validation rate VAL at 10E-3 false accept rate. Also shown is the standard
error of the mean across the five test splits.
jpeg q    val-rate
10        67.3%
20        81.4%
30        83.9%
50        85.5%
70        86.1%
90        86.5%

#pixels   val-rate
1,600     37.8%
6,400     79.5%
14,400    84.5%
25,600    85.7%
65,536    86.4%
Table 4. Image Quality. The first table shows the effect of JPEG quality on the validation rate at 10E-3 precision. The second shows how the image size in pixels affects the validation rate at 10E-3 precision. This experiment was done with NN1 on the first split of our test hold-out dataset.
#dims VAL
64 86.8% ± 1.7
128 87.9% ± 1.9
256 87.7% ± 1.9
512 85.6% ± 2.0
Table 5. Embedding Dimensionality. This Table compares the effect of the embedding dimensionality of our
model NN1 on our hold-out set. In addition to the VAL at 10E-3 we also show the standard error of the mean
computed across five splits.
As shown in Figure 5, while the largest model achieves a dramatic improvement in accuracy compared to the tiny NNS2, the latter can be run at 30 ms per image on a mobile phone and is still accurate enough to be used in face
clustering. The sharp drop in the ROC for FAR < 10−4 indicates noisy labels in the test data ground truth. At
extremely low false accept rates a single mislabeled image can have a significant impact on the curve.
Embedding Dimensionality
We explored various embedding dimensionalities and selected 128 for all experiments other than the comparison
reported in Table 5. One would expect the larger embeddings to perform at least as well as the smaller ones,
however, it is possible that they require more training to achieve the same accuracy. That said, the differences in
the performance reported in Table 5 are statistically insignificant.
Table 6. Training Data Size. This table compares the performance after 700h of training for a smaller model with
96x96 pixel inputs. The model architecture is similar to NN2, but without the 5x5 convolutions in the Inception
modules.
It should be noted that during training a 128-dimensional float vector is used, but it can be quantized to 128 bytes without loss of accuracy. Thus each face is compactly represented by a 128-dimensional byte vector, which is ideal for large scale clustering and recognition. Smaller embeddings are possible at a minor loss of accuracy and could be employed on mobile devices.
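As a toy illustration of how a 128-D float embedding could be stored in 128 bytes, the following uses a simple uniform quantization; this scheme is an assumption for illustration only and is not the quantization used by the authors.

import numpy as np

embedding = np.random.randn(128).astype(np.float32)
embedding /= np.linalg.norm(embedding)                             # L2-normalized, values in [-1, 1]
quantized = np.round((embedding + 1.0) * 127.5).astype(np.uint8)   # 128 bytes per face
restored = quantized.astype(np.float32) / 127.5 - 1.0              # approximate reconstruction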
Figure 6. LFW errors. This shows all pairs of images that were incorrectly classified on LFW. Only eight of the 13 false rejects shown here are actual errors; the other five are mislabeled in LFW.
Our method achieves an accuracy of 95.18% when evaluating one hundred frames per video. Compared to prior work reporting 91.4% under the same protocol, we reduce the error rate by almost half. DeepId2+ achieved 93.2%, and our method reduces this error by 30%, comparable to our improvement on LFW.
Face Clustering
Our compact embedding lends itself to being used to cluster a user's personal photos into groups of people with the same identity. The constraints in assignment imposed by clustering faces, compared to the pure verification task, lead to truly amazing results. Figure 7 shows one cluster in a user's personal photo collection, generated using agglomerative clustering. It is a clear showcase of the incredible invariance to occlusion, lighting, pose and even age.

Figure 7. Face Clustering. Shown is an exemplar cluster for one user. All these images in the user's personal photo collection were clustered together.
Summary
We provide a method to directly learn an embedding into a Euclidean space for face verification. This sets it apart from other methods that use a CNN bottleneck layer, or require additional post-processing such as
concatenation of multiple models and PCA, as well as SVM classification. Our end-to-end training both simplifies
the setup and shows that directly optimizing a loss relevant to the task at hand improves performance.
Another strength of our model is that it requires only minimal alignment (a tight crop around the face area), whereas other approaches, for example, perform a complex 3D alignment. We also experimented with a similarity transform alignment and notice that this can actually improve performance slightly. It is not clear if it is worth the extra complexity.
Future work will focus on better understanding of the error cases, further improving the model, and also reducing
model size and reducing CPU requirements. We will also look into ways of improving the currently extremely
long training times, e.g. variations of our curriculum learning with smaller batch sizes and offline as well as online
positive and negative mining.
Appendix: Harmonic Embedding
In this section we introduce the concept of harmonic embeddings. By this we denote a set of embeddings that are
generated by different models v1 and v2 but are compatible in the sense that they can be compared to each other.
This compatibility greatly simplifies upgrade paths.
For example, in a scenario where embedding v1 was computed across a large set of images and a new embedding model v2 is being rolled out, this compatibility ensures a smooth transition without the need to worry about version incompatibilities. Figure 8 shows results on our 3G dataset. It can be seen that the improved model NN2
significantly outperforms NN1, while the comparison of NN2 embeddings to NN1 embeddings performs at an
intermediate level.
Figure 9. Learning the Harmonic Embedding. In order to learn a harmonic embedding, we generate triplets that mix the v1 embeddings with the v2 embeddings that are being trained. The semi-hard negatives are selected from the whole set of both v1 and v2 embeddings.
These are very interesting findings and it is somewhat surprising that it works so well. Future work can explore
how far this idea can be extended. Presumably there is a limit as to how much the v2 embedding can improve
over v1, while still being compatible. Additionally it would be interesting to train small networks that can run on
a mobile phone and are compatible with a larger server-side model.
• Coding Section:
get_faces_from_camera_tkinter.py
import dlib
import numpy as np
import cv2
import os
import shutil
import time
import logging
import tkinter as tk
from tkinter import font as tkFont
from PIL import Image, ImageTk
class Face_Register:
    def __init__(self):
        # Tkinter GUI
        self.win = tk.Tk()
        self.win.title("Face Register")

        self.font_step_title = tkFont.Font(family='Helvetica', size=15, weight='bold')
        self.font_warning = tkFont.Font(family='Helvetica', size=15, weight='bold')

        self.path_photos_from_camera = "data/data_faces_from_camera/"
        self.current_face_dir = ""
        self.font = cv2.FONT_ITALIC

        self.out_of_range_flag = False
        self.face_folder_created_flag = False

        # FPS
        self.frame_time = 0
        self.frame_start_time = 0
        self.fps = 0
        self.fps_show = 0
        self.start_time = time.time()

    def GUI_get_input_name(self):
        self.input_name_char = self.input_name.get()
        self.create_face_folder()
        self.label_cnt_face_in_database['text'] = str(self.existing_faces_cnt)

    def GUI_info(self):
        tk.Label(self.frame_right_info,
                 text="Face register",
                 font=self.font_title).grid(row=0, column=0, columnspan=3, sticky=tk.W, padx=2, pady=20)

        tk.Label(self.frame_right_info,
                 text="Faces in current frame: ").grid(row=3, column=0, columnspan=2, sticky=tk.W, padx=5, pady=2)
        self.label_face_cnt.grid(row=3, column=2, columnspan=3, sticky=tk.W, padx=5, pady=2)

        tk.Button(self.frame_right_info,
                  text='Input',
                  command=self.GUI_get_input_name).grid(row=8, column=2, padx=5)

        tk.Button(self.frame_right_info,
                  text='Save current face',
                  command=self.save_current_face).grid(row=10, column=0, columnspan=3, sticky=tk.W)

        self.frame_right_info.pack()
        self.label_fps_info["text"] = str(self.fps.__round__(2))

    def create_face_folder(self):
        # Create the folders for saving faces
        self.existing_faces_cnt += 1
        if self.input_name_char:
            self.current_face_dir = self.path_photos_from_camera + \
                                    "person_" + str(self.existing_faces_cnt) + "_" + \
                                    self.input_name_char
        else:
            self.current_face_dir = self.path_photos_from_camera + \
                                    "person_" + str(self.existing_faces_cnt)
        os.makedirs(self.current_face_dir)
        self.log_all["text"] = "\"" + self.current_face_dir + "/\" created!"
        logging.info("\n%-40s %s", "Create folders:", self.current_face_dir)

        self.ss_cnt = 0  # Clear the cnt of screen shots
        self.face_folder_created_flag = True  # Face folder already created

    def save_current_face(self):
        if self.face_folder_created_flag:
            if self.current_frame_faces_cnt == 1:
                if not self.out_of_range_flag:
                    self.ss_cnt += 1
                    # Create blank image according to the size of face detected
                    self.face_ROI_image = np.zeros((int(self.face_ROI_height * 2), self.face_ROI_width * 2, 3),
                                                   np.uint8)
                    for ii in range(self.face_ROI_height * 2):
                        for jj in range(self.face_ROI_width * 2):
                            self.face_ROI_image[ii][jj] = self.current_frame[self.face_ROI_height_start - self.hh + ii][
                                self.face_ROI_width_start - self.ww + jj]
                    self.log_all["text"] = "\"" + self.current_face_dir + "/img_face_" + str(
                        self.ss_cnt) + ".jpg\"" + " saved!"
                    self.face_ROI_image = cv2.cvtColor(self.face_ROI_image, cv2.COLOR_BGR2RGB)

    def get_frame(self):
        try:
            if self.cap.isOpened():
                ret, frame = self.cap.read()
                frame = cv2.resize(frame, (640, 480))
                return ret, cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        except:
            print("Error: No video input!!!")

    def process(self):
        # (Frame acquisition and face detection steps are omitted in this excerpt.)
        # Refresh frame
        self.win.after(20, self.process)

    def run(self):
        self.pre_work_mkdir()
        self.check_existing_faces_cnt()
        self.GUI_info()
        self.process()
        self.win.mainloop()


def main():
    logging.basicConfig(level=logging.INFO)
    Face_Register_con = Face_Register()
    Face_Register_con.run()


if __name__ == '__main__':
    main()
features_extraction_to_csv.py
import os
import dlib
import csv
import numpy as np
import logging
import cv2
# Dlib front end: face detector, 68-point landmark predictor and ResNet descriptor.
# (Model initialization is not shown in the original excerpt; the paths below are
#  assumed to follow the data/data_dlib/ layout used in attendance_taker.py.)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("data/data_dlib/shape_predictor_68_face_landmarks.dat")
face_reco_model = dlib.face_recognition_model_v1("data/data_dlib/dlib_face_recognition_resnet_model_v1.dat")

path_images_from_camera = "data/data_faces_from_camera/"


def return_128d_features(path_img):
    # Return the 128-D dlib face descriptor of a single image (0 if no face is found)
    img_rd = cv2.imread(path_img)
    faces = detector(img_rd, 1)
    # For the saved face photos, make sure a face can still be detected in the cropped image
    if len(faces) != 0:
        shape = predictor(img_rd, faces[0])
        face_descriptor = face_reco_model.compute_face_descriptor(img_rd, shape)
    else:
        face_descriptor = 0
        logging.warning("no face")
    return face_descriptor


def return_features_mean_personX(path_face_personX):
    # Average the 128-D features over all photos of one person
    features_list_personX = []
    photos_list = os.listdir(path_face_personX)
    if photos_list:
        for i in range(len(photos_list)):
            # Get the 128-D features for a single image of person X
            logging.info("%-40s %-20s", "Reading image:", path_face_personX + "/" + photos_list[i])
            features_128d = return_128d_features(path_face_personX + "/" + photos_list[i])
            # Skip the image if no face was detected
            if features_128d == 0:
                continue
            features_list_personX.append(features_128d)
    else:
        logging.warning("Warning: No images in %s/", path_face_personX)

    if features_list_personX:
        features_mean_personX = np.array(features_list_personX, dtype=object).mean(axis=0)
    else:
        features_mean_personX = np.zeros(128, dtype=object, order='C')
    return features_mean_personX


def main():
    logging.basicConfig(level=logging.INFO)
    # Get the list of registered persons
    person_list = os.listdir("data/data_faces_from_camera/")
    person_list.sort()

    # (The CSV writer setup and the per-person loop are abbreviated in the original excerpt.)
    with open("data/features_all.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for person in person_list:
            features_mean_personX = return_features_mean_personX(path_images_from_camera + person)
            if len(person.split('_', 2)) == 2:
                # Folder name like "person_x"
                person_name = person
            else:
                # Folder name like "person_x_tom"
                person_name = person.split('_', 2)[-1]
            features_mean_personX = np.insert(features_mean_personX, 0, person_name, axis=0)
            # features_mean_personX is now 129-D: person name + 128 features
            writer.writerow(features_mean_personX)
            logging.info('\n')
        logging.info("Save all the features of faces registered into: data/features_all.csv")


if __name__ == '__main__':
    main()
attendance_taker.py
import dlib
import numpy as np
import cv2
import os
import pandas as pd
import time
import logging
import sqlite3
import datetime

# Face detector and 68-point landmark predictor (their initialization is not shown in
# the original excerpt; the paths are assumed to follow the data/data_dlib/ layout)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("data/data_dlib/shape_predictor_68_face_landmarks.dat")

# Dlib ResNet model: maps an aligned face chip to a 128-D face descriptor
face_reco_model = dlib.face_recognition_model_v1("data/data_dlib/dlib_face_recognition_resnet_model_v1.dat")
class Face_Recognizer:
    def __init__(self):
        self.font = cv2.FONT_ITALIC

        # FPS
        self.frame_time = 0
        self.frame_start_time = 0
        self.fps = 0
        self.fps_show = 0
        self.start_time = time.time()

        # Lists to save centroid positions of the face ROIs in frame N-1 and frame N
        self.last_frame_face_centroid_list = []
        self.current_frame_face_centroid_list = []
        # (Further per-frame name/feature lists and the database of known faces loaded
        #  from data/features_all.csv are initialized in the full script but omitted
        #  from this excerpt.)

    def update_fps(self):
        now = time.time()
        # Refresh the displayed fps once per second
        if str(self.start_time).split(".")[0] != str(now).split(".")[0]:
            self.fps_show = self.fps
        self.start_time = now
        self.frame_time = now - self.frame_start_time
        self.fps = 1.0 / self.frame_time
        self.frame_start_time = now

    @staticmethod
    def return_euclidean_distance(feature_1, feature_2):
        # Compute the Euclidean distance between two 128-D features
        feature_1 = np.array(feature_1)
        feature_2 = np.array(feature_2)
        dist = np.sqrt(np.sum(np.square(feature_1 - feature_2)))
        return dist

    def centroid_tracker(self):
        # Use a centroid tracker to link face_x in the current frame with person_x in the last frame
        for i in range(len(self.current_frame_face_centroid_list)):
            e_distance_current_frame_person_x_list = []
            # For face i in the current frame, compute the distance to every face in the last frame
            for j in range(len(self.last_frame_face_centroid_list)):
                self.last_current_frame_centroid_e_distance = self.return_euclidean_distance(
                    self.current_frame_face_centroid_list[i], self.last_frame_face_centroid_list[j])
                e_distance_current_frame_person_x_list.append(
                    self.last_current_frame_centroid_e_distance)
            last_frame_num = e_distance_current_frame_person_x_list.index(
                min(e_distance_current_frame_person_x_list))
            self.current_frame_face_name_list[i] = self.last_frame_face_name_list[last_frame_num]

    def draw_note(self, img_rd):
        # Label the tracked faces ("Face_1", "Face_2", ...) at their centroid positions
        # (only the face-labelling part of draw_note is shown in this excerpt)
        for i in range(len(self.current_frame_face_name_list)):
            img_rd = cv2.putText(img_rd, "Face_" + str(i + 1), tuple(
                [int(self.current_frame_face_centroid_list[i][0]),
                 int(self.current_frame_face_centroid_list[i][1])]),
                self.font, 0.8, (255, 190, 0), 1, cv2.LINE_AA)

    def attendance(self, name):
        # Insert an attendance record into the SQLite database, one row per person per day
        # (the connection and duplicate-entry query are abbreviated in the original excerpt)
        current_date = datetime.datetime.now().strftime('%Y-%m-%d')
        conn = sqlite3.connect("attendance.db")
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM attendance WHERE name = ? AND date = ?", (name, current_date))
        existing_entry = cursor.fetchone()
        if existing_entry:
            print(f"{name} is already marked as present for {current_date}")
        else:
            current_time = datetime.datetime.now().strftime('%H:%M:%S')
            cursor.execute("INSERT INTO attendance (name, time, date) VALUES (?, ?, ?)",
                           (name, current_time, current_date))
            conn.commit()
            print(f"{name} marked as present for {current_date} at {current_time}")
        conn.close()

    def process(self, stream):
        # Main per-frame loop. (Frame capture, face detection with the dlib detector,
        # feature extraction and the periodic re-classification logic of the full script
        # are omitted in this excerpt; img_rd is the current frame and faces is the list
        # of face rectangles detected in it.)
        self.current_frame_face_position_list = []
        if "unknown" in self.current_frame_face_name_list:
            self.reclassify_interval_cnt += 1

        if self.current_frame_face_cnt != 0:
            for k, d in enumerate(faces):
                self.current_frame_face_position_list.append(tuple(
                    [faces[k].left(), int(faces[k].bottom() + (faces[k].bottom() - faces[k].top()) / 4)]))
                self.current_frame_face_centroid_list.append(
                    [int(faces[k].left() + faces[k].right()) / 2,
                     int(faces[k].top() + faces[k].bottom()) / 2])
                img_rd = cv2.rectangle(img_rd,
                                       tuple([d.left(), d.top()]),
                                       tuple([d.right(), d.bottom()]),
                                       (255, 255, 255), 2)

            for i in range(self.current_frame_face_cnt):
                # Write the recognized names under the face ROIs
                img_rd = cv2.putText(img_rd, self.current_frame_face_name_list[i],
                                     self.current_frame_face_position_list[i], self.font, 0.8, (0, 255, 255), 1,
                                     cv2.LINE_AA)
            self.draw_note(img_rd)

            # For every detected face, compare its 128-D feature against the registered faces.
            # (In the full script this comparison runs inside a loop over each detected face k.)
            self.current_frame_face_X_e_distance_list = []
            for i in range(len(self.face_features_known_list)):
                if str(self.face_features_known_list[i][0]) != '0.0':
                    e_distance_tmp = self.return_euclidean_distance(
                        self.current_frame_face_feature_list[k],
                        self.face_features_known_list[i])
                    logging.debug("  with person %d, the e-distance: %f", i + 1, e_distance_tmp)
                    self.current_frame_face_X_e_distance_list.append(e_distance_tmp)
                else:
                    # Empty registration for person_X: push a very large distance so it is never selected
                    self.current_frame_face_X_e_distance_list.append(999999999)

            # Pick the closest registered person; the 0.4 distance threshold used here is an
            # assumed value, since the exact threshold is not shown in the original excerpt.
            similar_person_num = self.current_frame_face_X_e_distance_list.index(
                min(self.current_frame_face_X_e_distance_list))
            if min(self.current_frame_face_X_e_distance_list) < 0.4:
                nam = self.face_name_known_list[similar_person_num]
                print(type(self.face_name_known_list[similar_person_num]))
                print(nam)
                self.attendance(nam)
            else:
                logging.debug("  Face recognition result: Unknown person")

        self.update_fps()
        cv2.namedWindow("camera", 1)
        cv2.imshow("camera", img_rd)
        logging.debug("Frame ends\n\n")

    def run(self):
        # cap = cv2.VideoCapture("video.mp4")  # Get the video stream from a video file
        cap = cv2.VideoCapture(0)               # Get the video stream from the camera
        self.process(cap)
        cap.release()
        cv2.destroyAllWindows()


def main():
    # Set the log level to logging.DEBUG to print debug info for every frame
    # logging.basicConfig(level=logging.DEBUG)
    logging.basicConfig(level=logging.INFO)
    Face_Recognizer_con = Face_Recognizer()
    Face_Recognizer_con.run()


if __name__ == '__main__':
    main()
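attendance_taker.py assumes an attendance.db SQLite database containing an attendance table with name, time and date columns, which follows from the INSERT statement above. The report does not show how that table is created, so the snippet below is only one plausible way to set it up, with every column stored as TEXT.
import sqlite3

# One-off setup of the attendance database assumed by attendance_taker.py and app.py.
conn = sqlite3.connect("attendance.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS attendance ("
    "    name TEXT,"    # recognized person's name
    "    time TEXT,"    # time of first detection that day, HH:MM:SS
    "    date TEXT"     # calendar date, YYYY-MM-DD
    ")"
)
conn.commit()
conn.close()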
app.py
from flask import Flask, render_template, request
from datetime import datetime
import sqlite3

app = Flask(__name__)


@app.route('/')
def index():
    return render_template('index.html', selected_date='', no_data=False)


@app.route('/attendance', methods=['POST'])
def attendance():
    selected_date = request.form.get('selected_date')
    selected_date_obj = datetime.strptime(selected_date, '%Y-%m-%d')
    formatted_date = selected_date_obj.strftime('%Y-%m-%d')

    # Query the attendance records for the selected date
    # (the query and the final rendering step are abbreviated in the original excerpt)
    conn = sqlite3.connect('attendance.db')
    cursor = conn.cursor()
    cursor.execute("SELECT name, time FROM attendance WHERE date = ?", (formatted_date,))
    attendance_data = cursor.fetchall()
    conn.close()

    if not attendance_data:
        return render_template('index.html', selected_date=selected_date, no_data=True)
    return render_template('index.html', selected_date=selected_date, attendance_data=attendance_data)


if __name__ == '__main__':
    app.run(debug=True)
templates/index.html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Attendance Tracker Sheet</title>
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
<style>
body {
font-family: Arial, sans-serif;
background-color: #f4f4f4;
}
form {
margin-top: 50px;
display: flex;
flex-direction: column;
align-items: center;
border: 1px solid #ddd;
padding: 20px;
border-radius: 5px;
box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.2);
background-color: #fff;
width: 50%;
margin-left: auto;
margin-right: auto;
}
label {
font-size: 20px;
margin-bottom: 10px;
color: #333;
}
input[type="date"] {
padding: 10px 20px;
border-radius: 5px;
border: none;
margin-bottom: 20px;
font-size: 18px;
width: 100%;
box-sizing: border-box;
margin-top: 10px;
margin-bottom: 20px;
}
button[type="submit"] {
background-color: #333;
color: #fff;
border: none;
padding: 10px;
border-radius: 5px;
cursor: pointer;
font-size: 18px;
}
button[type="submit"]:hover {
background-color: #555;
}
</style>
</head>
<body>
<div class="jumbotron text-center">
<h1 class="display-4">Attendance Tracker Sheet</h1>
</div>
<hr>
<!-- (the attendance table markup and closing </body></html> tags are omitted in this excerpt) -->
requirements.txt
dlib==19.17.0
numpy==1.22.0
scikit-image==0.18.3
pandas==1.3.4
opencv-python==4.5.4.58
flask
Usage
1. Collect the faces dataset by running python get_faces_from_camera_tkinter.py.
2. Convert the dataset into 128-D feature vectors by running python features_extraction_to_csv.py.
3. To take the attendance, run python attendance_taker.py.
4. Check the attendance database through the web interface by running python app.py.
The face recognition pipeline relies on the 68 landmark points that dlib detects on a human face, as shown in the figure below.
Fig.: The 68 facial landmarks detected by the dlib library
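For reference, the snippet below shows one way these 68 landmarks can be detected and drawn with dlib. The shape-predictor path follows the data/data_dlib/ layout used elsewhere in this report, and both it and the image file names are assumptions for illustration.
import cv2
import dlib

# Frontal face detector plus the standard 68-point shape predictor.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("data/data_dlib/shape_predictor_68_face_landmarks.dat")

img = cv2.imread("sample_face.jpg")
for face in detector(img, 1):                 # detect faces (1 = upsample the image once)
    shape = predictor(img, face)              # fit the 68 landmarks inside the face box
    for i in range(68):
        p = shape.part(i)                     # landmark i as an (x, y) point
        cv2.circle(img, (p.x, p.y), 2, (0, 255, 0), -1)
cv2.imwrite("sample_face_landmarks.jpg", img)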
RESULT
Executing get_faces_from_camera_tkinter.py
Executing features_extraction_to_csv.py
Executing attendance_taker.py
Executing app.py
CONCLUSION
The project "Face Recognition for E-Authentication" marks a pivotal step forward in the field of biometric
authentication, particularly in enhancing the security and user experience of electronic authentication (e-
authentication) systems. As the digital landscape evolves, the reliance on secure and efficient authentication methods
becomes increasingly critical. Traditional methods such as passwords and PINs are becoming less reliable due to
vulnerabilities like phishing attacks, password leaks, and brute force attacks. Face recognition technology, leveraging
the unique biometric characteristics of an individual's face, offers a promising solution to these challenges.
The project's primary aim was to develop a robust and efficient face recognition system tailored specifically for e-
authentication purposes. This involved the design and implementation of sophisticated algorithms for face detection,
feature extraction, and matching, utilizing state-of-the-art deep learning techniques such as convolutional neural
networks (CNNs). By analyzing and comparing facial features captured from an individual's face with pre-registered
templates, the system aims to authenticate users with high accuracy and confidence.
One of the key achievements of the project is the enhancement of security measures. Beyond traditional authentication
methods, the system incorporates advanced anti-spoofing mechanisms to detect and prevent unauthorized access
attempts. These mechanisms are designed to differentiate between genuine facial images and spoofed or manipulated
ones, thus mitigating the risk of presentation attacks. Additionally, encryption and secure communication protocols
were integrated to protect sensitive data during the authentication process, ensuring compliance with privacy laws
and regulations governing biometric data.
Another significant aspect of the project is the focus on optimizing accuracy and reliability. Through rigorous testing
and validation, the system minimizes false acceptance and rejection rates, ensuring robust performance under various
environmental conditions and scenarios. The fine-tuning of system parameters and optimization of algorithms were
crucial steps in achieving this objective. The project also explored adaptive learning techniques to continuously
improve the system's performance as it encounters new data and scenarios.
The project placed a strong emphasis on enhancing the user experience. An intuitive and user-friendly interface was
designed to facilitate seamless interaction with the face recognition system. Usability enhancements such as adaptive
feedback mechanisms and multi-modal authentication approaches were incorporated to streamline the authentication
process and minimize user friction. These efforts ensure that the system is accessible to users with diverse
backgrounds and varying levels of technical proficiency.
Compatibility and scalability were also key considerations in the project. The system was designed to be compatible
with existing e-authentication frameworks and platforms, enabling seamless integration in diverse application
environments. The architecture of the system was made scalable and adaptable to accommodate future enhancements
and expansions. This ensures that the system can evolve with technological advancements and changing user needs.
Ethical and legal considerations were paramount throughout the project. Measures were put in place to adhere to
ethical guidelines and legal regulations governing the collection, storage, and processing of biometric data. User
privacy and confidentiality were prioritized, with stringent measures such as data anonymization and secure storage
practices implemented to protect sensitive information. Regular ethical reviews and assessments were conducted to
ensure compliance and address any emerging concerns.
Documentation and knowledge sharing were integral components of the project. The development process,
methodologies, and findings were comprehensively documented to facilitate knowledge sharing and replication
within the academic and professional community. Research papers, reports, and technical documentation were
published to disseminate the project outcomes and contribute to the broader advancement of face recognition
technology for e-authentication.
In conclusion, the project "Face Recognition for E-Authentication" successfully demonstrates the potential of face
recognition technology to provide a secure, reliable, and user-friendly solution for electronic authentication. By
addressing key objectives such as system development, security enhancement, accuracy optimization, user experience
enhancement, compatibility and scalability, ethical and legal compliance, and comprehensive documentation, the
project makes a valuable contribution to the field. The advancements achieved in this project pave the way for the
broader adoption of face recognition technology in various online applications and services, enhancing the security
and efficiency of digital interactions in today's interconnected world.
The successful implementation of the face recognition system for e-authentication underscores the importance of
continuous innovation and development in the field of biometric authentication. As technology advances and new
challenges emerge, ongoing research and development efforts will be crucial to further improve the accuracy,
security, and usability of face recognition systems. Future work may focus on exploring advanced machine learning
techniques, incorporating additional biometric modalities, and addressing emerging security threats to ensure that
face recognition technology remains a reliable and robust solution for e-authentication.
Overall, the project "Face Recognition for E-Authentication" represents a significant milestone in the evolution of
authentication technologies. By leveraging the unique capabilities of face recognition, the project enhances both the
security and user experience of e-authentication systems, contributing to a safer and more efficient digital landscape.
Through its comprehensive approach and focus on continuous improvement, the project sets a strong foundation for
future advancements in the field, ensuring that face recognition technology continues to play a crucial role in securing
digital interactions and protecting user identities.
SCOPE FOR FUTURE WORK
Future work for the "Face Recognition for E-Authentication" project encompasses a broad spectrum of
advancements and explorations, aimed at enhancing the performance, security, and applicability of face recognition
technology in e-authentication systems. As the digital landscape continues to evolve, and as face recognition
technology matures, several key areas for future research and development become evident. These areas include
improving algorithmic accuracy and robustness, addressing ethical and privacy concerns, enhancing user experience,
exploring multi-modal authentication systems, ensuring compliance with emerging regulations, and expanding the
applicability of face recognition systems to new domains and environments.
Firstly, enhancing the algorithmic accuracy and robustness of face recognition systems remains a paramount focus.
This involves leveraging advancements in deep learning and artificial intelligence to develop more sophisticated
algorithms capable of handling a wider range of variations in facial expressions, lighting conditions, and occlusions.
Future work could explore the integration of advanced neural network architectures, such as transformers, and the
application of transfer learning to improve the generalization capabilities of face recognition models. Additionally,
ongoing research into adversarial robustness is crucial, as it addresses the susceptibility of face recognition systems
to adversarial attacks that can manipulate the input data to deceive the system.
Secondly, ethical and privacy concerns surrounding the use of biometric data, including face recognition, must be
thoroughly addressed. Future work should focus on developing techniques for secure and privacy-preserving face
recognition, such as federated learning and homomorphic encryption. These techniques enable the training and
deployment of face recognition models without directly accessing or storing sensitive biometric data, thereby
mitigating privacy risks. Furthermore, the development of transparent and explainable AI models will enhance user
trust and compliance with ethical guidelines by providing insights into how face recognition decisions are made.
Enhancing user experience is another critical area for future research. This involves designing more intuitive and
accessible user interfaces that cater to diverse user populations, including individuals with disabilities. Future work
could explore the use of haptic feedback, voice assistance, and other adaptive technologies to create more inclusive
authentication experiences. Additionally, user studies and usability testing will be essential to identify and address
pain points in the user journey, ensuring that the face recognition system is not only secure but also user-friendly.
Exploring multi-modal authentication systems represents a promising avenue for future research. Combining face
recognition with other biometric modalities, such as voice recognition, fingerprint scanning, or iris recognition, can
significantly enhance the security and reliability of e-authentication systems. Multi-modal authentication systems can
leverage the strengths of each modality to provide a more comprehensive and robust authentication solution. Future
work should focus on developing seamless integration techniques and evaluating the performance of multi-modal
systems in various real-world scenarios.
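One simple way to realize such a combination is score-level fusion, sketched below. The per-modality scores are assumed to be normalized to [0, 1], and the weights and acceptance threshold are purely illustrative values that would have to be tuned on real data.
def fused_score(face_score, voice_score, w_face=0.7, w_voice=0.3):
    # Weighted score-level fusion of two modalities (each score normalized to [0, 1]).
    return w_face * face_score + w_voice * voice_score

def authenticate(face_score, voice_score, threshold=0.8):
    # Accept the user only if the fused score clears the decision threshold.
    return fused_score(face_score, voice_score) >= threshold

# Example: a strong face match can compensate for a mediocre voice match.
print(authenticate(face_score=0.95, voice_score=0.60))   # True (fused score 0.845)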
Ensuring compliance with emerging regulations and standards is another crucial aspect of future work. As
governments and regulatory bodies continue to develop and implement regulations related to biometric data usage, it
is essential to stay abreast of these changes and ensure that face recognition systems comply with all relevant laws
and standards. This includes implementing robust data protection measures, conducting regular audits, and
maintaining transparency with users about how their biometric data is collected, stored, and used.
Expanding the applicability of face recognition systems to new domains and environments also presents significant
opportunities for future research. Beyond e-authentication, face recognition technology can be applied to areas such
as access control, surveillance, and personalized user experiences. Each of these applications presents unique
challenges and requirements that must be addressed. For example, in surveillance, face recognition systems must be
able to operate effectively in crowded and dynamic environments, while in personalized user experiences, they must
be able to recognize individuals across different contexts and devices.
Moreover, future work should focus on addressing the challenges of scalability and deployment in diverse
environments. Developing lightweight and efficient models that can run on a variety of devices, from high-
performance servers to resource-constrained mobile devices, is essential for widespread adoption. Techniques such
as model compression, quantization, and edge computing can help achieve this goal by reducing the computational
and memory requirements of face recognition models.
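As a brief illustration of the model-compression direction mentioned above, the sketch below applies post-training dynamic quantization to a toy embedding head with PyTorch (an assumed dependency here; the layer sizes are placeholders, not the project's actual network).
import torch
import torch.nn as nn

# Toy stand-in for a trained face-embedding head; in practice the real trained
# model would be loaded from a checkpoint instead.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))

# Dynamic quantization stores the Linear weights as int8, shrinking the model
# and typically speeding up CPU inference with little accuracy loss.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "face_embedding_head_int8.pt")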
Another important area for future research is the continuous improvement of anti-spoofing mechanisms. As attackers
develop more sophisticated spoofing techniques, it is crucial to stay ahead by developing advanced liveness detection
methods and other countermeasures. Future work could explore the use of multi-spectral imaging, thermal cameras,
and behavioral biometrics to enhance the ability of face recognition systems to detect and prevent spoofing attempts.
Furthermore, cross-cultural and demographic considerations must be taken into account to ensure the fairness and
inclusivity of face recognition systems. Future research should focus on developing datasets and models that are
representative of diverse populations, minimizing biases that can lead to unequal performance across different
demographic groups. Collaboration with international organizations and communities can help identify and address
cultural nuances and ethical considerations specific to different regions.
In addition to technical advancements, future work should also focus on fostering collaboration and knowledge
sharing within the research community. Establishing open-source initiatives, creating benchmark datasets, and
organizing conferences and workshops can facilitate the exchange of ideas and best practices. Collaboration with
industry partners can also help bridge the gap between research and practical applications, ensuring that advancements
in face recognition technology are translated into real-world solutions that benefit society.
Finally, future work should consider the long-term societal implications of face recognition technology. This includes
exploring the impact on privacy, civil liberties, and social dynamics. Engaging with policymakers, ethicists, and the
public in discussions about the appropriate use and regulation of face recognition technology is essential to ensure
that its deployment aligns with societal values and expectations.
In conclusion, the scope for future work on the "Face Recognition for E-Authentication" project is vast and
multifaceted, encompassing technical advancements, ethical considerations, user experience enhancements,
regulatory compliance, and societal impact. By addressing these areas, future research and development efforts can
ensure that face recognition technology continues to evolve and improve, providing secure, reliable, and user-friendly
authentication solutions for a wide range of applications. Through ongoing innovation and collaboration, face
recognition technology can play a crucial role in shaping the future of digital security and interactions.
REFERENCES
1. Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In Proc. of ICML, New York, NY, USA, 2009.
2. D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint formulation. In Proc. ECCV, 2012.
3. D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In Proc. ECCV, 2014.
4. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, Q. V. Le, and A. Y. Ng. Large scale distributed deep networks. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, NIPS, pages 1232–1240, 2012.
5. J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12:2121–2159, July 2011.
6. I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In ICML, 2013.
7. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
8. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, Dec. 1989.
9. M. Lin, Q. Chen, and S. Yan. Network in network. CoRR, abs/1312.4400, 2013.
10. C. Lu and X. Tang. Surpassing human-level face verification performance on LFW with GaussianFace. CoRR, abs/1404.3840, 2014.
11. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 1986.
12. M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In S. Thrun, L. Saul, and B. Schölkopf, editors, NIPS, pages 41–48. MIT Press, 2004.
13. T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database. In Proc. FG, 2002.
14. Y. Sun, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. CoRR, abs/1406.4773, 2014.
15. Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. CoRR, abs/1412.1265, 2014.
16. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
17. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conf. on CVPR, 2014.
18. J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. CoRR, abs/1404.4661, 2014.
19. K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS. MIT Press, 2006.
20. D. R. Wilson and T. R. Martinez. The general inefficiency of batch training for gradient descent learning. Neural Networks, 16(10):1429–1451, 2003.
21. L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In IEEE Conf. on CVPR, 2011.
22. M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013.
23. Z. Zhu, P. Luo, X. Wang, and X. Tang. Recover canonical-view faces in the wild with deep neural networks. CoRR, abs/1404.3543, 2014.
24. https://ijrar.org/papers/IJRAR1CNP010.pdf
25. https://ieeexplore.ieee.org/document/6640086
26. https://m2pfintech.com/blog/how-has-facial-recognition-revolutionized-identity-verification-and-authentication/
27. https://www.analyticsvidhya.com/blog/2023/12/the-deep-learning-revolution-in-facial-recognition-for-secure-
login-systems/
28. https://thesai.org/Downloads/Volume4No6/Paper_10-
Face_Recognition_as_an_Authentication_Technique_in_Electronic_Voting.pdf
29. https://www.hypr.com/security-encyclopedia/face-authentication
30. https://arxiv.org/pdf/2103.15144
31. Hacıoğlu, Müjgan & Karakaş Geyik, Seda. (2015). An Empirical Research on General Internet Usage Patterns of Undergraduate Students. Procedia - Social and Behavioral Sciences, 195, 895–904. 10.1016/j.sbspro.2015.06.369.
32. Meng, X. (2008). Study on the Model of E-Commerce Identity Authentication based on Multi-biometric
Features Identification. Proceedings-2008 ISECS International Colloquium on Computing, Communication,
Control, and Management, 200.
33. A. Alotaibi and A. Mahmmod, “Enhancing OAuth services security by an authentication service with face
recognition,” 2015 IEEE Long Isl. Syst. Appl. Technol. Conf. LISAT 2015, 2015,
doi:10.1109/LISAT.2015.7160208.
34. S. Khokad and V. Kala, ”A study of SLIDE Algorithm: Revolutionary AI Algorithm that Speeds Up Deep
Learning on CPUs,” 2020 International Conference on Smart Innovations in Design, Environment, Management,
Planning and Computing (ICSIDEMPC), AURANGABAD, 2020, pp. 188-191,
doi:10.1109/ICSIDEMPC49020.2020.9299644.
35. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
36. SAS. (2020). Machine Learning. https://www.sas.com/en_us/insights/analytics/machinelearning.html
37. Hargrave, M. (2020). Deep Learning. https://www.investopedia.com/terms/d/deeplearning.asp
38. Luxand. (n.d.). Face recognition widget for your website. https://luxand.cloud/widget
39. IDentification, E. (n.d.). SmileID. https://www.electronicid.eu/en/solutions/smileid
40. T. Nyein and A. N. Oo, “University Classroom Attendance System Using FaceNet and Support Vector
Machine,” 2019 Int. Conf. Adv. Inf. Technol. ICAIT 2019, pp. 171–176, 2019, doi: 10.1109/AITC.2019.8921316.
41. B. Prihasto et al., “A survey of deep face recognition in the wild,” 2016 Int. Conf. Orange Technol. ICOT 2016,
vol. 2018- Janua, pp. 76–79, 2018, doi: 10.1109/ICOT.2016.8278983.
42. S. I. Serengil and A. Ozpinar, “LightFace: A Hybrid Deep Face Recognition Framework,” pp. 2–6, 2020.
43. O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep Face Recognition,” 2015, pp. 41.1- 41.12, doi:
10.5244/c.29.41.
44. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: Closing the gap to human- level performance in
face verification,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2014, pp. 1701–1708, doi: 10.1109/CVPR.2014.220.
45. T. Baltrusaitis, P. Robinson, and L. P. Morency, “OpenFace: An open source facial be- havior analysis toolkit,”
in 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, 2016, pp. 1–10, doi:
10.1109/WACV.2016.7477553.
46. F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and
clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2015, vol. 07-12-June, pp. 815–823, doi:10.1109/CVPR.2015.7298682.
47. “Georgia Tech Face Database.” http://www.anefian.com/research/face_reco.htm.
48. Mondal, S. K., Mukhopadhyay, I., Dutta, S. (2020). Review and Comparison of Face Detection Techniques.
Advances in Intelligent Systems and Computing
49. Zhao, Q., and Zhang, S. (2011). A face detection method based on corner verifying. 2011 International
Conference on Computer Science and Service System, CSSS 2011 - Proceedings, 2854–2857.
https://doi.org/10.1109/CSSS.2011.5974784
50. D. Sandberg, “Facenet,” 2018. https://github.com/davidsandberg/facenet.
51. H. Tania, “keras-facenet,” 2018. https://github.com/nyokimtl/keras-facenet.