MasterThesis V0
THESIS
To obtain the diploma of Master
Field: Computer Science
Specialty: Information System and Web development
”Systèmes d’Information et Web (SIW)”
Theme
Presented by:
AGGOUN LINA
LAOUEDJ SARAH
Supervised by:
BENSLIMANE Sidi Mohammed
BOUSMAHA Rabab
Academic Year : 2022/2023
State of the Art
Definition:
Face recognition is a popular research task in the fields of image processing and
computer vision, owing to its potentially enormous range of applications as well as its
theoretical value. Such systems are widely deployed in many real-world applications such as
security, surveillance, homeland security, access control, image search, human-machine
interaction, and entertainment. However, these applications pose different challenges, such as
varying lighting conditions and facial expressions.
The characteristics that make a face recognition system useful are the following: the ability to
work with both videos and images, to process in real time, to be robust to different lighting
conditions, to be independent of the person (regardless of hair, ethnicity, or gender), and to
work with faces viewed from different angles. Different types of sensors, including RGB,
depth, EEG, thermal, and wearable inertial sensors, are used to obtain data. These sensors
may provide extra information and help face recognition systems to identify faces
in both static images and video sequences.
Three basic steps are used to develop a robust face recognition system: face detection,
feature extraction, and face recognition. The face detection step detects and locates the
human face in the image obtained by the system. The feature extraction step extracts a
feature vector for every human face located in the first step. Finally, the face recognition
step compares the features extracted from the face against all the template faces stored in a
database in order to decide the identity of the face.
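As a concrete illustration of these three steps, the following is a minimal sketch in Python using the open-source face_recognition library; the image file names are placeholders and the library is only one of many possible choices.

# Minimal sketch of the three-step pipeline (detection -> feature extraction ->
# recognition) using the open-source face_recognition library.
import face_recognition

# 1. Face detection: locate faces in the query image.
image = face_recognition.load_image_file("query.jpg")              # hypothetical path
face_locations = face_recognition.face_locations(image)

# 2. Feature extraction: compute a 128-d embedding ("signature") per detected face.
face_encodings = face_recognition.face_encodings(image, face_locations)

# 3. Face recognition: compare each embedding against a known template.
known_image = face_recognition.load_image_file("employee_01.jpg")  # hypothetical path
known_encoding = face_recognition.face_encodings(known_image)[0]

for encoding in face_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding, tolerance=0.6)
    print("Match with employee_01:", match[0])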
1. Face detection:
Face Detection is a Computer Vision task in which a computer program can detect
the presence of human faces and also find their location in an image or a video
stream. This is the first and most crucial step for most computer vision applications
involving a face.
1.2.1 Occlusion
Occlusion greatly affects the ability of any system to detect a face: when only part of the face is
visible, it is hard to say with confidence whether there is a face in the frame at all.
1.2.2 Lighting
Any change in the subject's lighting conditions poses an issue for face detection, since the method
may not have been designed or trained to handle such variation in lighting.
In addition, a particular skin tone might appear differently under various lighting conditions than
another, which adds a further challenge for the detection system.
1.2.4 Pose
The pose or orientation of a face in the image frame affects the performance of the Face detector, as
some methods can only detect frontal faces and fail when the face is sideways or turned slightly to
one side.
1.3.2 Precision
Precision measures the proportion of predicted positives that are correct, i.e., the True Positives out
of all detections. Mathematically, it is defined as follows:
Precision = TP / (TP + FP)
1.3.3 Recall
Recall measures the proportion of actual positives that were predicted correctly, i.e., the True
Positives out of all Ground Truths. Mathematically, it is defined as follows:
Recall = TP / (TP + FN)
The ROC curve essentially plots Recall (the true positive rate) against the false positive rate (FPR) for
various threshold values. The area under the curve (AUC) is used to summarize the performance of a
model into a single measure, which is important when comparing the performance of different
models. A model with a high AUC can occasionally score worse in a specific region than another
model with a lower AUC, but in practice the AUC performs well as a general measure of predictive
accuracy.
1.3.7 mAP – Mean Average Precision
As the name suggests, Mean Average Precision, or mAP, is the average of AP over all detected classes
in multiclass object detection. To arrive at the mAP, the Average Precision (AP) is calculated for each
class separately while evaluating a model, and the results are then averaged.
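The following is a small sketch of how these metrics can be computed, assuming a toy list of detection scores and ground-truth labels (the numbers are invented for illustration) and using scikit-learn for the Average Precision.

# Toy example of Precision, Recall, and AP; values are illustrative only.
from sklearn.metrics import average_precision_score

# 1 = ground-truth face, 0 = background; scores are detector confidences.
y_true = [1, 1, 0, 1, 0, 1]
y_scores = [0.95, 0.80, 0.60, 0.55, 0.30, 0.20]

# Precision and Recall at a fixed confidence threshold of 0.5.
tp = sum(1 for t, s in zip(y_true, y_scores) if s >= 0.5 and t == 1)
fp = sum(1 for t, s in zip(y_true, y_scores) if s >= 0.5 and t == 0)
fn = sum(1 for t, s in zip(y_true, y_scores) if s < 0.5 and t == 1)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
ap = average_precision_score(y_true, y_scores)   # area under the precision-recall curve
print(precision, recall, ap)
# mAP for a multi-class detector is simply the mean of the per-class AP values.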
Here's a high-level overview of how the HOG method works for face detection:
1. Pre-processing: The input image is first pre-processed to adjust the brightness and contrast.
2. Gradient Computation: The gradient information of the image is computed to capture the
edge information of the image.
3. Cell-level Histogram: The gradient information is divided into small cells and a histogram of
gradient orientations is computed for each cell.
4. Block-level Histogram: The histograms from several cells are combined to form a block-level
histogram, which is used to capture the structural information of the object.
5. Normalization: The block-level histograms are normalized to reduce the effect of illumination
and contrast changes.
6. SVM Classifier: The normalized histograms are used to train a Support Vector Machine (SVM)
classifier to distinguish between faces and non-faces.
7. Sliding Window Detection: A sliding window approach is used to scan the image, where the
classifier is applied on each window to detect the presence of a face.
Overall, HOG-based face detection is computationally efficient and has shown good performance in
many applications.
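As an illustration, dlib ships a frontal face detector that implements this HOG + SVM pipeline; a minimal sketch (the image path is a placeholder):

# HOG + linear SVM face detection with dlib's built-in frontal face detector.
import dlib
import cv2

detector = dlib.get_frontal_face_detector()          # HOG features + linear SVM
image = cv2.imread("group_photo.jpg")                 # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The second argument upsamples the image once to help find smaller faces.
faces = detector(gray, 1)
for rect in faces:
    print("Face at", rect.left(), rect.top(), rect.right(), rect.bottom())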
A classical face detection technique might fail to detect a face in a few frames, which may
lead to the application not performing as desired or cause complications in the system.
Even if the faces are detected in every frame, the process might take too long, which slows
down the application and, at times, defeats its purpose.
This is why newer, state-of-the-art face detectors are needed: they provide high accuracy
(so that no face goes undetected) at very high speeds and can also run on processors with
low computing power.
The Backbone model: The backbone model is a typical pre-trained image classification network that
works as the feature map extractor. Here, the final image classification layers of the model are
removed so that only the extracted feature maps are kept.
The SSD head: SSD head is made up of a couple of convolutional layers stacked together, and it is
added to the top of the backbone model. This gives us the output as the bounding boxes over the
objects. These convolutional layers detect the various objects in the image.
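As a practical sketch, OpenCV's DNN module ships a ResNet-10 SSD face detector that follows this backbone-plus-head design; the model files are assumed to have been downloaded beforehand, and the image path is a placeholder:

# SSD-style face detection with OpenCV's DNN module and the ResNet-10 SSD model.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")
image = cv2.imread("meeting_room.jpg")                 # hypothetical input image
h, w = image.shape[:2]

# The backbone + SSD head expect a fixed 300x300 input with mean subtraction.
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300),
                             (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()                             # shape: (1, 1, N, 7)

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        print("Face:", box.astype(int), "confidence:", confidence)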
The proposed MTCNN architecture consists of three stages of CNNs. In the first stage, P-Net
(Proposal Network) quickly produces candidate windows through a shallow CNN. Then, the R-Net
(Refine Network) stage refines the windows by rejecting many non-face bounding boxes
through a more complex CNN. Finally, the O-Net (Output Network) stage uses a more powerful CNN
to refine the result again and output the positions of five facial landmarks.
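A minimal sketch using the open-source mtcnn Python package, which implements this three-stage cascade (the image path is a placeholder):

# MTCNN face detection via the open-source `mtcnn` package.
import cv2
from mtcnn import MTCNN

detector = MTCNN()
image = cv2.cvtColor(cv2.imread("meeting_room.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical image

for face in detector.detect_faces(image):
    print("box:", face["box"])              # [x, y, width, height]
    print("confidence:", face["confidence"])
    print("landmarks:", face["keypoints"])  # eyes, nose, mouth corners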
Dual Shot Face Detector (DSFD) is a novel face detection approach that addresses the following three
major aspects of facial detection:
1. Feature learning – DSFD involves a Feature Enhance Module (FEM) that enhances the originally
received feature maps, thus extending the single shot detector to a dual shot detector. This module
combines the current layer's information with the feature maps of the previous layers and maintains
a context relationship between the anchors, which helps obtain more discriminative and robust
features.
2. Progressive loss design – Loss functions such as Focal Loss and Hierarchical Loss
address the class-imbalance problem and consider original and enhanced features,
respectively. However, they are not equipped to progressively learn the feature maps at
different levels and shots. DSFD involves a Progressive Anchor Loss (PAL) computed with two
sets of anchors: it assigns smaller anchor sizes in the first shot and larger ones in the second,
which helps the network learn the features more effectively.
3. Anchor assign-based data augmentation – Anchors are generated for each
feature map. Some research involves strategies to increase positive anchors. Such a strategy
ignores the random sampling in data augmentation, resulting in an imbalance between
positive and negative anchors. DSFD uses Improved Anchor Matching (IAM), which involves
anchor-based data augmentation. This provides a better match between the anchors and
ground truth and leads to better initialization for the face-box regressor.
All the above-mentioned aspects are complementary and can work together to improve
performance. As these techniques all relate to a two-stream design, the method has been
named the Dual Shot Face Detector. It has the ability to remain robust even under variations in
illumination, pose, scale, occlusion, etc.
Figure: RetinaFace detects 900 faces (at a confidence threshold of 0.5) out of 1,151 people.
RetinaFace takes pixel-wise face localization to the next level. It cleverly takes advantage of extra-
supervised and self-supervised multi-task learning to perform face localization on faces of various
scales, as seen in the figure above.
Many recent state-of-the-art methods focus on single-stage face detection techniques, which
densely sample face locations and scales on feature pyramids. Such a technique provides better
performance at a faster speed compared to two-stage methods.
RetinaFace improves this single-stage framework by:
- exploiting multi-task losses coming from strongly supervised and self-supervised signals;
- employing a multi-task learning strategy to simultaneously predict the face score, face box,
five facial landmarks, and the 3D position and correspondence of each face pixel.
The multi-task loss combines four components:
1. Face classification loss is a softmax loss for binary classes (face/not face).
2. Face box regression loss – The target bounding boxes are normalized and are in the format
(x_center, y_center, width, height).
3. Facial landmark regression loss – This regression technique also normalizes the target.
4. Dense regression loss – Supervised signals increase the significance of better face box and
landmark locations.
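A minimal sketch using the open-source retina-face Python package, one of several reimplementations of RetinaFace (the image path is a placeholder):

# RetinaFace detection via the open-source `retina-face` package.
from retinaface import RetinaFace

faces = RetinaFace.detect_faces("meeting_room.jpg")   # hypothetical image path
for key, face in faces.items():
    print(key, face["facial_area"])       # bounding box [x1, y1, x2, y2]
    print(face["landmarks"])              # eyes, nose and mouth corner positions
    print(face["score"])                  # face classification confidence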
It can be used to rapidly prototype perception pipelines with reusable components and in
production-ready Machine Learning applications.
It uses a lightweight feature extractor inspired by the MobileNet model and a GPU-friendly
anchor scheme modified from Single Shot Multibox Detector (SSD).
It provides a JavaScript API to implement Facial Detection on the web and an API to include it on
Android, iOS, and Desktop applications.
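The detector described here corresponds to MediaPipe's face detection solution; assuming that, a minimal sketch with the mediapipe Python package (the image path is a placeholder):

# Face detection with MediaPipe's Python solutions API.
import cv2
import mediapipe as mp

mp_face = mp.solutions.face_detection
image = cv2.cvtColor(cv2.imread("meeting_room.jpg"), cv2.COLOR_BGR2RGB)

# model_selection=0 targets short-range (selfie-like) images, 1 targets full-range scenes.
with mp_face.FaceDetection(model_selection=0, min_detection_confidence=0.5) as detector:
    results = detector.process(image)
    for detection in results.detections or []:
        box = detection.location_data.relative_bounding_box
        print("Relative box:", box.xmin, box.ymin, box.width, box.height)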
YuNet is a CNN-based face detector developed by Chengrui Wang and Yuantao Feng. It is a very
lightweight and fast model: with a model size of less than a megabyte, it can be loaded on almost any
device. It adopts MobileNet as its backbone and contains about 85,000 parameters in total.
It achieves a respectable score on the validation set of the WIDER Face dataset for such a lightweight
model.
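A minimal sketch of running YuNet through OpenCV's FaceDetectorYN API (available from OpenCV 4.5.4); the ONNX model file name is a placeholder and is assumed to have been downloaded from the OpenCV model zoo:

# YuNet face detection through OpenCV's FaceDetectorYN API.
import cv2

image = cv2.imread("meeting_room.jpg")                  # hypothetical input image
h, w = image.shape[:2]

# Arguments: model path, config (empty), input size, score threshold.
detector = cv2.FaceDetectorYN.create("face_detection_yunet.onnx", "", (w, h), 0.6)
_, faces = detector.detect(image)
for face in (faces if faces is not None else []):
    x, y, bw, bh = face[:4].astype(int)                 # remaining values: landmarks + score
    print("Face box:", x, y, bw, bh)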
In previous methods, much of the work involved hand-selecting filters so that as much detail as
possible could be extracted from the image. With a deeper understanding and greater computational
power, this work can now be automated. CNNs are so called because the original image data is
convolved with a series of filters. The parameters to be selected are the number of filters to apply
and the filter dimension; the step with which the filter slides over the image is known as the stride,
with typical values ranging from 2 to 5. In this particular case, the output of the CNN is a binary
classification that takes the value 1 when a face exists and 0 otherwise. The approach of the
Max-Margin Object Detection (MMOD) paper is also implemented for improved performance. This
model works with various facial orientations, is robust to occlusion, and its training procedure is fast.
However, inference is very slow, and it cannot detect smaller faces, since the detector is trained for
faces of about 80 by 80 pixels. You must therefore ensure that the faces in the input image are larger
than that; for smaller faces, you should train your own face detector.
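A minimal sketch of dlib's CNN (MMOD) face detector described above; the mmod_human_face_detector.dat model file is assumed to have been downloaded separately, and the image path is a placeholder:

# CNN (MMOD) face detection with dlib; slow on CPU, best run on GPU.
import dlib
import cv2

cnn_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
image = cv2.cvtColor(cv2.imread("group_photo.jpg"), cv2.COLOR_BGR2RGB)

detections = cnn_detector(image, 1)     # upsample once so faces stay above ~80x80 px
for d in detections:
    r = d.rect
    print("Face:", r.left(), r.top(), r.right(), r.bottom(), "confidence:", d.confidence)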
RetinaFace-Resnet50, YuNet, and DSFD work perfectly and are not affected, while the other models
fail in multiple cases, with Haar Cascades and Dlib-HOG performing the worst, as they rely on hand-
crafted features.
1.7.5 Pose
DSFD and RetinaFace-Resnet50 win the race for detecting faces in different poses, with YuNet
performing respectably.
Remember that it will be very slow and won’t make sense for real-time inference.
2. Feature extraction:
The feature extraction step is employed to extract the feature vectors for any human face located in
the first step (face detection). It represents a face with a feature vector called a "signature" that
describes the prominent features of the face image, such as the mouth, nose, and eyes, together with
their geometric distribution.
The goal of feature extraction is to convert high-dimensional, complex data into a set of compact and
informative features that can be easily analysed and used to build predictive models.
Several techniques extract the shape of the mouth, eyes, or nose and use their sizes and distances to
identify the face. HOG, Eigenfaces, independent component analysis (ICA), linear discriminant
analysis (LDA), scale-invariant feature transform (SIFT), Gabor filters, local phase quantization (LPQ),
Haar wavelets, Fourier transforms, and local binary pattern (LBP) techniques are widely used to
extract face features.
Eigenfaces is a specific application of PCA to face images. In this method, a set of face images is
treated as a matrix, and PCA is applied to obtain the principal components, which are used to
represent the faces in a lower-dimensional space. The eigenfaces correspond to the eigenvectors of
the covariance matrix and capture the most important variations among the faces, such as those due
to facial expressions and lighting conditions.
In both PCA and eigenfaces, the goal is to convert high-dimensional, complex data into a set of
compact and informative features that can be easily analyzed and used for tasks like classification
and recognition.
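A minimal eigenfaces sketch with scikit-learn, using the LFW faces bundled with the library (downloaded on first use); the number of components is an arbitrary choice:

# Eigenfaces: PCA applied to face images; each component reshaped to image size
# is one "eigenface".
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

lfw = fetch_lfw_people(min_faces_per_person=50, resize=0.4)
X = lfw.data                                  # each row is a flattened face image

pca = PCA(n_components=100, whiten=True).fit(X)
eigenfaces = pca.components_.reshape((100, lfw.images.shape[1], lfw.images.shape[2]))

# Project faces into the low-dimensional eigenface space (compact signatures).
signatures = pca.transform(X)
print(X.shape, "->", signatures.shape)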
3 assignment of orientation
2.1.5 Gabor filter: Gabor filters are designed to be similar to the receptive fields of simple
cells in the primary visual cortex of the brain. They provide a way to analyze images by representing
texture, orientation, and frequency information.
By varying the Gabor parameters, new filters are generated, and their combined responses describe
the facial texture more completely.
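A small sketch of a Gabor filter bank with OpenCV; the parameter values are arbitrary illustrative choices and the image path is a placeholder:

# A small Gabor filter bank: varying the orientation parameter generates new filters.
import cv2
import numpy as np

image = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical face crop

responses = []
for theta in np.arange(0, np.pi, np.pi / 8):            # 8 orientations
    # Arguments: kernel size, sigma, theta, wavelength, aspect ratio, phase offset.
    kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
    responses.append(cv2.filter2D(image, cv2.CV_32F, kernel))

# Concatenating simple statistics of the responses gives a basic Gabor feature vector.
features = np.array([(r.mean(), r.std()) for r in responses]).flatten()
print(features.shape)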
2.1.6 Local binary pattern (LBP): LBP is a texture-based method used in computer
vision and image processing for feature extraction and representation. The idea behind LBP is
to create a unique binary pattern for each pixel in an image by comparing the value of the
center pixel to the values of its surrounding pixels. The resulting binary patterns capture the
spatial relationships between the pixels in the image, and can be used to identify and classify
different textures. LBP has been extended to include rotation invariant, uniform, and multi-scale
variations, which can improve its performance in certain applications.
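A minimal LBP sketch with scikit-image; the image path is a placeholder:

# LBP: compare each pixel with its 8 neighbours on a circle of radius 1, then use
# the histogram of the resulting patterns as a texture descriptor.
import numpy as np
from skimage import io
from skimage.feature import local_binary_pattern

face = io.imread("face.jpg", as_gray=True)              # hypothetical face crop
lbp = local_binary_pattern(face, P=8, R=1, method="uniform")

# Histogram of uniform patterns = compact feature vector (10 bins for P=8).
hist, _ = np.histogram(lbp.ravel(), bins=np.arange(0, 11), density=True)
print(hist)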
3. Face recognition:
This step takes the features extracted from the face during the feature extraction step
and compares them with the known faces stored in a specific database. There are two general
applications of face recognition: one is called identification and the other is called verification.
During identification, a test face is compared with a set of faces with the aim of finding the most
likely match. During verification, a test face is compared with a known face in the database in order
to make an acceptance or rejection decision. Convolutional neural networks (CNN), k-nearest
neighbours (K-NN), DeepFace, VGG-Face, FaceNet and Siamese neural networks are known to
effectively address this task.
3.1 Techniques used for face recognition
3.1.1 Convolutional Neural Networks (CNNs): CNNs are a popular deep learning architecture for
image classification and object detection tasks, including face recognition.
In face recognition, a CNN takes an input face image and applies multiple convolutional and pooling
layers to extract meaningful features from the image. These features are then fed into fully
connected layers to produce a prediction of the identity of the person in the face image.
One common approach for face recognition using CNNs is to train a network to predict a compact
representation, called an embedding, for each face image. The embeddings are learned such that
similar faces have similar embeddings. At test time, the embedding for a query face image can be
compared to the embeddings for a set of reference face images to find the closest match.
Another approach is to use a CNN to directly predict the identity of the person in a face image, by
training the network to predict a probability distribution over a set of identities. At test time, the
network outputs a prediction for the identity of the person in the query face image.
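A hedged sketch of the second approach (direct identity classification) in PyTorch; the ResNet-18 backbone, the number of identities, and the dummy input batch are illustrative assumptions, not part of any specific published system:

# A CNN that directly predicts a probability distribution over a fixed set of identities.
import torch
import torch.nn as nn
from torchvision import models

NUM_IDENTITIES = 50                         # e.g. number of employees (assumption)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_IDENTITIES)   # replace the classifier head

faces = torch.randn(8, 3, 224, 224)         # a batch of pre-cropped face images (dummy data)
logits = model(faces)
probabilities = torch.softmax(logits, dim=1)
predicted_identity = probabilities.argmax(dim=1)
print(predicted_identity)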
3.1.2 K-Nearest Neighbors (KNN): KNN is a simple machine learning algorithm that can be used for
face recognition tasks. In KNN-based face recognition, each face image is represented as a feature
vector, which captures the important information about the face.
At test time, the feature vector for a query face image is compared to the feature vectors of a set of
reference face images. The closest K reference face images to the query face image are selected as
the "nearest neighbors", and the identity of the person in the query face image is predicted based on
the majority class of the K nearest neighbors.
KNN is simple to implement and can be used for face recognition tasks with limited computational
resources. However, KNN can be sensitive to the choice of K and the feature representation used,
and may not perform as well as more advanced machine learning algorithms such as Convolutional
Neural Networks (CNNs) in terms of accuracy and robustness to variations in lighting, pose, and
expression.
In recent years, KNN has mainly been used as a baseline comparison method for evaluating the
performance of more advanced face recognition algorithms. However, KNN can still be a useful tool
for face recognition in certain scenarios, such as real-time face recognition on small datasets with
limited computational resources.
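A minimal sketch of KNN-based recognition with scikit-learn, operating on precomputed face embeddings; the embeddings and identity labels are dummy values used only for illustration:

# KNN face recognition on precomputed embeddings (dummy vectors stand in for real ones).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
reference_embeddings = rng.normal(size=(30, 128))            # 30 reference faces, 128-d each
reference_labels = np.repeat(["alice", "bob", "carol"], 10)  # hypothetical identities

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(reference_embeddings, reference_labels)

query_embedding = rng.normal(size=(1, 128))                  # embedding of the query face
print(knn.predict(query_embedding))                          # majority vote of the 3 nearest neighbours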
3.1.3 DeepFace: DeepFace is a face recognition system developed by Facebook in 2014, based on
deep learning algorithms. It was one of the first deep learning-based face recognition systems to
achieve human-level accuracy on standard benchmarks.
DeepFace uses a Convolutional Neural Network (CNN) to extract features from a face image and
produce a compact representation, called an embedding, for each face. The embeddings are learned
such that similar faces have similar embeddings.
At test time, the embedding for a query face image can be compared to the embeddings for a set of
reference face images to find the closest match. The embeddings can also be used for other face
recognition tasks, such as face verification (determining if two face images belong to the same
person) and face clustering (grouping similar face images together).
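A short sketch using the open-source deepface Python library (a community reimplementation, not Facebook's original system); the image paths are placeholders:

# Face verification with the `deepface` library.
from deepface import DeepFace

result = DeepFace.verify(img1_path="employee_01.jpg",     # hypothetical image paths
                         img2_path="query.jpg",
                         model_name="Facenet")            # other options include "VGG-Face"
print(result["verified"], result["distance"])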
3.1.4 VGG-Face: VGG Face is a deep convolutional neural network architecture for face recognition
that was developed by researchers at the Visual Geometry Group (VGG) at the University of Oxford.
The architecture is based on the VGGNet architecture, which was developed for the ImageNet Large
Scale Visual Recognition Challenge (ILSVRC) and achieved state-of-the-art performance in image
classification tasks.
In VGG Face, the network takes an input face image and applies multiple convolutional and pooling
layers to extract meaningful features from the image. These features are then fed into fully
connected layers to produce a compact representation, called an embedding, for each face. The
embeddings are learned such that similar faces have similar embeddings.
As with DeepFace, at test time the embedding for a query face image is compared to the embeddings
of a set of reference face images to find the closest match, and the embeddings can also be reused for
face verification and face clustering.
3.1.5 FaceNet: FaceNet is a deep learning-based face recognition system developed by researchers at
Google. It was introduced in 2015 and achieved state-of-the-art performance on standard
benchmark datasets for face recognition at the time.
FaceNet uses a Convolutional Neural Network (CNN) to extract features from a face image and
produce a compact representation, called an embedding, for each face. The embeddings are learned
such that similar faces have similar embeddings. This is achieved by training the network to minimize
the Euclidean distance between the embeddings of similar faces and maximize the distance between
the embeddings of dissimilar faces.
As with the previous models, at test time the embedding for a query face image is compared to the
embeddings of a set of reference face images to find the closest match, and the embeddings can be
reused for other tasks such as face verification and face clustering.
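A hedged sketch using the facenet-pytorch package, which provides a FaceNet-style Inception-ResNet model pretrained on VGGFace2; the input tensors are dummy stand-ins for real, pre-cropped face images:

# FaceNet-style embeddings with facenet-pytorch; small distances indicate the same person.
import torch
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained="vggface2").eval()

face_a = torch.randn(1, 3, 160, 160)        # dummy tensors standing in for real face crops
face_b = torch.randn(1, 3, 160, 160)

with torch.no_grad():
    emb_a = model(face_a)                    # 512-d embedding
    emb_b = model(face_b)

distance = torch.dist(emb_a, emb_b).item()   # small distance => likely the same person
print(distance)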
3.1.6 Siamese Neural Networks (SNNs): SNNs are a type of deep learning architecture that can be
used for face recognition tasks. SNNs are designed to compare the similarity of two inputs and
determine if they are the same or not.
In the context of face recognition, a Siamese Network can be trained to compare the similarity of two
face images and determine if they belong to the same person. During training, the network is
presented with pairs of images, one of which contains a face of a known individual, and the other
contains a face of a different individual or no face at all. The network then learns to compare the two
images and determine if they are similar or not.
At test time, the SNN can be used to compare a query face image to a set of reference face images.
The network computes a similarity score between the query and each reference image, and the
reference image with the highest similarity score is considered the match.
Siamese Networks have been found to be effective for face recognition tasks, as they can learn to
compare the unique features of faces and determine their similarity, even if the faces are presented
at different scales, rotations, and poses.
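A minimal Siamese network sketch in PyTorch; the encoder architecture and input sizes are arbitrary illustrative choices:

# A Siamese network: two face images pass through the same shared encoder, and the
# distance between their embeddings measures similarity.
import torch
import torch.nn as nn

class SiameseNetwork(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embedding_dim),
        )

    def forward(self, img_a, img_b):
        # The same weights are applied to both inputs ("shared branches").
        emb_a = self.encoder(img_a)
        emb_b = self.encoder(img_b)
        return nn.functional.pairwise_distance(emb_a, emb_b)

net = SiameseNetwork()
pair = torch.randn(1, 3, 112, 112), torch.randn(1, 3, 112, 112)  # dummy face pair
print(net(*pair))   # training would push this distance down for matching pairs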
In summary, the choice of face recognition technique depends on the specific requirements and
constraints of the task at hand, which here is recognizing employees in a meeting room, as well as on
the needs and demands of the company. However, deep learning-based approaches such as CNNs
and Siamese networks have become increasingly popular in recent years due to their ability to learn
high-level features directly from face images and to achieve high accuracy on a wide range of face
recognition benchmarks.
4. Databases Used:
Many databases containing information that enables the evaluation of face recognition systems are
available on the market. However, these databases are generally adapted to the needs of some
specific recognition algorithms, each of which has been constructed with various image acquisition
conditions (changes in illumination, pose, facial expressions) as well as the number of sessions for
each individual. These databases range in size, scope and purpose.
LFW: Labeled Faces in the Wild dataset is a benchmark database of face photographs designed for
studying the problem of face recognition. The LFW dataset consists of more than 13,000 images of
faces collected from the internet, each labeled with the name of the person in the image. It is often
used to train and test face recognition systems, as well as to develop and evaluate new face
recognition techniques.
4.1.1 CASIA-WebFace: a dataset widely used for face recognition tasks. It contains 494,414 face
images of 10,575 real identities collected from the web. It was automatically collected by the CASIA
group and then manually refined. As is common for sets that are collected by looking at celebrities or
famous people, this set presents a long-tail distribution in terms of the images associated with a
subject: a few frequent and usually more famous subjects comprise most of the images, while others
are described by only a few images.
4.1.2 UMDFaces: UMDFaces used a mix of human annotators via Amazon Mechanical Turk (AMT) and
already-trained deep-learning-based face analysis tools to build a medium-sized set that is much
tougher than the previously available sets. Another UMDFaces peculiarity is the fact that, unlike
CASIA and VGGFace, the set contains both still images (usually of high quality) and video frames
(often affected by motion blur). The set provides annotations of facial keypoints, face pose angles,
and gender information. It consists of 367,888 face annotations in still images for 8,277 subjects, as
well as 3.7 million annotated video frames from about 22K videos of 3,100 subjects. Although the
UMDFaces numbers are smaller than those of the other sets, it presents a wider pose distribution
than CASIA and VGGFace.
4.1.3 VGGFace2: is an improved version of VGGFace created in order to mitigate the deficiencies of its
predecessor. VGGFace2 contains 3.31 million images of 9,131 subjects collected among celebrities,
but also other famous people such as professors or politicians. It is designed to cover a large range of
pose, age and ethnicity, and to reduce label noise as much as possible; the label noise was reduced
through a combination of automatic and manual filtering.
IMDb-Face: is a dataset of face images collected from the Internet Movie Database (IMDb) website.
It was created for the purpose of face detection and recognition research. The dataset contains over
80,000 face images of more than 5,000 individuals, making it one of the largest publicly available face
image datasets. The images were annotated with facial landmarks and attributes, such as gender,
age, and facial expression. This new set claims to be the largest noise-controlled face collection.
YTF (YouTube Faces): This dataset contains over 3,000 videos of faces, providing a large and diverse
benchmark for evaluating face recognition in unconstrained videos.
4.1.4 IJB-A (IARPA Janus Benchmark A): The IARPA Janus Benchmark A (IJB-A) database was developed
with the aim of adding more challenges to the face recognition task by collecting facial images with a
wide variation in pose, illumination, expression, resolution and occlusion. IJB-A is constructed by
collecting 5,712 images and 2,085 videos from 500 identities, with an average of 11.4 images and 4.2
videos per subject.
4.1.5 WebFace260M: WebFace260M is a million-scale face benchmark constructed to help the
research community close the data gap behind industry. It consists of:
- a noisy set of 4M identities and 260M faces;
- a high-quality training set of 42M images of 2M identities obtained by automatic cleaning;
- a test set with rich attributes and a time-constrained evaluation protocol.
The MS-Celeb-1M dataset is a large-scale face recognition dataset consisting of 100K identities, each
with about 100 facial images. The original identity labels are obtained automatically from webpages.
4.1.6 MegaFace: is a large-scale face recognition evaluation dataset created by the University of
Washington. It contains over a million images of over 6,000 individuals, making it one of the largest
publicly available face recognition datasets. The dataset was created to evaluate the performance of
face recognition algorithms at a very large (million-image) scale.
MegaFace consists of two parts: a gallery set and a probe set. The gallery set contains images of
individuals that are used as reference templates for recognition, while the probe set contains images
of the same individuals that are used to test the recognition algorithms. The probe set also includes
images of imposters (individuals who are not in the gallery set) to evaluate the ability of the
algorithms to reject people who are not enrolled in the gallery.
The MegaFace dataset has been used in several face recognition benchmarks and has been
instrumental in advancing the state of the art in face recognition technology.
For this project, we are not going to use any of these databases. Instead, we will collect our own
database using a Google Chrome extension to gather images of certain people from Google Image
search.