
People's Democratic Republic of Algeria
République Algérienne Démocratique et Populaire

Ministry of Higher Education and Scientific Research
Ministère de l'Enseignement Supérieur et de la Recherche Scientifique

École Supérieure en Informatique
-08 Mai 1945- Sidi Bel Abbès

THESIS
To obtain the diploma of Master
Field: Computer Science
Specialty: Information Systems and Web Development
”Systèmes d’Information et Web (SIW)”

Theme

Facial recognition system

Presented by:
AGGOUN LINA
LAOUEDJ SARAH
Supervised by:
BENSLIMANE Sidi Mohammed
BOUSMAHA Rabab
Academic Year : 2022/2023

State of the Art
Definition:

Face recognition is a popular research task in the fields of image processing and computer vision, owing to its potentially enormous range of applications as well as its theoretical value. Such systems are widely deployed in many real-world applications such as security, surveillance, homeland security, access control, image search, human-machine interaction, and entertainment. However, these applications pose different challenges, such as varying lighting conditions and facial expressions.

The characteristics that make a face recognition system useful are the following: its ability to work with both videos and images, to process in real time, to be robust to different lighting conditions, to be independent of the person (regardless of hair, ethnicity, or gender), and to work with faces from different angles. Different types of sensors, including RGB, depth, EEG, thermal, and wearable inertial sensors, are used to obtain data. These sensors may provide extra information and help face recognition systems identify faces in both static images and video sequences.

Three basic steps are used to develop a robust face recognition system: face detection, feature extraction, and face recognition. The face detection step detects and locates the human face in the image obtained by the system. The feature extraction step extracts a feature vector for each human face located in the first step. Finally, the face recognition step compares the features extracted from the human face with all template face databases in order to decide the identity of the face.
1. Face detection:

Face Detection is a Computer Vision task in which a computer program can detect
the presence of human faces and also find their location in an image or a video
stream. This is the first and most crucial step for most computer vision applications
involving a face.

1.1 Applications of Face Detection:


Nowadays, Face Detection is being used in a huge number of domains, including Security, Marketing,
Healthcare, Entertainment, Law Enforcement, Surveillance, Photography, Gaming, Video
Conferencing, etc. Let’s look at some specific use cases.

1.1.1 Camera Autofocus


In 2006, an early form of facial feature detection was introduced in digital cameras to aid in
autofocus. Since then, almost all digital cameras include some facial detection mode to detect the
faces in the camera frame and keep them focused.
1.1.2 Face Recognition
The most popular application of Face Detection is Face Recognition. In any Face Recognition system,
detecting the Face is the primary step. The features of the detected face can be used to associate the
face with a person for recognition.

1.1.3 Gender Classification


Once we have the detected face region, we can use a classification model on top of that to
distinguish between males and females.

1.1.4 Landmark Detection


Many face applications utilize the location of landmarks of the face, such as the eyelids, corner points
of the lips, or tip of the nose. These landmarks are localized within the face region we get from the
face detector.
1.1.5 Attendance
Facial detection can be used to count the number of people present in a classroom or at an event, for example to record attendance.

1.1.6 Snapchat/Instagram camera filters


Most of the camera filters on social media applications are built on top of and are made possible with
Face Detection.

1.1.7 Crowd Analysis


Facial detection can measure the size and density of a crowd in a public space for crowd analysis.
1.2 Challenges of Detecting a Face
Numerous things hinder the performance of a Face Detector.

1.2.1 Occlusion
Occlusion greatly affects the ability of any system to detect the face as only a part of the face is
visible, and it is hard to say with confidence whether there is a face in the frame when only part of it
is visible.

1.2.2 Lighting
Any change in the subject’s lighting conditions poses an issue for face detection as it is not necessary
that the method is designed/trained to handle the variation in the lighting. 

1.2.3 Skin Color


Skin color in facial detection has always been a topic of discussion, as it is found that some of the face
detectors were biased toward some skin colors.

Also, a particular skin color might behave differently in various lighting conditions than any other skin
color, bringing an added challenge to the detection system.
1.2.4 Pose
The pose or orientation of a face in the image frame affects the performance of the Face detector, as
some methods can only detect frontal faces and fail when the face is sideways or turned slightly to
one side.

1.2.5 Facial Expressions


Facial expressions should be taken care of when designing the features of a face or training a deep
learning model, as the face is unlikely to always be neutral in the real world. Any change in the
expressions of the face would mean the features of the face would change, and the detection system
might not consider it a real face.

1.2.6 Accessories/Makeup/Facial Hair


The accessories used, facial hair or modifications done on faces might also affect the performance of
the Face Detection system if they are not taken into account while designing or training the Face
Detector. Sunglasses, Face masks, Beards, Tattoos, and Dramatic makeup are a few examples.
1.2.7 Scale of Face
The scale of the face might change with respect to the image/video frame, and depending on the
facial detection system, the face might be too small to be detected.

1.3 Metrics Used for Evaluating Face Detection Models


The metrics used in Facial Detection are the same as any other object detection problem. The
popular metrics used are:

1.3.1 IoU – Intersection over Union


Intersection over Union (IoU) is a metric that quantifies the degree of overlap between two regions.
IoU metric evaluates the correctness of a prediction. The value ranges from 0 to 1. With the help of
the IoU threshold value, we can decide whether a prediction is True Positive, False Positive, or False
Negative.
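
As a minimal illustration, IoU between two axis-aligned boxes can be computed as in the sketch below (assuming boxes are given as (x1, y1, x2, y2) corner coordinates):

    def iou(box_a, box_b):
        # Corners of the intersection rectangle
        xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, xb - xa) * max(0, yb - ya)
        # Areas of the individual boxes and of their union
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

A prediction whose IoU with a ground-truth box exceeds the chosen threshold (e.g. 0.5) counts as a True Positive.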

1.3.2 Precision
Precision measures the proportion of predicted positives that are correct: it is simply the True Positives out of the total detections. Mathematically, it is defined as follows:

P = TP/(TP+FP) = TP/Total Predictions

The value ranges from 0 to 1.

1.3.3 Recall
Recall measures the proportion of actual positives that were predicted correctly. It is the True
Positives out of all Ground Truths. Mathematically, it is defined as follows.

R = TP/(TP+FN) = TP/Total Ground Truths

Similar to Precision, the value of Recall also ranges from 0 to 1. 
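
To make the two formulas concrete, here is a minimal sketch that computes both values from raw detection counts (the example numbers are purely illustrative):

    def precision_recall(tp, fp, fn):
        # tp: correct detections, fp: false alarms, fn: missed faces
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        return precision, recall

    # Example: 80 correct detections, 10 false alarms and 20 missed faces
    # give precision ~= 0.89 and recall = 0.80.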


1.3.4 PR Curve – Precision-Recall Curve
The Precision-Recall Curve is a plot with Precision on the y-axis and Recall on the x-axis. It shows precision as a function of recall across all the different threshold values.

1.3.5 ROC Curve – Receiver Operating Characteristic


The Receiver Operating Characteristic (ROC) curve is a plot that shows the performance of a model as
a function of its cut-off threshold (similar to the precision-recall curve).

It essentially shows the Recall against the false positive rate (FPR) for various threshold values.

1.3.6 AP – Average Precision


Interestingly, Average Precision (AP) is not the average of Precision (P). The term AP has evolved with
time. For simplicity, we can say that it is the area under the precision-recall curve. 

The area under the curve is used to summarize the performance of a model into a single measure. It
is important when comparing the performance of different models. A model with a high AUC can
occasionally score worse in a specific region than another model with a lower AUC. But in practice,
the AUC performs well as a general measure of predictive accuracy.
1.3.7 MAP – Mean Average Precision
As the name suggests, Mean Average Precision or mAP is the average of AP over all detected classes
in multiclass object detection

mAP = 1/n * sum(AP), where n is the number of classes.

To arrive at the mAP, while evaluating a model, Average Precision (AP) is calculated for each class
separately.
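
A simplified sketch of both computations is shown below; real benchmarks (e.g. PASCAL VOC, COCO) use interpolated variants of AP, so this is only meant to illustrate the idea:

    def average_precision(recalls, precisions):
        # Area under the precision-recall curve (trapezoidal rule);
        # recall values are assumed to be sorted in increasing order.
        ap = 0.0
        for i in range(1, len(recalls)):
            ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2.0
        return ap

    def mean_average_precision(ap_per_class):
        # mAP is simply the mean of the per-class AP values.
        return sum(ap_per_class) / len(ap_per_class)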

1.4 Evolution Timeline of Face Detection Algorithms


The figure below highlights important face detection algorithms over time; it is not an exhaustive list.

1.5 Face Detection Algorithms:


1.5.1 Haar cascades (2001):
The algorithm proposed by Viola and Jones in their paper "Rapid Object Detection using a Boosted Cascade of Simple Features", and refined for face detection in "Robust Real-Time Face Detection", was one of the first real-time object (and face) detection methods. The algorithm is based on three main concepts: Haar features and the integral image, feature selection via AdaBoost, and the attentional cascade. To summarize, a sliding window is passed over the image on which face detection is performed and, for each position of the sliding window, features are generated from the pixels inside it. These features are then passed through a cascade of classifiers to decide whether the pixels correspond to a face.
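
In practice, the Viola-Jones detector is readily available through OpenCV. The following is a minimal sketch; the input file name is a placeholder and the cascade XML ships with the opencv-python package:

    import cv2

    # Load the bundled frontal-face Haar cascade
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread("image.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Run the sliding-window cascade over an image pyramid
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)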

1.5.2 DLib-HOG (2005) :


HOG (Histogram of Oriented Gradients) is a widely used method for object detection, including face
detection. It's based on the idea of capturing the object's structural information through the gradient
orientation of the image pixels.

Here's a high-level overview of how the HOG method works for face detection:

1. Pre-processing: The input image is first pre-processed to adjust the brightness and contrast.

2. Gradient Computation: The gradient information of the image is computed to capture the
edge information of the image.

3. Cell-level Histogram: The gradient information is divided into small cells and a histogram of
gradient orientations is computed for each cell.

4. Block-level Histogram: The histograms from several cells are combined to form a block-level
histogram, which is used to capture the structural information of the object.

5. Normalization: The block-level histograms are normalized to account for changes in illumination and shadow.

6. SVM Classifier: The normalized histograms are used to train a Support Vector Machine (SVM)
classifier to distinguish between faces and non-faces.

7. Sliding Window Detection: A sliding window approach is used to scan the image, where the
classifier is applied on each window to detect the presence of a face.

Overall, HOG-based face detection is computationally efficient and has shown good performance in
many applications.
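
As a concrete example, dlib exposes a pre-trained HOG + linear SVM frontal face detector; a minimal sketch, with a placeholder file name:

    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()  # HOG + linear SVM
    img = cv2.imread("image.jpg")
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # The second argument upsamples the image once to help find smaller faces
    rects = detector(rgb, 1)
    for r in rects:
        print(r.left(), r.top(), r.right(), r.bottom())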

1.5.3 Deep Learning Based Face Detectors:


With the classical face detectors discussed above doing their job, do we really need newer face detection techniques? The answer is yes. While the classical methods may provide decent accuracy, their speed and robustness are found wanting.

 A classical Face-Detection technique might fail to detect a face in a few frames, which may
lead to the application not performing as desired or cause complications in the system. 

 Even if the faces are detected in every frame, the process might take too long. This slows down the application and, at times, defeats its purpose. 

No wonder we needed to switch to newer state-of-the-art Face Detectors. These provide high
accuracy (such that no face goes undetected) at very high speeds and can also be used in
microprocessors with low computing power.

1.5.4 SSD (Dec 2015):


Single Shot Multibox Detector: the method's name reveals most of the details about the model. The SSD model detects objects in a single pass over the input image, unlike other models, which traverse the image more than once to obtain detections.

SSD architecture

The SSD model is made up of two parts, namely:

The backbone model: The backbone is a typical pre-trained image classification network that works as the feature map extractor. Here, the final image classification layers of the model are removed to give us only the extracted feature maps.

The SSD head: The SSD head is made up of a couple of convolutional layers stacked together, added on top of the backbone model. It gives us the output as bounding boxes over the objects; these convolutional layers detect the various objects in the image.

1.5.5 MTCNN (April 2016):


A more recent model, MTCNN, stands for Multi-Task Cascaded Convolutional Neural Network.
Published in 2016 by Zhang et al., this commonly used model consists of neural networks connected
in a cascade fashion. 

The proposed MTCNN architecture consists of three stages of CNNs. In the first stage, the P-Net (Proposal Network) quickly produces candidate windows through a shallow CNN. In the R-Net (Refine Network) stage, the windows are refined by rejecting many non-face bounding boxes through a more complex CNN. Finally, the O-Net (Output Network) stage uses a more powerful CNN to refine the result again and to output the positions of five facial landmarks.

Though an accurate model, it isn’t fast enough for real-time applications.
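
For illustration, the third-party Python package mtcnn provides a ready-made implementation of this cascade; a minimal sketch, assuming that package is installed and with a placeholder file name:

    import cv2
    from mtcnn import MTCNN

    detector = MTCNN()
    img = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)
    for det in detector.detect_faces(img):
        x, y, w, h = det["box"]         # bounding box from the final stage
        landmarks = det["keypoints"]    # eyes, nose, mouth corners
        confidence = det["confidence"]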


1.5.6 Dual Shot Face Detector (April 2019):

Dual Shot Face Detector is a novel Face Detection approach that addresses the following three major
aspects of Facial Detection:

1. Better feature learning

Feature Enhance Module (FEM)

DSFD involves a Feature Enhance Module (FEM) that enhances the originally received feature maps, thus extending the single-shot detector to a dual-shot detector. This module incorporates the current layer's information together with the feature maps of the previous layers and maintains a context relationship between the anchors. This helps obtain more discriminative and robust features.

2. Progressive loss design – Loss functions such as Focal Loss and Hierarchical Loss address the class-imbalance problem and consider the original and enhanced learning features, respectively. However, they are not equipped to progressively learn the feature maps at different levels and shots. DSFD therefore uses a Progressive Anchor Loss (PAL) computed with two sets of anchors: it assigns smaller anchor sizes in the first shot and larger ones in the second, which helps the network learn the features at the two shots more effectively.
3. Anchor assign-based data augmentation – Anchors are generated for each
feature map. Some research involves strategies to increase positive anchors. Such a strategy
ignores the random sampling in data augmentation, resulting in an imbalance between
positive and negative anchors. DSFD uses Improved Anchor Matching (IAM), which involves
anchor-based data augmentation. This provides a better match between the anchors and
ground truth and leads to better initialization for the face-box regressor. 

All the above-mentioned aspects are independent of one another and can work simultaneously to improve performance. Since all these techniques relate to a two-stream design, the method has been named Dual Shot Face Detector. It remains robust even under variations in illumination, pose, scale, occlusion, etc.

When introduced, it achieved state-of-the-art results on the WIDER Face dataset.

                  Easy    Medium    Hard
  Validation Set  96.6    95.7      90.4
  Test Set        96.0    95.3      90.0

AP (%) of DSFD on the WIDER Face splits

1.5.7 RetinaFace (May 2019):


RetinaFace is a practical single-stage SOTA (State Of The Art) face detector initially introduced in the
arXiv technical report and then accepted by CVPR 2020. It is a part of the InsightFace project from
DeepInsight, which is also credited with many more top Face-Recognition techniques like ArcFace,
SubCenter ArcFace, PartialFC, and multiple facial applications too.

RetinaFace detects 900 faces (at a threshold of 0.5) out of 1151 people

It takes pixel-wise face localization to the next level. RetinaFace cleverly takes advantage of extra-
supervised and self-supervised multi-task learning to perform face localization on various scales of
faces, as seen in the above figure. 

The idea behind the RetinaFace Face-Detection technique

Many recent state-of-the-art methods focus on single-stage face detection techniques, which
densely sample face locations and scales on feature pyramids. Such a technique provides better
performance at a faster speed compared to two-stage methods. 
RetinaFace improves this single-stage framework by:

 Exploiting multi-task losses coming from strongly supervised and self-supervised signals. 

 Employing a multi-task learning strategy to simultaneously predict the face score, face box,
five facial landmarks, and 3D position and correspondence of each face pixel. 

The multitask loss function used by RetinaFace includes the following losses:

1. Face classification loss is a softmax loss for binary classes (face/not face).

2. Face box regression loss – The target bounding boxes are normalized and are in the format (x_center, y_center, width, height).

3. Facial landmark regression loss – This regression technique also normalizes the target.

4. Dense regression loss – Supervised signals increase the significance of better face box and
landmark locations. 

It achieves state-of-the-art results on the WIDER Face dataset.
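
For experimentation, a third-party Python package ("retina-face") wraps a pre-trained RetinaFace model; a minimal sketch, assuming that package is installed and with a placeholder file name:

    from retinaface import RetinaFace   # third-party "retina-face" package

    # Keys are face identifiers; each value holds the box, landmarks and score
    faces = RetinaFace.detect_faces("image.jpg")
    for key, face in faces.items():
        x1, y1, x2, y2 = face["facial_area"]
        landmarks = face["landmarks"]       # eyes, nose, mouth corners
        score = face["score"]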

                  Easy    Medium    Hard
  Validation Set  96.9    96.1      91.8
  Test Set        96.3    95.6      91.4

AP (%) of RetinaFace on the WIDER Face splits

1.5.8 MediaPipe (June 2019):


MediaPipe is a framework for building perception pipelines that perform inference over arbitrary sensory data, including images, video streams, and audio.

It can be used to rapidly prototype perception pipelines with reusable components and in
production-ready Machine Learning applications. 

ML solutions offered by MediaPipe

MediaPipe provides an ultrafast Face Detection solution that is based on BlazeFace.

 It uses a lightweight feature extractor inspired by the MobileNet model and a GPU-friendly
anchor scheme modified from Single Shot Multibox Detector (SSD). 

 It also replaces Non-Maximum Suppression with an improved tie-resolution strategy. 


The model can detect 6 facial landmarks. 

It provides a JavaScript API to implement Facial Detection on the web and an API to include it on
Android, iOS, and Desktop applications. 
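
A minimal sketch of the Python API (the legacy "solutions" interface; the input file name is a placeholder):

    import cv2
    import mediapipe as mp

    mp_fd = mp.solutions.face_detection
    with mp_fd.FaceDetection(model_selection=0, min_detection_confidence=0.5) as fd:
        img = cv2.imread("image.jpg")
        results = fd.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if results.detections:
            for det in results.detections:
                # Bounding box is returned in normalized image coordinates
                box = det.location_data.relative_bounding_box
                print(box.xmin, box.ymin, box.width, box.height)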

1.5.9 YuNet (Oct 2021):

Traditionally, OpenCV face detection relied on detectors such as Haar cascades and HOG, which worked well for frontal faces but failed otherwise. The release of OpenCV 4.5.4 (Oct 2021) saw the addition of a face detection model called YuNet that addresses this problem.

It is a CNN-based face detector developed by Chengrui Wang and Yuantao Feng. It is a very
lightweight and fast model. With a model size of less than an MB, it can be loaded on almost any
device. It adopts mobilenet as its backbone and contains 85000 parameters in total. 

The main and well-known repository, libfacedetection, takes YuNet as its detection model, offers a pure C++ implementation without dependence on deep learning frameworks, and reaches a detection rate of 77.34 FPS for 640 × 480 images up to 2,027.74 FPS for 128 × 96 images on an Intel i7-1065G7 CPU at 1.3 GHz.

It achieves a respectable score on the validation set of the WIDER Face dataset for such a lightweight
model. 

                  Easy    Medium    Hard
  Validation Set  0.856   0.842     0.727

AP of YuNet on the WIDER Face validation set
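
A minimal sketch of using YuNet through OpenCV (requires OpenCV >= 4.5.4); the ONNX model file name follows the OpenCV model zoo and is an assumption, as is the input file name:

    import cv2

    detector = cv2.FaceDetectorYN.create(
        "face_detection_yunet.onnx",  # downloaded separately from the OpenCV zoo
        "", (320, 320))
    img = cv2.imread("image.jpg")
    detector.setInputSize((img.shape[1], img.shape[0]))
    _, faces = detector.detect(img)
    # Each row of 'faces' holds x, y, w, h, five landmark pairs and a score
    if faces is not None:
        for f in faces:
            x, y, w, h = f[:4]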

1.5.10 Dlib CNN:


The convolutional neural network (CNN) is a feed-forward neural network used primarily for computer vision. CNNs combine automatic image pre-processing with dense neural network layers, and they are a special kind of neural network designed for analyzing grid-like data; their construction is inspired by the visual system of animals.

In earlier methods, much of the work consisted of hand-selecting filters so that as much detail as possible could be extracted from the image. With deeper networks and greater computational power, this work can now be automated. CNNs are so called because the original image data is convolved with a series of filters; the number of filters and their dimensions are parameters to be selected, and the step by which a filter moves across the image is known as the stride, with typical values ranging from 2 to 5. In this particular case, the output of the CNN is a binary classification that takes the value 1 when a face exists and 0 otherwise. Dlib's CNN detector also implements Max-Margin Object Detection (MMOD) for improved performance. This model works with various facial orientations, is robust to occlusion, and its training method is fast. However, inference is very slow, and it cannot detect smaller faces since the detector is specialized for faces of about 80 × 80 pixels; you must therefore ensure that the faces in the input are larger than that. For smaller faces, you should train your own face detector.
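
A minimal sketch of dlib's CNN (MMOD) detector; the weight file mmod_human_face_detector.dat must be downloaded separately from dlib's model repository, and the input file name is a placeholder:

    import cv2
    import dlib

    cnn_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
    img = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)
    # Upsample once so faces somewhat smaller than 80x80 can still be found
    detections = cnn_detector(img, 1)
    for d in detections:
        rect, score = d.rect, d.confidence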

1.5.11 OpenCV – DNN:


It is a Caffe model built upon the Single Shot Multibox Detector (SSD) with a ResNet-10 backbone. It was introduced in OpenCV's deep neural network (DNN) module after OpenCV 3.3. [12] A quantized TensorFlow version is also available, but we are going to use the Caffe model. This model is very reliable, works on CPU in real time, works well for various facial orientations, handles significant occlusion, and can detect faces of different sizes.

OpenCV provides two models for this face detector:

• Floating-point 16 version of the original Caffe implementation (5.4 MB)


• The 8-bit quantized version using TensorFlow (2.7 MB)
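
A minimal sketch with the Caffe variant; the prototxt/caffemodel file names follow the OpenCV samples, must be downloaded separately, and the input file name is a placeholder:

    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                                   "res10_300x300_ssd_iter_140000.caffemodel")
    img = cv2.imread("image.jpg")
    h, w = img.shape[:2]
    # Mean-subtracted 300x300 blob, as in the OpenCV face detection sample
    blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0, (300, 300),
                                 (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])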

1.6 Comparison of Face Detectors


The following table presents a comparison of all the above Face-Detection models based on their
inference speed in Frames Per Second (FPS) and Average Precision (AP). 

Performance Comparison of Face Detectors (Speed/FPS)

  Model                      FPS (On Colab GPU)   FPS (On Colab CPU)
  Haar cascade               -                    19.95
  Dlib                       -                    33.92
  SSD                        19.90                15.58
  MTCNN                      2.11                 1.81
  MediaPipe                  323.63               225.34
  RetinaFace Resnet50        72.24                1.43
  RetinaFace MobilenetV1     69.50                28.89
  Dual Shot Face Detector    18.89                0.22
  YuNet                      -                    49.43

Comparison of the models, based on FPS

Performance Comparison of Face Detectors (Average Precision)


  Model                      AP@0.5
  SSD                        0.931
  MTCNN                      0.915
  MediaPipe                  0.743
  RetinaFace Resnet50        0.994
  RetinaFace MobilenetV1     0.994
  Dual Shot Face Detector    0.989
  YuNet                      0.994

Comparison of the models, based on AP

1.7 Inference Comparison under Various Conditions


Let’s compare the inference results for all methods in different conditions that affect the detections.
1.7.1 Face Accessories and Make-up

RetinaFace-Resnet50, YuNet, and DSFD work perfectly and are not affected, while the other models fail in multiple cases, with Haar Cascades and DLib-HOG performing the worst, as they rely on hand-crafted features.

1.7.2 Facial Expressions


Virtually all face detection methods discussed above work well for faces with different expressions.
Haar Cascade misses one face, which is expected as the face is tilted, and the hand-crafted features
don’t consider such wide variations in facial features.

1.7.3 Lighting Conditions


In different lighting conditions, MTCNN, DLib-HOG, and Haar Cascades perform the worst, failing in
two or more images. Many methods fail for the third image as only half of the facial features are
visible for detection.
1.7.4 Occlusion
MTCNN, DLib-Hog, and Haar Cascades fail miserably to detect occluded faces. Other methods
manage to detect faces in all the images.

1.7.5 Pose
DSFD and RetinaFace-Resnet50 win the race for detecting faces in different poses, with YuNet
performing respectably.

1.7.6 Scale of face


Both the RetinaFace models and DSFD take the lead here, detecting even the tiniest of faces. Interestingly, MediaPipe is greatly affected by changes in the scale of faces and misses most of them.
1.7.7 Skin color
It’s good to see almost all the methods working well to detect faces of different skin colors.

1.8 Choosing the Best Face Detection Model


After discussing all the above methods, which one should you be using? Choosing the model that
best suits you will depend on the requirements of your particular application. Below are the three
conditions that might define your requirements.

1.8.1 Best Detection Accuracy


If you want the best-in-class detection accuracy and don't want to miss any faces, then the DSFD or RetinaFace-Resnet50 model is what you should go for.

Remember that these models are very slow and do not make sense for real-time inference.

1.8.2 Best Detection Speed


If you want the utmost inference speed and don’t mind missing faces in uncontrolled conditions,
then MediaPipe’s face detection solution is what you want.

1.8.3 Best Overall – Balanced speed and accuracy


If a good balance of speed and performance is what you are after, you should check out
the YuNet and RetinaFace-Mobilenetv1 models. Both are very fast models with real-time inference
speed while still maintaining decent accuracy.

2. The feature extraction:

This step is employed to extract the feature vectors for any human face located in the first step (face detection). It represents a face with a feature vector called a "signature" that describes the prominent features of the face image, such as the mouth, nose, and eyes, together with their geometric distribution.

The goal of feature extraction is to convert high-dimensional, complex data into a set of compact and informative features that can be easily analyzed and used to build predictive models.

Some techniques extract the shape of the mouth, eyes, or nose and identify the face from their size and relative distances. More generally, HOG, Eigenfaces, independent component analysis (ICA), linear discriminant analysis (LDA), scale-invariant feature transform (SIFT), Gabor filters, local phase quantization (LPQ), Haar wavelets, Fourier transforms, and local binary pattern (LBP) techniques are widely used to extract face features.

2.1 Techniques used to extract face features


2.1.1 Eigenface and PCA :
PCA is a linear dimensionality reduction technique that is widely used for feature extraction. It
transforms the data into a new, lower-dimensional space by projecting it onto a set of orthogonal
basis vectors, called principal components. The principal components are determined by the
eigenvectors of the covariance matrix of the data, and they capture the most important patterns or
variations in the data.

Eigenfaces is a specific application of PCA to face images. In this method, a set of face images is
treated as a matrix, and PCA is applied to obtain the principal components, which are used to
represent the faces in a lower-dimensional space. The eigenfaces correspond to the eigenvectors of
the covariance matrix and capture the most important features of the faces, such as facial
expressions and lighting conditions.

In both PCA and eigenfaces, the goal is to convert high-dimensional, complex data into a set of
compact and informative features that can be easily analyzed and used for tasks like classification
and recognition.

Example of dimensional reduction when applying principal component analysis (PCA)
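
A minimal Eigenfaces sketch with scikit-learn; X is assumed to be a matrix of flattened, aligned face crops (here filled with random placeholder data):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(200, 64 * 64)        # placeholder: 200 flattened 64x64 faces
    pca = PCA(n_components=50, whiten=True)
    signatures = pca.fit_transform(X)        # 50-dimensional feature vector per face
    eigenfaces = pca.components_.reshape((50, 64, 64))  # principal components as images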

2.1.2 Independent component analysis (ICA): ICA is similar to principal component analysis (PCA), but instead of finding a linear transformation of the data that maximizes variance, ICA finds a linear transformation that maximizes the statistical independence (non-Gaussianity) of the resulting components. ICA has many applications in signal processing, image analysis, and machine learning. For example, it can be used to separate mixed signals, such as audio signals or brain signals, into their independent components. It can also be used to extract independent features from high-dimensional data, such as images or text.

2.1.3 Fisherface linear discriminant analysis (LDA): The Fisherface method is based on the same principle of similarity as the Eigenfaces method. The objective of this method is to reduce the high-dimensional image space using the linear discriminant analysis (LDA) technique instead of PCA. The LDA technique is commonly used for dimensionality reduction and face recognition [66]. PCA is an unsupervised technique, while LDA is a supervised learning technique that makes use of the class label information of the data.

2.1.4 Scale-invariant feature transform (SIFT): SIFT is an algorithm used to detect and describe the local features of an image. The main idea of the SIFT descriptor is to convert the image into a representation composed of points of interest that carry the characteristic information of the face image. SIFT is invariant to scale and rotation, and it is fast, which is essential in real-time applications. The algorithm consists of four steps:

1. detection of maxima and minima in scale space;

2. localization of the characteristic points (keypoints);

3. assignment of an orientation to each point;

4. computation of a descriptor for each characteristic point.
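
A minimal sketch of SIFT with OpenCV (version 4.4 or later), applied to a face crop; the file name is a placeholder:

    import cv2

    gray = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    # One 128-dimensional descriptor is computed per detected keypoint
    keypoints, descriptors = sift.detectAndCompute(gray, None)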

2.1.5 Gabor filter: Gabor filters are designed to be similar to the receptive fields of simple cells in the primary visual cortex of the brain. They provide a way to analyze images by representing texture, orientation, and frequency information.

In practical applications, a Gabor filter is constructed as a wavelet function modulated by a Gaussian envelope. The wavelet component of the filter allows it to capture information about texture and frequency, while the Gaussian component provides spatial localization. The parameters of the Gabor filter, such as the frequency, orientation, and scale, can be adjusted to match the features of interest in the image.

By varying the Gabor parameters, a bank of filters is generated that captures the facial texture at different orientations, scales, and frequencies.
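
A minimal sketch of a small Gabor filter bank built with OpenCV; the parameter values and the file name are illustrative only:

    import cv2
    import numpy as np

    gray = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2GRAY).astype(np.float32)
    responses = []
    for theta in np.arange(0, np.pi, np.pi / 8):   # 8 orientations
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
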
2.1.6 local binary pattern (LBP) : is a texture-based method used in computer
vision and image processing for feature extraction and representation. The idea behind LBP is
to create a unique binary pattern for each pixel in an image by comparing the value of the
center pixel to the values of its surrounding pixels. The resulting binary patterns capture the
spatial relationships between the pixels in the image, and can be used to identify and classify
different textures. LBP has been extended to include rotation invariant, uniform, and multi-scale
variations, which can improve its performance in certain applications.
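
A minimal sketch using scikit-image: each pixel is compared with its 8 neighbours at radius 1, and the histogram of the resulting patterns serves as the texture feature (the face crop here is placeholder data):

    import numpy as np
    from skimage.feature import local_binary_pattern

    gray = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # placeholder face crop
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    # "Uniform" LBP with P=8 yields values in [0, 9], hence 10 histogram bins
    hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)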

2.2 Comparison of face feature extraction techniques

AUTHOR | TECHNIQUE USED | DATABASE | LIMITATION | ADVANTAGE | RESULT
Lenc et al. | SIFT | LFW / AR / FERET | Still far from perfect | Sufficiently robust on lower-quality real data | 97.30% / 95.80% / 98.04%
Annalakshmi et al. | ICA and LDA | LFW | Sensitivity | Good accuracy | 88%
Annalakshmi et al. | PCA and LDA | LFW | Sensitivity | Specificity | 59%
Gowda et al. | LPQ and LDA | MEPCO | Computation time | Good accuracy | 99.13%
Perlibakas and Vytautas | PCA and Gabor filter | FERET | Precision | Pose | 87.77%
Hafez et al. | Gabor filter and LDA | ORL / C. YaleB | Pose | Good recognition performance | 98.33% / 99.33%
Zhang et al. | PCA | YALE | Recognition rate | Reduces the dimensionality | 84.21%
Khoi et al. | LBP | ORL | Sensitivity to noise | Robustness to shape changes; invariance to illumination; computational efficiency | LBP depends on several factors

3. Face recognition:
This step takes the features extracted from the face during the feature extraction step and compares them with the known faces stored in a specific database. There are two general applications of face recognition: identification and verification. During identification, a test face is compared with a set of faces with the aim of finding the most likely match. During verification, a test face is compared with a known face in the database in order to make an acceptance or rejection decision. Convolutional neural networks (CNN), k-nearest neighbours (KNN), DeepFace, VGG-Face, FaceNet, and Siamese neural networks are known to effectively address this task.
3.1 Techniques used for face recognition
3.1.1 Convolutional Neural Networks (CNNs): CNNs are a popular deep learning architecture for image classification and object detection tasks, including face recognition.

In face recognition, a CNN takes an input face image and applies multiple convolutional and pooling
layers to extract meaningful features from the image. These features are then fed into fully
connected layers to produce a prediction of the identity of the person in the face image.

One common approach for face recognition using CNNs is to train a network to predict a compact
representation, called an embedding, for each face image. The embeddings are learned such that
similar faces have similar embeddings. At test time, the embedding for a query face image can be
compared to the embeddings for a set of reference face images to find the closest match.

Another approach is to use a CNN to directly predict the identity of the person in a face image, by
training the network to predict a probability distribution over a set of identities. At test time, the
network outputs a prediction for the identity of the person in the query face image.
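
To illustrate the embedding-based approach, the sketch below matches a query embedding against a gallery of reference embeddings by Euclidean distance; how the embeddings are produced (FaceNet, VGG-Face, etc.) is deliberately left abstract:

    import numpy as np

    def closest_identity(query_emb, reference_embs, names):
        # Smallest Euclidean distance in embedding space = most likely identity
        distances = [np.linalg.norm(query_emb - ref) for ref in reference_embs]
        best = int(np.argmin(distances))
        return names[best], distances[best]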

3.1.2 K-Nearest Neighbors (KNN): KNN is a simple machine learning algorithm that can be used for
face recognition tasks. In KNN-based face recognition, each face image is represented as a feature
vector, which captures the important information about the face.

At test time, the feature vector for a query face image is compared to the feature vectors of a set of
reference face images. The closest K reference face images to the query face image are selected as
the "nearest neighbors", and the identity of the person in the query face image is predicted based on
the majority class of the K nearest neighbors.

KNN is simple to implement and can be used for face recognition tasks with limited computational
resources. However, KNN can be sensitive to the choice of K and the feature representation used,
and may not perform as well as more advanced machine learning algorithms such as Convolutional
Neural Networks (CNNs) in terms of accuracy and robustness to variations in lighting, pose, and
expression.

In recent years, KNN has mainly been used as a baseline comparison method for evaluating the
performance of more advanced face recognition algorithms. However, KNN can still be a useful tool
for face recognition in certain scenarios, such as real-time face recognition on small datasets with
limited computational resources.
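
A minimal sketch with scikit-learn, assuming each face has already been reduced to a fixed-length feature vector (LBP histogram, CNN embedding, etc.); the data below are random placeholders:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.random.rand(50, 128)        # placeholder feature vectors
    y_train = np.random.randint(0, 5, 50)    # placeholder identity labels
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    predicted_identity = knn.predict(np.random.rand(1, 128))
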
3.1.3 DeepFace: DeepFace is a face recognition system developed by Facebook in 2014, based on
deep learning algorithms. It was one of the first deep learning-based face recognition systems to
achieve human-level accuracy on standard benchmarks.

DeepFace uses a Convolutional Neural Network (CNN) to extract features from a face image and
produce a compact representation, called an embedding, for each face. The embeddings are learned
such that similar faces have similar embeddings.

At test time, the embedding for a query face image can be compared to the embeddings for a set of
reference face images to find the closest match. The embeddings can also be used for other face
recognition tasks, such as face verification (determining if two face images belong to the same
person) and face clustering (grouping similar face images together).

3.1.4 VGG-Face: VGG Face is a deep convolutional neural network architecture for face recognition
that was developed by researchers at the Visual Geometry Group (VGG) at the University of Oxford.
The architecture is based on the VGGNet architecture, which was developed for the ImageNet Large
Scale Visual Recognition Challenge (ILSVRC) and achieved state-of-the-art performance in image
classification tasks.

In VGG Face, the network takes an input face image and applies multiple convolutional and pooling
layers to extract meaningful features from the image. These features are then fed into fully
connected layers to produce a compact representation, called an embedding, for each face. The
embeddings are learned such that similar faces have similar embeddings.

At test time, the embedding for a query face image can be compared to the embeddings for a set of
reference face images to find the closest match. The embeddings can also be used for other face
recognition tasks, such as face verification (determining if two face images belong to the same
person) and face clustering (grouping similar face images together).

3.1.5 FaceNet: FaceNet is a deep learning-based face recognition system developed by researchers at
Google. It was introduced in 2015 and achieved state-of-the-art performance on standard
benchmark datasets for face recognition at the time.

FaceNet uses a Convolutional Neural Network (CNN) to extract features from a face image and
produce a compact representation, called an embedding, for each face. The embeddings are learned
such that similar faces have similar embeddings. This is achieved by training the network to minimize
the Euclidean distance between the embeddings of similar faces and maximize the distance between
the embeddings of dissimilar faces.

At test time, the embedding for a query face image can be compared to the embeddings for a set of
reference face images to find the closest match. The embeddings can also be used for other face
recognition tasks, such as face verification (determining if two face images belong to the same
person) and face clustering (grouping similar face images together).

3.1.6 Siamese Neural Networks (SNNs): SNNs are a type of deep learning architecture that can be
used for face recognition tasks. SNNs are designed to compare the similarity of two inputs and
determine if they are the same or not.
In the context of face recognition, a Siamese Network can be trained to compare the similarity of two
face images and determine if they belong to the same person. During training, the network is
presented with pairs of images, one of which contains a face of a known individual, and the other
contains a face of a different individual or no face at all. The network then learns to compare the two
images and determine if they are similar or not.

At test time, the SNN can be used to compare a query face image to a set of reference face images.
The network computes a similarity score between the query and each reference image, and the
reference image with the highest similarity score is considered the match.

Siamese Networks have been found to be effective for face recognition tasks, as they can learn to
compare the unique features of faces and determine their similarity, even if the faces are presented
at different scales, rotations, and poses.

  Method       Face recognition accuracy (%)
  CNN          99.83
  KNN          90-95
  DeepFace     97.35
  VGG-Face     99.13
  FaceNet      99.63
  SNN          99.65

In summary, the choice of face recognition technique depends on the specific requirements and constraints of the task at hand, which in our case is recognizing employees in a meeting room, as well as on the needs and demands of the company. However, deep learning-based approaches such as CNNs and Siamese networks have become increasingly popular in recent years due to their ability to learn high-level features directly from face images and to achieve high accuracy on a wide range of face recognition benchmarks.

4. Databases Used:
Many databases containing information that enables the evaluation of face recognition systems are available. However, these databases are generally adapted to the needs of specific recognition algorithms, each having been constructed under various image acquisition conditions (changes in illumination, pose, facial expressions) and with a varying number of sessions per individual. These databases range in size, scope and purpose.

LFW: The Labeled Faces in the Wild dataset is a benchmark database of face photographs designed for studying the problem of face recognition. The LFW dataset consists of more than 13,000 images of faces collected from the internet, each labeled with the name of the person in the image. It is often used to train and test face recognition systems, as well as to develop and evaluate new face recognition techniques.

4.1 Some famous databases:

4.1.1 CASIA-WebFace: The CASIA-WebFace dataset is used for face verification and face identification tasks. The dataset contains 494,414 face images of 10,575 real identities collected from the web. It was automatically collected by the CASIA group and then manually refined. As is common for sets that are collected by looking at celebrities or famous people, this set presents a long-tail distribution in terms of the images associated with each subject: some frequent and usually more famous subjects comprise most of the images, while others are only described by a few images.

4.1.2 UMDFaces: UMDFaces used a mix of human annotators via Amazon Mechanical Turk (AMT) and already trained deep-based face analysis tools to build a medium-sized set that is much tougher than the already available sets. Another UMDFaces peculiarity is the fact that, unlike CASIA and VGGFace, the set contains both still images (usually high quality) and video frames (often affected by motion blur). The set provides annotations of facial keypoints, face pose angles, and gender information. It consists of 367,888 face annotations in still images for 8,277 subjects, and also 3.7 million annotated video frames from about 22K videos of 3,100 subjects. Although the UMDFaces numbers are smaller than those of the other sets, it presents a wider pose distribution than CASIA and VGGFace.

4.1.3 VGGFace2: VGGFace2 is an improved version of VGGFace created in order to mitigate the deficiencies of its predecessor. VGGFace2 contains 3.31 million images of 9,131 subjects collected among celebrities, but also other well-known people such as professors or politicians. It is designed to cover a large range of pose, age and ethnicity, and to reduce label noise as much as possible. The reduction in label noise was achieved with an interplay of manual and automatic processes.

IMDb-Face: IMDb-Face is a dataset of face images collected from the Internet Movie Database (IMDb) website. It was created for the purpose of face detection and recognition research. The dataset contains over 80,000 face images of more than 5,000 individuals, making it one of the largest publicly available face image datasets. The images were annotated with facial landmarks and attributes, such as gender, age, and facial expression. This set claims to be the largest noise-controlled face collection.

YTF (YouTube Faces): This dataset contains over 3,000 videos of faces, providing a large and diverse set of images for training.

4.1.4 IJB-A (IARPA Janus Benchmark A): The IARPA Janus Benchmark A (IJB-A) database was developed with the aim of adding more challenges to the face recognition task by collecting facial images with a wide variation in pose, illumination, expression, resolution and occlusion. IJB-A is constructed by collecting 5,712 images and 2,085 videos from 500 identities, with an average of 11.4 images and 4.2 videos per identity.

4.1.5 WebFace260M: WebFace260M is a million-scale face benchmark constructed for the research community with the goal of closing the data gap behind industry. It consists of: a noisy set of 4M identities and 260M faces; high-quality training data with 42M images of 2M identities obtained by automatic cleaning; and a test set with rich attributes and a time-constrained evaluation protocol.

The MS-Celeb-1M dataset is a large-scale face recognition dataset consisting of 100K identities, each with about 100 facial images. The original identity labels are obtained automatically from webpages.

4.1.6 MegaFace: MegaFace is a large-scale face recognition evaluation dataset created by the University of Washington. It contains over a million images of over 6,000 individuals, making it one of the largest publicly available face recognition datasets. The dataset was created to evaluate the performance of face recognition algorithms and to facilitate research in the field.

MegaFace consists of two parts: a gallery set and a probe set. The gallery set contains images of individuals that are used as reference templates for recognition, while the probe set contains images of the same individuals that are used to test the recognition algorithms. The probe set also includes images of imposters (individuals who are not in the gallery set) to evaluate the ability of the algorithms to reject non-matches.

The MegaFace dataset has been used in several face recognition benchmarks, including the Face Recognition Grand Challenge and the WildFace challenge, and it has been instrumental in advancing the state of the art in face recognition technology.

For this project, we are not going to use any of these databases. We will collect our own database using a Google Chrome extension to gather images of specific people from Google Image Search.
