MACHINE LEARNING
PROJECT REPORT
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE &
ENGINEERING
Submitted By
ASHISH JHA (2K21/CO/107)
AVIRAL (2K21/CO/120)
AYUSH KUMAR SINGH (2K21/CO/126)
Project Guide
Dr. Aruna Bhatt
Department of CSE
Delhi Technological University
(Govt. of NCT, Delhi)
ABSTRACT
Face recognition is a cutting-edge biometric technology that
identifies and verifies individuals based on their facial features. It
has gained widespread applications in areas such as surveillance,
security systems, authentication, and human-computer interaction.
With the increasing reliance on digital and automated solutions,
accurate and efficient face recognition systems have become a
critical requirement in both personal and professional domains.
Declaration
Certificate
Abstract
Acknowledgement
Contents
List of Figures
List of Abbreviations
1. Introduction
   1.1 Overview
   1.2 Problem Formulation
   1.3 Objectives
   1.4 Deep Learning
   1.5 CNN
   1.6 Significance of Deep Learning in Face Recognition
2. Related Work
   2.1 Review of Datasets
   2.2 Review of Studies
   2.3 Limitations of Existing Work
3. Proposed Methodology
   3.1 System Overview
   3.2 Face Detection
   3.3 Face Alignment
   3.4 Facial Feature Amplification
   3.5 Feature Extraction
   3.6 Face Recognition
   3.7 Real-Time Processing
   3.8 Accuracy Improvement
   3.9 Challenges and Considerations
4. Bibliography
LIST OF FIGURES
1.1 Overview
The key challenge in face recognition lies in the variation of faces due to
factors like lighting, facial expressions, pose, age, and occlusions (such
as glasses or hats). The ability to accurately identify faces in such varied
conditions requires a system that can adapt and generalize well to
different scenarios. Machine learning, and particularly deep learning, has
emerged as the most effective approach to address these challenges. By
learning from large datasets, face recognition models can adapt to these
variations and identify faces with remarkable accuracy.
This project aims to develop a robust face recognition system that can
overcome these challenges. By leveraging deep learning techniques,
particularly Convolutional Neural Networks (CNNs), this system is
designed to detect, align, and recognize faces in images or video
streams. The system aims to be scalable, efficient, and accurate,
ensuring that it performs well under real-world conditions such as
variations in lighting, pose, and partial occlusions.
In the context of face recognition, CNNs are used for feature extraction,
where the network learns to identify key facial features that distinguish
one individual from another. The typical architecture of a CNN consists of
several convolutional layers, pooling layers, and fully connected layers,
which work together to extract and classify features from the input
image. This makes CNNs particularly well-suited for tasks like face
recognition, where the features to be identified are spatially dependent
and highly complex.
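To make the convolution and pooling operations described above concrete, the following is a minimal pure-Python sketch (a toy 3x3 vertical-edge kernel applied to a synthetic 6x6 image; real systems use optimized deep learning libraries, and the kernel values here are illustrative, not learned):

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def max_pool2(fmap):
    """2x2 max pooling with stride 2, downsampling the feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A 6x6 toy image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9]] * 6
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # vertical edge detector
fmap = conv2d(image, kernel)   # 4x4 feature map; strongest response at the edge
pooled = max_pool2(fmap)       # 2x2 map after pooling
```

In a trained CNN the kernel weights are learned from data, and many such convolution/pooling stages are stacked before the fully connected layers classify the extracted features.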
2. VGGFace
◦ Size: 2.6 million images
◦ Subjects: 2,622 individuals
◦ Purpose: The VGGFace dataset is one of the largest publicly
available datasets for face recognition, used for training deep
learning models. It contains a wide range of faces captured in
various conditions.
◦ Challenges: It contains images from both professional and
casual settings, and the images cover a broad variety of face
expressions, age groups, and ethnic backgrounds.
3. MS-Celeb-1M
◦ Size: 10 million images
◦ Subjects: 100,000 individuals
◦ Purpose: The MS-Celeb-1M dataset is one of the largest face
recognition datasets in terms of the number of images. It is
designed to support large-scale face recognition systems.
◦ Challenges: The large size of the dataset allows the model to
be trained on diverse data, but it also presents challenges
related to data cleaning and ensuring that the identities are
correctly labeled.
Numerous approaches have been proposed for face recognition over the
years, ranging from traditional methods to deep learning-based
techniques. This section provides an overview of the evolution of face
recognition systems.
1. Traditional Methods
Early face recognition systems relied on methods like Eigenfaces
and Fisherfaces, which performed dimensionality reduction to
represent faces in a lower-dimensional space:
◦ Eigenfaces (Principal Component Analysis - PCA) reduced the
dimensionality of face images by projecting them onto a
lower-dimensional space. These methods were effective in
controlled environments but struggled with variations in
lighting, pose, and expression.
◦ Fisherfaces (Linear Discriminant Analysis - LDA) aimed to
find a lower-dimensional representation that maximized class
separability. While Fisherfaces improved performance over
Eigenfaces, they still lacked robustness against variations in
real-world scenarios.
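The dimensionality reduction behind Eigenfaces can be sketched in pure Python: compute the covariance of mean-centered face vectors, find its dominant eigenvector (the first "eigenface") by power iteration, and project faces onto it (toy 2-dimensional "face" vectors here; real images are flattened to thousands of dimensions):

```python
import math

def power_iteration(cov, iters=200):
    """Find the dominant eigenvector of a small symmetric matrix."""
    n = len(cov)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy "faces" (flattened 2-pixel images) whose variance lies along one axis.
faces = [[2.0, 0.1], [4.0, -0.1], [6.0, 0.2], [8.0, -0.2]]
n, dims = len(faces), 2
mean = [sum(f[d] for f in faces) / n for d in range(dims)]
centered = [[f[d] - mean[d] for d in range(dims)] for f in faces]
cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dims)]
       for i in range(dims)]
eigenface = power_iteration(cov)                       # dominant direction
weights = [sum(c[d] * eigenface[d] for d in range(dims)) for c in centered]
```

Each face is then represented by its projection weights rather than raw pixels; recognition compares weights, which is why the method degrades when lighting or pose changes alter the pixel statistics.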
2. Recent Advances
More recent advancements have focused on improving face
recognition in challenging real-world conditions, such as varying
lighting, pose, and occlusion.
◦ MTCNN (Multi-task Cascaded Convolutional Networks):
This model is used for both face detection and alignment. It
performs exceptionally well in detecting faces in images with
various poses, lighting conditions, and occlusions. The MTCNN
approach involves multiple stages, including a proposal
network, a refinement network, and an output network for
accurate bounding box predictions.
◦ ArcFace: ArcFace is a more recent model that uses additive
angular margin loss to improve the accuracy of face
recognition. It introduced a new loss function that helps to
separate classes in the embedding space, resulting in better
performance in real-world recognition tasks.
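The core of ArcFace's additive angular margin can be sketched in a few lines: the target-class logit cos(θ) is replaced by cos(θ + m) before softmax, so an embedding must be angularly closer to its class centre to score well (the margin value 0.5 below matches the value commonly reported for ArcFace, but is shown here only as an illustration):

```python
import math

def arcface_logit(cos_theta, margin=0.5):
    """Additive angular margin: shift the angle between the embedding and
    its class centre by m, i.e. cos(theta) -> cos(theta + m)."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # recover the angle
    return math.cos(theta + margin)

cos_t = 0.8                      # similarity of an embedding to its class centre
penalized = arcface_logit(cos_t) # always <= cos_t for theta + m within [0, pi]
```

Because the penalized logit is harder to satisfy, training pushes same-identity embeddings closer together and different identities further apart in the embedding space.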
1. Face Detection
2. Face Alignment
3. Feature Amplification
4. Feature Extraction
5. Face Recognition
6. Real-Time Processing
Face detection is the first step in the face recognition pipeline. This task
involves locating one or more faces within an image or video frame and
creating a bounding box around each detected face. The system needs
to handle various challenges, including different poses, lighting
conditions, and occlusions.
The face detection process is carried out using Haar cascades and deep
learning-based models such as MTCNN (Multi-task Cascaded
Convolutional Networks), which can detect faces with high accuracy
under a wide range of conditions. MTCNN performs three stages of
processing: a proposal network (P-Net) that generates candidate face
windows, a refinement network (R-Net) that rejects false candidates and
refines the remaining ones, and an output network (O-Net) that produces
the final bounding boxes and facial landmark positions.
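The basic detection loop shared by these approaches, scan the image, score each window, and keep high-scoring windows as bounding boxes, can be sketched in pure Python (the intensity-sum "classifier" below is a stand-in for a real Haar cascade or CNN, and the frame is a toy 5x5 grid):

```python
def detect_faces(image, window=3, stride=1, threshold=20):
    """Slide a fixed-size window over the image and keep windows whose
    score exceeds a threshold. Returns bounding boxes as (row, col, size).
    The intensity-sum score is a placeholder for a trained classifier."""
    boxes = []
    for i in range(0, len(image) - window + 1, stride):
        for j in range(0, len(image[0]) - window + 1, stride):
            score = sum(image[i + di][j + dj]
                        for di in range(window) for dj in range(window))
            if score > threshold:
                boxes.append((i, j, window))
    return boxes

# 5x5 toy frame with a bright 3x3 "face" in the top-left corner.
frame = [[9, 9, 9, 0, 0],
         [9, 9, 9, 0, 0],
         [9, 9, 9, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
boxes = detect_faces(frame)   # includes (0, 0, 3), the true face window
```

Real detectors additionally scan at multiple scales and merge overlapping boxes (non-maximum suppression), which is what MTCNN's cascaded stages refine step by step.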
Once faces are detected, the next step is face alignment. Face
alignment involves rotating and scaling the detected faces to a
consistent orientation and size, ensuring that the model can extract
meaningful features regardless of the original pose or size of the face.
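A common alignment recipe rotates the face so the line between the eye centres becomes horizontal; a minimal sketch, assuming the two eye coordinates have already been located by a landmark detector:

```python
import math

def alignment_angle(left_eye, right_eye):
    """Angle (degrees) of the line through the eye centres."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(p, center, angle_deg):
    """Rotate a point about a centre; applied to every pixel coordinate,
    this brings the face to a canonical orientation."""
    a = math.radians(angle_deg)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(a) - y * math.sin(a),
            center[1] + x * math.sin(a) + y * math.cos(a))

left, right = (30, 40), (70, 60)       # detected eye centres (x, y)
angle = alignment_angle(left, right)   # tilt of the face
# Rotating by -angle about the eye midpoint levels the eyes.
```

In practice the rotation is applied to the whole image (e.g., via an affine warp) together with scaling, so that the eyes land at fixed positions in the aligned crop.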
After face alignment, region-based amplification takes place. In this
step, facial regions with high discriminative power, such as the eyes and
nose, are emphasized relative to the rest of the face.
To isolate specific regions, the system first detects facial landmarks.
Common tools for this include dlib's 68-point landmark predictor,
MediaPipe Face Mesh, and OpenCV's facemark module.
Once landmarks are available, regions such as the eyes, nose, and mouth
can be extracted.
Attention-Driven Amplification
◦ Use a separate CNN branch for each region (e.g., one branch for the
eyes, another for the mouth).
◦ Fuse the branch outputs at a later stage to produce the final
embeddings.
Handling Occlusions
◦ If parts of the face are obscured (e.g., the mouth under a mask),
amplify visible regions such as the eyes and nose.
◦ Use adaptive masking to focus on unoccluded areas.
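Assuming landmark coordinates are already available, the extract-then-amplify step can be sketched in pure Python (toy 4x4 intensity grid; the region position and gain value are illustrative):

```python
def crop(image, top, left, h, w):
    """Crop a rectangular facial region (e.g., an eye) from the image."""
    return [row[left:left + w] for row in image[top:top + h]]

def amplify(region, gain):
    """Scale pixel intensities of a discriminative region, clamped to 255."""
    return [[min(255, int(p * gain)) for p in row] for row in region]

# 4x4 toy face; suppose landmarks place the left eye at rows 0-1, cols 0-1.
face = [[10, 20, 30, 40],
        [50, 60, 70, 80],
        [90, 100, 110, 120],
        [130, 140, 150, 160]]
eye = crop(face, 0, 0, 2, 2)       # the eye region
eye_amplified = amplify(eye, 2.0)  # intensities doubled
```

An attention-driven variant would learn the per-region gains (or per-branch weights) rather than fixing them, and would zero out regions that the occlusion mask marks as hidden.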
3.6 Face Recognition
After feature extraction, the next step is face recognition, where the
goal is to identify the person by comparing the extracted features to a
database of known faces. To do this, the system performs a similarity
comparison between the features of the detected face and those in the
stored database.
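The similarity comparison described above can be sketched as cosine similarity between embedding vectors, with a threshold deciding whether any database entry is a genuine match (the 3-dimensional embeddings, names, and threshold below are toy values; real systems use embeddings with hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recognize(query, database, threshold=0.8):
    """Return the best-matching identity, or None if nothing is similar enough."""
    best_name, best_score = None, -1.0
    for name, emb in database.items():
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

database = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.95, 0.2]}
probe = [0.88, 0.12, 0.41]      # embedding of the detected face
match = recognize(probe, database)
```

The threshold trades off false accepts against false rejects: raising it makes the system stricter, which matters more in security applications than in, say, photo tagging.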