DL MiniProject
A PROJECT REPORT ON
BACHELOR OF ENGINEERING
Computer Engineering
BY
PVPIT
DEPARTMENT OF COMPUTER ENGINEERING
Padmabhooshan Vasantdada Patil Institute of Technology
Bavdhan, Pune 411021
SAVITRIBAI PHULE PUNE UNIVERSITY
2023-2024
CERTIFICATE
This is to certify that Take Swapnil Rajendra has completed the Project
Report work under my guidance and supervision, and that I have verified the work
for its originality in documentation, problem statement, and the results presented in
the project. Any reproduction of others' work is with prior permission, has been
given due ownership, and is included in the references.
Place:
Date: (Prof. R. C. Pachhade )
ACKNOWLEDGEMENT
1 SYNOPSIS 2
1.1 Project Title . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Technical Keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 TECHNICAL KEYWORDS 3
2.1 Technical Keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Area of Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3 INTRODUCTION 4
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4 Methodology 7
4.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.2 Face Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.3 CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.4 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.5 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.6 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5 Architecture 10
5.1 Proposed Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2 Statistical Shape Models . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3 Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3.1 Convolutional . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3.2 Max-Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3.3 Fully-Connected . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.4 Pooling layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.5 Batch Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.6 Rectified Linear Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.7 Batch size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.8 Epochs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Chapter 1
SYNOPSIS
Chapter 2
TECHNICAL KEYWORDS
Chapter 3
INTRODUCTION
3.1 Introduction
The implementation of human face recognition has gained significant
attention in recent years due to its wide range of applications in various fields such
as security, surveillance, biometrics, and human-computer interaction. Human
face recognition refers to the automated identification or verification of individuals
based on their facial features. With the advancements in computer vision, image
processing, and machine learning techniques, human face recognition systems have
become more accurate, reliable, and efficient. These systems use a combination of
algorithms, models, and datasets to extract and analyze facial features from images
or video streams. The primary goal of implementing human face recognition is to
develop a system that can accurately identify or verify individuals in real-time
scenarios. This involves capturing or obtaining facial images, detecting facial
landmarks, extracting relevant features, and comparing them against a database of
known faces. The system then matches the captured face with the stored
representations to determine the identity of the person.
Face recognition is a visual pattern recognition problem. In detail, a face
recognition system takes an arbitrary image as input and searches a database to
output the identities of the people in that image. A face recognition system
generally consists of four modules, as depicted in Figure 1: detection, alignment,
feature extraction, and matching, where localization and normalization (face
detection and alignment) are processing steps performed before face recognition
proper (facial feature extraction and matching). Face detection segments the face
areas from the background. In the case of video, the detected faces may need to be
tracked by a face-tracking component. Whereas face detection provides only coarse
estimates of the location and scale of each detected face, face alignment aims at
more accurate localization and at normalizing the faces. Facial components, such as
the eyes, nose, mouth, and facial outline, are located; based on these location
points, the input face image is normalized with respect to geometrical properties,
such as size and pose, using geometrical transforms or morphing.
Chapter 4
Methodology
4.1 Preprocessing
Data collection: a large dataset of face images is collected, covering different
individuals under different lighting and pose conditions. Data preprocessing: the
face images are preprocessed to remove noise, align the faces, and normalize the
illumination. Feature extraction: the preprocessed face images are then fed into a
deep neural network to extract high-level features that capture the important
characteristics of a face. The neural network typically consists of several layers of
convolutional and pooling operations, followed by fully connected layers that
produce a feature vector. The resulting data are split into training and testing sets.
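As a concrete illustration of these preprocessing steps, the sketch below grayscales, resizes, and scales a face image using only NumPy; the input dimensions, target size, and helper names are illustrative, not taken from the report's code (a real pipeline would typically use OpenCV or PIL for resizing).

```python
import numpy as np

def preprocess(image, size=64):
    """Normalize a face image: grayscale, resize (nearest-neighbour), scale to [0, 1]."""
    # Grayscale via the standard luminance weights.
    gray = image @ np.array([0.299, 0.587, 0.114])
    # Crude nearest-neighbour resize to size x size.
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    resized = gray[np.ix_(rows, cols)]
    # Scale pixel intensities to [0, 1] to soften illumination variation.
    return resized / 255.0

img = np.random.randint(0, 256, (120, 96, 3), dtype=np.uint8)  # stand-in RGB face
out = preprocess(img)
print(out.shape)  # (64, 64)
```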
4.4 Training
The extracted features are then used to train the neural network to distinguish
between different faces. This is typically done using a supervised learning
approach, where the network is trained on a labeled dataset of face images and
their corresponding identities.
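The supervised setup above can be sketched with a linear softmax classifier trained by gradient descent on labeled feature vectors; the feature dimensions, identity count, and synthetic data below are illustrative stand-ins, not the report's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for extracted face features: 3 identities, 8-dim vectors.
means = rng.normal(scale=3.0, size=(3, 8))   # one cluster centre per identity
y = rng.integers(0, 3, size=60)              # identity labels
X = means[y] + rng.normal(size=(60, 8))      # noisy features around each centre

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Supervised training: gradient descent on the cross-entropy loss.
W = np.zeros((8, 3))
onehot = np.eye(3)[y]
for _ in range(300):
    probs = softmax(X @ W)
    W -= 0.5 * X.T @ (probs - onehot) / len(X)

train_acc = (softmax(X @ W).argmax(axis=1) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```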
4.5 Testing
After the neural network has been trained, it can be tested on a separate dataset
to evaluate its performance. This typically involves measuring the accuracy of the
network in correctly identifying the individuals in the test dataset.
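Measuring accuracy on a held-out set amounts to the fraction of correct identity predictions; a minimal sketch (the labels below are made up for illustration):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of test faces whose predicted identity matches the label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

# Hypothetical predictions on a held-out test set of 10 faces.
acc = accuracy([0, 1, 2, 1, 0, 2, 2, 1, 0, 1],
               [0, 1, 2, 0, 0, 2, 1, 1, 0, 1])
print(acc)  # 0.8
```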
4.6 Deployment
Once the neural network has been trained and tested, it can be deployed in a
real-world application for face recognition. This typically involves capturing a face
image, preprocessing it, and then feeding it into the neural network to obtain a
feature vector. The feature vector is then compared to a database of known faces to
determine the identity
of the individual in the image. Overall, human face recognition using DNNs is a
complex process that requires a large amount of data, sophisticated neural network
architectures, and careful preprocessing and training. However, with the increasing
availability of large datasets and powerful computing resources, DNN-based face
recognition systems have become increasingly accurate and effective in real-world
applications.
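The matching step at deployment time can be sketched as a nearest-neighbour search over enrolled feature vectors; cosine similarity is one common choice of score, and the names, vectors, and threshold below are illustrative assumptions.

```python
import numpy as np

def identify(query, database, threshold=0.8):
    """Match a query feature vector against enrolled identities by cosine similarity.

    `database` maps name -> feature vector. Returns the best-matching
    name, or None if no score clears the threshold (unknown face).
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    best_name, best_score = None, -1.0
    for name, vec in database.items():
        score = cos(query, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

db = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
print(identify(np.array([0.9, 0.1, 0.0]), db))  # alice
print(identify(np.array([0.5, 0.5, 0.5]), db))  # None (ambiguous)
```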
Chapter 5
Architecture
5.2 Statistical Shape Models
A face shape can be represented by n landmark points as a 2n-element vector,
x = (x_1, ..., x_n, y_1, ..., y_n)^T. Given s training face images, there are s
shape vectors x_1, ..., x_s. Before we can perform statistical analysis on these
vectors, it is important that the shapes they represent are in the same
coordinate frame. Figure 5 illustrates the shape model.
5.3.2 Max-Pooling
After each convolutional layer, there may be a pooling layer. The pooling layer
takes small rectangular blocks from the convolutional layer's output and subsamples
each block to produce a single output. There are several ways to do this pooling,
such as taking the average or the maximum, or a learned linear combination of the
neurons in the block. Our pooling layers will always be max-pooling layers; that is,
they take the maximum of the block they are pooling.
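A minimal NumPy sketch of 2x2 max-pooling (the feature-map values below are made up for illustration):

```python
import numpy as np

def max_pool(x, k=2):
    """k x k max-pooling with stride k: keep the maximum of each block."""
    h, w = x.shape
    h, w = h - h % k, w - w % k                    # drop ragged edges, if any
    blocks = x[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 1, 1]])
print(max_pool(fmap))
# [[4 5]
#  [6 3]]
```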
5.3.3 Fully-Connected
Finally, after several convolutional and max-pooling layers, the high-level reasoning
in the neural network is done via fully connected layers. A fully connected layer
takes all neurons in the previous layer (be it fully connected, pooling, or
convolutional) and connects each of them to every one of its own neurons. Fully
connected layers are no longer spatially located (you can visualize them as
one-dimensional), so there can be no convolutional layers after a fully connected
layer.
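The step from a spatial feature map to a fully connected layer can be sketched as a flatten followed by a matrix-vector product, y = Wx + b; all shapes below are illustrative, not the report's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
pooled = rng.normal(size=(8, 8, 16))      # e.g. output of the last pooling layer
x = pooled.reshape(-1)                    # flatten: spatial layout is discarded
W = rng.normal(scale=0.01, size=(10, x.size))  # every input feeds every neuron
b = np.zeros(10)
logits = W @ x + b                        # one score per identity class
print(logits.shape)                       # (10,)
```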
5.8 Epochs
The number of epochs denotes how many times the entire dataset has passed
forward and backward through the neural network; i.e., one epoch is complete when
every image has been seen once during training. This concept should not be
confused with iterations. The number of iterations corresponds to the total number
of forward and backward passes; each pass uses one batch, so the number of
iterations per epoch depends on the batch size and the size of the dataset.
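The relation between epochs, batch size, and iterations is simple arithmetic; the dataset and batch sizes below are illustrative:

```python
import math

dataset_size = 13000   # illustrative number of training images
batch_size = 64
iterations_per_epoch = math.ceil(dataset_size / batch_size)
epochs = 10
total_iterations = epochs * iterations_per_epoch
print(iterations_per_epoch, total_iterations)  # 204 2040
```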
6.1 Dataset
We used a publicly available face recognition dataset called "Labeled Faces in the
Wild" (LFW) for our project. The LFW dataset contains more than 13,000 images
of faces collected from the web. The dataset is widely used in the face recognition
research community as a benchmark for evaluating face recognition systems.
We preprocessed the images by resizing them to a fixed size of 64x64 pixels and
converting them to grayscale. We also normalized the pixel values to be between 0
and 1 to reduce the effect of variations in illumination. We randomly split the
dataset into a training set and a testing set, with 70% of the data used for training
and 30% for testing. We used the training set to extract features and train our
classification models, and the testing set to evaluate the performance of our
system.
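The 70/30 split described above can be sketched as a shuffled index partition; the dataset size and identity count below are illustrative stand-ins for the preprocessed LFW arrays.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000                                  # illustrative dataset size
images = rng.random((n, 64, 64))          # stand-in for preprocessed face images
labels = rng.integers(0, 50, size=n)      # stand-in identity labels

# Shuffle, then take 70% for training and 30% for testing.
idx = rng.permutation(n)
split = int(0.7 * n)
train_idx, test_idx = idx[:split], idx[split:]
X_train, y_train = images[train_idx], labels[train_idx]
X_test, y_test = images[test_idx], labels[test_idx]
print(len(X_train), len(X_test))          # 700 300
```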
Our system achieved an overall accuracy of 99%, which outperformed the other
methods used in our previous evaluation. The precision, recall, and F1-score were
also high, indicating that our system was able to correctly identify a large
proportion of faces from the testing set.
CONCLUSION