final project 3
INTRODUCTION
Facial recognition systems are advanced biometric solutions that identify and verify individuals by
analysing unique facial features. These systems are widely used in security, authentication, and
surveillance due to their contactless and efficient nature. The subsequent sections provide a detailed
overview of facial recognition workflows, including data acquisition, preprocessing, feature
extraction, and classification, along with insights into system performance, challenges, and future
improvements.
1.1 FACIAL RECOGNITION SYSTEMS
The primary purpose of this project is to design and implement a facial recognition system that
enhances security, improves user experience, and demonstrates the practical application of biometric
authentication in various domains. With the growing reliance on digital systems and sensitive data,
traditional security measures such as PINs, passwords, and physical keys have proven insufficient in
addressing modern security challenges. This project aims to address these shortcomings by
leveraging facial recognition technology as a robust, efficient, and user-friendly solution for
authentication and access control.
Another important purpose of the project is to demonstrate the efficiency and convenience of
contactless and non-intrusive authentication. In today’s fast-paced world, users demand solutions that
are not only secure but also seamless and easy to use. This project highlights the ability of facial
recognition technology to meet these demands. Unlike fingerprint scanning or physical tokens, facial
recognition requires no physical interaction with devices, ensuring a hygienic and hassle-free
process. Users can be authenticated simply by looking at a camera, making the system both user-
friendly and time-efficient.
Additionally, the project aims to showcase the versatility of facial recognition technology across a
wide range of applications. From securing financial transactions in the banking sector to enhancing
surveillance systems in public safety, the potential use cases are vast. By implementing a practical
facial recognition system, this project seeks to demonstrate its applicability in real-world scenarios,
such as access control in workplaces, secure device unlocking, and identity verification in law
enforcement. These applications highlight the transformative potential of facial recognition
technology in improving operational efficiency and reducing reliance on vulnerable traditional
methods.
A key focus of the project is also to address potential challenges and limitations associated with
facial recognition technology. Issues such as privacy concerns, data protection, and the ethical use of
biometric data are critical considerations. This project aims to implement best practices for data
handling, ensuring that the system is compliant with privacy regulations and respects users’ rights.
By integrating transparency and security measures, it seeks to build trust in the technology and
promote its responsible use.
Ultimately, this project is driven by the goal of creating a facial recognition system that balances
security, efficiency, and ethical considerations. By providing a secure, accurate, and user-friendly
authentication solution, the project contributes to advancing biometric technology as a reliable
alternative to traditional methods. Through practical implementation, it demonstrates the potential of
facial recognition to revolutionize security systems and meet the evolving needs of modern society.
1.2 WORKFLOW
2. BACKGROUND FOR THE PROJECT
Facial recognition systems offer efficient and contactless authentication solutions, with their
applications and workflows discussed in subsequent sections.
2.1 INTRODUCTION
The rapid evolution of technology has fundamentally changed the way people interact with systems
and safeguard sensitive information. In recent years, the growing reliance on digital platforms, cloud
computing, and connected devices has amplified the need for robust security mechanisms.
Traditional security methods such as passwords, PINs, and physical keys, while still widely used,
have shown critical vulnerabilities. Passwords can be easily guessed, hacked, or forgotten, while
physical tokens can be stolen, duplicated, or misplaced. These limitations have driven the demand for
more advanced, reliable, and user-centric solutions, leading to the emergence of biometric
technologies like facial recognition.
Facial recognition technology is based on identifying and verifying individuals by analyzing their
unique facial features. Unlike other biometric methods such as fingerprint or retina scanning, facial
recognition is non-intrusive and does not require physical interaction with devices, making it both
convenient and hygienic. This has made it a preferred choice for many applications, ranging from
personal device security to large-scale surveillance systems. The roots of facial recognition
technology can be traced back to the 1960s, with early research focusing on manually identifying
facial landmarks. Over the decades, advancements in computer vision and artificial intelligence (AI)
have transformed it into a highly sophisticated tool capable of real-time identification and
authentication.
The adoption of facial recognition technology has expanded significantly across various sectors. In
the banking and financial industries, it is used to verify customers' identities, prevent fraud, and
enable secure transactions. Law enforcement agencies rely on it to identify suspects, solve cases, and
enhance public safety. The technology has also found widespread use in consumer electronics, such
as smartphones and laptops, providing users with a seamless way to unlock devices and access
personal data securely.
Despite its growing popularity, the implementation of facial recognition is not without challenges.
Privacy concerns, data security, and ethical considerations are among the most pressing issues
associated with this technology. Unauthorized use of facial data, biases in algorithms, and potential
misuse for surveillance have raised questions about its impact on individual rights and freedoms.
These concerns have spurred research into improving the fairness, accuracy, and transparency of
facial recognition systems while ensuring compliance with data protection regulations such as GDPR
and CCPA.
This project is built on the premise of addressing both the opportunities and challenges of facial
recognition technology. It aims to develop a secure, efficient, and user-friendly system for
authentication and access control, highlighting the practical applications of the technology in real-
world scenarios. By integrating advanced AI algorithms and adhering to ethical practices, the project
seeks to demonstrate the potential of facial recognition to enhance security while maintaining user
trust and privacy.
The project's background is rooted in the increasing need for robust security solutions in an
interconnected world. It acknowledges the growing threats to traditional authentication methods and
leverages the unique strengths of facial recognition to create a system that is not only secure but also
adaptable to a wide range of applications. By addressing the challenges and showcasing the benefits,
the project aims to contribute to the responsible advancement of biometric technology.
1. Title: Viola, P., & Jones, M. (2001). A Revolutionary Approach to Real-Time Object Detection Using Integral Images, Boosted Cascade Classifiers, and Simple Features.
   Year: 2001
   Dataset: Self-constructed dataset of faces and non-faces
   Pre-Processing Techniques: None (as detailed in the literature); Multi-site Harmonization; Data Normalization
   Model: Haar Cascade Classifier
   Results: Proposed a highly efficient and computationally lightweight framework for object detection, capable of achieving real-time detection speeds with high accuracy.

2. Title: Turk, M., & Pentland, A. (1991). The Birth of Eigenfaces: Principal Component Analysis-Based Dimensionality Reduction for Face Recognition Applications.
   Year: 1991
   Dataset: Yale Face Database
   Pre-Processing Techniques: PCA for dimensionality reduction; Spatial Normalization; Temporal Filtering
   Model: PCA for dimensionality reduction
   Results: Applied PCA for face recognition, innovatively projecting images into a reduced subspace for efficient classification.

3. Title: Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). A Comprehensive Multitask Model for Simultaneous Face Detection and Alignment Using Cascaded Convolutional Networks.
   Year: 2016
   Dataset: Database of 92 individuals with Typical Development (TD)
   Pre-Processing Techniques: Functional Connectivity Analysis; Augmentation via DCGAN
   Model: MTCNN
   Results: Proposed a cascaded multi-task CNN (MTCNN) for joint face detection and alignment, achieving real-time, state-of-the-art detection accuracy.

4. Title: Ahmad, I., & Aftab, S. (2019). Face Recognition Utilizing MATLAB with a Hybrid PCA and LDA Framework for Enhanced Dimensionality Reduction and Discriminative Power.
   Year: 2019
   Dataset: ABIDE
   Pre-Processing Techniques: Brain Atlas-based Parcellation; Spatial-Temporal Data Representation
   Model: Bidirectional Long Short-Term Memory (BiLSTM); Graph Convolutional Networks (GCNs)
   Results: Accuracy: CC200 Atlas: 70.92%; AAL Atlas: 68.72%

5. Title: Deng, X., Zhang, L., Liu, R., Chen, Z., & Xiao, B. (2023). ST-ASDNET: A BLSTM-FCN-Transformer based ASD Classification Model for Time-series fMRI [22].
   Year: 2023
   Dataset: ABIDE I/II
   Pre-Processing Techniques: Atlas-based Parcellation; Time-Series Extraction (signals extracted from each parcellated region to create spatial-temporal data representations)
   Model: Bidirectional Long Short-Term Memory (BiLSTM); Fully Convolutional Network Transformer (FCN-Transformer)
   Results: Accuracy: CC200 Atlas: 70.26%; AAL Atlas: 67.80%
1. Integration of Cost-Effective Hardware:
   o Unlike traditional face detection systems requiring high-end hardware, this project uses the ESP32-CAM, a cost-efficient module, for real-time processing and detection.
   o The system demonstrates the feasibility of implementing advanced technologies on resource-constrained devices.
2. Real-Time Face Detection on Embedded Systems:
   o The project leverages lightweight algorithms optimized for edge devices to deliver real-time performance without relying on external servers.
   o Preprocessing techniques, such as noise reduction and edge detection, enhance detection accuracy even in constrained environments.
3. Scalable and Modular Design:
   o The system's modular architecture allows integration with other security systems, such as solenoid locks and alarm triggers.
   o It provides a foundation for expanding capabilities, such as emotion recognition or mask detection, in the future.
This project demonstrates the novelty of combining advanced computer vision techniques with cost-
effective hardware, providing a practical and scalable solution for real-time face detection
applications.
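The noise-reduction preprocessing mentioned in point 2 above can be illustrated with a simple 3x3 mean (box) filter. This is a minimal sketch in Python on a nested-list image; the actual firmware may use a different filtering routine.

```python
# Minimal sketch: a 3x3 mean (box) filter for noise reduction.
# Pure Python on a nested-list grayscale image, for illustration only.

def mean_filter_3x3(img):
    """Apply a 3x3 box blur; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s // 9
    return out

if __name__ == "__main__":
    noisy = [
        [10, 10, 10, 10],
        [10, 90, 10, 10],   # single bright "noise" pixel
        [10, 10, 10, 10],
        [10, 10, 10, 10],
    ]
    smoothed = mean_filter_3x3(noisy)
    print(smoothed[1][1])  # noise pixel pulled toward its neighbours
```

Averaging each pixel with its neighbours suppresses isolated noise spikes, at the cost of slightly blurring edges, which is why edge detection is applied as a separate step.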
3. WORKFLOW
We have followed a sequential approach from preprocessing the data to implementing the hardware.
The workflow followed in this project is summarized in the below image.
Processor:
The ESP32 microcontroller features dual-core processing power with a clock speed of up to
240 MHz. It supports multiple peripherals and handles communication with external
components efficiently.
Camera Module:
The OV2640 camera provides a resolution of up to 2 megapixels and supports various
resolutions like VGA, SVGA, and UXGA. It is optimized for low-light environments and
wide-angle applications, ensuring clear images in diverse conditions.
Onboard Flash:
It includes a built-in flash for illumination in low-light settings, which can be controlled
programmatically. This feature enhances image quality during face detection.
Connectivity:
o Wi-Fi: Enables wireless data transmission to external servers or applications.
Applications:
o Captures real-time facial images.
o Sends processed data to external devices or servers via Wi-Fi for further analysis.
Capacitors:
o Electrolytic Capacitors: Smooth out the DC supply and minimize ripple voltage.
3. Relay Module
The relay module acts as an intermediary between the ESP32 and high-power hardware components
such as solenoid locks or alarms. It provides electrical isolation and allows the ESP32-CAM to
control devices operating at different voltage levels.
Output Side:
o Switches higher voltage devices (e.g., solenoid locks operating at 12V).
Applications:
o Controls solenoid locks for access management.
Connections:
o TX/RX Pins: Connect to the RX/TX pins of the ESP32-CAM for data transmission
and reception.
o GND: Common ground between the ESP32-CAM and the programmer.
Mode Selection:
o The ESP32-CAM must be set to bootloader mode for flashing. This is achieved by
connecting the GPIO0 pin to GND during reset.
Applications:
o Essential for the initial setup and configuration of the ESP32-CAM.
o Sturdy and tamper-resistant design ensures security.
Types:
o Fail-safe locks: Remain locked during power loss.
Power Requirements:
o Requires sufficient current to activate the locking mechanism, provided by the relay
and stabilized power source.
Applications:
o Used in smart doors, cabinets, or secure enclosures.
Design Features:
o Cutouts for the camera lens and LEDs for unobstructed functionality.
o Secure mounting points for installation on walls, doors, or kiosks.
Applications:
o Indoor and outdoor setups for smart home security, offices, or industrial use.
o The small onboard PCB antenna, while compact, was sensitive to interference from
other electronic devices operating in the same frequency band (2.4 GHz).
o Poor alignment of the ESP32-CAM’s antenna with the Wi-Fi access point degraded
signal strength.
o Environmental factors like walls, metal objects, and electronic noise also contributed
to packet losses.
Solution:
o Antenna Optimization: The ESP32-CAM module was oriented to ensure a direct
line of sight to the Wi-Fi router or access point whenever possible. Additionally, the
antenna's position was carefully adjusted to maximize signal strength.
o External Antenna (Optional): An external IPEX connector and antenna can be used
to improve range and signal reliability significantly.
o Reduced Interference: The system was tested in various locations to identify and
minimize sources of interference. Moving the ESP32-CAM away from high-noise
devices (e.g., microwaves, Bluetooth devices) improved performance.
2. Heat Dissipation
Prolonged operation of the ESP32-CAM, especially under high processing loads (e.g., continuous
face detection), led to overheating. This could degrade performance or even cause system instability.
Issue:
Overheating affected the ESP32 module's performance, resulting in throttling or unexpected
resets during extended usage. The compact design of the ESP32-CAM limited passive
cooling options, leading to heat buildup within the enclosure.
Analysis:
o Heat was generated primarily by the ESP32’s dual-core processor during intensive
computations, Wi-Fi transmissions, and image processing tasks.
o The absence of proper ventilation and heat dissipation mechanisms compounded the
problem.
Solution:
o Thermal Insulation: Thermal pads were placed between the ESP32 module and the
enclosure to transfer heat away from the board.
o Improved Enclosure Design: Ventilation slots were added to the enclosure to
facilitate airflow and dissipate heat more effectively.
o Placement Optimization: The ESP32-CAM was mounted in a well-ventilated area,
reducing heat accumulation.
o Optional Active Cooling: For demanding applications, a small heat sink or microfan
can be attached to the module to enhance cooling further.
3. Power Consumption
Consistent power delivery is critical for the ESP32-CAM to maintain stable operation. Power
fluctuations can lead to inconsistent performance, system resets, or damage to the hardware.
Issue:
Voltage drops and spikes caused by inconsistent power delivery affected the ESP32-CAM’s
performance, particularly during high-current operations like image transmission or
activating external peripherals.
Analysis:
o The ESP32-CAM requires a stable 5V supply but is sensitive to variations in the input
voltage.
o Sudden power demands from peripherals, such as activating a relay or solenoid lock,
could momentarily disrupt the power supply.
o Noise in the power line caused additional instability.
Solution:
o Voltage Regulation: A 7805 linear voltage regulator was used to ensure a steady 5V
output. This regulator efficiently handled input voltages between 7V and 12V,
providing a consistent supply to the ESP32-CAM.
o Capacitor Filtering:
Electrolytic capacitors (e.g., 470 μF) were placed across the input and output
of the regulator to smooth out voltage ripples.
Ceramic capacitors (e.g., 0.1 μF) were added to filter high-frequency noise.
o Power Distribution: A separate power line was dedicated to high-current peripherals
like the solenoid lock to reduce the load on the ESP32-CAM’s supply.
o Backup Power: For critical applications, a battery backup or uninterruptible power
supply (UPS) was recommended to handle power outages or voltage dips.
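Because the 7805 is a linear regulator, the voltage it drops is dissipated as heat (P = (Vin - Vout) x Iload), which also ties into the heat-dissipation issue above. A quick sanity check; the 250 mA load current below is an assumed figure for illustration, not a measured value from this project.

```python
# Sketch: power dissipated by a 7805 linear regulator.
# P = (V_in - V_out) * I_load. The load current is an assumed
# illustrative figure, not a measurement from this project.

def regulator_dissipation(v_in, v_out, i_load):
    return (v_in - v_out) * i_load

if __name__ == "__main__":
    p = regulator_dissipation(v_in=12.0, v_out=5.0, i_load=0.25)
    print(f"{p:.2f} W")  # 1.75 W at 12 V input: why lower input voltages run cooler
```

At a 12 V input the regulator burns more power than the load itself uses, which motivates keeping the input closer to 7 V or adding a heat sink.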
2. Face Cropping: Face cropping involves extracting the region of interest (ROI)
corresponding to the detected face. The bounding box coordinates are used to isolate the face
from the frame. This cropped region is then used for further analysis.
3. Image Resizing: Image resizing ensures uniformity by resizing the cropped face image to a consistent dimension, such as [227, 227]. This step is crucial for accurate classification. A Haar classifier is used for face detection before resizing.
4. File Naming and Saving: Each pre-processed image is saved with a unique name, like
0.bmp, 1.bmp, and so on. This makes it easier to label and organize the images for
classification. The files are stored for quick and easy access later.
5. Creating dataset: Organize the saved images into directories or datasets suitable for training
and testing a classification model. Label them according to the classification categories.
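Steps 2 to 4 above can be sketched as follows. Python and a nearest-neighbour resize are used purely for illustration; the project's actual pipeline runs in MATLAB with a Haar detector supplying the bounding box.

```python
# Sketch of the preprocessing pipeline: crop the face ROI from a frame
# using bounding-box coordinates, resize it to a fixed size with
# nearest-neighbour sampling, and build sequential file names (0.bmp, 1.bmp, ...).
# Pure Python on nested lists, for illustration only.

def crop(frame, x, y, w, h):
    """Extract the region of interest given bounding-box coordinates."""
    return [row[x:x + w] for row in frame[y:y + h]]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to a uniform input size, e.g. 227x227."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def sequential_name(index):
    """File naming scheme from step 4: 0.bmp, 1.bmp, ..."""
    return f"{index}.bmp"

if __name__ == "__main__":
    frame = [[(r * 10 + c) for c in range(8)] for r in range(8)]
    face = crop(frame, x=2, y=2, w=4, h=4)      # ROI from the detector
    sample = resize_nearest(face, 227, 227)     # uniform classifier input
    print(len(sample), len(sample[0]), sequential_name(0))
```

Every saved face then has identical dimensions and a predictable name, which is what makes the later labelling and dataset-creation steps straightforward.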
Regions of interest
A Region of Interest (ROI) is a specific part of an image where the algorithm focuses to find a
particular object, like a face. Instead of analysing the entire image at once, the classifier looks at
smaller sections of the image to detect patterns or features that match the object it’s trained to
recognize.
How the classifier looks for ROI:
The Haar classifier divides the image into small rectangular windows and analyses these for
specific patterns, called Haar features, which represent variations in pixel intensity, such as
edges, lines, and corners.
When detecting objects like faces, it looks for characteristic patterns, such as dark regions for
eyes, lighter areas for the forehead, and darker regions for the mouth.
To detect objects of different sizes, the algorithm scales the window to examine both small
and large regions.
It uses a cascading process, starting with simple pattern checks (e.g., edges or contrasts) to
quickly discard non-relevant sections.
Only windows that pass initial checks undergo further analysis with more complex patterns,
making the detection process efficient and accurate.
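The sliding-window search described above is fast because Haar-like features can be evaluated in constant time using an integral image: any rectangle sum is read off in four lookups. A minimal Python sketch of these building blocks, illustrative rather than the project's actual detector:

```python
# Sketch: integral image + a two-rectangle Haar-like feature,
# the building blocks of the cascade described above.

def integral_image(img):
    """ii[y][x] = sum of all pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of img[y:y+h][x:x+w] in four lookups, independent of rectangle size.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    # Dark-vs-light contrast: left half minus right half (an edge feature).
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

if __name__ == "__main__":
    img = [[0, 0, 9, 9],
           [0, 0, 9, 9]]          # dark left, bright right
    ii = integral_image(img)
    print(two_rect_feature(ii, 0, 0, 4, 2))  # strongly negative: an edge
```

A cascade stage thresholds many such feature values; windows that fail the cheap early stages are discarded immediately, which is what makes scanning every window position and scale affordable.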
Fig. 11 Feature extraction
About Alexnet:
AlexNet is a deep convolutional neural network (CNN) designed for image classification
tasks.
It consists of 8 layers: 5 convolutional layers followed by 3 fully connected layers.
The model uses ReLU (Rectified Linear Unit) activation functions, which helped it achieve
faster training compared to previous models.
It employs data augmentation and dropout to reduce overfitting during training.
Steps involved in training:
1. Load the pretrained AlexNet model: AlexNet is a pre-trained deep convolutional neural
network for image classification. It includes layers optimized for general image datasets, such
as ImageNet. This makes it suitable for various image recognition tasks.
2. Modifying network architecture: The layers of AlexNet are stored in g.Layers. The fully
connected layer (layer 23) is replaced with a new one to match the number of classes, such as
fullyConnectedLayer(2) for binary classification. The classification layer (layer 25) is
updated for the custom classification task.
3. Load and Label dataset: The imageDatastore function loads all images from the specified directory and automatically assigns labels based on the folder names. This ensures the data is correctly organized for training.
4. Set Training Options: The optimizer used is Stochastic Gradient Descent with Momentum
(SGDM), which helps the network converge faster and avoid local minima. The initial learn
rate is set to 0.001, controlling how much the weights are adjusted during each update. The
max epochs are set to 20, meaning the network will train on the full dataset 20 times. The
mini batch size is 64, specifying how many images are processed together in one training
step.
5. Train the Network: It involves forward propagation, where input images are passed through
the network to make predictions, and backpropagation, where weights are adjusted based on
errors and gradients to minimize classification errors. This process is repeated iteratively to
improve the model's performance.
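The optimizer named in step 4 (SGDM with a learning rate of 0.001) can be illustrated by its update rule, v = momentum * v - lr * gradient, then w = w + v. A minimal sketch on a one-dimensional quadratic; the momentum value 0.9 is an assumed default, not specified in the text.

```python
# Sketch: stochastic gradient descent with momentum (SGDM), the optimizer
# from step 4. Minimising f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# lr matches the 0.001 in the text; momentum 0.9 is an assumed default.

def sgdm_minimise(grad, w, lr=0.001, momentum=0.9, steps=5000):
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)   # velocity accumulates past gradients
        w = w + v                         # weight update
    return w

if __name__ == "__main__":
    w_final = sgdm_minimise(lambda w: 2.0 * (w - 3.0), w=0.0)
    print(round(w_final, 3))  # converges toward the minimum at 3.0
```

The velocity term is what helps the network "converge faster and avoid local minima" as described: it carries the update through small bumps in the loss surface instead of stopping at them.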
3.3 IMAGE DETECTION
After capturing and training the images, the face detection process begins by using the trained model,
such as AlexNet, to classify new images. The captured image is first pre-processed, including
resizing to the required dimensions. Then, the model is used to predict the class of the image by
passing it through the network, which processes the features learned during training. The output is
the predicted label or classification, indicating the detected object or face in the image. This process
allows real-time face detection based on the model's learned features and patterns.
Key steps involved in Image Detection:
1. Initialize Webcam: We initialize the webcam to capture images of the subject.
2. Load the trained model: This loads the previously trained model in MATLAB for further processing and decision-making.
3. Initialize face detector: This initializes a pre-trained face detection algorithm to detect faces in the image.
4. Capture Image in a Loop: The loop continuously captures images until a face is detected and a match/no-match decision is made.
5. Face detection: The face detector identifies faces in the image and provides the coordinates
of their locations. If a face is detected, the region containing the face is cropped, resized to a
standard size, and then prepared for classification.
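The key steps above can be sketched as a single loop. Note that capture_frame, detect_faces, and classify below are hypothetical stand-ins for the webcam, the Haar detector, and the trained network; they are not real library calls from this project.

```python
# Sketch of the detection loop above. capture_frame, detect_faces and
# classify are hypothetical stubs standing in for the webcam, the Haar
# detector, and the trained classifier described in the text.

def run_detection(capture_frame, detect_faces, classify, max_frames=100):
    for _ in range(max_frames):
        frame = capture_frame()                  # steps 1 and 4: grab an image
        boxes = detect_faces(frame)              # step 5: locate faces
        for (x, y, w, h) in boxes:
            roi = [row[x:x + w] for row in frame[y:y + h]]  # crop the ROI
            label = classify(roi)                # trained model decides
            if label == "match":
                return label                     # stop once a match is found
    return "no match"

if __name__ == "__main__":
    frame = [[0] * 6 for _ in range(6)]
    result = run_detection(
        capture_frame=lambda: frame,
        detect_faces=lambda f: [(1, 1, 3, 3)],
        classify=lambda roi: "match",
    )
    print(result)
```

Separating detection (where is a face?) from classification (whose face is it?) mirrors the two-stage structure used throughout the project.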
4. Implementation Details
o Frame Rate: Balanced to support real-time detection without overloading the system.
Fig. 14 ESP 32 Interface
Feature Localization:
The face detection algorithm identifies bounding boxes around the face. These regions are
cropped and further processed to focus only on relevant areas.
Dimensionality Reduction:
Cropped facial regions are resized and normalized, reducing dimensionality while preserving
key features like eyes, nose, and mouth.
Real-Time Processing:
Extracted features are immediately passed to the classification model, allowing for live
detection and decision-making.
Implementation:
The classifier processes grayscale images to detect faces by scanning for specific Haar-like
features. A cascade of stages progressively eliminates non-face regions, resulting in fast and
accurate detection.
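Since the classifier processes grayscale images, incoming colour frames must first be converted. The sketch below uses the common ITU-R BT.601 luma weights (0.299/0.587/0.114); these weights are a standard convention assumed here, not something specified in the project.

```python
# Sketch: RGB -> grayscale conversion, the kind of step implied by
# "the classifier processes grayscale images". The 0.299/0.587/0.114
# weights are the common ITU-R BT.601 luma convention, assumed here.

def to_grayscale(rgb_img):
    """Convert a nested list of (r, g, b) tuples to grayscale intensities."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_img]

if __name__ == "__main__":
    img = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
    print(to_grayscale(img))  # [[76, 150, 29]]
```

Working on a single intensity channel also cuts the data per frame by two thirds, which matters on a memory-constrained device like the ESP32-CAM.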
4.4.2 MODEL 2 – TensorFlow Lite Implementation
A TensorFlow Lite model is used for real-time face detection and classification, leveraging the
ESP32-CAM’s processing capabilities.
Feature Reduction Using Autoencoders:
An autoencoder compresses high-dimensional images into reduced representations for
efficient processing.
o Encoder: Includes dense layers to extract key features from images, using ReLU
activation and a bottleneck layer for compression.
o Decoder: Reconstructs images for validation during training, ensuring meaningful
features are retained.
Face Detection Model:
The reduced features are input into a lightweight CNN classifier optimized for detecting
facial patterns.
o Architecture: Includes convolutional layers with max-pooling for feature extraction
and fully connected layers for classification.
o Output: A sigmoid-activated layer predicts binary probabilities, indicating the
presence or absence of a face.
Training Details:
o Loss Function: Binary cross-entropy ensures accurate classification.
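The binary cross-entropy loss named above is -[y log(p) + (1 - y) log(1 - p)], averaged over the batch. A minimal pure-Python sketch, for illustration only; the project's training runs through the TensorFlow tooling:

```python
# Sketch: binary cross-entropy, the loss named above.
# BCE = -mean( y * log(p) + (1 - y) * log(1 - p) )
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

if __name__ == "__main__":
    # Confident, correct predictions -> small loss
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

Confident wrong predictions are penalized heavily (the log term blows up near 0), which pushes the sigmoid output toward calibrated face/no-face probabilities.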
3. Scalability:
The modular design allows for integrating advanced models or additional sensors if needed.
Precision:
The fraction of true positives among all positive detections, indicating how reliable the
detected faces are.
Recall (Sensitivity):
The fraction of actual faces correctly detected, reflecting the system's ability to identify all
faces.
F1 Score:
The harmonic mean of precision and recall, providing a balanced measure of the model's
performance.
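The metrics defined above follow directly from the true-positive (TP), false-positive (FP), and false-negative (FN) counts; the example counts below are illustrative, not results from this project.

```python
# Sketch of the evaluation metrics defined above, from raw counts.

def precision(tp, fp):
    """Fraction of true positives among all positive detections."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of actual faces correctly detected."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

if __name__ == "__main__":
    # e.g. 8 correct detections, 2 false alarms, 2 missed faces
    print(precision(8, 2), recall(8, 2), f1_score(8, 2, 2))
```

Reporting F1 alongside precision and recall guards against a detector that looks good on one metric by sacrificing the other (e.g. flagging everything as a face).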
The face detection model was trained using real-time data collected via the ESP32-CAM.
Training Dataset:
Consisted of a mix of images captured in varied lighting and environmental conditions to
ensure robustness.
Model Architecture:
A lightweight CNN model was implemented, with training performed on a 70-30 split of the
dataset.
Optimization:
The model was trained over 25 epochs using the Adam optimizer with a learning rate of
0.001. Loss was minimized using binary cross-entropy, and early stopping was employed to
prevent overfitting.
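Two of the training details above, the 70-30 dataset split and early stopping on the validation loss, can be sketched as follows. The patience of 3 epochs is an assumed value; the text says only that early stopping was employed.

```python
# Sketch: 70-30 dataset split and early stopping, as described above.
import random

def split_70_30(items, seed=0):
    """Shuffle reproducibly, then split 70% train / 30% test."""
    items = items[:]
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.7)
    return items[:cut], items[cut:]

def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch training stops at, or None if it never stops."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0     # new best: reset the counter
        else:
            since_best += 1
            if since_best >= patience:     # no improvement for `patience` epochs
                return epoch
    return None

if __name__ == "__main__":
    train, test = split_70_30(list(range(100)))
    print(len(train), len(test))
    print(early_stop_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.7]))
```

Stopping once the validation loss stalls is what keeps the 25-epoch budget from overfitting the comparatively small locally collected dataset.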
2. Lightweight Architecture: Optimized CNN design ensured compatibility with the ESP32-
CAM's limited resources.
3. Robust Preprocessing: Techniques like noise reduction and edge detection enhanced the
model's reliability.
Limitations:
1. Hardware Constraints: The ESP32-CAM's limited processing power restricted the complexity
of the model architecture.
2. Lighting Sensitivity: Performance slightly degraded under extremely low-light conditions.
3. Generalization: Testing was limited to locally collected data; broader datasets may reveal
additional challenges.
allowed for easy adaptability and integration into broader systems, such as smart locks, surveillance
tools, and contactless authentication setups.
This project exemplifies the potential of cost-effective IoT hardware like the ESP32-CAM in solving
real-world problems by delivering practical and scalable solutions. The combination of real-time
processing, minimal latency, and high reliability positions this system as an ideal candidate for
applications in security, monitoring, and smart automation.
6.1 Future Work
While the project achieved its primary goals, several areas of enhancement and expansion have been
identified to further improve the system's effectiveness and applicability:
1. Validation on Larger and Diverse Datasets
Expanding the system's training and testing datasets to include diverse conditions such as
varying facial expressions, age groups, ethnicities, and environmental lighting will help
improve the system's robustness and generalizability. Deploying the system in real-world
settings, such as offices, homes, and public spaces, can provide additional insights into its
performance across different scenarios.
2. Integration with Advanced Models
Exploring more advanced face detection algorithms, such as TensorFlow Lite
implementations or models like YOLO (You Only Look Once), could enhance detection
speed and accuracy. These models, optimized for edge devices, could address complex
scenarios, such as detecting multiple faces in crowded environments.
3. Low-Light Performance Enhancement
Addressing the limitations in low-light conditions by integrating infrared (IR) cameras or
utilizing advanced preprocessing techniques, such as histogram equalization and noise
suppression, can ensure consistent performance in poorly lit environments.
4. Multifunctional System Development
Beyond face detection, the system could be extended to include emotion recognition, mask
detection, or facial attribute analysis. This would broaden the scope of applications, making it
suitable for industries like healthcare, retail, and public safety.
5. Energy Optimization for Long-Term Operation
Optimizing the power management system for long-term use, including integrating solar
panels or high-efficiency batteries, could make the system more sustainable and ideal for
remote locations with limited power access.
6. Real-World Deployment
Developing a comprehensive, end-to-end product that includes hardware, firmware, and user-
friendly software interfaces would facilitate the system's deployment in real-world
applications. For example:
o Access Control: Integrating the face detection system with solenoid locks and cloud-
based authentication for secure smart lock systems.
o Surveillance: Deploying the system in smart security cameras for live monitoring and
automated alerts.
7. Edge Computing and Cloud Integration
The system's capabilities can be enhanced through a hybrid edge-cloud architecture: the ESP32-CAM handles local processing for immediate responses, while the cloud provides advanced analytics, storage, and machine learning model updates.
8. Multifaceted Security Integration
The system can be combined with other security modalities, such as RFID and biometric
sensors, to create multi-factor authentication systems for improved access control.
9. Multiclass Classification
Upgrading the system to recognize multiple faces simultaneously or classify specific
attributes (e.g., age group, gender) will provide nuanced insights and expand the system's
utility.
10. User Feedback Loop
Incorporating feedback mechanisms to allow users to provide input on incorrect detections
can improve model accuracy over time. The system can learn and adapt dynamically, further
refining its performance.
6.2 Long-Term Vision
The advancements in face detection and related technologies can pave the way for a wide range of
applications across industries. The ultimate goal is to transition from a prototype to a market-ready
solution that combines affordability, scalability, and superior performance. By continuing to innovate
and address current limitations, this system has the potential to become a cornerstone of intelligent
automation and modern security systems.