DL Mid
Deep Learning
Mid Term
Instructor:
Submitted by:
Contents
Introduction
Methodology
Part A
Part B
Conclusion
References
Introduction
Object detection is a critical task in computer vision which enables machines to identify and
locate objects within images or video frames. Among the various object detection methods, the
You Only Look Once (YOLO) model stands out for its real-time performance and high
accuracy. YOLO processes the entire image in a single pass, making it exceptionally fast
compared to traditional detection systems that rely on region proposals or sliding windows.
The YOLO model operates by dividing the input image into a grid and predicting bounding
boxes and class probabilities for each grid cell. This approach allows YOLO to detect multiple
objects of different classes in a single inference step, making it well-suited for real-time
applications.
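As a minimal sketch of the grid assignment described above, the following Python snippet maps a normalized box center to the grid cell that would be responsible for predicting it. The 13 x 13 grid (what a 416 x 416 input gives at stride 32) and the function names are illustrative assumptions, not details taken from our implementation.

```python
def responsible_cell(cx, cy, grid_size):
    """Return (row, col) of the grid cell that contains a normalized box center."""
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        raise ValueError("center coordinates must be normalized to [0, 1]")
    # Clamp so a center lying exactly on the right or bottom edge
    # is assigned to the last cell instead of falling off the grid.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


def cell_offsets(cx, cy, grid_size):
    """Return the (x, y) position of the center measured from its cell's top-left corner."""
    row, col = responsible_cell(cx, cy, grid_size)
    return cx * grid_size - col, cy * grid_size - row


# Assumed example: a face centered in the image, on a 13 x 13 grid.
print(responsible_cell(0.5, 0.5, 13))
print(cell_offsets(0.5, 0.5, 13))
```

Each cell predicts boxes relative to its own position, which is why the offset inside the cell, rather than the absolute coordinate, is the quantity the network regresses.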
In this report, we delve into the application of YOLO for face detection, a task crucial for
applications such as security, surveillance, and human-computer interaction.
In Part A, we begin by obtaining a face detection dataset from Kaggle, which contains
annotated images with bounding boxes around faces. We then implement a basic
YOLOv3 model using OpenCV and fine-tune it to detect faces specifically. The
objective is to update the fully connected component of the YOLO model to specialize
in detecting human faces. We train the model using the provided dataset and evaluate
its performance.
Throughout both tasks, we aim to not only achieve accurate face detection but also optimize
the models for deployment on resource-constrained devices, such as mobile phones or edge
computing platforms. By leveraging the capabilities of YOLO and customizing it for face
detection, we seek to address the unique challenges posed by this task and pave the way for
practical real-world applications.
Experiment Description
In this experiment, we aim to utilize the YOLO model for face detection. The basic YOLOv3
model is trained to detect a wide range of objects across 80 different classes. However, for our
specific task of face detection, we need to adapt the model to detect faces exclusively.
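To make the adaptation from 80 classes to a single face class concrete, the sketch below computes the shape of each YOLOv3 prediction tensor. It assumes the standard YOLOv3 layout of three anchors per scale and strides of 32, 16 and 8; the 416 x 416 input size and the function names are assumptions made for illustration.

```python
def head_channels(num_classes, anchors_per_scale=3):
    """Channels in one YOLOv3 prediction layer."""
    # Every anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    return anchors_per_scale * (4 + 1 + num_classes)


def output_shapes(input_size, num_classes, strides=(32, 16, 8)):
    """Return the (grid, grid, channels) shape of each detection scale."""
    shapes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError("input size must be a multiple of every stride")
        grid = input_size // stride
        shapes.append((grid, grid, head_channels(num_classes)))
    return shapes


print(head_channels(80))      # 80-class model
print(head_channels(1))       # face-only model
print(output_shapes(416, 1))  # assumed 416 x 416 input
```

Moving from 80 classes to one shrinks each prediction layer from 255 to 18 channels, so only the final prediction layers need to change shape; the feature extractor can be reused as-is.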
For Part A, we start by downloading a face detection dataset from Kaggle. We then implement
a basic YOLOv3 model and fine-tune it to detect faces using the provided dataset. The model
is modified to update only the fully connected component to specialize in face detection.
For Part B, we modify the basic YOLO model to create a personalized version. We explore
the impact of removing certain pre-trained layers from the original YOLO network and
contrast it with a single multi-layer perceptron. Additionally, we aim to reduce the
number of trainable parameters while maintaining or improving performance.
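The effect of freezing pre-trained layers on the number of trainable parameters can be illustrated with a toy count. The layer list below is hypothetical and far smaller than YOLOv3. Since YOLOv3 is fully convolutional, this sketch treats the final prediction convolution as the component that gets updated, which is an assumption about how "fully connected component" maps onto the network.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Number of weights in a 2D convolution (optionally with one bias per filter)."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


# Hypothetical toy network: (name, in_channels, out_channels, kernel, trainable).
# The two backbone layers are frozen; only the face prediction layer is trained.
LAYERS = [
    ("backbone_conv1", 3, 32, 3, False),
    ("backbone_conv2", 32, 64, 3, False),
    ("face_head", 64, 18, 1, True),  # 18 = 3 anchors * (4 + 1 + 1 class)
]


def count_params(layers):
    """Return (total, trainable) parameter counts for a list of conv layers."""
    total = 0
    trainable = 0
    for _name, in_ch, out_ch, kernel, is_trainable in layers:
        n = conv_params(in_ch, out_ch, kernel)
        total += n
        if is_trainable:
            trainable += n
    return total, trainable


total, trainable = count_params(LAYERS)
print(f"total={total} trainable={trainable}")
# For comparison, an 80-class head on the same 64-channel features:
print(conv_params(64, 255, 1))
```

In this toy setting the frozen backbone holds most of the weights, so training only the head updates a small fraction of the parameters; the same bookkeeping applies when layers are removed rather than frozen.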
Methodology
In Part A of our experiment, we commenced by preparing the face detection dataset obtained
from Kaggle. After organizing and preprocessing the dataset, including resizing images to a
consistent resolution, we proceeded to implement the basic YOLOv3 model using the OpenCV
library. We initialized the model with pre-trained weights and modified the fully connected
component to focus exclusively on detecting human faces. The model was trained using the
annotated dataset, where we optimized the model parameters iteratively to minimize the
detection loss.
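One preprocessing step implied above is converting the bounding-box annotations into the normalized center format that YOLO training expects. The sketch below assumes the dataset stores each box as pixel corner coordinates (xmin, ymin, xmax, ymax); that annotation layout, the image sizes, and the function names are assumptions, since the exact format is not stated here.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized (cx, cy, w, h)."""
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert normalized (cx, cy, w, h) back to pixel corners for a given image size."""
    xmin = (cx - w / 2.0) * img_w
    ymin = (cy - h / 2.0) * img_h
    xmax = (cx + w / 2.0) * img_w
    ymax = (cy + h / 2.0) * img_h
    return xmin, ymin, xmax, ymax


# Assumed example: a face annotated on a 400 x 500 image.
box = to_yolo_box(100, 50, 300, 250, img_w=400, img_h=500)
print(box)
# The same normalized box expressed at an assumed 416 x 416 network resolution.
print(to_pixel_box(*box, img_w=416, img_h=416))
```

Because the coordinates are normalized by the image size, the labels remain valid when an image is stretched directly to the network's input resolution, so resizing does not require rewriting the annotations.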
For Part B, which involved the development of a personalized YOLO model for face detection,
we adopted a similar approach but with additional considerations for model modification and
optimization. Given the substantial size of the face detection dataset, consisting of over 60,000
images, we opted to train the model on batches of 3000 images to manage computational
resources efficiently. This batch-wise training strategy allowed us to iteratively update the
model parameters while monitoring performance metrics to ensure convergence. During model
development, we sought to streamline the model for face detection by removing unnecessary
layers, adjusting network parameters, and introducing novel components tailored
specifically to the task at hand. We iterated on the model architecture and tracked the
impact on performance metrics to inform our decision-making process.
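The batch-wise strategy described above can be sketched as a simple chunking loop. The file names, the exact dataset size of 60,000, and the placeholder training step are hypothetical; a real training step would update the model and return a loss for monitoring.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def train_in_chunks(items, chunk_size, train_step):
    """Call train_step on every chunk and collect whatever it returns."""
    history = []
    for chunk in iter_chunks(items, chunk_size):
        history.append(train_step(chunk))
    return history


# Hypothetical stand-in for the dataset: 60,000 image file names.
image_paths = [f"face_{i:05d}.jpg" for i in range(60000)]

# Placeholder training step that only reports how many images it received.
chunk_sizes = train_in_chunks(image_paths, 3000, train_step=len)
print(len(chunk_sizes), "chunks of", chunk_sizes[0], "images")
```

Processing one chunk at a time keeps only 3000 images in memory at once, and the values collected per chunk give a natural checkpoint for tracking the performance metrics between updates.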
After training, we evaluated the performance
of the personalized YOLO model on a separate validation set. We compared the performance
of the personalized model with the basic YOLO implementation, assessing key metrics such as
detection accuracy, inference speed, and model size. Additionally, we analyzed the trade-offs
between model complexity and performance to identify the optimal configuration for face
detection. Overall, this methodology allowed us to assess
the effectiveness and efficiency of the personalized YOLO model for real-world applications,
laying the groundwork for future research and development in the field of computer vision and
object detection.
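Detection accuracy is commonly measured by matching predicted boxes to ground-truth boxes using intersection over union (IoU). The sketch below shows one way this could be computed; the 0.5 IoU threshold, the greedy matching rule, and the function names are assumptions rather than the exact procedure used in our evaluation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predictions to ground truth, one-to-one.

    predictions are assumed to be sorted by descending confidence.
    """
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_idx = 0.0, None
        for idx, truth in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth face can be claimed only once
            value = iou(pred, truth)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Assumed example: one correct detection, one false positive, one missed face.
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
print(precision_recall(preds, truth))
```

Precision penalizes false positives and recall penalizes missed faces, so reporting both alongside inference time and model size gives a fuller picture than a single accuracy figure.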
Part A
The basic YOLO implementation successfully detects faces in images from the provided
dataset. However, some images may exhibit occlusions or complex backgrounds, leading to
false positives or missed detections. Overall, the model demonstrates promising results in
face detection.
Part B
In Part B, our personalized YOLO model for face detection demonstrated strong performance,
accurately localizing human faces with precise bounding boxes. Despite training on batches of
3000 images from the large dataset, totalling over 60,000 images, the model showcased
An essential aspect of face detection is the confidence scores associated with each detection.
These scores reflect the model's certainty in its predictions, ranging from 0 to 1. We observed
that clear and well-defined faces yielded higher confidence scores, while occluded or
ambiguous faces resulted in lower scores, highlighting the model's uncertainty in challenging
scenarios. The confusion matrix for validation shows that the model performed well on the
validation set. All the non-zero values are on the diagonal, indicating that the model correctly
classified every validation example.
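The two ideas above, filtering detections by confidence and reading a confusion matrix, can be sketched as follows. The 0.5 threshold, the two-class labelling (0 for background, 1 for face), and the sample values are hypothetical and are not the results reported in this section.

```python
def filter_by_confidence(detections, threshold=0.5):
    """Keep detections whose confidence score is at least the threshold.

    detections is a list of (box, score) pairs with score in [0, 1].
    """
    return [(box, score) for box, score in detections if score >= threshold]


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[truth][pred] += 1
    return matrix


def is_diagonal(matrix):
    """True when every off-diagonal entry is zero, i.e. no misclassifications."""
    return all(
        value == 0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


# Hypothetical detections: a clear face and an occluded one.
detections = [((10, 10, 50, 50), 0.92), ((60, 20, 90, 55), 0.31)]
print(filter_by_confidence(detections))

# Hypothetical labels: 0 = background, 1 = face.
matrix = confusion_matrix([0, 1, 1, 0], [0, 1, 1, 0], num_classes=2)
print(matrix, is_diagonal(matrix))
```

Raising the threshold discards low-confidence detections such as occluded faces, trading missed detections for fewer false positives, while a purely diagonal matrix corresponds to the error-free validation behaviour described above.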
The following figures show the original image, the image with faces bounded in rectangles,
Conclusion
In conclusion, we have demonstrated the effectiveness of using YOLO for face detection tasks.
By fine-tuning the basic YOLOv3 model and developing a personalized version, we achieve
accurate and efficient face detection capabilities. These models have various applications in
security, surveillance, and human-computer interaction. Future work may involve further
optimizing the personalized YOLO architecture and exploring additional enhancements for
References
[1] F. Gurkan, B. Sagman and B. Gunsel, "YOLOv3 as a Deep Face Detector," 2019 11th
International Conference on Electrical and Electronics Engineering (ELECO), Bursa,
Turkey, 2019, pp. 605-609, doi: 10.23919/ELECO47770.2019.8990641